2011-08-18

再帰関数(ハノイの塔)にinline宣言をつけたら・・・

sbcl speed optimize

Forthでハノイの塔 - sileの日記でハノイの塔を解くcommon lispプログラムを載せたら「inline宣言を付けるともっと早くなりますよ」というコメントを頂いたので試してみた。

inline宣言付与結果

再帰関数をinline展開する際の深さはsb-ext:*inline-expansion-limit*変数で制御できるらしい。
今回はデフォルトの設定値をそのまま利用した。

;; sbcl-1.0.49(linux/64bit)
> sb-ext:*inline-expansion-limit*
--> 200 

;; inline宣言
(declaim (inline hanoi-impl))

;; 前回定義したhanoi-impl関数とhanoi関数を再コンパイル 
;; (*inline-expansion-limit*を越えました、という長いメッセージが出力される)
;; ... 略 ...

;; 実行
> (time (hanoi 'a 'b 'c 25))
Evaluation took:
  0.271 seconds of real time   ; 約 2.7 秒
  0.270000 seconds of total run time (0.270000 user, 0.000000 system)
  99.63% CPU
  541,031,826 processor cycles
  0 bytes consed
  
--> (:COUNT 33554431)

確かに速くなっている。

ついでにhanoi関数にもinline宣言をつけてみた。

;; inline宣言
(declaim (inline hanoi))

;; hanoi関数を再コンパイル 
;; ... 略 ...

;; 実行
> (time (hanoi 'a 'b 'c 25))   ; 実行時にinline展開される
;; (*inline-expansion-limit*を越えました、という長いメッセージが出力される)
Evaluation took:
  0.169 seconds of real time  ; 約 0.17 秒
  0.170000 seconds of total run time (0.170000 user, 0.000000 system)
  100.59% CPU
  337,858,167 processor cycles
  0 bytes consed
  
--> (:COUNT 33554431)

hanoi関数を実行時に展開するようにしたら、C++と同じくらいに高速になった。

ただsbclのtime関数はinline関数の展開時間は所要時間に含めていないようで、少し不公平な感があるので、別の方法でも図ってみた。

;; 実行時のinline展開時間も含めて図ってみる
> (get-internal-real-time) (hanoi 'a 'b 'c 25) (get-internal-real-time) (- * ***)
--> 304455 ; 開始時間
--> (:COUNT 33554431)
--> 304730 ; 終了時間
--> 275    ; 終了 - 開始 = 0.275 秒

展開時間も含めると、特別速いということはなさそう。
それでもinline宣言を全く使用しない場合に比べるとだいぶ良い。

表

各言語でのハノイの塔の処理時間をまとめておく。

C++ gcc	Forth gforth	Forth VFX-Forth	common lisp sbcl(宣言無)	common lisp sbcl(hanoi-implをinline宣言)	common lisp sbcl(hanoiとhanoi-implをinline宣言)
0.168s	3.359s	0.190s	0.403s	0.271s	0.169s (0.275s*1 )

教訓

再帰関数でもinline展開すると速くなることがある。
覚えておこう。

*1:実行時のinline展開処理も含めて計測した場合の数値

2011-08-16

Forthでハノイの塔

forth common lisp speed

Forthを触ってみたくなったので、試しにハノイの塔を実装してみた。
ついでにcommon lispとC++でも実装し、Forthとの処理速度を比較してみた。
※ common lispの処理系にはSBCLを、Forthの処理系にはgforth及びVFX-Forthを使用した

Forthでの実装(gforth)

gforthはバイトコードタイプのForth実装。

\ gforth-0.7.0 (linux/64bit)
\ Forth版のハノイの塔

variable cnt             \ 再帰呼び出しの数をカウント用の変数

: hanoi-print ( to from -- to from )
    cr 2dup . ." -> " . ;

: hanoi-impl ( to tmp from level -- to tmp from )
    dup >r 0> if
        rot swap r@ 1- recurse                   ( tmp to from )
        cnt @ 1+ cnt ! 
        ( hanoi-print 計時用にコメントアウト )   ( tmp to from )
        rot r@ 1- recurse                        ( to from tmp )
        swap                                     ( to tmp from )
    then
    r> drop ;

: hanoi ( to tmp from level -- )
    0 cnt !                      \ カウント初期化
    hanoi-impl drop drop drop
    cr ." count: " cnt @ . cr ;  \ 再帰数表示

hanoi-printワードをコメントアウトしなかった場合の実行結果。

$ gforth      # shellからgforthコマンドを起動
1 2 3 3 hanoi 
3 -> 1 
3 -> 2 
1 -> 2 
3 -> 1 
2 -> 3 
2 -> 1 
3 -> 1 
count: 7 
 ok

gforthでの計時。

$ gforth-fast   # 高速版のコマンドを使用する

\ 計時用のワードを定義
: time ( word_pointer -- )
    utime drop >r
    execute
    utime drop r> - 1000 / ." elapsed " . ." ms " cr ;

\ 計時
1 2 3 25 ' hanoi time
count: 33554431 
elapsed 3359 ms  \ 結果: 3.34秒
 ok

Forthでの実装(VFX-Forth)

VFX-ForthはネイティブコードタイプのForth実装(? 不確か)。
ワードの定義はgforthのそれと同様なので計時部分だけ掲載。

# VFX-Forth-4.43 (linux/32bit ?)
$ vfxlin

\ 計時用のワードを定義
: time ( word_pointer -- )
    ticks >r
    execute
    ticks r> - ." elapsed " . ." ms " cr ;

\ 計時
1 2 3 25 ' hanoi time
count: 33554431 
elapsed 190 ms   \ 結果: 0.19秒
 ok

common lispでの実装(SBCL)

実装及び計時結果。

;;;; sbcl-1.0.49 (linux/64bit)

;; 関数定義
(defvar *count*)

(defun hanoi-impl (from tmp to level)
  (declare (fixnum level)
           (optimize (speed 3) (safety 0) (debug 0))
           (sb-ext:unmuffle-conditions sb-ext:compiler-note))
  (when (plusp level)
    (hanoi-impl from to tmp (1- level))
    (incf (the fixnum *count*))     ; (format t "~&~a => ~a~%" from to)
    (hanoi-impl tmp from to (1- level))))

(defun hanoi (from tmp to level)
  (let ((*count* 0))
    (hanoi-impl from tmp to level)
    `(:count ,*count*)))

;; 計時
(time (hanoi 'a 'b 'c 25))

Evaluation took:
  0.403 seconds of real time   ; 結果: 0.40秒
  0.390000 seconds of total run time (0.390000 user, 0.000000 system)
  96.77% CPU
  804,338,623 processor cycles
  0 bytes consed
  
(:COUNT 33554431)

C++での実装(GCC)

ソースコード。

/*  ファイル名: hanoi.cc
 *  コンパイル: g++ -O3 -o hanoi hanoi.cc
 *  バージョン: gcc-4.4.3 (linux/64bit)
 */
#include <iostream>

int count;

void hanoi(int from, int tmp, int to, int level) {
  if(level > 0) {
    hanoi(from, to, tmp, level-1);
    count++;
    hanoi(tmp, from, to, level-1);
  }
}

int main() {
  count=0;
  hanoi(1, 2, 3, 25);
  std::cout << "count: " << count << std::endl;
  return 0;
}

コンパイル & 実行。

$ g++ -O3 -o hanoi hanoi.cc
$ time ./hanoi 
count: 33554431

real	0m0.168s  # 結果: 0.17秒
user	0m0.170s
sys	0m0.000s

ハノイの塔での処理速度比較した結果は、C++(0.17秒)、Forth-VFX(0.20秒)、lisp(0.40秒)、Forth-gforth(3.34秒)、の順となった。
Forthも処理系によっては、C++と同程度の速度がでるみたい。※ 雑な比較なので正確なところは分からないけど・・・

おまけ: 末尾再帰 => ループ変換版

末尾再帰部分を手動でループに変換したソースコードも書いてみたので、載せておく。

variable cnt 

: 3dup ( a b c -- a b c a b c )
    dup 2over rot ;

: hanoi-impl ( to tmp from level -- )
    begin dup >r 0> while
            rot swap 3dup r@ 1- recurse ( tmp to from )
            cnt @ 1+ cnt !
            rot r> 1-                   ( to from tmp )
    repeat
    r> drop drop drop drop ;

: hanoi ( to tmp from level -- )
    0 cnt !                      
    hanoi-impl
    cr ." count: " cnt @ . cr ;

この変換によって、gforthの場合は処理時間が短く(2.32秒)なったけど、VFX-Forthでは逆に長く(0.35秒)なってしまった。

2011-06-19

Igo-0.4.3の辞書引きにDAWGを試す

java library speed

Igo-0.4.3: 若干のパフォーマンス向上 - sileの日記の続き。
Gomoku(0.0.4)では辞書引き部分*1にDAWGを使っているので、それもIgoに取り込んで処理速度の変化を図ってみた。

比較

諸々の条件は前回と同様。
今回は新たにIgoのDAWG版が加わる。

	総処理時間(1)	起動+行読み込み+辞書ロード時間(2)	解析時間(1 - 2)
MeCab(0.98)	15.381s	0.101s	15.280s
Gomoku(0.0.4)	16.458s	0.565s	15.893s
Igo(0.4.2)	20.351s	0.638s	19.713s
Igo(0.4.3)	18.792s	0.638s	18.154s
Igo-DAWG(0.4.3)	17.168s	0.638s	16.529s

DAWGにすると処理速度がだいぶGomokuに近づく*2。
ただこの変更はバイナリ辞書のフォーマットが変わってしまうので、少なくとも当面は正式なリリースとして公開することはしない。

DAWG版作成方法

手順のメモ。
基本的には、従来使っていた辞書引きクラスをdic-androidにあるDAWGのそれに置き換えるだけ。

# 1: DAWG版へのパッチ適用
$ tar zxvf igo-0.4.3-src.tar.gz
$ patch -p0 -E < igo-0.4.3-dawg.patch  # igo-0.4.3-dawg.patch に関しては末尾を参照

# 2: ソースビルド
$ cd igo-0.4.3-src
$ ant

# 3: 辞書構築
#    - 単語名のリストが標準出力に出力されるので、それを保存しておく
$ java -cp igo-0.4.3.jar net.reduls.igo.bin.BuildDic バイナリ辞書 テキスト辞書 > word.list

# 4: 辞書引き用のインデックスを作成する
# 4-1: dic-androidのソースを取得
$ wget https://download.github.com/sile-dic-android-v0.0.1-0-ga45cd32.tar.gz
$ tar zxvf sile-dic-android-v0.0.1-0-ga45cd32.tar.gz

# 4-2: インデックス構築
$ cd sile-dic-android-a45cd32/dicbuilder/
$ sbcl   # sbcl起動 ※common-lisp処理系
> (require :asdf)
> (asdf:load-system :dic)
  ;; 単語リスト読み込み
> (defvar *words* (with-open-file (in "/path/to/word.list") 
                    (loop FOR line = (read-line in nil nil) 
                          WHILE line 
                          COLLECT line)))
  ;; DAWGインデックス作成: Igoバイナリ辞書ディレクトリに追加
> (dic.trie.double-array:build *words* "バイナリ辞書/")  
> (quit)

# 終了

パッチ

ファイル名はigo-0.4.3-dawg.patch:

diff -rcN igo-0.4.3-src-orig/src/net/reduls/igo/dictionary/WordDic.java igo-0.4.3-src/src/net/reduls/igo/dictionary/WordDic.java
*** igo-0.4.3-src-orig/src/net/reduls/igo/dictionary/WordDic.java	2011-06-17 17:24:58.000000000 -0700
--- igo-0.4.3-src/src/net/reduls/igo/dictionary/WordDic.java	2011-06-18 12:15:57.167778148 -0700
***************
*** 3,12 ****
  import java.io.IOException;
  import java.util.List;
  import net.reduls.igo.trie.Searcher;
  import net.reduls.igo.util.FileMappedInputStream;
  
  public final class WordDic {
!     private final Searcher trie;
      private final String   data;
      private final int[]    indices;
  
--- 3,13 ----
  import java.io.IOException;
  import java.util.List;
  import net.reduls.igo.trie.Searcher;
+ import net.reduls.igo.trie.DawgSearcher;
  import net.reduls.igo.util.FileMappedInputStream;
  
  public final class WordDic {
!     private final DawgSearcher trie;
      private final String   data;
      private final int[]    indices;
  
***************
*** 16,22 ****
      public final int[]   dataOffsets; // dataOffsets[単語ID] = 単語の素性データの開始位置
  
      public WordDic(String dataDir) throws IOException {
! 	trie    = new Searcher(dataDir+"/word2id");
  	data    = FileMappedInputStream.getString(dataDir+"/word.dat");
  	indices = FileMappedInputStream.getIntArray(dataDir+"/word.ary.idx");
  	
--- 17,23 ----
      public final int[]   dataOffsets; // dataOffsets[単語ID] = 単語の素性データの開始位置
  
      public WordDic(String dataDir) throws IOException {
! 	trie    = new DawgSearcher(dataDir);
  	data    = FileMappedInputStream.getString(dataDir+"/word.dat");
  	indices = FileMappedInputStream.getIntArray(dataDir+"/word.ary.idx");
  	
diff -rcN igo-0.4.3-src-orig/src/net/reduls/igo/dictionary/build/WordDic.java igo-0.4.3-src/src/net/reduls/igo/dictionary/build/WordDic.java
*** igo-0.4.3-src-orig/src/net/reduls/igo/dictionary/build/WordDic.java	2010-03-29 02:35:02.000000000 -0700
--- igo-0.4.3-src/src/net/reduls/igo/dictionary/build/WordDic.java	2011-06-18 12:15:57.167778148 -0700
***************
*** 50,56 ****
  	// 単語辞書からキーを集める
  	for(File csvFile : new File(inputDir).listFiles(new onlyCsv()))
  	    collectKey(csvFile.getPath(), keyList, "");
! 	
  	final Builder bld = Builder.build(keyList);
  	bld.save(outputDir+"/word2id");
      }
--- 50,56 ----
  	// 単語辞書からキーを集める
  	for(File csvFile : new File(inputDir).listFiles(new onlyCsv()))
  	    collectKey(csvFile.getPath(), keyList, "");
!         
  	final Builder bld = Builder.build(keyList);
  	bld.save(outputDir+"/word2id");
      }
diff -rcN igo-0.4.3-src-orig/src/net/reduls/igo/trie/Builder.java igo-0.4.3-src/src/net/reduls/igo/trie/Builder.java
*** igo-0.4.3-src-orig/src/net/reduls/igo/trie/Builder.java	2010-03-29 02:35:02.000000000 -0700
--- igo-0.4.3-src/src/net/reduls/igo/trie/Builder.java	2011-06-18 12:15:57.167778148 -0700
***************
*** 23,30 ****
  	java.util.Collections.sort(keyList);
  	String prev=null;
  	for(String k : keyList)
! 	    if(k.equals(prev)==false)
! 		ksList.add(new KeyStream(prev=k));
      }
      
      /**
--- 23,33 ----
  	java.util.Collections.sort(keyList);
  	String prev=null;
  	for(String k : keyList)
! 	    if(k.equals(prev)==false) {
!                 System.out.println(k);
!                 ksList.add(new KeyStream(prev=k));
!             }
!         
      }
      
      /**
diff -rcN igo-0.4.3-src-orig/src/net/reduls/igo/trie/Char.java igo-0.4.3-src/src/net/reduls/igo/trie/Char.java
*** igo-0.4.3-src-orig/src/net/reduls/igo/trie/Char.java	1969-12-31 16:00:00.000000000 -0800
--- igo-0.4.3-src/src/net/reduls/igo/trie/Char.java	2011-06-18 12:15:57.167778148 -0700
***************
*** 0 ****
--- 1,39 ----
+ package net.reduls.igo.trie;
+ 
+ import java.io.DataInputStream;
+ import java.io.FileInputStream;
+ import java.io.IOException;
+ import java.util.List;
+ import java.util.ArrayList; 
+ 
+ public final class Char {
+     private final char[] charCode;
+     private final List<Character> arcs;
+     
+     public Char(String dictionaryDirectory) throws IOException {
+         final DataInputStream in =
+             new DataInputStream(new FileInputStream(dictionaryDirectory+"/code-map.bin"));
+         try {
+             final int codeLimit = in.readInt();
+             charCode = new char[codeLimit];
+             
+             arcs = new ArrayList<Character>();
+ 
+             for(int i=0; i < codeLimit; i++) {
+                 charCode[i] = in.readChar();
+                 if(charCode[i] != 0)
+                     arcs.add(charCode[i]);
+             }
+         } finally  {
+             in.close();
+         }
+     }
+ 
+     public char code(char ch) {
+         return charCode[ch];
+     }
+ 
+     public List<Character> arcs() {
+         return arcs;
+     }
+ }
diff -rcN igo-0.4.3-src-orig/src/net/reduls/igo/trie/DawgSearcher.java igo-0.4.3-src/src/net/reduls/igo/trie/DawgSearcher.java
*** igo-0.4.3-src-orig/src/net/reduls/igo/trie/DawgSearcher.java	1969-12-31 16:00:00.000000000 -0800
--- igo-0.4.3-src/src/net/reduls/igo/trie/DawgSearcher.java	2011-06-18 12:15:57.167778148 -0700
***************
*** 0 ****
--- 1,70 ----
+ package net.reduls.igo.trie;
+ 
+ import java.io.FileInputStream;
+ import java.io.IOException;
+ import java.nio.ByteBuffer;
+ import java.nio.channels.FileChannel;
+ 
+ 
+ /**
+  * DoubleArray検索用のクラス
+  */
+ public final class DawgSearcher {
+     private final long[] nodes;
+     private final Char charcode;
+ 
+     public DawgSearcher(String dictionaryDirectory) throws IOException {
+         final FileChannel cnl = 
+             new FileInputStream(dictionaryDirectory+"/surface-id.bin").getChannel();
+         try {
+             final ByteBuffer buf = cnl.map(FileChannel.MapMode.READ_ONLY, 0, cnl.size());
+             final int nodeCount = buf.getInt();
+             nodes = new long[nodeCount];
+             buf.asLongBuffer().get(nodes);
+         } finally {
+             cnl.close();
+         }
+ 
+         charcode = new Char(dictionaryDirectory);
+     }
+ 
+     public void eachCommonPrefix(CharSequence text, int start, Searcher.Callback fn) {
+         int node = 0;
+         int id = -1;
+         
+         for(int i=start;; i++) {
+             if(isTerminal(node)) 
+                 fn.call(start, i-start, id);
+             
+             if(i==text.length())
+                 return;
+             
+             final char arc = charcode.code(text.charAt(i));
+             final int next = base(node)+arc;
+             if(chck(next) != arc)
+                 return;
+             node = next;
+             id = nextId(id,node);
+         }
+     }  
+ 
+     private char chck(int node) {
+         return (char)((nodes[node]>>24) & 0xFFFF);
+     }
+     
+     private int base(int node) {
+         return (int)(nodes[node] & 0xFFFFFF);
+     }
+ 
+     private boolean isTerminal(int node) {
+         return ((nodes[node]>>40) & 0x1) == 0x1;
+     }
+ 
+     private int siblingTotal(int node) {
+         return (int)(nodes[node]>>41);
+     }  
+ 
+     private int nextId(int id, int node) {
+         return id + siblingTotal(node) + (isTerminal(node) ? 1 : 0);
+     }
+ }

*1:単語名(表層形)からIDを取得する部分

*2:ついでにバイナリ辞書のサイズも3MB程度減る

2011-06-18

Igo-0.4.3: 若干のパフォーマンス向上

Java library speed

Gomoku(0.0.4)で得た知見の一部をIgo(0.4.3)に取り込んでみた。
形態素解析の部分が少し速くなっている。

比較

MeCab(0.98)、Gomoku(0.0.4)、Igo(0.4.2,0.4.3)の処理速度の比較*1。
以前とはマシンも変わっているので、全部計り直した。
計時には約80MBの日本語テキストデータを用いた。
※ その詳細と計時に使用したプログラムに関しては後述
辞書には全てMeCabのサイトより入手可能なmecab-ipadic-2.7.0-20070801を使用している。

	総処理時間(1)	起動+行読み込み+辞書ロード時間(2)	解析時間(1 - 2)
MeCab(0.98)	15.381s	0.101s	15.280s
Gomoku(0.0.4)	16.458s	0.565s	15.893s
Igo(0.4.2)	20.351s	0.638s	19.713s
Igo(0.4.3)	18.792s	0.638s	18.154s

上の場合だと0.4.2よりは0.4.3の方が8%程度速くなっている。
MeCab、Gomokuよりはまだ一段遅いけど。

テキストデータ

計時に用いたデータは、次のようにして取得可能。

$ wget http://file.reduls.net/blog/20110618/20110618.txt.tar.xz  # ダウンロード
$ tar Jxvf 20110618.txt.tar.xz                                   # 解凍
$ ls -lh 20110618.txt
-rw-r--r-- 1 user user 78M 2011-06-18 00:33 20110618.txt

内容的は以下の二種類のデータの混合:

青空文庫から適当に収集したテキストデータ
Yahoo!ブログから適当に収集したテキストデータ

計時方法

MeCab:

# 計時用ソースコードを取得
# - 使い方等に関しては、ソースコードのコメントも参照のこと
$ wget http://file.reduls.net/blog/20110618/mec.cc 
$ wget http://file.reduls.net/blog/20110618/read_line.h

# コンパイル  ※ MeCab本体は既にインストール済みと仮定する
$ g++ -O3 -omec mec.cc `mecab-config --libs`

# 計時
$ time ./mec 20110618.txt

Gomoku:

# 計時用ソースコードを取得
$ wget http://file.reduls.net/blog/20110618/GomokuBench.java

# コンパイル
$ javac -cp gomoku-0.0.4.jar GomokuBench.java

# 計時
$ time java -server -cp .:gomoku-0.0.4.jar GomokuBench < 20110618.txt

Igo:

# 計時用ソースコードを取得
$ wget http://file.reduls.net/blog/20110618/IgoBench.java

# コンパイル
$ javac -cp igo-0.4.2.jar IgoBench.java

# 計時
$ time java -server -cp .:igo-0.4.2.jar IgoBench Igoバイナリ辞書 < 20110618.txt  # ver0.4.2用
$ time java -server -cp .:igo-0.4.3.jar IgoBench Igoバイナリ辞書 < 20110618.txt  # ver0.4.3用

*1:例によって厳密では全くない。参考程度。

2011-03-21

Igo: GAE版の辞書データ読み込み速度向上

Java library speed

igo-gaeを修正して、辞書データ読み込みを速度を向上させた。
※ igo-gaeの現在のバージョンは0.0.2。

修正内容概要

オリジナルのIgoでは、データ読み込みにはnio系パッケージのjava.nio.channels.FileChannelとjava.nio.MappedByteBufferを使っていた。

// MappedByteBufferを使ったデータ(int列)読み込み例
FileChannel in = new FileInputStream("/path/to/dic-data").getChannel();
MappedByteBuffer buffer = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());

int[] data = new int[(int)in.size()/4];
buffer.asIntBuffer().get(data);

この方法だとバイナリファイルのデータを高速に読み出すことができる。
ただGoogle App EngineではMappedByteBufferクラスが使用不可となっていたため、初期のigo-gae(ver0.0.1)ではjava.io.DataInputStreamを使った方法で代替していた。

// DataInputStreamを使ったデータ(int列)読み込み例
DataInputSream in = new DataInputStream(new BufferedInputStream(new FileInputStream("/path/to/dic-data")));

int[] data = new int[new File("/path/to/dic-data").length()/4];
for(int i=0; i < data.length; i++) 
  data[i] = in.readInt();

今回のver0.0.2では、データ読み込み部に再びnio系のパッケージを採用。
MappedByteBufferは使えないので、代わりにjava.nio.ByteBufferを使用。
ByteBufferに対して必要な分(= ファイルサイズ)の領域を明示的に確保し、そこに(コンストラクタ内で)あらかじめファイルの全データを読み込んでおくようにした。
こうしておけば、後の使い方はMappedByteBufferとただのByteBufferの間にほとんど差異はない。

// (マッピングなしの)ByteBufferを使ったデータ(int列)読み込み
FileChannel in = new FileInputStream("/path/to/dic-data").getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect((int)in.size()/4);
in.read(buffer);  // ここでファイルのデータを全て読み込んでおく
buffer.flip(); 

int[] data = new int[(int)in.size()/4];
buffer.asIntBuffer().get(data);

この方法は、初めのMappedByteBufferを使うものに比べると若干速度は落ちるが、DataInputStreamを使う方法比べればだいぶ高速。

現時点のigo-gaeのサンプルWebアプリ(http://igo-morp.appspot.com/)はver0.0.2のソースで動いているが、スピンアップ時間がver0.0.1では5秒前後掛かっていたのが、現在は2.5秒前後、と半分程度に短縮されていた。

2011-01-26

Gomoku: MeCabと形態素解析速度比較

Java speed

Igoの時と同じように、Gomoku(0.0.4)とMeCab(0.98)の形態素解析速度を比較してみた。

計時結果

テキストには青空文庫より取得の夏目漱石の『こころ』(x256. 136M. UTF-8)を、辞書にはMeCabのサイトより入手可能なmecab-ipadic-2.7.0-20070801*1を使用。

	総処理時間(1)	行読み込み時間(2)	1 - 2
MeCab	34.781s	0.134s	34.647s
Gomoku	33.313s	0.989s	32.324s

例によって適当な比較なので何とも云えないけど、今回測った限りではMeCabよりも速くなっている*2 *3。

計時用のプログラム

MeCabの計時に用いたプログラム:

/**
 * ファイル名: mec.cc
 * コンパイル: g++ -O3 -omec mec.cc `mecab-config --libs`
 * 計時方法: time mec <対象ファイル>
 *
 * gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
 */
#include <mecab.h>
#include "read_line.h"
#include <cstdio>

int main(int argc, char** argv) {
  mecab_t* m = mecab_new2("-Owakati");
  ReadLine rl(argv[1]);
  const char* line;

  // 以下、Igoの時はmecab_sparse_tostr関数を使っていたがtonode関数の方が若干高速なので、こちらを使用
  while(line=rl.read()) {
    // 行読み込み自体の処理時間を計る時は、以下の一行をコメントアウトする
    mecab_sparse_tonode(m,line);  
  }

  mecab_destroy(m);
  return 10;
}

// read_line.h
#ifndef READ_LINE_H
#define READ_LINE_H

#include <cstring>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

class ReadLine {
public:
  ReadLine(const char* filepath) : buf(NULL),index(0) {
    int f = open(filepath, O_RDONLY);
    if(f==-1)
      return;
    
    struct stat statbuf;
    fstat(f, &statbuf);

    buf = new char[statbuf.st_size+1];
    ::read(f, buf, statbuf.st_size);
    close(f);

    buf[statbuf.st_size]='\0';

    init(buf,buf+statbuf.st_size-1);
  }
  ~ReadLine() {
    delete [] buf;
  }

  operator bool() const { 
    return buf!=NULL;
  }

  void reset() {
    index = 0;
  }

  const char* read() {
    if(index == lines.size())
      return NULL;

    return lines[index++];
  }

private:
  void init(char* cur,const char* end) {
    lines.push_back(cur);
    for(cur=strchr(cur,'\n'); cur; cur=strchr(cur,'\n')) {
      *cur = '\0';
      cur++;
      if(cur < end)
        lines.push_back(cur);
    }
  }

private:
  char* buf;
  std::size_t index;
  std::vector<const char*> lines;
};

#endif

Gomokuの計時に用いたプログラム:

/**
 * ファイル名: GomokuBench.java
 * コンパイル: javac -cp gomoku-0.0.4.jar GomokuBench.java
 * 実行　方法: java -cp .:gomoku-0.0.4.jar GomokuBench  <  入力ファイル
 */
import java.io.IOException;
import net.reduls.gomoku.Tagger;
import net.reduls.gomoku.Morpheme;
import net.reduls.gomoku.util.ReadLine;

public final class GomokuBench {
    public static void main(String[] args) throws IOException {
	final ReadLine rl = new ReadLine(System.in);
        for(String s=rl.read(); s != null; s=rl.read()) {
            // 行読み込み自体の処理時間を計る時は、以下の一行をコメントアウトする
            Tagger.wakati(s);
        }
    }   
}

*1:gomoku-0.0.4.jarに組み込まれているのもこの辞書

*2:今回のGomokuの開発で得た知見をIgoにもフィードバックすれば、もしかしたらIgoの方も速くなるのかな？

*3:2011/01/27追記: 別の環境、別のデータで試したら、Gomokuが11.440秒、MeCabが10.228秒、とMeCabの方が速かった。

2010-10-27

マルチキークイックソートとstd::sortの比較

C++ algorithm speed

以前にCommon Lispで実装したマルチキークイックソートをC++で書き直して、STLのstd::sortと速度を比較してみた。

使用データ

Wikipediaの記事タイトル約500万行を使用。
データ作成方法などはここを参照。

$ wc -l wiki.title.500
5340378 wiki.title.500

$ ls -lh wiki.title.500
-rw-r--r-- 1 user user 103M 2010-10-27 02:14 wiki.title.500

これを次のRuby スクリプトでシャッフルする。

# ファイル名: shuffle.rb
lines=open(ARGV[0]).read.split("\n")

2.times{|k|
  lines.size.times{|i|
    j = rand(lines.size)
    tmp = lines[i]
    lines[i] = lines[j]
    lines[j] = tmp
  }
}

lines.each{|l|
  puts l
}

$ ruby shuffle.rb wiki.title.500 > wiki.shuffle

$ head wiki.shuffle
西村知道
カービィのピンボール
Hudson_K〓rfezi
De_Sanctis-Cacchionen_syndrooma
倍数性
Arto_Kulmala

比較結果

実行速度の比較結果。
比較に用いたソースコードは末尾に掲載。

# マルチキークイックソート
$ ./mqsort 0 < wiki.shuffle > /dev/null
MULTIKEY QUICK SORT:
Elapsed: 4.58632 sec

# std::sort
$ ./mqsort 1 < wiki.shuffle > /dev/null
std::sort:
Elapsed: 9.90535 sec

この例ではマルチキークイックソートの方がニ倍程度早かった。

ソースコード

実装ソースコード。
以前のCommon LispのそれをC++に直訳したもの。
チューニングすればもう少し早くなるかも*1。

/**
 * ファイル名: mqsort.cc
 * コンパイル: g++ -O2 -o mqsort mqsort.cc   # gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
 *
 * 標準入力から行を読み込み、ソートした結果を出力する。
 * ソート方法はマルチキークイックソートかstd::sort。
 * ※ コマンドの第一引数に0を渡せば前者、それ以外なら後者。
 *
 * Usage: mqsort [0 or 1] < [入力ファイル]  > [出力ファイル]
 */
#include <iostream>
#include <vector>
#include <string>
#include <cstring>
#include <algorithm>

/**
 * 計時用関数
 */
#include <sys/time.h>
inline double gettime(){
  timeval tv;
  gettimeofday(&tv,NULL);
  return static_cast<double>(tv.tv_sec)+static_cast<double>(tv.tv_usec)/1000000.0;
}


/**
 * マルチキークイックソート
 */
struct Range {
  Range(const char** b, const char** e) : begin(b), end(e) {}
  const char** begin;
  const char** end;
};

inline void swap_range(const char** beg1, const char** beg2, int count) {
  for(int i=0; i < count; i++)
    std::swap(beg1[i], beg2[i]);
}

inline unsigned char code(const char** ptr, int depth) {
  return (unsigned char)(*ptr)[depth];
}

inline void set_pivot_at_front(const char** beg, const char** end, int depth) {
  const char** mid = beg + (end-beg)/2;
  const char** las = end-1;
  unsigned char a = code(beg,depth);
  unsigned char b = code(mid,depth);
  unsigned char c = code(las,depth);

  if(a<b) {
    if(a<c) {
      b<c ? std::swap(*beg,*mid) : std::swap(*beg,*las);
    }
  } else {
    if(b<c) {
      std::swap(*beg,*mid);
    } else {
      if(a<c)
        std::swap(*beg,*las);
    }
  }
}

Range partition(const char** beg, const char** end, int depth) {
  set_pivot_at_front(beg, end, depth);
  
  unsigned char pivot = code(beg,depth);
  const char** ls_front = beg+1;
  const char** ls_last  = beg+1;
  const char** gt_front = end-1;
  const char** gt_last  = end-1;

  for(;;) {
    for(; ls_last <= gt_front && code(ls_last,depth) <= pivot; ls_last++)
      if(pivot == code(ls_last,depth)) {
        std::swap(*ls_front, *ls_last);
        ls_front++;
      }
    for(; ls_last <= gt_front && code(gt_front,depth) >= pivot; gt_front--)
      if(pivot == code(gt_front,depth)) {
        std::swap(*gt_front, *gt_last);
        gt_last--;
      }

    if(ls_last > gt_front)
      break;
    std::swap(*ls_last, *gt_front);
    ls_last++;
    gt_front--;
  }

  const char** ls_beg = ls_front;
  const char** ls_end = ls_last;
  const char** gt_beg = ls_last;
  const char** gt_end = gt_last+1;
  
  int len1 = std::min(ls_beg-beg, ls_end-ls_beg);
  swap_range(beg, ls_end-len1, len1);

  int len2 = std::min(end-gt_end, gt_end-gt_beg);
  swap_range(gt_beg, end-len2, len2);

  return Range(ls_end-(ls_beg-beg),
               gt_beg+(end-gt_end));
}

inline void swap_if_greater(const char*& x, const char*& y, int depth) {
  if(strcmp(x+depth, y+depth) > 0) 
    std::swap(x, y);
}

void mqsort_impl(const char** begin, const char** end, int depth) {
  int len = end-begin;
  if(len <= 2) {
    if(len == 2) 
      swap_if_greater(*begin, *(begin+1), depth);
  } else {
    Range r = partition(begin, end, depth);
    mqsort_impl(begin, r.begin, depth);
    if(code(r.begin,depth) != '\0')
      mqsort_impl(r.begin, r.end, depth+1);
    mqsort_impl(r.end, end, depth);
  }
}

void mqsort(const char** begin, const char** end) {
  mqsort_impl(begin, end, 0);
}

// std::sortに渡す比較関数
inline bool str_less_than(const char* a, const char* b) {
  return strcmp(a,b) < 0;
}

/**
 * main関数
 */
int main(int argc, char** argv) {
  // 標準入力から行読み込み
  std::vector<std::string> lines;
  std::string line;
  while(getline(std::cin,line)) 
    lines.push_back(line);

  // char*の配列に変換
  const char** words = new const char*[lines.size()];
  for(int i=0; i < lines.size(); i++)
    words[i] = lines[i].c_str();
  
  // ソート
  double beg_t = gettime();
  if(strcmp(argv[1],"0")==0) {
    std::cerr << "MULTIKEY QUICK SORT:" << std::endl;
    mqsort(words, words+lines.size());
  } else {
    std::cerr << "std::sort:" << std::endl;
    std::sort(words, words+lines.size(), str_less_than);
  }
  std::cerr << "Elapsed: " << gettime()-beg_t << " sec" << std::endl;

  // 結果出力
  for(int i=0; i < lines.size(); i++) 
    std::cout << words[i] << std::endl;

  return 0;
};

*1:クイックソートでは定番の、範囲が狭くなったら挿入ソートを使う、とか。