Packrat Parsing
『Packrat Parsing: Simple, Powerful, Lazy, Linear Time』という論文の流し読みが終わったので、現状での理解を軽く書いておく。
概要
※ 以下、テキトウな理解なので、間違っている可能性有り
Packrat Parsingというパース手法。
再帰下降型の一種。
再帰下降型のパーサは、主に「先読み型」*1と「バックトラック型」*2に分類され、前者は高速(入力テキストに対して線形時間)だけど表現力に乏しく、後者は表現力が高いが低速(入力テキストに対して指数時間になりうる)、という特徴がある。
Packrat Parsingでは「バックトラック型」とメモ化を組み合わせることで、表現力が高くかつ線形時間で終了するパーサを実現。
ただし、メモ化故にメモリ消費量は大きい。
Haskell(の用に遅延評価がある言語)を使うとキレイ(かつ効率的?)に書ける。
Common Lispでの実装例
※ 以下、テキトウな理解に基づいた実装なので、間違っている可能性有り
実装例。
四則演算用のパーサを「バックトラック型の再帰下降」と「Packrat Parsing(バックトラック+メモ化)」を用いて実装する。
文法:
以下、一応の文法定義。※ 演算子の結合順序が一般的なものとは逆で、右結合となっているので注意
Additive <- Multitive '+' Additive | Multitive '-' Additive | Multitive Multitive <- Primary '*' Multitive | Primary '/' Multitive | Primary Primary <- '(' Additive ')' | Decimal Decimal <- '0' | ... | '9'
補助関数:
まず最初は補助関数等を作成。
;;;; 文字列操作用関数群: ;; 範囲外アクセス時のエラー送出を抑えて、デフォルト値を返すようにする (defun char* (s i) (or (ignore-errors (char s i)) #\Null)) (defun subseq* (s &optional start end) (or (ignore-errors (subseq s start end)) "")) ;;;; 型定義: ;; '(return ...) 形式のリストを型として認識するようにする ;; 一つ後のマクロ定義内で使用 (defun return-exp-p (exp) (and (listp exp) (eq (car exp) 'return))) (deftype return-exp () '(satisfies return-exp-p)) ;;;; パーサ定義補助用マクロ: ;; 今回のパーサに必要な以下の三つの機能を簡潔に記述可能なようにする ;; 1] 入力文字列の先頭文字のチェック ;; 2] パース結果の返却 ;; 3] 他のパーサの呼び出し (defmacro with-parser ((input) &body tokens) (labels ((recur (tokens) (destructuring-bind (token . rest) tokens (etypecase token (string ; 1] 入力文字列の現在位置の文字をチェックする `(and (char= ,(char token 0) (char* ,input 0)) (let ((,input (subseq* ,input 1))) ; 一文字進める ,(recur rest)))) (return-exp ; 2] パース結果の値を返す `(list t ,(second token) ,input)) (list ; 3] 別のパーサを呼び出し、その結果を変数に格納する (destructuring-bind (var <- parser) token (declare (ignore <-)) ; 単なる目印用変数なので実際には利用しない `(destructuring-bind (#1=#:succeed? ,var ,input) (,parser ,input) (and #1# ,(recur rest))))))))) (recur tokens)))
バックトラック版:
バックトラック版の実装。
;; パース関数 ;; => (list パースに成功したかどうか パース結果 残り(未パース)の文字列) (defun parse (s) (additive s)) ;; Additive (defun additive (s) (labels ((alt1 () ; Additive <- Multitive '+' Additive (with-parser (s) (v1 <- multitive) "+" (v2 <- additive) (return (+ v1 v2)))) (alt2 () ; Additive <- Multitive '-' Additive (with-parser (s) (v1 <- multitive) "-" (v2 <- additive) (return (- v1 v2)))) (alt3 () ; Additive <- Multitive (multitive s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2) ; 一番目が失敗なら、二番目 (alt3)))) ; 二番目も失敗なら、三番目 ※ 分かりにくいけど、これは必ず非nilを返すようになっている ;; Multitive (defun multitive (s) (labels ((alt1 () ; Multitive <- Primary '*' Multitive (with-parser (s) (v1 <- primary) "*" (v2 <- multitive) (return (* v1 v2)))) (alt2 () ; Multitive <- Primary '/' Multitive (with-parser (s) (v1 <- primary) "/" (v2 <- multitive) (return (/ v1 v2)))) (alt3 () ; Multitive <- Primary (primary s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2) ; 一番目が失敗なら、二番目 (alt3)))) ; 二番目も失敗なら、三番目 ※ 分かりにくいけど、これは必ず非nilを返すようになっている ;; Primary (defun primary (s) (labels ((alt1 () ; Primary <- '(' Additive ')' (with-parser (s) "(" (v <- additive) ")" (return v))) (alt2 () ; Primary <- Decimal (decimal s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2)))) ; 一番目が失敗なら、二番目 ;; Decimal ;; - 他のパーサがwith-parserマクロを使用している行っていることを、ここだけ手書きで書いている (defun decimal (s) (or (and (char<= #\0 (char* s 0) #\9) ; Decimal <- '0' | ... | '9' (list t (parse-integer s :end 1) (subseq* s 1))) (list nil nil s))) ; 不正な入力
実行例。
(parse "1+2") --> (T 3 "") ; ニ番目の要素がパース結果の値 (parse "4*2-((3+1)-(9/3))*2") --> (T 6 "") ;; 間違った入力文字列 (parse "4**2-3/5") --> (T 4 "**2-3/5") ; 一文字目まででパースが終了している
試しにmultitive関数をtraceしてみると、何故バックトラック版が非効率かが分かる。
(trace multitive) ;; 実行 (parse "4*2-((3+1)-(9/3))*2") 0: (MULTITIVE "4*2-((3+1)-(9/3))*2") 1: (MULTITIVE "2-((3+1)-(9/3))*2") 1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2") 0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2") 0: (MULTITIVE "4*2-((3+1)-(9/3))*2") 1: (MULTITIVE "2-((3+1)-(9/3))*2") 1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2") 0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2") 0: (MULTITIVE "((3+1)-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "2") 1: MULTITIVE returned (T 2 "") 0: MULTITIVE returned (T 2 "") 0: (MULTITIVE "((3+1)-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "2") 1: MULTITIVE returned (T 2 "") 0: MULTITIVE returned (T 2 "") 0: (MULTITIVE "((3+1)-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "2") 1: MULTITIVE returned (T 2 "") 0: MULTITIVE returned (T 2 "") --> (T 6 "") ; 結果 (untrace) ; trace解除
20文字くらいの文字列をパースするだけでも、multitive関数がかなりの回数呼び出されている。
Packrat Parsing版:
上で実装したバックトラック版パーサに、メモ化を追加したのがPackrat Parsing版(だと理解している、違うかも...)。
バックトラック版のtrace結果を見ると、multitive関数呼び出しのかなりの割合が同じ引数(入力文字列)に対するものなので、その再計算を避けるようにすれば、計算量を大幅に減らすことができる。
実際に、メモ化により同じ引数(入力文字列)に対して再計算を行わないようにすれば、一つのパーサ関数の呼び出し回数*3は、入力文字列の長さ分に抑えられることになる*4。
そのため、全体としての回数呼び出し回数は最大でも「パーサ関数の数 * 入力文字列の長さ」となり、入力文字列に対して線形時間で処理可能となる、らしい。
以下は、その実装。
各パーサ関数の本体を(memo ...)で囲っている以外は、バックトラック版と全く同じ*5。
;;;; メモ化マクロ ;; bodyを評価する前にargをチェックして、以前同じargを渡されたことがあった場合は、(bodyは評価せずに)その時の結果を返すようにする (defmacro memo ((arg &key (test #'equal)) &body body) (let ((memo (make-hash-table :test test))) `(let ((#1=#:key ,arg)) (multiple-value-bind (#2=#:val #3=#:suceeded?) (gethash #1# ,memo) (if #3# #2# (setf (gethash #1# ,memo) (progn ,@body))))))) ;;;;;;;;;;; ;;;; パーサ ;; パース関数 ;; => (list パースに成功したかどうか パース結果 残り(未パース)の文字列) (defun parse (s) (additive s)) (defmacro memo ((arg &key (test #'equal)) &body body) (declare (sb-ext:muffle-conditions warning)) (let ((memo (make-hash-table :test test))) `(let ((#1=#:key ,arg)) (multiple-value-bind (#2=#:val #3=#:suceeded?) (gethash #1# ,memo) (if #3# #2# (setf (gethash #1# ,memo) (progn ,@body))))))) ;; Additive (defun additive (s) (memo (s) (labels ((alt1 () ; Additive <- Multitive '+' Additive (with-parser (s) (v1 <- multitive) "+" (v2 <- additive) (return (+ v1 v2)))) (alt2 () ; Additive <- Multitive '-' Additive (with-parser (s) (v1 <- multitive) "-" (v2 <- additive) (return (- v1 v2)))) (alt3 () ; Additive <- Multitive (multitive s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2) ; 一番目が失敗なら、二番目 (alt3))))); 二番目も失敗なら、三番目 ※ 分かりにくいけど、これは必ず非nilを返すようになっている ;; Multitive (defun multitive (s) (memo (s) (labels ((alt1 () ; Multitive <- Primary '*' Multitive (with-parser (s) (v1 <- primary) "*" (v2 <- multitive) (return (* v1 v2)))) (alt2 () ; Multitive <- Primary '/' Multitive (with-parser (s) (v1 <- primary) "/" (v2 <- multitive) (return (/ v1 v2)))) (alt3 () ; Multitive <- Primary (primary s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2) ; 一番目が失敗なら、二番目 (alt3))))); 二番目も失敗なら、三番目 ※ 分かりにくいけど、これは必ず非nilを返すようになっている ;; Primary (defun primary (s) (memo (s) (labels ((alt1 () ; Primary <- '(' Additive ')' (with-parser (s) "(" (v <- additive) ")" (return v))) (alt2 () ; Primary <- Decimal (decimal s))) ;; try and backtrack (or (alt1) ; 一番目を試す (alt2))))); 一番目が失敗なら、二番目 ;; Decimal ;; - 他のパーサがwith-parserマクロを使用している行っていることを、ここだけ手書きで書いている (defun decimal (s) (memo (s) (or (and (char<= #\0 (char* s 0) #\9) ; Decimal <- '0' | ... | '9' (list t (parse-integer s :end 1) (subseq* s 1))) (list nil nil s)))) ; 不正な入力
実行例とmultitive関数のtrace。
(parse "1+2") --> (T 3 "") (parse "4**2-3/5") --> (T 4 "**2-3/5") (trace multitive) ;; trace (parse "4*2-((3+1)-(9/3))*2") 0: (MULTITIVE "4*2-((3+1)-(9/3))*2") 1: (MULTITIVE "2-((3+1)-(9/3))*2") 1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2") 0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2") 0: (MULTITIVE "4*2-((3+1)-(9/3))*2") 0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2") 0: (MULTITIVE "((3+1)-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 2: (MULTITIVE "3+1)-(9/3))*2") 2: MULTITIVE returned (T 3 "+1)-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 2: (MULTITIVE "1)-(9/3))*2") 2: MULTITIVE returned (T 1 ")-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(3+1)-(9/3))*2") 1: MULTITIVE returned (T 4 "-(9/3))*2") 1: (MULTITIVE "(9/3))*2") 2: (MULTITIVE "9/3))*2") 3: (MULTITIVE "3))*2") 3: MULTITIVE returned (T 3 "))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 2: MULTITIVE returned (T 3 "))*2") 2: (MULTITIVE "9/3))*2") 2: MULTITIVE returned (T 3 "))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "(9/3))*2") 1: MULTITIVE returned (T 3 ")*2") 1: (MULTITIVE "2") 1: MULTITIVE returned (T 2 "") 0: MULTITIVE returned (T 2 "") 0: (MULTITIVE "((3+1)-(9/3))*2") 0: MULTITIVE returned (T 2 "") 0: (MULTITIVE "((3+1)-(9/3))*2") 0: MULTITIVE returned (T 2 "") --> (T 6 "") (untrace)
大分短くなっている。
感想
線形時間かつ表現力が高い、という点は納得。
あと、実装も簡単。
問題は、オーダーが同じとはいえ、最適化された先読み版の再帰下降パーサや他のボトムアップパーサ等と比べて、実際の場面でどの程度処理時間に差がでるか。
メモリ使用量が大きいことも合わせて考えると、大量のテキストデータのパースというよりは、比較的少量だけど文法が複雑なデータのパースに向いている感じかな?
今回な実装はかなり非効率なものになっているけど、機会があればもう一度ちゃんと論文を読んで、しっかりと実装してみても良いかもしれない。
*1:「入力文字列の先頭N文字を読み込んで、次の構文要素を予測(決定)し、パースを行う」ということを繰り返す。バックトラック型のような後戻りはないので、入力文字列を先頭から末尾まで一回走査するだけでパースが完了する。
*2:「あるパースを試してみて、失敗したら入力文字列の元の場所に戻り、別のパースを試す」ということを成功する(or 全て失敗する)まで繰り返す。
*3:正確には呼び出した後の本体処理の実行回数
*4:ただし「パーサ関数の返り値は引数のみによって決定される(同じ引数なら同じ結果を返す)」ということが前提としてある
*5:ちなみに今回の実装では簡単のために、メモ化にハッシュテーブルを利用しているが、これは非効率。冒頭の論文では、ハッシュテーブルを使う変わりに、各パーサ(additive,multitive,primary,decimal)の遅延されたパース結果をフィールドとして保持する構造体を用意し、各パーサ関数は(パーサ関数を呼び出す変わりに)その構造体のフィールドから値を取得するようになっている。こうした場合、各フィールドの(遅延された)値が評価済み(メモ化済み)かどうかをチェックするだけで良いので、ハッシュテーブルに比べて格段に高速となる