Packrat Parsing

Packrat Parsing: Simple, Powerful, Lazy, Linear Time』という論文の流し読みが終わったので、現状での理解を軽く書いておく。

概要

※ 以下、テキトウな理解なので、間違っている可能性有り
Packrat Parsingというパース手法。
再帰下降型の一種。
再帰下降型のパーサは、主に「先読み型」*1と「バックトラック型」*2に分類され、前者は高速(入力テキストに対して線形時間)だけど表現力に乏しく、後者は表現力が高いが低速(入力テキストに対して指数時間になりうる)、という特徴がある。
Packrat Parsingでは「バックトラック型」とメモ化を組み合わせることで、表現力が高くかつ線形時間で終了するパーサを実現。
ただし、メモ化故にメモリ消費量は大きい。
Haskell(の用に遅延評価がある言語)を使うとキレイ(かつ効率的?)に書ける。

Common Lispでの実装例

※ 以下、テキトウな理解に基づいた実装なので、間違っている可能性有り
実装例。
四則演算用のパーサを「バックトラック型の再帰下降」と「Packrat Parsing(バックトラック+メモ化)」を用いて実装する。


文法:
以下、一応の文法定義。演算子の結合順序が一般的なものとは逆で、右結合となっているので注意

Additive  <- Multitive '+' Additive
           | Multitive '-' Additive
           | Multitive 

Multitive <- Primary '*' Multitive
           | Primary '/' Multitive
           | Primary 

Primary   <- '(' Additive ')'
           | Decimal

Decimal   <- '0' | ... | '9'

補助関数:
まず最初は補助関数等を作成。

;;;; 文字列操作用関数群:
;; 範囲外アクセス時のエラー送出を抑えて、デフォルト値を返すようにする
(defun char* (s i)
  (or (ignore-errors (char s i)) #\Null))

(defun subseq* (s &optional start end)
  (or (ignore-errors (subseq s start end)) ""))

;;;; 型定義:
;; '(return ...) 形式のリストを型として認識するようにする
;; 一つ後のマクロ定義内で使用
(defun return-exp-p (exp) (and (listp exp) (eq (car exp) 'return)))
(deftype return-exp () '(satisfies return-exp-p))

;;;; パーサ定義補助用マクロ:
;; 今回のパーサに必要な以下の三つの機能を簡潔に記述可能なようにする
;; 1] 入力文字列の先頭文字のチェック
;; 2] パース結果の返却
;; 3] 他のパーサの呼び出し
(defmacro with-parser ((input) &body tokens)
  (labels
   ((recur (tokens)
      (destructuring-bind (token . rest) tokens
        (etypecase token
          (string                                          ; 1] 入力文字列の現在位置の文字をチェックする  
           `(and (char= ,(char token 0) (char* ,input 0))   
                 (let ((,input (subseq* ,input 1))) ; 一文字進める
                   ,(recur rest))))
          (return-exp                                      ; 2] パース結果の値を返す
           `(list t ,(second token) ,input))               
          (list                                            ; 3] 別のパーサを呼び出し、その結果を変数に格納する
           (destructuring-bind (var <- parser) token       
             (declare (ignore <-)) ; 単なる目印用変数なので実際には利用しない
             `(destructuring-bind (#1=#:succeed? ,var ,input) (,parser ,input)
                (and #1# ,(recur rest)))))))))
   (recur tokens)))

バックトラック版:
バックトラック版の実装。

;; パース関数
;; => (list パースに成功したかどうか  パース結果  残り(未パース)の文字列)
(defun parse (s)
  (additive s))

;; Additive
(defun additive (s)
  (labels ((alt1 ()  ; Additive <- Multitive '+' Additive
             (with-parser (s)
               (v1 <- multitive)
               "+"
               (v2 <- additive)
               (return (+ v1 v2))))
           (alt2 ()  ; Additive <- Multitive '-' Additive
             (with-parser (s)
               (v1 <- multitive)
               "-"
               (v2 <- additive)
               (return (- v1 v2))))
           (alt3 ()  ; Additive <- Multitive
             (multitive s)))
    ;; try and backtrack
    (or (alt1)    ; 一番目を試す
        (alt2)    ; 一番目が失敗なら、二番目
        (alt3)))) ; 二番目も失敗なら、三番目  ※ 分かりにくいけど、これは必ず非nilを返すようになっている

;; Multitive
(defun multitive (s)
  (labels ((alt1 ()  ; Multitive <- Primary '*' Multitive
             (with-parser (s)
               (v1 <- primary)
               "*"
               (v2 <- multitive)
               (return (* v1 v2))))
           (alt2 ()  ; Multitive <- Primary '/' Multitive
             (with-parser (s)
               (v1 <- primary)
               "/"
               (v2 <- multitive)
               (return (/ v1 v2))))
           (alt3 ()  ; Multitive <- Primary
             (primary s)))
    ;; try and backtrack
    (or (alt1)    ; 一番目を試す
        (alt2)    ; 一番目が失敗なら、二番目
        (alt3)))) ; 二番目も失敗なら、三番目  ※ 分かりにくいけど、これは必ず非nilを返すようになっている

;; Primary
(defun primary (s)
  (labels ((alt1 ()  ; Primary <- '(' Additive ')'
             (with-parser (s)
               "("
               (v <- additive)
               ")"
               (return v)))
           (alt2 ()  ; Primary <- Decimal
             (decimal s)))
    ;; try and backtrack
    (or (alt1)    ; 一番目を試す
        (alt2)))) ; 一番目が失敗なら、二番目

;; Decimal
;;  - 他のパーサがwith-parserマクロを使用している行っていることを、ここだけ手書きで書いている
(defun decimal (s)
  (or (and (char<= #\0 (char* s 0) #\9)  ; Decimal <- '0' | ... | '9'
           (list t (parse-integer s :end 1) (subseq* s 1)))
      (list nil nil s)))                 ; 不正な入力

実行例。

(parse "1+2")
--> (T 3 "")  ; ニ番目の要素がパース結果の値

(parse "4*2-((3+1)-(9/3))*2")
--> (T 6 "")

;; 間違った入力文字列
(parse "4**2-3/5")
--> (T 4 "**2-3/5")  ; 一文字目まででパースが終了している

試しにmultitive関数をtraceしてみると、何故バックトラック版が非効率かが分かる。

(trace multitive)

;; 実行
(parse "4*2-((3+1)-(9/3))*2")
  0: (MULTITIVE "4*2-((3+1)-(9/3))*2")
    1: (MULTITIVE "2-((3+1)-(9/3))*2")
    1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2")
  0: (MULTITIVE "4*2-((3+1)-(9/3))*2")
    1: (MULTITIVE "2-((3+1)-(9/3))*2")
    1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2")
  0: (MULTITIVE "((3+1)-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "2")
    1: MULTITIVE returned (T 2 "")
  0: MULTITIVE returned (T 2 "")
  0: (MULTITIVE "((3+1)-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "2")
    1: MULTITIVE returned (T 2 "")
  0: MULTITIVE returned (T 2 "")
  0: (MULTITIVE "((3+1)-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "2")
    1: MULTITIVE returned (T 2 "")
  0: MULTITIVE returned (T 2 "")
--> (T 6 "")  ; 結果

(untrace) ; trace解除

20文字くらいの文字列をパースするだけでも、multitive関数がかなりの回数呼び出されている。


Packrat Parsing版:
上で実装したバックトラック版パーサに、メモ化を追加したのがPackrat Parsing版(だと理解している、違うかも...)
バックトラック版のtrace結果を見ると、multitive関数呼び出しのかなりの割合が同じ引数(入力文字列)に対するものなので、その再計算を避けるようにすれば、計算量を大幅に減らすことができる。
実際に、メモ化により同じ引数(入力文字列)に対して再計算を行わないようにすれば、一つのパーサ関数の呼び出し回数*3は、入力文字列の長さ分に抑えられることになる*4
そのため、全体としての回数呼び出し回数は最大でも「パーサ関数の数 * 入力文字列の長さ」となり、入力文字列に対して線形時間で処理可能となる、らしい。


以下は、その実装。
各パーサ関数の本体を(memo ...)で囲っている以外は、バックトラック版と全く同じ*5

;;;; メモ化マクロ
;; bodyを評価する前にargをチェックして、以前同じargを渡されたことがあった場合は、(bodyは評価せずに)その時の結果を返すようにする
(defmacro memo ((arg &key (test #'equal)) &body body)
  (let ((memo (make-hash-table :test test)))
    `(let ((#1=#:key ,arg))
       (multiple-value-bind (#2=#:val #3=#:suceeded?) (gethash #1# ,memo)
         (if #3#
             #2#
           (setf (gethash #1# ,memo) (progn ,@body)))))))

;;;;;;;;;;;
;;;; パーサ
;; パース関数
;; => (list パースに成功したかどうか  パース結果  残り(未パース)の文字列)
(defun parse (s)
  (additive s))

(defmacro memo ((arg &key (test #'equal)) &body body)
  (declare (sb-ext:muffle-conditions warning))
  (let ((memo (make-hash-table :test test)))
    `(let ((#1=#:key ,arg))
       (multiple-value-bind (#2=#:val #3=#:suceeded?) (gethash #1# ,memo)
         (if #3#
             #2#
           (setf (gethash #1# ,memo) (progn ,@body)))))))


;; Additive
(defun additive (s)
  (memo (s)
    (labels ((alt1 ()  ; Additive <- Multitive '+' Additive
               (with-parser (s)
                 (v1 <- multitive)
                 "+"
                 (v2 <- additive)
                 (return (+ v1 v2))))
             (alt2 ()  ; Additive <- Multitive '-' Additive
               (with-parser (s)
                 (v1 <- multitive)
                 "-"
                 (v2 <- additive)
                 (return (- v1 v2))))
             (alt3 ()  ; Additive <- Multitive
               (multitive s)))
      ;; try and backtrack
      (or (alt1)    ; 一番目を試す
          (alt2)    ; 一番目が失敗なら、二番目
          (alt3))))); 二番目も失敗なら、三番目  ※ 分かりにくいけど、これは必ず非nilを返すようになっている

;; Multitive
(defun multitive (s)
  (memo (s)
    (labels ((alt1 ()  ; Multitive <- Primary '*' Multitive
               (with-parser (s)
                 (v1 <- primary)
                 "*"
                 (v2 <- multitive)
                 (return (* v1 v2))))
             (alt2 ()  ; Multitive <- Primary '/' Multitive
               (with-parser (s)
                 (v1 <- primary)
                 "/"
                 (v2 <- multitive)
                 (return (/ v1 v2))))
             (alt3 ()  ; Multitive <- Primary
               (primary s)))
      ;; try and backtrack
      (or (alt1)    ; 一番目を試す
          (alt2)    ; 一番目が失敗なら、二番目
          (alt3))))); 二番目も失敗なら、三番目  ※ 分かりにくいけど、これは必ず非nilを返すようになっている

;; Primary
(defun primary (s)
  (memo (s)
    (labels ((alt1 ()  ; Primary <- '(' Additive ')'
               (with-parser (s)
                 "("
                 (v <- additive)
                 ")"
                 (return v)))
             (alt2 ()  ; Primary <- Decimal
               (decimal s)))
      ;; try and backtrack
      (or (alt1)    ; 一番目を試す
          (alt2))))); 一番目が失敗なら、二番目

;; Decimal
;;  - 他のパーサがwith-parserマクロを使用している行っていることを、ここだけ手書きで書いている
(defun decimal (s)
  (memo (s)
    (or (and (char<= #\0 (char* s 0) #\9)  ; Decimal <- '0' | ... | '9'
             (list t (parse-integer s :end 1) (subseq* s 1)))
        (list nil nil s))))                ; 不正な入力

実行例とmultitive関数のtrace。

(parse "1+2")
--> (T 3 "")  

(parse "4**2-3/5")
--> (T 4 "**2-3/5") 

(trace multitive)

;; trace
(parse "4*2-((3+1)-(9/3))*2")
  0: (MULTITIVE "4*2-((3+1)-(9/3))*2")
    1: (MULTITIVE "2-((3+1)-(9/3))*2")
    1: MULTITIVE returned (T 2 "-((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2")
  0: (MULTITIVE "4*2-((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 8 "-((3+1)-(9/3))*2")
  0: (MULTITIVE "((3+1)-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
      2: (MULTITIVE "3+1)-(9/3))*2")
      2: MULTITIVE returned (T 3 "+1)-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
      2: (MULTITIVE "1)-(9/3))*2")
      2: MULTITIVE returned (T 1 ")-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(3+1)-(9/3))*2")
    1: MULTITIVE returned (T 4 "-(9/3))*2")
    1: (MULTITIVE "(9/3))*2")
      2: (MULTITIVE "9/3))*2")
        3: (MULTITIVE "3))*2")
        3: MULTITIVE returned (T 3 "))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
      2: MULTITIVE returned (T 3 "))*2")
      2: (MULTITIVE "9/3))*2")
      2: MULTITIVE returned (T 3 "))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "(9/3))*2")
    1: MULTITIVE returned (T 3 ")*2")
    1: (MULTITIVE "2")
    1: MULTITIVE returned (T 2 "")
  0: MULTITIVE returned (T 2 "")
  0: (MULTITIVE "((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 2 "")
  0: (MULTITIVE "((3+1)-(9/3))*2")
  0: MULTITIVE returned (T 2 "")
--> (T 6 "")

(untrace)

大分短くなっている。

感想

線形時間かつ表現力が高い、という点は納得。
あと、実装も簡単。
問題は、オーダーが同じとはいえ、最適化された先読み版の再帰下降パーサや他のボトムアップパーサ等と比べて、実際の場面でどの程度処理時間に差がでるか。
メモリ使用量が大きいことも合わせて考えると、大量のテキストデータのパースというよりは、比較的少量だけど文法が複雑なデータのパースに向いている感じかな?
今回な実装はかなり非効率なものになっているけど、機会があればもう一度ちゃんと論文を読んで、しっかりと実装してみても良いかもしれない。

*1:「入力文字列の先頭N文字を読み込んで、次の構文要素を予測(決定)し、パースを行う」ということを繰り返す。バックトラック型のような後戻りはないので、入力文字列を先頭から末尾まで一回走査するだけでパースが完了する。

*2:「あるパースを試してみて、失敗したら入力文字列の元の場所に戻り、別のパースを試す」ということを成功する(or 全て失敗する)まで繰り返す。

*3:正確には呼び出した後の本体処理の実行回数

*4:ただし「パーサ関数の返り値は引数のみによって決定される(同じ引数なら同じ結果を返す)」ということが前提としてある

*5:ちなみに今回の実装では簡単のために、メモ化にハッシュテーブルを利用しているが、これは非効率。冒頭の論文では、ハッシュテーブルを使う変わりに、各パーサ(additive,multitive,primary,decimal)の遅延されたパース結果をフィールドとして保持する構造体を用意し、各パーサ関数は(パーサ関数を呼び出す変わりに)その構造体のフィールドから値を取得するようになっている。こうした場合、各フィールドの(遅延された)値が評価済み(メモ化済み)かどうかをチェックするだけで良いので、ハッシュテーブルに比べて格段に高速となる