ANTLR (or alternative): decoupling parsing from evaluation

I have a relatively simple DSL that I would like to handle more robustly than a bunch of manually-coded java.util.regex.Pattern statements + parsing logic.

The most-quoted tool seems to be ANTLR. I'm not familiar with it and am willing to give it a try. However I get a little leery when I look at the examples (eg the ANTLR expression evaluator example, or Martin Fowler's HelloAntlr, or this other Q on stackoverflow). The reason for this is that the grammar files seem like they are a hodgepodge of grammar definitions interspersed with fragments of the implementation language (eg Java) that are imperative in nature.

What I would really prefer is to separate out the imperative / evaluation part of the parser. Is there a way to use ANTLR (or some other tool) to define a grammar & produce a set of Java source files so that it compiles into classes that I can use to parse input into a structure w/o acting upon that structure?

for example, if I wanted to use expression evaluation with just the + and * and () operators, and I had the input

3 * (4 + 7 * 6) * (3 + 7 * (4 + 2))

then what I would like to do is write a grammar to convert that to a hierarchical structure like

Product
  Term(3)
  Sum
     Term(4)
     Product
        Term(7)
        Term(6)
  Sum
     Term(3)
     Product
        Term(7)
        Sum
            Term(4)
            Term(2)

where I can use classes like

interface Expression<T> {
    public T evaluate();
}

class Term implements Expression<Double> {
    final private double value;
    @Override public Double evaluate() { return value; }
}

class Product implements Expression<Double> {
    final private List<Expression<Double>> terms;
    @Override public Double evaluate() {
        double result = 1;
        for (Expression<Double> ex : terms)
            result *= ex.evaluate();
        return result;
    }
}

class Sum implements Expression<Double> {
    final private List<Expression<Double>> terms;
    @Override public Double evaluate() {
        double result = 0;
        for (Expression<Double> ex : terms)
            result += ex.evaluate();
        return result;
    }
}

and use ANTLR to construct the structure. Is there a way to do this? I would really rather pursue this approach, as it lets me (and other software engineers) edit and visualize complete Java classes without having to have those classes fragmented into weird pieces in ANTLR grammar files.

Is there a way to do this?


clarification: I want to spend as much of my effort as possible in two ways: defining the grammar itself, and in ANTLR-independent Java (eg my Product/Sum/Term classes). I want to minimize the amount of time/experience I have to spend learning ANTLR syntax, quirks and API. I don't know how to create and manipulate an AST from ANTLR grammar. Because this is only a small part of a large Java project, it's not just me, it's anyone in my team that has to review or maintain my code.

(I don't mean to sound impertinent: I'm willing to make the investment of time and energy to use a tool, but only if the tool becomes a useful tool and does not continue to become a stumbling block.)


Jason S wrote:

Is there a way to do this?

Yes.

First define your grammar (I took your example of an expression parser with only the + and * and () operators):

grammar Exp;

// parser rules
parse
  :  additionExp
  ;

additionExp
  :  multiplyExp (Add multiplyExp)*
  ;

multiplyExp
  :  atomExp (Mult atomExp)* 
  ;

atomExp
  :  Number
  |  LParen additionExp RParen
  ;

// lexer rules
Add    : '+' ;
Mult   : '*' ;
LParen : '(' ;
RParen : ')' ;   
Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
Spaces : (' ' | 't' | 'r'| 'n') {$channel=HIDDEN;} ;

If you want to let ANTLR generate a proper AST from the grammar above, you must put the following at the top of your grammar (under the grammar declaration):

options { 
  output=AST; 
}

and you must indicate what the root of each of your parse rules should be. This can be done in two ways:

  • by using rewrite rules;
  • or by placing one of the "inline tree-operators" ^ and ! after the tokens:
  • ^ means: make this token the root;
  • ! means: exclude this token from the AST.
  • Now your grammar would look like this:

    grammar Exp;
    
    options { 
      output=AST; 
    }
    
    // parser rules
    parse
      :  additionExp
      ;
    
    additionExp
      :  multiplyExp (Add^ multiplyExp)*
      ;
    
    multiplyExp
      :  atomExp (Mult^ atomExp)* 
      ;
    
    atomExp
      :  Number
      |  LParen! additionExp RParen!
      ;
    
    // lexer rules
    Add    : '+' ;
    Mult   : '*' ;
    LParen : '(' ;
    RParen : ')' ;   
    Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
    Spaces : (' ' | 't' | 'r'| 'n') {$channel=HIDDEN;} ;
    

    As you can see, I made the Add and Mult roots, and excluded the parenthesis.

    Now generate a lexer & parser from the grammar:

    java -cp antlr-3.2.jar org.antlr.Tool Exp.g 
    

    create a little test harness:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import java.util.*;
    
    public class Main {
    
        private static void preOrder(CommonTree tree, int depth) {
            for(int i = 0; i < depth; i++) {
                System.out.print("- ");
            }
            System.out.println("> "+tree + " :: " + ExpParser.tokenNames[tree.getType()]);
            List children = tree.getChildren();
            if(children == null) return;
            for(Object o : children) {
                preOrder((CommonTree)o, depth+1);
            }
        }
    
        public static void main(String[] args) throws Exception {
            ANTLRStringStream in = new ANTLRStringStream("3 * (4 + 7 * 6) * (3 + 7 * (4 + 2))");
            ExpLexer lexer = new ExpLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ExpParser parser = new ExpParser(tokens);
            CommonTree tree = (CommonTree)parser.parse().getTree();
            preOrder(tree, 0);
        }
    }
    

    compile everything:

    javac -cp antlr-3.2.jar *.java
    

    and run the Main class:

    // *nix/Mac OS
    java -cp .:antlr-3.2.jar Main
    
    // Windows
    java -cp .;antlr-3.2.jar Main
    

    which produces the following:

    > * :: Mult
    - > * :: Mult
    - - > 3 :: Number
    - - > + :: Add
    - - - > 4 :: Number
    - - - > * :: Mult
    - - - - > 7 :: Number
    - - - - > 6 :: Number
    - > + :: Add
    - - > 3 :: Number
    - - > * :: Mult
    - - - > 7 :: Number
    - - - > + :: Add
    - - - - > 4 :: Number
    - - - - > 2 :: Number
    

    As you can see, the parse rule (method) returns a CommonTree object you can use to create your own walker/visitor leaving the grammar as is.

    HTH


    What about using ANTLR AST (Abstract Syntax Tree) and constructing a mirrored tree with your classes by visiting each tree node.


    @Giuseppe Cardone added some great links that I post here:

    http://www.antlr.org/article/1100569809276/use.tree.grammars.tml

    http://www.antlr.org/article/1170602723163/treewalkers.html

    An example can be found at:

    http://sagarsunkle.spaces.live.com/blog/cns!E07F3B561597E4EE!664.entry?sa=97619042


    The examples you mention embed parser actions right inside the grammar for the sake of conciseness. This works fine for small projects. For bigger ones, you'd prefer to make an AST first and then do whatever you want with it. You can do this, hehe, by embedding actions that create the tree, but antlr provides a nicer, declarative way:

    http://www.antlr.org/wiki/display/ANTLR3/Tree+construction

    You can then use a Tree Grammar to generate code, eg with StringTemplate. I've used this toolchain for my thesis and it worked like a charm. But I bet I'd have suffered a lot without having the Anlr3 Reference Book ( http://pragprog.com/titles/tpantlr/the-definitive-antlr-reference )

    I also found the lecture notes linked on the antlr page really useful: http://www.antlr.org/wiki/display/CS652/CS652+Home

    Also, make use of AntlrWorks to test your grammar. There's also a grammar unit testing suite available. Plus, the antlr mailing list is really active, and Terence Parr actively responds to most posts. Plus, it's a lot of fun.

    链接地址: http://www.djcxy.com/p/43700.html

    上一篇: 通过语法解析为AST(或.y + .lang => xml)的工具

    下一篇: ANTLR(或其他):解析与评估解耦