abstract syntax tree
I am currently looking for a Java 6/7 parser that generates some (possibly standardized) form of abstract syntax tree.
I have already found that ANTLR has a Java 6 grammar, but it seems to generate only a parse tree, not a syntax tree. I have also read about the Java Compiler API, but all the sources I found say that it is overdesigned and poorly documented (and I haven't been able to find out whether it really generates an AST).
Do you know of any good parser library whose output is as standardized as possible?
Thanks
Basically JavaCC and ANTLR are the best tools out there at the moment.
You can find a usable Java 6 grammar in ANTLR's grammar repository. JavaCC is a bit old-school and rarely updated, but it is easy to start with, Java-oriented, and generates the AST (search for JJTree). It's a bit, well... strange at first sight, but you can get used to it.
Both tools have nice IDE support (e.g., Eclipse plug-ins), but I think (based on your description) what you need is JavaCC. Give it a try.
Our DMS Software Reengineering Toolkit with its Java front end can provide an AST (example at SO).
The distinction you draw between "needed for semantics" (AST) and "is an accident of the grammar" ("Concrete" or "Parse" tree) is interesting. It takes additional effort, somewhere, to drop the CST information to obtain an AST.
You can do that by hand-coding the AST construction as semantic actions on rules. That takes effort, and likely gives you a pretty good answer. But this process can be automated almost completely by observing that literal tokens don't need to be kept in the tree, that unary production chains are unnecessary (except where a unary production introduces semantics), and that lists can be formed automatically. (You can read more about this here: https://stackoverflow.com/a/5732290/120163)
This is the approach taken by DMS. You write the grammar. DMS parses and builds the AST using these ideas. No additional work or semantic actions on your part.
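To make the three observations concrete (drop literal tokens, collapse unary production chains, flatten lists), here is a minimal sketch in Java. It is not how DMS, ANTLR, or JavaCC implement anything; the `Node` type, the `simplify` method, and the convention that list rules have names ending in `List` are all assumptions made up for this illustration. It also collapses every unary chain, ignoring the case where a unary production introduces semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; not the data model of DMS, ANTLR, or JavaCC.
final class CstToAst {
    enum Kind { RULE, TOKEN, LITERAL }

    static final class Node {
        final Kind kind;
        final String label;                         // rule name, or token text
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String label, Node... kids) {
            this.kind = kind;
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        @Override public String toString() {
            if (kind != Kind.RULE) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    static Node rule(String name, Node... kids) { return new Node(Kind.RULE, name, kids); }
    static Node tok(String text) { return new Node(Kind.TOKEN, text); }   // identifier, number, ...
    static Node lit(String text) { return new Node(Kind.LITERAL, text); } // keyword, punctuation

    // Assumed convention for this sketch: list rules are named "...List".
    static boolean isList(Node n) { return n.kind == Kind.RULE && n.label.endsWith("List"); }

    static Node simplify(Node cst) {
        if (cst.kind != Kind.RULE) return cst;      // leaves are kept as they are
        Node ast = rule(cst.label);
        for (Node child : cst.children) {
            if (child.kind == Kind.LITERAL) continue;            // 1. drop literal tokens
            Node s = simplify(child);
            if (isList(ast) && s.kind == Kind.RULE && s.label.equals(ast.label)) {
                ast.children.addAll(s.children);                 // 3. flatten recursive lists
            } else {
                ast.children.add(s);
            }
        }
        // 2. collapse unary chains (lists stay, even with a single element)
        if (!isList(ast) && ast.children.size() == 1) return ast.children.get(0);
        return ast;
    }

    // CST for the call f(a, b) under a toy grammar:
    //   call -> ID "(" argList ")" ; argList -> argList "," expr | expr ; expr -> term ; term -> ID
    static String demo() {
        Node cst = rule("call", tok("f"), lit("("),
                rule("argList",
                        rule("argList", rule("expr", rule("term", tok("a")))),
                        lit(","),
                        rule("expr", rule("term", tok("b")))),
                lit(")"));
        return simplify(cst).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());                 // prints (call f (argList a b))
    }
}
```

The parentheses, the comma, and the `expr`/`term` chain disappear, and the left-recursive `argList` becomes one flat node, all without writing a semantic action per rule.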
For a rock-stable grammar that already has this done for you, there's no clear advantage, and if all you want is an AST then using JavaCC or ANTLR will work. If the grammar can change, then it is easier with DMS's approach.
But nobody wants just an AST. It's the first step in a long series of steps that leads to whatever tool you are imagining. As a practical matter with real tools, you will almost surely need "symbol tables" and the ability to determine which symbol table entry an identifier node selects. You may need control and data flow analysis. You may need to modify the AST to make changes if your tool is a "change" tool and not just an analysis tool, and for that you might want something that can match/patch arbitrary chunks of the AST using the surface syntax of your language (e.g., Java). Finally, you may want to regenerate source code from your AST as legal, compilable text.
These are not easy mechanisms to build. We think we are competent engineers; it took us several months, on and off over the last 5 years, to get the Java grammars (1.3 to 6 and 7) right. It took us about a year to build the symbol table machinery for Java; how symbols are resolved is a lot more complicated than you think; go read the language standard.
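For readers unsure what "which symbol table entry an identifier node selects" means, here is a deliberately tiny sketch in Java. The `Symbol` and `Scope` classes are hypothetical names invented for this illustration, not part of DMS or any other tool, and the sketch covers only nested lexical scopes. Real Java name resolution also has to deal with inheritance, imports, overloading, generics, and separate namespaces for types, methods, and variables, which is where the effort described above goes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model; real Java symbol resolution is far more involved.
final class ScopeDemo {
    /** One symbol table entry: what a declaration contributes. */
    static final class Symbol {
        final String name;
        final String type;
        final int declLine;

        Symbol(String name, String type, int declLine) {
            this.name = name;
            this.type = type;
            this.declLine = declLine;
        }
    }

    /** One lexical scope, chained to the scope that encloses it. */
    static final class Scope {
        private final Scope parent;                 // null for the outermost scope
        private final Map<String, Symbol> entries = new HashMap<>();

        Scope(Scope parent) { this.parent = parent; }

        void declare(Symbol s) { entries.put(s.name, s); }

        /** Walk outward from this scope; the innermost declaration wins. */
        Optional<Symbol> resolve(String name) {
            for (Scope s = this; s != null; s = s.parent) {
                Symbol hit = s.entries.get(name);
                if (hit != null) return Optional.of(hit);
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Scope classScope = new Scope(null);
        classScope.declare(new Symbol("count", "int", 2));     // field
        classScope.declare(new Symbol("total", "long", 3));    // field

        Scope methodScope = new Scope(classScope);
        methodScope.declare(new Symbol("count", "String", 5)); // parameter shadows the field

        Scope blockScope = new Scope(methodScope);

        // An identifier node "count" inside the block selects the parameter, not the field.
        System.out.println(blockScope.resolve("count").get().type);
        System.out.println(blockScope.resolve("total").get().type);
        System.out.println(blockScope.resolve("missing").isPresent());
    }
}
```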
DMS provides all of these capabilities for many languages, including Java, out of the box. For those languages with lesser support, it has parsing, prettyprinting, tree transformations, and attribute evaluation out of the box.
I've been hearing, for the last 20 years, "If I just had a parser...". My experience (and the reason I built DMS) is that an AST is just not enough, by a long shot.
And I think what DMS provides (far) above and beyond "mere parsing" sets it far apart from "JavaCC and ANTLR". I do not believe they are "the best tools out there at the moment", unless you are optimizing on "free" and not "getting the job done". (If you want a free tool closer to the mark, consider using Eclipse's Java parsing machinery. At least it has, AFAIK, symbol table lookup).
I know of two open source projects for creating and manipulating a Java AST: