Tool to parse by grammar to AST (or .y + .lang => xml)

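Given a lexer definition file and a grammar file (for example, PostgreSQL's .y and .l flex and bison files from its source tree), plus a file written in the language those lexer and grammar files define (say, an SQL query), I want to obtain an AST in some standard format (such as XML or JSON).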

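The most important aspect of this tool is flexibility of input format. In my case, I could recreate the postgres SQL grammar in ANTLR, but I don't want to. I would rather just use whatever postgres is using. So, even though .y files contain more than just parsing rules, the tool I'm looking for should be able to understand them with only minor modification.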

Is there a general tool that can do this?

Here is a command-line session with my imaginary tool, ly2xml:

$ git clone git://postgres-git-url pg
$ find pg -iname '*.[yl]' -exec cp '{}' ~/ \;
$ echo 'SELECT * FROM (SELECT 1)'|ly2xml -parser=*.y -lexer=*.l - -O-
<SELECT>
  <ARGS>*</ARGS>
  <FROM>
    <SELECT><ARGS>1</ARGS></SELECT>
  </FROM>
</SELECT>

(Note that - means it reads from standard input, and -O- means it writes to standard output.)
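
To make the intended output concrete, here is a minimal Python sketch of only the last step: serializing an already-built tree to XML. Since ly2xml is imaginary, nothing here is an existing tool; the `(tag, children)` tuple layout and the `to_xml` helper are assumptions made for illustration, and the tree is built by hand rather than by a parser.

```python
import xml.etree.ElementTree as ET

def to_xml(node):
    """Convert a (tag, children) tuple tree into an ElementTree element.

    Each child is either another (tag, children) tuple or a plain string leaf.
    This tuple layout is an assumption made for the sketch.
    """
    tag, children = node
    elem = ET.Element(tag)
    for child in children:
        if isinstance(child, str):
            elem.text = (elem.text or "") + child
        else:
            elem.append(to_xml(child))
    return elem

# Hand-built tree for: SELECT * FROM (SELECT 1)
ast = ("SELECT", [
    ("ARGS", ["*"]),
    ("FROM", [("SELECT", [("ARGS", ["1"])])]),
])

xml_text = ET.tostring(to_xml(ast), encoding="unicode")
print(xml_text)
```

The hard part of the question is everything before this step: getting from the .y and .l files to such a tree.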


Nice idea. You are assuming one or more of the following:

 a) that each tool that has a grammar, uses a canonical parsing engine type (e.g., everybody uses bison)
 b) that there is some parsing tool that understands the zillion grammar specification schemes that exist
 c) that whatever the parser is, it will parse language fragments (perhaps well formed).

a) is clearly false. I have never seen b). And virtually no parsing engine can do c); they can only parse "full programs".
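
To illustrate what c) asks for, here is a minimal Python sketch of a hand-written recursive-descent parser in which any nonterminal can serve as the start symbol. The toy grammar (`formula`, `product`, `term`) and all function names are assumptions made for the example; this is not how bison, ANTLR or DMS work internally.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*()])")
START_SYMBOLS = {"formula", "product", "term"}

def tokenize(text):
    """Split the input into numbers, identifiers and single-character operators."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def formula(self):            # formula : product (('+' | '-') product)*
        node = self.product()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.product())
        return node

    def product(self):            # product : term ('*' term)*
        node = self.term()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.term())
        return node

    def term(self):               # term : NUMBER | VARIABLE | '(' formula ')'
        tok = self.take()
        if tok == "(":
            node = self.formula()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "-", "*", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

def parse(text, start="formula"):
    """Parse text as an instance of the nonterminal named by start."""
    if start not in START_SYMBOLS:
        raise ValueError(f"unknown nonterminal {start!r}")
    parser = Parser(tokenize(text))
    tree = getattr(parser, start)()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return tree

print(parse("x * 23 + y"))               # whole formula
print(parse("x * 23", start="product"))  # fragment parsed as a product
```

With this sketch, `parse("x + y", start="product")` raises `SyntaxError`, because `x + y` is a formula but not a product. Generated table-driven parsers usually fix a single start symbol when the tables are built, which is one reason fragment parsing is uncommon.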

Your only hope is to use a parser generator that comes with a large set of well-tested language definitions.

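ANTLR is arguably one; it certainly has a long list of contributed language definitions, and they are all easy to find in one place. I don't know whether it handles language fragments, though. I doubt it has XML export for all parse trees.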

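Bison is arguably one; many language processors have been built with Bison. But those definitions are scattered all over the place, and collecting them is very hard. It won't do language fragments either. I'm pretty sure it doesn't have XML export.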

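Our DMS Software Reengineering Toolkit is arguably one. It has a lot of language definitions, and they are all collected in one place (our company). It does produce an AST for every parse, and it has built-in XML export. DMS can also parse any nonterminal of a language.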

DMS can emulate your example quite closely, given a DMS .lex, an .atg ("attribute grammar") and a compatible source file.

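What follows is a build and run of a DMS lexer/parser (with XML export) for the algebra grammar described in "Algebra as a DMS Domain" (the ++XML in the lower half of the example is what tells the parse step to export XML):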

C:\DMS\Domains\Algebra\Tools\Parser\Source>make
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -lexer
MakeDMSTool: Selected domain "Algebra".
LexerGenerator V2.1a
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing lexical specification ...
Processing mode Algebra ...
Exiting with final status 0
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -tool %Temporaries
MakeDMSTool: Selected domain "Algebra".
Using attribute grammar in "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/Syntax/Algebra.atg"
AttributeEvaluatorGenerator V3.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing attribute grammar ...
Generating attribute evaluator(s) ...
Exiting with final status 0

rm -rf /cygdrive/c/DMS/Domains/Algebra/Tools/%Temporaries
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -prettyprinter
MakeDMSTool: Selected domain "Algebra".
PrettyPrinterGenerator V2.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved

Parsing pretty printer specification ...
Generating pretty printer ...
Exiting with final status 0

AttributeEvaluatorGenerator V3.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing attribute grammar ...
Generating attribute evaluator(s) ...
......................

Exiting with final status 0
cd /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/%Generated; 
    perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -weave-preserve-productions %PreserveProductions.*.par
MakeDMSTool: Selected domain "Algebra".
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -parser
MakeDMSTool: Selected domain "Algebra".
export PARLANSEINCLUDEDIRECTORIES=`perl -e '($_ = $ARGV[0].";/cygdrive/c/DMS/Domains/PARLANSE/Library/Arrays;/cygdrive/c/DMS/Domains
/PARLANSE/Library/Bags;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTables;/cygdrive/c/DMS/Domains/PARLANSE/Library/Pipes;/cygdrive/
c/DMS/Domains/PARLANSE/Library/Sequences;/cygdrive/c/DMS/Domains/PARLANSE/Library/Sets;/cygdrive/c/DMS/Domains/PARLANSE/Library/Stac
ks;/cygdrive/c/DMS/Domains/PARLANSE/Library/Utilities;/cygdrive/c/DMS/Domains/PARLANSE/Library/Algorithms/Source;/cygdrive/c/DMS/Dom
ains/PARLANSE/Library/Booleans/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Characters/Source;/cygdrive/c/DMS/Domains/PARLANSE/Li
brary/Graphics/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Numbers/Sou
rce;/cygdrive/c/DMS/Domains/PARLANSE/Library/References/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SQL/Source;/cygdrive/c/DMS/D
omains/PARLANSE/Library/Streams/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SuffixTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/
Library/System/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Search/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/TestSupport/So
urce") =~ s!//(.)/!$1:/!g; $_ =~ s!/cygdrive/(.)/!$1:/!g; print $_' "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source;/cygdrive/c
/DMS/Domains/Algebra/Tools/Parser/Source/Components;/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/%Generated;/cygdrive/c/DMS/D
omains/DMSStringGrammar/Tools/DomainParser/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Lexer/Source;/cygdrive/c/DMS/Domains/Algebra
/Tools/Lexer/Source/%Generated;/cygdrive/c/DMS/Domains/DMSLexical/Tools/DomainLexer/Source;/cygdrive/c/DMS/Infrastructure/HyperGraph
/Source;/cygdrive/c/DMS/Domains"`; 
    cd `echo /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source`; 
    nice /cygdrive/c/DMS/Domains/PARLANSE/Tools/Compiler/p0c.exe  DomainParser.par
PARLANSE0 Compiler V19.16.40
Semantic Designs, Inc. *** Confidential Information
128/485/133408 smallest/average/largest activation record/grain stack space required.
Largest stack space required by function at Line    1533
 in file FFIModule.par
89 grains.
3775 functions/procedures.
223447 lines of source code read.
7160772 bytes of object code.
No errors detected.
mv -f /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/DomainParser.P0B /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/DomainParser.P0B

C:\DMS\Domains\Algebra\Tools\Parser\Source>run ../DomainParser ++XML C:\DMS\Domains\Algebra\Tools\Lexer\TestCase\algebraformula.txt
Domain Parser for Algebra 2.3.3
Copyright (C) Semantic Designs 1996-2010; All Rights Reserved
31 tree nodes in tree.
<DMSForest>
 <tree node="formula" type="1" domain="1" id="10qx0" parents="0" line="1" column="1" file="1">
  <tree node="product" type="4" domain="1" id="10qwx" line="1" column="1" file="1">
   <tree node="term" type="10" domain="1" id="10qwy" line="1" column="1" file="1">
<tree node="'D'" type="19" domain="1" id="10qw5" literal="0" line="1" column="1" file="1"/>
<tree node="'['" type="20" domain="1" id="10qw6" literal="0" line="1" column="2" file="1"/>
<tree node="formula" type="1" domain="1" id="10qwt" line="1" column="4" file="1">
 <tree node="product" type="4" domain="1" id="10qws" line="1" column="4" file="1">
  <tree node="term" type="9" domain="1" id="10qwr" line="1" column="4" file="1">
   <tree node="'('" type="17" domain="1" id="10qw7" literal="0" line="1" column="4" file="1"/>
   <tree node="formula" type="3" domain="1" id="10qwp" line="1" column="5" file="1">
    <tree node="formula" type="2" domain="1" id="10qwk" line="1" column="5" file="1">
     <tree node="formula" type="1" domain="1" id="10qwf" line="1" column="5" file="1">
      <tree node="product" type="5" domain="1" id="10qwe" line="1" column="5" file="1">
       <tree node="product" type="4" domain="1" id="10qwa" line="1" column="5" file="1">
    <tree node="term" type="7" domain="1" id="10qw9" line="1" column="5" file="1">
     <tree node="VARIABLE" type="15" domain="1" id="10qw8" line="1" column="5" file="1">
      <literal>x</literal>
     </tree>
    </tree>
       </tree>
       <tree node="'*'" type="13" domain="1" id="10qwb" literal="0" line="1" column="7" file="1"/>
       <tree node="term" type="8" domain="1" id="10qwd" line="1" column="8" file="1">
    <tree node="NUMBER" type="16" domain="1" id="10qwc" literal="23" line="1" column="8" file="1"/>
       </tree>
      </tree>
     </tree>
     <tree node="'+'" type="11" domain="1" id="10qwg" literal="0" line="1" column="10" file="1"/>
     <tree node="product" type="4" domain="1" id="10qwj" line="1" column="12" file="1">
      <tree node="term" type="7" domain="1" id="10qwi" line="1" column="12" file="1">
       <tree node="VARIABLE" type="15" domain="1" id="10qwh" line="1" column="12" file="1">
    <literal>y</literal>
       </tree>
      </tree>
     </tree>
    </tree>
    <tree node="'-'" type="12" domain="1" id="10qwl" literal="0" line="1" column="13" file="1"/>
    <tree node="product" type="4" domain="1" id="10qwo" line="1" column="14" file="1">
     <tree node="term" type="7" domain="1" id="10qwn" line="1" column="14" file="1">
      <tree node="VARIABLE" type="15" domain="1" id="10qwm" line="1" column="14" file="1">
       <literal>z</literal>
      </tree>
     </tree>
    </tree>
   </tree>
   <tree node="')'" type="18" domain="1" id="10qwq" literal="0" line="1" column="15" file="1"/>
  </tree>
 </tree>
</tree>
<tree node="','" type="21" domain="1" id="10qwu" literal="0" line="1" column="16" file="1"/>
<tree node="VARIABLE" type="15" domain="1" id="10qwv" line="1" column="18" file="1">
 <literal>x</literal>
</tree>
<tree node="']'" type="22" domain="1" id="10qww" literal="0" line="1" column="19" file="1"/>
   </tree>
  </tree>
 </tree>
 <FileIndex>
  <File index="1">C:/DMS/Domains/Algebra/Tools/Lexer/TestCase/algebraformula.txt</File>
 </FileIndex>
 <DomainIndex>
  <Domain index="1">Algebra</Domain>
 </DomainIndex>
</DMSForest>
Exiting with final status 0

C:\DMS\Domains\Algebra\Tools\Parser\Source>
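
An export like the one above can be consumed with any XML library. The following Python sketch uses the standard `xml.etree.ElementTree` module to list the leaf tokens of a short excerpt copied from that output. The reading of the attributes is inferred from the listing only (a leaf carries its text in a `<literal>` child, in a quoted `node` name, or in a `literal` attribute) and is an assumption, not documented DMS behavior; `leaf_text` and `leaves` are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Excerpt of the export shown above: the subtree for "x * 23".
SAMPLE = """
<DMSForest>
 <tree node="product" type="5" domain="1" id="10qwe" line="1" column="5" file="1">
  <tree node="product" type="4" domain="1" id="10qwa" line="1" column="5" file="1">
   <tree node="term" type="7" domain="1" id="10qw9" line="1" column="5" file="1">
    <tree node="VARIABLE" type="15" domain="1" id="10qw8" line="1" column="5" file="1">
     <literal>x</literal>
    </tree>
   </tree>
  </tree>
  <tree node="'*'" type="13" domain="1" id="10qwb" literal="0" line="1" column="7" file="1"/>
  <tree node="term" type="8" domain="1" id="10qwd" line="1" column="8" file="1">
   <tree node="NUMBER" type="16" domain="1" id="10qwc" literal="23" line="1" column="8" file="1"/>
  </tree>
 </tree>
</DMSForest>
"""

def leaf_text(elem):
    """Best-effort text of a leaf node (assumed conventions, see lead-in)."""
    child = elem.find("literal")
    if child is not None:
        return child.text
    name = elem.get("node")
    if len(name) >= 2 and name[0] == name[-1] == "'":
        return name[1:-1]          # quoted terminal such as '*'
    return elem.get("literal")

def leaves(elem):
    """Yield (node name, text) for every <tree> element without <tree> children."""
    kids = elem.findall("tree")
    if not kids:
        yield elem.get("node"), leaf_text(elem)
    for kid in kids:
        yield from leaves(kid)

root = ET.fromstring(SAMPLE.strip())
tokens = [pair for top in root.findall("tree") for pair in leaves(top)]
print(tokens)
```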

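If you really want an engine that understands many grammar notations, using DMS to build one is probably the easiest route. Simply define each grammar formalism (e.g., ANTLR or bison) as a DSL to DMS, use DMS to parse a specific instance of that formalism (e.g., an ANTLR BNF instance), apply DMS rewrite rules to convert it into DMS grammar notation, and then build a DMS parser. (You would have to do the same for the lexer.)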
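
As a rough illustration of the first half of that pipeline (reading one grammar notation and reducing it to bare productions), here is a minimal plain-Python sketch for bison-style input. It is not DMS and uses no DMS rewrite rules. It assumes a small, well-behaved .y file: actions contain no comments with stray quotes, and the grammar uses no `';'`, `':'` or `'|'` character literals. A real file such as PostgreSQL's grammar would need a proper parser for the notation. The names `rules_section`, `strip_actions` and `productions` are invented for the example.

```python
import re

SAMPLE_Y = """%token NUMBER VARIABLE
%%
formula : product                { $$ = $1; }
        | formula '+' product    { $$ = mk('+', $1, $3); }
        ;
product : term
        | product '*' term       { if ($3) { $$ = mk('*', $1, $3); } }
        ;
%%
int main(void) { return yyparse(); }
"""

def rules_section(y_text):
    """Return the text between the first and second '%%' lines."""
    parts = re.split(r"^%%[ \t]*$", y_text, flags=re.MULTILINE)
    if len(parts) < 2:
        raise ValueError("no '%%' separator found")
    return parts[1]

def strip_actions(rules):
    """Drop every { ... } action block, tracking nesting and quoted literals."""
    out, depth, i = [], 0, 0
    while i < len(rules):
        ch = rules[i]
        if ch in "'\"":
            # Treat a quoted literal as one unit so braces inside it are ignored.
            j = i + 1
            while j < len(rules) and rules[j] != ch:
                j += 2 if rules[j] == "\\" else 1
            if depth == 0:
                out.append(rules[i:j + 1])
            i = j + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)

def productions(y_text):
    """Map each rule head to its list of alternatives, actions removed."""
    result = {}
    for rule in strip_actions(rules_section(y_text)).split(";"):
        if ":" not in rule:
            continue
        head, body = rule.split(":", 1)
        result[head.strip()] = [" ".join(alt.split()) for alt in body.split("|")]
    return result

for head, alts in productions(SAMPLE_Y).items():
    print(head, "->", " | ".join(alts))
```

The resulting dictionary is a notation-neutral form that could then be printed in whatever grammar syntax the target parser generator expects; precedence declarations and the lexer would still have to be translated separately.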
