Grammar for a recognizer of a spice

2018-06-15 09:40:41

I'm trying to build a grammar for a recognizer of a spice-like language using Antlr-3.1.3 (I use this version because of the Python target). I don't have experience with parsers. I've found a master thesis where the student has done the syntactic analysis of the SPICE 2G6 language and built a parser using the LEX and YACC compiler writing tools. (http://digitool.library.mcgill.ca/R/?func=dbin-jump-full&object_id=60667&local_base=GEN01-MCG02) In chapter 4, he describes a grammar in Backus-Naur form for the SPICE 2G6 language, and appends to the work the LEX and YACC code files of the parser.

I'm basing myself in this work to create a simpler grammar for a recognizer of a more restrictive spice language.

I read the Antlr manual, but could not figure out how to solve two problems, that the code snippet below illustrates.

grammar Najm_teste;

resistor
    :   RES NODE NODE VALUE 'G2'? COMMENT? NEWLINE
    ;

// START:tokens
RES :   ('R'|'r') DIG+;

NODE    :   DIG+;   // non-negative integer
VALUE   :   REAL;   // non-negative real

fragment
SIG :   '+'|'-';
fragment
DIG :   '0'..'9';
fragment
EXP :   ('E'|'e') SIG? DIG+;
fragment
FLT :   (DIG+ '.' DIG+)|('.' DIG+)|(DIG+ '.');
fragment
REAL    :   (DIG+ EXP?)|(FLT EXP?);

COMMENT :   '%' ( options {greedy=false;} : . )* NEWLINE;
NEWLINE :   'r'? 'n';
WS  :   (' '|'t')+ {$channel=HIDDEN;};
// END:tokens

In the grammar above, the token NODE is a subset of the set represented by the VALUE token. The grammar correctly interprets an input like "R1 5 0 1.1/n", but cannot interpret an input like "R1 5 0 1/n", because it maps "1" to the token NODE, instead of mapping it to the token VALUE, as NODE comes before VALUE in the tokens section. Given such inputs, does anyone has an idea of how can I map the "1" to the correct token VALUE, or a suggestion of how can I alter the grammar so that I can correctly interpret the input?

The second problem is the presence of a comment at the end of a line. Because the NEWLINE token delimits: (1) the end of a comment; and (2) the end of a line of code. When I include a comment at the end of a line of code, two newline characters are necessary to the parser correctly recognize the line of code, otherwise, just one newline character is necessary. How could I improve this?

Thanks!

Problem 1

The lexer does not "listen" to the parser. The lexer simply creates tokens that contain as much characters as possible. In case two tokens match the same amount of characters, the token defined first will "win". In other words, "1" will always be tokenized as a NODE , even though the parser is trying to match a VALUE .

You can do something like this instead:

resistor
    :   RES NODE NODE value 'G2'? COMMENT? NEWLINE
    ;

value : NODE | REAL;

// START:tokens
RES  : ('R'|'r') DIG+;    
NODE : DIG+;
REAL : (DIG+ EXP?) | (FLT EXP?);

...

Eg, I removed VALUE , added value and removed fragment from REAL

Problem 2

Do not let the comment match the line break:

COMMENT :   '%' ~('r' | 'n')*;

where ~('r' | 'n')* matches zero or more chars other than line break characters.

链接地址: http://www.djcxy.com/p/43710.html

上一篇: 使用ANTLR和Java创建数据绑定代码生成器

下一篇: 用于识别香料的语法