Flex newline scanning for bison

2018-06-14 16:16:52

I'd like to use the same flex/bison scanner/parser both for an interactive interpreter and for loading a file to be interpreted. I cannot get the newline handling to work correctly in both cases.

Interpreter: There is a prompt and I can enter commands terminated by pressing ENTER.

File: Here is an example input file:

-----cut---------

begin(
    print("well done"), 1)

----cut-------

So, there is a newline in the first line and after the '(' that should be eaten.

In my scanner.l I have

%%
[ \t]                       {   errorLineCol += strlen(yytext); }

\n                          {   errorLineNumber++;
                                errorLineCol = 0; }

("-"?[0-9])[0-9]*           {   errorLineCol += strlen(yytext);
                                yylval = stringToInteger(yytext);
                                return TINTEGER; }

.....

This works for the file scenario but not for the interpreter: I then have to press an additional Ctrl+D after the ENTER. If I change the rule to

\n                          {   errorLineNumber++;
                                errorLineCol = 0;
                                return 0; }

then the interpreter works but file reading does not: it stops after the first newline it encounters. What is a good way to tackle this issue?

Edit:

Here is the top level of the parser:

input: uexpr                        {   parseValue = $1; }
    | /* empty */                   {   parseValue = myNull; }
    | error                         {   parseValue = myNull; }
    ;

uexpr: list                          
    | atom                         
    ;

Possible solution: this seems to work:

\n                          {   errorLineNumber++;
                                errorLineCol = 0;
                                if (yyin == stdin) return 0; }

The main problem is that your parser function yyparse does not return until it reduces the entire language to the start symbol.

If the top level of your grammar is something like:

language : commands ;

commands : command commands | /* empty */ ;

then of course the parser will expect a complete script (terminated by your hitting Ctrl-D). If your interpreter follows this logic:

loop:
  print("prompt>")
  yyparse()
  if (empty statement)
    break

it won't work since yyparse is consuming the whole script before returning.

The return 0; solves the problem for this interactive mode because the token value 0 indicates EOF to the parser, making it think the script has ended.

I do not agree with the solution of making \n a token. It will only complicate the grammar (a hitherto insignificant piece of whitespace is now significant) and ultimately not work because the yyparse function will still want to process the complete grammar. That is to say, if you have newline as a token, but the grammar's start symbol represents the entire script, yyparse will still not return to your interactive prompt loop.

A quick and dirty hack is to let the lexer know whether interactive mode is in effect. Then it can conditionally return 0; for every newline when it is in interactive mode. If the input isn't a complete statement, there will be a syntax error, since the script as a whole ends at the newline. In normal file-reading mode, your lexer can eat all whitespace without returning, as before, allowing the whole file to be processed with a single yyparse.

If you want interactive input and file reading without implementing two modes of behavior in the lexer, you can change the grammar so that it parses only one statement of the language: the yyparse function then returns for every top-level statement. (The lexer eats newlines as before, without returning 0.) That is, the start symbol of the grammar is just one statement (possibly empty). Your file parser must then be implemented as a loop (written by you) that calls yyparse to get each statement from the file until yyparse encounters an empty input. The downside of this approach is that if the user types incomplete syntax (e.g. a dangling open parenthesis), the parser will keep scanning the input until it is satisfied. This is unfriendly, like programs that use scanf for interactive user input (it's the same problem: scanf is a parser that doesn't return until it is satisfied).
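To make the shape of that driver loop concrete, here is a minimal sketch in plain C. It does not use bison at all: parse_one is a hypothetical stand-in for a yyparse whose start symbol is a single statement, and load_script is the hand-written loop around it. Both names, and the crude "balanced parentheses" notion of a statement, are assumptions for illustration only.

```c
#include <ctype.h>

/* Hypothetical stand-in for yyparse() with a one-statement start symbol.
 * Consumes one statement from *cursor, where a statement is either a bare
 * word or text with balanced parentheses (string literals are not handled).
 * Returns 1 if a statement was read, 0 on empty input,
 * -1 if the parentheses do not balance. */
int parse_one(const char **cursor)
{
    const char *p = *cursor;
    int depth = 0;

    /* Eat leading whitespace, newlines included, like the lexer does. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    while (*p != '\0') {
        char c = *p++;
        if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (--depth <= 0)
                break;              /* closed the outermost list */
        } else if (depth == 0 && isspace((unsigned char)c)) {
            break;                  /* a bare atom ends at whitespace */
        }
    }
    *cursor = p;
    return depth == 0 ? 1 : -1;
}

/* File-mode driver: call the parser once per statement until it reports
 * empty input. Returns the number of statements, or -1 on error. */
int load_script(const char *text)
{
    int count = 0;

    for (;;) {
        int rc = parse_one(&text);
        if (rc == 0)
            return count;
        if (rc < 0)
            return -1;
        count++;
    }
}
```

The interactive loop would call the same parse function once per prompt. Note that in this sketch an unbalanced "begin(" is reported as an error only because the input is a finite string; a real yyparse reading from a terminal would instead keep waiting for more input, which is the downside described above.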

Another possibility is to have an interactive mode that performs its own user input, rather than letting yyparse both fetch the input and parse it. In this mode, you read the user's input into a line buffer and then have the parser process that buffer. Processing a line buffer instead of a FILE * stream is perfectly possible; you just have to write custom input handling (your own definition of the YY_INPUT macro). This is the approach you will end up needing anyway if you implement a decent interactive mode with line editing and history recall, e.g. using libedit or GNU readline.
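As a rough sketch of the line-buffer idea: the helper below is the kind of function a custom YY_INPUT could delegate to. The names set_line, line_input, line_buf and line_pos are made up for this example and are not part of flex or of the original code. It hands the scanner at most max_size bytes of the pending line per call and returns 0 once the line is exhausted, which is how YY_INPUT signals end of input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state: the line currently being parsed and the read offset. */
static const char *line_buf = "";
static size_t line_pos = 0;

/* Point the scanner at a new line of input, e.g. one the interactive
 * front end has just read from the user. */
void set_line(const char *line)
{
    line_buf = line;
    line_pos = 0;
}

/* Copy up to max_size bytes of the pending line into buf.
 * Returns the number of bytes copied; 0 means end of input. */
size_t line_input(char *buf, size_t max_size)
{
    size_t remaining = strlen(line_buf) - line_pos;
    size_t n = remaining < max_size ? remaining : max_size;

    memcpy(buf, line_buf + line_pos, n);
    line_pos += n;
    return n;
}

/* In scanner.l this might be hooked up roughly as:
 *   #define YY_INPUT(buf, result, max_size) { result = line_input(buf, max_size); }
 */
```

The interactive loop would then call set_line with each line it reads and invoke yyparse afterwards. flex also offers yy_scan_string for scanning an in-memory string, which may be enough if you do not need control over how the bytes are delivered.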

If pressing ENTER terminates a command, then the lexer should return a token for \n. Returning 0 tells the parser that the input source is complete (end of file for a file, or ^D for a terminal). Add an end-of-line token to your grammar and have the lexer return it when it sees \n.

ETA: But don't forget to handle the case of the last line not ending in ENTER. Have your lexer return an end-of-line token at the end of file unless the last character is \n.
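The end-of-file rule can be illustrated with a tiny hand-written scanner in C. This is only a sketch of the bookkeeping, not flex code: the token names TOK_EOL and TOK_CHAR, their numeric values, and the scanner struct are all assumptions for the example. The only real convention it relies on is that token value 0 means end of input to the parser.

```c
#include <stddef.h>

/* Hypothetical token codes; 0 is what the parser treats as end of input. */
enum { TOK_END = 0, TOK_EOL = 258, TOK_CHAR = 259 };

struct scanner {
    const char *src;     /* NUL-terminated input */
    size_t pos;          /* index of the next character to read */
    int at_line_start;   /* nonzero if nothing has been read since the last EOL */
};

void scanner_init(struct scanner *s, const char *src)
{
    s->src = src;
    s->pos = 0;
    s->at_line_start = 1;   /* empty input needs no synthetic end-of-line */
}

int next_token(struct scanner *s)
{
    char c = s->src[s->pos];

    if (c == '\0') {
        /* End of input: if the last line was not terminated, emit one
         * synthetic end-of-line token first, then report end of input. */
        if (!s->at_line_start) {
            s->at_line_start = 1;
            return TOK_EOL;
        }
        return TOK_END;
    }
    s->pos++;
    if (c == '\n') {
        s->at_line_start = 1;
        return TOK_EOL;
    }
    s->at_line_start = 0;
    return TOK_CHAR;        /* every other character is one token here */
}
```

In a flex scanner the same flag would typically be set in the newline rule, cleared in the other token rules, and checked in an <<EOF>> rule.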
