Parser Generator: How to use GPLEX and GPPG together?

After looking through posts for good C# parser generators, I stumbled across GPLEX and GPPG. I'd like to use GPLEX to generate tokens for GPPG to parse and create a tree (similar to the lex/yacc relationship). However, I can't seem to find an example on how these two interact together. With lex/yacc, lex returns tokens that are defined by yacc, and can store values in yylval. How is this done in GPLEX/GPPG (it is missing from their documentation)?

Attached is the lex code I would like to convert over to GPLEX:

%{
 #include <stdio.h>
 #include "y.tab.h"
%}
%%
[Oo][Rr]                return OR;
[Aa][Nn][Dd]            return AND;
[Nn][Oo][Tt]            return NOT;
[A-Za-z][A-Za-z0-9_]*   yylval=yytext; return ID;
%%

Thanks! Andrew


First, add a reference to "QUT.ShiftReduceParser.dll" to your project. It is provided in the GPLEX download package and contains the QUT.Gppg runtime types that the generated parser builds on.

Sample code for the main program:

using System;
using System.IO;     // FileStream
using QUT.Gppg;
using Scanner;
using Parser;

namespace NCParser
{
    class Program
    {
        static void Main(string[] args)
        {
            string pathTXT = @"C:\temp\testFile.txt";
            FileStream file = new FileStream(pathTXT, FileMode.Open);
            Scanner scanner = new Scanner();
            scanner.SetSource(file, 0);
            Parser parser = new Parser(scanner);
            parser.Parse();   // run the parser over the scanner's token stream
        }
    }
}
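
Before wiring everything together, it can help to run the scanner on its own and dump the tokens it produces. The following is a minimal sketch (not part of the original answer); it assumes the Scanner and Parser classes generated from the specifications below, and relies on yylex(), which the scanner inherits from QUT.Gppg.AbstractScanner, and on the EOF member that GPPG adds to the generated Tokens enumeration:

// Hypothetical debugging helper: prints every token the scanner produces.
static void DumpTokens(string path)
{
    using (FileStream file = new FileStream(path, FileMode.Open))
    {
        Scanner scanner = new Scanner();
        scanner.SetSource(file, 0);

        int tok;
        // yylex() returns the numeric value of the next token;
        // the generated Tokens enumeration contains an EOF member.
        while ((tok = scanner.yylex()) != (int)Tokens.EOF)
        {
            Console.WriteLine((Tokens)tok);   // e.g. kwAND, kwOR, ID
        }
    }
}

Inside the scanner's rule actions the matched text is available as yytext, which is exactly what the ID rule below stores into yylval.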

Sample code for the GPLEX input file:

%using Parser;           // import the namespace of the generated parser class
%namespace Scanner       // namespace of the generated scanner class
%visibility public       // visibility of the generated types "Tokens", "ScanBase", "Scanner"
%scannertype Scanner     // names the scanner class "Scanner"
%scanbasetype ScanBase   // names the scan-base class "ScanBase"
%tokentype Tokens        // names the token enumeration "Tokens"

%option codePage:65001 out:Scanner.cs /* see the GPLEX documentation for further options */

%{ // user-specified code that is copied into the output file
%}

OR [Oo][Rr]
AND [Aa][Nn][Dd]
Identifier [A-Za-z][A-Za-z0-9_]*

%% // rules section
%{ // user code that runs before each attempt to match the next token
%}

{OR}           { return (int)Tokens.kwOR; }
{AND}          { return (int)Tokens.kwAND; }
{Identifier}   { yylval = yytext; return (int)Tokens.ID; }

%% //User-code Section
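
A note on yylval, since that was the original question: in the generated scanner, yylval has the parser's semantic value type, which defaults to int in GPPG. The assignment yylval = yytext above therefore only compiles once the GPPG file declares a string-valued semantic type, or defines a %union whose field the scanner assigns to instead (for example yylval.sVal = yytext, where sVal is whatever field name you declare). Check the GPPG documentation for the exact declaration syntax; once the type is in place, the value stored here reaches the grammar actions as $1, $2, ... of the corresponding symbol.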

Sample code for the GPPG input file:

%using Scanner      // import the namespace of the scanner class
%output=Parser.cs   // names the output file
%namespace Parser   // namespace of the generated parser class

%parsertype Parser      // names the parser class "Parser"
%scanbasetype ScanBase  // names the scan-base class "ScanBase"
%tokentype Tokens       // names the token enumeration "Tokens"

%token kwAND "AND", kwOR "OR"   // the tokens delivered by the GPLEX scanner
%token ID

%% //Grammar Rules Section

program  : /* nothing */
         | Statements
         ;

Statements : EXPR "AND" EXPR
           | EXPR "OR" EXPR
           ;

EXPR : ID
     ;

%% // user-code section
// Don't forget to supply the parser constructor:
public Parser(Scanner scnr) : base(scnr) { }
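
The glue between the two generated files is the Tokens enumeration: GPPG emits it into Parser.cs from the %token declarations above, and the GPLEX scanner (which imports it via %using Parser and %tokentype Tokens) returns its members as int values. Roughly, and only as a sketch (the real enumeration is generated by GPPG, which also assigns the numeric values), it looks like this:

// Sketch only - the actual enumeration is emitted by GPPG into Parser.cs.
public enum Tokens
{
    error,   // predefined error token
    EOF,     // predefined end-of-file token
    kwAND,   // "AND"
    kwOR,    // "OR"
    ID
}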



I had a similar issue - not knowing how to use my output from GPLEX with GPPG due to an apparent lack of documentation. I think the problem stems from the fact that the GPLEX distribution includes gppg.exe along with gplex.exe, but only documentation for GPLEX.

If you go to the GPPG homepage and download that distribution, you'll get the documentation for GPPG, which describes the requirements for the input file, how to construct your grammar, etc. Oh, and you'll also get both binaries again - gppg.exe and gplex.exe.

It almost seems like it would be simpler to just include everything in one package. It could definitely clear up some confusion, especially for those who may be new to lexical analysis (tokenization) and parsing (and may not be 100% familiar yet with the differences between the two).

So anyway, for those who may be doing this for the first time:

GPLEX http://gplex.codeplex.com - used for tokenization/scanning/lexical analysis (different names for the same step)

GPPG http://gppg.codeplex.com/ - takes the token stream produced by the scanner as its input and parses it against a grammar. A parser can do things a simple tokenizer cannot, such as checking whether sets of parentheses match up.


Have you considered using Roslyn? (This isn't a proper answer but I don't have enough reputation to post this as a comment)
