Using ANTLR and Java to create a data binding code generator

I would like to create a data binding code generator for a specified programming language and a specified serialization format: given a specification of the structure of the data to be serialized or deserialized, the generator should produce the classes (in the specified programming language) that represent the given vocabulary, together with the methods for serialization and deserialization in the specified format. The generator could require the following inputs:

  • the target programming language, that is the programming language for generating the code;
  • the target serialization format, that is the serialization format for the data;
  • the specification of the structure of data to be serialized or deserialized.

Since I would initially like to create a simple code generator, the first version could require only the specification of the data structure, so I have chosen C# as the target programming language and XML as the target serialization format. Essentially, the generator should be a Java program that reads the specification (which must be written according to a given grammar) and generates the C# classes that represent the given vocabulary; these classes should have methods for serialization and deserialization in XML format. The purpose of the generator is to produce one or more classes that can be embedded in a C# project.

The specification of the data structure could be written as in the following example:

    simple type Message: int id, string content
    

Given the specification in the above example, the intended code generator could generate the following C# class:

    public class Message
    {
        public int Id { get; set; }
    
        public string Content { get; set; }
    
        public byte[] Serialize()
        {
            // ...
        }
    
        public void Deserialize(byte[] data)
        {
            // ...
        }
    }
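
The generation step for a class like the one above can be sketched in Java as plain string building. This is only a minimal sketch under stated assumptions: the names `FieldSpec`, `CSharpEmitter`, and `emitClass` are hypothetical, and the specification is assumed to have already been parsed into a type name and a list of fields. The method bodies are left as placeholders, as in the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one field in the specification: a type and a name.
final class FieldSpec {
    final String type;
    final String name;

    FieldSpec(String type, String name) {
        this.type = type;
        this.name = name;
    }
}

// Hypothetical emitter: turns a type name and its fields into C# source text.
final class CSharpEmitter {
    // "id" -> "Id", following the property naming used in the example class.
    static String pascalCase(String name) {
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    static String emitClass(String typeName, List<FieldSpec> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(typeName).append("\n{\n");
        for (FieldSpec f : fields) {
            sb.append("    public ").append(f.type).append(' ')
              .append(pascalCase(f.name)).append(" { get; set; }\n\n");
        }
        // Serialization bodies stay as placeholders, as in the example class.
        sb.append("    public byte[] Serialize()\n    {\n        // ...\n    }\n\n");
        sb.append("    public void Deserialize(byte[] data)\n    {\n        // ...\n    }\n");
        sb.append("}\n");
        return sb.toString();
    }
}

final class EmitterDemo {
    public static void main(String[] args) {
        List<FieldSpec> fields = Arrays.asList(
                new FieldSpec("int", "id"),
                new FieldSpec("string", "content"));
        System.out.print(CSharpEmitter.emitClass("Message", fields));
    }
}
```

For anything larger than this, a template engine would probably be easier to maintain than hand-built strings, but the string builder keeps the sketch free of dependencies.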
    

I have read about ANTLR and I believe it is well suited to this purpose. As explained in this answer, I should first create a grammar for the specification of the data structure.
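
To make the shape of such a grammar concrete, here is a rough sketch. The comment at the top shows what an ANTLR 4 grammar for the one-line syntax might look like; its rule and token names are assumptions, not something given in the question. The Java code below it is a hand-written stand-in (regular expressions, no ANTLR dependency) that accepts the same shape, so it can be run on its own. `TypeDecl` and `SpecLineParser` are hypothetical names, and constraints are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A hypothetical ANTLR 4 grammar for the one-line syntax might look like this
// (rule and token names are assumptions, not taken from the question):
//
//   grammar DataSpec;
//   spec     : typeDecl+ EOF ;
//   typeDecl : ('simple' | 'compound') 'type' ID ':' field (',' field)* ;
//   field    : ID ID ;
//   ID       : [A-Za-z_] [A-Za-z_0-9]* ;
//   WS       : [ \t\r\n]+ -> skip ;
//
// The hand-written parser below accepts the same shape without needing ANTLR.

// Hypothetical in-memory model of one declaration.
final class TypeDecl {
    final String kind;                                // "simple" or "compound"
    final String name;                                // e.g. "Message"
    final List<String[]> fields = new ArrayList<>();  // each entry: {type, name}

    TypeDecl(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

final class SpecLineParser {
    private static final Pattern HEADER = Pattern.compile(
            "\\s*(simple|compound)\\s+type\\s+([A-Za-z_]\\w*)\\s*:\\s*(.+)");
    private static final Pattern FIELD = Pattern.compile(
            "([A-Za-z_]\\w*)\\s+([A-Za-z_]\\w*)");

    // Parses one line such as "simple type Message: int id, string content".
    static TypeDecl parse(String line) {
        Matcher h = HEADER.matcher(line);
        if (!h.matches()) {
            throw new IllegalArgumentException("not a type declaration: " + line);
        }
        TypeDecl decl = new TypeDecl(h.group(1), h.group(2));
        for (String part : h.group(3).split(",")) {
            Matcher f = FIELD.matcher(part.trim());
            if (!f.matches()) {
                throw new IllegalArgumentException("bad field: " + part.trim());
            }
            decl.fields.add(new String[] { f.group(1), f.group(2) });
        }
        return decl;
    }
}

final class ParserDemo {
    public static void main(String[] args) {
        TypeDecl d = SpecLineParser.parse("simple type Message: int id, string content");
        System.out.println(d.kind + " " + d.name + " with " + d.fields.size() + " fields");
    }
}
```

With ANTLR, the generated parser would produce a parse tree, and a listener or visitor would build a model like `TypeDecl` from it; the regular expressions here only stand in for that step.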

The above example is very simple, because it defines only a simple type, but the specification could be more complex: a compound type could include one or more simple types, lists, and so on, as in the following example:

    simple type LogInfo: DateTime time, String message
    simple type LogSource: String class, String version
    compound type LogEntry: LogInfo info, LogSource source
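
Once compound types refer to other types, the generator has to check that every referenced type exists before emitting code. The following is a minimal sketch of that check, assuming the declarations have already been parsed into a map from type name to field type names. `TypeChecker`, `findUnknownTypes`, and the set of built-in type names are assumptions; the question does not list the built-in types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TypeChecker {
    // Assumed set of built-in type names; the question does not list them.
    static final Set<String> BUILT_INS =
            new HashSet<>(Arrays.asList("int", "string", "String", "DateTime"));

    // declarations maps each declared type name to the type names of its fields.
    // Returns one message per field type that is neither built in nor declared.
    static List<String> findUnknownTypes(Map<String, List<String>> declarations) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> decl : declarations.entrySet()) {
            for (String fieldType : decl.getValue()) {
                if (!BUILT_INS.contains(fieldType)
                        && !declarations.containsKey(fieldType)) {
                    errors.add(decl.getKey() + ": unknown type " + fieldType);
                }
            }
        }
        return errors;
    }
}

final class TypeCheckerDemo {
    public static void main(String[] args) {
        Map<String, List<String>> decls = new LinkedHashMap<>();
        decls.put("LogInfo", Arrays.asList("DateTime", "String"));
        decls.put("LogSource", Arrays.asList("String", "String"));
        decls.put("LogEntry", Arrays.asList("LogInfo", "LogSource"));
        // Every field type is built in or declared, so the list is empty.
        System.out.println(TypeChecker.findUnknownTypes(decls));
    }
}
```

Because the lookup uses the whole map, a type may be referenced before the line that declares it.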
    

Moreover, the specification could also include one or more constraints, as in the following example:

    simple type Message: int id (constraint: not negative), string content
    

In this case, the intended code generator could generate the following C# class:

    public class Message
    {
        private int _id;
        private string _content;
    
        public int Id
        {
            get { return _id; }
            set
            {
                if (value < 0)
                    throw new ArgumentException("...");
    
                _id = value;
            }
        }
    
        public string Content
        {
            get { return _content; }
            set { _content = value; }
        }
    
        public byte[] Serialize()
        {
            // ...
        }
    
        public void Deserialize(byte[] data)
        {
            // ...
        }
    }
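
Emitting a guarded property like `Id` above comes down to mapping each constraint name to a C# condition. The sketch below assumes the constraint text has already been extracted from the specification; `ConstrainedPropertyEmitter` and its methods are hypothetical names, only the `not negative` constraint from the example is supported, and the exception message is left as the same `"..."` placeholder.

```java
final class ConstrainedPropertyEmitter {
    // Maps a constraint name from the spec to the C# condition that signals a
    // violation. Only "not negative" is handled; other names are rejected.
    static String violationCondition(String constraint) {
        if ("not negative".equals(constraint)) {
            return "value < 0";
        }
        throw new IllegalArgumentException("unsupported constraint: " + constraint);
    }

    // Emits a backing field plus a property; constraint may be null for an
    // unconstrained field, in which case the setter only assigns the value.
    static String emitProperty(String type, String name, String constraint) {
        String prop = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        String field = "_" + name;
        StringBuilder sb = new StringBuilder();
        sb.append("    private ").append(type).append(' ').append(field).append(";\n\n");
        sb.append("    public ").append(type).append(' ').append(prop).append("\n    {\n");
        sb.append("        get { return ").append(field).append("; }\n");
        sb.append("        set\n        {\n");
        if (constraint != null) {
            sb.append("            if (").append(violationCondition(constraint)).append(")\n");
            // The message is a placeholder, as in the example class.
            sb.append("                throw new ArgumentException(\"...\");\n\n");
        }
        sb.append("            ").append(field).append(" = value;\n");
        sb.append("        }\n    }\n");
        return sb.toString();
    }
}

final class ConstraintDemo {
    public static void main(String[] args) {
        System.out.print(ConstrainedPropertyEmitter.emitProperty("int", "id", "not negative"));
    }
}
```

Rejecting unknown constraint names at generation time means a typo in the specification fails early instead of silently producing an unguarded setter.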
    

Essentially, the intended code generator should find all user-defined types, any constraints, and so on. Is there a simple example?


A good starting point is always the example grammars in the Antlr4 repo. Simple grammars, such as abnf, json, and less, might provide relevant starting points for your specification grammar. More complex grammars, such as the several sql grammars, can give insight into how to handle more difficult or involved specification constructs: each line of your specification appears broadly analogous to an sql statement.

Of course, Antlr 4 itself, both its grammar and its implementation, is the most spot-on example of reading a specification and generating a derived source output.


If you want to look at an open-source data interchange system with roughly the characteristics you are proposing (multi-platform, multi-language, data definition language), you could do worse than Google's Protocol Buffers, more commonly known as protobuf.

The compiler for the data description language is, unfortunately, not generated from a grammar, but it is a relatively readable recursive-descent parser written in C++. Code generators for several languages are included, and many more are available.

An interesting feature is that the interchange format can be described in itself. In addition, it is possible to encode and decode data based on the description of the interchange format, so format descriptions can themselves be interchanged and used ad hoc without code generation. (This is less efficient, obviously, but is nonetheless often useful.)
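
The idea of encoding from a description at run time, without generated classes, can be illustrated with a small Java sketch. This is not protobuf's API or wire format: `DynamicXmlEncoder` is a hypothetical name, the description is reduced to a type name and an ordered list of field names, and XML is used only because it is the target format in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DynamicXmlEncoder {
    // Escapes characters that have special meaning in XML element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Encodes a record using only a run-time description of its type:
    // the type name and the ordered field names. No generated class is needed.
    static String encode(String typeName, List<String> fieldNames,
                         Map<String, Object> record) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(typeName).append('>');
        for (String field : fieldNames) {
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException("missing field: " + field);
            }
            sb.append('<').append(field).append('>')
              .append(escape(String.valueOf(record.get(field))))
              .append("</").append(field).append('>');
        }
        sb.append("</").append(typeName).append('>');
        return sb.toString();
    }
}

final class DynamicDemo {
    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("content", "hello");
        System.out.println(DynamicXmlEncoder.encode(
                "Message", Arrays.asList("id", "content"), record));
    }
}
```

The generic map lookup and string conversion on every field are where the efficiency cost mentioned above comes from; generated classes avoid both.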
