The trick to nested structures in pyparsing
I am struggling to parse nested structures with PyParsing. I've looked through many of the 'nested' PyParsing examples, but I don't see how to fix my problem.
Here is what my internal structure looks like:
texture_unit optionalName
{
texture required_val
prop_name1 prop_val1
prop_name2 prop_val1
}
And here is what my external structure looks like; it can contain zero or more of the internal structures.
pass optionalName
{
prop_name1 prop_val1
prop_name2 prop_val1
texture_unit optionalName
{
// edit 2: showing use of '.' character in value
texture required_val.file.name optional_val // edit 1: forgot this line in initial post.
// edit 2: showing potentially multiple values
prop_name3 prop_val1 prop_val2
prop_name4 prop_val1
}
}
I am successfully parsing the internal structure. Here is my code for that.
prop_ = pp.Group(pp.Word(pp.alphanums+'_')+pp.Group(pp.OneOrMore(pp.Word(pp.alphanums+'_'+'.'))))
texture_props_ = pp.Group(pp.Literal('texture') + pp.Word(pp.alphanums+'_'+'.')) + pp.ZeroOrMore(prop_)
texture_ = pp.Forward()
texture_ << pp.Literal('texture_unit').suppress() + pp.Optional(pp.Word(pp.alphanums+'_')).suppress() + pp.Literal('{').suppress() + texture_props_ + pp.Literal('}').suppress()
Here is my attempt to parse the outer structure:
pass_props_ = pp.ZeroOrMore(prop_)
pass_ = pp.Forward()
pass_ << pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() + pp.Literal('{').suppress() + pass_props_ + pp.ZeroOrMore(texture_) + pp.Literal('}').suppress()
When I call pass_.parseString( testPassStr ), I get an error in the console saying that "}" was expected.
I see this as very similar to the C struct example, but I'm not sure what the missing magic is. I'm also curious how to control the resulting data structure when using nestedExpr.
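For the nestedExpr part of the question, it may help to see what nestedExpr('{', '}') returns by default. The following is a minimal sketch. The sample text is adapted from the question's structure, with the // comments and the leading "pass optionalName" removed. Each {...} level becomes one nested list of whitespace-separated tokens, so the pairing of property names with values is lost. The content argument of nestedExpr, an expression for what may appear between the braces, is the main hook for changing what ends up inside each list.

```python
import pyparsing as pp

# nestedExpr('{', '}') turns each {...} level into a nested list of
# whitespace-separated tokens; it knows nothing about names or values.
braces = pp.nestedExpr('{', '}')

# Sample body adapted from the question (comments removed).
text = '''{
prop_name1 prop_val1
texture_unit optionalName
{
texture required_val
}
}'''

result = braces.parseString(text).asList()
print(result)
# [['prop_name1', 'prop_val1', 'texture_unit', 'optionalName', ['texture', 'required_val']]]
```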
There are two problems:
1. The texture literal is required in a texture_unit block, but there is no texture line in your second example.
2. pass_props_ also matches texture_unit optionalName, because the OneOrMore of values in prop_ keeps consuming words across line breaks. After that, pp.Literal('}') expects } but finds {. This is the reason for the error. We can check it by changing the pass_ rule like this:
pass_ << (pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() +
          pp.Literal('{').suppress() + pass_props_)
print(pass_.parseString(s2))
It gives the following output:
[['prop_name', ['prop_val', 'prop_name', 'prop_val', 'texture_unit', 'optionalName']]]
We can see that pass_props_ has swallowed texture_unit optionalName.
So what we want is this: prop_ can contain alphanumerics, _ and ., but must not match the texture_unit literal. We can do that with a regex and a negative lookahead:
prop_ = pp.Group( pp.Regex(r'(?!texture_unit)[a-z0-9_]+')+ pp.Group(pp.OneOrMore(pp.Regex(r'(?!texture_unit)[a-z0-9_.]+'))) )
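A quick way to see what the negative lookahead does is to try the name pattern by itself. The matches() helper below is hypothetical and exists only for this check. Note that (?!texture_unit) rejects any word that merely starts with texture_unit, such as texture_unit2. That may or may not be what you want.

```python
import pyparsing as pp

# Property-name pattern from the answer: lowercase letters, digits, '_',
# but the negative lookahead (?!texture_unit) vetoes that keyword.
name_ = pp.Regex(r'(?!texture_unit)[a-z0-9_]+')


def matches(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except pp.ParseException:
        return False


print(matches(name_, 'prop_name1'))     # True
print(matches(name_, 'texture'))        # True: only 'texture_unit...' is vetoed
print(matches(name_, 'texture_unit'))   # False
print(matches(name_, 'texture_unit2'))  # False: the lookahead checks a prefix
```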
Finally, a working example looks like this:
import pyparsing as pp
s1 = '''texture_unit optionalName
{
texture required_val
prop_name prop_val
prop_name prop_val
}'''
prop_ = pp.Group( pp.Regex(r'(?!texture_unit)[a-z0-9_]+')+ pp.Group(pp.OneOrMore(pp.Regex(r'(?!texture_unit)[a-z0-9_.]+'))) )
texture_props_ = pp.Group(pp.Literal('texture') + pp.Word(pp.alphanums+'_'+'.')) + pp.ZeroOrMore(prop_)
texture_ = pp.Forward()
texture_ <<= (pp.Literal('texture_unit').suppress() + pp.Word(pp.alphanums+'_').suppress() +
              pp.Literal('{').suppress() + pp.Optional(texture_props_) + pp.Literal('}').suppress())
print(texture_.parseString(s1))
s2 = '''pass optionalName
{
prop_name1 prop_val1.name
texture_unit optionalName1
{
texture required_val1
prop_name2 prop_val12
prop_name3 prop_val13
}
texture_unit optionalName2
{
texture required_va2l
prop_name2 prop_val22
prop_name3 prop_val23
}
}'''
pass_props_ = pp.ZeroOrMore(prop_)
pass_ = pp.Forward()
pass_ <<= (pp.Literal('pass').suppress() + pp.Optional(pp.Word(pp.alphanums+'_'+'.')).suppress() +
           pp.Literal('{').suppress() + pass_props_ + pp.ZeroOrMore(texture_) + pp.Literal('}').suppress())
print(pass_.parseString(s2))
Output:
[['texture', 'required_val'], ['prop_name', ['prop_val', 'prop_name', 'prop_val']]]
[['prop_name1', ['prop_val1.name']], ['texture', 'required_val1'], ['prop_name2', ['prop_val12', 'prop_name3', 'prop_val13']], ['texture', 'required_va2l'], ['prop_name2', ['prop_val22', 'prop_name3', 'prop_val23']]]
The answer I was looking for involves the Forward parser, as shown in the C struct example (linked in the question).
The hard part of defining a grammar for a nested structure is that the list of possible member types must include the structure itself, which has not been defined yet.
The "trick" to defining the pyparsing grammar for a nested structure is to delay the definition of the structure, but include a "forward declared" version of the structure when defining the structure members, so the members can also include a structure. Then complete the structure grammar as a list of members.
struct = Forward()
member = blah | blah2 | struct
struct << ZeroOrMore( Group(member) )
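The three-line outline above can be turned into a runnable sketch. Here prop is a hypothetical stand-in for blah | blah2, and the member list is wrapped in braces. Without a delimiter that must be consumed first, struct would be left-recursive, because it could start by matching struct again without consuming any input. The order of the steps is what matters:

1. Declare struct.
2. Use it inside member.
3. Complete it with <<=.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, '{}')
word = pp.Word(pp.alphanums + '_.')

# 1. Declare struct before it is defined.
struct = pp.Forward()
# Hypothetical leaf member standing in for "blah | blah2".
prop = pp.Group(pp.Keyword('prop') + word)
# 2. Members may include the (still undefined) struct itself.
member = prop | struct
# 3. Complete struct as a braced list of members.
struct <<= pp.Group(LBRACE + pp.ZeroOrMore(member) + RBRACE)

result = struct.parseString('{ prop a { prop b { prop c } } prop d }').asList()
print(result)
# [[['prop', 'a'], [['prop', 'b'], [['prop', 'c']]], ['prop', 'd']]]
```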
This is also discussed in the question "Pyparsing: Parsing semi-JSON nested plaintext data to a list".
My original question described test data and a grammar that were not specific enough, so the grammar matched input it should have rejected. @NorthCat correctly spotted the undesired matches in the grammar. However, defining many negative lookaheads seemed unmanageable.
Instead of defining what should not match, my solution explicitly listed the possible matches: the member keywords, using oneOf('list of words separated by spaces'). Once I had specified all the possible matches, I realized my structure was not really nested. It has a fixed depth, and a different grammar describes each depth. So my member definitions did not need the Forward declaration trick.
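The fixed-depth approach can be sketched as follows. Each level lists its allowed member keywords with oneOf, so texture_unit can never be taken for a property, and no Forward is needed. Several details are assumptions made only for this illustration: the keyword lists, the Keyword-based block headers, and the rule that each property has exactly one value. With several values per line, the value list also needs a line terminator.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, '{}')
value = pp.Word(pp.alphanums + '_.')
opt_name = pp.Optional(pp.Word(pp.alphanums + '_')).suppress()

# Hypothetical keyword lists: each depth has its own member names, and
# 'texture_unit' is in neither list, so it can never be read as a property.
texture_prop = pp.Group(pp.oneOf('texture prop_name1 prop_name2') + value)
pass_prop = pp.Group(pp.oneOf('prop_name1 prop_name2') + value)

# Fixed depth: a texture_unit holds only properties; a pass holds
# properties followed by texture_units. No Forward is needed.
texture_unit = pp.Group(pp.Keyword('texture_unit') + opt_name
                        + LBRACE + pp.ZeroOrMore(texture_prop) + RBRACE)
pass_ = pp.Group(pp.Keyword('pass') + opt_name + LBRACE
                 + pp.ZeroOrMore(pass_prop) + pp.ZeroOrMore(texture_unit) + RBRACE)

text = '''pass optionalName
{
prop_name1 prop_val1
texture_unit optionalName
{
texture required_val.file.name
prop_name2 prop_val2
}
}'''

result = pass_.parseString(text, parseAll=True).asList()
print(result)
# [['pass', ['prop_name1', 'prop_val1'],
#   ['texture_unit', ['texture', 'required_val.file.name'], ['prop_name2', 'prop_val2']]]]
```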
My member definitions also terminate differently than in the C struct example. Instead of ending with a ';' (semicolon) as in C++, each member definition ends at the end of its line. In pyparsing you can match the end of a line with the LineEnd parser. So I defined each member's values as a list that stops at the LineEnd. Notice the use of the Not (~) operator in the last definition:
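from pyparsing import (Combine, Group, LineEnd, OneOrMore, Optional,
                       Word, alphanums, alphas, nums, oneOf)
EOL = LineEnd().suppress()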
ident = Word( alphas+"_", alphanums+"_$@#." )
integer = Word(nums)
real = Combine(Optional(oneOf('+ -')) + Word(nums) + '.' + Optional(Word(nums)))
propVal = real | integer | ident
propList = Group(OneOrMore(~EOL + propVal))
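The definitions above can be tried end to end. The member keyword list passed to oneOf is hypothetical, and the sample text is adapted from the question. The ~EOL lookahead works because LineEnd leaves '\n' out of the whitespace it skips. It therefore sees the newline that ordinary tokens such as propVal skip over, so OneOrMore stops collecting values at the end of each line.

```python
from pyparsing import (Combine, Group, LineEnd, OneOrMore, Optional,
                       Word, ZeroOrMore, alphanums, alphas, nums, oneOf)

# Definitions from the answer above.
EOL = LineEnd().suppress()
ident = Word(alphas + "_", alphanums + "_$@#.")
integer = Word(nums)
real = Combine(Optional(oneOf('+ -')) + Word(nums) + '.' + Optional(Word(nums)))
propVal = real | integer | ident
# ~EOL: only take another value if the line has not ended yet.
propList = Group(OneOrMore(~EOL + propVal))

# Hypothetical member keywords for this illustration.
member = Group(oneOf('texture prop_name3 prop_name4') + propList)
members = ZeroOrMore(member)

text = '''texture required_val.file.name optional_val
prop_name3 prop_val1 prop_val2
prop_name4 1.5 42'''

result = members.parseString(text, parseAll=True).asList()
print(result)
# [['texture', ['required_val.file.name', 'optional_val']],
#  ['prop_name3', ['prop_val1', 'prop_val2']],
#  ['prop_name4', ['1.5', '42']]]
```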