How to generate multiple parse trees for an ambiguous sentence in NLTK?

I have the following code in Python.

import nltk

sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"), ("cream", "NN"), ("van", "NN")]
patterns = r"""
  NP: {<ADJ>*<NN>+}
"""
NPChunker = nltk.RegexpParser(patterns)  # create chunk parser
for s in NPChunker.nbest_parse(sent):
    print(s)  # print each parse tree in bracketed form

The output is:

(S (NP very/ADJ colourful/ADJ ice/NN cream/NN van/NN))

But the output should also include two more parse trees:

(S (NP very/ADJ colourful/ADJ ice/NN) (NP cream/NN) (NP van/NN))
(S (NP very/ADJ colourful/ADJ ice/NN cream/NN) van/NN)

The problem is that RegexpParser only returns the first match of the regular expression. How can I generate all possible parse trees at once?


This is not possible with the RegexpParser class. It inherits the nbest_parse method from the ParserI interface, and the source code (https://github.com/nltk/nltk/blob/master/nltk/parse/api.py) shows that the default implementation simply calls the parser's own parse method and returns that single result as an iterable.
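To illustrate why this always gives one tree, here is a toy sketch of that delegation pattern. It is not NLTK's source: ToyParserBase and ToyChunker are hypothetical names made up for this example, and the tuple used as a "tree" is just a placeholder.

```python
class ToyParserBase(object):
    """Hypothetical stand-in for a parser interface (not NLTK's code)."""

    def parse(self, sent):
        raise NotImplementedError

    def nbest_parse(self, sent, n=None):
        # Default behaviour: delegate to parse() and wrap its single result.
        tree = self.parse(sent)
        return [tree] if tree is not None else []


class ToyChunker(ToyParserBase):
    """Hypothetical chunker that only ever builds one greedy analysis."""

    def parse(self, sent):
        # Placeholder "tree": one NP chunk covering the whole sentence.
        return ("S", [("NP", list(sent))])


results = ToyChunker().nbest_parse([("ice", "NN"), ("cream", "NN")])
print(len(results))
```

However ambiguous the input is, a parser built this way can never return more than the one analysis its parse method produces.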

As someone tried to explain in "Chunking with nltk", the chunking classes are not the right tool for this purpose (yet!). Have a look at http://nltk.org/book/ch08.html: it has some quick examples, but they will only take you halfway to what you want to achieve, and the rest would require a lot of pre-processing and smart design.
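If you only need every possible chunking for a single pattern, one option is to enumerate them by hand in plain Python instead of going through NLTK. The following is a minimal sketch under that assumption: is_np, all_chunkings and to_bracketed are hypothetical helper names, not NLTK functions, and the pattern is hard-coded to the <ADJ>*<NN>+ rule from the question.

```python
import re

# Same shape as the question's chunk rule NP: {<ADJ>*<NN>+},
# applied to a string of bracketed tags such as "<ADJ><NN>".
NP_PATTERN = re.compile(r"(<ADJ>)*(<NN>)+")


def is_np(tagged):
    """Return True if the tags of `tagged` match <ADJ>*<NN>+."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    return NP_PATTERN.fullmatch(tag_string) is not None


def all_chunkings(tagged, start=0):
    """Yield every split of tagged[start:] into NP chunks and loose tokens.

    Each result is a list whose items are either a (word, tag) tuple
    (an unchunked token) or a list of such tuples (one NP chunk).
    """
    if start == len(tagged):
        yield []
        return
    # Option 1: leave the current token outside any chunk.
    for rest in all_chunkings(tagged, start + 1):
        yield [tagged[start]] + rest
    # Option 2: open an NP chunk here and try every possible end position.
    for end in range(start + 1, len(tagged) + 1):
        span = tagged[start:end]
        if is_np(span):
            for rest in all_chunkings(tagged, end):
                yield [span] + rest


def to_bracketed(chunking):
    """Render one chunking in the bracketed style used in the question."""
    parts = []
    for item in chunking:
        if isinstance(item, list):  # an NP chunk
            parts.append("(NP %s)" % " ".join("%s/%s" % wt for wt in item))
        else:  # a loose (word, tag) token
            parts.append("%s/%s" % item)
    return "(S %s)" % " ".join(parts)


sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"),
        ("cream", "NN"), ("van", "NN")]
trees = [to_bracketed(c) for c in all_chunkings(sent)]
for t in trees:
    print(t)
```

For the sentence in the question this produces 29 chunkings, including the three listed there and the one with no chunks at all, so in practice you would filter or rank the results. The number grows quickly with sentence length, which is part of the "smart design" mentioned above.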
