Chunk grammar doesn't read commas

2018-07-02 21:54:59

from nltk.chunk.util import tagstr2tree
from nltk import word_tokenize, pos_tag
text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
tagged_text = pos_tag(text.split())

grammar = "NP:{<NNP>+}"

cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_text)

print(result)

Output:

(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP ,Nike/NNP ,Reebok/NNP Center./NNP))

The grammar i use for chunking only works on nnp tags but if words are sequential with commas they will still on the same line.I want my chunk like this:

(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas,/NNP)
  (NP Nike,/NNP)
  (NP Reebok/NNP Center./NNP))

What should i write in the "grammar=" or can i edit the output like i wrote above?As you can see i only parse proper nouns for my named entity project pls help me out.

使用word_tokenize(string)而不是string.split() ：

>>> import nltk
>>> from nltk.chunk.util import tagstr2tree
>>> from nltk import word_tokenize, pos_tag
>>> text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> 
>>> grammar = "NP:{<NNP>+}"
>>> 
>>> cp = nltk.RegexpParser(grammar)
>>> result = cp.parse(tagged_text)
>>> 
>>> print(result)
(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin/NNP)
  ./.
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP)
  ,/,
  (NP Nike/NNP)
  ,/,
  (NP Reebok/NNP Center/NNP)
  ./.)

链接地址: http://www.djcxy.com/p/91724.html

上一篇: 如何提取数字（连同比较形容词或范围）

下一篇: 块语法不读取逗号