How to extract relationships from text in NLTK

2018-06-03 05:59:20

Hello, I am trying to extract relationships from a string of text, based on the second-to-last example here: https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html

From a string such as "Michael James editor of Publishers Weekly", the result I want is output like the following:

[PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']

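What is the best way to do this? What format does extract_rels expect, and how do I format my input to meet that requirement?

I tried to do it myself, but it did not work. Here is the code I adapted from the book. Nothing gets printed. What am I doing wrong?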

class doc():
    pass

doc.headline = ['this is expected by nltk.sem.extract_rels but not used in this script']

def findrelations(text):
    roles = """
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)
    tokenizedsentences = nltk.sent_tokenize(text)
    for sentence in tokenizedsentences:
        taggedwords  = nltk.pos_tag(nltk.word_tokenize(sentence))
        doc.text = nltk.batch_ne_chunk(taggedwords)
        print doc.text
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ieer', pattern=ROLES):
            print relextract.show_raw_rtuple(rel) # doctest: +ELLIPSIS

text = "Michael James editor of Publishers Weekly"

findrelations(text)

Here is a version based on your code (with only minor tweaks) that works fine ;)

import re

import nltk
from nltk.sem import relextract


def findrelations(text):
    # Verbose regex matched against the words between the two entities
    roles = r"""
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)

    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences)

    for doc in chunked_sentences:
        print(doc)
        # Each doc is a plain chunk tree, so pass corpus='ace' rather than 'ieer'
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ace', pattern=ROLES):
            # rel is a dict of relation fields; post-process it to get the output you want
            # (older NLTK releases called this helper show_raw_rtuple)
            print(relextract.rtuple(rel))

findrelations('Michael James editor of Publishers Weekly')

(S (PERSON Michael/NNP) (PERSON James/NNP) editor/NN of/IN (ORGANIZATION Publishers/NNS Weekly/NNP))
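To get closer to the format asked for in the question, the tagged strings in each relation can be post-processed. The following is only a minimal sketch, assuming each relation is a dict with the keys `subjclass`, `subjtext`, `filler`, `objclass` and `objtext` (which is what `extract_rels` returns as far as I can tell). The names `strip_tags`, `format_relation` and `ABBREVIATIONS` are hypothetical helpers, and the sample dict is written by hand to mirror the tree above, so it needs no NLTK models to run.

```python
# Short labels for the entity classes seen in this thread (an assumption;
# extend the mapping if you extract other classes).
ABBREVIATIONS = {"PERSON": "PER", "ORGANIZATION": "ORG"}


def strip_tags(tagged_text):
    """Drop the trailing '/TAG' from each whitespace-separated token."""
    return " ".join(token.rsplit("/", 1)[0] for token in tagged_text.split())


def format_relation(rel):
    """Render a relation dict as [CLASS: 'subject'] 'filler' [CLASS: 'object']."""
    subj_class = ABBREVIATIONS.get(rel["subjclass"], rel["subjclass"])
    obj_class = ABBREVIATIONS.get(rel["objclass"], rel["objclass"])
    return "[%s: %r] %r [%s: %r]" % (
        subj_class,
        strip_tags(rel["subjtext"]),
        strip_tags(rel["filler"]),
        obj_class,
        strip_tags(rel["objtext"]),
    )


# Hand-written sample (hypothetical data, not real extract_rels output)
sample = {
    "subjclass": "PERSON",
    "subjtext": "James/NNP",
    "filler": "editor/NN of/IN",
    "objclass": "ORGANIZATION",
    "objtext": "Publishers/NNS Weekly/NNP",
}
print(format_relation(sample))
```

Note that the subject in the sample is only "James": in the tree above the chunker put "Michael" and "James" into two separate PERSON chunks, so merging adjacent chunks of the same type would be an extra step if you want the full name.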
