Convert words between verb/noun/adjective forms

2018-06-26 06:22:04

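I would like a Python library function that translates/converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns derived from the verb "to code"; one is the subject, the other the object).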

# :: String => List of String
print(verbify('writer'))     # => ['write']
print(nounize('written'))    # => ['writer']
print(adjectivate('write'))  # => ['written']

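I mostly care about verbs <=> nouns, because I want to write a note-taking program. That is, I could write either "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP the program could figure out that they mean the same thing. (I know that is not easy, and that it needs NLP that parses rather than just tags, but I want to hack together a prototype.)

Similar question: Converting adjectives and adverbs to their noun forms (that answer only stems down to the root POS; I want to go between POS).

P.S. In linguistics this is called conversion: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29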

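This is more of a heuristic approach. I have only just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I have tested, it works pretty well: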

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    # (in NLTK 3, lemmas(), pos(), synset() and name() are methods)
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas() if s.pos() == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # Filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset().pos() == 'n']

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result as a list of tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # Return all the possibilities sorted by probability
    return result
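Since verbify is only guessed at above, here is a minimal sketch of how the mirror-image function might look. It assumes NLTK 3 with the WordNet corpus already downloaded (e.g. via nltk.download('wordnet')). The helper name rank_by_frequency is made up for this example and is not part of NLTK; it isolates the same "relative frequency" scoring that nounify uses.

```python
from collections import Counter


def rank_by_frequency(words):
    """Return (word, relative frequency) pairs, most frequent first."""
    counts = Counter(words)
    total = len(words)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(((w, c / total) for w, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def verbify(noun_word):
    """Mirror of nounify: collect verbs derivationally related to a noun."""
    # Imported here so rank_by_frequency stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    related_verbs = [
        related.name()
        for synset in wn.synsets(noun_word, pos="n")
        for lemma in synset.lemmas()
        for related in lemma.derivationally_related_forms()
        if related.synset().pos() == "v"
    ]
    return rank_by_frequency(related_verbs)
```

The "probability" here is just how often a candidate shows up among the related lemmas, so it is a rough ranking signal rather than a real likelihood. The exact candidates returned depend on the WordNet version installed.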

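I understand that this doesn't answer your whole question, but it does answer a large part of it. I would check out http://nodebox.net/code/index.php/Linguistics#verb_conjugation. This Python library can conjugate verbs and recognize whether a word is a verb, noun, or adjective.

Example code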

print(en.verb.present("gave"))                          # => give
print(en.verb.present("gave", person=3, negate=False))  # => gives

It can also categorize words.

print(en.is_noun("banana"))  # => True

The download is at the top of the linked page.

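One approach may be to use a dictionary of words with their POS tags and a word-forms mapping. If you can get or create such a dictionary (which is quite possible if you have access to the data of any conventional dictionary, since all dictionaries list a word's POS tags as well as the base forms of all derived forms), you can use something like the following: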

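# pos_tags() and word_form() are assumed lookups into the dictionary
# described above; they are not defined here.
VERB_TAGS = ('VB', 'VBP', 'VBZ', 'VBD', 'VBN')

def is_verb(word):
    if not word:
        return False
    tags = pos_tags(word)
    return any(tag in tags for tag in VERB_TAGS)

def verbify(word):
    # Always return a list, matching "String => List of String"
    if is_verb(word):
        return [word]
    forms = []
    for tag in pos_tags(word):
        base = word_form(word, tag[:2])
        if is_verb(base):
            forms.append(base)
    return forms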
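To make the dictionary idea concrete, here is a small self-contained sketch. The two tables are toy data invented for illustration, and pos_tags and word_form are hypothetical helpers standing in for lookups against a real dictionary; none of this comes from an existing library.

```python
# Toy stand-in for a real dictionary: word -> Penn Treebank style POS tags.
# Every entry is made up for illustration.
POS_TAGS = {
    "write": ["VB", "VBP"],
    "writer": ["NN"],
    "antagonize": ["VB", "VBP"],
    "antagonist": ["NN"],
    "banana": ["NN"],
}

# Toy word-forms mapping: (word, two-letter tag prefix) -> base form.
BASE_FORMS = {
    ("writer", "NN"): "write",
    ("antagonist", "NN"): "antagonize",
}

VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBN"}


def pos_tags(word):
    """Hypothetical lookup: all POS tags recorded for a word."""
    return POS_TAGS.get(word, [])


def word_form(word, tag_prefix):
    """Hypothetical lookup: base form of a word for a tag prefix, or None."""
    return BASE_FORMS.get((word, tag_prefix))


def is_verb(word):
    if not word:
        return False
    return any(tag in VERB_TAGS for tag in pos_tags(word))


def verbify(word):
    """Return the verb forms related to a word as a list (possibly empty)."""
    if is_verb(word):
        return [word]
    forms = []
    for tag in pos_tags(word):
        base = word_form(word, tag[:2])
        if is_verb(base) and base not in forms:
            forms.append(base)
    return forms


print(verbify("antagonist"))  # => ['antagonize']
print(verbify("banana"))      # => []
```

With real data the two tables would be far larger, but the lookup logic would stay the same. Words missing from the dictionary simply yield an empty list.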
