Convert words between verb/noun/adjective forms

i would like a python library function that translates/converts across different parts of speech. sometimes it should output multiple words (eg "coder" and "code" are both nouns from the verb "to code", one's the subject the other's the object)

# :: String => List of String
print(verbify('writer'))     # => ['write']
print(nounize('written'))    # => ['writer']
print(adjectivate('write'))  # => ['written']

i mostly care about verbs <=> nouns, for a note taking program i want to write. ie i can write "caffeine antagonizes A1" or "caffeine is an A1 antagonist" and with some NLP it can figure out they mean the same thing. (i know that's not easy, and that it will take NLP that parses and doesn't just tag, but i want to hack up a prototype).

similar question: "Converting adjectives and adverbs to their noun forms" (that answer only stems down to the root POS; i want to go between POS.)

ps: this is called Conversion in linguistics: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29


This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    # (in NLTK 3, lemmas(), pos(), synset() and name() are methods, not attributes)
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas() if s.pos() == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # Filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset().pos() == 'n']

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w))/len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result
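
Since the same WordNet lookup should work in the other direction, here is a minimal sketch that generalizes nounify to any pair of parts of speech. The name convert_pos and its wordnet parameter are assumptions made for illustration, not part of NLTK; the parameter only exists so the function can be tried with a stand-in object, and by default it uses nltk.corpus.wordnet.

```python
from collections import Counter


def convert_pos(word, from_pos, to_pos, wordnet=None):
    """Return [(candidate, relative_frequency)] for `word` moved between POS.

    from_pos / to_pos are WordNet tags: 'n', 'v', 'a' or 'r'.
    convert_pos is a hypothetical helper, not an NLTK function.
    """
    if wordnet is None:
        # Imported lazily; requires NLTK with the WordNet corpus installed.
        from nltk.corpus import wordnet

    # WordNet tags satellite adjectives as 's', so accept them as adjectives.
    target_tags = {'a', 's'} if to_pos == 'a' else {to_pos}

    # Count every derivationally related lemma that has the target POS.
    counts = Counter(
        related.name()
        for synset in wordnet.synsets(word, pos=from_pos)
        for lemma in synset.lemmas()
        for related in lemma.derivationally_related_forms()
        if related.synset().pos() in target_tags
    )

    # Rank candidates by how often they were reached; empty if nothing matched.
    total = sum(counts.values())
    return [(candidate, n / total) for candidate, n in counts.most_common()]
```

With the corpus downloaded (nltk.download('wordnet')), convert_pos('die', 'v', 'n') plays the role of nounify('die'), and convert_pos('death', 'n', 'v') would be the analogous verbify. Like the original, the score is only a relative frequency among the candidates found, not a real probability.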

I understand that this doesn't answer your whole question, but it does answer a large part of it. I would check out http://nodebox.net/code/index.php/Linguistics#verb_conjugation. This Python library can conjugate verbs and recognize whether a word is a verb, noun, or adjective.

EXAMPLE CODE

import en  # the NodeBox Linguistics package

print(en.verb.present("gave"))                          # give
print(en.verb.present("gave", person=3, negate=False))  # gives

It can also categorize words.

print(en.is_noun("banana"))  # True

The download link is at the top of that page.


One approach may be to use a dictionary of words with their POS tags and a word-forms mapping. If you can get or create such a dictionary (quite possible if you have access to any conventional dictionary's data, since dictionaries list each word's POS tags as well as the base forms of all derived forms), you can use something like the following:

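# pos_tags(word) and word_form(word, tag) are lookups into your dictionary
# data; you have to supply them yourself.

def is_verb(word):
    # An empty or missing word is never a verb
    if not word:
        return False
    tags = pos_tags(word)
    return any(tag in tags for tag in ('VB', 'VBP', 'VBZ', 'VBD', 'VBN'))

def verbify(word):
    # Always return a list, even when the word is already a verb
    if is_verb(word):
        return [word]
    forms = []
    for tag in pos_tags(word):
        # Base form of the word for the coarse tag (e.g. 'NN', 'JJ')
        base = word_form(word, tag[:2])
        if is_verb(base):
            forms.append(base)
    return forms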
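
To make the dictionary idea concrete, here is a minimal self-contained sketch in which the two lookups, pos_tags and word_form, are backed by tiny in-memory tables. The tables POS_TAGS and BASE_FORMS and every entry in them are made up for illustration; in practice they would be loaded from real dictionary data.

```python
# Toy tables standing in for real dictionary data (illustrative entries only).
POS_TAGS = {
    "writer": {"NN"},
    "write": {"VB", "VBP"},
    "antagonist": {"NN"},
    "antagonize": {"VB", "VBP"},
}

# (derived word, coarse POS tag) -> base form listed by the dictionary
BASE_FORMS = {
    ("writer", "NN"): "write",
    ("antagonist", "NN"): "antagonize",
}

VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBN"}


def pos_tags(word):
    """All POS tags the dictionary lists for `word` (empty set if unknown)."""
    return POS_TAGS.get(word, set())


def word_form(word, coarse_tag):
    """Base form of `word` under the given coarse tag, or None if unlisted."""
    return BASE_FORMS.get((word, coarse_tag))


def is_verb(word):
    # None or '' is never a verb; otherwise check for any verb tag.
    return bool(word) and bool(VERB_TAGS & pos_tags(word))


def verbify(word):
    if is_verb(word):
        return [word]
    forms = []
    for tag in sorted(pos_tags(word)):
        base = word_form(word, tag[:2])
        # Skip duplicates, e.g. when 'NN' and 'NNS' map to the same base.
        if is_verb(base) and base not in forms:
            forms.append(base)
    return forms


print(verbify("writer"))      # ['write']
print(verbify("antagonist"))  # ['antagonize']
print(verbify("banana"))      # []
```

The quality of the result depends entirely on the coverage of the two tables, which is why this approach only pays off if you have access to a full dictionary's data.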