Convert words between verb/noun/adjective forms
I would like a Python library function that converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code"; one is the subject, the other the object).
# :: String => List of String
print(verbify('writer'))     # => ['write']
print(nounize('written'))    # => ['writer']
print(adjectivate('write'))  # => ['written']
I mostly care about verbs <=> nouns, for a note-taking program I want to write. That is, I can write "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP it can figure out that they mean the same thing. (I know that's not easy, and that it will take NLP that parses and doesn't just tag, but I want to hack up a prototype.)
Similar questions: "Converting adjectives and adverbs to their noun forms" (that answer only stems down to the root POS; I want to go between POS).
PS: this is called conversion in linguistics: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29
This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well:
from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """Transform a verb to the closest noun: die -> death"""
    # NLTK 3.x: lemmas(), name() and synset() are methods, not attributes
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas() if s.name().split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # Filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset().name().split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result as a list of tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # Return all the possibilities sorted by probability
    return result
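
Since verbify should work analogously, here is a minimal sketch of what it might look like. This is an assumption on my part, not tested against the full WordNet data: it assumes Python 3 and NLTK 3.x with the WordNet corpus downloaded. The helper name rank_by_frequency is made up for illustration, and the "probability" is just the relative frequency of each word among the related forms, as in nounify.

```python
from collections import Counter


def rank_by_frequency(words):
    """Return (word, relative frequency) pairs, most frequent first."""
    total = len(words)
    # Counter.most_common() yields nothing for an empty list,
    # so there is no division by zero when no related forms are found.
    return [(word, count / total) for word, count in Counter(words).most_common()]


def verbify(noun_word):
    """Map a noun to candidate verbs via WordNet, e.g. death -> die."""
    # Imported lazily so rank_by_frequency() works without NLTK installed.
    # Assumes NLTK 3.x with the WordNet corpus available.
    from nltk.corpus import wordnet as wn

    related_verbs = [
        related.name()
        for synset in wn.synsets(noun_word, pos=wn.NOUN)
        for lemma in synset.lemmas()
        for related in lemma.derivationally_related_forms()
        if related.synset().pos() == wn.VERB
    ]
    return rank_by_frequency(related_verbs)
```

Like nounify, this returns an empty list when the word is not found or has no derivationally related verbs.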
I understand that this doesn't answer your whole question, but it does answer a large part of it. I would check out http://nodebox.net/code/index.php/Linguistics#verb_conjugation. This Python library can conjugate verbs and recognize whether a word is a verb, noun, or adjective.
Example code:
print(en.verb.present("gave"))                           # => give
print(en.verb.present("gave", person=3, negate=False))   # => gives
It can also categorize words.
print(en.is_noun("banana"))  # => True
The download link is at the top of that page.
One approach may be to use a dictionary of words with their POS tags plus a word-forms mapping. If you can get or create such a dictionary (quite possible if you have access to any conventional dictionary's data, since dictionaries list each word's POS tags as well as the base forms of all derived forms), you can use something like the following:
# pos_tags() and word_form() are placeholders for lookups into your dictionary

def is_verb(word):
    if not word:
        return False
    tags = pos_tags(word)
    return any(tag in tags for tag in ('VB', 'VBP', 'VBZ', 'VBD', 'VBN'))

def verbify(word):
    # Always return a list, even when the word is already a verb
    if is_verb(word):
        return [word]
    forms = []
    for tag in pos_tags(word):
        base = word_form(word, tag[:2])
        if is_verb(base):
            forms.append(base)
    return forms
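
To make the dictionary approach concrete, here is a small self-contained sketch in which pos_tags() and word_form() are backed by plain dicts. The data in POS_TAGS and BASE_FORMS is a hand-made toy lexicon that I made up for illustration; a real version would load these mappings from an actual dictionary resource.

```python
# Toy data standing in for a real dictionary (hypothetical, illustration only).
POS_TAGS = {
    "write": {"VB", "VBP"},
    "written": {"VBN"},
    "writer": {"NN"},
    "antagonize": {"VB", "VBP"},
    "antagonist": {"NN"},
}
# (derived word, two-letter POS prefix) -> base form it is derived from
BASE_FORMS = {
    ("writer", "NN"): "write",
    ("antagonist", "NN"): "antagonize",
}
VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBN"}


def pos_tags(word):
    """Return the set of POS tags for a word (empty if unknown)."""
    return POS_TAGS.get(word, set())


def word_form(word, tag_prefix):
    """Return the base form of a derived word, or None if unknown."""
    return BASE_FORMS.get((word, tag_prefix))


def is_verb(word):
    """True if the word carries at least one verb tag."""
    return bool(word) and bool(VERB_TAGS & pos_tags(word))


def verbify(word):
    """Return the word itself if it is a verb, else its verb base forms."""
    if is_verb(word):
        return [word]
    forms = []
    for tag in sorted(pos_tags(word)):
        base = word_form(word, tag[:2])
        if is_verb(base):
            forms.append(base)
    return forms


print(verbify("writer"))      # => ['write']
print(verbify("antagonist"))  # => ['antagonize']
print(verbify("banana"))      # => []
```

Note that an inflected verb such as "written" comes back unchanged (['written']), because it already carries a verb tag; reducing it to "write" would need a separate lemmatization step.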