Comparing sentences according to their meaning
Python provides the NLTK library, a vast resource of texts and corpora along with a slew of text mining and processing methods. Is there any way to compare sentences based on the meaning they convey and find a possible match? That is, an intelligent sentence matcher?
For example, take a sentence like "giggling at bad jokes" and another like "I like to laugh myself silly at poor jokes". Both convey the same meaning, but the sentences don't remotely match (the words are different, so Levenshtein distance would fail badly!).
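To make the point about edit distance concrete, here is a minimal pure-Python sketch of Levenshtein distance applied to the two example sentences. The helper names (`levenshtein`, `normalized_similarity`) and the normalization by the longer string's length are assumptions made for illustration, not part of NLTK.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute), via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings (illustrative normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


s1 = "giggling at bad jokes"
s2 = "I like to laugh myself silly at poor jokes"
print(levenshtein(s1, s2), normalized_similarity(s1, s2))
```

Because the second sentence is twice as long as the first and shares few characters in order, the normalized score stays below 0.5 even though the meaning is the same.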
Now imagine we have an API which exposes functionality such as found here. Based on that, we have mechanisms to find out that the words "giggle" and "laugh" do match in the meaning they convey. "Bad" won't match up to "poor", so we may need to add further layers (for example, that they match in the context of a word like "joke", since a "bad joke" is generally the same as a "poor joke", although a "bad person" is not the same as a "poor person"!).
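The context-dependent layer described above could be sketched as follows. The `SYNONYMS` table is a hypothetical, hand-written stand-in for whatever a real thesaurus API would return, and `words_match` is an illustrative helper, not an existing library function.

```python
# Hypothetical lookup table standing in for a real thesaurus API.
# Each word pair maps to the set of context words in which the pair is
# interchangeable; None means the pair matches regardless of context.
SYNONYMS = {
    frozenset({"giggle", "laugh"}): None,
    frozenset({"bad", "poor"}): {"joke"},
}


def words_match(w1, w2, context=None):
    """Return True if w1 and w2 convey the same meaning in the given context."""
    if w1 == w2:
        return True
    key = frozenset({w1, w2})
    if key not in SYNONYMS:
        return False
    allowed_contexts = SYNONYMS[key]
    return allowed_contexts is None or context in allowed_contexts


print(words_match("giggle", "laugh"))         # unconditional synonym pair
print(words_match("bad", "poor", "joke"))     # matches only next to "joke"
print(words_match("bad", "poor", "person"))   # different meaning here
```

A hand-maintained table like this obviously does not scale; it only illustrates why a plain synonym lookup is not enough and a context layer is needed.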
A major challenge would be to discard material that doesn't much alter the meaning of the sentence. So the algorithm should return the same degree of match between the first sentence and this one: "I like to laugh myself silly at poor jokes, even though they are completely senseless, full of crap and serious chances of heart-attack!"
So, with that available, has any algorithm like this been conceived yet? Or do I have to reinvent the wheel?
You will need a more advanced topic modeling algorithm, and of course some corpora to train your model on, so that it can easily handle synonyms like giggle and laugh!
In Python, you can try this package: http://radimrehurek.com/gensim/ I have never used it, but it includes classic semantic vector space methods such as LSA/LSI, random projection and even LDA.
My personal favourite is random projection, because it is faster and still very efficient (I'm doing it in Java with another library, though).
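For readers unfamiliar with random projection, here is a rough pure-Python sketch of the idea: bag-of-words vectors are multiplied by a random Gaussian matrix to get short dense vectors, which are then compared by cosine similarity. It deliberately does not use gensim, and every name in it (`bag_of_words`, `random_projection_matrix`, `project`, `cosine`, the toy `docs`) is an assumption for illustration. It shows only the dimensionality-reduction step; on its own it does not learn that giggle and laugh are related, which is what training on a corpus is for.

```python
import math
import random


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the token list."""
    return [tokens.count(term) for term in vocabulary]


def random_projection_matrix(n_terms, n_dims, seed=0):
    """Gaussian random matrix of shape (n_dims, n_terms), scaled by 1/sqrt(n_dims)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    scale = 1.0 / math.sqrt(n_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_terms)]
            for _ in range(n_dims)]


def project(vector, matrix):
    """Multiply the projection matrix by a term-count vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy documents, already tokenized (illustrative data only).
docs = [
    "i like to laugh at poor jokes".split(),
    "i like to giggle at bad jokes".split(),
    "the stock market fell sharply today".split(),
]
vocabulary = sorted({term for doc in docs for term in doc})
matrix = random_projection_matrix(len(vocabulary), n_dims=8)
reduced = [project(bag_of_words(doc, vocabulary), matrix) for doc in docs]

print(cosine(reduced[0], reduced[1]), cosine(reduced[0], reduced[2]))
```

The appeal of the method is that the projection matrix needs no training at all, which is where the speed advantage over LSA/LSI comes from.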