Get the most probable color from a set of words

2018-06-05 18:19:12

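Is there an existing library that can work out the most probable color for a set of words? For example, given cucumber, apple, and grass, it would give me green. Has anyone worked in this direction before?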

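If I had to do this myself, I would search Google Images (or another image search) for each word and identify the most common color among the top results.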
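A minimal sketch of the "most common color" step, assuming the images have already been downloaded and decoded into lists of RGB pixels. The reference RGB values in `NAMED_COLORS` and the helper names are illustrative assumptions, not part of any library:

```python
from collections import Counter

# Assumed reference RGB values for a handful of color names.
NAMED_COLORS = {
    "red": (255, 0, 0),
    "orange": (255, 165, 0),
    "yellow": (255, 255, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "purple": (128, 0, 128),
}

def nearest_color_name(pixel):
    """Return the named color closest to an (r, g, b) pixel in squared RGB distance."""
    return min(
        NAMED_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, NAMED_COLORS[name])),
    )

def dominant_color(pixels):
    """Return the color name that the largest number of pixels map to."""
    counts = Counter(nearest_color_name(p) for p in pixels)
    return counts.most_common(1)[0][0]

# Hypothetical pixels sampled from image results for "grass".
pixels = [(30, 140, 40), (20, 120, 30), (250, 240, 20), (10, 100, 20)]
print(dominant_color(pixels))
```

Plain RGB distance is a crude measure of perceived color, so a real implementation would probably compare colors in a perceptual color space instead.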

This sounds like a perfectly reasonable NLP problem, and one that is easy to handle with map-reduce.

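Decide on the list of words and phrases that you call colors: ['blue', 'green', 'red', ...]. Then, over a large corpus of sentences, for each sentence that mentions a particular color, write a (word, color_name) pair to a file for every other word in that sentence. (Map step)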

Then, for every word you have seen in your corpus, aggregate all the colors seen with it into something like {'cucumber': {'green': 300, 'yellow': 34, 'blue': 2}, 'tomato': {'red': 900, 'green': 430}, ...}. (Reduce step)
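The map and reduce steps can be sketched in a single process as follows. This is only an illustration under assumptions: the `COLORS` vocabulary, the regex tokenizer, and the tiny `corpus` are made up, and a real job would distribute the two steps over a much larger corpus.

```python
import re
from collections import Counter, defaultdict

# Assumed color vocabulary; extend as needed.
COLORS = {"blue", "green", "red", "yellow", "orange", "purple"}

def map_sentence(sentence):
    """Map step: yield (word, color_name) for every non-color word
    in a sentence that mentions a color."""
    words = re.findall(r"[a-z]+", sentence.lower())
    colors_seen = [w for w in words if w in COLORS]
    for color in colors_seen:
        for word in words:
            if word not in COLORS:
                yield word, color

def reduce_pairs(pairs):
    """Reduce step: aggregate pairs into {word: Counter({color: count})}."""
    table = defaultdict(Counter)
    for word, color in pairs:
        table[word][color] += 1
    return table

def build_color_table(sentences):
    """Run the map step over every sentence and reduce the emitted pairs."""
    pairs = (pair for s in sentences for pair in map_sentence(s))
    return reduce_pairs(pairs)

# Toy corpus, purely illustrative.
corpus = [
    "The cucumber is green.",
    "A ripe tomato is red.",
    "She sliced a green cucumber.",
    "An unripe tomato is green.",
]
table = build_color_table(corpus)
print(dict(table["cucumber"]))
print(dict(table["tomato"]))
```

Note that stop words such as "the" and "is" also pick up color counts here, so they would need to be filtered out before the table is useful.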

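Provided you use a large enough corpus (something like Wikipedia) and work out how to prune the really small counts and rare words, you should be able to build a fairly comprehensive and robust dictionary that maps millions of items to their colors.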
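One possible way to do the pruning and then answer the original question for a whole set of words is sketched below. The thresholds `min_total` and `min_share`, the function names, and the counts for "apple", "grass", and "zyzzyva" are assumptions made up for illustration; the scoring rule (summing each word's normalized color distribution) is just one reasonable choice.

```python
from collections import Counter

def prune(table, min_total=5, min_share=0.05):
    """Drop words seen fewer than min_total times, and colors that make up
    less than min_share of a word's total count."""
    pruned = {}
    for word, counts in table.items():
        total = sum(counts.values())
        if total < min_total:
            continue
        kept = {c: n for c, n in counts.items() if n / total >= min_share}
        if kept:
            pruned[word] = kept
    return pruned

def most_probable_color(words, table):
    """Sum each word's normalized color distribution and return the top color,
    or None if none of the words are in the table."""
    scores = Counter()
    for word in words:
        counts = table.get(word)
        if not counts:
            continue
        total = sum(counts.values())
        for color, n in counts.items():
            scores[color] += n / total
    return scores.most_common(1)[0][0] if scores else None

# Illustrative counts; only the cucumber and tomato rows come from the answer.
table = {
    "cucumber": {"green": 300, "yellow": 34, "blue": 2},
    "tomato": {"red": 900, "green": 430},
    "apple": {"red": 500, "green": 450},
    "grass": {"green": 800, "yellow": 40},
    "zyzzyva": {"red": 1},
}
clean = prune(table)
print(most_probable_color(["cucumber", "apple", "grass"], clean))
```

Normalizing per word keeps one very frequent word from dominating the vote for the whole set.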

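Another approach is to run a text search on Google for each combination of color and word, and take the combination with the most results. Here is a quick Python script: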

import json
import urllib.parse
import urllib.request

def google_count(q):
    """Return Google's estimated number of results for the query q."""
    # Note: this endpoint is the old Google AJAX Search API, which has been
    # deprecated, so the request may fail today.
    query = urllib.parse.urlencode({'q': q})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    with urllib.request.urlopen(url) as search_response:
        search_results = search_response.read()
    results = json.loads(search_results)
    data = results['responseData']
    return int(data['cursor']['estimatedResultCount'])

colors = ['yellow', 'orange', 'red', 'purple', 'blue', 'green']

# get a list of Google search counts, one per color
res = [google_count('"%s grass"' % c) for c in colors]
# pair the results with their corresponding colors
res2 = list(zip(res, colors))
# get the color with the highest count
print("%s is %s" % ('grass', max(res2)[1]))

This will print:

grass is green
