NLTK extracting terms of chunker parse tree

Sample sentences: "John Edward Grey started running now that he knows he is fat" and "She was listening to smack that by that awful singer". I want to extract interesting terms from a sentence. I currently use POS tagging to identify the grammatical type of each token, then add each token to a counter (with different weights for nouns, verbs and adjectives). I now wish to use a chunker for this. I think the leaf nodes of the parse tree contain all the interesting words and phrases. How do I extract terms from the chunker output? In linguistics, "interesting words" are called open class words. And what you are referring to ...

Combining a Tokenizer into a Grammar and Parser with NLTK

I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar. My goal is to build a grammar for a particular text corpus. (Initial question: Should I even try to start a grammar from scratch, or should I start with a predefined grammar? If I should start with another grammar, which is a good one to start with for English?) Suppose I have the following simple grammar: simple_grammar = nltk.parse_cfg(""" S -> NP VP PP -> P NP NP -> Det N | Det N PP VP -> V NP | VP PP ...

Comparing lists containing NaNs

I am trying to compare two different lists to see if they are equal, and was going to remove NaNs, only to discover that my list comparisons still work, despite NaN == NaN -> False. Could someone explain why the following evaluate to True or False, as I find this behavior unexpected? Thanks. I have read the following, which don't seem to resolve the issue: "Why in numpy nan == nan is False while nan in [nan] is True?" and "Why is NaN not equal to NaN? [duplicate]" (Python 2.7.3, numpy-1.9.2). I have marked the surprising evaluations with * at the end. >>> nan = np. ...
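A note on the behavior described above: Python's list equality (like the `in` operator) checks object identity before falling back to `==`, so two lists that hold the very same NaN object compare equal even though `nan == nan` is False. The sketch below illustrates this with plain floats rather than numpy, and `lists_equal_nan_aware` is a hypothetical helper name, not something from the question.

```python
import math

nan = float("nan")          # one NaN object, reused below
other_nan = float("nan")    # a second, distinct NaN object

def lists_equal_nan_aware(a, b):
    """Compare two lists of numbers, treating NaN as equal to NaN.

    Hypothetical helper for illustration; assumes numeric elements only.
    """
    if len(a) != len(b):
        return False
    return all(
        x == y or (math.isnan(x) and math.isnan(y))
        for x, y in zip(a, b)
    )

# Same NaN object on both sides: the identity shortcut makes this True.
same_object = [1.0, nan] == [1.0, nan]

# Different NaN objects: identity fails, == fails, so the lists differ.
distinct_objects = [1.0, nan] == [1.0, other_nan]

# Explicit NaN-aware comparison does not depend on object identity.
nan_aware = lists_equal_nan_aware([1.0, nan], [1.0, other_nan])
```

If NaNs should count as equal regardless of which object they are, an explicit check like the helper above is more predictable than relying on the identity shortcut.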

Python: sort function breaks in the presence of nan

sorted([2, float('nan'), 1]) returns [2, nan, 1] (at least on the ActiveState Python 3.1 implementation). I understand nan is a weird object, so I wouldn't be surprised if it showed up in random places in the sort result. But it also messes up the sort for the non-nan numbers in the container, which is really unexpected. I asked a related question about max, and based on that I understand why sort works like this. But should this be considered a bug? The documentation just says "Return a new sorted list [...]" without specifying any details. EDIT: ...
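Every ordering comparison involving NaN evaluates to False, so the sort has no consistent position for it and the surrounding values can end up out of order. One possible workaround, sketched here under the assumption that NaNs should simply go last, is a sort key that never compares NaN against a real number. The name `sort_nan_last` is hypothetical.

```python
import math

def sort_nan_last(values):
    """Sort numbers ascending, placing any NaN values at the end.

    The key is a tuple: the first element groups NaNs after real numbers,
    and the second replaces NaN with a harmless 0.0 so that NaN itself
    never takes part in a comparison.
    """
    return sorted(
        values,
        key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v),
    )

result = sort_nan_last([2, float("nan"), 1])
```

Filtering the NaNs out before sorting is an equally reasonable choice when they carry no meaning for the caller.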

Calling external c++ template functions within Cython

I have a number of C++ template functions declared and implemented in a C++ header file, and I want to access some of them from Cython. Suppose the C++ code is in header.hpp as follows: template <class T> T doublit(T& x) { return 2*x; } What do I need to write in the .pyx file and in the setup.py file so that I can use the function in Python as >>> import modname >>> print modname.doublit(3) 6 PS: Is it possible to access the same functions in PyPy? If so, how? Thanks for your help. But when I ...

Python NameError: global name 'assertEqual' is not defined

I'm following Learn Python the Hard Way and I'm on Exercise 47 - Automated Testing (http://learnpythonthehardway.org/book/ex47.html). I am using Python 3 (vs. the book's use of Python 2.x) and I realize that assert_equals (which is used in the book) is deprecated, so I am using assertEqual. I am trying to build a test case but for some reason, when running nosetests in cmd, I get the error: NameError: global name 'assertEqual' is not defined. The code is as follows: from nose.tools import * ...
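One likely reading of the error, assuming the test calls assertEqual as a bare function: assertEqual is a method of unittest.TestCase, so it is only reachable as self.assertEqual inside a TestCase subclass, while nose.tools exposes the module-level equivalent as assert_equal. The sketch below uses only the standard library. `Room` is a hypothetical stand-in for the class under test, since the code in the question is cut off.

```python
import unittest

class Room(object):
    """Hypothetical stand-in for the class under test."""
    def __init__(self, name, description):
        self.name = name
        self.description = description

class RoomTest(unittest.TestCase):
    def test_room(self):
        gold = Room("GoldRoom", "This room has gold in it.")
        # assertEqual is a TestCase method, so it is called on self.
        self.assertEqual(gold.name, "GoldRoom")

# Load and run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoomTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test runner that collects unittest.TestCase subclasses should pick up a class like this without the explicit loader and runner lines.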

resources within a project directory targeting python 2.5.1

I have Python .egg files that are stored in a location relative to some .py code. The problem is, I am targeting Python 2.5.1 computers which require my project to be self-contained in a folder (hundreds of thousands of OLPC XO 8.2.1 release laptops running Sugar). This means I cannot just run ./ez_install to perform a system-wide setuptools/pkg_resources installation. Example directory structure: My Application/ My Application/library1.egg My Application/libs/library2.egg My Application/test.py I would like to know how, from te ...

HOME Error with rpy2

I know there are quite a few posts on getting up and running with rpy2 on Windows 7 32-bit. I have referenced a good number of them and attempted their solutions, including the use of PypeR. I don't explicitly have an R_HOME variable set in my path, but per this question, I confirmed that R is in my PATH (I can type R at the command line and get R to run) and even copied all of the files from the i386 folder into the parent bin folder. My problem is pasted below. Any ideas? In [5]: from rpy2 import robjects ------------------------------------------------- ...

trivial sums of outer products without temporaries in numpy

The actual problem I wish to solve is: given a set of N unit vectors and another set of M vectors, calculate for each of the unit vectors the average of the absolute value of its dot product with every one of the M vectors. Essentially this is calculating the outer product of the two matrices, then summing and averaging with an absolute value stuck in between. For N and M not too large this is not hard, and there are many ways to proceed (see below). The problem is that when N and M are large, the temporaries created are huge and impose a practical limit on the approaches provided. Can this calculation be done without creating temporaries? The main difficulty I have is due to the ...

What substitutes xreadlines() in Python 3?

In Python 2, file objects had an xreadlines() method which returned an iterator that would read the file one line at a time. In Python 3, the xreadlines() method no longer exists, and readlines() still returns a list (not an iterator). Does Python 3 have something similar to xreadlines()? I know I can do for line in f: instead of for line in f.xreadlines(): But I would also like to use xreadlines() without a for loop: print(f.xreadlines()[7]) #read lines 0 to 7 and prints line 7 ...
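In Python 3 the file object is itself a lazy line iterator, and iterators cannot be indexed, so one way to get "line 7 without a for loop" is itertools.islice. The sketch below is a minimal illustration: `nth_line` is a hypothetical helper name, and an in-memory io.StringIO stands in for a real file opened with open().

```python
import io
from itertools import islice

def nth_line(file_obj, n):
    """Return line n (0-based) by iterating lazily.

    Lines 0..n are read and discarded one at a time; nothing after line n
    is read. Returns None if the file has fewer than n + 1 lines.
    """
    return next(islice(file_obj, n, n + 1), None)

# An in-memory file with ten lines, standing in for open("some_file").
f = io.StringIO("".join("line %d\n" % i for i in range(10)))
seventh = nth_line(f, 7)
```

Because the file position advances as lines are consumed, a second call on the same open file counts from where the first one stopped, not from the start of the file.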