Parsing a large NTriples file in Python

2018-06-23 11:42:33

I am trying to parse a rather large NTriples file using the code from "Parse large RDF in Python".

I installed raptor and the Redland bindings for Python.

import RDF
parser=RDF.Parser(name="ntriples") #as name for parser you can use ntriples, turtle, rdfxml, ...
model=RDF.Model()
stream=parser.parse_into_model(model,"file:./mybigfile.nt")
for triple in model:
    print triple.subject, triple.predicate, triple.object

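However, the program hangs. I suspect it is trying to load the whole file into memory, or something similar, because it does not start printing right away.

Does anybody know how to solve this?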

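It is slow because you are reading into an in-memory store (the RDF.Model() default), which has no indexing, so it gets slower and slower as triples are added. The N-Triples parser itself does stream from the file; it never pulls the whole file into memory.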
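If you only need to look at each triple once, you may not need a model at all. The following is a minimal sketch, assuming the Redland Python bindings expose `Parser.parse_as_stream` as described in their documentation; `iter_triples` is a hypothetical helper name used only for illustration, and I have not run this against a large file.

```python
def iter_triples(uri):
    """Yield (subject, predicate, object) for each statement parsed from `uri`.

    Hypothetical helper: it relies on the Redland bindings' parse_as_stream,
    which hands back statements one at a time instead of filling an RDF.Model.
    """
    # Imported here so the sketch can be loaded even where Redland is missing.
    import RDF

    parser = RDF.Parser(name="ntriples")
    # No model is involved, so nothing accumulates in an unindexed store.
    for statement in parser.parse_as_stream(uri):
        yield statement.subject, statement.predicate, statement.object
```

Looping over `iter_triples("file:./mybigfile.nt")` and printing each tuple should then start producing output as soon as the first statement has been parsed.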

See the Redland storage modules documentation for an overview of the storage models. Here you probably want storage type 'hashes' with hash-type 'memory'.

s = RDF.HashStorage("abc", options="hash-type='memory'")
model = RDF.Model(s)

(Not tested)
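Put together with the parsing call from the question, the whole thing might look like the sketch below. It is equally untested; `load_into_hash_model` is a hypothetical name, and the storage name "abc" and the options string are simply the ones from the snippet above.

```python
def load_into_hash_model(uri):
    """Parse `uri` into a model backed by an in-memory hash store and return it.

    Hypothetical helper combining the storage suggested in this answer with
    the parse_into_model call from the question.
    """
    # Imported here so the sketch can be loaded even where Redland is missing.
    import RDF

    # Indexed in-memory hash storage instead of the default unindexed store.
    storage = RDF.HashStorage("abc", options="hash-type='memory'")
    model = RDF.Model(storage)

    parser = RDF.Parser(name="ntriples")
    parser.parse_into_model(model, uri)
    return model
```

Note that the model still holds every triple in memory, so this addresses the slowdown from the missing index, not the memory footprint of a very large file.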
