Extract alias from Freebase dump
I have downloaded the Freebase dump from https://developers.google.com/freebase/data?hl=en, but I am confused about how the file is organized.
I know that each line of the dump has the format <subject> <predicate> <object> .
If I want to extract the alias subset of Freebase, like http://www.freebase.com/common/topic/alias?instances&lang=en, how can I do this? I have tried filtering the lines that contain the mid or '/common/topic/alias', but the result is not what I want.
Is there any library to parse Freebase? Thanks!
Follow up:
I have two more questions.
(type.object.name is the name of the object)

The Freebase data dump is RDF, so any RDF parsing library should work, but zgrep would be a lot quicker. One little twist is that the predicate for the Freebase property /common/topic/alias is <http://rdf.freebase.com/ns/common.topic.alias>, with the slashes converted to dots.
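To illustrate that mapping, here is a minimal Python sketch that turns a Freebase property ID into the predicate URI used in the dump. The function name `property_to_predicate` is made up for this example, and it assumes the ID starts with a slash and that every other slash becomes a dot.

```python
RDF_NS = "http://rdf.freebase.com/ns/"


def property_to_predicate(property_id: str) -> str:
    """Map a Freebase property ID (e.g. /common/topic/alias) to the
    predicate URI as it is written in the RDF dump."""
    # Drop the leading slash, then turn the remaining slashes into dots.
    local_name = property_id.strip("/").replace("/", ".")
    return "<" + RDF_NS + local_name + ">"


print(property_to_predicate("/common/topic/alias"))
# prints <http://rdf.freebase.com/ns/common.topic.alias>
```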
To filter just the English aliases, you can use a command like:
$ zgrep -E 'common\.topic\.alias>.*@en[[:space:]]+\.$' freebase-rdf-2015-04-19-00-00.gz
This will give you output like:
<http://rdf.freebase.com/ns/m.0100c5g> <http://rdf.freebase.com/ns/common.topic.alias> "Pulska yo"@en .
<http://rdf.freebase.com/ns/m.0101107q> <http://rdf.freebase.com/ns/common.topic.alias> "Unforgiven 2002"@en .
<http://rdf.freebase.com/ns/m.01016v4g> <http://rdf.freebase.com/ns/common.topic.alias> "Ain't Nuthin' But A "G" Thang, Rene"@en .
...
If you want aliases in all languages, you can just use:
$ zgrep -E "common.topic.alias>" freebase-rdf-2015-04-19-00-00.gz
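If you would rather get (mid, alias, language) tuples than raw triples, the same filtering can be done in a short script. The following is only a sketch: it assumes each line has the subject, predicate, object and final dot separated by tabs or spaces, and that the object is a quoted literal with a language tag. The names `parse_alias` and `iter_aliases` are hypothetical helpers, not part of any Freebase library.

```python
import gzip
import re

# Expected shape: <.../ns/MID> <.../ns/common.topic.alias> "TEXT"@LANG .
ALIAS_RE = re.compile(
    r'^<http://rdf\.freebase\.com/ns/(?P<mid>[^>]+)>\s+'
    r'<http://rdf\.freebase\.com/ns/common\.topic\.alias>\s+'
    r'"(?P<alias>.*)"@(?P<lang>[A-Za-z0-9-]+)\s*\.\s*$'
)


def parse_alias(line):
    """Return (mid, alias, lang) for an alias triple, or None for any other line.

    Escape sequences inside the literal are left exactly as they appear.
    """
    match = ALIAS_RE.match(line)
    if match is None:
        return None
    return match.group("mid"), match.group("alias"), match.group("lang")


def iter_aliases(path, lang=None):
    """Stream a gzipped dump and yield alias tuples, optionally for one language."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            # Cheap substring test first, so the regex only runs on candidate lines.
            if "common.topic.alias>" not in line:
                continue
            parsed = parse_alias(line)
            if parsed is not None and (lang is None or parsed[2] == lang):
                yield parsed
```

For example, `for mid, alias, lang in iter_aliases("freebase-rdf-2015-04-19-00-00.gz", lang="en"): print(mid, alias)` would print the English aliases one per line. Expect this to be slower than zgrep on the full dump, since every line is decompressed and decoded in Python.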