Extract alias from Freebase dump

I have downloaded the Freebase dump from https://developers.google.com/freebase/data?hl=en, but I am confused about how the relations in the file are represented.

I know that each line of the dump is a triple of the form <subject> <predicate> <object> followed by a period. If I want to extract the alias subset of Freebase, like http://www.freebase.com/common/topic/alias?instances&lang=en, how can I do that? I have tried filtering for lines that contain the mid or '/common/topic/alias', but the result is not what I want.

Is there any library to parse Freebase? Thanks!

Follow up:

I have two more questions.

  • Is there a list of all the namespaces in Freebase? (e.g. type.object.name is the name of an object)
  • How can I extract all the 'type of' (IS A) relations? (e.g. C++ IS A programming language)

  • The Freebase data dump is RDF, so any RDF parsing library should work, but zgrep would be a lot quicker. One little twist is that the predicate for the Freebase property /common/topic/alias is <http://rdf.freebase.com/ns/common.topic.alias> with the slashes converted to periods/dots.

    To filter just the English aliases, you can use a command like:

    $ zgrep -E "common\.topic\.alias>.*@en[[:space:]]*\.$" freebase-rdf-2015-04-19-00-00.gz
    

    This gives you output like:

    <http://rdf.freebase.com/ns/m.0100c5g>  <http://rdf.freebase.com/ns/common.topic.alias> "Pulska yo"@en  .
    <http://rdf.freebase.com/ns/m.0101107q> <http://rdf.freebase.com/ns/common.topic.alias> "Unforgiven 2002"@en    .
    <http://rdf.freebase.com/ns/m.01016v4g> <http://rdf.freebase.com/ns/common.topic.alias> "Ain't Nuthin' But A "G" Thang, Rene"@en  .
    ...
    

    If you want aliases in all languages, you can just use:

    $ zgrep -E "common.topic.alias>" freebase-rdf-2015-04-19-00-00.gz
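
If you would rather do the filtering in a script than with zgrep, the same idea can be sketched in Python. This is only a minimal sketch, not a full RDF parser: it assumes one triple per line with whitespace between subject, predicate and object, it does not unescape characters inside the literal, and the names property_to_predicate, iter_aliases and iter_aliases_from_dump are made up for this illustration.

```python
import gzip
import re
from typing import Iterable, Iterator, Optional, Tuple

NS = "http://rdf.freebase.com/ns/"

# Matches a language-tagged literal such as "Unforgiven 2002"@en
LITERAL = re.compile(r'^"(.*)"@([A-Za-z0-9-]+)$')


def property_to_predicate(property_id: str) -> str:
    """Map a Freebase property id like /common/topic/alias to its RDF predicate."""
    return "<" + NS + property_id.strip("/").replace("/", ".") + ">"


def iter_aliases(lines: Iterable[str],
                 lang: Optional[str] = "en") -> Iterator[Tuple[str, str]]:
    """Yield (mid, alias) pairs; pass lang=None to keep every language."""
    predicate = property_to_predicate("/common/topic/alias")
    prefix = "<" + NS
    for line in lines:
        # Split off subject and predicate; the rest is the object plus the final "."
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[1] != predicate:
            continue
        obj = parts[2].rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        match = LITERAL.match(obj)
        if match is None:
            continue
        if lang is not None and match.group(2) != lang:
            continue
        subject = parts[0]
        if subject.startswith(prefix) and subject.endswith(">"):
            subject = subject[len(prefix):-1]  # e.g. m.0100c5g
        yield subject, match.group(1)


def iter_aliases_from_dump(path: str,
                           lang: Optional[str] = "en") -> Iterator[Tuple[str, str]]:
    """Stream the gzipped dump line by line so it never has to fit in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_aliases(handle, lang)
```

Calling iter_aliases_from_dump("freebase-rdf-2015-04-19-00-00.gz") would then stream (mid, alias) pairs for the English aliases. Expect it to be noticeably slower than zgrep on the full dump.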
    