Storing structured data in Lucene

I have seen many references to using Lucene or Solr as a NoSQL data store rather than just an indexing engine, for example "NoSQL (MongoDB) vs Lucene (or Solr) as your database" and http://searchhub.org/2010/04/29/for-the-guardian-solr-is-the-new-database/

However, Lucene only provides a "flat" document structure in which each field holds one or more scalar values, so I don't fully understand how people map complex structured data into Lucene for indexing and storage. For example:

{
    "firstName": "Joe",
    "lastName": "Smith",
    "addresses": [
        {
            "type": "home",
            "line1": "1 Main Street",
            "city": "New York"
        },
        {
            "type": "office",
            "line1": "P.O. Box 1234",
            "zip": "10000"
        }
    ]
}

Things can obviously get more complex. For example, what if the object has two collections, addresses and phone numbers? What if an address itself contains a collection?

I can think of two ways to map this to a Lucene "document":

  • Create a stored but not indexed field that holds a JSON/BSON version of the object, and then create other fields that are indexed but not stored, used only for searching.

  • Find a smart way to fit the object into Lucene's flat model, i.e. use dot notation to flatten the field names, use multi-valued fields to hold the individual collection values, and then somehow recreate the object on the way back out.

Have people dealt with similar problems before, and what solutions did you use?
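To make the two options concrete, here is a minimal sketch in plain Python that does not touch Lucene at all. It combines them: the whole object is kept as a JSON string in one "stored" field, and dot-notation field names are generated for the "indexed" side. The names `flatten`, `to_flat_document`, and `_source` are hypothetical and exist only for this illustration; an actual Lucene or Solr schema would define its own fields.

```python
import json

def flatten(value, prefix=""):
    """Yield (field_name, scalar) pairs; list items share their parent's field name."""
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, name)
    elif isinstance(value, list):
        for item in value:
            yield from flatten(item, prefix)
    else:
        yield prefix, value

def to_flat_document(obj):
    """Build a hypothetical flat document: one stored JSON blob plus index-only fields."""
    indexed = {}
    for name, scalar in flatten(obj):
        # Every field is treated as multi-valued, so repeated names accumulate.
        indexed.setdefault(name, []).append(scalar)
    return {"stored": {"_source": json.dumps(obj)}, "indexed": indexed}

person = {
    "firstName": "Joe",
    "lastName": "Smith",
    "addresses": [
        {"type": "home", "line1": "1 Main Street", "city": "New York"},
        {"type": "office", "line1": "P.O. Box 1234", "zip": "10000"},
    ],
}

doc = to_flat_document(person)
# The original object comes back from the stored blob, not from the flat fields.
restored = json.loads(doc["stored"]["_source"])
```

Note that the flattened fields alone no longer record which city belongs to which address type, which is why this sketch rebuilds the object from the stored JSON instead of from the indexed fields.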


    Take a look at my "Stupid Lucene Tricks: Hierarchies" for one approach.


    It depends on the usage. If you only need the addresses for display, you can serialize each complex value as a JSON string and store it in a multi-valued field. If you need to search on them, you can use the following structure:

    
        "addresses_type": [
        "home",
        "office"
        ],
        "addresses_line1": [
        "1 Main Street",
        "P.O. Box 1234"
        ],
        "addresses_city": [
        "New York",
        ""
        ],
        "addresses_zip": [
        "",
        "10000"
        ]
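
The structure above keeps the values of each address at the same position in every array, with an empty string as padding where a key is missing. A rough sketch of that mapping in plain Python follows. The helper names `to_parallel_fields` and `from_parallel_fields` are hypothetical, and the sketch assumes the store returns the values of a multi-valued field in the order they were added and keeps the empty placeholders.

```python
def to_parallel_fields(items, prefix, keys):
    """Turn a list of dicts into aligned multi-valued fields, padding gaps with ""."""
    return {f"{prefix}_{key}": [item.get(key, "") for item in items] for key in keys}

def from_parallel_fields(fields, prefix, keys):
    """Rebuild the list of dicts by position, dropping the "" placeholders."""
    columns = [fields[f"{prefix}_{key}"] for key in keys]
    return [
        {key: value for key, value in zip(keys, row) if value != ""}
        for row in zip(*columns)
    ]

addresses = [
    {"type": "home", "line1": "1 Main Street", "city": "New York"},
    {"type": "office", "line1": "P.O. Box 1234", "zip": "10000"},
]
keys = ["type", "line1", "city", "zip"]

fields = to_parallel_fields(addresses, "addresses", keys)
rebuilt = from_parallel_fields(fields, "addresses", keys)
```

The padding is what makes the round trip possible: without the empty strings, `addresses_city` and `addresses_zip` would each hold a single value and there would be no way to tell which address it came from. The trade-off is that a value that is genuinely an empty string cannot be distinguished from a missing one.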
    
    