Storing structured data in Lucene
I have seen many references to using Lucene or Solr as a NoSQL data store, not just as an indexing engine, for example "NoSQL (MongoDB) vs Lucene (or Solr) as your database" and http://searchhub.org/2010/04/29/for-the-guardian-solr-is-the-new-database/
However, because Lucene only provides a "flat" document structure, where each field can be multi-valued but holds only scalar values, I can't fully understand how people map complex structured data into Lucene for indexing and storage. For example:
{
"firstName": "Joe",
"lastName": "Smith",
"addresses" : [
{
"type" : "home",
"line1" : "1 Main Street",
"city" : "New York"
},
{
"type" : "office",
"line1" : "P.O. Box 1234",
"zip" : "10000"
}
]
}
Things can obviously get more complex. For example, what if the object has two collections, such as addresses and phone numbers? What if an address itself contains a collection?
I can think of two ways to map this to a Lucene document:
Create a stored but not indexed field holding a JSON/BSON version of the object, and then create additional indexed but not stored fields for searching.
Find a smart way to fit the object into Lucene's way of storing data, i.e. use dot notation to flatten the fields, use multi-valued fields to store the individual values of each collection, and then somehow recreate the object on the way back...
Has anyone dealt with similar problems before, and what solution did you use?
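The dot-notation flattening in the second approach can be sketched in plain Java. This is a minimal sketch, not Lucene code: it assumes the parsed object is held as nested `Map`/`List`/`String` values, and it models a Lucene document as a `Map<String, List<String>>` from field name to values, where a key with several values stands in for a multi-valued field. The names `Flattener`, `flatten` and `samplePerson` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Flattener {
    /** Flattens nested maps/lists into dot-notation keys, each holding one or more values. */
    public static Map<String, List<String>> flatten(Map<String, Object> obj) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        walk("", obj, fields);
        return fields;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Object value, Map<String, List<String>> fields) {
        if (value instanceof Map) {
            // Nested object: extend the key with ".childName".
            for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
                walk(key, e.getValue(), fields);
            }
        } else if (value instanceof List) {
            // Collection: every element adds values under the same key,
            // which is how a multi-valued field holds a collection.
            for (Object item : (List<Object>) value) {
                walk(prefix, item, fields);
            }
        } else {
            // Scalar: append it as one more value of this field.
            fields.computeIfAbsent(prefix, k -> new ArrayList<>()).add(String.valueOf(value));
        }
    }

    /** The example object from the question (hypothetical helper). */
    public static Map<String, Object> samplePerson() {
        Map<String, Object> home = new LinkedHashMap<>();
        home.put("type", "home");
        home.put("line1", "1 Main Street");
        home.put("city", "New York");
        Map<String, Object> office = new LinkedHashMap<>();
        office.put("type", "office");
        office.put("line1", "P.O. Box 1234");
        office.put("zip", "10000");
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("firstName", "Joe");
        person.put("lastName", "Smith");
        person.put("addresses", List.of(home, office));
        return person;
    }

    public static void main(String[] args) {
        System.out.println(flatten(samplePerson()));
    }
}
```

Here `addresses.type` and `addresses.line1` get two values each, but `addresses.city` and `addresses.zip` get only one, so the flattened form no longer records which address a city or zip belonged to. That is the information the empty-string placeholders in the answer's structure preserve.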
Take a look at my "Stupid Lucene Tricks": an approach to hierarchies.
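It depends on what you use the data for. If you only need the complex values (addresses) for display, you can serialize each one as a JSON string and store them in a multi-valued field. If you need to search on them, you can use the following structure, where each field holds one value per address and an empty string marks a missing property: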
"addresses_type": [ "home", "office" ],
"addresses_line1": [ "1 Main Street", "P.O. Box 1234" ],
"addresses_city": [ "New York", "" ],
"addresses_zip": [ "", "10000" ]
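With this layout, position i of every `addresses_*` field belongs to the i-th address, so the objects can be recreated by index. A minimal plain-Java sketch, assuming the stored values come back in the order they were added and modelling the document as a `Map<String, List<String>>` rather than using the Lucene API; `ParallelFields`, `rebuild` and `sampleDoc` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParallelFields {
    /**
     * Rebuilds a collection of objects from parallel multi-valued fields sharing
     * a prefix (e.g. "addresses_"). The i-th value of every field belongs to the
     * i-th object; "" is the placeholder for a missing property.
     */
    public static List<Map<String, String>> rebuild(String prefix, Map<String, List<String>> fields) {
        List<Map<String, String>> objects = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                continue; // not part of this collection
            }
            String property = e.getKey().substring(prefix.length());
            List<String> values = e.getValue();
            for (int i = 0; i < values.size(); i++) {
                while (objects.size() <= i) {
                    objects.add(new LinkedHashMap<>());
                }
                if (!values.get(i).isEmpty()) { // skip placeholders
                    objects.get(i).put(property, values.get(i));
                }
            }
        }
        return objects;
    }

    /** The structure from the answer (hypothetical helper). */
    public static Map<String, List<String>> sampleDoc() {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("addresses_type", List.of("home", "office"));
        doc.put("addresses_line1", List.of("1 Main Street", "P.O. Box 1234"));
        doc.put("addresses_city", List.of("New York", ""));
        doc.put("addresses_zip", List.of("", "10000"));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(rebuild("addresses_", sampleDoc()));
    }
}
```

The alignment only helps when rebuilding objects: a query on the flat fields such as `addresses_type:home AND addresses_zip:10000` would still match this document, even though no single address has both values.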