How do I create a solr core with the data from an existing one?

Solr 1.4 Enterprise Search Server recommends doing large updates on a copy of the core, and then swapping it in for the main core. I am following these steps:

  • Create prep core: http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
  • Perform index update, then commit/optimize on prep core.
  • Swap main and prep core: http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
  • Unload prep core: http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep
    The problem I am having is that the core created in step 1 doesn't have any data in it. If I were going to do a full reindex of everything and the kitchen sink, that would be fine, but if I just want to update a (large) subset of the documents, that obviously isn't going to work.
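The steps above are plain CoreAdmin HTTP calls; a minimal sketch that just builds those URLs (host, port, and core names taken from the examples in the question):

```python
# Sketch of the prep/swap/unload workflow from the question, expressed as
# CoreAdmin URLs. Host, port, and core names follow the question's examples.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"

def core_admin_url(action, **params):
    """Build a CoreAdmin URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/cores?{query}"

# 1. Create the prep core, pointing at the main core's instanceDir.
create = core_admin_url("CREATE", name="prep", instanceDir="main")

# 2. (Index updates and commit/optimize would happen against /solr/prep here.)

# 3. Swap prep in for main.
swap = core_admin_url("SWAP", core="main", other="prep")

# 4. Unload the now-stale prep core.
unload = core_admin_url("UNLOAD", core="prep")

print(create)
print(swap)
print(unload)
```

As the question notes, step 1 yields an empty index even though it reuses main's instanceDir, which is the whole problem.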

    (I could merge the cores, but part of what I'm trying to do is get rid of any deleted documents without trying to make a list of them.)

    Is there some flag to the CREATE action that I'm missing? The Solr Wiki page for CoreAdmin is a little sparse on details.

    Possible Solution: Replication

    Someone on solr-user suggested using replication. To use it in this scenario would (to my understanding) require the following steps:

  • Create a new PREP core based off the config of the MAIN core
  • Change the config of the MAIN core to be a master
  • Change the config of the PREP core to be a slave
  • Cause/wait for a sync?
  • Change the config of the PREP core to no longer be a slave
  • Perform index update, then commit/optimize on PREP core.
  • Swap PREP and MAIN cores
    A simpler replication-based setup would be to configure a permanent PREP core that is always the master. The MAIN core (on as many servers as needed) could then be a slave of the PREP core. Indexing could happen on the PREP core as fast or as slow as necessary.
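In Solr 1.4, that permanent master/slave arrangement is configured through the ReplicationHandler in each core's solrconfig.xml; a sketch, where the host name, confFiles list, and poll interval are placeholders rather than values from the question:

```xml
<!-- On the permanent PREP (master) core's solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each MAIN (slave) core's solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://prep_host:8983/solr/prep/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```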

    Possible Solution: Permanent PREP core and double-updating

    Another idea I came up with was this (also involving a permanent PREP core):

  • Perform index update, then commit/optimize on PREP core.
  • Swap PREP and MAIN cores.
  • Re-perform index update, then commit/optimize on what is now the PREP core. It now has the same data as the MAIN core (in theory) and will be around, ready for the next index operation.

  • I came up with the idea of a clone operation that does a filesystem copy of the index and config data, and then CREATEs a new core from it. There are some locking issues, and you have to have filesystem access to the indexes, but it did work. This gives you a nice copy whose config files you can muck around with.

    The more I think about it, you could CREATE a new core and then do this:

    Force a fetchindex on the slave from the master: http://slave_host:port/solr/replication?command=fetchindex. It is possible to pass an extra 'masterUrl' attribute, or other attributes like 'compression' (or any other parameter which is specified in the tag), to do a one-time replication from a master. This obviates the need for hardcoding the master in the slave.

    And populate the new one from the production one, then apply your updates, and then swap back!
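That one-shot pull can be sketched the same way; a minimal helper that builds the fetchindex command URL, where the host names and core names are placeholders, not from a real deployment:

```python
# Sketch: a one-time replication pull into a freshly CREATEd core by passing
# masterUrl to the fetchindex command. Hosts and core names are placeholders.
from urllib.parse import urlencode

def fetchindex_url(slave_base, master_url):
    """Build the one-shot fetchindex command; passing masterUrl here means
    the slave core does not need the master hardcoded in its config."""
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return f"{slave_base}/replication?{query}"

url = fetchindex_url(
    "http://localhost:8983/solr/prep",           # the new, empty core
    "http://localhost:8983/solr/main/replication",  # the production core
)
print(url)
```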
