Apache Solr Index Benchmarking
I recently started playing around with Apache Solr and am currently trying to figure out the best way to benchmark the indexing of a corpus of XML documents. I am mainly interested in the throughput (documents indexed per second) and the index size on disk.
I am doing all this on Ubuntu.
Benchmarking Technique
* Run the following 5 times and take the average of the total time taken *
curl 'http://localhost:8983/solr/core/dataimport?command=full-import'
curl http://localhost:8983/solr/core/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8983/solr/w5/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
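The averaging step could be done with a few lines of Python. This is only a minimal sketch: it assumes the "Time taken" string from the dataimport status response was noted after each run, and the function names and sample values are made up for illustration.

```python
from statistics import mean


def time_taken_to_seconds(value):
    """Convert a 'Time taken' string such as '0:0:10.233' (h:m:s.ms) to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize_runs(time_taken_values):
    """Return the mean, fastest and slowest run in seconds."""
    seconds = [time_taken_to_seconds(v) for v in time_taken_values]
    return {"mean_s": mean(seconds), "min_s": min(seconds), "max_s": max(seconds)}


if __name__ == "__main__":
    # Made-up values for five runs; replace them with the ones you recorded.
    runs = ["0:0:10.233", "0:0:10.101", "0:0:10.454", "0:0:9.987", "0:0:10.310"]
    print(summarize_runs(runs))
```

Reporting the fastest and slowest run next to the mean makes it easier to spot a run that was disturbed by something else on the machine.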
Questions
[...] QTime and Time taken values.
* XML Response Used to Get Throughput *
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">w5-data-config.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">3200</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-12-11 14:06:19</str>
<str name="">Indexing completed. Added/Updated: 1600 documents. Deleted 0 documents.</str>
<str name="Total Documents Processed">1600</str>
<str name="Time taken">0:0:10.233</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
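Throughput can be read from a response like this one by dividing "Total Documents Processed" by "Time taken". The following Python sketch shows one way to do that with the standard library; the function name is hypothetical, and the sample is a shortened copy of the response above.

```python
import xml.etree.ElementTree as ET


def import_throughput(status_xml):
    """Return documents per second from a dataimport status response."""
    root = ET.fromstring(status_xml)
    # Collect the <str name="...">value</str> entries under statusMessages.
    messages = {
        node.get("name"): node.text
        for node in root.findall("./lst[@name='statusMessages']/str")
    }
    documents = int(messages["Total Documents Processed"])
    hours, minutes, seconds = messages["Time taken"].split(":")
    elapsed = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return documents / elapsed


SAMPLE = """<response>
<lst name="statusMessages">
<str name="Total Rows Fetched">3200</str>
<str name="Total Documents Processed">1600</str>
<str name="Time taken">0:0:10.233</str>
</lst>
</response>"""

if __name__ == "__main__":
    print(f"{import_throughput(SAMPLE):.1f} documents/second")
```

For the response shown, this is 1600 documents in 10.233 seconds, or roughly 156 documents per second.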
To question 1:
I would suggest indexing more than one XML file (each with a different dataset) and comparing the results. That way you will know whether it is OK to simply divide the time taken by the number of documents.
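The comparison could look like the following Python sketch: compute the time per document for each dataset and check how far the values spread. The function names are hypothetical, and apart from the first entry (taken from the posted response) the numbers are made up.

```python
def per_document_times(results):
    """Map dataset name -> seconds per document.

    `results` maps a dataset name to (documents indexed, seconds taken).
    """
    return {name: seconds / docs for name, (docs, seconds) in results.items()}


def relative_spread(values):
    """(max - min) / mean; values near 0 mean the per-document time is stable."""
    values = list(values)
    average = sum(values) / len(values)
    return (max(values) - min(values)) / average


if __name__ == "__main__":
    results = {
        "dataset_a": (1600, 10.233),  # from the posted response
        "dataset_b": (3200, 20.900),  # made-up value
        "dataset_c": (800, 5.050),    # made-up value
    }
    times = per_document_times(results)
    print(times)
    print("relative spread:", relative_spread(times.values()))
```

If the spread is small, dividing the total time by the document count is a reasonable estimate; if it is large, the time depends on more than the number of documents.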
To question 2:
I didn't find any such tools; I did it on my own by developing a short Java application.
To question 3:
Which approach do you mean? I would refer you to my answer to question 1.
To question 4:
The size of the index folder gives you the correct size of the whole index. Why don't you want to use it?
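Measuring the folder can be scripted so that it is repeated after every run. A minimal Python sketch follows; the function name is hypothetical, and the path to the index folder is passed on the command line because it depends on your installation.

```python
import os
import sys


def directory_size_bytes(path):
    """Sum the sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the index folder of your core as the first argument.
    size = directory_size_bytes(sys.argv[1])
    print(f"{size} bytes ({size / (1024 * 1024):.2f} MiB)")
```

Take the measurement after the commit has finished, since the files on disk can still change while an import is running.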
To question 5:
The results you get in the posted XML are transformed through an XSL file. You can find it in the /bin/solr/conf/xslt folder. There you can look up what exactly the terms mean, and you can write your own XSL to display the results and other information. Note: if you create a new XSL file, you have to change the settings in your solrconfig.xml. If you don't want to change those settings, edit the existing file instead.
Edit: I think the difference is that QTime is the rounded value of the Time taken value. There are only even numbers in QTime.
Best regards