MongoDB and NoSQL Databases Comparison (working with XML Documents)
I am working on a project using Java and Spring 3, and I have a new task. There will be XML files; I will read those files and convert them into objects. After that I will put them into a database.
The main topic for me is to examine NoSQL databases. CouchDB and MongoDB are the databases I should research. I will run searches on those objects in the database (one of the indexed fields will be a date, and I will make date-between selects). Performance is very important for me and I will work with a huge amount of data; that's why I should research NoSQL databases.
What do you suggest for my scenario? What are the pros and cons of each, and which one should I choose and why?
I searched and saw that CouchDB uses a REST API and MongoDB uses native drivers, which is a performance plus for MongoDB according to this page: http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
However, CouchDB uses replication as a way to scale (is that a performance plus?).
I also found out that there are BaseX and eXist. What do you suggest for my needs? Has anyone worked with them?
PS: I will receive the XML files like logs. They will not change and I won't manipulate the data in them.
This is a pretty big question, but I will do my best to tackle it. A company I work for was making the change from developing our applications with MySQL to NoSQL, and I was the lead on the first NoSQL database, so we had to decide which NoSQL database to work with. I was choosing between MongoDB, CouchDB and Cassandra. One important factor I had to look at was how easy it would be to write baseline functions for working with the database, so that you don't have to understand what is going on underneath but are still able to execute queries and so on.
The issue with Cassandra was that its API was very low level, and it would have taken some time to write a solid high-level interface; we did not have that kind of time. The issue with CouchDB was the REST service. Since we were already connecting to our in-house API using REST, it would have been a double REST service. REST generally goes over HTTP, and HTTP carries a fair amount of overhead in exchange for being as easy to work with as it is. That overhead adds time to loading information.
So we took MongoDB for that reason and many other reasons. Also, since it uses a driver, the driver is developed to work with your programming language, which is great if your language is supported and bad if it is not. Since Java is supported by MongoDB, you are fine.
I would recommend converting the XML files into objects and then storing the objects in Mongo, so each XML file would become a Mongo document with embedded documents. The great thing about Mongo is that you can search embedded documents and you can index them. So enjoy that.
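To make the "convert the XML into objects first" step concrete, here is a minimal sketch using only the JDK. The log format (`<log>` with `timestamp`, `level`, `message` children), the class name `XmlLogToDocument`, and the helper `isBetween` are all assumptions for illustration; the question does not show the real files. The resulting `Map` is the kind of structure you would then hand to whatever driver you pick, and the driver call itself is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns one flat XML log entry into a document-like Map.
class XmlLogToDocument {

    // Assumed format: <log><timestamp>2012-03-01T10:15:00</timestamp><level>INFO</level>...</log>
    static Map<String, Object> toDocument(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();

            Map<String, Object> doc = new LinkedHashMap<String, Object>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                String name = child.getNodeName();
                String text = child.getTextContent().trim();
                if ("timestamp".equals(name)) {
                    // Store a real Date, not a String, so range queries compare correctly.
                    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
                    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
                    doc.put(name, fmt.parse(text));
                } else {
                    doc.put(name, text);
                }
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not convert XML log entry", e);
        }
    }

    // In-memory stand-in for the "date between" select the question describes.
    static boolean isBetween(Map<String, Object> doc, Date from, Date to) {
        Date ts = (Date) doc.get("timestamp");
        return ts != null && !ts.before(from) && !ts.after(to);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument(
                "<log><timestamp>2012-03-01T10:15:00</timestamp>"
                + "<level>INFO</level><message>started</message></log>");
        System.out.println(doc.keySet());
    }
}
```

The main design point is parsing the timestamp into a `Date` before storing it. If it goes in as a plain string, a date-between query only works by accident of the string format.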
I have only used MongoDB in a high-data-volume, low-load internal application, so I cannot really offer first hand advice for your choice.
The MongoDB people, however, have a comparison with CouchDB here. There are also quite a few more independent opinions (1, 2).
You should also consider the quality of the available database drivers for your environment. The Java MongoDB driver is quite stable, in my experience, but it seems to me that it still incurs more processing overhead than it should. I have no idea about any of the CouchDB drivers.
Do you have any other requirements apart from the ability to store large amounts of data? Do you need replication or sharding?
PS: How are you storing the XML files anyway? XML files do not map into JSON (which is what e.g. MongoDB uses) perfectly, unless you store the whole XML text in a single field.
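To illustrate what "does not map perfectly" means, here is a small JDK-only sketch. The converter `NaiveXmlToMap` is a deliberately simplistic, hypothetical mapping (attributes and child elements both become keys); it is not what any particular library does. It shows two ways information can get lost: attributes and child elements become indistinguishable, and repeated elements overwrite each other.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical, intentionally naive XML-to-key/value conversion.
class NaiveXmlToMap {

    static Map<String, String> convert(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();

            Map<String, String> out = new LinkedHashMap<String, String>();

            // Attributes become plain keys...
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.put(attr.getNodeName(), attr.getNodeValue());
            }

            // ...and so do child elements, so the distinction is gone.
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // A repeated element name silently overwrites the earlier value.
                    out.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Two different XML documents, one resulting map.
        System.out.println(convert("<entry id=\"7\"/>"));
        System.out.println(convert("<entry><id>7</id></entry>"));
        // Two <tag> elements go in, only the last one survives.
        System.out.println(convert("<e><tag>a</tag><tag>b</tag></e>"));
    }
}
```

A real mapping has to pick conventions for these cases (for example, a prefix for attribute keys and arrays for repeated elements). If the files must be reproducible byte for byte, keeping the raw XML in one field next to the extracted searchable fields avoids the problem.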
PS2: Are you sure that you need a document-based database? If you are only going to perform searches on a few fields that are known beforehand, a relational DB might be easier to handle. Document-based DBs start making sense only when you don't have a predefined schema for your data or when you need to store more complex object hierarchies.
PS3: May I ask why huge data implies NoSQL to you? You can store insane amounts of data on any modern relational database (as long as you have the hardware, of course).
I'd like to add that Couchbase is a faster and more scalable option than CouchDB. The 2.0 version introduces Views. At a high level it is a distributed memcached (Membase Server) merged with CouchDB, though of course more sophisticated than just mashing the two together. The founders of both CouchDB and Membase Server created Couchbase.
Also, probably the best way to handle this is to convert XML to JSON for storage and JSON back to XML on retrieval. If you need XPath-style queries in the database, the View creation would need to be a bit more sophisticated.
www.couchbase.com