Using a NoSQL database over MySQL
I have a web application running on Java stack (Struts 2 + Spring + Hibernate) and persisted in MySQL. I looked at NoSQL databases and they are certainly easy to reason about and work with than a RDBMS. It's a music streaming app which stores artist information and allows users to save playlists.
I am wondering whether there are any advantages (performance?, hardware cost?, simplified code?, scalability?) of switching to a NoSQL DB (CouchDB?, MongoDB?, Cassandra?). What would I lose/gain by switching to a NoSQL database?
Please advise.
The polite interpretation of "NoSQL" has become Not Only SQL
. If you have data that is indeed truly relational, or if your functionality depends on things like joins and ACIDity, then you should store that data in a relational way. In this post, I'll explain how I use MySQL alongside two NoSQL data stores. Modern, web-scale data storage is all about understanding how to pick the best tool(s) for the job(s).
That said, NoSQL is really a reaction to the fact that the relational method and way of thinking has been applied to problems where it's not actually a very good fit (typically huge tables with tens of millions of rows or more). Once tables get that large, the typical SQL "best practice" has been to manually shard the data -- that is, putting records 1 through 10,000,000 in table A, 10,000,001 through 20,000,001 in table B, and so on. Then, typically in the application model layer, the lookups are performed according to this scheme. This is what's called application-aware
scaling. It's time-intensive and error prone, but to scale something up while maintaining MySQL for the long table store, it's become a more or less standard MO. NoSQL represents, to me, the application-unaware
alternative.
Key-Value
When I had a MySQL prototype start getting too big for its own good, I personally moved as much data as possible to the lightning-fast Membase, which outperforms Memcached and adds persistence. Membase is a distributed key-value store that scales more or less linearly (Zynga uses it to handle a half-million ops per second, for instance) by adding more commodity servers into a cluster -- it's therefore a great fit for the cloud age of Amazon EC2, Joyent, etc.
It's well known that distributed key-value stores are the best way to get enormous, linear scale. The weakness of key-value is queryability and indexing. But even in the relational world, the best practice for scalability is to offload as much effort onto the application servers as possible, doing joins in memory on commodity app servers instead of asking the central RDB cluster to handle all of that logic. Since simple select
plus application logic
are really the best way to achieve massive scale even on MySQL, the transition to something like Membase (or its competitors like Riak) isn't really too bad.
Document Stores
Sometimes -- though I would argue less often than many think -- an application's design inherently requires secondary indices, range queryability, etc. The NoSQL approach to this is through a document store
like MongoDB. Like Membase, Mongo is very good in some areas where relational databases are particularly weak, like application-unaware
scaling, auto-sharding
, and maintaining flat response times even as dataset size balloons
. It's significantly slower than Membase and a bit trickier to do pure horizontal scale, but the benefit is that it's highly queryable. You can query on parameters and ranges in real time, or you can use Map/Reduce to perform complex batch operations on truly enormous data sets.
On the same project I mentioned above, which uses Membase to serve tons of live player data, we use MongoDB to store analytics/metrics data, which is really where MongoDB shines.
Why to keep things in SQL
I touched briefly on the fact that 'truly relational' information should stay in relational databases. As commenter Dan K. points out, I missed the part where I discuss the disadvantages of leaving RDBMS, or at least of leaving it entirely.
First, there's SQL itself. SQL is well-known and has been an industry standard for a long time. Some "NoSQL" databases like Google's App Engine Datastore (built on Big Table) implement their own SQL-like language (Google's is called, cutely, GQL for Google Query Language
). MongoDB takes a fresh approach to the querying problem with its delightful JSON query objects. Still, SQL itself is a powerful tool for getting information out of data, which is often the whole point of databases to begin with.
The most important reason to stay with RDBMS is ACID, or Atomicity, Consistency, Isolation, Durability
. I won't re-hash the state of Acid-NoSQL, as it's well-addressed in this post on SO. Suffice it to say, there's a rational reason Oracle's RDBMS has such a huge market that isn't going anywhere: some data needs pure ACID compliance. If your data does (and if it does, you're probably well aware of that fact), then so does your database. Keep that pH low!
Edit: Check out Aaronaught's post here. He represents the business-to-business perspective far better than I could, in part because I've spent my entire career in the consumer space.
I think that it very much depends on what you want to store in the database. I don't have experience with CouchDB or Cassandra so I will let someone else speak for them but I frequently use MongoDB and MySQL.
If you were developing application that required transactions eg a billing application you would definitely want to use MySQL because of its support for transactions. MySQL is ACIDic that is it is Atomic, Consistent, Isolated and Durable. What that essentially means is that when you update a row in MySQL - it is GUARANTEED to have happened. However the issue with MySQL is that it does not scale horizontally (by adding more and more servers) very easily. MySQL servers tend to be scaled vertically by adding more memory, HDD space etc but they eventually hit a ceiling and it can reach an enormous cost.
MongoDB is a document database. It stores JSON-like documents inside collections and is schema-less - so each document can be different. This is great for flexibility in your application. A lot of developers say that noSql solutions are developed more for programmers and they tend to be much easier to build with (in my experience). In addition MongoDB scales horizontally by sharding the database into chunks. In fact this can even be automated now.
But there are drawbacks to using MongoDB. If you are using it in production you really MUST put in a replication slave with it. This is because MongoDB does not have full single server durability. So if you suffer a power failure you will probably have to repair the entire MongoDB database which can take hours. This is probably not a big deal cost-wise if you are well-funded but if you are a new organisation with little money it can be difficult (use cloud computing? ). In addition MongoDB does not support transactions which is necessary to guarantee Atomicity and Isolation. Finally MongoDB is only eventually consistent (though I have seen a few sides to this argument) - which means that when a write happens all other processes are not GUARANTEED to see the information straight away - only eventually.
In my opinion if you are storing artist information and meta-data about tracks then MongoDB would be a good solution. If you were storing user data, billing data etc then store it in MySQL.
There is only one correct answer to the question as it is: change your current solution only if you are having performance troubles or you are expecting vast increase in traffic and have measured (Via stress tests) that your architecture doesn't fit.
Otherwise - there is no need to even evaluate alternatives.
链接地址: http://www.djcxy.com/p/86390.html上一篇: Solr与ElasticSearch
下一篇: 在MySQL上使用NoSQL数据库