MongoDB/CouchDB for storing files + replication?

If I want to store a lot of files and replicate the database, which NoSQL database would be best for the job?

I have been testing MongoDB and CouchDB, and both are really nice and easy to use. If possible, I would use one of them for storing files. I can see the differences between Mongo and Couch, but I cannot tell which one is better for storing files. By files I mean mostly 10-50 MB, but possibly also 50-500 MB, and maybe with a lot of updates.

I found a nice table here:

http://weblogs.asp.net/britchie/archive/2010/08/17/document-databases-compared-mongodb-couchdb-and-ravendb.aspx

I am still not sure which of these properties matter most for file storage and replication. Or maybe I should choose another NoSQL database?


That table is way out of date:

  • Master-slave replication has been deprecated in favour of replica sets, for starters, and the consistency entry is wrong as well. You will want to completely re-read the replication section of the MongoDB docs.
  • Map/Reduce is JavaScript only; there are no other languages.
  • I have no idea what that table means by attachments, but GridFS is a storage convention built into the drivers that makes storing large files in MongoDB easier. It splits a file into chunks stored as separate documents, so files larger than the 16 MB document limit can be stored. Metadata is also supported through this method.
  • MongoDB is on version 2.2, so anything the table says about earlier versions is now obsolete (i.e. sharding and single-server durability).
  • I do not have personal experience with CouchDB's interface for storing files; however, I wouldn't be surprised if there were hardly any differences between the two. I think this part is too subjective for us to answer, and you will just need to go with whichever suits you better.
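
To make the GridFS point concrete, here is a minimal pure-Python sketch of the layout it uses: one document describing the file plus a series of chunk documents. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`) follow the GridFS convention, but the function names, the 255 KiB chunk size and the `sha256` metadata entry are assumptions made for illustration. A real driver does all of this for you.

```python
import hashlib
import uuid


def to_gridfs_documents(data: bytes, filename: str, chunk_size: int = 255 * 1024):
    """Split `data` into one files document plus a list of chunk documents."""
    # Stand-in for the ObjectId a real driver would generate (assumption).
    file_id = uuid.uuid4().hex
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        # Free-form metadata travels with the file document.
        "metadata": {"sha256": hashlib.sha256(data).hexdigest()},
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order of `n`."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
```

Because every chunk is its own small document, a 500 MB file never has to fit into a single document; it just becomes a few thousand chunk documents that replicate like any other data.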

It is actually possible to build multi-regional MongoDB clusters (S3 buckets are not multi-regional and cannot be replicated that way without extra work) and to replicate the most accessed files in a specific part of the world to these clusters through MongoDB.

The main upshot I have found at times is that MongoDB can act like S3 and CloudFront put together, which is great, since you get redundant storage and the ability to distribute your data.

That being said, S3 is a very valid option here and I would seriously give it a try; you might not be looking for the same things as me in a content network.

Storing files in a database does not come without serious downsides. However, speed shouldn't be a huge problem here, since you should get about the same speed from S3 without CloudFront in front of it as from MongoDB (remember, S3 is a redundant storage network, not a CDN).

If you were to use S3, you would then store a row in your database that points to the file and holds metadata about it.
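
As a rough illustration of that "pointer row" idea, the sketch below keeps one row per stored object. It uses SQLite only so that it runs anywhere; the table name, column names and helper functions are all hypothetical, and the actual upload to S3 is deliberately left out.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata catalog that points at stored objects."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               id           INTEGER PRIMARY KEY,
               bucket       TEXT NOT NULL,
               object_key   TEXT NOT NULL,
               size_bytes   INTEGER NOT NULL,
               content_type TEXT,
               UNIQUE (bucket, object_key)
           )"""
    )
    return conn


def record_file(conn, bucket, object_key, size_bytes, content_type=None) -> int:
    """Insert the pointer row after the object has been uploaded elsewhere."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO files (bucket, object_key, size_bytes, content_type) "
            "VALUES (?, ?, ?, ?)",
            (bucket, object_key, size_bytes, content_type),
        )
    return cur.lastrowid


def locate_file(conn, file_id):
    """Return (bucket, object_key) for a file id, or None if it is unknown."""
    return conn.execute(
        "SELECT bucket, object_key FROM files WHERE id = ?", (file_id,)
    ).fetchone()
```

Writing the row only after the upload has succeeded means a failed upload never leaves a pointer to a missing object, although an orphaned object without a row is still possible.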


There is a project called CBFS by Dustin Sallings (one of the Couchbase founders, creator of spymemcached and a core contributor to memcached) and Marty Schoch that uses Couchbase and Go.

It's an infinite-node file store with redundancy and replication: basically your very own S3 that supports lots of different hardware and sizes. It uses REST over HTTP (PUT/GET/DELETE, etc.), so it is very easy to use. Very fast, very powerful.
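
A file store that speaks plain HTTP verbs can be driven from any language with nothing but an HTTP client. The sketch below only builds the requests and does not send them; the host name, port and path are placeholders, and the helper `file_request` is hypothetical, so check the protocol page for the details of a real deployment.

```python
import urllib.request
from typing import Optional

ALLOWED_METHODS = {"PUT", "GET", "DELETE"}


def file_request(method: str, base_url: str, path: str,
                 body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) a request that stores, fetches or deletes a file."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    # Join base URL and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=body, method=method)


# Placeholder endpoint; pass the request to urllib.request.urlopen() to send it.
upload = file_request("PUT", "http://cbfs.example:8484", "/docs/report.pdf", b"...")
```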

CBFS on GitHub: https://github.com/couchbaselabs/cbfs

Protocol: https://github.com/couchbaselabs/cbfs/wiki/Protocol

Blog post: http://dustin.github.com/2012/09/27/cbfs.html

Diverse hardware: https://plus.google.com/105229686595945792364/posts/9joBgjEt5PB

Other cool visuals:

http://www.youtube.com/watch?v=GiFMVfrNma8

http://www.youtube.com/watch?v=033iKVvrmcQ

Contact me if you have questions and I can put you in touch.


Have you considered Amazon S3 as an option? It's highly available, proven, and has redundant storage.

CouchDB, even though I personally like it a lot as it works very well with node.js, has the disadvantage that you need to compact it regularly if you don't want to waste too much disk space. In your case, if you are going to be doing a lot of updates to the same documents, that might be an issue.

I can't really comment on MongoDB as I haven't used it, but again, if file storage is your main concern, have a look at S3 and similar services, as they are completely focused on file storage.

You could combine the two, storing your metadata in a NoSQL or SQL datastore and your actual files in a separate file store, but keeping those two stores in sync and replicated might be tricky.
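
For reference, compaction is triggered over HTTP with a POST to the database's `_compact` endpoint, sent with a JSON content type. The sketch below builds that request with the standard library; the base URL and database name are placeholders, the helper names are hypothetical, and authentication is omitted even though a secured server would normally require admin credentials.

```python
import urllib.request
from urllib.parse import quote


def build_compact_request(base_url: str, db_name: str) -> urllib.request.Request:
    """Build the POST request that asks CouchDB to compact one database."""
    url = f"{base_url.rstrip('/')}/{quote(db_name, safe='')}/_compact"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint only needs the POST itself
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compact(base_url: str, db_name: str) -> int:
    """Send the request and return the HTTP status code (needs a live server)."""
    with urllib.request.urlopen(build_compact_request(base_url, db_name)) as resp:
        return resp.status
```

Running something like `compact("http://localhost:5984", "files")` from a scheduled job is one way to keep disk usage in check when the same documents are updated often.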
