How does column-oriented NoSQL differ from document-oriented?

2018-06-30 23:32:42

The three types of NoSQL databases I've read about are key-value, column-oriented, and document-oriented.

Key-value is pretty straightforward: a key with a plain value.

I've seen document-oriented databases described as being like key-value stores, except that the value can be a structure, such as a JSON object. Each "document" can have all, some, or none of the same keys as another.

Column-oriented seems to be very much like document-oriented in that you don't specify a structure.

So what is the difference between these two, and why would you use one over the other?

I've specifically looked at MongoDB and Cassandra. I basically need a dynamic structure that can change without affecting other values. At the same time, I need to be able to search/filter on specific keys and run reports. In CAP terms, AP (availability and partition tolerance) is the most important to me. The data can "eventually" be synced across nodes, as long as there is no conflict or loss of data. Each user would get their own "table".

In Cassandra, each row (addressed by a key) contains one or more "columns". Columns are themselves key-value pairs. The column names need not be predefined, i.e., the structure isn't fixed. Columns in a row are stored in sorted order according to their keys (names).

In some cases, you may have a very large number of columns in a row (e.g., to act as an index that enables particular kinds of query). Cassandra can handle such large structures efficiently, and you can retrieve specific ranges of columns.
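To make the sorted-columns idea concrete, here is a minimal in-memory sketch, assuming a single row held in memory. The `Row` class and its `insert`/`slice` methods are hypothetical illustrations, not Cassandra's API. It shows that column names need no schema, and that because names are kept sorted, a contiguous range of columns can be located by binary search instead of scanning the whole row.

```python
from bisect import bisect_left, bisect_right


class Row:
    """Toy model of one row: arbitrary column names, kept in sorted order (hypothetical)."""

    def __init__(self):
        self._names = []   # column names, always sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # No schema: any new column name is accepted and placed in sorted position.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in sorted order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = Row()
for day in ["2018-06-03", "2018-06-01", "2018-06-30", "2018-06-15"]:
    row.insert(day, f"event@{day}")  # e.g. one column per event, named by date

print([name for name, _ in row.slice("2018-06-01", "2018-06-15")])
# ['2018-06-01', '2018-06-03', '2018-06-15']
```

Using dates as column names is what lets the row double as a time-ordered index; a real store keeps this ordering in its on-disk structures and at far larger scale.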

There is a further level of structure (not so commonly used) called super-columns, where a column contains nested (sub)columns.

You can think of the overall structure as a nested hashtable/dictionary, with 2 or 3 levels of key.

Normal column family:

row
    col  col  col ...
    val  val  val ...

Super column family:

row
      supercol                      supercol                     ...
          (sub)col  (sub)col  ...       (sub)col  (sub)col  ...
           val       val      ...        val       val      ...
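The two diagrams can be modelled as nested Python dictionaries. This is only a sketch of the data shape; the names `users` and `addresses` and their contents are made up for illustration.

```python
# Normal column family: row key -> column name -> value (2 levels of key).
users = {
    "alice": {"email": "alice@example.com", "city": "Oslo"},
    "bob": {"email": "bob@example.com"},  # rows need not share the same columns
}

# Super column family: row key -> supercolumn -> subcolumn -> value (3 levels of key).
addresses = {
    "alice": {
        "home": {"street": "1 Main St", "city": "Oslo"},
        "work": {"street": "9 Dock Rd", "city": "Bergen"},
    },
}

print(users["bob"].get("city"))            # None: bob has no "city" column
print(addresses["alice"]["work"]["city"])  # Bergen
```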

There are also higher-level structures - column families and keyspaces - which can be used to divide up or group together your data.

See also this question: "Cassandra: What is a subcolumn"

Or see the data modelling links at http://wiki.apache.org/cassandra/ArticlesAndPresentations

Re: comparison with document-oriented databases: the latter usually insert whole documents (typically JSON), whereas in Cassandra you can address individual columns or supercolumns and update them individually, i.e., they work at a different level of granularity. Each column has its own timestamp/version (used to reconcile updates across the distributed cluster).
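Because each column carries its own timestamp, two replicas holding different versions of a row can be reconciled column by column rather than document by document. Below is a simplified last-write-wins merge, assuming each replica maps a column name to a `(timestamp, value)` pair. The `reconcile` function is hypothetical; it ignores deletions (tombstones) and simply keeps the first value seen when timestamps tie.

```python
def reconcile(*replicas):
    """Merge copies of one row column by column, keeping the newest timestamp.

    Each replica maps column name -> (timestamp, value).
    On equal timestamps, the value seen first is kept.
    """
    merged = {}
    for replica in replicas:
        for name, (ts, value) in replica.items():
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    return merged


# Two replicas that each received a different update to the same row.
replica_a = {"email": (100, "old@example.com"), "city": (205, "Bergen")}
replica_b = {"email": (210, "new@example.com"), "city": (150, "Oslo")}

merged = reconcile(replica_a, replica_b)
print({name: value for name, (ts, value) in merged.items()})
# {'email': 'new@example.com', 'city': 'Bergen'}
```

Neither replica wins as a whole: the merged row takes `email` from one and `city` from the other. A store that only versioned whole documents would have to choose one of the two versions.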

Cassandra column values are just bytes, but they can be typed as ASCII, UTF-8 text, numbers, dates, etc.

Of course, you could use Cassandra as a primitive document store by inserting columns containing JSON - but you wouldn't get all the features of a real document-oriented store.
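A sketch of that trade-off, using a plain dict as a stand-in for a column store (hypothetical, not a real client API). With the whole object stored as one JSON column, changing a single field is a read-modify-write of the entire blob, and the store itself only sees opaque bytes; with one column per field, a single value can be overwritten directly.

```python
import json

# Hypothetical in-memory stand-in for a column store: row key -> {column name: bytes}.
store = {}

# Option 1: the whole profile as a JSON blob in a single column.
store["alice"] = {"profile": json.dumps({"email": "a@example.com", "city": "Oslo"}).encode()}

# Changing one field means: read, decode, modify, re-encode, write the whole blob back.
doc = json.loads(store["alice"]["profile"])
doc["city"] = "Bergen"
store["alice"]["profile"] = json.dumps(doc).encode()

# Option 2: one column per field; a single column can be written on its own.
store["bob"] = {"email": b"b@example.com", "city": b"Oslo"}
store["bob"]["city"] = b"Bergen"  # no need to read the rest of the row first

# The store cannot look inside the blob: filtering on "city" would require
# the application to decode every profile itself.
print(json.loads(store["alice"]["profile"])["city"], store["bob"]["city"].decode())
# Bergen Bergen
```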

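The main difference is that document stores (e.g., MongoDB and CouchDB) allow arbitrarily complex documents, i.e., subdocuments within subdocuments, lists of documents, and so on, whereas column stores (e.g., Cassandra and HBase) only allow a fixed format, such as a strict one-level or two-level dictionary.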
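To fit a nested document into a store that offers only one or two levels of keys, the nesting has to be encoded somewhere, for example in the column names. The sketch below flattens a document into dotted column names; the `flatten` helper and the dotted-name convention are assumptions made for illustration, not something Cassandra or HBase prescribes.

```python
def flatten(doc, prefix=""):
    """Flatten a nested document into single-level {"a.b.c": value} columns.

    Hypothetical helper: dicts become dotted names, list items are named by index.
    """
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            columns.update(flatten(value, name + "."))
        elif isinstance(value, list):
            as_dict = {str(i): item for i, item in enumerate(value)}
            columns.update(flatten(as_dict, name + "."))
        else:
            columns[name] = value
    return columns


order = {
    "customer": {"name": "Alice", "address": {"city": "Oslo"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
print(flatten(order))
# {'customer.name': 'Alice', 'customer.address.city': 'Oslo',
#  'items.0.sku': 'A1', 'items.0.qty': 2, 'items.1.sku': 'B7', 'items.1.qty': 1}
```

A document store keeps `order` in its nested form and can query into it directly; in a fixed-depth column store, the application has to maintain a mapping like this itself.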

For inserts (to use RDBMS terms), document-based stores are more consistent and straightforward. Note that Cassandra lets you achieve consistency through quorums, but that doesn't apply to all column-based systems, and it reduces availability. On a write-once, read-heavy system, go for MongoDB. Also consider it if you always plan to read the whole structure of the object. A document-based system is designed to return the whole document when you fetch it, and is not very strong at returning parts of it.
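The quorum idea reduces to simple arithmetic. Assuming a replication factor N, a quorum is a majority, floor(N/2) + 1 replicas, and if a read contacts R replicas and a write is acknowledged by W replicas with R + W > N, the two sets must overlap, so every read reaches at least one replica holding the latest acknowledged write. A simplified single-datacenter sketch (the function names are hypothetical) that also shows the availability cost:

```python
def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every set of r read replicas must overlap every set of w write replicas."""
    return r + w > n


n = 3
q = quorum(n)
print(q, reads_see_latest_write(n, q, q))  # 2 True
print(reads_see_latest_write(n, 1, 1))     # False: reading/writing one replica may return stale data

# Availability cost: a quorum request still succeeds with n - q replicas down,
# but fails as soon as more than that are unavailable.
print(n - q)  # 1
```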

Column-based systems like Cassandra are much better than document-based ones at updates. You can change the value of a column without even reading the row that contains it. The write doesn't actually need to happen on the same server; a row may be spread across multiple files on multiple servers. For huge, fast-evolving data sets, go for Cassandra. Also consider it if you plan to have very large chunks of data per key and won't need to load all of it on every query. For selects, Cassandra lets you load only the columns you need.
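A toy model of both points, assuming each entry in `fragments` stands for a separate file (or server) holding part of a row; `write` and `read` are hypothetical helpers, not Cassandra's storage engine. A write just appends new column values with a timestamp and never reads the row first; a read merges the fragments, the newest timestamp winning, and returns only the columns asked for.

```python
import itertools

_clock = itertools.count(1)  # stand-in for write timestamps
fragments = []               # each fragment: {row key: {column: (timestamp, value)}}


def write(row_key, **columns):
    """Blind write: record new column values without reading the existing row."""
    ts = next(_clock)
    fragments.append({row_key: {name: (ts, value) for name, value in columns.items()}})


def read(row_key, *wanted):
    """Merge the row's fragments, returning only the requested columns."""
    result = {}
    for fragment in fragments:
        for name, (ts, value) in fragment.get(row_key, {}).items():
            if name in wanted and (name not in result or ts > result[name][0]):
                result[name] = (ts, value)
    return {name: value for name, (ts, value) in result.items()}


write("sensor-1", temp=21.5, humidity=40)
write("sensor-1", temp=22.0)  # updates one column; nothing is read first
write("sensor-1", firmware="1.4.2")

print(read("sensor-1", "temp", "firmware"))  # {'temp': 22.0, 'firmware': '1.4.2'}
```

In a real system the benefit of selecting only some columns also lies in not reading the others from disk or over the network; this sketch only illustrates the interface.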

Also consider that MongoDB is written in C++ and is at its second major release, while Cassandra needs to run on a JVM, and its first major release has only been in release-candidate status since yesterday (though the 0.x releases are already in production at major companies).

On the other hand, Cassandra's design was partly based on Amazon's Dynamo, and it is built at its core to be a high-availability solution, but that has nothing to do with the column-based format. MongoDB scales out too, but not as gracefully as Cassandra.
