How to take the average of big data in MongoDB vs CouchDB?
I'm looking at this chart...
http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid
...which says:
Query Method
CouchDB - Map/reduce of javascript functions to lazily build an index per query
MongoDB - Dynamic; object-based query language
What exactly does this mean? For example, if I want to take an average of 1,000,000,000 values, does CouchDB automatically do it in a MapReduce way?
Can someone walk me through how to take an average of 1,000,000,000 values with both systems... this would be a very illuminating example.
Thanks.
CouchDB's views are a strange and fascinating beast.
CouchDB does incremental map/reduce: once you specify your "view", it works somewhat like a materialized view in a relational database. Once the view is built and up to date, it does not matter whether you are averaging 3 or 3 billion documents; the result is already there.
But there is a threefold gotcha in there:
1) Querying is fast once the view has been created and is up to date. View creation can be slow if you have lots of small documents (if possible, go with fatter documents). Once the view is created, the intermediate reduction results are stored inside the B-tree nodes and you won't have to recompute them.
2) Views are updated lazily, when you query them. To get predictable performance, you had better set up some sort of job to update them regularly. See: How do you Schedule Index Updates in CouchDB
3) You need to have a pretty good idea of how you'll query your data with composite keys, ranges and grouping. CouchDB is poor at ad-hoc querying. http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
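To make the first point concrete, here is a rough sketch of why stored intermediate reductions are cheap to combine. It assumes a custom reduce that keeps a partial {sum, count} pair; the function name sumCountReduce and the two "leaf" arrays are made up for illustration, and the calls at the bottom only imitate what CouchDB does internally with its B-tree nodes.

```javascript
// Custom reduce in the CouchDB style: (keys, values, rereduce).
// Written in plain ES5 so it could also be pasted into a design document.
function sumCountReduce(keys, values, rereduce) {
  var result = { sum: 0, count: 0 };
  var i;
  if (rereduce) {
    // values are partial {sum, count} results from earlier reduce calls
    for (i = 0; i < values.length; i++) {
      result.sum += values[i].sum;
      result.count += values[i].count;
    }
  } else {
    // values are the raw numbers emitted by the map function
    for (i = 0; i < values.length; i++) {
      result.sum += values[i];
      result.count += 1;
    }
  }
  return result;
}

// Hypothetical stand-ins for two leaf nodes holding emitted values.
var leafA = sumCountReduce(null, [64, 75], false);
var leafB = sumCountReduce(null, [58], false);

// The parent combines the stored partials without revisiting any document.
var root = sumCountReduce(null, [leafA, leafB], true);
var average = root.sum / root.count;
```

Because each partial result is small and mergeable, a changed document only forces recomputation along one path of the tree, not over the whole data set.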
I'm sure someone will soon post the details of how to average 1,000,000,000 items in both databases, but you have to understand that CouchDB makes you do more upfront work in order to benefit from its incremental approach. It's really something quite unique, but it is not really intended for scenarios where you're computing averages (or anything else) over ad-hoc queried data.
In Mongo, you can use either map/reduce (not incremental: it will matter whether you are averaging 3 or 3 billion documents, although Mongo is considered to be very fast thanks to its memory-mapped I/O approach) or its aggregation features. http://www.mongodb.org/display/DOCS/Aggregation
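As a rough sketch of the MongoDB side, assuming a version that has the aggregation pipeline and a hypothetical collection named people whose documents carry a numeric height field, an average can be expressed with $group and $avg. The collection and field names are assumptions for illustration only.

```javascript
// One group (_id: null) over every document, averaging the "height" field
// and counting how many documents contributed.
var pipeline = [
  {
    $group: {
      _id: null,
      avgHeight: { $avg: "$height" },
      count: { $sum: 1 }
    }
  }
];

// In the mongo shell, against the hypothetical `people` collection:
//   db.people.aggregate(pipeline)
```

Unlike a CouchDB view, this is computed at query time, so the cost grows with the number of documents scanned.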
I cannot speak about MongoDB, but I can tell you about CouchDB. CouchDB can only be natively queried via a Map/Reduce View Engine. In fact, a great place to start is this section of the wiki.
A view contains a map function, and an optional reduce function. The typical language for writing these functions is JavaScript, but there is an Erlang option available, and it is possible to build a view engine in just about any other programming language.
The map function serves to build a data-set out of the documents in the database. The reduce function serves to aggregate that data-set. The map function is run on every single document in the database when the view is created (that is, when it is first queried). After that, the function only runs on documents that are newly created, modified or deleted. As such, view indexes are built incrementally, not dynamically.
In the case of 1,000,000,000 values, CouchDB will not need to calculate the results of your query every single time it's requested. Instead, it will only report on the value of the view index it has stored, which itself only changes whenever a document is created/updated/deleted.
As far as writing Map/Reduce functions goes, a lot of that work is left up to the programmer, as there are no built-in map functions (i.e. it's not "automatic"). However, there are a few native reduce functions (_sum, _count, _stats) available.
Here's a simple example: we'll calculate the average height of some people.
// sample documents
{ "_id": "Dominic Barnes", "height": 64 }
{ "_id": "Some Tall Guy", "height": 75 }
{ "_id": "Some Short(er) Guy", "height": 58 }
// map function
function (doc) {
  // first param is "key", which we do not need since `_id` is stored anyway
  emit(null, doc.height);
}
// reduce function
_stats
The results of this view would look like this:
{
  "rows": [
    {
      "key": null,
      "value": {
        "sum": 197,
        "count": 3,
        "min": 58,
        "max": 75,
        "sumsqr": 13085
      }
    }
  ]
}
Calculating the average from here is as simple as dividing the sum by the count. If you want the average calculated within the view itself, you could check out this example.
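To check the numbers without a running CouchDB, the view can be imitated locally. This is only a sketch: emit is stubbed with a plain array, and the stats helper is a hand-written approximation of what _stats reports for numeric values, not CouchDB's own implementation.

```javascript
// Sample documents from the example above.
var docs = [
  { _id: "Dominic Barnes", height: 64 },
  { _id: "Some Tall Guy", height: 75 },
  { _id: "Some Short(er) Guy", height: 58 }
];

// Stub for CouchDB's emit(): collect emitted rows in an array.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the view.
function map(doc) {
  emit(null, doc.height);
}
docs.forEach(map);

// Local approximation of the _stats fields for a list of numbers.
function stats(values) {
  var result = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
  values.forEach(function (v) {
    result.sum += v;
    result.count += 1;
    result.min = Math.min(result.min, v);
    result.max = Math.max(result.max, v);
    result.sumsqr += v * v;
  });
  return result;
}

var viewValue = stats(rows.map(function (r) { return r.value; }));

// The average is the sum divided by the count.
var average = viewValue.sum / viewValue.count;
console.log(average.toFixed(2)); // 65.67
```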