Aggregate functions, Lack of Types, Entropy in Query problems on NoSQL DBs
While we believe that NoSQL Databases have come to fill a number of gaps which are challenging on the side of RDBMS, i have had several challenges over time with NoSQL DBs in the area of their query eco-system.
Couchbase for example, like its mother CouchDB have had major improvements in reading data using views, lists, Key lookups, map reduce, etc Couchbase has even moved to create an SQL-like query engine for their huge 2.X verson. MongoDB has also made serious improvements and complex queries are possible on it and many other NoSQL DB developments going on out there.
Most NoSQL DBs can perform Complex queries based on LOGICAL and COMPARISON OPERATORS
eg AND
, OR
, ==
etc However, aggergation and performing complex relations on data are a problem on my part. For example, in CouchDB and/or Couchbase, Views span only a single DB. It is not possible to write a view which will aggregate data from two or more databases.
Let me now get to the problem. Functions (whether aggregate or not): AVG
, SUM
, ROUND
, TRUNC
, MAX
, MIN
, etc The lack of data types makes it impossible to efficiently work with Date and Times hence the lack of Date and time functions eg TO_DATE
, SYSDATE
(for system date/time), ADD_MONTHSs
, DATE BETWEEN
, DATE/TIME format Conversion
etc It is true, that many will say that , they lack Schemas, types and stuff, but, i have found myself not running away from the need for atleast any one of the functions listed up there. For example because NoSQL DBs have no Date/Time data type, it is hard to perform queries based on those, because you might want to analyse trends based on time. Also, others have tried to use UNIX/EPOC Time stamps and stuff to solve this but it aint a single size fits all
solution.
Map Reduce can be used to attain aggregation to a certain (small) degree, but the overhead has been realised to be great. However, the lack of GROUP BY
functionality makes it a straineous solution to filter through what ou want. Look at the query below:
SELECT doc.field1, doc.field3, SUM(doc.field2 + doc.field4) FROM couchdb.my_database GROUP BY doc.field1, doc.field3 HAVING SUM(doc.field2 + doc.field4) > 20000;
This is not very easy to attain on CouchDB or Couchbase. i am not sure if its possible on MongoDB. I wish it were possible out of the box. This has made it difficult to use NoSQL as a Data warehouse or OLTP/OLAP solution. I found that, each time a complex analysis needs to be made, one needs to do it in the middle ware by paging through different datasets. Now, most experienced Guys (eg CLOUDANT
) have tweaked LUCENE
to perform complex queries, but because it was initially meant for indexing and text search, it has not solved the lack of FUNCTIONS
and DATA AGGREGATION
on most NoSQL DBs.
Because of lack of FUNCTIONS
, most NoSQL DBs have the NULL
data type but lack the option of converting NULL
Objects to something else, like it is in some RDBMS. For example in Oracle, i could: NVL(COLUMN,0)
in order to include all the rows while performing say an AVG
calculation on a given column (since say, by default the null columns will not be counted/included in the query processing).
To fully understand the problem, CouchDB views for example operate within the scope of a doc like this below:
function(doc){ // if statements, logical operators, comparison operators // e.t.c here. until you do am emit of that doc // if it satisfies the conditions set // emit(null, doc) OR emit(doc.x,[doc.y, doc.z]) e.t.c. // you can only emit javascript data types anyways emit(doc.field1,doc) }The docs which satisfy the filters, are let through and go onto the next stage or to a reduce function. Imagine a doc structure like this below:
{ x: '', y: '', z: { p: '', n: N // integer or number data type }, date: 'DD/MON/YYYY' // date format }
Now, lets imagine the possibility of this kind of query:
function(){ var average = select AVG(doc.z.n) from couchdb.my_database; var Result = select doc.x,doc.y from couchdb.my_database where doc.z.n > average and doc.y = 'some string' and doc.date between '01-JUN-2012' and '03-AUG-2012'; emit(Result); }
OR if this query were possible:
function(){ var latest = select MAX(doc.date) from couchdb.my_database; var Result = select doc.x,doc.z.p,MONTHS_BETWEEN(doc.date,latest) as "Months_interval" from couchdb.my_database where doc.y like '%john%' order by doc.z.p; emit(Result); }Qn 1: Which NoSQL Database solution has attained to a great degree, the query capability being talked about in the details above ? what key features make it stand out ?
Qn 2: Is the lack of a Schema, or the characteristic of being
Key-Value
a reason for the lack of FUNCTIONS
in Querying these Databases ? What is the reason for the lack of Aggregate functionality in most NoSQL DBs ? Qn 3: If the query ability above is possible in any of the NoSQL DBs, show how the last two (2) query problems above can be attained using the existing NoSQL infrastracture (consider any NoSQL technology of your choice)
MongoDB has something called Aggregation Framework and it works pretty well. I would say that almost every SQL Aggregation query could be carried out with this framework. Here you have some examples of "conversion" from SQL to Aggregation Framework.
Anyway MongoDB is a document oriented database and not key-value like CouchDB, so I don't know if it fits your requirements.
链接地址: http://www.djcxy.com/p/66412.html