NoSQL with analytic functions
I'm searching for a NoSQL system (preferably open source) that supports analytic functions (AF for short) the way Oracle, SQL Server, and Postgres do. I didn't find any with built-in functions. I've read something about Hive, but it doesn't actually offer AF features (windows, first/last values, ntiles, lag, lead and so on), just histograms and ngrams. Also, some NoSQL systems (Redis, for example) support map/reduce, but I'm not sure whether AF can be replaced with it.
I want to make a performance comparison to choose between Postgres and a NoSQL system.
So, in short:
1 - Are there NoSQL systems with AF?
2 - Can map/reduce replace AF? Is it fast, reliable, and easy to get started with?
P.S. I tried to make my question more constructive.
Some functions need knowledge of all the existing data, namely those that involve some kind of aggregation (avg, median, standard deviation) or some ordering (first, last).
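To make that concrete, here is a minimal sketch in plain Python (no real database involved; `node_a`, `node_b`, `partial_avg` and `merge_avg` are made-up names for illustration). It assumes the data is split across two nodes. An average can still be assembled from small per-node summaries, but a median computed per node and then combined gives the wrong answer, so the values have to be brought together somewhere.

```python
from statistics import median

# Hypothetical data, split across two nodes of a distributed store.
node_a = [1, 2, 3]
node_b = [4, 5, 6, 7, 100]

def partial_avg(values):
    # Each node only ships a (sum, count) pair, not its raw values.
    return (sum(values), len(values))

def merge_avg(partials):
    # The coordinator combines the pairs into the global average.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

global_avg = merge_avg([partial_avg(node_a), partial_avg(node_b)])

# Combining per-node medians does NOT give the global median.
median_of_medians = median([median(node_a), median(node_b)])
global_median = median(node_a + node_b)

print(global_avg)         # 16.0
print(median_of_medians)  # 4
print(global_median)      # 4.5
```

So even among aggregates, how much of the data has to be seen in one place differs from function to function.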
If you want a distributed NoSQL solution that supports AF out of the box, the system will need to rely on some centralized indexing and metadata to keep track of the data on all nodes, which means having a master node and probably a single point of failure.
You have to ask what you expect to accomplish by using NoSQL. Do you want schemaless tables? Distributed data? Better raw performance for very simple queries?
Depending on your needs, I see three main alternatives here:
1 - Use a distributed NoSQL system with no single point of failure (e.g. Cassandra) to store your data, and use map/reduce to process it and produce the results for the desired function (almost every major NoSQL solution supports Hadoop). The caveat is that map/reduce queries are not real-time (a query can take minutes or hours to execute) and require extra setup and learning.
2 - Use a traditional RDBMS that supports multiple servers, like MySQL Cluster.
3 - Use a NoSQL system with a master/slave topology that supports ad-hoc and aggregation queries, like Mongo.
As for the second question: yes, you can rely on M/R to replace AF. You can do almost anything with M/R.
Once you've really understood how MapReduce works, you can do amazing things with a few lines of code.
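As an illustration, here is a rough in-memory sketch in plain Python of how something like `LAG(value) OVER (PARTITION BY sensor ORDER BY ts)` could be expressed as a single map/reduce. Nothing here is a real framework API: `rows`, `map_phase`, `reduce_phase` and `run_map_reduce` are hypothetical names, and the "shuffle" is just a dictionary. The idea is that the map step emits the partition key, and the reduce step sorts each partition and walks it once.

```python
from collections import defaultdict

# Hypothetical input rows; in a real system these would live in the store.
rows = [
    {"sensor": "a", "ts": 1, "value": 10},
    {"sensor": "b", "ts": 1, "value": 7},
    {"sensor": "a", "ts": 2, "value": 12},
    {"sensor": "a", "ts": 3, "value": 9},
    {"sensor": "b", "ts": 2, "value": 8},
]

def map_phase(row):
    # Emit (partition key, record): plays the role of PARTITION BY.
    yield row["sensor"], (row["ts"], row["value"])

def reduce_phase(key, records):
    # Sort by timestamp (ORDER BY), then carry the previous value along (LAG).
    previous = None
    for ts, value in sorted(records):
        yield {"sensor": key, "ts": ts, "value": value, "lag": previous}
        previous = value

def run_map_reduce(rows):
    # Toy stand-in for the shuffle step: group mapped records by key.
    groups = defaultdict(list)
    for row in rows:
        for key, record in map_phase(row):
            groups[key].append(record)
    output = []
    for key in sorted(groups):
        output.extend(reduce_phase(key, groups[key]))
    return output

result = run_map_reduce(rows)
for row in result:
    print(row)
```

Each partition ends up in one reducer call, so this only works comfortably when a single partition fits on one node.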
Here is a nice video course:
http://code.google.com/intl/fr/edu/submissions/mapreduce-minilecture/listing.html
The real difficulty lies in the difference between functions that you can implement with a single MapReduce and those that need chained MapReduces. Moreover, some nice MapReduce implementations (like CouchDB) don't allow you to chain MapReduces (easily).
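A small sketch of what chaining means, again in plain Python with a toy in-memory runner (`run_job` and all the mapper/reducer names are hypothetical, not any product's API). Finding the best-selling product per region needs two passes: the first job sums sales per (region, product), and the second job takes that output as its input and picks the top product in each region.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    # Toy single-process MapReduce: map, group by key, then reduce.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical sales records: (region, product, amount).
sales = [
    ("north", "apples", 5),
    ("north", "pears", 9),
    ("north", "apples", 6),
    ("south", "pears", 4),
    ("south", "apples", 3),
]

# Job 1: total amount per (region, product).
def map_sales(record):
    region, product, amount = record
    yield (region, product), amount

def reduce_totals(key, amounts):
    region, product = key
    yield (region, product, sum(amounts))

# Job 2: consumes job 1's output and keeps the top product per region.
def map_totals(record):
    region, product, total = record
    yield region, (total, product)

def reduce_best(region, totals):
    total, product = max(totals)
    yield (region, product, total)

totals = run_job(sales, map_sales, reduce_totals)
best = run_job(totals, map_totals, reduce_best)
print(best)  # [('north', 'apples', 11), ('south', 'pears', 4)]
```

The second job cannot start until the first has finished, which is exactly the part that is awkward on systems that don't let you feed one MapReduce's output into another.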