How to achieve dimensional charting on large dataset?

I have successfully used combination of crossfilter, dc, d3 to build multivariate charts for smaller datasets.

My current system caters to 1.5 million txns a day and I want to use the above combination to show dimensional charts on this big sized data (spanned over 6 months). I cannot push this sized data to the frontend for obvious reasons.

The txn data has seconds level granularity but this level of granularity is not required in the visualization. If txn data can be rolled up to a granularity of a day at the backend and push the day based aggregation to the front end then it can drastically reduce the IO traffic and size of the data given to the crossfilter,dc and then dc can show its visualization magic.

Taking forward the above idea -> I decided to reduce the size of the data by reducing the granularity of the timeseries data from millseconds to day by pre-aggregating the data from various dimensions using the below GROUP BY query (this is similar to the stuff done by crossfilter but at the frontend)

SELECT TRUNC(DATELOGGED) AS DTLOGGED, CODE, ACTION, COUNT(*) AS TXNCOUNT, GROUPING_ID(TRUNC(DATELOGGED),CODE, ACTION) AS grouping_id FROM AAAA GROUP BY GROUPING SETS(TRUNC(DATELOGGED), (TRUNC(DATELOGGED),CURR_CODE), (TRUNC(DATELOGGED),ACTION));

Sample output of these rows: Tuples/Rows in which aggregation is done by (TRUNC(DATELOGGED),CODE) will have a common grouping_id 1 and by (TRUNC(DATELOGGED),ACTION) will have a common grouping_id 2

 //group by DTLOGGED, CODE
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"144","ACTION":"", "TXNCOUNT":69,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"376","ACTION":"", "TXNCOUNT":20,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"144","ACTION":"", "TXNCOUNT":254,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"376","ACTION":"", "TXNCOUNT":961,"GROUPING_ID":1},

//group by DTLOGGED, ACTION
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"ENROLLED_PURCHASE", "TXNCOUNT":373600,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"UNENROLLED_PURCHASE", "TXNCOUNT":48978,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"ENROLLED_PURCHASE", "TXNCOUNT":402311,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"UNENROLLED_PURCHASE", "TXNCOUNT":54910,"GROUPING_ID":2},

//group by DTLOGGED
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"", "TXNCOUNT":460732,"GROUPING_ID":3},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"", "TXNCOUNT":496060,"GROUPING_ID":3}];

Questions: These rows are are dis-joined ie not like usual rows where each row will have valid values for CODE and ACTION in a single row. After a selection is made in one of the graphs, the redrawing effect either removes the other graphs or shows no data on them. Please give me any troubleshooting help or suggest better ways to solve this?

http://jsfiddle.net/universallocalhost/5qJjT/3/


So there are a couple things going on in this question, so I'll try to separate them:

Crossfilter works with tidy data

http://vita.had.co.nz/papers/tidy-data.pdf

This means that you will need to come up with a naive method of filling in the nulls you're seeing (or if need be, in your initial query of the data, omit the nulled values. If you want to get really fancy, you could even infer the null values based off of other data. Whatever your solution, you need to make your data tidy prior to putting it into crossfilter.

Groups and Filtering Operations

txnVolByCurrcode = txnByCurrcode.group().reduceSum(function(d) {
    if(d.GROUPING_ID ===1) {
        return d.TXNCOUNT;
    } else {
        return 0;
    }
});

This is a filtering operation done on the reduction. This is something that you should separate. Allow that filtering to occur elsewhere (either in the visual, crossfilter itself, or in the query on the data).

This means your reduceSum's become:

var txnVolByCurrcode = txnByCurrcode.group().reduceSum(function(d) {
    return d.TXNCOUNT;
});

And if you would like the user to select which group to display:

var groupId = cfdata.dimension(function(d) { return d.GROUPING_ID; });
var groupIdGroup = groupId.group(); // this is an interesting name
dc.pieChart("#group-chart")
    .width(250)
    .height(250)
    .radius(125)
    .innerRadius(50)
    .transitionDuration(750)
    .dimension(groupId)
    .group(groupIdGroup)
    .renderLabel(true);

For an example of this working: http://jsfiddle.net/b67pX/

链接地址: http://www.djcxy.com/p/5596.html

上一篇: dc.js:使用reduceCount方法堆积面积图

下一篇: 如何在大型数据集上实现二维制图?