How to do a complex crossfilter reduction
I am trying to compute a metric for a dc.js chart using a custom reduction (reduceAdd, reduceRemove, etc.) and am having trouble figuring out how to code it.
I first wrote the logic outside of the reduce functions, and now need to replicate it inside them so that the charts can use it. The logic and the code are as follows.
Logic: for each unique contact_week (a date), find the maximum week_number, then, over the rows with that week_number, sum TOTCOUNT and DECAY_CNT and calculate the percentage (DECAY_CNT / TOTCOUNT).
Here is the original code without using crossfilter:
// Decay % logic
var dates = d3.map(filter1, function(d) { return d.CONTACT_WEEK; }).keys();
console.log(dates);
for (var i = 0; i < dates.length; i++) {
    // Rows belonging to this contact week
    var data1 = filter1.filter(function(d) { return d.CONTACT_WEEK == dates[i]; });
    // Highest week number among those rows
    var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER; });
    // Keep only the rows at that week number
    var data2 = data1.filter(function(d) { return d.WEEK_NUMBER == max; });
    var sum1 = d3.sum(data2, function(d) { return d.TOTCOUNT; });
    var sum2 = d3.sum(data2, function(d) { return d.DECAY_CNT; });
    console.log(sum1);
    var decay = sum2 / sum1 * 100;
    console.log(decay);
}
The first step in this is to identify unique values of dates (contact_week) - How do I go about doing this in the reduce functions as it's already a for loop that traverses through the data?
I guess for max etc, we can use reductio or some other logic as mentioned in comments, but I'm not really getting the approach/design to be followed here
Any help in approach/solutions will be highly appreciated.
UPDATE2 :
Trying a new approach using reductio js
Data explanation :
A few columns in my data: contact_week (dates); week_number (numbers from -4 to 6); decay_cnt (integers); totcount (integers); duration (ordinal values: pre, during and post).
Now, I need to calculate a percentage called decay %, which is calculated as follows: For each unique contact_week, find max of week_number, now for this filtered dataset, calculate sum (decay_cnt) / sum (totcount)
This has to be plotted in a barchart where the x-axis is duration and the metric - decay % is y axis
In pursuit of calculating the max of week-numbers of individual dates, I've plotted a bar chart for now, with contact_week as x-axis and max of week_number as the y-axis. How do I get the chart that I need?
Code :
dateDimension2 = ndx.dimension(function(d) {return d.CONTACT_WEEK ;});
decayGroup = reductio().max(function (d) { return d.WEEK_NUMBER; })(dateDimension2.group());
chart2
    .width(500)
    .height(200)
    .x(d3.scale.ordinal())
    //.x(d3.scale.ordinal().domain(["DURING","POST1"]))
    .xUnits(dc.units.ordinal)
    //.xUnits(function(){return 10;})
    //.brushOn(false)
    .yAxisLabel("Decay (in %)")
    // Use the same dimension the group was built from
    .dimension(dateDimension2)
    .group(decayGroup)
    .gap(10)
    .elasticY(true)
    //.yAxis().tickValues([0, 5, 10, 15])
    //.title(function(d) { return d.key + ": " + d3.round(d.value.new_count,2); })
    /*.valueAccessor(function (p) {
        //return p.value.count > 0 ? (p.value.dec_total / p.value.new_count) * 100 : 0;
        return p.value.decay ;
    })*/
    .valueAccessor(function(d) { return d.value.max; })
    .on('renderlet', function(chart) {
        chart.selectAll('rect').on("click", function(d) {
            console.log("click!", d);
        });
    })
    .yAxis().ticks(5);
Any approach/suggestions will be highly appreciated
I think the solution mostly lies in the fake groups/dimensions and reduction js combined approach. Any alternatives are most welcome!
I've just added a FAQ and an example for this kind of problem.
As explained there, the idea is to maintain an array of rows which fall into each bin, since crossfilter doesn't provide access to that yet. Once we've got the actual rows, your calculations are almost the same as you are doing now, except that crossfilter keeps track of the list of weeks for you.
So you can use these functions from the example:
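// Insert the row into the bin's array, keeping it sorted by key.
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn);
    return function(elements, item) {
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}
// Find the row by key and take it out of the bin's array.
function groupArrayRemove(keyfn) {
    var bisect = d3.bisector(keyfn);
    return function(elements, item) {
        var pos = bisect.left(elements, keyfn(item));
        // Guard against pos running past the end of the array
        if(pos < elements.length && keyfn(elements[pos])===keyfn(item))
            elements.splice(pos, 1);
        return elements;
    };
}
// Every bin starts out as an empty array.
function groupArrayInit() {
    return [];
}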
You need to have a unique key in your records so that they can be added and removed reliably. I'll assume that your records have an ID field.
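If your rows don't already carry such a field, one option is to stamp each row with its array index before handing the data to crossfilter. This is only a sketch: `assignIds` is a hypothetical helper, not part of crossfilter or dc.js, and the sample rows are made up using the column names from the question.

```javascript
// Hypothetical helper (not part of crossfilter or dc.js): give every row a
// unique, stable ID so the reduce functions can find it again on removal.
function assignIds(rows) {
  rows.forEach(function (row, index) {
    row.ID = index;
  });
  return rows;
}

// Made-up sample rows using the column names from the question.
var sampleRows = assignIds([
  { CONTACT_WEEK: "2016-01-04", WEEK_NUMBER: 1, TOTCOUNT: 10, DECAY_CNT: 2 },
  { CONTACT_WEEK: "2016-01-04", WEEK_NUMBER: 2, TOTCOUNT: 20, DECAY_CNT: 5 }
]);
```

The IDs only need to be unique and unchanged for as long as the rows live in the crossfilter instance.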
Define your week dimension and group like so:
var weekDimension = ndx.dimension(function(d) { return d.CONTACT_WEEK; }),
    id_function = function(r) { return r.ID; },
    weekGroup = weekDimension.group().reduce(groupArrayAdd(id_function), groupArrayRemove(id_function), groupArrayInit);
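To make it concrete what ends up in each bin, here is a dependency-free sketch that mimics the calls crossfilter makes to the reduce functions as rows enter and leave the current filter. The `lowerBound`, `addRow` and `removeRow` names are hypothetical stand-ins for `d3.bisector` and the functions above, and the rows are made up.

```javascript
// Stand-in for d3.bisector(keyfn).left: first index whose key is >= target.
function lowerBound(elements, keyfn, target) {
  var lo = 0, hi = elements.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (keyfn(elements[mid]) < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Same idea as groupArrayAdd / groupArrayRemove, without d3. Because the
// keys are assumed unique, inserting at the lower bound keeps the order.
function addRow(keyfn) {
  return function (elements, item) {
    elements.splice(lowerBound(elements, keyfn, keyfn(item)), 0, item);
    return elements;
  };
}
function removeRow(keyfn) {
  return function (elements, item) {
    var pos = lowerBound(elements, keyfn, keyfn(item));
    if (pos < elements.length && keyfn(elements[pos]) === keyfn(item)) {
      elements.splice(pos, 1);
    }
    return elements;
  };
}

var byId = function (r) { return r.ID; };
var add = addRow(byId), remove = removeRow(byId);

// Simulate one bin: three rows arrive, then a filter elsewhere removes one.
var bin = [];
add(bin, { ID: 7, WEEK_NUMBER: 2 });
add(bin, { ID: 3, WEEK_NUMBER: 1 });
add(bin, { ID: 5, WEEK_NUMBER: 2 });
remove(bin, { ID: 3, WEEK_NUMBER: 1 });

console.log(bin.map(byId)); // [ 5, 7 ]
```

Keeping each bin sorted by ID is what lets removal use a binary search instead of scanning the whole array.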
Then the most efficient time to calculate your metric is when it's needed, in the value accessor. So you can define your value accessor with the heart of the code you posted in your question.
(Of course, this code is untested because I don't know your data.)
var calculateDecay = function(kv) {
    // kv.value has the array produced by the reduce functions.
    var data1 = kv.value;
    // Highest week number among the rows in this bin
    var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER; });
    // Keep only the rows at that week number
    var data2 = data1.filter(function(d) { return d.WEEK_NUMBER == max; });
    var sum1 = d3.sum(data2, function(d) { return d.TOTCOUNT; });
    var sum2 = d3.sum(data2, function(d) { return d.DECAY_CNT; });
    var decay = sum2 / sum1 * 100;
    return decay;
};
chart.valueAccessor(calculateDecay);
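One edge case worth guarding against: when every row of a week is filtered out, `kv.value` is an empty array, `sum1` is 0, and the division yields `NaN`. The following is a dependency-free sketch of the same calculation with that guard, run on made-up numbers. `calculateDecaySafe` is a hypothetical name, and returning 0 for an empty bin is an assumption about what you want to plot.

```javascript
// Hypothetical variant of calculateDecay: plain JavaScript, no d3.
function calculateDecaySafe(kv) {
  var rows = kv.value; // array built by the reduce functions
  if (rows.length === 0) return 0; // assumption: empty bins plot as 0

  // Highest WEEK_NUMBER present in this bin.
  var max = rows.reduce(function (m, d) {
    return Math.max(m, +d.WEEK_NUMBER);
  }, -Infinity);

  // Sum the two counters over the rows at that week number only.
  var tot = 0, decayCnt = 0;
  rows.forEach(function (d) {
    if (+d.WEEK_NUMBER === max) {
      tot += +d.TOTCOUNT;
      decayCnt += +d.DECAY_CNT;
    }
  });
  return tot > 0 ? (decayCnt / tot) * 100 : 0;
}

// Made-up bin in the { key, value } shape that a group hands to accessors.
var sampleBin = {
  key: "2016-01-04",
  value: [
    { ID: 0, WEEK_NUMBER: 1, TOTCOUNT: 10, DECAY_CNT: 9 },
    { ID: 1, WEEK_NUMBER: 2, TOTCOUNT: 30, DECAY_CNT: 5 },
    { ID: 2, WEEK_NUMBER: 2, TOTCOUNT: 10, DECAY_CNT: 5 }
  ]
};
console.log(calculateDecaySafe(sampleBin));               // 25
console.log(calculateDecaySafe({ key: "x", value: [] })); // 0
```

In the sample bin only the two rows at week 2 count, so the result is (5 + 5) / (30 + 10) * 100 = 25; the week 1 row is ignored.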