Avoid multiple sums in custom crossfilter reduce functions
This question arise from some difficulties in creating a crossfilter
dataset, in particular on how to group the different dimension and compute a derived values. The final aim is to have a number of dc.js
graphs using the dimensions and groups.
(Fiddle example https://jsfiddle.net/raino01r/0vjtqsjL/)
Question
Before going on with the explanation of the setting, the key question is the following:
How to create custom add
, remove
, init
, functions to pass in .reduce
so that the first two do not sum multiple times the same feature?
Data
Let's say I want to monitor the failure rate of a number of machines (just an example). I do this using different dimension: month, machine's location, and type of failure.
For example I have the data in the following form:
| month | room | failureType | failCount | machineCount |
|---------|------|-------------|-----------|--------------|
| 2015-01 | 1 | A | 10 | 5 |
| 2015-01 | 1 | B | 2 | 5 |
| 2015-01 | 2 | A | 0 | 3 |
| 2015-01 | 2 | B | 1 | 3 |
| 2015-02 | . | . | . | . |
Expected
For the three given dimensions , I should have:
Idea
Essentially, what counts in this setting is the couple (day, room)
. Ie given a day and a room there should be a rate attached to them (then the crossfilter should act to take in account the other filters).
Therefore, a way to go could be to store the couples that have already been used and do not sum machineCount
for them - however we still want to update the failCount
value.
Attempt (failing)
My attempt was to create custom reduce functions and not summing MachineCount
that were already taken into account.
However there are some unexpected behaviours. I'm sure this is not the way to go - so I hope to have some suggestion on this. // A dimension is one of: // ndx = crossfilter(data); // ndx.dimension(function(d){return d.month;}) // ndx.dimension(function(d){return d.room;}) // ndx.dimension(function(d){return d.failureType;}) // Goal: have a general way to get the group given the dimension:
function get_group(dim){
return dim.group().reduce(add_rate, remove_rate, initial_rate);
}
// month is given as datetime object
var monthNameFormat = d3.time.format("%Y-%m");
//
function check_done(p, v){
return p.done.indexOf(v.room+'_'+monthNameFormat(v.month))==-1;
}
// The three functions needed for the custom `.reduce` block.
function add_rate(p, v){
var index = check_done(p, v);
if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
var count_to_sum = (index)? v.machineCount:0;
p.mach_count += count_to_sum;
p.fail_count += v.failCount;
p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
return p;
}
function remove_rate(p, v){
var index = check_done(p, v);
var count_to_subtract = (index)? v.machineCount:0;
if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
p.mach_count -= count_to_subtract;
p.fail_count -= v.failCount;
p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
return p;
}
function initial_rate(){
return {rate: 0, mach_count:0, fail_count:0, done: new Array()};
}
Connection with dc.js
As mentioned, the previous code is needed to create dimension, group
to be passed in three different bar graphs using dc.js
.
Each graph will have .valueAccessor(function(d){return d.value.rate};)
.
See the jsfiddle (https://jsfiddle.net/raino01r/0vjtqsjL/), for an implementation. Different numbers, but the datastructure is the same. Notice the in the fiddle you expect a Machine count
to be 18 (in both months), however you always get the double (because of the 2 different locations).
Edit
Reduction + dc.js
Following Ethan Jewett answer, I used reductio
to take care of the grouping. The updated fiddle is here https://jsfiddle.net/raino01r/dpa3vv69/
My reducer
object needs two exception (month, room)
, when summing the machineCount
values. Hence it is built as follows:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room; })
.exception(function(d) { return d.month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This seems to fix the numbers when the graphs are rendered.
However , I do have a strange behaviour when filtering one single month and looking at the numbers in the type
graph.
Rather double create two exception, I could merge the two fields when processing the data. Ie as soon the data is defined I couls:
data.foreach(function(x){
x['room_month'] = x['room'] + '_' + x['month'];
})
Then the above reduction code should become:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room_month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This solution seems to work. However I am not sure if this is a sensible things to do: if the dataset is large,adding a new feature could slow down things quite a lot!
A few things:
Don't calculate rates in your Crossfilter reducers. Calculate the components of the rates. This will keep both simpler and faster. Do the actual division in your value accessor.
You've basically got the right idea. I think there are two problems that I see immediately:
In your remove_rate
your are not removing the key from the p.done
array. You should be doing something like if (index) p.done.splice(p.done.indexOf(v.room+'_'+monthNameFormat(v.month)), 1);
to remove it.
In your reduce functions, index
is a boolean. (index == -1)
will never evaluate to true
, IIRC. So your added machine count will always be 0. Use var count_to_sum = index ? v.machineCount:0;
var count_to_sum = index ? v.machineCount:0;
instead.
If you want to put together a working example, I or someone else will be happy to get it going for you, I'm sure.
You may also want to try Reductio. Crossfilter reducers are difficult to do right and efficiently, so it may make sense to use a library to help. With Reductio, creating a group that calculates your machine count and failure count looks like this:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
var dim = ndx.dimension(...)
var grp = dim.group()
reducer(group)
链接地址: http://www.djcxy.com/p/5604.html
上一篇: 使用Crossfilter reductio插件进行标准误差计算?
下一篇: 在自定义交叉过滤器减少功能中避免多重总和