LINQ Aggregate algorithm explained

This might sound lame, but I have not been able to find a really good explanation of Aggregate.

By "good" I mean short, descriptive, and comprehensive, with a small and clear example.


The easiest-to-understand definition of Aggregate is that it performs an operation on each element of the list, taking into account the operations that have gone before. That is to say, it performs the action on the first and second elements and carries the result forward. Then it operates on the previous result and the third element and carries that forward, and so on.

Example 1. Summing numbers

var nums = new[]{1,2,3,4};
var sum = nums.Aggregate( (a,b) => a + b);
Console.WriteLine(sum); // output: 10 (1+2+3+4)

This adds 1 and 2 to make 3. Then it adds 3 (the result of the previous step) and 3 (the next element in the sequence) to make 6. Then it adds 6 and 4 to make 10.
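To watch the carry-forward happen, the lambda can record each step as it runs. This is only a sketch for illustration; the class and method names (AggregateTrace, TraceSum) are made up here and are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateTrace
{
    // Hypothetical helper: runs the same sum as Example 1 and records every step.
    public static List<string> TraceSum(int[] nums)
    {
        var steps = new List<string>();
        nums.Aggregate((a, b) =>
        {
            // a is the value carried forward, b is the next element.
            steps.Add($"{a} + {b} = {a + b}");
            return a + b;
        });
        return steps;
    }

    public static void Main()
    {
        foreach (var step in TraceSum(new[] { 1, 2, 3, 4 }))
        {
            Console.WriteLine(step);
        }
        // Prints:
        // 1 + 2 = 3
        // 3 + 3 = 6
        // 6 + 4 = 10
    }
}
```

Four elements produce three calls to the lambda, because the first element is used as the starting value.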

Example 2. Creating a CSV from an array of strings

var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate( (a,b) => a + ',' + b);
Console.WriteLine(csv); // Output a,b,c,d

This works in much the same way. It concatenates a, a comma, and b to make a,b. Then it concatenates a,b with a comma and c to make a,b,c, and so on.

Example 3. Multiplying numbers using a seed

For completeness, there is an overload of Aggregate which takes a seed value.

var multipliers = new []{10,20,30,40};
var multiplied = multipliers.Aggregate(5, (a,b) => a * b);
Console.WriteLine(multiplied); //Output 1200000 ((((5*10)*20)*30)*40)

Much like the above examples, this starts with a value of 5 and multiplies it by the first element of the sequence, 10, giving a result of 50. This result is carried forward and multiplied by the next number in the sequence, 20, to give a result of 1000. This continues through the remaining two elements of the sequence.
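One practical difference between the two overloads that the examples above don't show: without a seed, Aggregate throws an InvalidOperationException on an empty sequence, whereas the seeded overload simply returns the seed. A small sketch, with helper names invented for illustration:

```csharp
using System;
using System.Linq;

public static class EmptySequenceDemo
{
    // Seeded overload: an empty input just yields the seed (1 here).
    public static int SeededProduct(int[] values) =>
        values.Aggregate(1, (a, b) => a * b);

    // Unseeded overload: reports whether the call threw on this input.
    public static bool UnseededThrows(int[] values)
    {
        try
        {
            values.Aggregate((a, b) => a * b);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var empty = new int[0];
        Console.WriteLine(SeededProduct(empty));   // 1
        Console.WriteLine(UnseededThrows(empty));  // True
    }
}
```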

Live examples: http://rextester.com/ZXZ64749
Docs: http://msdn.microsoft.com/en-us/library/bb548651.aspx


Addendum

Example 2, above, uses string concatenation to create a list of values separated by commas. This is a simplistic way to explain the use of Aggregate, which was the intention of this answer. However, if you use this technique to actually create a large amount of comma-separated data, it would be more appropriate to use a StringBuilder, and this is entirely compatible with Aggregate: the seeded overload can be used to create the StringBuilder.

var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate(new StringBuilder(), (a,b) => {
    if(a.Length>0)
        a.Append(",");
    a.Append(b);
    return a;
});
Console.WriteLine(csv);

Updated example: http://rextester.com/YZCVXV6464
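For the specific job of joining strings with a separator, string.Join does the same work without Aggregate. The sketch below puts it next to the StringBuilder version purely for comparison; the class and method names are made up for this example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CsvDemo
{
    // Same approach as the addendum, with ToString() to get a string back.
    public static string WithAggregate(string[] items) =>
        items.Aggregate(new StringBuilder(), (sb, item) =>
        {
            if (sb.Length > 0)
            {
                sb.Append(',');
            }
            return sb.Append(item);
        }).ToString();

    // Built-in alternative for this particular task.
    public static string WithJoin(string[] items) => string.Join(",", items);

    public static void Main()
    {
        var chars = new[] { "a", "b", "c", "d" };
        Console.WriteLine(WithAggregate(chars)); // a,b,c,d
        Console.WriteLine(WithJoin(chars));      // a,b,c,d
    }
}
```

Note that the Length check treats a leading empty string as "nothing appended yet", so the two versions can differ on inputs such as { "", "a" }.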


It partly depends on which overload you're talking about, but the basic idea is:

  • Start with a seed as the "current value"
  • Iterate over the sequence. For each value in the sequence:
      • Apply a user-specified function to transform (currentValue, sequenceValue) into (nextValue)
      • Set currentValue = nextValue
  • Return the final currentValue

You may find the Aggregate post in my Edulinq series useful: it includes a more detailed description (including the various overloads) and implementations.
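Among the overloads, there is one that takes a third argument: a result selector applied once to the final accumulated value. The following is a sketch of how it might be used to compute an average string length; AverageLength and the tuple field names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResultSelectorDemo
{
    // Hypothetical helper: accumulates (total length, count), then divides once at the end.
    public static double AverageLength(IEnumerable<string> words) =>
        words.Aggregate(
            (Total: 0, Count: 0),                                           // seed
            (acc, word) => (acc.Total + word.Length, acc.Count + 1),        // step
            acc => acc.Count == 0 ? 0.0 : (double)acc.Total / acc.Count);   // result selector

    public static void Main()
    {
        Console.WriteLine(AverageLength(new[] { "a", "bcd" })); // 2
    }
}
```

The seed and the result have different types here (a tuple and a double), which is the main reason to reach for this overload.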

    One simple example is using Aggregate as an alternative to Count:

    // 0 is the seed, and for each item, we effectively increment the current value.
    // In this case we can ignore "item" itself.
    int count = sequence.Aggregate(0, (current, item) => current + 1);
    

    Or perhaps summing all the lengths of strings in a sequence of strings:

    int total = sequence.Aggregate(0, (current, item) => current + item.Length);
    

    Personally I rarely find Aggregate useful - the "tailored" aggregation methods are usually good enough for me.
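For comparison, here are the two Aggregate calls above next to the tailored methods that would normally be used instead (Count and Sum with a selector). A minimal sketch; the wrapper class and method names are made up.

```csharp
using System;
using System.Linq;

public static class TailoredDemo
{
    public static int CountViaAggregate(string[] sequence) =>
        sequence.Aggregate(0, (current, item) => current + 1);

    public static int TotalLengthViaAggregate(string[] sequence) =>
        sequence.Aggregate(0, (current, item) => current + item.Length);

    public static void Main()
    {
        var sequence = new[] { "one", "three", "five" };

        // Each pair prints the same number.
        Console.WriteLine(CountViaAggregate(sequence));        // 3
        Console.WriteLine(sequence.Count());                   // 3

        Console.WriteLine(TotalLengthViaAggregate(sequence));  // 12
        Console.WriteLine(sequence.Sum(item => item.Length));  // 12
    }
}
```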


    Super short: Aggregate works like fold in Haskell/ML/F#.

    Slightly longer: .Max(), .Min(), .Sum(), and .Average() all iterate over the elements in a sequence and aggregate them using the respective aggregate function. .Aggregate() is a generalized aggregator in that it allows the developer to specify the start state (aka seed) and the aggregate function.
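To make the "generalized aggregator" point concrete, here is a sketch of how sum, max, and min could each be written with .Aggregate. This is not how LINQ implements them; the method names are invented for illustration.

```csharp
using System;
using System.Linq;

public static class FoldDemo
{
    // Seed 0, aggregate function is addition.
    public static int SumViaAggregate(int[] xs) =>
        xs.Aggregate(0, (acc, x) => acc + x);

    // No seed: the first element is the start state, so empty input throws.
    public static int MaxViaAggregate(int[] xs) =>
        xs.Aggregate((acc, x) => x > acc ? x : acc);

    public static int MinViaAggregate(int[] xs) =>
        xs.Aggregate((acc, x) => x < acc ? x : acc);

    public static void Main()
    {
        var ints = new[] { 3, 1, 4, 1, 5, 9, 2, 6, 5, 4 };
        Console.WriteLine(SumViaAggregate(ints)); // 40
        Console.WriteLine(MaxViaAggregate(ints)); // 9
        Console.WriteLine(MinViaAggregate(ints)); // 1
    }
}
```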

    I know you asked for a short explanation, but as others have already given a couple of short answers, I figured you might be interested in a slightly longer one.

    Long version with code: One way to illustrate what it does is to show how you would implement sample standard deviation once using foreach and once using .Aggregate. Note: I haven't prioritized performance here, so I iterate over the collection several times unnecessarily.

    First a helper function used to create a sum of quadratic distances:

    static double SumOfQuadraticDistance (double average, int value, double state)
    {
        var diff = (value - average);
        return state + diff * diff;
    }
    

    Then sample standard deviation using foreach:

    static double SampleStandardDeviation_ForEach (
        this IEnumerable<int> ints)
    {
        var length = ints.Count ();
        if (length < 2)
        {
            return 0.0;
        }
    
        const double seed = 0.0;
        var average = ints.Average ();
    
        var state = seed;
        foreach (var value in ints)
        {
            state = SumOfQuadraticDistance (average, value, state);
        }
        var sumOfQuadraticDistance = state;
    
        return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
    }
    

    Then once using .Aggregate:

    static double SampleStandardDeviation_Aggregate (
        this IEnumerable<int> ints)
    {
        var length = ints.Count ();
        if (length < 2)
        {
            return 0.0;
        }
    
        const double seed = 0.0;
        var average = ints.Average ();
    
        var sumOfQuadraticDistance = ints
            .Aggregate (
                seed,
                (state, value) => SumOfQuadraticDistance (average, value, state)
                );
    
        return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
    }
    

    Note that these functions are identical except for how sumOfQuadraticDistance is calculated:

    var state = seed;
    foreach (var value in ints)
    {
        state = SumOfQuadraticDistance (average, value, state);
    }
    var sumOfQuadraticDistance = state;
    

    Versus:

    var sumOfQuadraticDistance = ints
        .Aggregate (
            seed,
            (state, value) => SumOfQuadraticDistance (average, value, state)
            );
    

    So what .Aggregate does is encapsulate this aggregator pattern, and I expect that the implementation of .Aggregate would look something like this:

    public static TAggregate Aggregate<TAggregate, TValue> (
        this IEnumerable<TValue> values,
        TAggregate seed,
        Func<TAggregate, TValue, TAggregate> aggregator
        )
    {
        var state = seed;
    
        foreach (var value in values)
        {
            state = aggregator (state, value);
        }
    
        return state;
    }
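The overload without a seed can be sketched in the same spirit: it uses the first element as the start state and has nothing to return for an empty sequence, so it throws. This is a guess at the shape, not the actual framework source, and AggregateNoSeed is a made-up name chosen to avoid clashing with LINQ.

```csharp
using System;
using System.Collections.Generic;

public static class AggregateSketch
{
    // Hypothetical sketch of the seedless overload.
    public static T AggregateNoSeed<T>(
        this IEnumerable<T> values,
        Func<T, T, T> aggregator)
    {
        using (var enumerator = values.GetEnumerator())
        {
            if (!enumerator.MoveNext())
            {
                throw new InvalidOperationException("Sequence contains no elements");
            }

            // The first element plays the role of the seed.
            var state = enumerator.Current;

            while (enumerator.MoveNext())
            {
                state = aggregator(state, enumerator.Current);
            }

            return state;
        }
    }

    public static void Main()
    {
        Console.WriteLine(new[] { 1, 2, 3, 4 }.AggregateNoSeed((a, b) => a + b)); // 10
    }
}
```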
    

    Using the Standard deviation functions would look something like this:

    var ints = new[] {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
    var average = ints.Average ();
    var sampleStandardDeviation = ints.SampleStandardDeviation_Aggregate ();
    var sampleStandardDeviation2 = ints.SampleStandardDeviation_ForEach ();
    
    Console.WriteLine (average);
    Console.WriteLine (sampleStandardDeviation);
    Console.WriteLine (sampleStandardDeviation2);
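Since the versions above iterate over the collection several times, here is a sketch of a single-pass variant that carries the count, sum, and sum of squares as the aggregate state. It relies on the identity that the sum of squared distances from the mean equals sumOfSquares - sum² / n, which is less numerically robust than the two-pass form for large values. The method name is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassDemo
{
    // Hypothetical single-pass variant; the state is a three-element tuple.
    public static double SampleStandardDeviation_SinglePass(
        this IEnumerable<int> ints)
    {
        var acc = ints.Aggregate(
            (Count: 0, Sum: 0.0, SumOfSquares: 0.0),
            (state, value) => (state.Count + 1,
                               state.Sum + value,
                               state.SumOfSquares + (double)value * value));

        if (acc.Count < 2)
        {
            return 0.0;
        }

        // Clamp at zero in case rounding pushes the difference slightly negative.
        var sumOfQuadraticDistance =
            Math.Max(0.0, acc.SumOfSquares - acc.Sum * acc.Sum / acc.Count);

        return Math.Sqrt(sumOfQuadraticDistance / (acc.Count - 1));
    }

    public static void Main()
    {
        var ints = new[] { 3, 1, 4, 1, 5, 9, 2, 6, 5, 4 };
        // The sample variance of these values is 54 / 9 = 6, so this prints
        // the square root of 6 (about 2.449).
        Console.WriteLine(ints.SampleStandardDeviation_SinglePass());
    }
}
```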
    

    IMHO

    So does .Aggregate help readability? In general I love LINQ because I think .Where, .Select, .OrderBy and so on greatly help readability (if you avoid inlined hierarchical .Selects). Aggregate has to be in LINQ for completeness reasons, but personally I am not so convinced that .Aggregate adds readability compared to a well-written foreach.
