Generic IEqualityComparer<T> and GetHashCode

Being somewhat lazy about implementing lots of IEqualityComparers, and given that I couldn't easily edit class implementations of object being compared, I went with the following, meant to be used with Distinct() and Except() extension methods. :

public class GenericEqualityComparer<T> : IEqualityComparer<T>
{
    Func<T, T, bool> compareFunction;
    Func<T, int> hashFunction;

    public GenericEqualityComparer(Func<T, T, bool> compareFunction, Func<T, int> hashFunction)
    {
        this.compareFunction = compareFunction;
        this.hashFunction = hashFunction;
    }

    public bool Equals(T x, T y)
    {
        return compareFunction(x, y);
    }

    public int GetHashCode(T obj)
    {
        return hashFunction(obj);
    }
}

Seems nice, but is giving an hash function everytimes REALLY necessary ? I understand that the hashcode is used to put objects in buckets. Different buckets, object are not equal, and equal is not called.

If GetHashCode returns the same value, equals is called. ( from : Why is it important to override GetHashCode when Equals method is overridden? )

So what could go wrong, if for example (and I hear a lot of programmers screaming in horror), GetHashCode returns a constant, to force the call to Equal ?


Nothing would go wrong, but in hash-table based containers, you're going from approx O(1) to O(n) performance when doing a lookup. You'd be better off simply storing everything in a List and brute force searching it for items that fulfil equality.


If a common use-case is comparing objects according to one of their properties, you could add an additional constructor and implement and call it like this:

public GenericEqualityComparer(Func<T, object> projection)
{
    compareFunction = (t1, t2) => projection(t1).Equals(projection(t2));
    hashFunction = t => projection(t).GetHashCode();
}

var comaparer = new GenericEqualityComparer( o => o.PropertyToCompare);

This will automatically use the hash as implemented by the property.

EDIT: a more efficient and robust implementation inspired my Marc's comments:

public static GenericEqualityComparer<T> Create<TValue>(Func<T, TValue> projection)
{
    return new GenericEqualityComparer<T>(
        (t1, t2) => EqualityComparer<TValue>.Default.Equals( projection(t1), projection(t2)),
        t => EqualityComparer<TValue>.Default.GetHashCode(projection(t)));
}

var comparer = GenericEqualityComparer<YourObjectType>.Create( o => o.PropertyToCompare); 

Your performance will go down the drain. Distinct and Except are efficient operations when implemented on set data structures. By providing a constant hash value you essentially destroy this characteristic and force naive algorithm using a linear search.

You need to see whether this is acceptable for your data volume. But for somewhat larger data sets, the difference will be pronounced. For example, Except will increase from expected time O(n) to O(n²), which can be a big deal.

Rather than providing a constant, why not just call the object's own GetHashCode method? It may not give a particularly good value but it cannot be worse than using a constant, and correctness will still be preserved unless the GetHashCode method of the object is overridden to return wrong values.

链接地址: http://www.djcxy.com/p/85328.html

上一篇: 关于Android权限和签名保护级别

下一篇: 通用IEqualityComparer <T>和GetHashCode