How to ensure hashCode() is consistent with equals()?

When overriding the equals() method of java.lang.Object, the Javadoc notes that,

it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location: simply return the object's unique integer address).

How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?


public class People {
   public String name;
   public int age;

   public int hashCode() {
      // How to get a unique integer based on name and age?
      return 0; // placeholder so the class compiles
   }
}
/*******************************/
public class App {
   public static void main( String[] args ) {
       People mike = new People();
       People melissa = new People();
       mike.name = "mike";
       mike.age = 23;
       melissa.name = "melissa";
       melissa.age = 24;
       System.out.println( mike.hashCode() );    // output?
       System.out.println( melissa.hashCode() ); // output?
   }
}

The contract doesn't say the hashcode for an object has to be completely unique, only that two equal objects must return the same hashcode. It's entirely legal for two non-equal objects to return the same hashcode. However, the more evenly the hashcodes are distributed over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
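To see why that part of the contract matters, here's a minimal sketch (the BrokenPoint class is made up for illustration) of what goes wrong when you override equals without overriding hashCode:

import java.util.HashSet;
import java.util.Set;

// Overrides equals() but NOT hashCode(), violating the contract.
class BrokenPoint {
    final int x, y;
    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof BrokenPoint)) return false;
        BrokenPoint other = (BrokenPoint) obj;
        return x == other.x && y == other.y;
    }
    // hashCode() is inherited from Object, so two equal BrokenPoints
    // almost certainly return different hash codes.
}

public class ContractDemo {
    public static void main(String[] args) {
        Set<BrokenPoint> set = new HashSet<>();
        set.add(new BrokenPoint(1, 2));
        // Prints false even though an equal point is in the set:
        // the lookup hashes the probe object to the wrong bucket.
        System.out.println(set.contains(new BrokenPoint(1, 2)));
    }
}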

IDEs such as IntelliJ IDEA have built-in generators for equals and hashCode that generally do a pretty good job of coming up with "good enough" code for most objects (and probably better than some hand-crafted, overly-clever hash functions).

For example, here's the hashCode function that IDEA generates for your People class:

public int hashCode() {
    int result = name != null ? name.hashCode() : 0;
    result = 31 * result + age;
    return result;
}
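As a side note on the generated code: multiplying by 31 is the same pattern java.lang.String uses for its own hashCode, and it's the multiplier Effective Java recommends. It's an odd prime, and the JVM can replace the multiplication with a cheap shift-and-subtract, (result << 5) - result.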

I won't go into the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on name; maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like:

public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null) return false;
    if (!getClass().equals(obj.getClass())) return false;
    People other = (People) obj;
    return (name == null ? other.name == null : name.equals(other.name)) &&
        age == other.age;
}

Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you add or exclusive-or the hash codes of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age;
}
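With both methods in place, hash-based collections behave as expected. A quick sketch (assuming People has the equals and hashCode shown above):

import java.util.HashSet;
import java.util.Set;

public class PeopleDemo {
    public static void main(String[] args) {
        People a = new People();
        a.name = "mike";
        a.age = 23;

        People b = new People();
        b.name = "mike";
        b.age = 23;

        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true, as the contract requires

        Set<People> set = new HashSet<>();
        set.add(a);
        System.out.println(set.contains(b)); // true: b hashes to a's bucket, then equals() matches
    }
}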

Note that the following is not valid, as it uses a field that equals didn't consider (height). In this case two "equal" objects could have different hash codes.

public int hashCode() {
    return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}

Also, it's perfectly valid for two non-equals objects to have the same hash code:

public int hashCode() {    
    return age;    
}

In this case Jane, age 30, is not equal to Bob, age 30, yet both their hash codes are 30. While valid, this is undesirable for performance in hash-based collections.
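For instance, with the age-only hashCode above (and the name-and-age equals), a hypothetical Jane and Bob collide without being equal:

public static void main(String[] args) {
    People jane = new People();
    jane.name = "Jane";
    jane.age = 30;

    People bob = new People();
    bob.name = "Bob";
    bob.age = 30;

    System.out.println(jane.equals(bob)); // false: names differ
    System.out.println(jane.hashCode());  // 30
    System.out.println(bob.hashCode());   // 30: same bucket, so lookups fall back on equals()
}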


Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.

A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
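Here's a minimal sketch of that structure; it's a toy, not the real HashMap (no resizing, no null handling), but the lookup path is the one just described:

import java.util.LinkedList;

class ToyHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    // The "array of linked lists".
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ToyHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // Step 1: hashCode() -> bucket index, by modding with the table size.
    // (Mask off the sign bit first, since hashCode() may be negative.)
    private int indexFor(K key) {
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length;
    }

    public void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key)];
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing
        }
        bucket.add(new Entry<>(key, value));
    }

    public V get(K key) {
        // Step 2: traverse the bucket's list, comparing with equals().
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}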

Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.

You also generally want the array to be large, which increases the chance that each object ends up in a list of length 1. The Java HashMap, for example, grows the array when the number of entries in the map exceeds 75% of the array's size. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element is a list with more than one entry and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
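In HashMap terms, that 75% knob is the load factor, and the resize point is capacity times load factor. A sketch of the arithmetic (the 16 and 0.75 values here are the commonly documented defaults):

public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;        // size of the bucket array
        float loadFactor = 0.75f; // the "75%" tradeoff knob
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12: a 13th entry triggers a resize,
                                       // doubling the array and rehashing every entry
    }
}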

The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.

One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
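A quick way to see the effect: if the hash codes happen to share a common factor with the table size, whole stretches of the array never get used, while a prime size spreads them out:

public class PrimeBucketDemo {
    public static void main(String[] args) {
        // Suppose a weak hash function only ever produces multiples of 8.
        int[] hashes = {0, 8, 16, 24, 32, 40, 48, 56};
        for (int h : hashes) {
            // Size 16: only buckets 0 and 8 are ever hit.
            // Size 17 (prime): indices 0, 8, 16, 7, 15, 6, 14, 5 -- spread out.
            System.out.println((h % 16) + " vs " + (h % 17));
        }
    }
}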

Edit based on comments:

1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).

2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
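The trick being referred to: when the table length is a power of two, length - 1 is a mask of low one-bits, so the modulo collapses to a single AND. A sketch:

public class MaskDemo {
    public static void main(String[] args) {
        int length = 16;       // power of two: binary 10000
        int mask = length - 1; // binary 01111

        int h = "melissa".hashCode();

        int byMod = (h & 0x7FFFFFFF) % length; // uses a division
        int byAnd = h & mask;                  // single bitwise AND

        System.out.println(byMod + " == " + byAnd); // always the same index
    }
}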
