Java Glossary : hashCode

CMP home Java glossary home Menu no menu Last updated 2004-06-28 by Roedy Green ©1996-2004 Canadian Mind Products

Java definitions: 0-9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

You are here : home : Java Glossary : H words : hashCode.

hashCode
The key thing to understand about hashCodes is that they need not be unique, just as close to unique as practically possible.

HashCodes in a Nutshell

If you want to file something away for later retrieval, it can be faster if you file it numerically rather than by a long alphabetic key. A hashCode is a way of computing a small (32-bit) digest numeric key from a long String or even an arbitrary clump of bytes. The numeric key itself is meaningless and the hashCode functions for computing them can look a bit insane. However, when you go to look for something, you can do the same digest calculation on the long alphabetic key you are looking for, and no matter how bizarre an algorithm you used, you will calculate the same hashCode, and will be able to look up numerically with it. Of course there is always the possibility two different Strings will have the same digest hashCode. However, even then, all is not lost; it greatly narrows down the search, hence speeding it up.

A Hashtable goes a step further, scrunching down the hashCode even further to an even smaller number that it can use to directly index an array, usually by dividing it by some (ideally prime) number and taking the remainder.

String.hashCode Implementation

In JDK 1.0.x and 1.1.x the hashCode function for long Strings worked by sampling every nth character. This pretty well guaranteed you would have many Strings hashing to the same value, thus slowing down Hashtable lookup. In JDK 1.2 the function has been improved to multiply the result so far by 31 then add the next character in sequence. This is a little slower, but is much better at avoiding collisions.

The Gemini Twins: equals and hashCode

The default hashCode just uses the address of the Object and the default equals method just compares addresses. If you override one of these two methods, you must override the other to match. The rules are:

So if you had a Fruit object with a flavour and colour field, and you decided that any two objects with the same flavour were for all intents and purposes equal, you would define your equals and hashCode methods like this:


view

Calculating hashCodes

The xor operator is useful in computing hashing functions. To create a hashCode based on two fields, compute the hashCodes of the two fields separately and xor them together with the ^ operator. To create an hash on all the elements of an array you could xor all the values together.

Here is another approach that would work better if you had two Strings in your object. It gives a different hash code for your two objects when:

// object1 has
o1.string1 = "apple";
o1.string2 = "orange";
// and object2 has
o2.string1 = "orange";
o2.string2 = "apple";

It works like this to combine hash codes of the fields in your object:

/**
 * hashCode that combines two strings
 * @return a hash code value on the pair of strings.
 */
public int hashCode()
   {
   int result = 17;
   result = 37 * result + string1.hashCode();
   result = 37 * result + string2.hashCode();
   // etc for all fields in the object
   return result;
   }

Here is roughly how String.hashCode works inside:

/**
 * simplified version of how String.hashCode works.
 * For the actual version see src.jar java/lang/String.java
 * @return a hash code value for this String.
 */
public int hashCode()
   {
   // inside the String is a char[] val that holds the characters.
   int hash = 0;
   int len = char.length();
   for ( int i=0; i<len; i++ )
      {
      hash = 31 * hash + val[i];
      }
   return hash;
   }

Here is a fast hash algorithm you can apply to bytes, short, chars, ints, arrays etc. I used an assembler version of it it my BBL Forth compiler hashcode where it is essentially implemented in two instructions, ROL and XOR.

/**
 * Java version of the assembler hashCode used in the BBL Forth compiler.
 * @return a hash code value for the byte[] val byte array in this object.
 */
public int hashCode()
   {
   //  byte[] val  holds the data.
   int hash = 0;
   int len = val.length();
   for ( int i=0; i<len; i++ )
      {
      // rotate left and xor (very fast in assembler, a bit clumsy in Java)
      hash <<= 1;
      if ( hash < 0 )
         {
         hash |= 1;
         }
      hash ^= val[i];
      }
   return hash;
   }

When Do You Need A Custom equals and hashCode?

The hashCode method only gets invoked when you use the object as the key to a Hashtable. It is not used when the object is merely a Hashtable value. Most of the time your Hashtable keys are simple Strings, so you rarely need to write custom equals and hashCode methods. When you use a HashSet to help you check for duplicate objects, then you likely will need a custom equals and hashCode method. There your objects act as both key and value.

If you know the key values in advance, it is possible to construct a hashCode function that has no collisions.


CMP logo
CMP_home
home
Canadian Mind Products CSS
HTML Checked!
ICRA ratings logo
mindprod.com IP:[24.87.56.253]
Your IP:[80.134.30.163]
You are visitor number 8220.
Please send errors, omissions and suggestions
to improve this page to Roedy Green.
You can get a fresh copy of this page from: or possibly from your local J: drive mirror:
http://mindprod.com/jgloss/hashcode.html J:\mindprod\jgloss\hashcode.html