Internal Implementation of HashMap

In this article, we are going to see how HashMap internally works in Java. Also, we will have a look at changes made in Java 8 to make implementation of HashMap faster.

HashMap is the data structure used in Java to store key-value pairs, where the average retrieval time for get() and put() operations is constant i.e. O(1). To learn more about HashMap Please visit this article.

Hashing Principle :

Hashing is a method used to produce an integer value from an object and this integer value known as the hash value. In HashMap, the key object is used for Hashing.

Internal Structure :

For internal working of HashMap, HashMap maintains an array of bucket, each bucket is a linked-list and linked list is a list of nodes wherein each node contains key-value pair.

After Java 8, a bucket can be a linked-list or a binary tree depending upon the Threshold

Bucket Table :

The bucket table is also known as the array of bucket of HashMap in java. HashMap uses hashing method to calculate the index of array or which Bucket is going to be used.

It is represented as :

// Each Node of the Array is a Bucket
Node<K,V>[] table;

Bucket :

A bucket is a linked list of nodes where each node is an instance of class Node<K,V>. The key of the node is used to generate the hash value and this hash value is used to find the bucket from Bucket Table.

Note: When two or more keys generate the same hash, then it called a Hash Collision. In that case, the new node adds to the linked list

Node :

Each node of the Linked List is an instance of class Node<K,V> and actually store key-value pair that we are storing in the HashMap. This class is an static inner class of HashMap class and it implements the Map.Entry<K,V> interface. The representation of this Node as :

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
}

final int hash : hash value of the key
final K key : Key of the node
V value : Value of the node
Node<K,V> next : Pointer to the next node present in the bucket or linked-list

Index Calculation in HashMap :

when we store or retrieve any key-value pair, HashMap calculates the index of the bucket for each and every operation.
The Key object is used to calculate the index of the bucket. By using this key, hash value is calculated using hash(key) private method of HashMap.

hash(key) method is a private method of HashMap that returns the hash value of the key, also if the hash value is too large then converts it into a smaller hash value.

if hash value of Key Object returns the integer that is greater than the size of the array i.e., hash(key) > n, then ArrayOutOfBoundsException could be raised. To handle this situation, HashMap reduces the hash value between 0 and n-1 using an expression :

index = hash(key) &amp; (n-1)

Now, this index value is generated is used by HashMap to find bucket location and can never generate any Exception as the index value always from 0 to n-1.

hashCode() method :

hashCode() method is used to get the hash Code of an object. hashCode() method of object class returns the memory reference of object in integer form. Definition of hashCode() method is public native hashCode(). It indicates the implementation of hashCode() is native because there is not any direct method in java to fetch the reference of object. It is possible to provide your own implementation of hashCode().
In HashMap, hashCode() is used to calculate the bucket and therefore calculate the index.

Now, this index value is generated is used by HashMap to find bucket location and can never generate any Exception as the index value always from 0 to n-1.

Note :
a) If the key is null then the hash value returned by the hash(key) private method of HashMap will be 0.
b) If the key.hashCode() is too large then hash(key) private method of HashMap converts it into smaller number.

equals() method :

equals method is used to check that 2 objects are equal or not. This method is provided by Object class. You can override this in your class to provide your own implementation.
HashMap uses equals() to compare the key whether the are equal or not. If equals() method return true, they are equal otherwise not equal.

Empty HashMap :

Initially HashMap size is taken as 16

HashMap map = new HashMap();

Inserting Key-Value Pair :

map.put(new Key("Sumanth"), 31);

Calculate hash code of Key {“Sumanth”}. It will be generated as 90.
Calculate index by using index method it will be 6.
Create a node object

{
  int hash = 90
  // {"Sumanth"} is not a string but 
  // an object of class Key
  Key key = {"Sumanth"}
  Integer value = 31
  Node next = null
}

Place this object at index 6, if no other object is presented there.

Now HashMap becomes :

*HashMap after inserting 1st key-value pair*

Inserting another Key-Value Pair :

map.put(new Key("sachin"), 39);

Follow the above steps for calculating hashcode and index. So the node object looks like :

{
  int hash = 98
  Key key = {"sachin"}
  Integer value = 39
  Node next = null
}

Place this object at index 3 if no other object is presented there.
Now HashMap becomes :

HashMap after inserting another key-value pair

Hash Collision :

map.put(new Key("Durgesh"), 50);

Calculate hash code of Key {“Durgesh”}. It will be generated as 90.
Calculate index by using index method it will be 6.
Create a node object as :

 {
  int hash = 90
  Key key = {"Durgesh"}
  Integer value = 50
  Node next = null
}

Place this object at index 6 if no other object is presented there.
In this case a node object is found at the index 6 – this is a case of collision.
In that case, check via hashCode() and equals() method that if both the keys are same.
If keys are same, replace the value with current value.
Otherwise connect this node object to the previous node object via linked list and both are stored at index 6.
Now HashMap becomes :

Working of get() method :

1. The key is used to calculate the hash value by calling private hash(key)method, internally hash(key) method call hashCode() method of key.

hash = hash(key);

HashMap can store one null value as the key. In that case, the hash value returned by the hash(key) method will be 0 and 0th bucket location will be used.

2. The above hash is reduced from 0 to n-1 to calculate the index of bucket (where n is the size of array of bucket).

index = hash &amp; (n-1);

3. If the bucket is null, then null will be returned.
Else the first node of the indexed bucket is fetched and key of that node(k) is compared with the key (provided in the get(key) method of HashMap), comparison is done by equals() method i.e., key.equals(k).
If the equals() method returned true then value of that node is returned otherwise next pointer of the node is used to fetched the next node in the list and apply the same procedure to that node. This process goes on till equals() returned true or next pointer is null. In case of next pointer is null then null is returned (which represents key is not present in the bucket or we can say not present in the HashMap).

Load Factor :

Load Factor is a factor that is used by HashMap internally to decide when the size of Bucket array needs to be increased. It is 0.75 by default i.e., when the number of nodes in the HashMap is more than 75% of Total Capacity then the HashMap grows its bucket array size.

Initial Capacity :

Initial Capacity is a measure used internally by HashMap that represents the number of bucket or size of the bucket array at the time of creation of HashMap. By default the initial capacity is 2⁴ = 16.

The capacity of HashMap always doubled each time when HashMap needs to increased its bucket array size i.e., 2⁴ = 16 (initially), 2⁵ = 32, 2⁶ = 64 etc.

When to increase the size of bucket array in the HashMap?

As we know, both load factor and available capacity together is used by HashMap to decide when to increase the size of bucket array.

Load Factor : 0.75 
Initial Capacity : 16 (Available Capacity initially) 
Load Factor * Available Capacity = 0.75 * 16 = 12 
So, at the time when 12+1 = 13^th key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 16*2 = 32. 
Now Available capacity : 32 
---------------------------------- 
Next Time when 
Load Factor * Available Capacity = 0.75 * 32 = 24 i.e., 25^th key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 32*2 = 64 and so on.

Rehashing :

When HashMap grows its bucket array size, then Rehashing is done. Re-Hashing is a process where bucket index is calculated for each node again with respect to the updated size of the bucket array.

Java 8 improvements for internal working of HashMap :

HashMap in java 8, maintains a value called TREEIFY_THRESHOLD, it is an Integer Constant and currently the value of TREEIFY_THRESHOLD is 8. It is used as whenever in any bucket the number of nodes becomes more than this Threshold value then the data structure of that bucket is convert from linked-list to balanced tree. This change is done to make put-get operation faster as the linked-list provides O(n) time complexity while the time required to traverse the balance tree is O(log n). So that’s how java 8 made HashMap much faster than its old version.

Conclusion

In this post, we learn what is hashing, the internal structure of hashmap, how HashMap works internally in java to store and retrieve key-value pair and the changes made by java 8. This is a common interview question on HashMap.

Thanks, Note

Thanks for spending your valuable time reading this post. Hope this post taught you something new today. Please share the post if you like it.