Friday, September 22, 2023

How HashSet works in Java [Explained with Example]

Not many Java programmers know that HashSet is internally implemented using HashMap in Java, so if you know How HashMap works internally in Java, more likely you can figure out how HashSet works in Java. But, now a curious Java developer can question that, how come HashSet uses HashMap because you need a key-value pair to use with Map, while in HashSet we only store one object. Good question, isn't it? If you remember some functionality of the earlier class, then you know that HashMap allows duplicate values, and this property is exploited while implementing HashSet in Java.


Since HashSet implements Set interface it needs to guarantee uniqueness and this is achieved by storing elements as keys with the same value always. Things get clear by checking HashSet.java from the JDK source code.

All you need to look at is, how elements are stored in HashSet and how they are retrieved from HashSet. Since HashSet doesn't provide any direct method for retrieving objects e.g. get(Key key) from HashMap or get(int index) from List, the only way to get objects from HashSet is via Iterator. See here for code examples of iterating over HashSet in Java.

When you create an object of HashSet in Java, it internally creates an instance of backup Map with default initial capacity 16 and default load factor 0.75 as shown below :

/**
  * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
  * default initial capacity (16) and load factor (0.75).
  */

public HashSet() {
   map = new HashMap<>();
}

Now let's see the code for add() and iterate() method from java.util.HashSet in Java to understand how HashSet works internally in Java.




How Object is stored in HashSet in Java?

As you can see below, a call to add(Object) is a delegate to put(Key, Value) internally, where Key is the object you have passed and value is another object,  called PRESENT, which is a constant in java.util.HashSet as shown below :

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

public boolean add(E e) {
   return map.put(e, PRESENT)==null;
}

Since PRESENT is a constant, for all keys we have the same value in backup HashMap called map.

How HashSet Internally Works in Java [Explained]





How Object is retrieved from HashSet?

Now let's see the code for getting iterator for traversing over HashSet in Java. iterator() method from java.util.HashSet class returns iterator for backup Map returned by map.keySet().iterator() method.

       /**
     * Returns an iterator over the elements in this set.  The elements
     * are returned in no particular order.
     *
     * @return an Iterator over the elements in this set
     * @see ConcurrentModificationException
     */

    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }








How to use HashSet in Java - Example

Using HashSet in Java is very simple, don't think it is Map but think more like Collection i.e. add elements by using add() method, check its return value to see if the object already existed in HashSet or not. Similarly use an iterator for retrieving elements from HashSet in Java. 

You can also use contains() method to check if an object already exists in HashSet or not. This method use equals() method for comparing object for matching. You can also use the remove() method to remove objects from HashSet. Since the element of HashSet is used as a key in backup HashMap, they must implement the equals() and hashCode() method. 

Immutability is not a requirement but if it's immutable then you can assume that object will not be changed during its stay on set. Following example demonstrate basic usage of HashSet in Java, for more advanced example, you can check this tutorial.    

import java.util.HashSet;
import java.util.Iterator;


/**
 * Java Program to demonstrate How HashSet works internally in Java.
 * @author http://java67.com
 */

public class HashSetDemo{
     
   public static void main(String args[]) {

      HashSet<String> supportedCurrencies = new HashSet<String>();              

      // adding object into HashSet, this will be translated to put() calls
      supportedCurrencies.add("USD");
      supportedCurrencies.add("EUR");
      supportedCurrencies.add("JPY");
      supportedCurrencies.add("GBP");
      supportedCurrencies.add("INR");
      supportedCurrencies.add("CAD");

      // retrieving object from HashSet
      Iterator<String> itr = supportedCurrencies.iterator();
      while(itr.hasNext()){
         System.out.println(itr.next());

      }
   }

}

Output
JPY
EUR
INR
USD
CAD
GBP

That's all about How HashSet is implemented in Java and How HashSet works internally. As I said, If you how HashMap internally in Java, you can explain the working of HashSet provided,  you know it uses the same values for all keys.

Remember to override equals() and hashCode() for any object you are going to store in HashSet, since your object is used as a key in backup Map, it must override those methods. Make your object Immutable or effective immutable if possible.

Now, is the quiz time? What is difference between HashSet and TreeSet and LinkedHashSet in Java? Can you pass a HashSet if a method except a TreeSet?

21 comments:

  1. Little confusion.
    At the starting you said elements are stored as keys with same value always but later you said the value for all keys is constant which is PRESENT.
    Which of these is correct?

    ReplyDelete
    Replies
    1. @Hi, He is right. Object we are adding to the set is internally added as a Key to the HashMap and value for the HahsMap in the internal implementation of HashSet will be same "PRESENT". Hence we are only responsible for our object value that we are adding to the set.
      Thanks
      Kumar Shorav

      Delete
  2. As per my understanding I guess values are stored as null for all keys.

    ReplyDelete
    Replies
    1. That's not true, PRESENT object is stored as values for all keys. Look at the below code from add() method of HashSet :
      public boolean add(E e) {
      return map.put(e, PRESENT)==null;
      }

      You should not have doubt after looking at this code :)

      Delete
  3. why would anyone like to store null values ?.
    And in fact values are never stored ,only the references pointing to those values(ie objects) are stored in any type of collection.

    ReplyDelete
    Replies
    1. That's correct Amitesh, No null values are stored, only PRESENT is stored, and that too is just one object, it's just references which are stored.

      Delete
  4. But why HashSet cant have its own implementation , why does it use HashMap, why do we have to store "PRESENT" constant unnecerrily

    ReplyDelete
    Replies
    1. Well Prateek, this is just one way to implement HashSet. The advantage of this approach is that they don't have to implement HashSet from scratch, they can reuse HashMap code to the heavy lifting.

      Delete
  5. Very helpful article. I am becoming a very BIG fan of your blog. The content you write is extremely simple to understand. Thanks!

    ReplyDelete
    Replies
    1. Thank You Jaya. If you like article, don't forget to share with your friends :-)

      Delete
  6. in case of hashmap we will have a linked list in each bucket and will store the key's having same hash code in a linked list. Is it the same here too?

    ReplyDelete
  7. Yes, that's true because it internally uses HashMap, so whatever implementation of HashMap is available it also applicable to HashSet. For Example, from Java 8 onwards, HashMap will switch to tree data structure instead of linked list if collision crosses a certain threshold to keep HashMap performance healthy. Because in case of heavy collision your Map will turn into a linked list and a get() will take O(n) instead of intended O(1).

    ReplyDelete
  8. Hii, my question is what is table in hashmap

    ReplyDelete
  9. Hello,

    if load factor is 0.75 and initial capacity is 16.
    After inserting 12 elements what will be the new Capacity?

    ReplyDelete
  10. I was wondering why does not the Set range also have something like 'SetMapHash', so that you can have index-based access (for fast retrieval) and also ensuring the uniqueness of stored objects?

    ReplyDelete
  11. Hi,
    Can anyone explain "if set intenally implements HashMap then HashSet should allow atleast 1 null"..

    ReplyDelete
  12. Hi, Thank you for such wonderful explanation.
    1. Please clear my doubt, when you say "Immutability is not requirement but if its immutable then you can assume that object will not be changed during its stay on set". When can the objects change in absence of immutability.
    2.

    2. How hashset reacts if new object's hashcode is same as of existing object. So, probably as the underlying structure is map, so, a linked list will be formed but what will happen if we try to iterate that hashset.


    ReplyDelete
  13. internals are covered brilliantly

    ReplyDelete

Feel free to comment, ask questions if you have any doubt.