Not many Java programmers know that HashSet is internally implemented using HashMap in Java, so if you know How HashMap works internally in Java, more likely you can figure out how HashSet works in Java. But, now a curious Java developer can question that, how come HashSet uses HashMap because you need a key-value pair to use with Map, while in HashSet we only store one object. Good question, isn't it? If you remember some functionality of the earlier class, then you know that HashMap allows duplicate values, and this property is exploited while implementing HashSet in Java.
Since HashSet implements Set interface it needs to guarantee uniqueness and this is achieved by storing elements as keys with the same value always. Things get clear by checking HashSet.java from the JDK source code.
All you need to look at is, how elements are stored in HashSet and how they are retrieved from HashSet. Since HashSet doesn't provide any direct method for retrieving objects e.g. get(Key key) from HashMap or get(int index) from List, the only way to get objects from HashSet is via Iterator. See here for code examples of iterating over HashSet in Java.
When you create an object of HashSet in Java, it internally creates an instance of backup Map with default initial capacity 16 and default load factor 0.75 as shown below :
/**
Now let's see the code for add() and iterate() method from java.util.HashSet in Java to understand how HashSet works internally in Java.
Since PRESENT is a constant, for all keys we have the same value in backup HashMap called map.
Since HashSet implements Set interface it needs to guarantee uniqueness and this is achieved by storing elements as keys with the same value always. Things get clear by checking HashSet.java from the JDK source code.
All you need to look at is, how elements are stored in HashSet and how they are retrieved from HashSet. Since HashSet doesn't provide any direct method for retrieving objects e.g. get(Key key) from HashMap or get(int index) from List, the only way to get objects from HashSet is via Iterator. See here for code examples of iterating over HashSet in Java.
When you create an object of HashSet in Java, it internally creates an instance of backup Map with default initial capacity 16 and default load factor 0.75 as shown below :
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() { map = new HashMap<>(); }
Now let's see the code for add() and iterate() method from java.util.HashSet in Java to understand how HashSet works internally in Java.
How Object is stored in HashSet in Java?
As you can see below, a call to add(Object) is a delegate to put(Key, Value) internally, where Key is the object you have passed and value is another object, called PRESENT, which is a constant in java.util.HashSet as shown below :private transient HashMap<E,Object> map; // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); public boolean add(E e) { return map.put(e, PRESENT)==null; }
Since PRESENT is a constant, for all keys we have the same value in backup HashMap called map.
How Object is retrieved from HashSet?
Now let's see the code for getting iterator for traversing over HashSet in Java. iterator() method from java.util.HashSet class returns iterator for backup Map returned by map.keySet().iterator() method./**
* Returns an iterator over the elements in this set. The elements * are returned in no particular order. * * @return an Iterator over the elements in this set * @see ConcurrentModificationException */ public Iterator<E> iterator() { return map.keySet().iterator(); }
How to use HashSet in Java - Example
Using HashSet in Java is very simple, don't think it is Map but think more like Collection i.e. add elements by using add() method, check its return value to see if the object already existed in HashSet or not. Similarly use an iterator for retrieving elements from HashSet in Java.You can also use contains() method to check if an object already exists in HashSet or not. This method use equals() method for comparing object for matching. You can also use the remove() method to remove objects from HashSet. Since the element of HashSet is used as a key in backup HashMap, they must implement the equals() and hashCode() method.
Immutability is not a requirement but if it's immutable then you can assume that object will not be changed during its stay on set. Following example demonstrate basic usage of HashSet in Java, for more advanced example, you can check this tutorial.
That's all about How HashSet is implemented in Java and How HashSet works internally. As I said, If you how HashMap internally in Java, you can explain the working of HashSet provided, you know it uses the same values for all keys.
import java.util.HashSet; import java.util.Iterator; /** * Java Program to demonstrate How HashSet works internally in Java. * @author http://java67.com */ public class HashSetDemo{ public static void main(String args[]) { HashSet<String> supportedCurrencies = new HashSet<String>(); // adding object into HashSet, this will be translated to put() calls supportedCurrencies.add("USD"); supportedCurrencies.add("EUR"); supportedCurrencies.add("JPY"); supportedCurrencies.add("GBP"); supportedCurrencies.add("INR"); supportedCurrencies.add("CAD"); // retrieving object from HashSet Iterator<String> itr = supportedCurrencies.iterator(); while(itr.hasNext()){ System.out.println(itr.next()); } } } Output JPY EUR INR USD CAD GBP
That's all about How HashSet is implemented in Java and How HashSet works internally. As I said, If you how HashMap internally in Java, you can explain the working of HashSet provided, you know it uses the same values for all keys.
Remember to override equals() and hashCode() for any object you are going to store in HashSet, since your object is used as a key in backup Map, it must override those methods. Make your object Immutable or effective immutable if possible.
Now, is the quiz time? What is difference between HashSet and TreeSet and LinkedHashSet in Java? Can you pass a HashSet if a method except a TreeSet?
Little confusion.
ReplyDeleteAt the starting you said elements are stored as keys with same value always but later you said the value for all keys is constant which is PRESENT.
Which of these is correct?
@Hi, He is right. Object we are adding to the set is internally added as a Key to the HashMap and value for the HahsMap in the internal implementation of HashSet will be same "PRESENT". Hence we are only responsible for our object value that we are adding to the set.
DeleteThanks
Kumar Shorav
As per my understanding I guess values are stored as null for all keys.
ReplyDeleteThat's not true, PRESENT object is stored as values for all keys. Look at the below code from add() method of HashSet :
Deletepublic boolean add(E e) {
return map.put(e, PRESENT)==null;
}
You should not have doubt after looking at this code :)
why would anyone like to store null values ?.
ReplyDeleteAnd in fact values are never stored ,only the references pointing to those values(ie objects) are stored in any type of collection.
That's correct Amitesh, No null values are stored, only PRESENT is stored, and that too is just one object, it's just references which are stored.
DeleteBut why HashSet cant have its own implementation , why does it use HashMap, why do we have to store "PRESENT" constant unnecerrily
ReplyDeleteWell Prateek, this is just one way to implement HashSet. The advantage of this approach is that they don't have to implement HashSet from scratch, they can reuse HashMap code to the heavy lifting.
DeleteVery helpful article. I am becoming a very BIG fan of your blog. The content you write is extremely simple to understand. Thanks!
ReplyDeleteThank You Jaya. If you like article, don't forget to share with your friends :-)
Deletein case of hashmap we will have a linked list in each bucket and will store the key's having same hash code in a linked list. Is it the same here too?
ReplyDeleteYes, that's true because it internally uses HashMap, so whatever implementation of HashMap is available it also applicable to HashSet. For Example, from Java 8 onwards, HashMap will switch to tree data structure instead of linked list if collision crosses a certain threshold to keep HashMap performance healthy. Because in case of heavy collision your Map will turn into a linked list and a get() will take O(n) instead of intended O(1).
ReplyDeletegreat info thanks.
DeleteHii, my question is what is table in hashmap
ReplyDeleteHello,
ReplyDeleteif load factor is 0.75 and initial capacity is 16.
After inserting 12 elements what will be the new Capacity?
I was wondering why does not the Set range also have something like 'SetMapHash', so that you can have index-based access (for fast retrieval) and also ensuring the uniqueness of stored objects?
ReplyDeleteHi,
ReplyDeleteCan anyone explain "if set intenally implements HashMap then HashSet should allow atleast 1 null"..
Yes, HashSet allows only one null value.
DeleteHi, Thank you for such wonderful explanation.
ReplyDelete1. Please clear my doubt, when you say "Immutability is not requirement but if its immutable then you can assume that object will not be changed during its stay on set". When can the objects change in absence of immutability.
2.
2. How hashset reacts if new object's hashcode is same as of existing object. So, probably as the underlying structure is map, so, a linked list will be formed but what will happen if we try to iterate that hashset.
Great Concepts
ReplyDeleteinternals are covered brilliantly
ReplyDelete