HashSet
HashSet is the most commonly used Set implementation in Java, backed by a HashMap instance to provide O(1) average-time performance for basic operations.
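As a quick illustration of these set semantics, a minimal usage sketch (the class name is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetBasics {
    public static void main(String[] args) {
        Set<String> langs = new HashSet<>();
        langs.add("Java");
        langs.add("Kotlin");
        langs.add("Java");          // duplicate, silently ignored

        System.out.println(langs.size());           // 2
        System.out.println(langs.contains("Java")); // true
    }
}
```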
Source Code #
Core Implementation #
HashSet is a thin wrapper around HashMap, using the map’s keys to store set elements:
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
Implementation Details #
Construction #
public HashSet() {
    map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

// Package private constructor for LinkedHashSet
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}
Add Operation #
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
The add operation:
- Uses the element as a key in the backing HashMap
- Associates it with the dummy `PRESENT` value
- Returns `true` if the key was not already present (map.put returned null)
- Returns `false` if the element was already in the set
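This return-value contract is easy to observe from calling code; a small sketch:

```java
import java.util.HashSet;

public class AddSemantics {
    public static void main(String[] args) {
        HashSet<Integer> set = new HashSet<>();
        // First insertion: map.put returns null, so add returns true
        System.out.println(set.add(42));  // true
        // Second insertion: 42 already maps to PRESENT, so add returns false
        System.out.println(set.add(42));  // false
    }
}
```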
Remove Operation #
public boolean remove(Object o) {
    return map.remove(o) == PRESENT;
}
The remove operation:
- Removes the key from the backing HashMap
- Returns `true` if the key was present (map.remove returned PRESENT)
- Returns `false` if the key was not found
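The remove contract can be observed the same way; a small sketch:

```java
import java.util.HashSet;

public class RemoveSemantics {
    public static void main(String[] args) {
        HashSet<String> set = new HashSet<>();
        set.add("x");
        System.out.println(set.remove("x")); // true:  key was present
        System.out.println(set.remove("x")); // false: already gone
    }
}
```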
Contains Operation #
public boolean contains(Object o) {
    return map.containsKey(o);
}
Size Operation #
public int size() {
    return map.size();
}
Clear Operation #
public void clear() {
    map.clear();
}
Iterator Support #
public Iterator<E> iterator() {
    return map.keySet().iterator();
}
The iterator is obtained from the keySet of the backing HashMap, which means:
- It iterates over the elements in no particular order
- It supports removal via `Iterator.remove()`
- It's fail-fast (throws `ConcurrentModificationException` if the set is modified during iteration)
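Both the safe removal path and the fail-fast behavior can be demonstrated; a minimal sketch (the class name is illustrative):

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class IteratorDemo {
    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(Set.of(1, 2, 3, 4));

        // Safe: removal through the iterator itself
        for (Iterator<Integer> it = set.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        System.out.println(set.size()); // 2 (only the odd numbers remain)

        // Unsafe: structural modification during iteration is fail-fast
        try {
            for (Integer i : set) {
                set.add(99); // modCount changes; the next next() call throws
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("fail-fast");
        }
    }
}
```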
Spliterator Support #
public Spliterator<E> spliterator() {
    return map.keySet().spliterator();
}
Clone Operation #
@SuppressWarnings("unchecked")
public Object clone() {
    try {
        HashSet<E> newSet = (HashSet<E>) super.clone();
        newSet.map = (HashMap<E, Object>) map.clone();
        return newSet;
    } catch (CloneNotSupportedException e) {
        throw new InternalError(e);
    }
}
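Because the backing map is cloned, the original and the copy evolve independently afterward, while the element objects themselves are shared (a shallow copy); a small sketch:

```java
import java.util.HashSet;

public class CloneDemo {
    public static void main(String[] args) {
        HashSet<String> original = new HashSet<>();
        original.add("a");

        @SuppressWarnings("unchecked")
        HashSet<String> copy = (HashSet<String>) original.clone();

        // The backing HashMap was cloned too, so membership is independent
        copy.add("b");
        System.out.println(original.size()); // 1
        System.out.println(copy.size());     // 2
        // The copy is shallow: both sets reference the same element objects
    }
}
```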
Serialization #
private void writeObject(java.io.ObjectOutputStream s) throws IOException {
    // Write out any hidden serialization magic to the ObjectOutputStream
    s.defaultWriteObject();

    // Write out size
    s.writeInt(map.size());

    // Write out all elements in the proper order (they are stored in
    // HashMap without ordering)
    for (E e : map.keySet())
        s.writeObject(e);
}
private void readObject(java.io.ObjectInputStream s)
        throws IOException, ClassNotFoundException {
    // Read in any hidden serialization magic
    s.defaultReadObject();

    // Read in size
    int size = s.readInt();

    // Read in all elements in the proper order, ensuring that
    // duplicate elements are not added
    map = (((HashSet<?>)this) instanceof LinkedHashSet
           ? new LinkedHashMap<E,Object>(size, .75f)
           : new HashMap<E,Object>(size, .75f));
    for (int i = 0; i < size; i++) {
        @SuppressWarnings("unchecked")
        E e = (E) s.readObject();
        map.put(e, PRESENT);
    }
}
Performance Characteristics #
Since HashSet is backed by HashMap, it inherits the same performance characteristics:
Time Complexity:
- `add(E e)`: O(1) average, O(n) worst case
- `remove(Object o)`: O(1) average, O(n) worst case
- `contains(Object o)`: O(1) average, O(n) worst case
- `size()`: O(1)
- `iterator().next()`: O(1) average
Space Complexity: O(n) where n is the number of elements
Memory Overhead: Essentially that of a HashMap holding the same keys; every entry's value slot holds a reference to the single shared dummy `PRESENT` object rather than a per-element value.
Key Properties #
- No Duplicates: Enforces the Set contract - no duplicate elements
- Null Elements: Permits at most one null element
- Unordered: Does not guarantee any particular iteration order
- Not Thread-Safe: Must be externally synchronized for concurrent use
- Fail-Fast Iterators: Iterators throw `ConcurrentModificationException` if the set is modified during iteration
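Two of these properties, the single permitted null element and the lack of thread safety, can be illustrated directly; `Collections.synchronizedSet` is one standard wrapper for external synchronization:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class NullAndSync {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add(null)); // true: one null is permitted
        System.out.println(set.add(null)); // false: still only one null
        System.out.println(set.size());    // 1

        // HashSet is not thread-safe by itself; wrap it for concurrent use
        Set<String> sync = Collections.synchronizedSet(new HashSet<>());
        sync.add("a");
        System.out.println(sync.size());   // 1
    }
}
```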
Best Practices #
- Choose an appropriate initial capacity: Prevents frequent rehashing
- Use immutable elements: Elements should not change their `hashCode()` after insertion
- Consider alternatives: `LinkedHashSet` for insertion-order iteration, `TreeSet` for sorted elements, `ConcurrentHashMap.newKeySet()` for thread-safe sets
- Prefer HashSet for membership testing: It provides O(1) average-time `contains()` operations
- Use proper equals/hashCode: Elements must implement these methods consistently
Common Pitfalls #
- Modifying elements in the set: Changing an element's properties that affect `hashCode()` can break the set
- Concurrent modification: Modifying the set while iterating causes `ConcurrentModificationException`
- Performance degradation: A poor `hashCode()` implementation causes many collisions
- Memory leaks: Mutable elements whose `hashCode()` changes can become unreachable through the set yet still occupy memory
- Null handling: At most one null element is stored; repeated adds of null silently return false
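The first pitfall is worth seeing concretely; this sketch uses a mutable `ArrayList` element to show how a changed `hashCode()` strands the entry in its old hash bucket:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutablePitfall {
    public static void main(String[] args) {
        Set<List<String>> set = new HashSet<>();
        List<String> element = new ArrayList<>(List.of("a"));
        set.add(element);

        // Mutating the element changes its hashCode() while it is in the set
        element.add("b");

        // The element is now "lost": lookups hash to a different bucket
        // than the one it was stored in
        System.out.println(set.contains(element)); // false
        System.out.println(set.size());            // still 1
    }
}
```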
Comparison with Other Set Implementations #
| Implementation | Ordering | Nulls Allowed | Performance | Thread-Safe |
|---|---|---|---|---|
| HashSet | Unordered | 1 null | O(1) average | No |
| LinkedHashSet | Insertion order | 1 null | O(1) average | No |
| TreeSet | Sorted (natural or comparator) | No nulls | O(log n) | No |
| ConcurrentHashMap.newKeySet() | Unordered | No nulls | O(1) average | Yes |
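The ordering differences in the table are easy to demonstrate; a small sketch:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OrderingDemo {
    public static void main(String[] args) {
        List<String> input = List.of("banana", "apple", "cherry");

        Set<String> hash = new HashSet<>(input);         // no guaranteed order
        Set<String> linked = new LinkedHashSet<>(input); // insertion order
        Set<String> tree = new TreeSet<>(input);         // natural sorted order

        System.out.println(linked);      // [banana, apple, cherry]
        System.out.println(tree);        // [apple, banana, cherry]
        System.out.println(hash.size()); // 3 (iteration order unspecified)
    }
}
```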
Internal HashMap Configuration #
HashSet uses specific HashMap constructors:
- Default: `new HashMap<>()` (initial capacity 16, load factor 0.75)
- With collection: `new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16))`
- With capacity/load factor: `new HashMap<>(initialCapacity, loadFactor)`
- LinkedHashSet: `new LinkedHashMap<>(initialCapacity, loadFactor)`
The load factor of 0.75 provides a good balance between time and space costs.
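The sizing formula used by the collection constructor above can be reused when pre-sizing a set for a known element count, so all insertions complete without a resize; a sketch:

```java
import java.util.HashSet;

public class PresizedSet {
    public static void main(String[] args) {
        int expected = 1000;
        // Mirror the collection constructor's sizing formula: pick a capacity
        // large enough that `expected` elements fit at load factor 0.75
        int capacity = Math.max((int) (expected / .75f) + 1, 16);
        HashSet<Integer> set = new HashSet<>(capacity);
        for (int i = 0; i < expected; i++) {
            set.add(i);
        }
        System.out.println(set.size()); // 1000
    }
}
```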