HashSet

HashSet is the most commonly used Set implementation in Java, backed by a HashMap instance to provide O(1) average-time performance for basic operations.

Core Implementation #

HashSet is a thin wrapper around HashMap, using the map’s keys to store set elements:

private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

Implementation Details #

Construction #

public HashSet() {
    map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}

// Package private constructor for LinkedHashSet
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<>(initialCapacity, loadFactor);
}

Add Operation #

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

The add operation:

  1. Uses the element as a key in the backing HashMap
  2. Associates it with the dummy PRESENT value
  3. Returns true if the key was not already present (map.put returned null)
  4. Returns false if the element was already in the set
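This put-returns-null contract is easy to observe from the caller's side. A minimal example using only standard JDK classes:

```java
import java.util.HashSet;
import java.util.Set;

public class AddDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        // First insertion: map.put(e, PRESENT) returns null, so add() returns true
        System.out.println(set.add("a")); // true
        // Second insertion: "a" already maps to PRESENT, so add() returns false
        System.out.println(set.add("a")); // false
    }
}
```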

Remove Operation #

public boolean remove(Object o) {
    return map.remove(o) == PRESENT;
}

The remove operation:

  1. Removes the key from the backing HashMap
  2. Returns true if the key was present (map.remove returned PRESENT)
  3. Returns false if the key was not found

Contains Operation #

public boolean contains(Object o) {
    return map.containsKey(o);
}
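Since remove and contains delegate straight to the backing map, their return values track element presence exactly; a short illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class MembershipDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Set.of("a", "b"));

        System.out.println(set.contains("a")); // true  -> map.containsKey("a")
        System.out.println(set.remove("a"));   // true  -> map.remove returned PRESENT
        System.out.println(set.remove("a"));   // false -> "a" already gone
        System.out.println(set.contains("a")); // false
    }
}
```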

Size Operation #

public int size() {
    return map.size();
}

Clear Operation #

public void clear() {
    map.clear();
}

Iterator Support #

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

The iterator is obtained from the keySet of the backing HashMap, which means:

  • It iterates over the elements in no particular order
  • It supports removal via Iterator.remove()
  • It’s fail-fast (throws ConcurrentModificationException if set is modified during iteration)
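Both behaviors can be demonstrated directly. Note that fail-fast behavior is best-effort per the JDK documentation and should not be relied on for correctness; this sketch only shows the typical outcome:

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class IteratorDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Set.of("a", "b", "c"));

        // Safe removal during iteration: go through Iterator.remove()
        for (Iterator<String> it = set.iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        System.out.println(set.contains("b")); // false

        // Structural modification through the set itself is fail-fast
        try {
            for (String s : set) {
                set.add("d"); // modCount changes; next() throws
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }
    }
}
```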

Spliterator Support #

public Spliterator<E> spliterator() {
    return map.keySet().spliterator();
}

Clone Operation #

@SuppressWarnings("unchecked")
public Object clone() {
    try {
        HashSet<E> newSet = (HashSet<E>) super.clone();
        newSet.map = (HashMap<E, Object>) map.clone();
        return newSet;
    } catch (CloneNotSupportedException e) {
        throw new InternalError(e);
    }
}
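Because only the backing map is cloned, the copy is shallow: membership is independent between the two sets, but the element objects themselves are shared. A small sketch:

```java
import java.util.HashSet;

public class CloneDemo {
    public static void main(String[] args) {
        HashSet<StringBuilder> original = new HashSet<>();
        StringBuilder sb = new StringBuilder("x");
        original.add(sb);

        @SuppressWarnings("unchecked")
        HashSet<StringBuilder> copy = (HashSet<StringBuilder>) original.clone();

        // Elements are shared, not copied: clone() is shallow
        System.out.println(copy.iterator().next() == sb); // true

        // Membership is independent: the backing HashMap itself was cloned
        copy.clear();
        System.out.println(original.size()); // 1
    }
}
```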

Serialization #

private void writeObject(java.io.ObjectOutputStream s) throws IOException {
    // Write out any hidden serialization magic to the ObjectOutputStream
    s.defaultWriteObject();
    
    // Write out size
    s.writeInt(map.size());
    
    // Write out all elements in the proper order (they are stored in
    // HashMap without ordering)
    for (E e : map.keySet())
        s.writeObject(e);
}

private void readObject(java.io.ObjectInputStream s)
    throws IOException, ClassNotFoundException {
    // Read in any hidden serialization magic
    s.defaultReadObject();
    
    // Read in size
    int size = s.readInt();
    
    // Read in all elements in the proper order, ensuring that
    // duplicate elements are not added
    map = (((HashSet<?>)this) instanceof LinkedHashSet 
           ? new LinkedHashMap<E,Object>(size, .75f)
           : new HashMap<E,Object>(size, .75f));
    
    for (int i = 0; i < size; i++) {
        @SuppressWarnings("unchecked")
        E e = (E) s.readObject();
        map.put(e, PRESENT);
    }
}
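A round trip through the custom serialized form can be exercised with standard `java.io` streams; the restored set is equal to the original even though iteration order may differ:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;
import java.util.Set;

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        Set<String> set = new HashSet<>(Set.of("a", "b", "c"));

        // Serialize: writeObject records the size, then each element
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(set);
        }

        // Deserialize: readObject rebuilds the backing map and re-adds elements
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Set<String> restored = (Set<String>) in.readObject();
            System.out.println(restored.equals(set)); // true
        }
    }
}
```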

Performance Characteristics #

Since HashSet is backed by HashMap, it inherits the same performance characteristics:

  • Time Complexity:

    • add(E e): O(1) average, O(n) worst case
    • remove(Object o): O(1) average, O(n) worst case
    • contains(Object o): O(1) average, O(n) worst case
    • size(): O(1)
    • iterator().next(): O(1) average
  • Space Complexity: O(n) where n is the number of elements

  • Memory Overhead: Essentially the same as a HashMap holding the same keys; each entry's value slot holds a reference to the single shared PRESENT sentinel, a small fixed per-element cost

Key Properties #

  1. No Duplicates: Enforces the Set contract - no duplicate elements
  2. Null Elements: Permits at most one null element
  3. Unordered: Does not guarantee any particular iteration order
  4. Not Thread-Safe: Must be externally synchronized for concurrent use
  5. Fail-Fast Iterators: Iterators throw ConcurrentModificationException if set is modified during iteration

Best Practices #

  1. Choose appropriate initial capacity: Prevents frequent rehashing
  2. Use immutable elements: Elements should not change their hashCode() after insertion
  3. Consider alternatives:
    • LinkedHashSet for insertion-order iteration
    • TreeSet for sorted elements
    • ConcurrentHashMap.newKeySet() for thread-safe sets
  4. Prefer for membership testing: HashSet provides O(1) average-time contains() operations
  5. Use proper equals/hashCode: Elements must properly implement these methods
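For the thread-safe alternative mentioned above, `ConcurrentHashMap.newKeySet()` returns a Set view backed by a ConcurrentHashMap; a brief sketch of concurrent insertion:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeSetDemo {
    public static void main(String[] args) throws InterruptedException {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();

        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                seen.add(i); // safe to call from multiple threads
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Both threads added the same 1000 values; duplicates collapsed
        System.out.println(seen.size()); // 1000
    }
}
```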

Common Pitfalls #

  1. Modifying elements in set: Changing an element’s properties that affect hashCode() can break the set
  2. Concurrent modification: Modifying set while iterating causes ConcurrentModificationException
  3. Performance degradation: Poor hashCode() implementation causes many collisions
  4. Memory leaks: An element whose hashCode() changes after insertion can no longer be found or removed, so its entry lingers in the set until clear()
  5. Null handling: At most one null element is permitted; further add(null) calls simply return false, which can silently mask logic errors
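The mutable-element pitfall is worth seeing concretely: once the hash changes, the element sits in the wrong bucket and the stored hash no longer matches lookups, so the entry is stranded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutationPitfall {
    public static void main(String[] args) {
        Set<List<String>> set = new HashSet<>();
        List<String> key = new ArrayList<>(List.of("a"));
        set.add(key);

        // Mutating the element changes its hashCode(): lookups now probe
        // a different bucket with a different hash, and find nothing
        key.add("b");
        System.out.println(set.contains(key)); // false
        System.out.println(set.remove(key));   // false -> the entry is stranded
        System.out.println(set.size());        // 1
    }
}
```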

Comparison with Other Set Implementations #

| Implementation | Ordering | Nulls Allowed | Performance | Thread-Safe |
|---|---|---|---|---|
| HashSet | Unordered | 1 null | O(1) average | No |
| LinkedHashSet | Insertion order | 1 null | O(1) average | No |
| TreeSet | Sorted (natural or comparator) | No nulls | O(log n) | No |
| ConcurrentHashMap.newKeySet() | Unordered | No nulls | O(1) average | Yes |

Internal HashMap Configuration #

HashSet uses specific HashMap constructors:

  • Default: new HashMap<> (initial capacity 16, load factor 0.75)
  • With collection: new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16))
  • With capacity/load factor: new HashMap<>(initialCapacity, loadFactor)
  • LinkedHashSet: new LinkedHashMap<>(initialCapacity, loadFactor)

The load factor of 0.75 provides a good balance between time and space costs.
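The collection-constructor sizing formula can be applied by hand when pre-sizing a set for a known element count; this sketch mirrors the expression from HashSet(Collection) shown above:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CapacityDemo {
    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5);

        // Same sizing as HashSet(Collection): large enough to hold all
        // elements without resizing at the 0.75 load factor, min 16
        int capacity = Math.max((int) (source.size() / .75f) + 1, 16);
        System.out.println(capacity); // 16 for a 5-element collection

        Set<Integer> set = new HashSet<>(capacity);
        set.addAll(source);
        System.out.println(set.size()); // 5
    }
}
```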