705. Design HashSet - Leetcode Solution

Code Implementation

class MyHashSet:

    def __init__(self):
        self.size = 1000
        self.buckets = [[] for _ in range(self.size)]

    def _hash(self, key: int) -> int:
        return key % self.size

    def add(self, key: int) -> None:
        h = self._hash(key)
        if key not in self.buckets[h]:
            self.buckets[h].append(key)

    def remove(self, key: int) -> None:
        h = self._hash(key)
        try:
            self.buckets[h].remove(key)
        except ValueError:
            pass

    def contains(self, key: int) -> bool:
        h = self._hash(key)
        return key in self.buckets[h]

class MyHashSet {
private:
    static const int size = 1000;
    vector<list<int>> buckets;
    int hash(int key) {
        return key % size;
    }
public:
    MyHashSet() : buckets(size) {}
    
    void add(int key) {
        int h = hash(key);
        if (find(buckets[h].begin(), buckets[h].end(), key) == buckets[h].end())
            buckets[h].push_back(key);
    }
    
    void remove(int key) {
        int h = hash(key);
        buckets[h].remove(key);
    }
    
    bool contains(int key) {
        int h = hash(key);
        return find(buckets[h].begin(), buckets[h].end(), key) != buckets[h].end();
    }
};

class MyHashSet {
    private final int size = 1000;
    private LinkedList<Integer>[] buckets;

    public MyHashSet() {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++)
            buckets[i] = new LinkedList<>();
    }
    
    private int hash(int key) {
        return key % size;
    }
    
    public void add(int key) {
        int h = hash(key);
        if (!buckets[h].contains(key))
            buckets[h].add(key);
    }
    
    public void remove(int key) {
        int h = hash(key);
        buckets[h].remove((Integer) key);
    }
    
    public boolean contains(int key) {
        int h = hash(key);
        return buckets[h].contains(key);
    }
}

var MyHashSet = function() {
    this.size = 1000;
    this.buckets = Array.from({length: this.size}, () => []);
};

MyHashSet.prototype._hash = function(key) {
    return key % this.size;
};

MyHashSet.prototype.add = function(key) {
    const h = this._hash(key);
    if (!this.buckets[h].includes(key)) {
        this.buckets[h].push(key);
    }
};

MyHashSet.prototype.remove = function(key) {
    const h = this._hash(key);
    const idx = this.buckets[h].indexOf(key);
    if (idx !== -1) {
        this.buckets[h].splice(idx, 1);
    }
};

MyHashSet.prototype.contains = function(key) {
    const h = this._hash(key);
    return this.buckets[h].includes(key);
};

Problem Description

You are asked to design and implement a data structure called MyHashSet without using any built-in hash set libraries. This custom hash set should support three operations:

add(key): Insert the integer key into the set.
remove(key): Remove the integer key from the set if present.
contains(key): Return true if the set contains the integer key, otherwise false.

The set should only store unique elements (no duplicates), and all operations should be efficient. The problem typically specifies a value range for key (e.g., 0 ≤ key <= 10⁶), and you should not use any built-in hash set or hash map data structures.

Thought Process

When asked to design a hash set from scratch, the first idea might be to use a simple array or list to store all elements. However, this approach leads to slow operations: checking if a key exists, adding, or removing requires scanning the entire list, resulting in O(n) time per operation.

To improve performance, we need a way to quickly find if a key exists and where to store it. This is where hash tables come in: by computing a hash of the key, we can map it to a specific "bucket" or location in an array. This allows for much faster lookups, ideally in constant O(1) time.

The challenge is to handle collisions (when multiple keys map to the same bucket) and to ensure that all operations remain efficient. We must also avoid using built-in hash set structures and instead implement our own logic for storing and managing elements.

Solution Approach

The standard way to implement a hash set is to use an array of buckets, where each bucket holds a collection (like a list or linked list) of keys that hash to the same value. Here's how we can implement this:

Choose the number of buckets:
- Pick a fixed array size (e.g., 1000). This is a tradeoff between space and collision probability.
Hash function:
- For a given key, compute hash(key) = key % size to determine which bucket to use.
Bucket storage:
- Each bucket is a list (or linked list) that stores all keys hashing to that bucket.
Operations:
- add(key): Hash the key, check if it is already in the bucket (to avoid duplicates), and add it if not present.
- remove(key): Hash the key, and remove it from the bucket if present.
- contains(key): Hash the key, and check if it exists in the bucket.
Collision handling:
- Since several keys may hash to the same bucket, each bucket can store multiple keys.

This approach ensures that all operations are fast on average, provided the hash function distributes keys evenly and the number of buckets is large enough to minimize collisions.

Example Walkthrough

Let's walk through an example using MyHashSet:

Initialize: MyHashSet() creates an array of 1000 buckets, each empty.
Add 1001: add(1001)
- Hash: 1001 % 1000 = 1. Go to bucket 1.
- Bucket 1 is empty, so we add 1001.
Check contains 1001: contains(1001)
- Hash: 1001 % 1000 = 1. Go to bucket 1.
- 1001 is present in the bucket, so return true.
Remove 1001: remove(1001)
- Hash: 1001 % 1000 = 1. Go to bucket 1.
- 1001 is present, so remove it from the bucket.
Check contains 1001: contains(1001)
- Hash: 1001 % 1000 = 1. Go to bucket 1.
- 1001 is no longer present, so return false.

This demonstrates how keys are added, checked, and removed efficiently using hashing and buckets.

Time and Space Complexity

Brute-force approach:
- If we used a simple list, add, remove, and contains would all be O(n), where n is the number of elements in the set.
Optimized hash set approach:
- Time Complexity:
  - On average, all operations (add, remove, contains) are O(1) due to hashing and small bucket sizes.
  - In the worst case (all keys collide into one bucket), operations degrade to O(n) for that bucket, but with a good hash function and enough buckets, this is rare.
- Space Complexity:
  - O(k + m), where k is the number of buckets (fixed, e.g., 1000), and m is the number of elements stored (each stored in a bucket list).

Summary

We designed a custom hash set by using an array of buckets and a simple hash function to distribute keys efficiently. This approach ensures that insertion, removal, and lookup are all fast, typically O(1) time, by leveraging the principle of hashing and handling collisions with bucket lists. The key insight is to avoid slow linear scans by mapping each key directly to a bucket, making the solution both efficient and elegant for large sets of integers.

AlgoMap.io

Want Help Cracking FAANG?