Implementing a Custom HashSet in Rust
This challenge asks you to implement your own version of a hash set in Rust. A hash set is a data structure that stores unique elements and allows for efficient insertion, deletion, and checking for the existence of elements using hashing. Building one from scratch will deepen your understanding of hash tables, collision resolution, and Rust's ownership and borrowing rules.
Problem Description
Your task is to create a generic HashSet<T> struct in Rust that mimics the functionality of std::collections::HashSet. This implementation should store unique elements of any type T that satisfies the Eq and Hash traits.
Key Requirements:
- Generic Type
T: TheHashSetshould be able to store elements of any typeTthat implementsstd::hash::Hashandstd::cmp::Eq. - Uniqueness: Only one instance of any given element should be stored. If an attempt is made to insert an element that already exists, the set should not change.
- Core Operations: Implement the following methods:
new(): Creates an emptyHashSet.insert(value: T): Adds a value to the set. Returnstrueif the value was newly inserted,falseif the value was already present.contains(value: &T): Returnstrueif the set contains the given value,falseotherwise.remove(value: &T): Removes a value from the set. Returnstrueif the value was present and removed,falseotherwise.len(): Returns the number of elements in the set.is_empty(): Returnstrueif the set contains no elements,falseotherwise.
- Internal Representation: You will need to choose an underlying data structure to implement the hash table. A common approach is to use a
Vecof "buckets", where each bucket is a list (or another suitable collection) of elements that hash to the same index. - Collision Resolution: You must implement a strategy to handle hash collisions (when multiple keys hash to the same bucket). Separate chaining (using linked lists or
Vecs within buckets) is a common and acceptable approach. - Resizing: To maintain performance, the hash set should resize its internal storage (the number of buckets) when the load factor (number of elements / number of buckets) exceeds a certain threshold.
Expected Behavior:
- Insertion of unique elements should succeed.
- Insertion of duplicate elements should not change the set and should return
false. containsshould correctly report the presence or absence of elements.removeshould correctly remove elements and return appropriate boolean values.lenandis_emptyshould accurately reflect the set's state.
Edge Cases:
- Empty set operations.
- Inserting and removing all elements.
- Handling types with identical hashes but different values (due to
Hashimplementation or collision resolution).
Examples
Example 1: Basic Operations
Input:
let mut my_set = HashSet::new();
my_set.insert(10);
my_set.insert(20);
my_set.insert(10); // Duplicate
Output:
set.len() -> 2
set.contains(&10) -> true
set.contains(&30) -> false
set.remove(&20) -> true
set.len() -> 1
set.contains(&20) -> false
Explanation:
- An empty set is created.
- 10 is inserted, set size becomes 1.
- 20 is inserted, set size becomes 2.
- 10 is already present, insertion returns
false, set size remains 2. contains(&10)returnstrue.contains(&30)returnsfalse.- 20 is removed, returns
true, set size becomes 1. contains(&20)now returnsfalse.
Example 2: String Elements
Input:
let mut string_set: HashSet<String> = HashSet::new();
string_set.insert("hello".to_string());
string_set.insert("world".to_string());
string_set.insert("hello".to_string()); // Duplicate
Output:
string_set.len() -> 2
string_set.contains(&"hello".to_string()) -> true
string_set.remove(&"goodbye".to_string()) -> false
Explanation:
The HashSet works with String types. "hello" is inserted, then "world". The second insertion of "hello" is a duplicate. len() reports 2. contains correctly finds "hello". An attempt to remove "goodbye" fails as it's not in the set.
Example 3: Empty Set and Zero Operations
Input:
let empty_set: HashSet<i32> = HashSet::new();
Output:
empty_set.len() -> 0
empty_set.is_empty() -> true
empty_set.contains(&5) -> false
empty_set.remove(&5) -> false
Explanation:
Demonstrates that len, is_empty, contains, and remove behave correctly on an initially empty set.
Constraints
- The initial capacity (number of buckets) of the
HashSetcan be a small power of 2, e.g., 8 or 16. - The load factor threshold for resizing should be around 0.75.
- The implementation should handle types
Tthat areCloneorCopy(thoughClonemight be needed for insertion if you don't want to take ownership immediately, or if you need to store multiple copies in the internal structure for some reason, e.g., if you chose a different collision resolution). For this challenge, assumeTcan beClone. - Performance should be generally O(1) on average for
insert,contains, andremove, assuming a good hash function and a reasonable load factor. Worst-case performance can be O(n) if all elements hash to the same bucket.
Notes
- You'll need to import traits from
std::hashandstd::cmp. - Consider how to handle ownership of values stored in the
HashSet. You might need toclonevalues when inserting if you want to retain a copy in the caller's scope, or if the internal representation requires it. - When calculating the bucket index, use the modulo operator (
%) with the number of buckets. - For resizing, you'll need to create a new, larger internal storage and re-insert all existing elements into the new buckets.
- Think carefully about the lifetime of references passed to
containsandremove. - A
Vec<Option<T>>orVec<Vec<T>>for buckets are common starting points for internal representation.Vec<Vec<T>>is generally preferred for separate chaining.