Hone logo
Hone
Problems

Implementing a Custom HashSet in Rust

This challenge asks you to implement your own version of a hash set in Rust. A hash set is a data structure that stores unique elements and allows for efficient insertion, deletion, and checking for the existence of elements using hashing. Building one from scratch will deepen your understanding of hash tables, collision resolution, and Rust's ownership and borrowing rules.

Problem Description

Your task is to create a generic HashSet<T> struct in Rust that mimics the functionality of std::collections::HashSet. This implementation should store unique elements of any type T that satisfies the Eq and Hash traits.

Key Requirements:

  • Generic Type T: The HashSet should be able to store elements of any type T that implements std::hash::Hash and std::cmp::Eq.
  • Uniqueness: Only one instance of any given element should be stored. If an attempt is made to insert an element that already exists, the set should not change.
  • Core Operations: Implement the following methods:
    • new(): Creates an empty HashSet.
    • insert(value: T): Adds a value to the set. Returns true if the value was newly inserted, false if the value was already present.
    • contains(value: &T): Returns true if the set contains the given value, false otherwise.
    • remove(value: &T): Removes a value from the set. Returns true if the value was present and removed, false otherwise.
    • len(): Returns the number of elements in the set.
    • is_empty(): Returns true if the set contains no elements, false otherwise.
  • Internal Representation: You will need to choose an underlying data structure to implement the hash table. A common approach is to use a Vec of "buckets", where each bucket is a list (or another suitable collection) of elements that hash to the same index.
  • Collision Resolution: You must implement a strategy to handle hash collisions (when multiple keys hash to the same bucket). Separate chaining (using linked lists or Vecs within buckets) is a common and acceptable approach.
  • Resizing: To maintain performance, the hash set should resize its internal storage (the number of buckets) when the load factor (number of elements / number of buckets) exceeds a certain threshold.

Expected Behavior:

  • Insertion of unique elements should succeed.
  • Insertion of duplicate elements should not change the set and should return false.
  • contains should correctly report the presence or absence of elements.
  • remove should correctly remove elements and return appropriate boolean values.
  • len and is_empty should accurately reflect the set's state.

Edge Cases:

  • Empty set operations.
  • Inserting and removing all elements.
  • Handling types with identical hashes but different values (due to Hash implementation or collision resolution).

Examples

Example 1: Basic Operations

Input:
let mut my_set = HashSet::new();
my_set.insert(10);
my_set.insert(20);
my_set.insert(10); // Duplicate

Output:
set.len() -> 2
set.contains(&10) -> true
set.contains(&30) -> false
set.remove(&20) -> true
set.len() -> 1
set.contains(&20) -> false

Explanation:

  1. An empty set is created.
  2. 10 is inserted, set size becomes 1.
  3. 20 is inserted, set size becomes 2.
  4. 10 is already present, insertion returns false, set size remains 2.
  5. contains(&10) returns true.
  6. contains(&30) returns false.
  7. 20 is removed, returns true, set size becomes 1.
  8. contains(&20) now returns false.

Example 2: String Elements

Input:
let mut string_set: HashSet<String> = HashSet::new();
string_set.insert("hello".to_string());
string_set.insert("world".to_string());
string_set.insert("hello".to_string()); // Duplicate

Output:
string_set.len() -> 2
string_set.contains(&"hello".to_string()) -> true
string_set.remove(&"goodbye".to_string()) -> false

Explanation: The HashSet works with String types. "hello" is inserted, then "world". The second insertion of "hello" is a duplicate. len() reports 2. contains correctly finds "hello". An attempt to remove "goodbye" fails as it's not in the set.

Example 3: Empty Set and Zero Operations

Input:
let empty_set: HashSet<i32> = HashSet::new();

Output:
empty_set.len() -> 0
empty_set.is_empty() -> true
empty_set.contains(&5) -> false
empty_set.remove(&5) -> false

Explanation: Demonstrates that len, is_empty, contains, and remove behave correctly on an initially empty set.

Constraints

  • The initial capacity (number of buckets) of the HashSet can be a small power of 2, e.g., 8 or 16.
  • The load factor threshold for resizing should be around 0.75.
  • The implementation should handle types T that are Clone or Copy (though Clone might be needed for insertion if you don't want to take ownership immediately, or if you need to store multiple copies in the internal structure for some reason, e.g., if you chose a different collision resolution). For this challenge, assume T can be Clone.
  • Performance should be generally O(1) on average for insert, contains, and remove, assuming a good hash function and a reasonable load factor. Worst-case performance can be O(n) if all elements hash to the same bucket.

Notes

  • You'll need to import traits from std::hash and std::cmp.
  • Consider how to handle ownership of values stored in the HashSet. You might need to clone values when inserting if you want to retain a copy in the caller's scope, or if the internal representation requires it.
  • When calculating the bucket index, use the modulo operator (%) with the number of buckets.
  • For resizing, you'll need to create a new, larger internal storage and re-insert all existing elements into the new buckets.
  • Think carefully about the lifetime of references passed to contains and remove.
  • A Vec<Option<T>> or Vec<Vec<T>> for buckets are common starting points for internal representation. Vec<Vec<T>> is generally preferred for separate chaining.
Loading editor...
rust