Implementing a B-Tree Map in Rust
This challenge asks you to implement a BTreeMap data structure from scratch in Rust. A B-tree map is a balanced tree structure that stores key-value pairs, offering efficient search, insertion, and deletion operations with guaranteed logarithmic time complexity. Understanding and implementing such a structure provides deep insights into balanced trees, memory management, and generic programming in Rust.
Problem Description
Your task is to implement a BTreeMap<K, V> struct in Rust that mimics the behavior of the standard library's std::collections::BTreeMap. The map should maintain keys in sorted order and provide methods for common map operations.
Key Requirements:
- B-Tree Structure: Implement the core B-tree logic for node splitting, merging, and rebalancing to maintain the tree's height balance and order property.
- Key-Value Storage: Store generic key-value pairs.
- Sorted Order: Keys must be stored in ascending order within nodes and across the entire tree.
- Generic Types: The map should be generic over key type
Kand value typeV. - Traits:
Kmust implementOrd(for comparison and sorting) and potentiallyCloneorCopydepending on implementation details.Vmight needCloneif values are copied during operations. - Core Operations: Implement the following methods:
new(): Creates an emptyBTreeMap.insert(key: K, value: V): Inserts a key-value pair. If the key already exists, update its value and return the old value.get(key: &K): Returns an immutable reference to the value associated with the key, orNoneif the key is not found.get_mut(key: &K): Returns a mutable reference to the value associated with the key, orNoneif the key is not found.remove(key: &K): Removes the key-value pair and returns the removed value, orNoneif the key is not found.contains_key(key: &K): Returnstrueif the map contains the given key,falseotherwise.len(): Returns the number of key-value pairs in the map.is_empty(): Returnstrueif the map is empty,falseotherwise.
- Node Structure: Define appropriate internal node and leaf node structures. Consider how to handle node capacity. A common choice for the order (or degree) of the B-tree is a fixed value, e.g.,
M = 4orM = 8. This means each node (except the root) will have betweenceil(M/2) - 1andM - 1keys.
Expected Behavior:
- Insertion and deletion operations should maintain the B-tree properties, including node splitting when a node exceeds capacity and node merging or borrowing when a node falls below minimum occupancy.
getandget_mutshould efficiently locate keys using binary search within nodes.removeshould handle cases where removing a key makes a node underflow, requiring rebalancing.
Edge Cases:
- Empty tree operations.
- Inserting/removing from a tree with a single node.
- Splitting the root node.
- Merging nodes that result in the root becoming empty and a child becoming the new root.
- Handling keys that are very close to each other.
Examples
Example 1: Basic Insert and Get
let mut map = BTreeMap::<i32, String>::new();
map.insert(10, "apple".to_string());
map.insert(20, "banana".to_string());
map.insert(5, "cherry".to_string());
assert_eq!(map.get(&10), Some(&"apple".to_string()));
assert_eq!(map.get(&5), Some(&"cherry".to_string()));
assert_eq!(map.get(&15), None);
assert_eq!(map.len(), 3);
Explanation:
Keys are inserted and stored in sorted order. get correctly retrieves values for existing keys and returns None for non-existent ones.
Example 2: Overwriting and Removing
let mut map = BTreeMap::<i32, String>::new();
map.insert(10, "apple".to_string());
map.insert(20, "banana".to_string());
let old_value = map.insert(10, "apricot".to_string());
assert_eq!(old_value, Some("apple".to_string()));
assert_eq!(map.get(&10), Some(&"apricot".to_string()));
assert_eq!(map.len(), 2);
let removed_value = map.remove(&20);
assert_eq!(removed_value, Some("banana".to_string()));
assert_eq!(map.get(&20), None);
assert_eq!(map.len(), 1);
let non_existent_remove = map.remove(&30);
assert_eq!(non_existent_remove, None);
assert_eq!(map.len(), 1);
Explanation:
Inserting a key that already exists updates the value and returns the old value. remove successfully deletes an existing key and returns its value, while attempting to remove a non-existent key returns None.
Example 3: Edge Case - Splitting the Root
Imagine a B-tree with a maximum of 3 keys per node (M=3). Minimum keys per node (except root) is ceil(3/2)-1 = 1.
// Assume BTreeMap is configured with M=3
let mut map = BTreeMap::<i32, i32>::new();
map.insert(1, 10); // Node: [1]
map.insert(2, 20); // Node: [1, 2]
map.insert(3, 30); // Node: [1, 2, 3] - Full, needs split on next insert
// Inserting 4 causes node [1, 2, 3] to split. 2 becomes root.
// Left child: [1], Right child: [3, 4]
map.insert(4, 40);
assert_eq!(map.get(&1), Some(&10));
assert_eq!(map.get(&2), Some(&20));
assert_eq!(map.get(&3), Some(&30));
assert_eq!(map.get(&4), Some(&40));
assert_eq!(map.len(), 4);
// This implies a root node with key 2, and two children nodes.
// The internal structure would reflect this.
Explanation:
When the initial node containing keys [1, 2, 3] becomes full and 4 is inserted, the node splits. The middle key (2) is promoted to become the new root. The remaining keys are distributed into child nodes.
Constraints
- The B-tree order
M(maximum number of children per node, which impliesM-1keys) should be a configurable parameter or a well-defined constant (e.g.,const M: usize = 4;). For simplicity in implementation, you can start with a small, fixedM. - Keys must implement
Ord. - Values can be any type.
- The implementation should aim for efficient operations, striving for O(log N) complexity for insert, get, and remove.
- Memory management should be handled correctly, avoiding memory leaks.
Notes
- Consider using an enum for
Nodeto differentiate between internal nodes and leaf nodes. - Think about how to represent keys and values within nodes. Storing them in sorted vectors is a common approach.
- The logic for splitting nodes and handling underflow (merging or borrowing from siblings) is the most complex part of B-tree implementation. Break it down into smaller, manageable functions.
- You might find it helpful to implement helper methods for searching within a node and for managing node capacity.
- Testing thoroughly, especially with edge cases and larger datasets, is crucial.