Hone logo
Hone
Problems

Implementing Consistent Hashing for Scalable Key Distribution

Consistent hashing is a distributed hashing technique that minimizes key remapping when nodes are added or removed from a hash ring. This is crucial for distributed systems like caches, databases, and load balancers, where maintaining data availability and minimizing disruption during scaling events is paramount. Your task is to implement a consistent hashing algorithm in Python.

Problem Description

You need to design and implement a ConsistentHash class that distributes keys across a set of nodes. The core functionality of this class will be to:

  1. Add Nodes: Allow new nodes to be added to the hash ring.
  2. Remove Nodes: Allow existing nodes to be removed from the hash ring.
  3. Get Node for Key: Given a key, determine which node it should be assigned to.

The implementation should strive for efficiency and correctness, especially when dealing with node additions and removals. You should use a hashing function (like hashlib.md5 or hashlib.sha1) to map both keys and nodes to points on a conceptual ring. To improve distribution and reduce the impact of hot spots, you should also implement virtual nodes (also known as replicas or copies). Each physical node should be represented by multiple virtual nodes on the ring.

Key Requirements:

  • Virtual Nodes: The ConsistentHash class should support the concept of virtual nodes. A parameter should be provided to control the number of virtual nodes per physical node.
  • Ordered Ring: The hash ring should be represented in an ordered manner (e.g., a sorted list or a sorted dictionary) to efficiently find the next node.
  • Key Distribution: When adding or removing nodes, only a fraction of keys should be remapped to different nodes, ideally proportional to the number of nodes added/removed.
  • Hashing Function: Use a standard cryptographic hash function (e.g., MD5 or SHA1).
  • API Design: Provide clear methods for adding nodes, removing nodes, and getting the node responsible for a given key.

Expected Behavior:

  • When a key is hashed, it should map to a point on the ring. The node responsible for the key is the first node encountered when moving clockwise from the key's hash point on the ring.
  • Adding a node should primarily affect keys that were previously assigned to the node immediately preceding the new node's position on the ring.
  • Removing a node should cause its keys to be reassigned to the next node clockwise on the ring.

Edge Cases:

  • Empty Ring: What happens when get_node is called on an empty hash ring?
  • Single Node: How does the distribution work with only one node?
  • Identical Hashes: While unlikely with good hash functions, consider how collisions might be implicitly handled by the ring structure.
  • Node Removal with No Successor: If the last node is removed, what happens to keys?

Examples

Example 1:

from consistent_hash import ConsistentHash

# Initialize with 100 virtual nodes per physical node
ch = ConsistentHash(replicas=100)

# Add nodes
ch.add_node("node1")
ch.add_node("node2")
ch.add_node("node3")

# Assign keys
keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
key_assignments = {}
for key in keys:
    key_assignments[key] = ch.get_node(key)

print(key_assignments)

Output:

{'apple': 'node1', 'banana': 'node2', 'cherry': 'node1', 'date': 'node3', 'elderberry': 'node2', 'fig': 'node3', 'grape': 'node1'}

Explanation:

The keys are distributed among node1, node2, and node3. The specific assignment depends on the hashed values of the keys and the positions of the virtual nodes of each physical node on the ring.

Example 2:

from consistent_hash import ConsistentHash

ch = ConsistentHash(replicas=100)
ch.add_node("node1")
ch.add_node("node2")
ch.add_node("node3")

keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
key_assignments_before = {key: ch.get_node(key) for key in keys}

# Remove a node
ch.remove_node("node2")

key_assignments_after = {key: ch.get_node(key) for key in keys}

print("Assignments Before Removal:", key_assignments_before)
print("Assignments After Removal:", key_assignments_after)

# Calculate how many keys were remapped
remapped_count = 0
for key in keys:
    if key_assignments_before[key] != key_assignments_after[key]:
        remapped_count += 1

print(f"Number of remapped keys: {remapped_count}")

Output:

Assignments Before Removal: {'apple': 'node1', 'banana': 'node2', 'cherry': 'node1', 'date': 'node3', 'elderberry': 'node2', 'fig': 'node3', 'grape': 'node1'}
Assignments After Removal: {'apple': 'node1', 'banana': 'node3', 'cherry': 'node1', 'date': 'node3', 'elderberry': 'node3', 'fig': 'node3', 'grape': 'node1'}
Number of remapped keys: 3

Explanation:

After removing node2, keys that were previously assigned to node2 (like "banana" and "elderberry") are now assigned to the next available node clockwise, which is node3 in this case. Other keys remain on their original nodes. The number of remapped keys is relatively small, demonstrating the benefit of consistent hashing.

Example 3 (Edge Case: Empty Ring):

from consistent_hash import ConsistentHash

ch = ConsistentHash(replicas=100)

try:
    ch.get_node("some_key")
except ValueError as e:
    print(e)

Output:

Hash ring is empty.

Explanation:

Calling get_node on an empty hash ring raises a ValueError as there are no nodes to assign the key to.

Constraints

  • The number of physical nodes can range from 0 to 1000.
  • The number of virtual nodes per physical node (replicas) will be between 50 and 300.
  • Keys and node names will be strings.
  • The hashing function should produce a 32-bit or 64-bit integer representation for consistent ordering on the ring.
  • The time complexity for add_node and remove_node should be O(R * log N), where R is the number of replicas and N is the number of physical nodes.
  • The time complexity for get_node should be O(log N), where N is the number of physical nodes.

Notes

  • You will need to choose a suitable hashing algorithm. Consider hashlib.md5 or hashlib.sha1. Remember to convert the resulting hash digest to an integer.
  • A good data structure for storing the ordered ring is a sorted list or a sorted dictionary. If using a sorted list, you can use binary search (bisect module in Python) to find the appropriate node efficiently.
  • When generating virtual nodes, use a consistent naming scheme (e.g., node_name-replica_index).
  • Consider how to handle the wrap-around behavior of the ring (i.e., when a key's hash is greater than the largest hash on the ring).
  • Thoroughly test your implementation with various scenarios, including adding and removing nodes sequentially and concurrently (though your implementation will likely be single-threaded).
Loading editor...
python