Consistent Hashing Implementation
Consistent hashing is a technique used to distribute keys across a set of servers in a way that minimizes the number of keys that need to be remapped when servers are added or removed. This is crucial for distributed systems where maintaining data locality and minimizing disruption during scaling is essential. Your task is to implement a consistent hashing algorithm in Python.
Problem Description
You need to implement a consistent hashing ring. The ring will consist of servers, each identified by a unique key (an integer). Keys are hashed to a point on the ring, and a key is assigned to the server immediately clockwise from its hashed position. The implementation should support adding and removing servers, and efficiently determining which server a given key should be assigned to.
Key Requirements:
- Hashing: Use a simple hash function (e.g., modulo operation) to map keys and server identifiers to the ring.
- Server Assignment: Given a key, determine the server responsible for that key.
- Dynamic Updates: The system should handle adding and removing servers without requiring a complete rehash of all keys.
- Ring Representation: The ring should be represented in a way that allows for efficient clockwise traversal.
Expected Behavior:
- The
add_server(server_key)method should add a new server to the ring. - The
remove_server(server_key)method should remove a server from the ring. - The
get_server(key)method should return the server responsible for the given key. - Adding or removing a server should only affect the mapping of a small subset of keys.
Edge Cases to Consider:
- Empty ring (no servers).
- Adding a server that already exists.
- Removing a server that doesn't exist.
- Hashing collisions (though a simple modulo hash is used, consider how this might impact distribution).
- Large number of servers.
Examples
Example 1:
Input:
servers = []
add_server(10)
add_server(20)
add_server(30)
get_server(15)
remove_server(20)
get_server(15)
Output:
30
30
Explanation: Initially, the ring is empty. Servers 10, 20, and 30 are added. Key 15 hashes to a position between 10 and 20, so it's assigned to 20. After removing 20, key 15 now hashes to a position between 10 and 30, so it's assigned to 30.
Example 2:
Input:
servers = [1, 2, 3, 4, 5]
get_server(7)
add_server(6)
get_server(7)
remove_server(3)
get_server(7)
Output:
4
4
5
Explanation: With servers 1, 2, 3, 4, and 5, key 7 is assigned to server 4. After adding server 6, key 7 remains assigned to server 4. After removing server 3, key 7 is assigned to server 5.
Example 3: (Edge Case - Empty Ring)
Input:
servers = []
get_server(5)
Output:
None
Explanation: If the ring is empty, no server can be assigned, so None is returned.
Constraints
- Server keys and keys to be assigned are non-negative integers.
- The number of servers will be between 0 and 1000.
- The number of keys to be assigned will be between 1 and 10000.
- The hash function should be simple and efficient (e.g., modulo operation). Performance is not a primary concern for this exercise, but avoid excessively complex hashing algorithms.
- The ring size is implicitly defined by the maximum possible integer value (e.g., 2<sup>32</sup> - 1 for 32-bit integers).
Notes
- Consider using a sorted data structure (e.g., a list) to represent the ring, allowing for efficient clockwise traversal.
- The modulo operator (%) is a suitable hash function for this problem.
- Think about how to handle the case where a key hashes to a position between two servers. The server clockwise from the hashed position is responsible.
- Focus on the core logic of consistent hashing – adding, removing, and assigning keys. Error handling and extensive testing are not required for this exercise, but consider how you would handle invalid inputs in a production environment.