Distributed Leader Election in Go
Leader election is a fundamental problem in distributed systems. It allows a group of nodes to automatically choose one node as the leader, ensuring that only one node is responsible for coordinating tasks and making decisions. This challenge asks you to implement a basic leader election algorithm in Go using a simple heartbeat mechanism.
Problem Description
You are tasked with implementing a leader election algorithm for a cluster of Go nodes. Each node periodically sends a "heartbeat" signal. The node that sends the most recent heartbeat is considered the leader. If a node fails to send a heartbeat within a specified timeout period, the remaining nodes will initiate a new election.
What needs to be achieved:
- Implement a
Nodestruct representing a node in the cluster. Each node should have a unique ID, a channel for receiving heartbeats from other nodes, and a channel to send its own heartbeats. - Implement a
LeaderElectionfunction that takes a slice ofNodestructs as input and simulates the leader election process. - The function should return the ID of the elected leader.
- Nodes should periodically send heartbeats to each other.
- If a node fails to receive a heartbeat from the current leader within a timeout period, it should initiate a new election.
Key Requirements:
- Heartbeat Mechanism: Nodes should periodically send heartbeats to all other nodes in the cluster.
- Timeout: A timeout period should be defined. If a node doesn't receive a heartbeat within this period, it assumes the leader has failed.
- Election Trigger: When a leader failure is detected, a new election should be triggered.
- Unique Leader: Only one node should be elected as the leader at any given time.
- Concurrency: The solution should be concurrent to simulate a distributed environment.
Expected Behavior:
- Initially, all nodes are candidates for the leader.
- The node that sends the first heartbeat becomes the initial leader.
- Subsequent heartbeats from the leader maintain its leadership.
- If the leader fails (doesn't send a heartbeat within the timeout), a new election is triggered, and a new leader is chosen.
- The
LeaderElectionfunction should return the ID of the current leader.
Edge Cases to Consider:
- Empty Cluster: What happens if the input slice of nodes is empty?
- Single Node Cluster: What happens if there's only one node in the cluster?
- Node Failure During Election: What happens if a node fails during the election process?
- Network Partitioning: (This is a more advanced consideration, but think about how your solution might behave if nodes become isolated from each other.)
Examples
Example 1:
Input: [{ID: "A", HeartbeatChan: make(chan string), SendChan: make(chan string)}, {ID: "B", HeartbeatChan: make(chan string), SendChan: make(chan string)}, {ID: "C", HeartbeatChan: make(chan string), SendChan: make(chan string)}]
Output: "A" (or "B" or "C" - the first node to send a heartbeat)
Explanation: All nodes start sending heartbeats. The first node to send its heartbeat is elected as the leader.
Example 2:
Input: [{ID: "A", HeartbeatChan: make(chan string), SendChan: make(chan string)}, {ID: "B", HeartbeatChan: make(chan string), SendChan: make(chan string)}]
Output: "A" (or "B" - the first node to send a heartbeat)
Explanation: Similar to Example 1, but with only two nodes.
Example 3: (Simulating Leader Failure)
Input: [{ID: "A", HeartbeatChan: make(chan string), SendChan: make(chan string)}, {ID: "B", HeartbeatChan: make(chan string), SendChan: make(chan string)}]
Output: "B" (after node "A" fails to send a heartbeat within the timeout)
Explanation: Node "A" is initially elected as the leader. If "A" stops sending heartbeats, "B" detects the failure and becomes the new leader.
Constraints
- Number of Nodes: The input slice of nodes can contain between 1 and 10 nodes (inclusive).
- Node ID: Node IDs are strings and are unique within the cluster.
- Heartbeat Interval: The heartbeat interval should be between 100ms and 500ms (inclusive).
- Timeout Period: The timeout period should be between 1 second and 3 seconds (inclusive).
- Concurrency: The solution must be concurrent to simulate a distributed environment. Use goroutines and channels effectively.
- No External Dependencies: Do not use external libraries for leader election. Implement the core logic yourself.
Notes
- Consider using channels to communicate between nodes.
- Think about how to handle node failures and trigger new elections.
- The heartbeat interval and timeout period are crucial parameters that affect the stability and responsiveness of the leader election algorithm.
- This is a simplified implementation. Real-world leader election algorithms are more complex and robust, handling issues like network partitions and message loss.
- Focus on the core concepts of leader election: heartbeat mechanism, timeout, and election trigger.
- Error handling is not required for this challenge, but consider how you might handle errors in a production environment.