
Implement a Simplified Raft Consensus Protocol in Go

This challenge asks you to implement a simplified version of the Raft consensus algorithm in Go. Raft is a consensus algorithm designed for understandability, and grasping its core mechanics is crucial for building reliable distributed systems. Your implementation will focus on leader election and log replication.

Problem Description

You are tasked with building a Raft node that can participate in a cluster. Each node will maintain its state (Follower, Candidate, or Leader) and communicate with other nodes using RPCs.

Your implementation should cover the following (a sketch of the core data structures follows this list):

  • Node States: Implement the three Raft states:
    • Follower: Listens for heartbeats from the leader and grants votes to candidates when the conditions are met.
    • Candidate: Attempts to become leader by requesting votes from other nodes.
    • Leader: Sends heartbeats to followers, replicates log entries.
  • RPCs: Implement the necessary Remote Procedure Calls (RPCs) for Raft communication:
    • RequestVote RPC: Called by candidates to request votes.
    • AppendEntries RPC: Called by leaders to replicate log entries and as heartbeats.
  • Leader Election: Implement the logic for a follower to become a candidate and for a candidate to win an election. This involves randomized election timeouts.
  • Log Replication: Implement the logic for a leader to send log entries to followers and for followers to append them to their logs.
  • Timers: Implement randomized election timeouts for followers and heartbeats for leaders.
  • Basic State Management: Each node should maintain its current term, votedFor, and log.
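
As a starting point, here is a minimal sketch of the types this list implies. All names (Node, LogEntry, the RPC structs) are illustrative rather than required; the fields mirror Figure 2 of the Raft paper.

```go
package raft

import "sync"

// State identifies which of the three Raft roles a node currently holds.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// LogEntry pairs a client command with the term in which it was received.
type LogEntry struct {
	Term    int
	Command string
}

// Node holds the per-node state described above. A single mutex guards all
// fields, since RPC handlers and timer goroutines touch them concurrently.
type Node struct {
	mu sync.Mutex

	id    int
	state State

	// "Persistent" state (kept in memory only, per the simplified rules).
	currentTerm int
	votedFor    int // -1 when no vote has been cast in the current term
	log         []LogEntry

	// Volatile state.
	commitIndex int
	lastApplied int

	// Leader-only state, reinitialized after each election.
	nextIndex  map[int]int
	matchIndex map[int]int
}

// RequestVoteArgs/Reply are sent by candidates soliciting votes.
type RequestVoteArgs struct {
	Term         int
	CandidateID  int
	LastLogIndex int
	LastLogTerm  int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

// AppendEntriesArgs doubles as a heartbeat when Entries is empty.
type AppendEntriesArgs struct {
	Term         int
	LeaderID     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []LogEntry
	LeaderCommit int
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}
```

The later sketches in this problem statement build on these types and assume a 0-based log, with PrevLogIndex == -1 denoting the empty prefix.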

Key Requirements:

  1. State Transitions: Nodes must correctly transition between Follower, Candidate, and Leader states.
  2. Election Logic: Candidates must correctly solicit votes, and nodes must correctly grant votes based on the current term and log consistency.
  3. AppendEntries Logic: Leaders must send AppendEntries RPCs, and followers must respond appropriately, handling log consistency checks.
  4. Network Simulation: You will need a way to simulate network communication between nodes, for example with channels or a simple in-memory RPC stub. For this challenge, assume an in-memory simulation (a channel-based transport is sketched after this list).
  5. Persistence (Simplified): For this challenge, we will not require persistent storage. However, in a real Raft implementation, logs and state must be persisted to survive crashes.
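
One way to satisfy requirement 4 is a channel-based in-memory transport. A hedged sketch follows; the Network and Call names are assumptions, not part of the challenge:

```go
package raft

// rpcCall carries one request plus a channel for its reply, so a single
// goroutine per node can serve incoming RPCs sequentially.
type rpcCall struct {
	args  interface{}
	reply chan interface{}
}

// Network routes calls between nodes entirely in memory.
type Network struct {
	inboxes map[int]chan rpcCall
}

func NewNetwork(ids []int) *Network {
	n := &Network{inboxes: make(map[int]chan rpcCall)}
	for _, id := range ids {
		n.inboxes[id] = make(chan rpcCall, 16)
	}
	return n
}

// Call sends args to the target node and blocks until it replies.
func (n *Network) Call(to int, args interface{}) interface{} {
	c := rpcCall{args: args, reply: make(chan interface{}, 1)}
	n.inboxes[to] <- c
	return <-c.reply
}

// Serve is run by each node in its own goroutine; handle dispatches on the
// concrete argument type (RequestVoteArgs, AppendEntriesArgs, ...).
func (n *Network) Serve(id int, handle func(interface{}) interface{}) {
	for c := range n.inboxes[id] {
		c.reply <- handle(c.args)
	}
}
```

Injecting delays or dropped messages into Call is then a one-line change, which is useful for the latency/loss edge case discussed below.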

Expected Behavior:

  • In an idle cluster, one node should eventually become the leader.
  • Once a leader is elected, it should continuously send heartbeats to maintain its leadership.
  • If the leader fails, the cluster should elect a new leader.
  • (Optional, for advanced consideration) If a client sends a command to a follower, it should be redirected to the leader.

Important Edge Cases to Consider:

  • Split Votes: What happens when no candidate receives a majority of votes due to timing or network partitions?
  • Log Divergence: How do leaders ensure followers' logs are consistent before replicating new entries? (The leader's nextIndex back-off is sketched after this list.)
  • Network Latency/Loss: Although the network is simulated here, consider how latency and message loss would affect timeouts and RPCs.
  • New Node Joining: How would a new node join an existing cluster and catch up? (This is an advanced consideration beyond the core implementation for this challenge).
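
For the log-divergence case, the leader repairs a follower by retrying AppendEntries with a decremented nextIndex until the consistency check passes. A sketch using the type definitions above (sendAppendEntries stands in for whatever RPC call your simulation provides):

```go
package raft

// replicateTo retries AppendEntries to one follower, backing nextIndex off by
// one on each rejection until the prevLogIndex/prevLogTerm check succeeds.
func (n *Node) replicateTo(peer int, sendAppendEntries func(int, AppendEntriesArgs) AppendEntriesReply) {
	for {
		n.mu.Lock()
		next := n.nextIndex[peer]
		args := AppendEntriesArgs{
			Term:         n.currentTerm,
			LeaderID:     n.id,
			PrevLogIndex: next - 1, // -1 means "empty prefix"
			Entries:      append([]LogEntry(nil), n.log[next:]...),
			LeaderCommit: n.commitIndex,
		}
		if next > 0 {
			args.PrevLogTerm = n.log[next-1].Term
		}
		n.mu.Unlock()

		reply := sendAppendEntries(peer, args)

		n.mu.Lock()
		switch {
		case reply.Success:
			n.nextIndex[peer] = next + len(args.Entries)
			n.matchIndex[peer] = n.nextIndex[peer] - 1
			n.mu.Unlock()
			return
		case reply.Term > n.currentTerm:
			// A newer term exists; step down instead of retrying.
			n.currentTerm = reply.Term
			n.state = Follower
			n.mu.Unlock()
			return
		default:
			n.nextIndex[peer]-- // logs diverge: back off and retry
		}
		n.mu.Unlock()
	}
}
```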

Examples

For this simplified challenge, we won't have direct input/output in the traditional sense. Instead, we'll define scenarios and the expected behavior of the nodes in each.

Scenario 1: Initial Cluster Startup

  • Description: Three nodes (Node A, Node B, Node C) are started simultaneously.
  • Expected Behavior:
    • Initially, all nodes are Followers.
    • Due to randomized election timeouts, one node (e.g., Node A) will time out first and become a Candidate.
    • Node A will send RequestVote RPCs to Node B and Node C.
    • If Node B and Node C grant their votes (which they will if Node A's term is at least as high as theirs and its log is at least as up-to-date), Node A will become the Leader. The vote-granting check is sketched after this scenario.
    • Node A will then start sending AppendEntries RPCs (heartbeats) to Node B and Node C.
    • Node B and Node C will remain Followers, resetting their election timers each time they acknowledge a heartbeat.
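
The vote-granting step in this scenario is the heart of leader election. One plausible shape, building on the earlier type sketch (the up-to-date comparison is the check from section 5.4.1 of the Raft paper):

```go
package raft

// HandleRequestVote grants a vote only if the candidate's term is current and
// its log is at least as up-to-date as ours.
func (n *Node) HandleRequestVote(args RequestVoteArgs) RequestVoteReply {
	n.mu.Lock()
	defer n.mu.Unlock()

	// A higher term always forces us back to Follower.
	if args.Term > n.currentTerm {
		n.currentTerm = args.Term
		n.votedFor = -1
		n.state = Follower
	}

	reply := RequestVoteReply{Term: n.currentTerm}
	if args.Term < n.currentTerm {
		return reply // stale candidate
	}

	lastIndex := len(n.log) - 1
	lastTerm := 0
	if lastIndex >= 0 {
		lastTerm = n.log[lastIndex].Term
	}
	upToDate := args.LastLogTerm > lastTerm ||
		(args.LastLogTerm == lastTerm && args.LastLogIndex >= lastIndex)

	if (n.votedFor == -1 || n.votedFor == args.CandidateID) && upToDate {
		n.votedFor = args.CandidateID
		reply.VoteGranted = true
		// A full implementation also resets the election timer here.
	}
	return reply
}
```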

Scenario 2: Leader Failure

  • Description: A cluster of three nodes (A: Leader, B: Follower, C: Follower) is operating. Node A abruptly stops responding.
  • Expected Behavior:
    • Nodes B and C will not receive heartbeats from Node A within their election timeouts.
    • Both Node B and Node C will independently time out and transition to Candidates.
    • They will send RequestVote RPCs to each other.
    • Due to randomized timeouts, one candidate (e.g., Node B) will likely collect a majority of votes first and become the new Leader (see the election sketch after this scenario).
    • Node B will then start sending heartbeats to Node C.
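
What Nodes B and C each run when their election timers fire might look like the sketch below. Votes are collected sequentially here for brevity; a real implementation would fire the RPCs concurrently. requestVote stands in for your simulated RPC call:

```go
package raft

// startElection turns the node into a Candidate for a fresh term and asks
// every peer for its vote, as in Scenario 2.
func (n *Node) startElection(peers []int, requestVote func(int, RequestVoteArgs) RequestVoteReply) {
	n.mu.Lock()
	n.state = Candidate
	n.currentTerm++
	n.votedFor = n.id // vote for ourselves
	term := n.currentTerm
	lastIndex := len(n.log) - 1
	lastTerm := 0
	if lastIndex >= 0 {
		lastTerm = n.log[lastIndex].Term
	}
	n.mu.Unlock()

	votes := 1 // our own vote
	for _, p := range peers {
		reply := requestVote(p, RequestVoteArgs{
			Term:         term,
			CandidateID:  n.id,
			LastLogIndex: lastIndex,
			LastLogTerm:  lastTerm,
		})

		n.mu.Lock()
		if reply.Term > n.currentTerm {
			// Someone is ahead of us; abandon the election.
			n.currentTerm = reply.Term
			n.state = Follower
			n.mu.Unlock()
			return
		}
		if reply.VoteGranted {
			votes++
		}
		// Majority of the full cluster (peers plus ourselves) wins.
		if n.state == Candidate && votes > (len(peers)+1)/2 {
			n.state = Leader
			n.mu.Unlock()
			return // the heartbeat loop takes over from here
		}
		n.mu.Unlock()
	}
}
```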

Scenario 3: Log Replication

  • Description: Node A is the Leader, Node B and Node C are Followers. A client sends a command to Node A.
  • Expected Behavior:
    • Node A appends the command to its log as a new entry.
    • Node A sends an AppendEntries RPC to Node B and Node C containing the new log entry.
    • Node B and Node C receive the AppendEntries RPC. They check for log consistency (prevLogIndex, prevLogTerm).
    • Assuming consistency, Node B and Node C append the entry to their logs and respond to Node A.
    • Once Node A receives acknowledgments from a majority of nodes (including itself), the entry is considered committed. The follower-side consistency check is sketched after this scenario.
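
The follower side of this scenario could look like the sketch below. Note one simplification: truncating unconditionally at PrevLogIndex+1 is safe only because the leader in these sketches always sends its entire suffix; a full implementation truncates only on an actual conflict:

```go
package raft

// HandleAppendEntries checks the prevLogIndex/prevLogTerm consistency
// condition from Scenario 3, then appends the new entries.
func (n *Node) HandleAppendEntries(args AppendEntriesArgs) AppendEntriesReply {
	n.mu.Lock()
	defer n.mu.Unlock()

	reply := AppendEntriesReply{Term: n.currentTerm}
	if args.Term < n.currentTerm {
		return reply // stale leader
	}

	// A valid AppendEntries, heartbeat or not, asserts the sender's
	// leadership for this term.
	n.currentTerm = args.Term
	n.state = Follower
	// A full implementation also resets the election timer here.

	// Consistency check: our log must contain an entry at PrevLogIndex
	// whose term matches PrevLogTerm (PrevLogIndex == -1 always passes).
	if args.PrevLogIndex >= 0 {
		if args.PrevLogIndex >= len(n.log) ||
			n.log[args.PrevLogIndex].Term != args.PrevLogTerm {
			return reply // leader will back nextIndex off and retry
		}
	}

	// Replace everything after the agreed prefix with the leader's entries.
	n.log = append(n.log[:args.PrevLogIndex+1], args.Entries...)

	// Advance our commit index, but never past the end of our log.
	if args.LeaderCommit > n.commitIndex {
		n.commitIndex = args.LeaderCommit
		if last := len(n.log) - 1; last < n.commitIndex {
			n.commitIndex = last
		}
	}
	reply.Success = true
	return reply
}
```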

Constraints

  • Number of Nodes: The implementation should support a cluster of at least 3 nodes. You can hardcode or configure the number of nodes for testing.
  • Network Simulation: Assume an in-memory simulation of RPCs. No actual network sockets are required.
  • Log Entries: Log entries can be simple strings or integers for this challenge.
  • Timeouts: Election timeouts should be randomized within a specified range (e.g., 150-300ms). Heartbeat intervals should be fixed (e.g., 50ms). A timer sketch follows this list.
  • Concurrency: Your implementation must be concurrency-safe, as nodes will be handling multiple RPCs and timers concurrently. Use Go's goroutines and channels appropriately.
  • State Machine: You do not need to implement a full state machine for applying commands to the log; focus on the Raft consensus logic itself.
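
The timing constraints above suggest a main loop per node roughly like this sketch. heartbeatCh is an assumed channel signaled by the AppendEntries handler; startElection and sendHeartbeats are the routines sketched earlier:

```go
package raft

import (
	"math/rand"
	"time"
)

const (
	heartbeatInterval  = 50 * time.Millisecond
	electionTimeoutMin = 150 * time.Millisecond
	electionTimeoutMax = 300 * time.Millisecond
)

// randomElectionTimeout returns a fresh duration in [150ms, 300ms), so that
// nodes rarely time out at the same instant (the mitigation for split votes).
func randomElectionTimeout() time.Duration {
	span := int64(electionTimeoutMax - electionTimeoutMin)
	return electionTimeoutMin + time.Duration(rand.Int63n(span))
}

// run drives one node: followers and candidates wait on the election timer,
// leaders tick heartbeats at a fixed interval.
func (n *Node) run(heartbeatCh <-chan struct{}, startElection, sendHeartbeats func()) {
	for {
		n.mu.Lock()
		leader := n.state == Leader
		n.mu.Unlock()

		if leader {
			sendHeartbeats()
			time.Sleep(heartbeatInterval)
			continue
		}
		select {
		case <-heartbeatCh:
			// Heard from a live leader (or granted a vote): stay put.
		case <-time.After(randomElectionTimeout()):
			startElection()
		}
	}
}
```

Crucially, a new random timeout is drawn on every loop iteration; reusing one fixed random value per node makes repeated split votes far more likely.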

Notes

  • RPC Mechanism: You can implement your RPC layer using Go's built-in net/rpc package or simpler channel-based communication for the in-memory simulation. For this challenge, a channel-based simulation is highly recommended for ease of testing.
  • Data Structures: Carefully design your data structures for node state, logs, and RPC arguments/results.
  • Testing: Thorough testing is crucial. Consider writing unit tests for individual components (e.g., state transitions, vote-granting logic) and integration tests for cluster behavior (a test skeleton follows these notes).
  • Raft Paper: Refer to the original Raft paper ("In Search of an Understandable Consensus Algorithm" by Diego Ongaro and John Ousterhout) for the detailed specification of the algorithm.
  • Start Simple: Begin by implementing leader election, then move to log replication. Ensure each phase works correctly before proceeding.
  • Error Handling: While not explicitly detailed, consider how your implementation would handle RPC errors or node failures in a more robust system. For this challenge, assume RPCs are reliable within the simulation.
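
To make the testing note concrete, here is a skeleton integration test for Scenario 1. startCluster and the State accessor are hypothetical helpers you would write yourself; only the shape of the assertion matters:

```go
package raft

import (
	"testing"
	"time"
)

// TestLeaderElected asserts that an idle three-node cluster settles on
// exactly one leader. startCluster (hypothetical) launches n connected
// nodes; State() (hypothetical) reads a node's role under its mutex.
func TestLeaderElected(t *testing.T) {
	nodes := startCluster(3)

	// With 150-300ms election timeouts, one second is ample settling time.
	deadline := time.After(1 * time.Second)
	for {
		select {
		case <-deadline:
			t.Fatal("no leader elected within 1s")
		default:
		}

		leaders := 0
		for _, n := range nodes {
			if n.State() == Leader {
				leaders++
			}
		}
		if leaders == 1 {
			return // exactly one leader: success
		}
		time.Sleep(10 * time.Millisecond)
	}
}
```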