Implementing the Paxos Consensus Algorithm in Go
The Paxos algorithm is a family of protocols for solving consensus in a network of unreliable or fallible processors. It's a fundamental building block for fault-tolerant distributed systems, enabling agreement on a single value even in the presence of node failures or network delays. This challenge asks you to implement a simplified version of the Paxos algorithm in Go, focusing on the core logic of achieving consensus.
Problem Description
Your task is to implement a simplified, single-decree Paxos algorithm in Go. The goal is for a set of Proposers to agree on a single proposed value. The algorithm should handle a basic level of fault tolerance, simulating node failures by not receiving responses from certain nodes.
You will need to implement the roles of Proposer and Acceptor. For simplicity, we will not implement the Learner role; instead, the consensus result will be available at the Acceptors.
Key Requirements:
-
Proposer Role:
- Initiate a proposal for a value.
- Send "Prepare" requests to a majority of Acceptors.
- If a majority responds with promises, choose the highest-numbered proposal seen from the responses. If no proposal was seen, use its own proposed value.
- Send "Accept" requests with the chosen proposal number and value to a majority of Acceptors.
- If a majority of Acceptors acknowledge the "Accept" request with the same proposal number and value, consensus is reached for that proposal.
-
Acceptor Role:
- Respond to "Prepare" requests by promising not to accept any future proposals with a number less than the promised number.
- Track the highest-numbered "Prepare" request seen and the value associated with the highest-numbered "Accept" request it has already accepted.
- Respond to "Accept" requests if the proposal number is greater than or equal to the highest-numbered "Prepare" request it has already promised.
-
Communication:
- Simulate network communication. For this challenge, you can use Go channels to pass messages between simulated Proposers and Acceptors.
- Simulate network unreliability by having some messages not reach their destination or by having some nodes fail to respond within a timeout.
Expected Behavior:
- A Proposer should be able to propose a value and, through communication with Acceptors, eventually reach agreement on that value, provided a majority of Acceptors are available and responsive.
- If multiple Proposers attempt to propose concurrently, the algorithm should still resolve to a single value. The higher proposal number should generally win.
- The system should be able to tolerate the failure (unresponsiveness) of a minority of Acceptors.
Edge Cases to Consider:
- Concurrent Proposals: Multiple Proposers initiating proposals at roughly the same time.
- Failing Nodes: Some Acceptors might not respond to requests.
- Network Delays: Messages might be delayed, impacting timeouts.
- Re-proposals: A Proposer might need to re-initiate its proposal if it doesn't receive enough responses.
Examples
Example 1: Simple Consensus
- Nodes: 3 Acceptors (A1, A2, A3), 1 Proposer (P1).
- P1 proposes value "V1" with proposal number 1.
- Scenario: P1 sends Prepare(1) to A1, A2, A3. All respond with Promise(1, nil). P1 then sends Accept(1, "V1") to A1, A2, A3. A1, A2, A3 accept.
- Output: Consensus reached on value "V1" with proposal number 1.
Example 2: Competing Proposals
- Nodes: 3 Acceptors (A1, A2, A3), 2 Proposers (P1, P2).
- P1 proposes value "V1" with proposal number 1.
- P2 proposes value "V2" with proposal number 2.
- Scenario:
- P1 sends Prepare(1) to A1, A2, A3. A1, A2 promise. A3 is slow.
- P2 sends Prepare(2) to A1, A2, A3.
- A1 responds to P2 with Promise(2, nil) (since 2 > 1).
- A2 responds to P2 with Promise(2, nil) (since 2 > 1).
- A3 eventually responds to P1 with Promise(1, nil).
- P2 sees promises from A1, A2 (majority). P2 sends Accept(2, "V2") to A1, A2.
- A1 accepts 2, "V2".
- A2 accepts 2, "V2".
- P1, seeing no majority for Prepare(1) in time and then seeing P2's higher proposal, might try to learn the value or start a new proposal. For this simplified version, we can assume P1 might give up or start over.
- If P2 then sends Accept(2, "V2") to A3, A3 might also accept if its state is consistent.
- Output: Consensus reached on value "V2" with proposal number 2.
Example 3: Acceptor with Previous Acceptance
- Nodes: 3 Acceptors (A1, A2, A3), 1 Proposer (P1).
- Scenario:
- P1 proposes value "V_old" with proposal number 1. P1 sends Prepare(1) to A1, A2, A3. A1, A2, A3 promise. P1 sends Accept(1, "V_old") to A1, A2, A3. A1, A2 accept. A3 is slow.
- Later, P2 proposes value "V_new" with proposal number 2. P2 sends Prepare(2) to A1, A2, A3.
- A1 responds to P2 with Promise(2, "V_old") (since 2 > 1 and it already accepted proposal 1).
- A2 responds to P2 with Promise(2, "V_old") (since 2 > 1 and it already accepted proposal 1).
- A3 eventually responds to P2's Prepare(2) with Promise(2, nil).
- P2 sees promises from A1, A2, A3. P2 must use the value "V_old" from A1 and A2's responses. P2 sends Accept(2, "V_old") to A1, A2, A3.
- A1, A2, A3 accept.
- Output: Consensus reached on value "V_old" with proposal number 2.
Constraints
- You must implement at least 3 Acceptors.
- You can implement multiple Proposers, but at least one.
- Proposal numbers should be strictly increasing integers.
- Simulated network unreliability should be controllable (e.g., a probability of dropping messages or a fixed number of unresponsive nodes).
- The simulation should run until consensus is achieved or a reasonable timeout/maximum number of rounds is reached for demonstration.
Notes
- This is a single-decree Paxos implementation. Multi-decree Paxos is significantly more complex.
- Focus on the core message passing and state management of Proposers and Acceptors.
- Consider using structs to represent messages and proposal numbers.
- Think about how to manage state within your Acceptor and Proposer implementations.
- For simulating failures, you can:
- Use a random number generator to decide if a message is dropped.
- Have certain nodes simply not respond to requests for a period.
- You will likely need goroutines to simulate concurrent operations of Proposers and Acceptors.
- Consider using
context.Contextwith timeouts for simulating network latency and preventing indefinite hangs. - A simple way to model "majority" is
(numberOfAcceptors / 2) + 1.