Implementing a Circuit Breaker Pattern in Python

The circuit breaker pattern is a crucial resilience mechanism in distributed systems. It prevents an application from repeatedly trying to execute an operation that is likely to fail, thus protecting resources and preventing cascading failures. Your task is to implement a Python class that embodies this pattern, allowing you to control access to a potentially unreliable service.

Problem Description

You need to create a CircuitBreaker class in Python that manages access to a function (representing a remote service call or any operation that might fail). The circuit breaker should transition between three states:

  1. Closed: Requests are allowed to pass through to the protected function. If the function succeeds, the state remains Closed and the consecutive-failure counter resets to zero. If it fails, the counter is incremented.
  2. Open: If the failure counter reaches a predefined threshold, the circuit breaker transitions to Open. In this state, all subsequent requests are rejected immediately by raising an error, without calling the protected function.
  3. Half-Open: After a specified timeout period in the Open state, the circuit breaker transitions to Half-Open. In this state, a limited number of requests are allowed to pass through. If these requests succeed, the circuit breaker resets to Closed. If they fail, it immediately returns to Open.
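
The three states and the transitions between them can be sketched as a small lookup table. This is only an illustration: the `State` enum, the event names (such as `"threshold_reached"`), and the `next_state` helper are hypothetical names chosen here, not part of the required interface.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.CLOSED, "threshold_reached"): State.OPEN,
    (State.OPEN, "timeout_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "trial_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "trial_failed"): State.OPEN,
}


def next_state(state, event):
    """Return the state after `event`; unknown pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Walk one full cycle: trip, wait out the timeout, recover.
state = State.CLOSED
for event in ("threshold_reached", "timeout_elapsed", "trial_succeeded"):
    state = next_state(state, event)
    print(event, "->", state.name)
```

Keeping the transitions in one table makes it easy to check that no path exists from Open straight back to Closed: recovery always goes through Half-Open.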

Key Requirements:

  • The CircuitBreaker class should be initialized with parameters defining its behavior:
    • failure_threshold: The number of consecutive failures that trips the circuit breaker to Open.
    • recovery_timeout: The duration (in seconds) the circuit breaker stays in the Open state before transitioning to Half-Open.
    • exceptions: A tuple of exception types that should be considered failures.
  • A method (e.g., call) should be provided to wrap the protected function. This method will:
    • Check the current state of the circuit breaker.
    • Execute the protected function based on the state.
    • Update the state and failure count accordingly.
    • Raise exceptions for failed calls or if the circuit is open.
  • The circuit breaker should maintain an internal state and a failure counter.
  • It should track the time elapsed since the circuit opened (that is, since the failure that tripped it) to manage the recovery_timeout.

Expected Behavior:

  • When in the CLOSED state, calls to the protected function are executed. If an exception in exceptions occurs, the failure count increases. If failure_threshold is reached, transition to OPEN.
  • When in the OPEN state, calls to the protected function are immediately rejected with a specific exception (e.g., CircuitBreakerOpenError) without executing the function. After recovery_timeout has passed since entering the OPEN state, transition to HALF_OPEN.
  • When in the HALF_OPEN state, a limited number of calls (e.g., one or a small configurable number) are allowed to pass.
    • If a call in HALF_OPEN succeeds, transition back to CLOSED and reset the failure count.
    • If a call in HALF_OPEN fails (with an exception in exceptions), transition back to OPEN immediately.
  • If the protected function raises an exception not in the exceptions tuple, it should be re-raised immediately without affecting the circuit breaker's state.
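
One way the behavior above could fit together is sketched below. It is a minimal, single-threaded sketch under stated assumptions, not a reference solution: the `State` enum, the `_trip` helper, the `opened_at` attribute, and the exact error message are choices made here for illustration, and Half-Open allows exactly one trial call.

```python
import time
from enum import Enum


class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold, recovery_timeout, exceptions):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.exceptions = exceptions
        self.state = State.CLOSED
        self.failure_count = 0
        self.opened_at = None  # monotonic timestamp of the last trip to OPEN

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = time.monotonic()

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise CircuitBreakerOpenError("Circuit breaker is open.")
            self.state = State.HALF_OPEN  # timeout elapsed: allow a trial call
        try:
            result = func(*args, **kwargs)
        except self.exceptions:
            # Only listed exception types count as failures; anything else
            # propagates without touching the breaker's state.
            self.failure_count += 1
            if (self.state is State.HALF_OPEN
                    or self.failure_count >= self.failure_threshold):
                self._trip()
            raise
        # Success: a trial call closes the circuit, and any success
        # clears the consecutive-failure streak.
        self.state = State.CLOSED
        self.failure_count = 0
        return result
```

Because the bare `raise` re-raises the original exception, callers still see the underlying `ValueError` or `ConnectionError` for failed calls and `CircuitBreakerOpenError` only for rejected ones.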

Edge Cases to Consider:

  • What happens if recovery_timeout is 0?
  • What happens if failure_threshold is 1? What about an invalid value such as 0, which the constraints below rule out?
  • Concurrent access to the circuit breaker: basic state transitions should be thread-safe (see Constraints), so consider how race conditions might arise in a multi-threaded environment. Full concurrency handling is out of scope; this challenge focuses on the core logic.
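
The boundary values above are easier to reason about when the two comparisons are isolated. The helpers below are hypothetical and show one possible reading, in which both checks use `>=`; an implementation that compares differently would behave differently at these edges.

```python
def should_trip(failure_count, failure_threshold):
    """True once the consecutive-failure count has reached the threshold."""
    return failure_count >= failure_threshold


def recovery_due(opened_at, recovery_timeout, now):
    """True once at least recovery_timeout seconds have passed since opening."""
    return now - opened_at >= recovery_timeout


# failure_threshold=1: the very first failure trips the breaker.
print(should_trip(1, 1))            # True
# failure_threshold=0: already "reached" with zero failures, which is why
# rejecting it at construction time is the safer choice.
print(should_trip(0, 0))            # True
# recovery_timeout=0: a trial call is due immediately, so under this reading
# the breaker never actually rejects a request while OPEN.
print(recovery_due(10.0, 0, 10.0))  # True
```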

Examples

Example 1: Successful Calls and then Failure

Let's assume: failure_threshold = 3, recovery_timeout = 5 seconds, exceptions = (ValueError,)

import time

class CircuitBreakerOpenError(Exception):
    pass

class MockService:
    def __init__(self):
        self.call_count = 0

    def potentially_failing_operation(self, should_fail_after_calls=4):
        self.call_count += 1
        if self.call_count >= should_fail_after_calls:
            raise ValueError("Operation failed due to simulated error")
        return f"Operation successful (call #{self.call_count})"

# --- Circuit Breaker Usage ---
mock_service = MockService()
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=5, exceptions=(ValueError,))

# Call 1: Success
print(cb.call(mock_service.potentially_failing_operation))

# Call 2: Success
print(cb.call(mock_service.potentially_failing_operation))

# Call 3: Success
print(cb.call(mock_service.potentially_failing_operation))

# Calls 4-6: three consecutive failures (the third one trips the breaker)
for _ in range(3):
    try:
        print(cb.call(mock_service.potentially_failing_operation))
    except ValueError as e:
        print(f"Caught expected error: {e}")

# Call 7: Breaker is OPEN, so the call is rejected without running the function
try:
    print(cb.call(mock_service.potentially_failing_operation))
except CircuitBreakerOpenError as e:
    print(f"Caught circuit breaker error: {e}")

# Wait for recovery timeout
time.sleep(6)

# Call 8: Breaker is HALF-OPEN, success should reset it
# A should_fail_after_calls value above the current call_count makes the mock succeed again
print(cb.call(mock_service.potentially_failing_operation, should_fail_after_calls=100))

# Call 9: Breaker should now be CLOSED again
print(cb.call(mock_service.potentially_failing_operation, should_fail_after_calls=100))

Expected Output for Example 1:

Operation successful (call #1)
Operation successful (call #2)
Operation successful (call #3)
Caught expected error: Operation failed due to simulated error
Caught expected error: Operation failed due to simulated error
Caught expected error: Operation failed due to simulated error
Caught circuit breaker error: Circuit breaker is open.
Operation successful (call #7)
Operation successful (call #8)

Explanation: The first three calls succeed. Calls 4 through 6 each raise ValueError, and the third consecutive failure brings the failure count to 3, which trips the breaker to OPEN. The seventh call is immediately rejected with CircuitBreakerOpenError, so the mock's call_count stays at 6. After a 6-second wait (longer than recovery_timeout), the breaker becomes HALF-OPEN. The eighth call is allowed through and succeeds because should_fail_after_calls=100 exceeds the current call count, which resets the breaker to CLOSED. The ninth call also succeeds.

Example 2: Open State Timeout and Subsequent Failure in Half-Open

This example uses different parameters: failure_threshold = 2, recovery_timeout = 3 seconds, exceptions = (ConnectionError,)

import time

class CircuitBreakerOpenError(Exception):
    pass

class MockService:
    def __init__(self):
        self.call_count = 0

    def get_status(self, fail_on_attempt=None):
        self.call_count += 1
        if fail_on_attempt is not None and self.call_count == fail_on_attempt:
            raise ConnectionError("Simulated network issue")
        return "Service is healthy"

# --- Circuit Breaker Usage ---
mock_service = MockService()
cb = CircuitBreaker(failure_threshold=2, recovery_timeout=3, exceptions=(ConnectionError,))

# Two failures to open the circuit
for _ in range(2):
    try:
        print(cb.call(mock_service.get_status, fail_on_attempt=1))
    except ConnectionError as e:
        print(f"Caught expected error: {e}")
    mock_service.call_count = 0 # Reset for next call

print("Circuit tripped. Waiting for recovery timeout...")
time.sleep(4) # Wait for recovery timeout

# First call in HALF-OPEN state, but it fails
mock_service.call_count = 0 # Reset for next call
try:
    print(cb.call(mock_service.get_status, fail_on_attempt=1)) # This call should fail
except ConnectionError as e:
    print(f"Caught expected error in HALF-OPEN: {e}")

# Circuit should now be OPEN again
print("Circuit is OPEN again. Trying a call...")
try:
    print(cb.call(mock_service.get_status))
except CircuitBreakerOpenError as e:
    print(f"Caught circuit breaker error: {e}")

Expected Output for Example 2:

Caught expected error: Simulated network issue
Caught expected error: Simulated network issue
Circuit tripped. Waiting for recovery timeout...
Caught expected error in HALF-OPEN: Simulated network issue
Circuit is OPEN again. Trying a call...
Caught circuit breaker error: Circuit breaker is open.

Explanation: Two calls lead to ConnectionError (simulated by fail_on_attempt=1 each time after resetting call_count), tripping the circuit to OPEN. After waiting 4 seconds, the circuit becomes HALF-OPEN. The first call in this state is configured to fail, causing the circuit breaker to immediately revert to the OPEN state. A subsequent call is then correctly rejected.

Constraints

  • failure_threshold: Must be an integer greater than or equal to 1.
  • recovery_timeout: Must be a non-negative float or integer representing seconds.
  • exceptions: Must be a tuple of valid Python exception classes.
  • The call method should accept the function to be executed as its first argument and any subsequent positional or keyword arguments that should be passed to that function.
  • The circuit breaker should be thread-safe for basic state transitions (though full concurrency handling for complex scenarios is out of scope for this core implementation).
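
The constraints above could be enforced along the following lines. This is a hedged sketch, not the expected solution: the class name `ThreadSafeCircuitBreaker`, the string state names, the `_trial_in_flight` flag, and the choice to raise `ValueError` for every invalid argument are all assumptions made here. The lock guards only the bookkeeping; the protected function runs outside it so that a slow call does not block other callers.

```python
import threading
import time


class CircuitBreakerOpenError(Exception):
    pass


class ThreadSafeCircuitBreaker:
    def __init__(self, failure_threshold, recovery_timeout, exceptions):
        # bool is a subclass of int, so it is excluded explicitly.
        if (isinstance(failure_threshold, bool)
                or not isinstance(failure_threshold, int)
                or failure_threshold < 1):
            raise ValueError("failure_threshold must be an integer >= 1")
        if (isinstance(recovery_timeout, bool)
                or not isinstance(recovery_timeout, (int, float))
                or recovery_timeout < 0):
            raise ValueError("recovery_timeout must be a non-negative number")
        if not isinstance(exceptions, tuple) or not all(
            isinstance(e, type) and issubclass(e, BaseException) for e in exceptions
        ):
            raise ValueError("exceptions must be a tuple of exception classes")
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.exceptions = exceptions
        self.state = "CLOSED"
        self.failure_count = 0
        self._opened_at = 0.0
        self._trial_in_flight = False
        self._lock = threading.Lock()

    def _before_call(self):
        with self._lock:
            if self.state == "OPEN":
                if time.monotonic() - self._opened_at < self.recovery_timeout:
                    raise CircuitBreakerOpenError("Circuit breaker is open.")
                self.state = "HALF_OPEN"
                self._trial_in_flight = False
            if self.state == "HALF_OPEN":
                if self._trial_in_flight:  # only one trial call at a time
                    raise CircuitBreakerOpenError("Trial call already in progress.")
                self._trial_in_flight = True

    def _after_call(self, failed):
        with self._lock:
            if self.state == "OPEN":
                return  # late result from a call that started before the trip
            self._trial_in_flight = False
            if not failed:
                self.state, self.failure_count = "CLOSED", 0
                return
            self.failure_count += 1
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = time.monotonic()

    def call(self, func, *args, **kwargs):
        self._before_call()
        try:
            result = func(*args, **kwargs)  # runs outside the lock
        except self.exceptions:
            self._after_call(failed=True)
            raise
        except BaseException:
            with self._lock:  # unlisted exception: free the trial slot only
                self._trial_in_flight = False
            raise
        self._after_call(failed=False)
        return result


breaker = ThreadSafeCircuitBreaker(failure_threshold=3, recovery_timeout=5,
                                   exceptions=(ValueError,))
print(breaker.call(lambda x: x * 2, 21))  # 42
```

Results from calls that started in an earlier state can still arrive late (for example, a slow call begun while Closed finishing during Half-Open); handling that fully is the kind of complex scenario the constraint leaves out of scope.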

Notes

  • You will need to implement your own CircuitBreakerOpenError exception.
  • The time module will be essential for tracking the recovery_timeout. time.monotonic() is a good fit because it is unaffected by system clock adjustments.
  • Consider how you will manage the state transitions and the failure counter efficiently.
  • For the HALF-OPEN state, a common approach is to allow a single "test" call: if it succeeds, reset to CLOSED; if it fails, revert to OPEN.
  • Think about how to pass arbitrary arguments (*args, **kwargs) to the protected function.
  • You may want to use an enum or constants for the circuit breaker states (CLOSED, OPEN, HALF_OPEN).
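
For the timing note above, one option is to keep the timeout bookkeeping in a small helper that accepts an injectable clock, so the recovery logic can be exercised without real waits. The `RecoveryTimer` class and its `clock` parameter are hypothetical names used only for this sketch; the default clock is `time.monotonic`.

```python
import time


class RecoveryTimer:
    """Tracks how long the circuit has been open (hypothetical helper)."""

    def __init__(self, recovery_timeout, clock=time.monotonic):
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._opened_at = None

    def start(self):
        """Record the moment the circuit opened."""
        self._opened_at = self._clock()

    def expired(self):
        """True once recovery_timeout seconds have passed since start()."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at >= self.recovery_timeout


# A fake clock makes the timeout testable without time.sleep().
now = [100.0]
timer = RecoveryTimer(recovery_timeout=5, clock=lambda: now[0])
timer.start()
print(timer.expired())  # False: no time has passed
now[0] += 6
print(timer.expired())  # True: 6 s >= 5 s
```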