Implementing a Token Bucket Rate Limiter in Python

Rate limiting is a crucial mechanism for managing the flow of requests to an API or service, preventing overload and ensuring fair usage. This challenge requires you to implement a common rate limiting algorithm, the token bucket, using Python. A well-implemented token bucket can gracefully handle bursts of traffic while adhering to predefined limits.

Problem Description

Your task is to create a Python class, TokenBucket, that simulates a token bucket rate limiter. This class should allow a certain number of "tokens" to be consumed per unit of time. Tokens are refilled at a constant rate, and requests can only proceed if there are sufficient tokens available.

Key Requirements:

Initialization: The TokenBucket class should be initialized with:
- capacity: The maximum number of tokens the bucket can hold.
- refill_rate: The number of tokens added to the bucket per second.
consume(tokens_needed) method:
- This method attempts to consume a specified number of tokens_needed.
- It should return True if the tokens can be consumed (i.e., there are enough tokens available and the bucket is not being overfilled due to refills).
- It should return False if there are not enough tokens available.
- The bucket's current token count should be updated only if the consumption is successful.
Automatic Refill: Tokens should be automatically refilled based on the refill_rate. The refill logic should consider the time elapsed since the last refill.

Expected Behavior:

When consume(n) is called, the bucket checks if n tokens are available. If yes, it subtracts n from the current token count and returns True.
If n tokens are not available, it returns False without modifying the token count.
Between calls to consume, tokens should be added to the bucket, up to its capacity.

Edge Cases:

Initial state: When the bucket is created, it should start with its full capacity of tokens.
Refill exceeding capacity: The number of tokens in the bucket should never exceed its capacity.
Zero or negative refill rate/capacity: While not explicitly forbidden by this problem description, in a real-world scenario, these would be invalid. For this challenge, assume valid positive integers.
Rapid consumption/refill: The system should handle scenarios where consume is called very frequently, and the refill logic needs to be precise.

Examples

Example 1:

import time

# Initialize with capacity of 5 tokens and refill rate of 2 tokens/second
bucket = TokenBucket(capacity=5, refill_rate=2)

# Initially, bucket has 5 tokens
print(f"Initial tokens: {bucket.current_tokens}") # Expected: 5.0

# Consume 2 tokens - should succeed
print(f"Attempt to consume 2 tokens: {bucket.consume(2)}") # Expected: True
print(f"Tokens after consumption: {bucket.current_tokens}") # Expected: 3.0

# Consume 4 tokens - should fail (only 3 available)
print(f"Attempt to consume 4 tokens: {bucket.consume(4)}") # Expected: False
print(f"Tokens after failed consumption: {bucket.current_tokens}") # Expected: 3.0

# Wait for some time to allow refill
time.sleep(2) # Wait for 2 seconds

# After 2 seconds, 2 * 2 = 4 tokens should have been refilled
# Current tokens: 3 + 4 = 7. But capacity is 5. So, 5.
print(f"Attempt to consume 3 tokens: {bucket.consume(3)}") # Expected: True
print(f"Tokens after consumption: {bucket.current_tokens}") # Expected: 2.0

Explanation: The bucket starts full. Consuming 2 tokens leaves 3. Attempting to consume 4 fails. After waiting 2 seconds, 4 tokens are refilled, bringing the total to 7, but capped at 5. Then, consuming 3 tokens leaves 2.

Example 2:

import time

# Initialize with capacity of 10 tokens and refill rate of 1 token/second
bucket = TokenBucket(capacity=10, refill_rate=1)

# Consume 10 tokens
print(f"Attempt to consume 10 tokens: {bucket.consume(10)}") # Expected: True
print(f"Tokens after consumption: {bucket.current_tokens}") # Expected: 0.0

# Try to consume 1 token immediately - should fail
print(f"Attempt to consume 1 token: {bucket.consume(1)}") # Expected: False
print(f"Tokens after failed consumption: {bucket.current_tokens}") # Expected: 0.0

# Wait for 3 seconds
time.sleep(3)

# After 3 seconds, 3 * 1 = 3 tokens should be refilled
print(f"Attempt to consume 2 tokens: {bucket.consume(2)}") # Expected: True
print(f"Tokens after consumption: {bucket.current_tokens}") # Expected: 1.0

Explanation: Consuming all 10 tokens empties the bucket. The immediate attempt to consume 1 token fails. After 3 seconds, 3 tokens are refilled, allowing the next consumption of 2 tokens.

Constraints

capacity: An integer greater than 0.
refill_rate: A float or integer greater than 0, representing tokens per second.
consume(tokens_needed): tokens_needed will be an integer greater than 0.
The implementation should be efficient and handle time-based refills correctly. Avoid busy-waiting where possible.

Notes

You'll need to keep track of the last time the bucket was refilled or accessed to accurately calculate elapsed time for refills.
Consider using time.monotonic() for reliable time tracking.
The token count can be a float to handle fractional refills accurately, but consumption might be of integer amounts.
Think about how to structure your class to manage the state (current tokens, last refill time) effectively.