Implement a Token Bucket Rate Limiter in Python

Rate limiting is a crucial technique for controlling the rate at which requests are made to a service, preventing overload and ensuring fair usage. This challenge asks you to implement a classic rate limiting algorithm, the Token Bucket, in Python.

Problem Description

Your task is to create a TokenBucket class that simulates a rate limiter. The Token Bucket algorithm works by having a bucket that can hold a certain number of tokens. Tokens are added to the bucket at a fixed rate. When a request arrives, it tries to consume a token from the bucket. If a token is available, the request is allowed to proceed. If the bucket is empty, the request is denied or delayed until a token becomes available.

Key Requirements:

The TokenBucket class should be initialized with a capacity (maximum number of tokens the bucket can hold) and a refill_rate (number of tokens added per second).
A consume(tokens) method should be implemented. This method attempts to consume a specified number of tokens.
The consume method should return True if the tokens can be consumed (i.e., there are enough tokens in the bucket).
The consume method should return False if there are not enough tokens available to satisfy the request.
Tokens should be refilled periodically to maintain the specified refill_rate. This refill process should happen lazily when consume is called or a new request is made.

Expected Behavior:

When consume(n) is called and n tokens are available, n tokens are removed from the bucket, and True is returned.
When consume(n) is called and fewer than n tokens are available, False is returned, and no tokens are removed.
The bucket should never exceed its capacity. If a refill would cause it to exceed capacity, it should be capped at capacity.

Edge Cases:

What happens if consume is called with tokens greater than the bucket's capacity?
What if the refill_rate is very high or very low?
Consider the scenario where consume is called rapidly in succession versus with long delays.

Examples

Example 1:

import time

# Initialize a bucket with capacity 10 and refill rate of 2 tokens/sec
limiter = TokenBucket(capacity=10, refill_rate=2)

# Consume 1 token (should succeed)
print(limiter.consume(1)) # Output: True

# Consume 9 more tokens (should succeed, filling up the limit)
print(limiter.consume(9)) # Output: True

# Try to consume 1 more token (should fail as bucket is empty)
print(limiter.consume(1)) # Output: False

# Wait for 1 second to allow tokens to refill
time.sleep(1)

# Consume 1 token (should succeed as 2 tokens refilled)
print(limiter.consume(1)) # Output: True

Explanation: Initially, the bucket has 10 tokens. The first consume(1) succeeds, leaving 9. The next consume(9) succeeds, leaving 0. The subsequent consume(1) fails. After 1 second, 2 tokens are refilled, making 2 tokens available, so consume(1) succeeds.

Example 2:

import time

limiter = TokenBucket(capacity=5, refill_rate=1)

# Consume 3 tokens
print(limiter.consume(3)) # Output: True

# Wait for 2 seconds
time.sleep(2)

# Refill should have added 2 tokens. Bucket now has 2 + (5-3) = 4 tokens.
# Consume 3 tokens (should succeed)
print(limiter.consume(3)) # Output: True

# Bucket now has 1 token.
# Wait for 3 seconds.
time.sleep(3)

# Refill should have added 3 tokens. Bucket now has 1 + 3 = 4 tokens.
# Consume 4 tokens (should succeed)
print(limiter.consume(4)) # Output: True

# Bucket now has 0 tokens.
# Try to consume 1 token (should fail)
print(limiter.consume(1)) # Output: False

Explanation: This example demonstrates refills and consumption, showing how the bucket state changes over time with a slower refill rate. The refill amount is capped by the capacity.

Example 3 (Edge Case - Large Consumption Request):

import time

limiter = TokenBucket(capacity=5, refill_rate=10) # High refill rate

# Try to consume more tokens than capacity (should fail)
print(limiter.consume(10)) # Output: False

# Even though the refill rate is high, we can't consume more than capacity at once.
# Consume 5 tokens (should succeed)
print(limiter.consume(5)) # Output: True

# Bucket is empty. Try to consume 1 token.
print(limiter.consume(1)) # Output: False

# Wait briefly. Tokens will refill quickly.
time.sleep(0.1)
print(limiter.consume(1)) # Output: True

Explanation: This shows that a single consume request cannot exceed the bucket's capacity, even if tokens are refilled rapidly.

Constraints

The capacity will be a positive integer between 1 and 1000.
The refill_rate will be a positive integer representing tokens per second, between 1 and 100.
The tokens argument for the consume method will be a positive integer between 1 and 100.
Your implementation should be thread-safe. Multiple threads might call the consume method concurrently.
The time aspect should be handled using Python's standard time module.

Notes

Consider how to track the last refill time and the current number of tokens.
You'll need to calculate how many tokens should have been refilled since the last check based on the elapsed time and the refill_rate.
The time.time() function will be useful for tracking time.
Remember to handle the capacity constraint carefully during refills.
For thread safety, consider using synchronization primitives like threading.Lock.