Implement a Python Caching Layer
Web applications and data-intensive services often involve computationally expensive operations or frequent requests to external resources. A caching layer can significantly improve performance by storing the results of these operations and returning them directly for subsequent identical requests, avoiding redundant work. This challenge asks you to implement a basic in-memory caching mechanism in Python.
Problem Description
You need to create a Python class, call it `Cache`, that acts as an in-memory cache. This cache should store key-value pairs. The primary operations your `Cache` class will need to support are:
- `set(key, value, ttl=None)`: Stores a `value` associated with a `key`. Optionally, a `ttl` (Time To Live) in seconds can be provided. If `ttl` is set, the cache entry should expire after that duration.
- `get(key)`: Retrieves the `value` associated with a `key`. If the key is not found in the cache, or if it has expired, it should return `None`.
- `delete(key)`: Removes a `key` and its associated value from the cache.
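As a starting point, the public surface could be stubbed out like this (a sketch only; the method bodies are yours to write):

```python
class Cache:
    """In-memory key-value cache with optional per-entry TTL."""

    def set(self, key, value, ttl=None):
        """Store value under key; expire it after ttl seconds if ttl is given."""
        raise NotImplementedError

    def get(self, key):
        """Return the cached value, or None if the key is missing or expired."""
        raise NotImplementedError

    def delete(self, key):
        """Remove key from the cache if present."""
        raise NotImplementedError
```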
Key Requirements:
- The cache should be thread-safe to handle concurrent access from multiple threads.
- Expiration of cached items based on `ttl` should be handled automatically. This means `get` operations should respect the expiration.
- If `ttl` is not provided, the item should be considered to have an indefinite lifespan (until explicitly deleted or the cache is cleared).
Expected Behavior:
- When `set` is called with a `ttl`, the item should be retrievable until `ttl` seconds have passed since it was set.
- After expiration, `get` for that key should return `None`.
- `delete` should permanently remove an item.
- Concurrent `set`, `get`, and `delete` operations should not lead to race conditions or corrupted cache state.
Edge Cases to Consider:
- Setting an item with a `ttl` of 0 or a negative value. This could reasonably be treated as immediately expired or rejected with an error; for this challenge, assume it means the item is not stored or is immediately expired (see the snippet after this list).
- Accessing a key that was recently deleted.
- Accessing a key that has expired.
- The scenario where multiple threads try to access or modify the same key simultaneously.
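Under the "immediately expired" interpretation assumed above for non-positive TTLs (an assumption of this challenge, not a universal convention), a quick sanity check might look like:

```python
from cache import Cache

cache = Cache()
cache.set("temp", "value", ttl=0)   # Non-positive TTL: treated as already expired
print(cache.get("temp"))            # Expected: None
cache.set("temp", "value", ttl=-5)
print(cache.get("temp"))            # Expected: None
```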
Examples
Example 1:
```python
from cache import Cache
import time

cache = Cache()

# Set a value without TTL
cache.set("user:1", {"name": "Alice", "email": "alice@example.com"})
print(cache.get("user:1"))  # Output: {'name': 'Alice', 'email': 'alice@example.com'}

# Set a value with TTL
cache.set("config:settings", {"theme": "dark", "language": "en"}, ttl=5)
print(cache.get("config:settings"))  # Output: {'theme': 'dark', 'language': 'en'}

# Wait for expiration
time.sleep(6)
print(cache.get("config:settings"))  # Output: None
```
Explanation:
The first `set` stores "user:1" indefinitely. The second `set` stores "config:settings" with a 5-second TTL. After 6 seconds, "config:settings" has expired and `get` returns `None`.
Example 2:
```python
from cache import Cache
import time

cache = Cache()

# Set a value with TTL
cache.set("short_lived_data", [1, 2, 3], ttl=2)
print(cache.get("short_lived_data"))  # Output: [1, 2, 3]

# Delete the item before expiration
cache.delete("short_lived_data")
print(cache.get("short_lived_data"))  # Output: None

# Set another item
cache.set("permanent_data", "This should stay")
print(cache.get("permanent_data"))  # Output: This should stay
```
Explanation:
"short_lived_data" is set with a TTL of 2 seconds. Before it expires, delete is called, and subsequent get calls return None. "permanent_data" remains in the cache.
Example 3: Thread Safety
```python
from cache import Cache
import threading
import time

cache = Cache()
num_threads = 10
num_iterations = 100

def worker(thread_id):
    for i in range(num_iterations):
        key = f"thread_{thread_id}_item_{i}"
        value = f"data_from_thread_{thread_id}_{i}"
        cache.set(key, value, ttl=1)
        retrieved_value = cache.get(key)
        if retrieved_value != value:
            print(f"Thread {thread_id} ERROR: Mismatch for key {key}. Expected {value}, got {retrieved_value}")
        time.sleep(0.001)  # Simulate some work

threads = []
for i in range(num_threads):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("All threads finished.")
# Optional: Verify a few keys still exist (they might have expired)
print(f"Cache size after threads: {len(cache.cache)}")  # Accessing internal cache size for verification
```
Explanation: This example demonstrates concurrent access. Multiple threads simultaneously set items with short TTLs. A correctly implemented thread-safe cache should handle these concurrent writes and reads without data corruption or errors. The key is ensuring that operations like adding to a dictionary or checking expiration times are atomic.
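To make the atomicity point concrete, here is one way the check-then-act sequence in `get` could be guarded. This is a sketch under assumed internal names (`_lock` for the lock, `cache` for the backing dict); the problem statement prescribes neither:

```python
import threading
import time

class Cache:
    def __init__(self):
        self._lock = threading.Lock()
        self.cache = {}  # key -> (value, expires_at); expires_at is None for no TTL

    def get(self, key):
        with self._lock:  # Hold the lock across the whole lookup-check-evict sequence
            entry = self.cache.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at is not None and time.monotonic() >= expires_at:
                del self.cache[key]  # Safe: no other thread can interleave here
                return None
            return value
```

Without the lock, another thread could delete or replace the entry between the lookup and the expiry check, producing exactly the kind of mismatch the worker above reports.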
Constraints
- The cache implementation should use only Python's standard library. External caching libraries (such as `redis` or `memcached` clients) are not allowed, and `functools.lru_cache` may not be used directly for the core caching logic.
- The cache should reside entirely in memory.
- The `ttl` should be handled by checking the current time against the timestamp recorded when the item was set (see the sketch after this list).
- Performance of `get` and `set` operations should ideally be O(1) on average, assuming no significant expiration cleanup overhead.
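For instance, one way to satisfy the timestamp constraint. The `make_entry` and `is_expired` helpers are illustrative, not required; `time.monotonic()` is used so wall-clock adjustments cannot skew expiry, though `time.time()` would also meet the constraint:

```python
import time

def make_entry(value, ttl=None):
    """Pair a value with its absolute expiry time; None means no expiry."""
    expires_at = time.monotonic() + ttl if ttl is not None else None
    return (value, expires_at)

def is_expired(entry):
    """Compare the stored expiry timestamp against the current time."""
    _, expires_at = entry
    return expires_at is not None and time.monotonic() >= expires_at
```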
Notes
- Consider how to store the expiration timestamp along with the value for each cached item.
- For thread safety, `threading.Lock` or `threading.RLock` is your friend. Think about which operations need to be protected by a lock.
- The mechanism for handling expired items can be passive (checked only on `get`) or active (a background thread cleaning up). For this challenge, a passive approach checked during `get` is sufficient and simpler.
- You might need a way to store the creation time for each item to implement TTL.
- Consider the data structure you will use for the cache itself (e.g., a dictionary). A minimal sketch combining these notes follows.
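Pulling these notes together, a minimal sketch might look like the following. It is one possible approach, not a reference solution: the backing dict is named `cache` so that the `len(cache.cache)` check in Example 3 works, and non-positive TTLs follow the "immediately expired" assumption from the edge cases.

```python
import threading
import time

class Cache:
    """Dict-backed, lock-protected cache with passive (on-get) expiration."""

    def __init__(self):
        self._lock = threading.Lock()
        self.cache = {}  # key -> (value, expires_at); expires_at is None for no TTL

    def set(self, key, value, ttl=None):
        with self._lock:
            if ttl is not None and ttl <= 0:
                # Assumed edge-case behavior: non-positive TTL means the item
                # is never stored (and any previous value is dropped).
                self.cache.pop(key, None)
                return
            expires_at = time.monotonic() + ttl if ttl is not None else None
            self.cache[key] = (value, expires_at)

    def get(self, key):
        with self._lock:
            entry = self.cache.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at is not None and time.monotonic() >= expires_at:
                del self.cache[key]  # Passive cleanup: evict lazily on access
                return None
            return value

    def delete(self, key):
        with self._lock:
            self.cache.pop(key, None)  # No-op if the key is absent
```

Because every read and write happens under one lock, `set`, `get`, and `delete` stay O(1) on average while remaining safe under the concurrent workload from Example 3.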