Secure Data Vault
Imagine you need to store sensitive information, such as API keys, passwords, or personal data, in a Python application. Simply storing this data in plain text files or variables is a significant security risk. This challenge asks you to create a Python class that provides a secure way to store and retrieve sensitive data, protecting it from unauthorized access.
Problem Description
Your task is to implement a SecureDataVault class in Python. This class should allow users to store and retrieve arbitrary data, but with an added layer of security. The data should be encrypted when stored and decrypted only when retrieved by an authorized user.
Key Requirements:
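- Encryption/Decryption: The vault must use a strong encryption algorithm (e.g., AES in CBC mode, ideally paired with an integrity check such as an HMAC so that a wrong password or tampering can be detected) to protect the stored data.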
- Master Password: Access to the vault (both storing and retrieving data) must be protected by a master password. This password will be used to derive the encryption/decryption key.
- Data Storage: The encrypted data should be stored in a persistent manner, for example, in a file.
- Key Derivation: A secure method should be used to derive the encryption key from the master password (e.g., using PBKDF2).
- Interface: The class should provide methods for:
  - Initializing the vault (possibly creating a new one or loading an existing one).
  - Storing a piece of data associated with a unique key (e.g., store_data(key, data)).
  - Retrieving a piece of data using its key (e.g., retrieve_data(key)).
  - Deleting a piece of data (e.g., delete_data(key)).
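The key-derivation requirement above can be sketched with the standard library alone. This is a minimal illustration rather than a reference solution: the function name derive_key, the 16-byte salt, and the iteration count of 600,000 are assumptions chosen for the example, not values the challenge prescribes.

```python
import hashlib
import os

SALT_SIZE = 16          # bytes; assumed size, stored next to the vault data
ITERATIONS = 600_000    # assumed work factor; tune it to your hardware
KEY_SIZE = 32           # 32 bytes gives an AES-256 key


def derive_key(master_password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a fixed-length key from the master password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        iterations,
        dklen=KEY_SIZE,
    )


# A new vault generates a random salt once; an existing vault reads it back.
salt = os.urandom(SALT_SIZE)
key = derive_key("supersecretpassword123", salt)
print(len(key))  # prints 32
```

The salt is not secret, so it can be written to the vault file in the clear. The same password and salt always produce the same key, which is what lets the vault re-derive the key each time it is opened.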
Expected Behavior:
- When store_data is called, the provided data should be encrypted using a key derived from the master password and then saved.
- When retrieve_data is called, the vault should locate the encrypted data, decrypt it using the derived key, and return the original data.
- If retrieve_data is called with a key that doesn't exist or if the master password is incorrect, an appropriate error should be raised.
- Data should remain encrypted even if the storage file is inspected.
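The behavior described above, including the error on a wrong password, can be sketched as follows. This is one possible approach under stated assumptions, not the required design: it uses the third-party cryptography package for AES-CBC and adds an HMAC-SHA256 tag (encrypt-then-MAC) so that a wrong password is detected reliably. The names VaultDecryptionError, derive_keys, encrypt_blob, and decrypt_blob, the blob layout, and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

FAIL_MESSAGE = "Decryption failed. Invalid password or corrupted data."


class VaultDecryptionError(Exception):
    """Hypothetical error raised for a wrong password or tampered data."""


def derive_keys(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    # One PBKDF2 pass, then two labelled sub-keys: one encrypts, one authenticates.
    master = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    enc_key = hmac.new(master, b"encrypt", hashlib.sha256).digest()
    mac_key = hmac.new(master, b"authenticate", hashlib.sha256).digest()
    return enc_key, mac_key


def encrypt_blob(password: str, plaintext: bytes) -> bytes:
    """Return salt + iv + ciphertext + tag for the given plaintext."""
    salt, iv = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(password, salt)
    padder = padding.PKCS7(128).padder()  # CBC needs whole 16-byte blocks
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    body = salt + iv + ciphertext
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def decrypt_blob(password: str, blob: bytes) -> bytes:
    """Verify the tag first, then decrypt; raise VaultDecryptionError on failure."""
    if len(blob) < 16 + 16 + 16 + 32:  # salt + iv + one block + tag
        raise VaultDecryptionError(FAIL_MESSAGE)
    body, tag = blob[:-32], blob[-32:]
    salt, iv, ciphertext = body[:16], body[16:32], body[32:]
    enc_key, mac_key = derive_keys(password, salt)
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise VaultDecryptionError(FAIL_MESSAGE)
    decryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()


blob = encrypt_blob("supersecretpassword123", b"abcdef1234567890")
print(decrypt_blob("supersecretpassword123", blob))  # prints b'abcdef1234567890'
```

Checking the tag before decrypting means a wrong password and a corrupted file fail the same way, with one clear error, instead of surfacing as a padding error or as garbage output.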
Edge Cases to Consider:
- Empty data to be stored.
- Special characters or different data types being stored.
- Attempting to retrieve data before initializing the vault or without providing the correct master password.
- Corrupted storage files.
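One way to cover the first two edge cases is to serialize every value to bytes before encryption, so that empty values, special characters, and non-string types all take the same path. The sketch below assumes pickle as the serializer (the Constraints section allows any pickleable object); serialize and deserialize are hypothetical helper names. Because unpickling untrusted bytes can execute arbitrary code, deserialize should only ever see data whose integrity has already been verified.

```python
import pickle


def serialize(value: object) -> bytes:
    """Turn any pickleable value into bytes ready for encryption."""
    return pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize(raw: bytes) -> object:
    """Inverse of serialize(); call it only on verified, decrypted bytes."""
    return pickle.loads(raw)


# Empty values, special characters, and mixed types all round-trip unchanged.
samples = ["", b"", "pa$$wOrd!@#$", "caf\u00e9 \u4e2d\u6587", 42, None,
           {"nested": [1, 2.5, ("a", "b")]}]
results = [deserialize(serialize(value)) == value for value in samples]
print(all(results))  # prints True
```

Even an empty string serializes to a few bytes, so the encryption step never receives a zero-length input from this path.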
Examples
Example 1: Basic Storage and Retrieval
# Assume a file named 'my_vault.dat' will be created/used
master_password = "supersecretpassword123"
# Initialize the vault
vault = SecureDataVault("my_vault.dat", master_password)
# Store some sensitive data
vault.store_data("api_key", "abcdef1234567890")
vault.store_data("database_password", "pa$$wOrd!@#$")
# Retrieve the data
retrieved_api_key = vault.retrieve_data("api_key")
retrieved_db_password = vault.retrieve_data("database_password")
print(f"Retrieved API Key: {retrieved_api_key}")
print(f"Retrieved DB Password: {retrieved_db_password}")
# Attempt to retrieve data with a different password (will fail)
try:
    wrong_vault = SecureDataVault("my_vault.dat", "wrongpassword")
    wrong_vault.retrieve_data("api_key")
except Exception as e:
    print(f"Error when using wrong password: {e}")
Expected Output (the exact error message depends on your implementation):
Retrieved API Key: abcdef1234567890
Retrieved DB Password: pa$$wOrd!@#$
Error when using wrong password: Decryption failed. Invalid password or corrupted data.
Explanation:
The SecureDataVault is initialized with a file path and a master password. Sensitive data ("api_key" and "database_password") is stored, which internally gets encrypted. When retrieved with the correct password, the data is decrypted and returned. An attempt to access the data with an incorrect password results in an error, demonstrating the security.
Example 2: Deleting Data and Handling Non-existent Keys
master_password = "anothersecurepassword456"
vault = SecureDataVault("another_vault.dat", master_password)
vault.store_data("user_token", "xyz789uvw")
print("Token stored.")
vault.delete_data("user_token")
print("Token deleted.")
# Attempt to retrieve deleted data
try:
    vault.retrieve_data("user_token")
except KeyError as e:
    print(f"Error retrieving deleted data: {e}")
# Attempt to retrieve non-existent data
try:
    vault.retrieve_data("non_existent_key")
except KeyError as e:
    print(f"Error retrieving non-existent data: {e}")
Expected Output:
Token stored.
Token deleted.
Error retrieving deleted data: 'user_token'
Error retrieving non-existent data: 'non_existent_key'
Explanation:
Data is stored and then explicitly deleted. Attempting to retrieve either the deleted data or data that was never stored raises a KeyError.
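The KeyError behavior falls out naturally if entries are kept in a dictionary. The sketch below is a hypothetical in-memory index (the class name EntryIndex and its method names are assumptions) that leaves out encryption and persistence to show only the lookup and deletion semantics.

```python
from typing import Dict


class EntryIndex:
    """Hypothetical in-memory index mapping entry names to encrypted blobs."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._entries[key] = blob

    def get(self, key: str) -> bytes:
        # A plain dict lookup raises KeyError(key) for unknown or deleted keys.
        return self._entries[key]

    def delete(self, key: str) -> None:
        # del also raises KeyError when the key is absent.
        del self._entries[key]


index = EntryIndex()
index.put("user_token", b"<encrypted bytes>")
index.delete("user_token")
try:
    index.get("user_token")
except KeyError as e:
    print(f"Error retrieving deleted data: {e}")  # prints: Error retrieving deleted data: 'user_token'
```

The quotes in the printed message come from how KeyError formats its argument, which is why the expected output shows 'user_token' rather than user_token.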
Constraints
- The storage file should not exceed 100MB in size.
- The master password will be a string, between 8 and 64 characters long.
- The keys used for storing data will be strings, between 1 and 32 characters long.
- The data to be stored can be any pickleable Python object.
- The encryption process should be reasonably fast, with encryption and decryption of up to 1MB of data taking no more than 1 second on a standard modern CPU.
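The length and size limits above can be enforced up front with a few small checks. This is a sketch only: the function names are hypothetical, and reading "100MB" as 100 * 1024 * 1024 bytes is an assumption, since the constraint does not say which definition of a megabyte it means.

```python
MAX_VAULT_BYTES = 100 * 1024 * 1024  # assumed reading of the 100MB limit


def validate_master_password(password: str) -> None:
    """Reject master passwords outside the 8 to 64 character range."""
    if not isinstance(password, str):
        raise TypeError("master password must be a string")
    if not 8 <= len(password) <= 64:
        raise ValueError("master password must be 8 to 64 characters long")


def validate_entry_key(key: str) -> None:
    """Reject entry keys outside the 1 to 32 character range."""
    if not isinstance(key, str):
        raise TypeError("entry key must be a string")
    if not 1 <= len(key) <= 32:
        raise ValueError("entry key must be 1 to 32 characters long")


def check_vault_size(size_in_bytes: int) -> None:
    """Refuse to write a storage file larger than the allowed maximum."""
    if size_in_bytes > MAX_VAULT_BYTES:
        raise ValueError("vault file would exceed the size limit")


validate_master_password("supersecretpassword123")  # passes silently
validate_entry_key("api_key")                        # passes silently
check_vault_size(1024)                               # passes silently
```

Running these checks before any key derivation or file I/O keeps invalid input from ever reaching the cryptographic code.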
Notes
- You'll likely need to use libraries like cryptography for robust encryption and hashlib or scrypt/PBKDF2 for secure password hashing and key derivation.
- Consider how you will handle the initialization of a new vault versus loading an existing one.
- Think about how to store metadata (like initialization vectors or salt) along with the encrypted data, if your chosen encryption mode requires it.
- Error handling is crucial for a secure system. Ensure appropriate exceptions are raised for invalid passwords, missing keys, or potential data corruption.
- Storing the derived key directly is not recommended. It should be re-derived from the master password each time the vault is accessed.
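The notes on metadata and key handling can be sketched as a small record layout: the salt and IV are written in the clear next to the ciphertext, while the derived key never touches the file. The layout below is hypothetical; the SDV1 signature, the 16-byte salt and IV sizes, and the names pack_record and unpack_record are assumptions made for illustration.

```python
import struct
from typing import Tuple

MAGIC = b"SDV1"                          # hypothetical 4-byte file signature
_HEADER = struct.Struct(">4s16s16sI")    # magic, salt, IV, ciphertext length


def pack_record(salt: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with the public metadata needed to decrypt it."""
    if len(salt) != 16 or len(iv) != 16:
        raise ValueError("salt and IV must be 16 bytes each")
    return _HEADER.pack(MAGIC, salt, iv, len(ciphertext)) + ciphertext


def unpack_record(record: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a record into (salt, iv, ciphertext), rejecting malformed input."""
    if len(record) < _HEADER.size:
        raise ValueError("record is truncated")
    magic, salt, iv, length = _HEADER.unpack_from(record)
    ciphertext = record[_HEADER.size:]
    if magic != MAGIC or len(ciphertext) != length:
        raise ValueError("record is corrupted or not a vault record")
    return salt, iv, ciphertext


record = pack_record(b"s" * 16, b"i" * 16, b"ciphertext-bytes")
salt, iv, ciphertext = unpack_record(record)
print(len(record))  # prints 56
```

On load, the vault would read the salt from the record, re-derive the key from the master password, and only then attempt decryption. The structural checks catch truncated or foreign files early, but they do not replace a cryptographic integrity check on the ciphertext itself.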