Python Database Migrations Challenge
Imagine you're building a new web application and need to manage your database schema over time. As your application evolves, you'll need to add new tables, columns, or modify existing ones. Database migrations are a systematic way to handle these schema changes, ensuring your database stays consistent across different environments and deployments. This challenge will guide you through creating a basic migration system in Python.
Problem Description
Your task is to create a Python script that can apply and potentially revert database schema changes. You will simulate a migration system that takes a set of migration scripts and applies them to a target database.
What needs to be achieved:
- Migration Scripting: Define a mechanism for creating individual migration scripts. Each script will represent a specific database change (e.g.,
CREATE TABLE,ALTER TABLE). - Migration Application: Develop a function or class that can iterate through a directory of migration scripts, check which ones have already been applied, and apply the new ones in the correct order.
- Migration Tracking: Implement a way to track which migrations have been successfully applied to the database. This is crucial for knowing what needs to be run and for potential rollbacks.
Key Requirements:
- Migration Directory: Migrations will be stored in a specified directory.
- Naming Convention: Migration files should follow a clear naming convention that allows for chronological ordering (e.g.,
001_create_users_table.py,002_add_email_to_users.py). - Apply Functionality: A function
apply_migrations(migration_dir, db_connection)should take the directory path and a database connection object and apply any pending migrations. - Migration State Storage: A simple mechanism is needed to store the state of applied migrations. For this challenge, you can simulate this by maintaining a list of applied migration filenames, or by creating a simple "migrations" table in the database itself (though for simplicity, we'll start with a programmatic approach).
- Idempotency (Optional but good practice): Ideally, migration scripts should be idempotent, meaning running them multiple times has the same effect as running them once. For this challenge, focus on the core application logic.
- Pythonic Implementation: The solution should be written in idiomatic Python.
Expected Behavior:
When apply_migrations is called:
- It scans the
migration_dirfor files matching the migration pattern. - It determines which migrations have already been applied (based on your tracking mechanism).
- It sorts the remaining migrations chronologically.
- It executes the SQL (or Python code that generates SQL) within each pending migration script against the
db_connection. - After successful execution, it marks the migration as applied.
Important Edge Cases to Consider:
- Empty Migration Directory: What happens if there are no migration files?
- Already Applied Migrations: How does the system handle migrations that have already been run?
- Order of Operations: Ensuring migrations are applied in the correct sequence is vital.
- Errors During Migration: How would you handle a migration script that fails? (For this challenge, you can log the error and stop the process or mark the migration as failed).
Examples
Example 1: Basic Migration Application
Let's assume we have a simple SQLite database and a migration directory.
Directory Structure:
/migrations
001_create_users_table.py
002_add_creation_date_to_users.py
001_create_users_table.py content:
# This file defines the SQL to create the users table
SQL = """
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT NOT NULL UNIQUE
);
"""
002_add_creation_date_to_users.py content:
# This file defines the SQL to add a creation_date column
SQL = """
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
"""
Simulated Migration State (initially empty):
applied_migrations = []
apply_migrations call:
import sqlite3
import os
# Assume apply_migrations is defined elsewhere and works as described
db_file = "my_app.db"
if os.path.exists(db_file):
os.remove(db_file) # Start fresh for the example
conn = sqlite3.connect(db_file)
cursor = conn.cursor()
# Simulate applying migrations
# In a real system, this would be your migration runner
# For this example, we'll manually simulate the application logic
def simulate_apply_migrations(migration_dir, db_conn):
all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])
applied_migrations = [] # In-memory state for this simulation
for filename in all_migration_files:
if filename not in applied_migrations:
filepath = os.path.join(migration_dir, filename)
try:
with open(filepath, 'r') as f:
# Dynamically import the script
# This is a simplified approach; a real system might use a dedicated import mechanism
module_name = filename[:-3] # Remove .py
import_path = f"{migration_dir}.{module_name}" # This might require adjusting sys.path
# For simplicity in this example, let's directly execute the SQL from the file content
# A real system would likely define a migration class with 'up' and 'down' methods
with open(filepath, 'r') as script_file:
script_content = script_file.read()
# Execute the SQL found in the script
if "SQL =" in script_content:
sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
db_conn.execute(sql_statement)
db_conn.commit()
print(f"Applied migration: {filename}")
applied_migrations.append(filename)
else:
print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")
except Exception as e:
print(f"Error applying migration {filename}: {e}")
db_conn.rollback() # Rollback if an error occurs
# In a real system, you might want to mark this migration as failed and stop
break # Stop processing further migrations on error
return applied_migrations
# Create the migrations directory and files for the example
os.makedirs("migrations", exist_ok=True)
with open("migrations/001_create_users_table.py", "w") as f:
f.write("""
SQL = \"\"\"
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT NOT NULL UNIQUE
);
\"\"\"
""")
with open("migrations/002_add_creation_date_to_users.py", "w") as f:
f.write("""
SQL = \"\"\"
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
\"\"\"
""")
applied_state = simulate_apply_migrations("migrations", conn.cursor())
print(f"Final applied migrations state: {applied_state}")
conn.close()
Expected Output (after running simulate_apply_migrations):
Applied migration: 001_create_users_table.py
Applied migration: 002_add_creation_date_to_users.py
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']
Explanation:
The script finds 001_create_users_table.py and 002_add_creation_date_to_users.py. Since neither has been applied, it executes the SQL defined in 001_create_users_table.py to create the users table. Then, it executes the SQL from 002_add_creation_date_to_users.py to add the creation_date column. Finally, it records both as applied.
Example 2: Running Migrations Again
Directory Structure: Same as Example 1.
Simulated Migration State (after Example 1):
applied_migrations = ['001_create_users_table.py', '002_add_creation_date_to_users.py']
apply_migrations call (with the same database file and state):
# Assume the database 'my_app.db' and migrations directory are already set up from Example 1
db_file = "my_app.db"
conn = sqlite3.connect(db_file)
cursor = conn.cursor()
# Re-run the simulation function with the existing state
# In a real scenario, the applied state would be loaded from storage
applied_state_after_rerun = simulate_apply_migrations("migrations", conn.cursor()) # Note: simulate_apply_migrations resets its internal state, a real system would load it.
# To accurately simulate, we'd need to pass the applied_migrations list IN to simulate_apply_migrations
# Let's simulate passing the state IN
def simulate_apply_migrations_with_state(migration_dir, db_conn, current_applied_state):
all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])
for filename in all_migration_files:
if filename not in current_applied_state:
filepath = os.path.join(migration_dir, filename)
try:
with open(filepath, 'r') as script_file:
script_content = script_file.read()
if "SQL =" in script_content:
sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
db_conn.execute(sql_statement)
db_conn.commit()
print(f"Applied migration: {filename}")
current_applied_state.append(filename) # Update the state IN PLACE
else:
print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")
except Exception as e:
print(f"Error applying migration {filename}: {e}")
db_conn.rollback()
break
return current_applied_state
# Re-initialize connection and cursor for clarity
conn = sqlite3.connect(db_file)
cursor = conn.cursor()
# Simulate loading existing applied state
existing_applied = ['001_create_users_table.py', '002_add_creation_date_to_users.py']
print(f"Starting with applied state: {existing_applied}")
final_applied_state = simulate_apply_migrations_with_state("migrations", conn.cursor(), existing_applied)
print(f"Final applied migrations state: {final_applied_state}")
conn.close()
Expected Output:
Starting with applied state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']
Explanation:
Since both migration files are already in the applied_migrations list, no new migrations are executed. The system correctly identifies that everything is up-to-date.
Constraints
- Python Version: Python 3.7 or higher.
- Database: The solution should be designed to work with a common relational database. For testing and simplicity, SQLite is a good choice, but the core logic should be adaptable to PostgreSQL, MySQL, etc., by using appropriate database connectors (e.g.,
psycopg2,mysql.connector). - Migration File Format: Migration scripts must be Python files (
.py). Each file should contain a variable namedSQLthat holds the SQL statement(s) for the migration. - Migration Order: Migration files must be named with a zero-padded numerical prefix to ensure correct chronological ordering (e.g.,
001_...,002_...,010_...). - Performance: For this challenge, micro-optimizations are not critical. The focus is on correctness and understandability. A typical migration process involves a few database operations.
Notes
- Migration State Storage: For this challenge, you can maintain the state of applied migrations in memory during the script's execution. In a real-world application, this state would be stored persistently in a dedicated "migrations" table within your database itself. This allows the migration system to know its state even after the application restarts.
- Rollback: Implementing rollback functionality (
downmigrations) is beyond the scope of this initial challenge, but it's a crucial part of a robust migration system. - Abstraction: Consider how you might abstract the database interaction. Instead of directly using
sqlite3, you might use an ORM like SQLAlchemy, which can then handle different database backends more seamlessly. - Dynamic Importing: You'll need to figure out how to load and execute Python code from the migration files. Be mindful of security implications if you were to generalize this for untrusted sources.