Python Database Migrations Challenge

Imagine you're building a new web application and need to manage your database schema over time. As your application evolves, you'll need to add new tables, columns, or modify existing ones. Database migrations are a systematic way to handle these schema changes, ensuring your database stays consistent across different environments and deployments. This challenge will guide you through creating a basic migration system in Python.

Problem Description

Your task is to create a Python script that can apply and potentially revert database schema changes. You will simulate a migration system that takes a set of migration scripts and applies them to a target database.

What needs to be achieved:

Migration Scripting: Define a mechanism for creating individual migration scripts. Each script will represent a specific database change (e.g., CREATE TABLE, ALTER TABLE).
Migration Application: Develop a function or class that can iterate through a directory of migration scripts, check which ones have already been applied, and apply the new ones in the correct order.
Migration Tracking: Implement a way to track which migrations have been successfully applied to the database. This is crucial for knowing what needs to be run and for potential rollbacks.

Key Requirements:

Migration Directory: Migrations will be stored in a specified directory.
Naming Convention: Migration files should follow a clear naming convention that allows for chronological ordering (e.g., 001_create_users_table.py, 002_add_email_to_users.py).
Apply Functionality: A function apply_migrations(migration_dir, db_connection) should take the directory path and a database connection object and apply any pending migrations.
Migration State Storage: A simple mechanism is needed to store the state of applied migrations. For this challenge, you can simulate this by maintaining a list of applied migration filenames, or by creating a simple "migrations" table in the database itself (though for simplicity, we'll start with a programmatic approach).
Idempotency (Optional but good practice): Ideally, migration scripts should be idempotent, meaning running them multiple times has the same effect as running them once. For this challenge, focus on the core application logic.
Pythonic Implementation: The solution should be written in idiomatic Python.

Expected Behavior:

When apply_migrations is called:

It scans the migration_dir for files matching the migration pattern.
It determines which migrations have already been applied (based on your tracking mechanism).
It sorts the remaining migrations chronologically.
It executes the SQL (or Python code that generates SQL) within each pending migration script against the db_connection.
After successful execution, it marks the migration as applied.

Important Edge Cases to Consider:

Empty Migration Directory: What happens if there are no migration files?
Already Applied Migrations: How does the system handle migrations that have already been run?
Order of Operations: Ensuring migrations are applied in the correct sequence is vital.
Errors During Migration: How would you handle a migration script that fails? (For this challenge, you can log the error and stop the process or mark the migration as failed).

Examples

Example 1: Basic Migration Application

Let's assume we have a simple SQLite database and a migration directory.

Directory Structure:

/migrations
    001_create_users_table.py
    002_add_creation_date_to_users.py

001_create_users_table.py content:

# This file defines the SQL to create the users table
SQL = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE
);
"""

002_add_creation_date_to_users.py content:

# This file defines the SQL to add a creation_date column
SQL = """
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
"""

Simulated Migration State (initially empty):

applied_migrations = []

apply_migrations call:

import sqlite3
import os

# Assume apply_migrations is defined elsewhere and works as described

db_file = "my_app.db"
if os.path.exists(db_file):
    os.remove(db_file) # Start fresh for the example

conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Simulate applying migrations
# In a real system, this would be your migration runner
# For this example, we'll manually simulate the application logic
def simulate_apply_migrations(migration_dir, db_conn):
    all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])
    applied_migrations = [] # In-memory state for this simulation

    for filename in all_migration_files:
        if filename not in applied_migrations:
            filepath = os.path.join(migration_dir, filename)
            try:
                with open(filepath, 'r') as f:
                    # Dynamically import the script
                    # This is a simplified approach; a real system might use a dedicated import mechanism
                    module_name = filename[:-3] # Remove .py
                    import_path = f"{migration_dir}.{module_name}" # This might require adjusting sys.path

                    # For simplicity in this example, let's directly execute the SQL from the file content
                    # A real system would likely define a migration class with 'up' and 'down' methods
                    with open(filepath, 'r') as script_file:
                        script_content = script_file.read()
                        # Execute the SQL found in the script
                        if "SQL =" in script_content:
                            sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
                            db_conn.execute(sql_statement)
                            db_conn.commit()
                            print(f"Applied migration: {filename}")
                            applied_migrations.append(filename)
                        else:
                            print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")

            except Exception as e:
                print(f"Error applying migration {filename}: {e}")
                db_conn.rollback() # Rollback if an error occurs
                # In a real system, you might want to mark this migration as failed and stop
                break # Stop processing further migrations on error

    return applied_migrations

# Create the migrations directory and files for the example
os.makedirs("migrations", exist_ok=True)
with open("migrations/001_create_users_table.py", "w") as f:
    f.write("""
SQL = \"\"\"
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE
);
\"\"\"
""")
with open("migrations/002_add_creation_date_to_users.py", "w") as f:
    f.write("""
SQL = \"\"\"
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
\"\"\"
""")

applied_state = simulate_apply_migrations("migrations", conn.cursor())
print(f"Final applied migrations state: {applied_state}")

conn.close()

Expected Output (after running simulate_apply_migrations):

Applied migration: 001_create_users_table.py
Applied migration: 002_add_creation_date_to_users.py
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']

Explanation:

The script finds 001_create_users_table.py and 002_add_creation_date_to_users.py. Since neither has been applied, it executes the SQL defined in 001_create_users_table.py to create the users table. Then, it executes the SQL from 002_add_creation_date_to_users.py to add the creation_date column. Finally, it records both as applied.

Example 2: Running Migrations Again

Directory Structure: Same as Example 1.

Simulated Migration State (after Example 1):

applied_migrations = ['001_create_users_table.py', '002_add_creation_date_to_users.py']

apply_migrations call (with the same database file and state):

# Assume the database 'my_app.db' and migrations directory are already set up from Example 1

db_file = "my_app.db"
conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Re-run the simulation function with the existing state
# In a real scenario, the applied state would be loaded from storage
applied_state_after_rerun = simulate_apply_migrations("migrations", conn.cursor()) # Note: simulate_apply_migrations resets its internal state, a real system would load it.
# To accurately simulate, we'd need to pass the applied_migrations list IN to simulate_apply_migrations

# Let's simulate passing the state IN
def simulate_apply_migrations_with_state(migration_dir, db_conn, current_applied_state):
    all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])

    for filename in all_migration_files:
        if filename not in current_applied_state:
            filepath = os.path.join(migration_dir, filename)
            try:
                with open(filepath, 'r') as script_file:
                    script_content = script_file.read()
                    if "SQL =" in script_content:
                        sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
                        db_conn.execute(sql_statement)
                        db_conn.commit()
                        print(f"Applied migration: {filename}")
                        current_applied_state.append(filename) # Update the state IN PLACE
                    else:
                        print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")

            except Exception as e:
                print(f"Error applying migration {filename}: {e}")
                db_conn.rollback()
                break
    return current_applied_state

# Re-initialize connection and cursor for clarity
conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Simulate loading existing applied state
existing_applied = ['001_create_users_table.py', '002_add_creation_date_to_users.py']
print(f"Starting with applied state: {existing_applied}")

final_applied_state = simulate_apply_migrations_with_state("migrations", conn.cursor(), existing_applied)
print(f"Final applied migrations state: {final_applied_state}")

conn.close()

Expected Output:

Starting with applied state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']

Explanation:

Since both migration files are already in the applied_migrations list, no new migrations are executed. The system correctly identifies that everything is up-to-date.

Constraints

Python Version: Python 3.7 or higher.
Database: The solution should be designed to work with a common relational database. For testing and simplicity, SQLite is a good choice, but the core logic should be adaptable to PostgreSQL, MySQL, etc., by using appropriate database connectors (e.g., psycopg2, mysql.connector).
Migration File Format: Migration scripts must be Python files (.py). Each file should contain a variable named SQL that holds the SQL statement(s) for the migration.
Migration Order: Migration files must be named with a zero-padded numerical prefix to ensure correct chronological ordering (e.g., 001_..., 002_..., 010_...).
Performance: For this challenge, micro-optimizations are not critical. The focus is on correctness and understandability. A typical migration process involves a few database operations.

Notes

Migration State Storage: For this challenge, you can maintain the state of applied migrations in memory during the script's execution. In a real-world application, this state would be stored persistently in a dedicated "migrations" table within your database itself. This allows the migration system to know its state even after the application restarts.
Rollback: Implementing rollback functionality (down migrations) is beyond the scope of this initial challenge, but it's a crucial part of a robust migration system.
Abstraction: Consider how you might abstract the database interaction. Instead of directly using sqlite3, you might use an ORM like SQLAlchemy, which can then handle different database backends more seamlessly.
Dynamic Importing: You'll need to figure out how to load and execute Python code from the migration files. Be mindful of security implications if you were to generalize this for untrusted sources.

Python Database Migrations Challenge

Problem Description

What needs to be achieved:

Migration Scripting: Define a mechanism for creating individual migration scripts. Each script will represent a specific database change (e.g., CREATE TABLE, ALTER TABLE).
Migration Application: Develop a function or class that can iterate through a directory of migration scripts, check which ones have already been applied, and apply the new ones in the correct order.
Migration Tracking: Implement a way to track which migrations have been successfully applied to the database. This is crucial for knowing what needs to be run and for potential rollbacks.

Key Requirements:

Migration Directory: Migrations will be stored in a specified directory.
Naming Convention: Migration files should follow a clear naming convention that allows for chronological ordering (e.g., 001_create_users_table.py, 002_add_email_to_users.py).
Apply Functionality: A function apply_migrations(migration_dir, db_connection) should take the directory path and a database connection object and apply any pending migrations.
Migration State Storage: A simple mechanism is needed to store the state of applied migrations. For this challenge, you can simulate this by maintaining a list of applied migration filenames, or by creating a simple "migrations" table in the database itself (though for simplicity, we'll start with a programmatic approach).
Idempotency (Optional but good practice): Ideally, migration scripts should be idempotent, meaning running them multiple times has the same effect as running them once. For this challenge, focus on the core application logic.
Pythonic Implementation: The solution should be written in idiomatic Python.

Expected Behavior:

When apply_migrations is called:

It scans the migration_dir for files matching the migration pattern.
It determines which migrations have already been applied (based on your tracking mechanism).
It sorts the remaining migrations chronologically.
It executes the SQL (or Python code that generates SQL) within each pending migration script against the db_connection.
After successful execution, it marks the migration as applied.

Important Edge Cases to Consider:

Empty Migration Directory: What happens if there are no migration files?
Already Applied Migrations: How does the system handle migrations that have already been run?
Order of Operations: Ensuring migrations are applied in the correct sequence is vital.
Errors During Migration: How would you handle a migration script that fails? (For this challenge, you can log the error and stop the process or mark the migration as failed).

Examples

Example 1: Basic Migration Application

Let's assume we have a simple SQLite database and a migration directory.

Directory Structure:

/migrations
    001_create_users_table.py
    002_add_creation_date_to_users.py

001_create_users_table.py content:

# This file defines the SQL to create the users table
SQL = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE
);
"""

002_add_creation_date_to_users.py content:

# This file defines the SQL to add a creation_date column
SQL = """
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
"""

Simulated Migration State (initially empty):

applied_migrations = []

apply_migrations call:

import sqlite3
import os

# Assume apply_migrations is defined elsewhere and works as described

db_file = "my_app.db"
if os.path.exists(db_file):
    os.remove(db_file) # Start fresh for the example

conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Simulate applying migrations
# In a real system, this would be your migration runner
# For this example, we'll manually simulate the application logic
def simulate_apply_migrations(migration_dir, db_conn):
    all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])
    applied_migrations = [] # In-memory state for this simulation

    for filename in all_migration_files:
        if filename not in applied_migrations:
            filepath = os.path.join(migration_dir, filename)
            try:
                with open(filepath, 'r') as f:
                    # Dynamically import the script
                    # This is a simplified approach; a real system might use a dedicated import mechanism
                    module_name = filename[:-3] # Remove .py
                    import_path = f"{migration_dir}.{module_name}" # This might require adjusting sys.path

                    # For simplicity in this example, let's directly execute the SQL from the file content
                    # A real system would likely define a migration class with 'up' and 'down' methods
                    with open(filepath, 'r') as script_file:
                        script_content = script_file.read()
                        # Execute the SQL found in the script
                        if "SQL =" in script_content:
                            sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
                            db_conn.execute(sql_statement)
                            db_conn.commit()
                            print(f"Applied migration: {filename}")
                            applied_migrations.append(filename)
                        else:
                            print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")

            except Exception as e:
                print(f"Error applying migration {filename}: {e}")
                db_conn.rollback() # Rollback if an error occurs
                # In a real system, you might want to mark this migration as failed and stop
                break # Stop processing further migrations on error

    return applied_migrations

# Create the migrations directory and files for the example
os.makedirs("migrations", exist_ok=True)
with open("migrations/001_create_users_table.py", "w") as f:
    f.write("""
SQL = \"\"\"
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE
);
\"\"\"
""")
with open("migrations/002_add_creation_date_to_users.py", "w") as f:
    f.write("""
SQL = \"\"\"
ALTER TABLE users
ADD COLUMN creation_date DATETIME DEFAULT CURRENT_TIMESTAMP;
\"\"\"
""")

applied_state = simulate_apply_migrations("migrations", conn.cursor())
print(f"Final applied migrations state: {applied_state}")

conn.close()

Expected Output (after running simulate_apply_migrations):

Applied migration: 001_create_users_table.py
Applied migration: 002_add_creation_date_to_users.py
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']

Explanation:

Example 2: Running Migrations Again

Directory Structure: Same as Example 1.

Simulated Migration State (after Example 1):

applied_migrations = ['001_create_users_table.py', '002_add_creation_date_to_users.py']

apply_migrations call (with the same database file and state):

# Assume the database 'my_app.db' and migrations directory are already set up from Example 1

db_file = "my_app.db"
conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Re-run the simulation function with the existing state
# In a real scenario, the applied state would be loaded from storage
applied_state_after_rerun = simulate_apply_migrations("migrations", conn.cursor()) # Note: simulate_apply_migrations resets its internal state, a real system would load it.
# To accurately simulate, we'd need to pass the applied_migrations list IN to simulate_apply_migrations

# Let's simulate passing the state IN
def simulate_apply_migrations_with_state(migration_dir, db_conn, current_applied_state):
    all_migration_files = sorted([f for f in os.listdir(migration_dir) if f.endswith(".py") and not f.startswith("__")])

    for filename in all_migration_files:
        if filename not in current_applied_state:
            filepath = os.path.join(migration_dir, filename)
            try:
                with open(filepath, 'r') as script_file:
                    script_content = script_file.read()
                    if "SQL =" in script_content:
                        sql_statement = script_content.split("SQL = ")[1].strip().strip('"""')
                        db_conn.execute(sql_statement)
                        db_conn.commit()
                        print(f"Applied migration: {filename}")
                        current_applied_state.append(filename) # Update the state IN PLACE
                    else:
                        print(f"Warning: Migration file {filename} does not contain 'SQL =' definition.")

            except Exception as e:
                print(f"Error applying migration {filename}: {e}")
                db_conn.rollback()
                break
    return current_applied_state

# Re-initialize connection and cursor for clarity
conn = sqlite3.connect(db_file)
cursor = conn.cursor()

# Simulate loading existing applied state
existing_applied = ['001_create_users_table.py', '002_add_creation_date_to_users.py']
print(f"Starting with applied state: {existing_applied}")

final_applied_state = simulate_apply_migrations_with_state("migrations", conn.cursor(), existing_applied)
print(f"Final applied migrations state: {final_applied_state}")

conn.close()

Expected Output:

Starting with applied state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']
Final applied migrations state: ['001_create_users_table.py', '002_add_creation_date_to_users.py']

Explanation:

Since both migration files are already in the applied_migrations list, no new migrations are executed. The system correctly identifies that everything is up-to-date.

Constraints

Python Version: Python 3.7 or higher.
Database: The solution should be designed to work with a common relational database. For testing and simplicity, SQLite is a good choice, but the core logic should be adaptable to PostgreSQL, MySQL, etc., by using appropriate database connectors (e.g., psycopg2, mysql.connector).
Migration File Format: Migration scripts must be Python files (.py). Each file should contain a variable named SQL that holds the SQL statement(s) for the migration.
Migration Order: Migration files must be named with a zero-padded numerical prefix to ensure correct chronological ordering (e.g., 001_..., 002_..., 010_...).
Performance: For this challenge, micro-optimizations are not critical. The focus is on correctness and understandability. A typical migration process involves a few database operations.

Notes

Migration State Storage: For this challenge, you can maintain the state of applied migrations in memory during the script's execution. In a real-world application, this state would be stored persistently in a dedicated "migrations" table within your database itself. This allows the migration system to know its state even after the application restarts.
Rollback: Implementing rollback functionality (down migrations) is beyond the scope of this initial challenge, but it's a crucial part of a robust migration system.
Abstraction: Consider how you might abstract the database interaction. Instead of directly using sqlite3, you might use an ORM like SQLAlchemy, which can then handle different database backends more seamlessly.
Dynamic Importing: You'll need to figure out how to load and execute Python code from the migration files. Be mindful of security implications if you were to generalize this for untrusted sources.