Efficient Database Queries with Python Indexing
Database indexes are crucial for optimizing query performance, especially in large datasets. Without them, databases must perform full table scans for every query, which can become prohibitively slow. This challenge will guide you through creating and utilizing database indexes programmatically using Python to enhance query efficiency.
Problem Description
Your task is to create a Python script that connects to a SQLite database, creates a table, inserts some sample data, and then adds an index to a specific column. You will then demonstrate the performance improvement by executing a query before and after the index creation.
Requirements:
- Connect to a SQLite Database: Use the `sqlite3` Python module to establish a connection to an in-memory SQLite database.
- Create a Table: Define and create a table named `users` with at least three columns: `id` (integer, primary key), `username` (text, unique), and `registration_date` (text, representing a date).
- Insert Sample Data: Populate the `users` table with a significant amount of sample data (e.g., 10,000 records). The `username` must be unique for each record.
- Measure Query Performance (Before Index): Execute a `SELECT` query that filters users by `username`. Measure the time this query takes to complete without any index on the `username` column.
- Create an Index: Create a B-tree index on the `username` column of the `users` table.
- Measure Query Performance (After Index): Execute the exact same `SELECT` query as in step 4. Measure the time it takes to complete after the index has been created.
- Output Results: Print the time taken for the query both before and after the index creation, clearly indicating which is which.
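Putting the requirements together, one possible sketch is below. Note one deliberate deviation: SQLite backs a `UNIQUE` column constraint with an automatic index, which would make the "before" query already indexed and mask the comparison. This sketch therefore keeps usernames unique through data generation instead of a `UNIQUE` constraint. The index name `idx_username` and the `sample_user_` naming scheme are arbitrary choices, not part of the requirements.

```python
import sqlite3
import time

# Connect to an in-memory SQLite database (no file management needed).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the users table. UNIQUE is deliberately omitted here: SQLite
# would back it with an automatic index, masking the before/after timing.
cur.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT,
        registration_date TEXT
    )
""")

# Insert 10,000 records; the numeric suffix keeps every username unique.
rows = [(i, f"sample_user_{i}", "2024-01-01") for i in range(10_000)]
cur.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
conn.commit()

query = "SELECT * FROM users WHERE username = ?"
target = ("sample_user_5000",)

# Time the query before any index exists (full table scan).
start = time.perf_counter()
rows_before = cur.execute(query, target).fetchall()
before = time.perf_counter() - start

# Create a B-tree index on the username column.
cur.execute("CREATE INDEX idx_username ON users (username)")

# Time the identical query after the index is in place.
start = time.perf_counter()
rows_after = cur.execute(query, target).fetchall()
after = time.perf_counter() - start

print(f"Query time before index: {before:.6f} seconds")
print(f"Query time after index: {after:.6f} seconds")
```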
Expected Behavior:
The script should successfully connect to the database, create the table and data, and then demonstrate a measurable decrease in query execution time after the index is created.
Edge Cases to Consider:
- Unique Constraint Violation: Ensure that the sample data generation adheres to the `username` unique constraint.
- Database Connection Errors: While less critical for in-memory databases, error handling for connection issues would be important in a real-world scenario. For this challenge, assume a successful connection.
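Although the challenge assumes a successful connection, it may help to see how both failure modes surface in `sqlite3`. The table and values below are illustrative only:

```python
import sqlite3

duplicate_rejected = False
conn = sqlite3.connect(":memory:")
try:
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY,"
        " username TEXT UNIQUE, registration_date TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'alice', '2024-01-01')")
    # A duplicate username violates the UNIQUE constraint.
    conn.execute("INSERT INTO users VALUES (2, 'alice', '2024-01-02')")
except sqlite3.IntegrityError as exc:
    # Raised for constraint violations such as duplicate usernames.
    duplicate_rejected = True
    print(f"Duplicate username rejected: {exc}")
except sqlite3.OperationalError as exc:
    # Raised for operational problems such as an unopenable database file.
    print(f"Database error: {exc}")
finally:
    conn.close()
```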
Examples
Example 1:
Input:
(No explicit input from user, script will generate data)
Output:
Query time before index: 0.523 seconds
Query time after index: 0.002 seconds
Explanation: This output shows that the query took significantly less time (0.002 seconds) after the index was created compared to before (0.523 seconds), demonstrating the effectiveness of the index. The exact time values will vary based on system performance and the amount of data.
Constraints
- The sample data must contain at least 10,000 records.
- The `username` column must be unique for all inserted records.
- The query to be timed will be `SELECT * FROM users WHERE username = 'sample_user_xyz';` (where `sample_user_xyz` is a username that exists in the data).
- Performance measurement should use Python's `time` module.
- The solution must be written in Python using the `sqlite3` module.
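Within the `time` module, `time.perf_counter()` is the usual choice for this kind of measurement, since it is a monotonic, high-resolution clock intended for timing intervals. A minimal sketch, using a summation as a stand-in workload for the timed query:

```python
import time

start = time.perf_counter()
total = sum(range(1_000_000))  # stand-in workload for the timed SELECT
elapsed = time.perf_counter() - start

print(f"Elapsed: {elapsed:.6f} seconds")
```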
Notes
- Consider using a loop and string formatting to generate unique usernames for your sample data.
- Remember that creating an index has an initial cost, but it pays off for subsequent queries that use the indexed column in their `WHERE` clauses.
- For demonstration purposes, querying for a username near the "middle" of the dataset might show a more pronounced difference than querying for the first or last element. However, for this challenge, any existing username will suffice.
- The in-memory database (`:memory:`) is a good choice for this challenge as it requires no file management.
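The first note above (a loop plus string formatting to generate unique usernames) can be sketched as follows; the `sample_user_` prefix is an arbitrary choice:

```python
# The numeric suffix guarantees every username is distinct, which keeps
# the inserts compatible with a unique-username requirement.
usernames = [f"sample_user_{i}" for i in range(10_000)]

print(usernames[0])   # sample_user_0
print(usernames[-1])  # sample_user_9999
```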