Hone logo
Hone
Problems

Optimizing Database Queries with Indexes in Python

Database indexes are crucial for improving query performance, especially in large datasets. This challenge tasks you with creating a Python function that simulates the creation of indexes on a simplified database table represented as a list of dictionaries. Understanding how to create and utilize indexes is fundamental for efficient data retrieval.

Problem Description

You are given a list of dictionaries representing a database table. Each dictionary represents a row in the table, and the keys represent column names. Your task is to implement a function create_index that takes the table (list of dictionaries) and a column name as input and returns a dictionary representing the index. The index should map the unique values of the specified column to a list of row indices where those values appear.

Key Requirements:

  • The function must handle duplicate values in the specified column. The index should store all row indices associated with a given value.
  • The function should be efficient, especially for large tables.
  • The function should return an empty index if the specified column does not exist in any of the rows.

Expected Behavior:

The create_index function should return a dictionary where:

  • Keys are the unique values found in the specified column.
  • Values are lists of integers representing the indices of the rows in the original table where the corresponding key value is found.

Edge Cases to Consider:

  • Empty table: The function should return an empty dictionary.
  • Column not present: The function should return an empty dictionary.
  • Column with all identical values: The index should map the value to a list containing all row indices.
  • Large table: The function should perform reasonably well (avoiding excessive memory usage or slow execution).

Examples

Example 1:

Input: table = [{'id': 1, 'name': 'Alice', 'city': 'New York'}, {'id': 2, 'name': 'Bob', 'city': 'London'}, {'id': 3, 'name': 'Charlie', 'city': 'New York'}]
column_name = 'city'
Output: {'New York': [0, 2], 'London': [1]}
Explanation: The index maps 'New York' to rows 0 and 2, and 'London' to row 1.

Example 2:

Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Alice'}]
column_name = 'name'
Output: {'Alice': [0, 2], 'Bob': [1]}
Explanation: The index maps 'Alice' to rows 0 and 2, and 'Bob' to row 1.

Example 3:

Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Alice'}, {'id': 3, 'name': 'Alice'}]
column_name = 'name'
Output: {'Alice': [0, 1, 2]}
Explanation: All rows have the same value for 'name', so the index maps 'Alice' to all row indices.

Example 4:

Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
column_name = 'age'
Output: {}
Explanation: The column 'age' does not exist in the table, so an empty dictionary is returned.

Constraints

  • The table will be a list of dictionaries.
  • Each dictionary in the table will have string keys.
  • The column name will be a string.
  • The table size can be up to 10,000 rows.
  • The number of unique values in the specified column will be up to 1,000.
  • The function should execute in O(n) time, where n is the number of rows in the table.

Notes

Consider using a dictionary to store the index. Iterate through the table once to build the index. Think about how to efficiently handle duplicate values. The goal is to create a functional index, not to interact with a real database system. Focus on the logic of creating the index structure.

Loading editor...
python