Hone logo
Hone
Problems

Filtering Records with a WHERE Clause in SQL

This challenge focuses on implementing a SQL WHERE clause to filter records based on specified conditions. Filtering data is a fundamental operation in database management, allowing you to retrieve only the relevant information from a large dataset. This is crucial for efficient data analysis, reporting, and application functionality.

Problem Description

You are tasked with writing pseudocode that simulates the functionality of a SQL WHERE clause. Given a dataset represented as a list of records (each record being a dictionary), and a filtering condition expressed as a string, your pseudocode should return a new list containing only the records that satisfy the condition. The condition string will follow a simple format: "column_name operator value".

  • What needs to be achieved: Implement pseudocode that filters a list of records based on a given condition string.
  • Key requirements:
    • The pseudocode must correctly parse the condition string.
    • It must evaluate the condition for each record in the dataset.
    • It must return a new list containing only the records that satisfy the condition.
  • Expected behavior: The pseudocode should handle different operators (=, >, <, >=, <=, !=) and data types (strings, integers).
  • Edge cases to consider:
    • Invalid condition string format (e.g., missing operator, invalid column name). Return an empty list in this case.
    • Column not present in the record. Return an empty list in this case.
    • Data type mismatch between the column value and the value in the condition string. Return an empty list in this case.
    • Empty dataset. Return an empty list.

Examples

Example 1:

Input:
dataset = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "London"},
    {"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "age > 25"
Output:
[
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Charlie", "age": 35, "city": "Paris"}
]
Explanation: The condition "age > 25" filters the dataset, keeping only records where the "age" is greater than 25.

Example 2:

Input:
dataset = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "London"},
    {"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "city = London"
Output:
[
    {"name": "Bob", "age": 25, "city": "London"}
]
Explanation: The condition "city = London" filters the dataset, keeping only the record where the "city" is equal to "London".

Example 3:

Input:
dataset = [
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "London"},
    {"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "invalid_column > 25"
Output:
[]
Explanation: The condition refers to a non-existent column, so the dataset is not filtered and an empty list is returned.

Constraints

  • The dataset will contain a maximum of 100 records.
  • Each record will contain a maximum of 5 key-value pairs.
  • Column names will be strings of length up to 20 characters.
  • Values can be integers or strings.
  • The condition string will be a valid string.
  • The operators supported are =, >, <, >=, <=, !=.
  • Performance: The pseudocode should complete within 1 second for the given constraints.

Notes

  • Focus on the logic of filtering, not on specific SQL syntax.
  • Consider how to handle different data types when comparing values.
  • Error handling is important – gracefully handle invalid conditions and missing columns.
  • The pseudocode should be clear, concise, and easy to understand. Think about breaking down the problem into smaller, manageable steps.
  • Assume the values in the dataset are consistently typed (e.g., if a column is an integer, all values in that column will be integers).
Loading editor...
plaintext