Building a Simple Configuration DSL Parser in Python
This challenge focuses on creating a Domain-Specific Language (DSL) parser for a custom configuration format. You will implement a Python program that can read and interpret a simplified configuration language, transforming it into a structured data representation. This is a fundamental skill for many software development tasks, including building configuration systems, scripting languages, and data processing tools.
Problem Description
Your task is to build a Python parser for a simple configuration DSL. This DSL is designed to define settings with key-value pairs, nested sections, and basic list structures. The parser should take a string representing the DSL code as input and produce a Python dictionary representing the parsed configuration.
Key Requirements:
- Section Handling: The DSL supports nested sections. A section is a name followed by a block delimited by curly braces {} or square brackets []; the examples below use both.
- Key-Value Pairs: Within sections or at the top level, settings are defined as key = value. Values can be strings, integers, booleans (true, false), or lists.
- List Support: Lists are defined using parentheses () with comma-separated elements. Elements within a list can be of any supported type.
- Comments: Lines starting with # should be ignored. Example 1 also places a # comment after a value on the same line, so trailing comments should be dropped too.
- Whitespace: Leading/trailing whitespace around keys, values, and section names should be ignored.
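The comment and whitespace rules can be sketched as a small line-cleaning helper. This is only one possible approach: the name strip_comment is hypothetical, and the decision to leave a # inside double quotes untouched is an assumption, since the problem statement does not say how that case should behave.

```python
def strip_comment(line: str) -> str:
    """Return the line without its '#' comment and without surrounding whitespace.

    Assumption: a '#' inside a double-quoted string is part of the value,
    not the start of a comment.
    """
    in_quotes = False
    for index, char in enumerate(line):
        if char == '"':
            in_quotes = not in_quotes
        elif char == "#" and not in_quotes:
            # Everything from the first unquoted '#' onward is a comment.
            return line[:index].strip()
    return line.strip()


if __name__ == "__main__":
    print(repr(strip_comment("# This is a sample configuration")))
    print(repr(strip_comment("  rotations = ( 7, 30, 365 ) # retention")))
```

A line that comes back empty after this step is either blank or a pure comment and can be skipped by the caller.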
Expected Behavior:
The parser should correctly interpret the DSL syntax and convert it into a nested Python dictionary.
- Strings should be represented as Python strings.
- Integers should be represented as Python integers.
- Booleans true and false should be represented as Python booleans True and False.
- Lists should be represented as Python lists.
- Nested sections should translate to nested dictionaries.
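The type mapping above could look roughly like the following. It is a minimal sketch, not a reference solution: parse_value is a hypothetical name, and splitting list bodies on every comma is a simplifying assumption that would break on quoted elements containing commas or on nested lists.

```python
import re


def parse_value(text: str):
    """Convert the raw text of a value into a Python object."""
    text = text.strip()
    # List: parentheses around comma-separated elements.
    if text.startswith("(") and text.endswith(")"):
        inner = text[1:-1].strip()
        if not inner:
            return []
        # Assumption: no commas inside quoted elements, no nested lists.
        return [parse_value(item) for item in inner.split(",")]
    # Quoted string: drop the surrounding double quotes, keep the content as-is.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    if text == "true":
        return True
    if text == "false":
        return False
    # Integer: optional minus sign followed by digits only.
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    # Anything else is treated as an unquoted string.
    return text


if __name__ == "__main__":
    for raw in ["5432", "true", '"1.0"', "localhost", "( 7, 30, 365 )", "()"]:
        print(raw, "->", repr(parse_value(raw)))
```

Checking for quotes before checking for numbers is what keeps a value such as "1.0" a string.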
Edge Cases to Consider:
- Empty input string.
- Input containing only comments or whitespace.
- Values containing spaces (should be treated as part of the string).
- Empty sections.
- Empty lists.
- Values that look like numbers but should be treated as strings (e.g., version = "1.0").
Examples
Example 1:
Input:
# This is a sample configuration
database {
host = localhost
port = 5432
enabled = true
}
logging {
level = info
file = app.log
rotations = ( 7, 30, 365 ) # Weekly, monthly, yearly
}
Output:
{
"database": {
"host": "localhost",
"port": 5432,
"enabled": True
},
"logging": {
"level": "info",
"file": "app.log",
"rotations": [7, 30, 365]
}
}
Explanation: The input defines two top-level sections, database and logging. database contains simple key-value pairs. logging contains a list rotations with integer elements. Comments and whitespace are ignored.
Example 2:
Input:
api_key = "your_secret_key_here"
timeout_seconds = 30
feature_flags = ( enabled, beta_test, new_ui )
nested {
setting1 = value1
sub_nested [
option_a = 100
option_b = false
]
}
Output:
{
"api_key": "your_secret_key_here",
"timeout_seconds": 30,
"feature_flags": ["enabled", "beta_test", "new_ui"],
"nested": {
"setting1": "value1",
"sub_nested": {
"option_a": 100,
"option_b": False
}
}
}
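Explanation: This example demonstrates top-level key-value pairs (a quoted string, an integer, and a list of unquoted strings) and nested sections with different value types, including a boolean. The sub_nested section is delimited by square brackets rather than curly braces.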
Example 3:
Input:
# Empty configuration
Output:
{}
Explanation: An empty input or input containing only comments should result in an empty dictionary.
Constraints
- The input DSL string will not exceed 10,000 characters.
- Section names and keys will be alphanumeric strings (a-z, A-Z, 0-9, and underscore _).
- Values can be strings (enclosed in double quotes "), integers, booleans (true, false), or lists. Unquoted values that are neither integers nor booleans, such as localhost in Example 1, are also strings.
- Strings may contain spaces but not escaped quotes.
- The parser should be reasonably efficient, capable of parsing typical configuration files within a few milliseconds.
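Because names are restricted to letters, digits, and underscores, each cleaned line can be classified with a few regular expressions. The sketch below assumes one construct per line (a section opener, a closing bracket, or a key = value pair), which matches the examples but is not stated as a rule; the pattern names and the classify function are hypothetical.

```python
import re

# Patterns built from the naming constraint: a-z, A-Z, 0-9 and underscore.
SECTION_OPEN = re.compile(r"([A-Za-z0-9_]+)\s*[{\[]")
SECTION_CLOSE = re.compile(r"[}\]]")
KEY_VALUE = re.compile(r"([A-Za-z0-9_]+)\s*=\s*(.*)")


def classify(line: str):
    """Return a (kind, name, value) tuple describing one cleaned line."""
    line = line.strip()
    match = SECTION_OPEN.fullmatch(line)
    if match:
        return ("open", match.group(1), None)
    if SECTION_CLOSE.fullmatch(line):
        return ("close", None, None)
    match = KEY_VALUE.fullmatch(line)
    if match:
        return ("pair", match.group(1), match.group(2).strip())
    return ("unknown", None, None)


if __name__ == "__main__":
    for sample in ["database {", "sub_nested [", "port = 5432", "}", "bad-key = 1"]:
        print(sample, "->", classify(sample))
```

Lines classified as unknown are a natural place to raise an error that names the offending line.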
Notes
- You can approach this problem using regular expressions for simpler parsing or by implementing a more robust lexer/parser combination (e.g., using libraries or custom logic).
- Consider how to handle unquoted string values that might be mistaken for booleans or numbers. For this DSL, treat any unquoted value that is neither an integer nor a boolean keyword (true, false) as a string.
- A good starting point is to iterate through lines, clean them up, and then parse each line's content.
- For lists and nested structures, you'll need to manage a parsing state or use recursive parsing.
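One way to manage parsing state for nested sections is a stack of dictionaries, where the top of the stack is the section currently being filled. The sketch below illustrates only that idea under several assumptions: parse_sections and its convert parameter are hypothetical names, values are kept as raw strings unless a converter is passed in, only whole-line comments are skipped, lists must fit on one line, and a block opened with { may be closed with ] without complaint.

```python
def parse_sections(source: str, convert=lambda raw: raw) -> dict:
    """Parse sections and key-value pairs line by line into nested dictionaries."""
    root = {}
    stack = [root]  # stack[-1] is the dictionary currently being filled
    for lineno, raw_line in enumerate(source.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        if line in ("}", "]"):
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: closing bracket without an open section")
            stack.pop()  # leave the current section
        elif "=" in line:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = convert(value.strip())
        elif line[-1] in "{[":
            section = {}
            stack[-1][line[:-1].strip()] = section
            stack.append(section)  # enter the new section
        else:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
    if len(stack) > 1:
        raise ValueError("unclosed section at end of input")
    return root


if __name__ == "__main__":
    sample = "nested {\n  setting1 = value1\n  sub_nested [\n    option_a = 100\n  ]\n}\n"
    print(parse_sections(sample))
    print(parse_sections(sample, convert=lambda s: int(s) if s.isdigit() else s))
```

A recursive-descent parser would reach the same result; the explicit stack simply keeps the nesting depth visible and makes unbalanced brackets easy to detect.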