Rust TOML Parser Challenge
The TOML (Tom's Obvious, Minimal Language) format is a popular configuration file format due to its human-readable nature. This challenge asks you to implement a parser for a simplified subset of TOML in Rust. Building a TOML parser is a great way to practice string manipulation, data structure design, and error handling in Rust.
Problem Description
Your task is to create a Rust library that can parse a string containing a simplified TOML configuration into a structured Rust representation. The parser should handle basic data types, key-value pairs, nested tables (sections), and arrays.
Key Requirements:
- Data Types: Support for strings, integers, floats, booleans, and arrays.
- Key-Value Pairs: Parse simple key-value assignments. Keys are alphanumeric strings that may also contain underscores (e.g., connection_max).
- Tables (Sections): Support nested tables using dot notation (e.g., [section.subsection]).
- Arrays: Support arrays of primitive types and other arrays.
- Error Handling: The parser should return informative errors for invalid TOML syntax.
- Rust Representation: The parsed TOML should be represented using Rust enums and structs that you define.
Expected Behavior:
The parser will take a &str as input and return a Result containing either a structured representation of the TOML data or a custom error type.
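One possible shape for these types is sketched below. The names Table, TomlValue, ParseError, and ParseResult are illustrative assumptions rather than required names, and the single error variant simply mirrors the one shown in Example 3.

```rust
use std::collections::HashMap;
use std::fmt;

/// A table maps keys to values; the document root is itself a table.
pub type Table = HashMap<String, TomlValue>;

/// One variant per value kind the challenge asks for.
#[derive(Debug, Clone, PartialEq)]
pub enum TomlValue {
    String(String),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Array(Vec<TomlValue>),
    Table(Table),
}

/// Hypothetical error type carrying a 1-based line number and a message.
#[derive(Debug, Clone, PartialEq)]
pub enum ParseError {
    InvalidSyntax { line: usize, message: String },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::InvalidSyntax { line, message } => {
                write!(f, "line {}: {}", line, message)
            }
        }
    }
}

impl std::error::Error for ParseError {}

/// What an entry point such as `parse(input: &str)` could return.
pub type ParseResult = Result<Table, ParseError>;
```

Implementing Display and std::error::Error lets callers print the error or propagate it with the ? operator alongside other error types.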
Edge Cases to Consider:
- Empty input string.
- Comments (lines starting with #, as well as trailing # comments after a value, as in Example 1).
- Whitespace handling (leading/trailing whitespace around keys, values, and table headers).
- Empty tables.
- Arrays with mixed types (though for this challenge, assume arrays contain homogeneous types).
- Duplicate keys within the same table. For simplicity, the last occurrence wins.
- Invalid syntax (e.g., missing values, misplaced brackets).
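Comment and whitespace handling can be isolated in a small pre-processing step that runs on every line before any other parsing. The sketch below uses a hypothetical helper named strip_comment and assumes that a backslash escapes the next character inside a double-quoted string, which the challenge does not specify.

```rust
/// Returns `line` with any `#` comment removed and surrounding whitespace trimmed.
/// A `#` inside a double-quoted string is kept, since it is part of the value.
/// Assumption: inside a string, a backslash escapes the next character.
pub fn strip_comment(line: &str) -> &str {
    let mut in_string = false;
    let mut escaped = false;
    for (idx, ch) in line.char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
        } else if ch == '"' {
            in_string = true;
        } else if ch == '#' {
            // First `#` outside a string starts the comment.
            return line[..idx].trim();
        }
    }
    line.trim()
}
```

A line that comes back empty is blank or comment-only and can be skipped. The "last occurrence wins" rule for duplicate keys needs no extra code if tables are stored in a HashMap, because insert overwrites an existing entry.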
Examples
Example 1:
Input:
name = "Rust Parser"
version = 1.0
enabled = true
[owner]
name = "Hone"
dob = 1979-05-27T07:32:00Z # First class dates are not supported in this challenge
Output:
// Conceptual Rust representation (e.g., using nested HashMaps or custom structs)
{
  "name": "Rust Parser",
  "version": 1.0,
  "enabled": true,
  "owner": {
    "name": "Hone",
    "dob": "1979-05-27T07:32:00Z" // Represent as string for this challenge
  }
}
Explanation:
This TOML defines top-level key-value pairs and a nested table [owner] with its own key-value pairs.
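The scalar values in this example could be classified by a helper along the following lines. This is a sketch, not a reference solution: parse_scalar is a hypothetical name, the enum is trimmed to scalar variants, escape sequences inside strings are not processed, and unquoted text that is neither a boolean nor a number falls back to a string, which is an assumption chosen to match how the dob value is treated above. A stricter parser might reject such values instead.

```rust
/// Scalar-only version of a value enum (arrays and tables omitted here).
#[derive(Debug, Clone, PartialEq)]
pub enum TomlValue {
    String(String),
    Integer(i64),
    Float(f64),
    Boolean(bool),
}

/// Classifies the text to the right of `=`. Returns `None` when the value is missing.
pub fn parse_scalar(raw: &str) -> Option<TomlValue> {
    let raw = raw.trim();
    if raw.is_empty() {
        return None;
    }
    // Quoted string: drop the surrounding quotes (escapes are not processed).
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        return Some(TomlValue::String(raw[1..raw.len() - 1].to_string()));
    }
    match raw {
        "true" => return Some(TomlValue::Boolean(true)),
        "false" => return Some(TomlValue::Boolean(false)),
        _ => {}
    }
    if let Ok(int) = raw.parse::<i64>() {
        return Some(TomlValue::Integer(int));
    }
    // Accept only plain decimal floats: `str::parse::<f64>` alone would also
    // accept exponents, "inf", and "NaN", which the challenge does not require.
    let plain_decimal = raw.contains('.')
        && raw.chars().all(|c| c.is_ascii_digit() || matches!(c, '.' | '-' | '+'));
    if plain_decimal {
        if let Ok(float) = raw.parse::<f64>() {
            return Some(TomlValue::Float(float));
        }
    }
    // Assumed fallback: keep anything else (such as a date) as a string.
    Some(TomlValue::String(raw.to_string()))
}
```

Trying the integer parse before the float parse keeps 5000 an Integer while 1.0 becomes a Float.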
Example 2:
Input:
[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true
[database.connection]
type = "postgresql"
Output:
// Conceptual Rust representation
{
  "database": {
    "server": "192.168.1.1",
    "ports": [8001, 8001, 8002],
    "connection_max": 5000,
    "enabled": true,
    "connection": {
      "type": "postgresql"
    }
  }
}
Explanation:
This example shows a nested table [database.connection] and an array of integers ports.
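An array such as ports can be parsed by splitting the text between the brackets at top-level commas and parsing each piece recursively, which also covers arrays of arrays. The sketch below makes several simplifying assumptions: the names split_top_level and parse_value are hypothetical, floats are left out for brevity, strings contain no escaped quotes, and errors are plain String messages rather than a custom error type.

```rust
/// Trimmed value enum for this sketch (floats and tables omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum TomlValue {
    String(String),
    Integer(i64),
    Boolean(bool),
    Array(Vec<TomlValue>),
}

/// Splits `body` at commas that are not inside a string or a nested array.
fn split_top_level(body: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut depth, mut in_string, mut start) = (0usize, false, 0usize);
    for (idx, ch) in body.char_indices() {
        match ch {
            '"' => in_string = !in_string,
            '[' if !in_string => depth += 1,
            ']' if !in_string => depth = depth.saturating_sub(1),
            ',' if !in_string && depth == 0 => {
                parts.push(&body[start..idx]);
                start = idx + 1;
            }
            _ => {}
        }
    }
    parts.push(&body[start..]);
    parts
}

/// Parses a string, boolean, integer, or (possibly nested) array.
pub fn parse_value(raw: &str) -> Result<TomlValue, String> {
    let raw = raw.trim();
    if let Some(inner) = raw.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        let mut items = Vec::new();
        for part in split_top_level(inner) {
            // Empty segments are skipped, so `[]` and a trailing comma work;
            // note that this leniency also lets `[1,,2]` through.
            if part.trim().is_empty() {
                continue;
            }
            items.push(parse_value(part)?);
        }
        return Ok(TomlValue::Array(items));
    }
    if let Some(text) = raw.strip_prefix('"').and_then(|r| r.strip_suffix('"')) {
        return Ok(TomlValue::String(text.to_string()));
    }
    match raw {
        "true" => Ok(TomlValue::Boolean(true)),
        "false" => Ok(TomlValue::Boolean(false)),
        _ => raw
            .parse::<i64>()
            .map(TomlValue::Integer)
            .map_err(|_| format!("unsupported value: {}", raw)),
    }
}
```

Because the challenge assumes homogeneous arrays, the sketch does not check element types against each other.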
Example 3: (Error Case)
Input:
name = "Invalid TOML"
version = 1.0
[[invalid_array] # Incorrect syntax for array of tables
Output:
// A descriptive error indicating the syntax error, e.g.:
Err(ParseError::InvalidSyntax { line: 3, message: "Expected ']]' or identifier, found '['" })
Explanation:
The third line opens a header with [[ but closes it with a single ], which is malformed. Arrays of tables are in any case not supported by this simplified parser, so the parser should report a syntax error for this line.
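A header check that produces this kind of error might look like the sketch below. The function name parse_table_header and the message wording are illustrative assumptions, the error enum is repeated so the snippet compiles on its own, and the line is expected to have had its trailing comment removed already. Underscores are accepted in names because the examples use them.

```rust
/// Hypothetical error type with the variant used in the example output.
#[derive(Debug, Clone, PartialEq)]
pub enum ParseError {
    InvalidSyntax { line: usize, message: String },
}

/// Parses a header such as `[database.connection]` into its key path.
/// `line` is the 1-based line number, used only for error reporting.
pub fn parse_table_header(text: &str, line: usize) -> Result<Vec<String>, ParseError> {
    let err = |message: &str| ParseError::InvalidSyntax {
        line,
        message: message.to_string(),
    };
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .ok_or_else(|| err("table header must start with '[' and end with ']'"))?;
    // Any bracket left inside means `[[...]]` or a stray bracket.
    if inner.contains('[') || inner.contains(']') {
        return Err(err("unexpected bracket in table header; arrays of tables are not supported"));
    }
    let mut path = Vec::new();
    for segment in inner.split('.') {
        let segment = segment.trim();
        let valid = !segment.is_empty()
            && segment.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
        if !valid {
            return Err(err("table name segments must be non-empty and alphanumeric"));
        }
        path.push(segment.to_string());
    }
    Ok(path)
}
```

Passing the line number in from the caller keeps the helper free of any global parser state while still letting every error carry its location.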
Constraints
- The input TOML string will not exceed 1000 lines.
- Keys will consist of alphanumeric characters and underscores (as in connection_max), with periods used for nesting in table names.
- Values will be strings (enclosed in double quotes), integers, floats, booleans (true/false), or arrays of these types.
- The parser should aim to be reasonably efficient, completing parsing within a few milliseconds for typical inputs.
- Do not use external TOML parsing crates (e.g., toml, serde_toml). You must implement the core parsing logic yourself.
Notes
- Consider defining an enum to represent the different TOML value types (String, Integer, Float, Boolean, Array, Table).
- A HashMap<String, TomlValue> could be a good starting point for representing tables.
- Think about how to manage the current table context as you parse the file.
- Error messages should be as helpful as possible, indicating the line number and a brief description of the problem.
- For this challenge, you do not need to support dates, times, datetimes, floats with exponents, or multi-line strings. Treat all dates/times as regular strings if they appear.
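One way to manage the current table context is to remember the key path of the most recent header and resolve it against the root table whenever a key-value pair is inserted. The sketch below shows the resolving step with a hypothetical function named table_at_path; the enum is trimmed to the variants needed here, and errors are plain String messages for brevity.

```rust
use std::collections::HashMap;

pub type Table = HashMap<String, TomlValue>;

/// Trimmed value enum for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum TomlValue {
    String(String),
    Integer(i64),
    Table(Table),
}

/// Walks `path` from `root`, creating missing tables along the way,
/// and returns a mutable reference to the innermost table.
pub fn table_at_path<'a>(
    root: &'a mut Table,
    path: &[String],
) -> Result<&'a mut Table, String> {
    let mut current = root;
    for key in path {
        let entry = current
            .entry(key.clone())
            .or_insert_with(|| TomlValue::Table(Table::new()));
        current = match entry {
            TomlValue::Table(table) => table,
            // The key exists but holds a scalar, so it cannot be descended into.
            _ => return Err(format!("'{}' is already defined as a non-table value", key)),
        };
    }
    Ok(current)
}
```

Storing the path rather than a long-lived mutable reference avoids borrow-checker conflicts with the root table, at the cost of one lookup per path segment on each insertion. An empty path resolves to the root table, which suits key-value pairs that appear before any header.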