Hone logo
Hone
Problems

Type-Level String Parsers in TypeScript

This challenge involves creating a system of parser combinators that operate entirely at the TypeScript type level. The goal is to define parsers as types that can transform a literal string type into another type representing the parsed result or an error. This is a fascinating exercise in leveraging TypeScript's advanced type system for compile-time computation.

Problem Description

You need to implement a set of type-level parser combinators in TypeScript. These combinators will take literal string types as input and produce new types that represent either a successful parse with a value and remaining input, or a failure.

Key Requirements:

  1. Basic Parsers:
    • Literal<S extends string>: A parser that matches a specific literal string S. If the input string starts with S, it succeeds, returning S as the value and the rest of the string. Otherwise, it fails.
    • AnyChar: A parser that matches any single character. It succeeds if the input string is not empty, returning the first character as the value and the rest of the string.
  2. Combinators:
    • Sequence<P1, P2, ...>: A parser that applies parsers P1, P2, ... sequentially. It succeeds only if all parsers succeed. The result should be a tuple of the values returned by each individual parser. If any parser fails, the entire sequence fails.
    • Choice<P1, P2, ...>: A parser that tries parsers P1, P2, ... in order. It succeeds with the result of the first parser that successfully matches the input. If all parsers fail, the choice fails.
    • Many<P>: A parser that applies parser P zero or more times. It collects all successful results from P into a tuple. It always succeeds (even if P matches zero times).
    • Optional<P>: A parser that applies parser P at most once. If P succeeds, it returns its result. If P fails, it succeeds with undefined (or a designated "nothing" type) and the original input.
  3. Result Type: Each parser should have a return type that indicates success or failure. A good pattern would be:
    • Success<Value, RemainingInput extends string>: Represents a successful parse.
    • Failure<ErrorMessage extends string>: Represents a failed parse.
    • You can use a discriminated union or a specific structure to represent this. For simplicity, we can define a common "ParserResult" type.

Expected Behavior:

Parsers should correctly process literal string types. When a parser is applied to an input string type, it should produce a result type that reflects the outcome.

Important Edge Cases:

  • Empty input strings.
  • Parsers failing at the beginning of the input.
  • Parsers failing mid-input.
  • Combinators handling empty sequences or choices.
  • Many matching zero times.

Examples

Example 1: Literal Parser

type Input = "hello world";
type HelloParser = Literal<"hello">;
type Result = HelloParser extends Parser<Input, infer R> ? R : never;

// Expected: Result = Success<"hello", " world">

Explanation: The Literal<"hello"> parser is applied to "hello world". It successfully matches "hello" at the beginning, returning "hello" as the parsed value and " world" as the remaining input.

Example 2: Sequence Parser

type Input = "abcde";
type ParserA = Literal<"ab">;
type ParserB = Literal<"cd">;
type SeqParser = Sequence<ParserA, ParserB>;
type Result = SeqParser extends Parser<Input, infer R> ? R : never;

// Expected: Result = Success<["ab", "cd"], "e">

Explanation: The Sequence parser applies Literal<"ab"> first, which succeeds with "ab" and leaves "cde". Then, it applies Literal<"cd"> to "cde", which succeeds with "cd" and leaves "e". The final result is a tuple ["ab", "cd"] and the remaining input "e".

Example 3: Choice Parser

type Input = "apple";
type ParserApple = Literal<"apple">;
type ParserBanana = Literal<"banana">;
type ChoiceParser = Choice<ParserApple, ParserBanana>;
type Result = ChoiceParser extends Parser<Input, infer R> ? R : never;

// Expected: Result = Success<"apple", "">

Explanation: The Choice parser first tries Literal<"apple"> on "apple". It succeeds, returning "apple" and an empty string. The Choice parser short-circuits and doesn't try Literal<"banana">.

Example 4: Many Parser

type Input = "ababab";
type ParserA = Literal<"a">;
type ParserB = Literal<"b">;
type SeqAB = Sequence<ParserA, ParserB>;
type ManyAB = Many<SeqAB>;
type Result = ManyAB extends Parser<Input, infer R> ? R : never;

// Expected: Result = Success<[["a", "b"], ["a", "b"], ["a", "b"]], "">

Explanation: Many<SeqAB> repeatedly applies SeqAB to the input.

  1. SeqAB on "ababab" -> Success<["a", "b"], "abab">
  2. SeqAB on "abab" -> Success<["a", "b"], "ab">
  3. SeqAB on "ab" -> Success<["a", "b"], "">
  4. SeqAB on "" -> Failure<...> (or similar, as it expects "a") The Many combinator collects the successful results: [["a", "b"], ["a", "b"], ["a", "b"]], with an empty string remaining.

Example 5: Failure Case with Choice

type Input = "banana";
type ParserApple = Literal<"apple">;
type ParserBanana = Literal<"banana">;
type ChoiceParser = Choice<ParserApple, ParserBanana>;
type Result = ChoiceParser extends Parser<Input, infer R> ? R : never;

// Expected: Result = Success<"banana", "">

Example 6: Failure Case with Sequence

type Input = "abcd";
type ParserA = Literal<"ab">;
type ParserC = Literal<"de">; // This will fail
type SeqParser = Sequence<ParserA, ParserC>;
type Result = SeqParser extends Parser<Input, infer R> ? R : never;

// Expected: Result = Failure<"Expected 'de' but found 'cd'"> (or similar error message)

Example 7: Optional Parser

type Input1 = "abc";
type Input2 = "bc";
type ParserA = Literal<"a">;
type OptionalA = Optional<ParserA>;

type Result1 = OptionalA extends Parser<Input1, infer R> ? R : never;
// Expected: Result1 = Success<"a", "bc">

type Result2 = OptionalA extends Parser<Input2, infer R> ? R : never;
// Expected: Result2 = Success<undefined, "bc"> (or a specific "Nothing" type)

Constraints

  • All parser logic must be implemented using TypeScript's type-level features (type aliases, conditional types, infer, template literal types).
  • No runtime JavaScript code is allowed for the parser definitions or execution. The "execution" is effectively the compiler checking the type.
  • The input strings to parsers will always be string literals.
  • The output values can be string literals, tuples of string literals, or undefined (or a custom "Nothing" type for Optional).
  • Error messages can be descriptive string literals.

Notes

  • This challenge heavily relies on manipulating string literal types and using template literal types for substring operations.
  • Consider how to represent the "remaining input" after a successful parse.
  • Think about how to signal failure effectively. A dedicated Failure type is recommended.
  • The Sequence combinator will likely need variadic tuple types (...P) if you want it to handle an arbitrary number of parsers.
  • Many and Optional might require recursive type definitions.
  • For Sequence, you'll need a way to handle the return types of individual parsers to build the final tuple.
  • For Choice, the order of parsers matters.
  • When implementing Literal, you'll need to check if the input string startsWith the literal. This can be achieved by checking if the input string is the literal followed by some characters, or if it exactly matches the literal.
Loading editor...
typescript