Python Syntax Highlighter

This challenge asks you to build a basic syntax highlighter for Python code. Syntax highlighting is a crucial feature in code editors and IDEs, making code more readable and easier to debug by visually distinguishing different language elements like keywords, strings, comments, and numbers.

Problem Description

Your task is to create a Python function that takes a string containing Python code as input and returns a new string where different syntactic elements are marked with distinct ANSI escape codes for coloring. You will need to identify and highlight common Python syntax elements.

Key Requirements:

  • Identify and Highlight: The highlighter should recognize and apply distinct "colors" (represented by ANSI escape codes) to the following Python elements:
    • Keywords (e.g., if, for, while, def, class, import)
    • Strings (single-quoted, double-quoted, triple-quoted)
    • Numbers (integers, floats, scientific notation)
    • Comments (single-line starting with #)
    • Built-in functions/types (e.g., print, len, str, int)
    • Operators (e.g., +, -, *, /, =, ==, !=)
    • Punctuation (e.g., (, ), {, }, [, ], ,, ., :)
  • ANSI Escape Codes: Use standard ANSI escape codes to represent colors. For example, \033[94m for blue, \033[92m for green, \033[0m to reset.
  • Preserve Original Formatting: The output should retain the original code's indentation, newlines, and whitespace.
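  • Handle Nested Structures: A full-blown parser is not required, but the highlighter should make a reasonable attempt at syntax nested inside strings or comments. For example, a # inside a string literal does not start a comment, and a quote character inside a comment does not start a string.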
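
As a minimal sketch of the ANSI requirement above, one way to wrap a piece of text in a color and a reset code is shown below. The COLORS table and the colorize helper are hypothetical names introduced for illustration; the codes themselves are the ones used in the examples on this page.

```python
# Hypothetical color table; the category names are assumptions, the codes
# are the ones that appear in this problem's example outputs.
RESET = "\033[0m"
COLORS = {
    "keyword": "\033[94m",  # blue
    "name": "\033[93m",     # yellow
    "builtin": "\033[95m",  # magenta
    "string": "\033[32m",   # green
    "number": "\033[96m",   # cyan
    "comment": "\033[90m",  # dark gray
}


def colorize(text: str, kind: str) -> str:
    """Wrap text in the color for kind, followed by a reset code."""
    color = COLORS.get(kind)
    if color is None:
        # Unknown kinds keep the default terminal color.
        return text
    return f"{color}{text}{RESET}"


if __name__ == "__main__":
    # repr() shows the escape codes instead of sending them to the terminal.
    print(repr(colorize("def", "keyword")))
```

Emitting the reset code in the same helper that emits the color makes it hard to forget, which keeps colors from bleeding into the following text.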

Expected Behavior: The function should process the input code line by line, or token by token, and wrap identified elements with appropriate ANSI escape codes. Elements not explicitly identified should retain their default terminal color.

Edge Cases to Consider:

  • Empty input string.
  • Code with only whitespace.
  • Strings spanning multiple lines (triple-quoted strings).
  • Escape sequences within strings (e.g., \n, \").
  • Numbers in other bases (binary, octal, hexadecimal). Handling basic decimal numbers is sufficient to start.
  • Inline comments that follow code on the same line.
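
Several of these edge cases come down to how the string and number patterns are written. The following is a rough sketch, not a complete treatment of Python's lexical rules: STRING_RE and NUMBER_RE are hypothetical names, and the patterns ignore details such as digit-group underscores and complex literals.

```python
import re

# Optional one- or two-letter prefix such as f, r, b, or rb.
PREFIX = r"(?:[rRbBfFuU]{1,2})?"
# Triple-quoted forms may span lines; "\\." consumes a backslash plus the
# character it escapes, so an escaped quote does not end the string.
TRIPLE_DQ = r'"""(?:\\.|[^\\])*?"""'
TRIPLE_SQ = r"'''(?:\\.|[^\\])*?'''"
# Single-line forms stop at an unescaped newline.
DQ = r'"(?:\\.|[^"\\\n])*"'
SQ = r"'(?:\\.|[^'\\\n])*'"

# Triple-quoted alternatives are listed first so that """ is not read as an
# empty string "" followed by a stray quote.
STRING_RE = re.compile(
    PREFIX + "(?:" + "|".join([TRIPLE_DQ, TRIPLE_SQ, DQ, SQ]) + ")",
    re.DOTALL,
)

# Prefixed bases come before decimal so that "0xFF" is not split after "0".
NUMBER_RE = re.compile(
    r"0[xX][0-9a-fA-F]+"                      # hexadecimal
    r"|0[bB][01]+"                            # binary
    r"|0[oO][0-7]+"                           # octal
    r"|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"  # integer, float, scientific
)

if __name__ == "__main__":
    print(STRING_RE.match(r'"say \"hi\"" + x').group())
    print(NUMBER_RE.match("0xFF + 1").group())
```

An empty or whitespace-only input needs no special handling with this approach: neither pattern matches, so the text passes through unchanged.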

Examples

Example 1:

Input:
def greet(name):
    print(f"Hello, {name}!")

# This is a comment
message = "World"
number = 123
pi = 3.14

Output:

\033[94mdef\033[0m \033[93mgreet\033[0m(\033[93mname\033[0m):\n    \033[95mprint\033[0m(\033[32mf"Hello, {name}!"\033[0m)\n\n\033[90m# This is a comment\033[0m\n\033[93mmessage\033[0m = \033[32m"World"\033[0m\n\033[93mnumber\033[0m = \033[96m123\033[0m\n\033[93mpi\033[0m = \033[96m3.14\033[0m

Explanation: The keyword def is blue. The function name greet and the parameter name are yellow. The built-in print is magenta. The f-string f"Hello, {name}!" is green as a whole, including its f prefix and the {name} placeholder. The comment # This is a comment is dark gray. The variables message, number, and pi are yellow, and the numbers 123 and 3.14 are cyan.

Example 2:

Input:
class MyClass:
    '''A simple class'''
    def __init__(self, value):
        self.value = value

result = 5 * (10 + 2)

Output:

\033[94mclass\033[0m \033[93mMyClass\033[0m:\n    \033[32m'''A simple class'''\033[0m\n    \033[94mdef\033[0m \033[93m__init__\033[0m(\033[93mself\033[0m, \033[93mvalue\033[0m):\n        \033[93mself.value\033[0m = \033[93mvalue\033[0m\n\n\033[93mresult\033[0m = \033[96m5\033[0m * (\033[96m10\033[0m + \033[96m2\033[0m)

Explanation: The keywords class and def are blue. The class name MyClass, the method name __init__, and the parameters self and value are yellow, as is the attribute access self.value, which is colored as a single unit. The triple-quoted string is green. The numbers 5, 10, and 2 are cyan. The operators *, +, and = and the punctuation ( ) , : are left uncolored in this simplified output, but the logic should still handle them.

Constraints

  • The input will be a single string representing valid or potentially invalid Python code.
  • The maximum length of the input string will not exceed 10,000 characters.
  • The solution should aim for reasonable performance, ideally running in time linear in the input length, i.e. O(N) for N characters.

Notes

  • You don't need to implement a full Python parser. Regular expressions and simple state management can be effective for this task.
  • Consider using a dictionary or set to store Python keywords and built-in functions for quick lookups.
  • Be mindful of the order in which you check for patterns. For example, match strings and comments before keywords, so that a keyword appearing inside a string or comment is not highlighted on its own.
  • The ANSI escape codes provided are examples; you can choose your preferred color scheme. Ensure you include a reset code (\033[0m) after each highlighted element to prevent colors from bleeding.
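
The notes above can be combined into a single-pass sketch like the one below. It is one possible approach under stated assumptions, not a reference solution: the names TOKEN_RE, COLORS, and highlight are hypothetical, keywords come from the standard keyword module and built-ins from dir(builtins), and operators and punctuation are left uncolored to match the simplified example outputs.

```python
import builtins
import keyword
import re

RESET = "\033[0m"
COLORS = {
    "string": "\033[32m",   # green
    "comment": "\033[90m",  # dark gray
    "number": "\033[96m",   # cyan
    "keyword": "\033[94m",  # blue
    "builtin": "\033[95m",  # magenta
    "name": "\033[93m",     # yellow
}
# Sets give constant-time membership checks.
KEYWORDS = frozenset(keyword.kwlist)
BUILTINS = frozenset(dir(builtins))

# Alternatives are tried left to right at each position, so strings and
# comments win over numbers and names that appear inside them.
TOKEN_RE = re.compile(
    r"(?P<string>[rRbBfFuU]{0,2}(?:"
    r"'''(?:\\.|[^\\])*?'''"
    r'|"""(?:\\.|[^\\])*?"""'
    r"|'(?:\\.|[^'\\\n])*'"
    r'|"(?:\\.|[^"\\\n])*"'
    r"))"
    r"|(?P<comment>#[^\n]*)"
    r"|(?P<number>0[xX][0-9a-fA-F]+|0[bB][01]+|0[oO][0-7]+"
    r"|\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)"
    r"|(?P<name>[A-Za-z_]\w*)",
    re.DOTALL,
)


def highlight(code: str) -> str:
    """Return code with recognized tokens wrapped in ANSI color codes."""

    def paint(match):
        kind = match.lastgroup  # name of the alternative that matched
        text = match.group()
        if kind == "name":
            # Refine plain identifiers into keywords or built-ins.
            if text in KEYWORDS:
                kind = "keyword"
            elif text in BUILTINS:
                kind = "builtin"
        return f"{COLORS[kind]}{text}{RESET}"

    # Text that matches no alternative (whitespace, operators, punctuation)
    # is copied through unchanged, which preserves the original formatting.
    return TOKEN_RE.sub(paint, code)


if __name__ == "__main__":
    print(highlight('x = len("# not a comment")  # a real comment'))
```

Because everything between matches is copied verbatim, indentation and newlines survive without extra bookkeeping. One difference from Example 2 is that this sketch colors self and value in self.value as two separate names rather than as a single unit.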