Python Code Completion Engine

This challenge asks you to build a simplified code completion engine. Given a partial line of Python code and a history of previously executed code, your engine should suggest the most likely next token(s). This is a fundamental component of modern IDEs and code editors, significantly improving developer productivity.

Problem Description

You need to implement a Python function, suggest_completions(current_line, code_history), that takes a partially written line of Python code and a list of previously executed code lines. The function should return a list of suggested tokens (words, operators, punctuation) that are most likely to follow the current_line based on the code_history.
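
A stub of the required interface is shown below. The ranked, at-most-ten, lowercase output shape comes from the Constraints section later on this page; the body is the challenge itself.

def suggest_completions(current_line, code_history):
    """Return a ranked list of likely next tokens (lowercase, at most 10)."""
    suggestions = []
    # ... tokenize current_line and code_history, match against observed patterns, rank ...
    return suggestions[:10]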

Key Requirements:

  1. Tokenization: You'll need to tokenize both the current_line and the code_history into meaningful units (e.g., keywords, identifiers, operators, literals, punctuation); a rough tokenizer sketch follows this list.
  2. Contextual Suggestions: Suggestions should be based on the patterns observed in the code_history. This means understanding what typically follows certain tokens or sequences of tokens.
  3. Ranking: The suggestions should be ranked by likelihood. The most probable suggestions should appear first.
  4. Basic Scope: For this challenge, we'll focus on suggesting common keywords, built-in functions, and simple identifier completions (e.g., if a variable my_list was defined earlier, my_list becomes a candidate completion).
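
One way to satisfy the tokenization requirement without reaching for the full tokenize module is a small regex-based tokenizer. The pattern below is only a sketch: it ignores escape sequences, f-strings, and comments, and it treats any run of operator characters as a single token.

import re

# Coarse token pattern: identifiers/keywords, numbers, naive strings, punctuation, operators.
TOKEN_RE = re.compile(r"""
    [A-Za-z_]\w*          # identifiers and keywords
  | \d+(?:\.\d+)?         # integer and float literals
  | "[^"]*" | '[^']*'     # naive string literals (no escape handling)
  | [()\[\]{},:.]         # punctuation
  | [+\-*/%=<>!]+         # operator character runs
""", re.VERBOSE)

def tokenize(line):
    """Split one line of code into coarse tokens, skipping whitespace."""
    return TOKEN_RE.findall(line)

# tokenize("my_list.append(4)") -> ['my_list', '.', 'append', '(', '4', ')']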

Expected Behavior:

  • If the current_line is empty, suggest common starting tokens like keywords (def, class, import, if, for) or common built-ins.
  • If the current_line ends with a dot (.), suggest attributes or methods that have been seen in the code_history for the object preceding the dot (this will be a simplified heuristic).
  • If the current_line ends with a partially typed word, suggest words from the code_history that start with that prefix. (A dispatch sketch covering these three cases follows this list.)
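
Taken together, these three cases suggest a simple top-level dispatch. In the sketch below, suggest_starters, suggest_attributes, and suggest_prefix_matches are placeholder names invented here, not part of the problem statement; the fixed starter list simply mirrors Example 3, and a real engine would rank it from the history.

import re

def suggest_starters(code_history):
    # Placeholder: a fixed list matching Example 3; rank by observed frequency in practice.
    return ['def', 'class', 'import', 'if', 'for', 'while', 'print']

def suggest_attributes(obj, code_history):
    # Placeholder: see the attribute-scanning sketch in the Notes section.
    return []

def suggest_prefix_matches(prefix, code_history):
    # Placeholder: collect history words sharing the prefix (unranked here).
    words = {w.lower() for line in code_history for w in re.findall(r'\w+', line)}
    return sorted(w for w in words if w.startswith(prefix.lower()))[:10]

def suggest_completions(current_line, code_history):
    stripped = current_line.rstrip()
    if not stripped:
        # Empty (or whitespace-only) line: common starting tokens.
        return suggest_starters(code_history)
    if stripped.endswith('.'):
        # Trailing dot: attributes seen for the identifier just before the dot.
        m = re.search(r'(\w+)\.$', stripped)
        return suggest_attributes(m.group(1) if m else '', code_history)
    # Otherwise: treat the trailing word as a prefix to complete.
    prefix = re.split(r'\W+', stripped)[-1]
    return suggest_prefix_matches(prefix, code_history)

# suggest_completions("my_v", ["my_var = 10", "my_variable = 'hello'"]) -> ['my_var', 'my_variable']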

Edge Cases to Consider:

  • Empty code_history.
  • current_line ending with whitespace.
  • current_line ending with special characters other than a dot.
  • Case sensitivity in suggestions. (Example calls exercising these cases follow this list.)
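
Assuming the dispatch sketch above, the calls below exercise each edge case; no outputs are shown because they depend entirely on the heuristics you choose.

history = ["my_var = 10", "print(my_var)"]

suggest_completions("im", [])                    # empty code_history
suggest_completions("print(my_var) ", history)   # current_line ending with whitespace
suggest_completions("x = [", history)            # ends with a special character other than '.'
suggest_completions("MY_V", history)             # case sensitivity: should this still match my_var?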

Examples

Example 1:

Input:
current_line = "im"
code_history = [
    "import os",
    "import sys",
    "my_var = 10",
    "print(my_var)"
]

Output:
['import', 'int']

Explanation:
The `current_line` "im" is a prefix for "import". "int" is a common keyword that might follow an incomplete word and is considered a generally frequent token.

Example 2:

Input:
current_line = "my_list."
code_history = [
    "my_list = [1, 2, 3]",
    "another_list = [4, 5]",
    "my_list.append(4)"
]

Output:
['append', 'sort']

Explanation:
The `current_line` ends with a dot, indicating a potential attribute or method access on `my_list`. The `code_history` shows `append` being called on `my_list`, so it is suggested directly. `sort` is another common list method, making it a plausible suggestion once `my_list` is inferred to be a list.

Example 3:

Input:
current_line = ""
code_history = [
    "def my_function():",
    "    pass",
    "class MyClass:",
    "    pass"
]

Output:
['def', 'class', 'import', 'if', 'for', 'while', 'print']

Explanation:
When the `current_line` is empty, the engine suggests common starting points for Python code: keywords observed in the history (`def`, `class`) come first, followed by other frequently used keywords and built-ins.

Example 4:

Input:
current_line = "my_v"
code_history = [
    "my_var = 10",
    "my_variable = 'hello'"
]

Output:
['my_var', 'my_variable']

Explanation:
The `current_line` "my_v" is a prefix. The engine finds identifiers in the `code_history` that start with this prefix and suggests them.

Constraints

  • The code_history will be a list of strings, where each string represents a complete line of Python code.
  • The current_line will be a single string representing the partial line being edited.
  • The total number of lines in code_history will not exceed 1000.
  • The length of current_line will not exceed 200 characters.
  • The output list of suggestions should not exceed 10 items.
  • Suggestions should be returned in lowercase.

Notes

  • You'll need a way to tokenize Python code. Consider using Python's built-in tokenize module or a simpler custom approach for basic tokenization.
  • A frequency-based approach or n-gram model (even a simple bigram/trigram) can be a good starting point for ranking suggestions; a frequency-counting sketch appears after this list.
  • For this challenge, you don't need to parse the full Abstract Syntax Tree (AST) or handle complex Python syntax like decorators, lambdas, or complex imports. Focus on common patterns.
  • When suggesting attributes after a dot, you can infer common types (like list, string, int) and their associated methods based on simple observations in the code_history. For example, if you see my_list.append(...), you can infer append is a list method. (An attribute-scanning sketch appears after this list.)
  • Think about how to handle both keyword suggestions and identifier suggestions.
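
For the frequency-based ranking note above, counting tokens across the history with collections.Counter is a reasonable first pass; switching the keys from single tokens to (previous token, token) pairs gives a simple bigram model. The sketch below is one possible starting point, not a reference solution.

import re
from collections import Counter

def rank_prefix_matches(prefix, code_history, limit=10):
    """Rank history tokens sharing the prefix by how often they appear."""
    counts = Counter(
        tok.lower()
        for line in code_history
        for tok in re.findall(r'\w+', line)
    )
    # most_common() puts higher-frequency tokens first; ties keep first-seen order.
    return [tok for tok, _ in counts.most_common() if tok.startswith(prefix.lower())][:limit]

# rank_prefix_matches("my_v", ["my_var = 10", "my_variable = 'hello'"]) -> ['my_var', 'my_variable']

For keyword versus identifier handling, the standard library's keyword.kwlist and dir(builtins) can seed the candidate pool when the history alone is too sparse.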
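
For the dot-suggestion note above, a lightweight option is to scan the history for name.attr occurrences and remember which attributes were used on which names. The sketch below does only that; to also produce a suggestion like `sort` in Example 2, you would additionally map inferred types (e.g. a list-literal assignment) to a small hand-written table of common methods.

import re
from collections import defaultdict

# Matches "name.attr"; number literals like 3.14 would also match and may need filtering.
ATTR_RE = re.compile(r'(\w+)\.(\w+)')

def collect_attributes(code_history):
    """Map each identifier to the set of attribute names used on it in the history."""
    seen = defaultdict(set)
    for line in code_history:
        for obj, attr in ATTR_RE.findall(line):
            seen[obj].add(attr.lower())
    return seen

def suggest_attributes(obj, code_history, limit=10):
    attrs = collect_attributes(code_history).get(obj, set())
    return sorted(attrs)[:limit]

# suggest_attributes("my_list", ["my_list = [1, 2, 3]", "my_list.append(4)"]) -> ['append']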