Python Code Completion Engine
This challenge asks you to build a simplified code completion engine. Given a partial line of Python code and a history of previously executed code, your engine should suggest the most likely next token(s). This is a fundamental component of modern IDEs and code editors, significantly improving developer productivity.
Problem Description
You need to implement a Python function, suggest_completions(current_line, code_history), that takes a partially written line of Python code and a list of previously executed code lines. The function should return a list of suggested tokens (words, operators, punctuation) that are most likely to follow the current_line based on the code_history.
Key Requirements:
- Tokenization: You'll need to tokenize both the
current_lineand thecode_historyinto meaningful units (e.g., keywords, identifiers, operators, literals, punctuation). - Contextual Suggestions: Suggestions should be based on the patterns observed in the
code_history. This means understanding what typically follows certain tokens or sequences of tokens. - Ranking: The suggestions should be ranked by likelihood. The most probable suggestions should appear first.
- Basic Scope: For this challenge, we'll focus on suggesting common keywords, built-in functions, and simple identifier completions (e.g., if a variable
my_listwas defined, suggestmy_list.).
Expected Behavior:
- If the
current_lineis empty, suggest common starting tokens like keywords (def,class,import,if,for) or common built-ins. - If the
current_lineends with a dot (.), suggest attributes or methods that have been seen in thecode_historyfor the object preceding the dot (this will be a simplified heuristic). - If the
current_lineends with a partially typed word, suggest words from thecode_historythat start with that prefix.
Edge Cases to Consider:
- Empty
code_history. current_lineending with whitespace.current_lineending with special characters other than a dot.- Case sensitivity in suggestions.
Examples
Example 1:
Input:
current_line = "im"
code_history = [
"import os",
"import sys",
"my_var = 10",
"print(my_var)"
]
Output:
['import', 'int']
Explanation:
The `current_line` "im" is a prefix for "import". "int" is a common keyword that might follow an incomplete word and is considered a generally frequent token.
Example 2:
Input:
current_line = "my_list."
code_history = [
"my_list = [1, 2, 3]",
"another_list = [4, 5]",
"my_list.append(4)"
]
Output:
['append', 'sort']
Explanation:
The `current_line` ends with a dot, indicating a potential method call on `my_list`. Based on `code_history`, `append` is a known method of list-like objects. `sort` is another common list method, making it a plausible suggestion.
Example 3:
Input:
current_line = ""
code_history = [
"def my_function():",
" pass",
"class MyClass:",
" pass"
]
Output:
['def', 'class', 'import', 'if', 'for', 'while', 'print']
Explanation:
When the `current_line` is empty, the engine suggests common starting points for Python code, prioritizing keywords and frequently used built-ins observed in the history.
Example 4:
Input:
current_line = "my_v"
code_history = [
"my_var = 10",
"my_variable = 'hello'"
]
Output:
['my_var', 'my_variable']
Explanation:
The `current_line` "my_v" is a prefix. The engine finds identifiers in the `code_history` that start with this prefix and suggests them.
Constraints
- The
code_historywill be a list of strings, where each string represents a complete line of Python code. - The
current_linewill be a single string representing the partial line being edited. - The total number of lines in
code_historywill not exceed 1000. - The length of
current_linewill not exceed 200 characters. - The output list of suggestions should not exceed 10 items.
- Suggestions should be returned in lowercase.
Notes
- You'll need a way to tokenize Python code. Consider using Python's built-in
tokenizemodule or a simpler custom approach for basic tokenization. - A frequency-based approach or n-gram model (even a simple bigram/trigram) can be a good starting point for ranking suggestions.
- For this challenge, you don't need to parse the full Abstract Syntax Tree (AST) or handle complex Python syntax like decorators, lambdas, or complex imports. Focus on common patterns.
- When suggesting attributes after a dot, you can infer common types (like list, string, int) and their associated methods based on simple observations in the
code_history. For example, if you seemy_list.append(...), you can inferappendis a list method. - Think about how to handle both keyword suggestions and identifier suggestions.