Python Code Static Analyzer
This challenge asks you to build a basic static analyzer for Python code. Static analysis is a crucial technique for improving code quality, detecting potential bugs, and enforcing coding standards without actually running the code. Your goal is to identify common, easily detectable issues in Python source files.
Problem Description
You need to create a Python script that takes a Python file path as input and performs static analysis to identify specific types of code defects. The analyzer should report findings in a structured format.
Key Requirements:
- Input: The script should accept a single command-line argument: the path to a Python source file (
.py). - Analysis Focus: The analyzer should detect the following:
- Unused Imports: Identify imported modules that are not used anywhere in the code.
- Unused Local Variables: Identify local variables defined within functions or methods that are never read.
- TODO Comments: Find comments that start with
# TODO:(case-insensitive). - Hardcoded Secrets: Detect common patterns that might indicate hardcoded sensitive information (e.g., passwords, API keys) like strings containing "password", "api_key", "secret" (case-insensitive) that are not part of a variable name.
- Output: The analyzer should output a list of findings. Each finding should include:
- The type of issue found.
- The line number where the issue occurs.
- A brief description or the relevant line of code.
- The output should be printed to standard output in a human-readable format.
- Error Handling: Gracefully handle cases where the input file does not exist or is not a valid Python file.
Expected Behavior:
The script should parse the provided Python file, apply the analysis rules, and print a summary of all detected issues.
Edge Cases to Consider:
- Empty Python files.
- Python files with only comments.
- Imports within functions/classes (these are still considered imports).
- Variables shadowed by imports (e.g.,
import osthenos = 5). Your analysis should consider the actual usage. - Case-insensitivity for TODO comments and hardcoded secrets.
- Distinguishing between a variable name containing "password" and a string literal containing "password".
Examples
Example 1:
Input File (sample_code_1.py):
import os
import sys
from collections import defaultdict
def process_data(data):
result = 0
unused_var = 10
print(data)
# TODO: Refactor this section
return result
if __name__ == "__main__":
my_data = "important_info"
api_key = "sk_test_12345" # Hardcoded API key
password = "complex_password_123" # Hardcoded password
os.path.join("dir1", "dir2")
sys.exit(0)
Output:
Findings:
- Unused Import: line 3, module: sys
- Unused Import: line 4, module: defaultdict
- Unused Local Variable: line 7, variable: unused_var
- TODO Comment: line 9, comment: # TODO: Refactor this section
- Hardcoded Secret: line 13, line: api_key = "sk_test_12345" # Hardcoded API key
- Hardcoded Secret: line 14, line: password = "complex_password_123" # Hardcoded password
Explanation:
sysanddefaultdictare imported but not used.unused_varis defined but never read withinprocess_data.- A comment starting with
# TODO:is found. - Strings assigned to
api_keyandpasswordare identified as potential hardcoded secrets.
Example 2:
Input File (sample_code_2.py):
import json
def calculate_sum(a, b):
total = a + b
# This is a regular comment
return total
if __name__ == "__main__":
value1 = 10
value2 = 20
print(calculate_sum(value1, value2))
Output:
Findings:
- No issues found.
Explanation:
All imports (json) and local variables are used. There are no TODO comments or obvious hardcoded secrets.
Example 3: Edge Case (Empty File)
Input File (empty_file.py):
Output:
Findings:
- No issues found.
Explanation:
An empty file contains no code, thus no issues can be detected.
Constraints
- The analysis should be performed on Python 3.7+ compatible code.
- The input file will be a standard
.pyfile. - The script should be reasonably performant for files up to 1000 lines.
- The analysis should not execute any code; it must be purely static.
Notes
- You will need to use Python's built-in
astmodule to parse the Python code into an Abstract Syntax Tree (AST). This will allow you to programmatically inspect the code's structure. - For identifying unused variables, you'll need to track variable definitions and usages within scopes (functions, methods, global scope).
- Regular expressions can be helpful for quickly scanning comments and string literals for specific patterns.
- Be mindful of how you distinguish between variable names containing keywords (like
my_api_key) and actual string literals that might be secrets. The challenge asks for strings that look like secrets.