Python Code Static Analyzer

This challenge asks you to build a basic static analyzer for Python code. Static analysis is a crucial technique for improving code quality, detecting potential bugs, and enforcing coding standards without actually running the code. Your goal is to identify common, easily detectable issues in Python source files.

Problem Description

You need to create a Python script that takes a Python file path as input and performs static analysis to identify specific types of code defects. The analyzer should report findings in a structured format.

Key Requirements:

Input: The script should accept a single command-line argument: the path to a Python source file (.py).
Analysis Focus: The analyzer should detect the following:
- Unused Imports: Identify imported modules that are not used anywhere in the code.
- Unused Local Variables: Identify local variables defined within functions or methods that are never read.
- TODO Comments: Find comments that start with # TODO: (case-insensitive).
- Hardcoded Secrets: Detect common patterns that might indicate hardcoded sensitive information (e.g., passwords, API keys) like strings containing "password", "api_key", "secret" (case-insensitive) that are not part of a variable name.
Output: The analyzer should output a list of findings. Each finding should include:
- The type of issue found.
- The line number where the issue occurs.
- A brief description or the relevant line of code.
- The output should be printed to standard output in a human-readable format.
Error Handling: Gracefully handle cases where the input file does not exist or is not a valid Python file.

Expected Behavior:

The script should parse the provided Python file, apply the analysis rules, and print a summary of all detected issues.

Edge Cases to Consider:

Empty Python files.
Python files with only comments.
Imports within functions/classes (these are still considered imports).
Variables shadowed by imports (e.g., import os then os = 5). Your analysis should consider the actual usage.
Case-insensitivity for TODO comments and hardcoded secrets.
Distinguishing between a variable name containing "password" and a string literal containing "password".

Examples

Example 1:

Input File (sample_code_1.py):

import os
import sys
from collections import defaultdict

def process_data(data):
    result = 0
    unused_var = 10
    print(data)
    # TODO: Refactor this section
    return result

if __name__ == "__main__":
    my_data = "important_info"
    api_key = "sk_test_12345" # Hardcoded API key
    password = "complex_password_123" # Hardcoded password
    os.path.join("dir1", "dir2")
    sys.exit(0)

Output:

Findings:
- Unused Import: line 3, module: sys
- Unused Import: line 4, module: defaultdict
- Unused Local Variable: line 7, variable: unused_var
- TODO Comment: line 9, comment: # TODO: Refactor this section
- Hardcoded Secret: line 13, line: api_key = "sk_test_12345" # Hardcoded API key
- Hardcoded Secret: line 14, line: password = "complex_password_123" # Hardcoded password

Explanation:

sys and defaultdict are imported but not used.
unused_var is defined but never read within process_data.
A comment starting with # TODO: is found.
Strings assigned to api_key and password are identified as potential hardcoded secrets.

Example 2:

Input File (sample_code_2.py):

import json

def calculate_sum(a, b):
    total = a + b
    # This is a regular comment
    return total

if __name__ == "__main__":
    value1 = 10
    value2 = 20
    print(calculate_sum(value1, value2))

Output:

Findings:
- No issues found.

Explanation:

All imports (json) and local variables are used. There are no TODO comments or obvious hardcoded secrets.

Example 3: Edge Case (Empty File)

Input File (empty_file.py):

Output:

Findings:
- No issues found.

Explanation:

An empty file contains no code, thus no issues can be detected.

Constraints

The analysis should be performed on Python 3.7+ compatible code.
The input file will be a standard .py file.
The script should be reasonably performant for files up to 1000 lines.
The analysis should not execute any code; it must be purely static.

Notes

You will need to use Python's built-in ast module to parse the Python code into an Abstract Syntax Tree (AST). This will allow you to programmatically inspect the code's structure.
For identifying unused variables, you'll need to track variable definitions and usages within scopes (functions, methods, global scope).
Regular expressions can be helpful for quickly scanning comments and string literals for specific patterns.
Be mindful of how you distinguish between variable names containing keywords (like my_api_key) and actual string literals that might be secrets. The challenge asks for strings that look like secrets.

Python Code Static Analyzer

Problem Description

Key Requirements:

Input: The script should accept a single command-line argument: the path to a Python source file (.py).
Analysis Focus: The analyzer should detect the following:
- Unused Imports: Identify imported modules that are not used anywhere in the code.
- Unused Local Variables: Identify local variables defined within functions or methods that are never read.
- TODO Comments: Find comments that start with # TODO: (case-insensitive).
- Hardcoded Secrets: Detect common patterns that might indicate hardcoded sensitive information (e.g., passwords, API keys) like strings containing "password", "api_key", "secret" (case-insensitive) that are not part of a variable name.
Output: The analyzer should output a list of findings. Each finding should include:
- The type of issue found.
- The line number where the issue occurs.
- A brief description or the relevant line of code.
- The output should be printed to standard output in a human-readable format.
Error Handling: Gracefully handle cases where the input file does not exist or is not a valid Python file.

Expected Behavior:

The script should parse the provided Python file, apply the analysis rules, and print a summary of all detected issues.

Edge Cases to Consider:

Empty Python files.
Python files with only comments.
Imports within functions/classes (these are still considered imports).
Variables shadowed by imports (e.g., import os then os = 5). Your analysis should consider the actual usage.
Case-insensitivity for TODO comments and hardcoded secrets.
Distinguishing between a variable name containing "password" and a string literal containing "password".

Examples

Example 1:

Input File (sample_code_1.py):

import os
import sys
from collections import defaultdict

def process_data(data):
    result = 0
    unused_var = 10
    print(data)
    # TODO: Refactor this section
    return result

if __name__ == "__main__":
    my_data = "important_info"
    api_key = "sk_test_12345" # Hardcoded API key
    password = "complex_password_123" # Hardcoded password
    os.path.join("dir1", "dir2")
    sys.exit(0)

Output:

Findings:
- Unused Import: line 3, module: sys
- Unused Import: line 4, module: defaultdict
- Unused Local Variable: line 7, variable: unused_var
- TODO Comment: line 9, comment: # TODO: Refactor this section
- Hardcoded Secret: line 13, line: api_key = "sk_test_12345" # Hardcoded API key
- Hardcoded Secret: line 14, line: password = "complex_password_123" # Hardcoded password

Explanation:

sys and defaultdict are imported but not used.
unused_var is defined but never read within process_data.
A comment starting with # TODO: is found.
Strings assigned to api_key and password are identified as potential hardcoded secrets.

Example 2:

Input File (sample_code_2.py):

import json

def calculate_sum(a, b):
    total = a + b
    # This is a regular comment
    return total

if __name__ == "__main__":
    value1 = 10
    value2 = 20
    print(calculate_sum(value1, value2))

Output:

Findings:
- No issues found.

Explanation:

All imports (json) and local variables are used. There are no TODO comments or obvious hardcoded secrets.

Example 3: Edge Case (Empty File)

Input File (empty_file.py):

Output:

Findings:
- No issues found.

Explanation:

An empty file contains no code, thus no issues can be detected.

Constraints

The analysis should be performed on Python 3.7+ compatible code.
The input file will be a standard .py file.
The script should be reasonably performant for files up to 1000 lines.
The analysis should not execute any code; it must be purely static.

Notes

You will need to use Python's built-in ast module to parse the Python code into an Abstract Syntax Tree (AST). This will allow you to programmatically inspect the code's structure.
For identifying unused variables, you'll need to track variable definitions and usages within scopes (functions, methods, global scope).
Regular expressions can be helpful for quickly scanning comments and string literals for specific patterns.
Be mindful of how you distinguish between variable names containing keywords (like my_api_key) and actual string literals that might be secrets. The challenge asks for strings that look like secrets.