Inspecting LLVM IR with Rust
This challenge focuses on the practical application of interacting with LLVM Intermediate Representation (IR) using Rust. You will learn how to load and parse LLVM IR into a Rust representation, allowing you to programmatically analyze and understand the structure and content of compiled code. This skill is crucial for compiler development, static analysis tools, and advanced debugging.
Problem Description
Your task is to implement a Rust program that can:
- Load LLVM IR: Read LLVM IR from a string.
- Parse LLVM IR: Convert the string representation into an in-memory Rust data structure that represents the LLVM IR.
- Inspect Basic Blocks: Iterate through the basic blocks of a function.
- Inspect Instructions: Within each basic block, iterate through the instructions.
- Identify Instruction Types: For each instruction, determine its opcode (e.g., add, load, call).
- Count Instructions: Count the total number of instructions within a specific function.
You are expected to use the llvm-sys crate (or a similar FFI wrapper for LLVM) to achieve this. The goal is to demonstrate your ability to navigate and query LLVM IR programmatically.
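The LLVM C API exposes basic blocks and instructions through "first"/"next" cursor functions that return a null pointer at the end of the sequence. The sketch below shows one way to turn that pattern into Rust iterators with std::iter::successors. It deliberately avoids llvm-sys so that it runs without an LLVM installation: the Inst, Block, and Function types and the build_* helpers are hypothetical stand-ins for LLVM's opaque handles, and None plays the role of the null pointer. With the real bindings, the closures would call functions such as LLVMGetFirstBasicBlock and LLVMGetNextInstruction instead.

```rust
use std::iter::successors;

// Hypothetical stand-ins for LLVM's opaque handles (not part of llvm-sys).
// `None` models the null pointer that the C API returns at the end of a list.
struct Inst { opcode: &'static str, next: Option<Box<Inst>> }
struct Block { first_inst: Option<Box<Inst>>, next: Option<Box<Block>> }
struct Function { first_block: Option<Box<Block>> }

// Iterate over blocks: start at the "first" handle, follow "next" until None.
fn blocks(func: &Function) -> impl Iterator<Item = &Block> {
    successors(func.first_block.as_deref(), |b| b.next.as_deref())
}

// Same cursor pattern for the instructions inside one block.
fn insts(block: &Block) -> impl Iterator<Item = &Inst> {
    successors(block.first_inst.as_deref(), |i| i.next.as_deref())
}

// Total instruction count: sum of per-block counts.
fn count_instructions(func: &Function) -> usize {
    blocks(func).map(|b| insts(b).count()).sum()
}

// Opcode names in program order.
fn opcodes(func: &Function) -> Vec<&'static str> {
    blocks(func).flat_map(|b| insts(b).map(|i| i.opcode)).collect()
}

// Test-data helpers: build the linked lists back to front.
fn build_insts(ops: &[&'static str]) -> Option<Box<Inst>> {
    ops.iter().rev().fold(None, |next, &opcode| Some(Box::new(Inst { opcode, next })))
}

fn build_function(block_ops: &[&[&'static str]]) -> Function {
    let first_block = block_ops.iter().rev().fold(None, |next, ops| {
        Some(Box::new(Block { first_inst: build_insts(ops), next }))
    });
    Function { first_block }
}

fn main() {
    // A made-up two-block function: [add, br] followed by [ret].
    let f = build_function(&[&["add", "br"], &["ret"]]);
    println!("opcodes: {:?}", opcodes(&f));
    println!("count: {}", count_instructions(&f));

    // A function with no basic blocks (like a declaration) counts as zero.
    let decl = build_function(&[]);
    println!("declaration count: {}", count_instructions(&decl));
}
```

Keeping the traversal behind iterators means the counting logic stays the same whether the handles are mock nodes or raw LLVM pointers; only the two closures passed to successors would change.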
Key Requirements:
- Use Rust for the implementation.
- Leverage the LLVM C API via Rust FFI bindings.
- The program should accept LLVM IR as a string.
- The program should output the count of instructions in a designated function.
- Handle potential errors during IR parsing (e.g., invalid IR).
Expected Behavior:
Given a valid LLVM IR string and a function name, the program should parse the IR and print the total number of instructions within the specified function.
Edge Cases:
- Empty LLVM IR string.
- LLVM IR string with syntax errors.
- Function name not found in the IR.
- Functions with no basic blocks or instructions.
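One way to keep these edge cases distinct is a small error type. The sketch below is an assumption about how a solution might be organized, not a prescribed design: the InspectError enum, its variant names, and the report helper are all hypothetical. Rejecting an empty string up front is also a choice made here; LLVM itself may accept an empty input as an empty module, in which case the lookup of the function name would fail instead. A function with no basic blocks is treated as a valid result with a count of 0 rather than as an error.

```rust
use std::fmt;

// Hypothetical error type covering the failure modes listed above.
#[derive(Debug, PartialEq)]
enum InspectError {
    EmptyInput,               // the IR string was empty
    Parse(String),            // LLVM rejected the IR; holds its diagnostic text
    FunctionNotFound(String), // no function with the requested name
}

impl fmt::Display for InspectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InspectError::EmptyInput => write!(f, "LLVM IR input is empty."),
            InspectError::Parse(msg) => write!(f, "Failed to parse LLVM IR: {}", msg),
            InspectError::FunctionNotFound(name) => {
                write!(f, "Function '{}' not found in the LLVM IR.", name)
            }
        }
    }
}

impl std::error::Error for InspectError {}

// Formats the final line of output for either outcome.
fn report(function: &str, result: &Result<usize, InspectError>) -> String {
    match result {
        Ok(count) => format!("Instruction count for function '{}': {}", function, count),
        Err(err) => format!("Error: {}", err),
    }
}

fn main() {
    println!("{}", report("add_one", &Ok(2)));
    let missing = Err(InspectError::FunctionNotFound("subtract_one".to_string()));
    println!("{}", report("subtract_one", &missing));
    println!("{}", report("f", &Err(InspectError::EmptyInput)));
}
```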
Examples
Example 1:
Input IR String:
define i32 @add_one(i32 %x) {
  %inc = add i32 %x, 1
  ret i32 %inc
}
Function Name: "add_one"
Output:
Instruction count for function 'add_one': 2
Explanation: The IR defines a function add_one with two instructions: add and ret. The program correctly identifies and counts these.
Example 2:
Input IR String:
; ModuleID = 'example.ll'
source_filename = "example.c"
define void @simple_loop() {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %next, %loop ]
  %next = add i32 %i, 1
  %cmp = icmp slt i32 %next, 10
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}
Function Name: "simple_loop"
Output:
Instruction count for function 'simple_loop': 6
Explanation: The entry block contains a single br to loop. The loop block contains phi, add, icmp, and a conditional br. The exit block contains ret. That gives 1 + 4 + 1 = 6 instructions; terminators such as br and ret are instructions and are included in the count.
Example 3: (Function Not Found)
Input IR String:
define i32 @add_one(i32 %x) {
  %inc = add i32 %x, 1
  ret i32 %inc
}
Function Name: "subtract_one"
Output:
Error: Function 'subtract_one' not found in the LLVM IR.
Explanation: The specified function name does not exist within the provided LLVM IR.
Constraints
- The LLVM IR will be provided as a single Rust String.
- The function name to inspect will also be a Rust String.
- You are expected to link against a system-installed LLVM library (version 10 or higher is recommended).
- Error handling for invalid LLVM IR should be robust, printing an informative message.
- Performance is not a primary concern for this challenge, but avoid overtly inefficient approaches (e.g., repeated parsing of the same IR).
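Because both inputs arrive as Rust String values while the LLVM C API works with NUL-terminated C strings, a conversion step is needed before any FFI call. The sketch below shows that step with std::ffi::CString only; the helper name to_c_string is hypothetical. CString::new fails if the input contains an interior NUL byte, which is worth surfacing as an input error rather than unwrapping.

```rust
use std::ffi::{CString, NulError};

// Hypothetical helper: converts an owned Rust String into a NUL-terminated
// C string suitable for passing (via .as_ptr()) to a C API.
fn to_c_string(input: String) -> Result<CString, NulError> {
    CString::new(input)
}

fn main() {
    let ir = String::from("define void @f() {\n  ret void\n}\n");
    let ir_len = ir.len();
    let c_ir = to_c_string(ir).expect("IR contains no interior NUL bytes");

    // as_bytes() excludes the trailing NUL; as_bytes_with_nul() includes it.
    assert_eq!(c_ir.as_bytes().len(), ir_len);
    assert_eq!(c_ir.as_bytes_with_nul().len(), ir_len + 1);
    println!("IR is {} bytes plus a NUL terminator", c_ir.as_bytes().len());

    // An interior NUL byte cannot be represented and is reported as an error.
    match to_c_string(String::from("add_one\0junk")) {
        Ok(_) => println!("unexpectedly accepted"),
        Err(err) => println!("rejected: NUL byte at position {}", err.nul_position()),
    }
}
```

The CString must outlive any pointer taken from it, so it should be bound to a variable that stays in scope for the duration of the C call. In llvm-sys the function name would typically be passed this way to a lookup such as LLVMGetNamedFunction, and the IR text to a memory-buffer constructor such as LLVMCreateMemoryBufferWithMemoryRangeCopy; check both against the documentation for the LLVM version in use.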
Notes
- The llvm-sys crate provides direct bindings to the LLVM C API. You will need to consult LLVM's C API documentation for functions related to parsing modules, iterating over functions, basic blocks, and instructions.
- Key LLVM C API functions you might need include:
  - LLVMParseIRInContext (or similar for module parsing)
  - LLVMModuleCreateWithName (if you need to create a module first)
  - LLVMGetFirstFunction
  - LLVMGetNextFunction
  - LLVMGetFirstBasicBlock
  - LLVMGetNextBasicBlock
  - LLVMGetFirstInstruction
  - LLVMGetNextInstruction
  - LLVMGetInstructionOpcode
  - LLVMVerifyModule (for validating the parsed IR)
  - LLVMDisposeModule (for memory management)
- Remember to handle the lifetime and ownership of LLVM objects returned by the C API.
- Error reporting from LLVM can be done via error strings or codes. You'll need to capture these.
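Capturing an error string from the C API usually means copying a NUL-terminated char pointer into an owned Rust String before the C side frees it. The sketch below shows only that copying step, using std::ffi so it runs without LLVM; the helper name message_to_string is hypothetical, and the "diagnostic" is a locally built CString standing in for the message that a call such as LLVMParseIRInContext writes to its out-parameter.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: copies a NUL-terminated C string into an owned String.
/// Returns None for a null pointer (no message was produced).
///
/// Safety: `msg` must be null or point to a valid NUL-terminated string that
/// stays alive for the duration of this call.
unsafe fn message_to_string(msg: *const c_char) -> Option<String> {
    if msg.is_null() {
        return None;
    }
    // to_string_lossy replaces invalid UTF-8 instead of failing, which is a
    // reasonable default for diagnostics meant for display.
    Some(unsafe { CStr::from_ptr(msg) }.to_string_lossy().into_owned())
}

fn main() {
    // Stand-in for a diagnostic produced by the C side.
    let fake = CString::new("expected top-level entity").unwrap();
    let copied = unsafe { message_to_string(fake.as_ptr()) };
    println!("{:?}", copied);

    // A null pointer means there is nothing to report.
    let none = unsafe { message_to_string(std::ptr::null()) };
    println!("{:?}", none);
}
```

With the real bindings, the copy should be made first and the original pointer then released with LLVMDisposeMessage, since the message is allocated by LLVM and not by Rust. A nonzero return value from the parse call indicates failure and is the signal to read the message.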