Feature Name: tuffy_interpreter
- Status: Draft
- Created: 2026-02-07
- Completed: N/A
Summary
This RFC defines the design of the tuffy IR interpreter, a miri-like execution engine that directly interprets tuffy IR with full UB detection, poison tracking, pointer provenance enforcement, and four-state byte memory semantics. The interpreter serves as the primary testing and validation tool for transform correctness.
Motivation
Compiler testing traditionally relies on end-to-end execution: compile a program, run it, check the output. This approach has two fundamental limitations:
- Coarse granularity: end-to-end tests only detect bugs that manifest as observable output differences. Silent UB, provenance violations, and poison mishandling may not produce wrong outputs on any particular test input.
- Slow reduction: when a miscompilation is found, reducing the reproducer from a full program to a minimal IR fragment requires expensive tools (creduce, llvm-reduce) operating at the source or LLVM IR level.
The LLVM UB-aware interpreter (llubi) demonstrated that an IR-level interpreter with UB detection can reduce miscompilation reproducers from hours to minutes. Tuffy’s IR design — poison-only UB model, four-state bytes, abstract pointer provenance — is inherently more amenable to precise interpretation than LLVM IR.
Guide-level explanation
Core capabilities
The tuffy interpreter executes IR functions directly, maintaining a concrete machine state:
- Infinite precision integers: implemented via arbitrary-precision integer library (e.g.,
num-bigint). No overflow, no wrapping — arithmetic is mathematical. - Four-state byte memory: each byte in the interpreter’s memory is one of
Bits(u8),PtrFragment(AllocId, usize),Uninit, orPoison. - Pointer provenance tracking: every pointer carries an
(AllocId, offset)pair. Out-of-bounds access, use-after-free, and provenance violations are detected. - Poison propagation: poison values propagate through computations. Using poison as a branch condition, storing poison to observable memory, or violating an assert node is flagged as UB.
- Assert node enforcement:
assertsext(v, N)andassertzext(v, N)are checked at runtime; violations produce poison (and are reported as UB).
Execution modes
The interpreter supports multiple execution modes:
| Mode | Purpose |
|---|---|
| Normal | Execute and produce output, report UB as warnings |
| Strict | Abort on first UB detection |
| Differential | Execute two IR versions (pre/post optimization), compare results |
| Reduce | Deterministic execution for use with IR reducer tools |
Differential testing
The primary use case for transform validation:
tuffy-interp --differential --before=pre_opt.tir --after=post_opt.tir --input=test_case
The interpreter runs both IR versions on the same input and reports any behavioral divergence. Combined with a fuzzer, this provides high-confidence testing of individual transforms.
UB detection
The interpreter detects and reports:
| UB category | Detection mechanism |
|---|---|
| Assert node violation | Runtime range check on assertsext/assertzext |
| Uninit read | Memory byte is Uninit at load time |
| Poison observation | Poison used as branch condition or stored to observable memory |
| Out-of-bounds access | Pointer offset exceeds allocation bounds |
| Use-after-free | Access to freed allocation |
| Provenance violation | Pointer used with wrong AllocId |
| Division by zero | Divisor is zero |
| Null dereference | Pointer with null address |
Integration with fuzzing
The interpreter integrates with fuzzing tools:
- Rustlantis: generates random Rust programs, compiles with tuffy, runs through interpreter
- IR-level fuzzer: generates random tuffy IR fragments, runs through optimization pipeline, differential-tests with interpreter
- Reducer: when a bug is found, an IR reducer (similar to llvm-reduce) minimizes the reproducer using the interpreter as the oracle
Reference-level explanation
Value representation
enum Value {
Int(BigInt),
Bytes(Vec<AbstractByte>),
Ptr(Pointer),
Float(f32),
Double(f64),
Vec(Vec<Value>),
Poison,
}
enum AbstractByte {
Bits(u8),
PtrFragment(AllocId, usize),
Uninit,
Poison,
}
struct Pointer {
alloc_id: AllocId,
offset: usize,
address_space: u32,
}
Memory model
struct Memory {
allocations: HashMap<AllocId, Allocation>,
next_alloc_id: AllocId,
}
struct Allocation {
bytes: Vec<AbstractByte>,
size: usize,
alive: bool,
}
impl Memory {
fn alloc(&mut self, size: usize) -> Pointer;
fn free(&mut self, ptr: Pointer) -> Result<(), UBError>;
fn load(&self, ptr: Pointer, size: usize) -> Result<Vec<AbstractByte>, UBError>;
fn store(&mut self, ptr: Pointer, bytes: Vec<AbstractByte>) -> Result<(), UBError>;
}
Every memory operation checks:
ptr.alloc_idrefers to a live allocationptr.offset + size <= allocation.size- Provenance matches (the pointer was derived from this allocation)
Instruction execution
Each instruction maps directly to a Rust function:
fn exec_add(a: &Value, b: &Value) -> Value {
match (a, b) {
(Value::Int(x), Value::Int(y)) => Value::Int(x + y),
(Value::Poison, _) | (_, Value::Poison) => Value::Poison,
_ => panic!("type error"),
}
}
fn exec_assertsext(v: &Value, n: u32) -> Value {
match v {
Value::Int(x) => {
let min = -(BigInt::one() << (n - 1));
let max = (BigInt::one() << (n - 1)) - 1;
if *x >= min && *x <= max {
Value::Int(x.clone())
} else {
Value::Poison // UB: report to user
}
}
Value::Poison => Value::Poison,
_ => panic!("type error"),
}
}
Hierarchical CFG execution
The interpreter walks the hierarchical CFG:
- Region.Loop: execute the loop body repeatedly until the exit branch is taken
- Region.Branch: evaluate the condition, execute the corresponding sub-region
- Region.Sequence: execute sub-regions in order
- BasicBlock: execute instructions sequentially, then the terminator
MemSSA handling
The interpreter ignores MemSSA tokens — they are compile-time analysis artifacts. The interpreter uses its own concrete memory state for all load/store operations.
Drawbacks
- Performance: arbitrary-precision integers and per-byte tracking are slower than native execution. The interpreter is not suitable for running large programs at scale.
- Incomplete semantics: some IR features (atomics, concurrency) are difficult to interpret faithfully. Single-threaded sequential execution is a simplification.
- Maintenance burden: the interpreter must be kept in sync with IR semantics changes. Every new instruction or semantic change requires interpreter updates.
Rationale and alternatives
Why a custom interpreter instead of lowering to LLVM and using lli?
Lowering to LLVM IR loses tuffy-specific semantics (infinite precision integers, four-state bytes, provenance). The whole point of the interpreter is to validate tuffy IR semantics directly, not after translation.
Why arbitrary-precision integers in the interpreter?
The IR defines int as infinite precision. The interpreter must faithfully implement this — using fixed-width integers would mask overflow bugs that the IR semantics are designed to prevent.
Why not use Miri directly?
Miri interprets Rust MIR, not tuffy IR. The tuffy interpreter operates on tuffy IR after optimization, catching bugs introduced by transforms. Miri and the tuffy interpreter serve complementary roles: Miri validates Rust semantics, the tuffy interpreter validates tuffy IR semantics.
Prior art
- Miri: Rust MIR interpreter with UB detection, pointer provenance tracking, and abstract memory model. Primary inspiration for tuffy’s interpreter design.
- llubi (LLVM UB-aware interpreter): LLVM IR interpreter with poison tracking and UB detection. Demonstrated the value of IR-level interpretation for miscompilation debugging. Limited by LLVM’s undef/poison duality and lack of provenance tracking.
- lli (LLVM interpreter): basic LLVM IR interpreter without UB awareness. Insufficient for correctness validation.
- KLEE: symbolic execution engine for LLVM IR. More powerful but much slower; tuffy’s interpreter focuses on concrete execution for speed.
Unresolved questions
- Atomic semantics: how should the interpreter handle atomic operations? Sequential consistency by default, or model relaxed memory?
- External function calls: how to handle calls to external functions (libc, system calls)? Stub library, or FFI to native implementations?
- Vector operations: should the interpreter support scalable vectors, or only fixed-width vectors with a concrete vscale?
- Performance optimization: are there opportunities to fast-path common cases (e.g., when all bytes are
Bits, skip per-byte tracking)?
Future possibilities
- Symbolic execution: extend the interpreter to support symbolic values, enabling exhaustive exploration of execution paths for bounded verification.
- Interactive debugging: step-through execution with inspection of IR values, memory state, and provenance at each instruction.
- Coverage-guided fuzzing: instrument the interpreter to report coverage, guiding the fuzzer toward unexplored paths.
- Regression test generation: automatically generate minimal test cases from fuzzer-discovered bugs, adding them to the test suite.