Initial Design Discussion
- Created: 2026-02-07
- Status: Frozen — This document is a historical record. Do not modify.
Background
Tuffy is an experimental optimizing compiler written in Rust with LLM assistance. This document records the initial design discussion and key decisions made before implementation begins.
Long-term Vision
Tuffy aims to be a production-grade, language-agnostic optimizing compiler backend:
- Multi-language backend — Serve as a codegen backend for multiple programming languages, not just Rust
- Custom IR — Design and implement a custom intermediate representation with optimization passes
- Multi-platform — Support multiple target architectures from the ground up
- Balanced optimization — Pursue both fast compilation and competitive runtime performance
- Formal verification — Provide formal correctness guarantees for optimization passes
Short-term Milestone
The first milestone is rustc_codegen_tuffy, a codegen backend for rustc as an alternative to LLVM.
Key Design Decisions
- Target platform: Multi-platform architecture from the start, with x86-64 as the initial implementation target
- Optimization direction: Balance between compilation speed and runtime performance (not exclusively pursuing either)
- IR design: Custom IR — build our own intermediate representation, perform optimizations on it, then lower to machine code (rather than translating directly from rustc MIR)
- IR interpreter: Implement a tuffy IR interpreter for testing and validation, similar to Miri
- Backend strategy: Implement
rustc_codegen_ssatraits to reuse existing MIR lowering code (~10,000 lines). Builder generates tuffy IR instead of LLVM IR. Same approach asrustc_codegen_cranelift. May switch to direct MIR traversal later if more control is needed.
IR Design
Structure:
- Single data structure for all stages — no separate SDAG/MIR like LLVM. Different stages have different normalization (legalization) constraints on the same IR, progressively lowering toward machine code.
UB model (following Alive2):
- Only
poison— noundef. Simplifies semantics and avoids the well-known undef/poison confusion in LLVM. - Uninitialized memory is allowed at the memory level, but values are either concrete or poison.
Byte type (b<N>):
- Introduced from the start, based on the LLVM byte type RFC. Represents raw memory data distinct from integers, with per-byte poison tracking. Enables sound memcpy lowering and load merging.
Integer type:
- A single
inttype with infinite precision — no fixed bitwidth (no i8/i32/i64). - No overflow: arithmetic operations are mathematical (e.g., add is true addition).
- Range constraints via assert nodes: e.g.,
assertsext(val, 32)constrains a value to the 32-bit signed range, producing poison if violated. zext/sext/truncinstructions are eliminated from the IR.- Bitwise operations (and/or/xor/shift) are also defined on infinite precision integers.
At-use bit-level analysis (encoded in IR):
- Each use edge carries bit-level annotations: which bits are demanded, and which bits are known to be 0 or 1.
- UB constraints (from assert nodes, memory operations, ABI boundaries) propagate backward through the IR, refining known bits on upstream values.
- This is analogous to LLVM’s
KnownBitsandDemandedBitsanalyses, but promoted to first-class IR constructs rather than side analyses. - Follows the “analysis is also a transformation” principle: analysis results are encoded directly into the IR, not stored in external tables.
- Instruction selection in the final stage derives concrete machine types from these at-use annotations.
Representation form:
- Hierarchical CFG with nested SESE regions. Loops and branches are explicitly represented as regions, not inferred from CFG back-edges.
- Region internals use traditional SSA basic blocks with fixed instruction ordering.
- Translation from rustc MIR requires control flow structuralization to recover nested regions from the flat CFG.
- Enables early loop optimizations (vectorization, LICM, unrolling) by preserving loop structure in the IR.
- Inspired by LLVM VPlan’s hierarchical CFG and recipe-based transformation model.
Memory model:
- Abstract pointer model: pointer = allocation ID + offset, preserving full provenance.
- Two ptr-to-int instructions with distinct semantics:
ptrtoint: converts pointer to integer, capturing provenance.inttoptr(ptrtoint(p))is a safe roundtrip.ptrtoaddr: extracts only the address portion, discarding provenance.
- Four-state byte representation:
Bits(u8) | PtrFragment(Pointer, n) | Uninit | Poison. Each byte in memory is one of these states. - Uninitialized memory (
Uninit) is a distinct abstract state, not “random bits”. Any operation on uninit values is UB (produces poison). - Poison can exist at the byte level in memory, aligned with the per-byte poison tracking of the byte type.
Testing Strategy
Validation and benchmarking will use the following targets (in progressive order):
- Internal unit tests — small test cases for basic codegen correctness (arithmetic, control flow, function calls)
no_stdprograms — minimal programs to validate the backend without standard library dependencies- serde — widely used crate with heavy generics and macros, tests monomorphization and trait dispatch
- rustc test suite — rustc’s ui tests for broad correctness coverage
- ripgrep — real-world CLI project for end-to-end correctness and performance comparison against LLVM backend
- clippy — later-stage validation target due to tight coupling with rustc internals
Notes
- Self-hosting is not a separate goal since tuffy is written in Rust — once the backend can compile Rust, self-hosting follows naturally
- JIT support and outperforming LLVM are not current goals but may be revisited later
Target Abstraction
- Triple-based target identification, following LLVM’s convention.
- TargetLowering design follows LLVM’s approach (legal types, operation lowering, register classes).
- Sub-method override via command-line: individual TargetLowering properties (pointer size, legal integer widths, max vector width, etc.) can be overridden via
--target-override key=valueflags. This enables testing IR-level passes against specific target properties without implementing a full architecture backend. - Cost model: deferred, not yet designed.
IR Memory Layout
- Encapsulation: memory layout is an implementation detail, hidden behind a stable API. The rest of the compiler operates on opaque handles, not raw pointers. Layout can be changed without affecting pass code.
- Arena + index: IR nodes are allocated in arenas. References are
u32indices (opaque handles), not pointers. Reduces reference size and enables serialization. - Contiguous instruction storage: instructions within the same basic block are stored contiguously in the arena. Iterating a BB’s instructions is a linear memory scan, not pointer chasing.
- Tombstone + periodic compaction: deleted instructions are marked as tombstones. Periodic compaction restores contiguity.
- No SOA: struct-of-arrays layout rejected; standard AOS with contiguous storage is sufficient.
Crate Structure
Tuffy is a language-agnostic compiler backend. rustc_codegen_tuffy is one frontend adapter; future language frontends reuse the entire optimization and backend stack.
tuffy/
├── tuffy_ir/ — IR definition: types, instructions, CFG, memory model
├── tuffy_ir_interp/ — IR interpreter (miri-like, UB-aware)
├── tuffy_opt/ — optimization passes (generated from declarative rewrite rules)
├── tuffy_target/ — target abstraction layer (trait + CLI override)
├── tuffy_target_x86/ — x86 backend (i386 + x86-64)
├── tuffy_codegen/ — instruction selection, register allocation, machine code emission
├── tuffy_trace/ — rewrite path tracing
├── tuffy_tv/ — translation validation (Alive2-style SMT verifier)
├── rustc_codegen_tuffy/ — rustc frontend adapter (MIR → tuffy IR)
Dependencies: tuffy_ir is the bottom layer with no internal dependencies. All other crates depend on it directly or transitively.
Object file emission: use the object crate (same as Cranelift and rustc) for writing ELF, Mach-O, COFF object files. No custom object file handling.