Building a Compiler from Scratch: How Miri Turns Code into Machine Instructions

I spent years convinced that compilers were dark magic — something only wizards at Mozilla, Microsoft, or Google built. Then I decided to write one anyway. Turns out, the transformation from human-readable text to executable binary isn’t magic at all. It’s a pipeline — six phases, each doing one thing well, each handing its output to the next.

What follows is how Miri’s compiler actually works: six phases that turn a .mi source file into a native executable. You don’t need a compiler background to follow along. In fact, that’s kind of the point: compilers are less arcane than they seem, and the ideas behind them are elegant in a way that I think every engineer can appreciate.

Six phases, one job each

When you type miri run hello.mi, here’s what actually happens:

Source Code (.mi)
     |
     v
   Lexer -----> Token Stream
     |
     v
   Parser ----> Abstract Syntax Tree (AST)
     |
     v
Type Checker -> Validated AST + Type Info
     |
     v
MIR Lowering -> Basic Blocks + Control Flow Graph
     |
     v
  Codegen ----> Native Machine Code (via Cranelift)
     |
     v
  Linker -----> Executable Binary

Each phase has one job: take the output of the previous phase, transform it, and hand it to the next one. The lexer doesn’t know about types. The type checker doesn’t know about machine code. This separation is what makes the whole thing manageable — you can think about one problem at a time, and each layer is independently testable.

Let me take you through each phase.

The Lexer: teaching a machine to read

The lexer is the simplest phase conceptually, and yet it’s where some of the trickiest edge cases live.

Its job is to turn raw characters into tokens — meaningful chunks that the next phase can work with. Think of it like reading a sentence. Your brain doesn’t process individual letters; it recognizes words, punctuation, spacing. The lexer does the same for code.

Source:   let x = 42 + 3
Tokens:   [Let] [Identifier("x")] [Assign] [Integer(42)] [Plus] [Integer(3)]

Miri’s lexer is built on top of the Logos crate, a very fast lexer generator for Rust. Logos handles the boring parts: matching keywords, recognizing operators, parsing numeric literals. But Miri adds a stateful layer that handles things Logos can’t.

The most interesting of those is indentation tracking. Like Python, Miri uses significant whitespace — indentation determines code blocks, not curly braces. This means the lexer has to emit synthetic Indent and Dedent tokens whenever the indentation level changes:

if x > 0                     # indent_stack: [0]
    println("positive")      # indent_stack: [0, 4] -> emit Indent
    println("and nonzero")
println("always runs")       # indent_stack: [0]    -> emit Dedent

The lexer maintains a stack of indentation levels and compares each new line against the top of the stack. More indentation? Push and emit Indent. Less? Pop (possibly multiple times) and emit Dedent for each level unwound. Same? Just a regular statement boundary.

There’s an important exception here: indentation changes are suppressed inside parentheses, brackets, and curly braces. Otherwise, you couldn’t write multi-line function calls or collection literals without the lexer panicking about inconsistent indentation. The lexer tracks bracket nesting depth and simply ignores whitespace changes when any bracket counter is above zero.
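The stack discipline and the bracket-suppression rule fit in a few lines. Here is a minimal sketch in Rust, with illustrative names; Miri's real lexer layers this logic on top of Logos and also handles tabs, blank lines, and error reporting:

```rust
// Synthetic tokens emitted when the indentation level changes.
#[derive(Debug, PartialEq)]
enum Tok { Indent, Dedent }

/// Compare a new line's leading-space count against the stack of open
/// indentation levels, pushing/popping as described above.
fn indent_tokens(width: usize, stack: &mut Vec<usize>, bracket_depth: usize) -> Vec<Tok> {
    let mut out = Vec::new();
    if bracket_depth > 0 {
        return out; // inside (), [], {}: indentation changes are suppressed
    }
    let top = *stack.last().unwrap_or(&0);
    if width > top {
        stack.push(width); // deeper: one new block
        out.push(Tok::Indent);
    } else {
        // shallower: pop every level deeper than the new width,
        // emitting one Dedent per level unwound
        while stack.last().map_or(false, |&t| t > width) {
            stack.pop();
            out.push(Tok::Dedent);
        }
    }
    out // same width: empty, just a statement boundary
}

fn main() {
    let mut stack = vec![0usize];
    assert_eq!(indent_tokens(4, &mut stack, 0), vec![Tok::Indent]); // entering a block
    assert_eq!(indent_tokens(4, &mut stack, 0), Vec::<Tok>::new()); // same level
    assert_eq!(indent_tokens(0, &mut stack, 0), vec![Tok::Dedent]); // leaving it
    assert_eq!(indent_tokens(8, &mut stack, 1), Vec::<Tok>::new()); // inside brackets
}
```

Note that a single line can unwind several levels at once, which is why the `while` loop matters: dedenting from depth three straight back to zero emits three Dedent tokens in a row.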

Another subtle problem the lexer solves: when it sees 5., that could be the start of a float (5.0), a range operator (5..10), or a method call on an integer (5.method). Logos initially matches this as an ambiguous FloatOrRange token, and then Miri’s stateful layer peeks at the next character to figure out which one it is. These kinds of disambiguation decisions — where one character of lookahead changes the entire meaning — are what make lexers more interesting than they first appear.
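The decision itself is tiny once you have the next character in hand. A sketch of that one-character peek, with hypothetical names (the real logic lives in Miri's stateful layer around Logos):

```rust
// What an ambiguous "5." turns out to be, once we peek one character ahead.
#[derive(Debug, PartialEq)]
enum Resolved { FloatStart, RangeOp, MethodCall }

/// Resolve the token after matching an integer followed by a single dot.
fn resolve_after_int_dot(next: Option<char>) -> Resolved {
    match next {
        Some('.') => Resolved::RangeOp, // "5.." -> range operator, as in 5..10
        Some(c) if c.is_ascii_alphabetic() || c == '_' => Resolved::MethodCall, // 5.method
        _ => Resolved::FloatStart, // "5.0", or a bare "5." at end of input
    }
}

fn main() {
    assert_eq!(resolve_after_int_dot(Some('.')), Resolved::RangeOp);
    assert_eq!(resolve_after_int_dot(Some('0')), Resolved::FloatStart);
    assert_eq!(resolve_after_int_dot(Some('m')), Resolved::MethodCall);
}
```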

The Parser: where structure emerges

The parser takes the flat stream of tokens and builds a tree — the Abstract Syntax Tree, or AST. This is where structure emerges. The AST represents the hierarchical relationships in your code: this expression is the condition of that if statement, this + operator has these two operands, this function has these parameters and this body.

Tokens:   [Let] [Identifier("x")] [Assign] [Integer(1)] [Plus] [Integer(2)] [Star] [Integer(3)]

AST:            VariableDeclaration
                ├── name: "x"
                └── init: Binary(+)
                      ├── left: Literal(1)
                      └── right: Binary(*)
                            ├── left: Literal(2)
                            └── right: Literal(3)

Notice that * binds tighter than + — the tree correctly represents 1 + (2 * 3), not (1 + 2) * 3. Getting operator precedence right is one of the parser’s core responsibilities.

Miri uses a hybrid parsing strategy. For statements and declarations — if, for, fn, class — it uses recursive descent: each grammar rule is a function, and the functions call each other based on what token they see. This is the most intuitive approach; the code structure mirrors the grammar.

For expressions with operators, Miri uses Pratt parsing (also called precedence climbing). The idea is elegant: each precedence level has a function that parses the operand at the next-higher level, then loops while it sees an operator at its own level, combining left and right into a binary node. Twelve precedence levels — from or at the lowest to primary literals at the highest — chain together naturally, and the whole thing handles complex expressions like a + b * c == d and e or f without any ambiguity.
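The whole Pratt loop fits in one function. Here is a deliberately minimal two-operator version in Rust, not Miri's twelve-level implementation, with all names illustrative:

```rust
#[derive(Debug)]
enum Tok { Num(i64), Plus, Star }

#[derive(Debug, PartialEq)]
enum Expr { Lit(i64), Bin(char, Box<Expr>, Box<Expr>) }

/// Precedence and symbol for each binary operator; None for non-operators.
fn prec(t: &Tok) -> Option<(u8, char)> {
    match t {
        Tok::Plus => Some((1, '+')),
        Tok::Star => Some((2, '*')),
        _ => None,
    }
}

/// Parse an expression whose operators all bind at least as tightly as min_prec.
fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> Expr {
    // Parse the left operand (just a literal in this sketch).
    let mut lhs = match &toks[*pos] {
        Tok::Num(n) => { *pos += 1; Expr::Lit(*n) }
        t => panic!("expected number, got {t:?}"),
    };
    // Loop while the next operator belongs to this precedence level or higher.
    while *pos < toks.len() {
        let Some((p, op)) = prec(&toks[*pos]) else { break };
        if p < min_prec { break }
        *pos += 1;
        // Left-associative: the right side must bind strictly tighter.
        let rhs = parse_expr(toks, pos, p + 1);
        lhs = Expr::Bin(op, Box::new(lhs), Box::new(rhs));
    }
    lhs
}

fn main() {
    // 1 + 2 * 3 parses as 1 + (2 * 3), matching the AST diagram above.
    let toks = [Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Star, Tok::Num(3)];
    let ast = parse_expr(&toks, &mut 0, 0);
    assert_eq!(
        ast,
        Expr::Bin('+', Box::new(Expr::Lit(1)),
                  Box::new(Expr::Bin('*', Box::new(Expr::Lit(2)), Box::new(Expr::Lit(3)))))
    );
}
```

The recursion with `p + 1` is what encodes precedence: when the parser at the `+` level asks for its right operand, it demands operators that bind strictly tighter, so the `*` gets pulled into the right subtree.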

The parser only needs one token of lookahead to decide what to do. It peeks at the next token without consuming it: see if? Parse an if-statement. See fn? Parse a function declaration. See a number? Start parsing an expression. This simplicity — one token is enough — is a sign that the grammar is well-designed.

One design choice I like about Miri’s parser: every AST node gets a unique sequential ID, generated by an atomic counter. The type checker later uses these IDs to store inferred types in a separate HashMap<NodeId, Type>, without ever modifying the AST. This keeps the AST immutable after construction — a small decision that prevents an entire category of subtle bugs.
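The side-table pattern looks roughly like this. A sketch with assumed names; the point is only that the type information lives next to the AST, never inside it:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};

// Global counter: every AST node gets the next sequential ID at construction.
static NEXT_ID: AtomicU32 = AtomicU32::new(0);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NodeId(u32);

fn fresh_id() -> NodeId {
    NodeId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

#[derive(Debug, PartialEq)]
enum Type { Int, String }

fn main() {
    let lit = fresh_id();  // say, the node for `42`
    let name = fresh_id(); // say, the node for `"Alice"`

    // The type checker writes into this map; the AST itself is never mutated.
    let mut types: HashMap<NodeId, Type> = HashMap::new();
    types.insert(lit, Type::Int);
    types.insert(name, Type::String);

    assert_eq!(types[&lit], Type::Int);
    assert_eq!(types[&name], Type::String);
}
```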

The Type Checker: catching mistakes before they happen

The parser validates syntax — is the code structured correctly? — but it doesn’t check whether the code makes logical sense. That’s the type checker’s job.

let x = 42
let y = "hello"
let z = x + y       # Type error: can’t add Int and String

Without type checking, this would blow up at runtime. With it, the error is caught before the program ever runs.

Miri’s type checker uses a two-pass architecture, and the reason is simple. Consider this perfectly valid code:

fn main()
    greet("Alice")    # greet() hasn’t been defined yet!

fn greet(name String)
    println(f"Hello, {name}")

In a single pass, the call to greet() would fail because the function hasn’t been seen yet. So Pass 1 walks the entire program and registers all function signatures, class definitions, struct definitions, and type hierarchies. Pass 2 then checks everything else — function bodies, expressions, type compatibility — with full knowledge of what exists.
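The shape of the two passes can be sketched in miniature. This toy version only registers function names and checks calls against them; the real checker registers full signatures, classes, and type hierarchies, and all names here are illustrative:

```rust
use std::collections::HashSet;

/// A toy declaration: a function name plus the names it calls.
struct FnDecl { name: &'static str, calls: Vec<&'static str> }

fn check(program: &[FnDecl]) -> Vec<String> {
    // Pass 1: walk the whole program and register every signature up front.
    let sigs: HashSet<&str> = program.iter().map(|f| f.name).collect();

    // Pass 2: check bodies with full knowledge of what exists, so
    // definition order never matters.
    let mut errors = Vec::new();
    for f in program {
        for callee in &f.calls {
            if !sigs.contains(callee) {
                errors.push(format!("{}: unknown function `{}`", f.name, callee));
            }
        }
    }
    errors
}

fn main() {
    // main calls greet before greet is defined: fine with two passes.
    let program = [
        FnDecl { name: "main",  calls: vec!["greet"] },
        FnDecl { name: "greet", calls: vec![] },
    ];
    assert!(check(&program).is_empty());

    // A call to something that truly doesn't exist is still caught.
    let bad = [FnDecl { name: "main", calls: vec!["missing"] }];
    assert_eq!(check(&bad).len(), 1);
}
```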

The type checker supports type inference, which means you don’t have to annotate everything:

let x = 42              # Inferred: Int
let name = "Alice"      # Inferred: String
let items = [1, 2, 3]   # Inferred: [Int]

For collections, it infers the element type from the first element, then verifies all other elements match. For function calls, it looks up the function’s declared return type. For generics, it infers type parameters from the arguments you pass — identity(42) infers T = Int without you ever specifying it.

Type compatibility in Miri follows a decision tree with about ten steps: exact match, error suppression (to prevent cascading errors), option compatibility (T is compatible with T?), numeric widening (i8 can be used where i64 is expected, but not the reverse), subtype checking (class inheritance), and so on. The order matters — earlier checks are faster and more common, so the system short-circuits quickly.

One detail I’m particularly pleased with: the type checker reports multiple errors in a single pass. If your program has three type errors, you see all three at once, not just the first one. To make this work without drowning you in noise, it uses an Error type as a sentinel — once an expression fails type checking, any operation on that Error value is silently accepted. A single typo produces one error message, not twenty cascading ones.
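The sentinel trick is easy to show in miniature. A sketch with a three-type universe (the real checker has the full decision tree described above); the key is that `Error` poisons a result without generating a second message:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Type { Int, Str, Error }

/// Type-check an addition, appending at most one message per *real* mistake.
fn check_add(lhs: Type, rhs: Type, errors: &mut Vec<String>) -> Type {
    match (lhs, rhs) {
        // An operand already failed: stay silent, keep propagating Error.
        (Type::Error, _) | (_, Type::Error) => Type::Error,
        (Type::Int, Type::Int) => Type::Int,
        (l, r) => {
            errors.push(format!("cannot add {l:?} and {r:?}"));
            Type::Error // poison the result so downstream checks stay quiet
        }
    }
}

fn main() {
    let mut errors = Vec::new();
    // x + "oops" + 1 + 2: one real mistake, checked left to right.
    let t = check_add(Type::Int, Type::Str, &mut errors);
    let t = check_add(t, Type::Int, &mut errors);
    let _ = check_add(t, Type::Int, &mut errors);
    // One message, not three cascading ones.
    assert_eq!(errors.len(), 1);
}
```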

MIR: flattening the world

This is where things get really interesting. The AST is too high-level for code generation — it has nested if/else, for loops, match expressions, classes, all these rich constructs. Machine code understands none of that. It only knows: move a value, add two numbers, jump to an address.

MIR — the Mid-level Intermediate Representation — bridges the gap. It decomposes everything into basic blocks: straight-line sequences of simple instructions, each ending with a “terminator” that says where to go next.

// What you write:                // What MIR produces:
if x > 0                         bb0:
    return x                        _2 = Gt(_1, 0)
else                                SwitchInt(_2) -> [true: bb1, else: bb2]
    return -x
                                  bb1:
                                    _0 = Copy(_1)
                                    Return

                                  bb2:
                                    _0 = Neg(_1)
                                    Return

A for loop becomes a while-like pattern with an index variable and a bounds check. A match expression becomes a SwitchInt with branches. Method calls become mangled function calls like String_to_upper. Every high-level construct is “desugared” into these primitives.

After the initial lowering, MIR runs through optimization passes — constant propagation (replace 5 + 3 with 8 at compile time), copy propagation (eliminate redundant copies), dead code elimination (remove computations whose results are never used), and control flow simplification (thread chains of empty blocks). These passes iterate until nothing changes, up to ten rounds.
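The fixpoint driver itself is simple. Here is a sketch where the only "pass" is constant folding on a toy expression type; the real pipeline runs several passes over basic blocks, but the run-until-stable loop has the same shape:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Inst { Lit(i64), Add(Box<Inst>, Box<Inst>) }

/// One constant-folding pass: fold Add(Lit, Lit) and report whether anything changed.
fn fold(i: &Inst) -> (Inst, bool) {
    match i {
        Inst::Add(a, b) => {
            let (a, ca) = fold(a);
            let (b, cb) = fold(b);
            if let (Inst::Lit(x), Inst::Lit(y)) = (&a, &b) {
                (Inst::Lit(x + y), true) // e.g. 5 + 3 becomes 8 at compile time
            } else {
                (Inst::Add(Box::new(a), Box::new(b)), ca || cb)
            }
        }
        other => (other.clone(), false),
    }
}

/// Iterate passes until nothing changes, capped at ten rounds.
fn optimize(mut i: Inst) -> Inst {
    for _ in 0..10 {
        let (next, changed) = fold(&i);
        i = next;
        if !changed { break }
    }
    i
}

fn main() {
    // (5 + 3) + 2 folds all the way down to 10.
    let e = Inst::Add(
        Box::new(Inst::Add(Box::new(Inst::Lit(5)), Box::new(Inst::Lit(3)))),
        Box::new(Inst::Lit(2)),
    );
    assert_eq!(optimize(e), Inst::Lit(10));
}
```

The `changed` flag is the whole trick: one pass can expose work for another (folding `5 + 3` to `8` makes `8 + 2` foldable), so you keep going until a full round reports no changes.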

But the most important MIR pass isn’t an optimization in the traditional sense. It’s Perceus — automatic reference counting insertion — and it’s at the heart of how Miri manages memory. More on that in a moment.

Memory management: why I bet against garbage collection

Every language has to answer the question: who frees the memory?

In C, you do it manually — and get it wrong constantly. In Java and Go, a garbage collector does it for you — but it runs on its own schedule, introducing latency spikes that are murder for real-time systems. In Rust, the borrow checker enforces ownership rules at compile time — brilliant, but the learning curve is steep and the annotation burden is real.

I spent weeks going back and forth on this. GC would have been the path of least resistance — well-understood, easy to implement, handles cycles for free. And yes, you could run a GC on the CPU to manage GPU memory — it’s not technically impossible. But the more I thought about it, the worse the fit became.

GPU computing demands predictable memory lifetimes. You need to know exactly when a buffer is safe to reuse, when data can be transferred back, when GPU memory can be freed. A GC gives you none of that — it frees things eventually, on its own schedule, after a tracing pass that might pause the CPU and starve the GPU pipeline of work. And if Miri eventually compiles code that runs on the GPU, the kernel side would need a completely different memory strategy anyway — meaning two memory models in one language, which is exactly the kind of complexity I’m trying to avoid.

Reference counting sidesteps all of this. It’s deterministic, it’s per-object, and — critically — it works the same way on both CPU and GPU code paths. One memory model, one set of semantics, no special cases.

So Miri takes that path: automatic reference counting, inserted at compile time by the Perceus algorithm.

The idea is straightforward. Every heap-allocated object — strings, lists, maps, class instances — has a reference count: how many variables point to it. When you assign a list to a new variable, the count goes up. When a variable goes out of scope, the count goes down. When it hits zero, the memory is freed immediately.

let a = [1, 2, 3]      # List allocated, RC = 1
let b = a              # RC = 2 (b aliases a)
# b goes out of scope  -> RC = 1
# a goes out of scope  -> RC = 0, freed immediately

What makes Perceus special is that the programmer never writes any of this. The MIR pass analyzes every function and inserts IncRef and DecRef instructions automatically. When a managed value is copied, Perceus adds an IncRef before the copy. When a local goes out of scope, it adds a DecRef. Primitives and small structs (128 bytes or less, all primitive fields) are just copied by value — no reference counting overhead at all.
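The net effect of those inserted instructions can be simulated with a plain integer. This is not the Perceus pass itself, just the runtime behavior it produces for the listing above:

```rust
// The three RC-relevant events the compiler emits for the example above.
#[derive(Debug, PartialEq)]
enum Op { Alloc, IncRef, DecRef }

/// Replay a sequence of RC events; return the final count and whether
/// the object was freed.
fn run(ops: &[Op]) -> (i64, bool) {
    let mut rc = 0i64;
    let mut freed = false;
    for op in ops {
        match op {
            Op::Alloc => rc = 1,    // fresh list: RC starts at 1
            Op::IncRef => rc += 1,  // `let b = a` copies the pointer
            Op::DecRef => {
                rc -= 1;
                if rc == 0 { freed = true } // last owner gone: free immediately
            }
        }
    }
    (rc, freed)
}

fn main() {
    // let a = [1, 2, 3]; let b = a; then b and a leave scope in turn.
    let prog = [Op::Alloc, Op::IncRef, Op::DecRef, Op::DecRef];
    assert_eq!(run(&prog), (0, true));
}
```

Deterministic is the operative word: the free happens at the exact `DecRef` that drops the count to zero, not whenever a collector next wakes up.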

This gives Miri a distinctive position in the memory management landscape:

  • vs. C/C++: No manual malloc/free, no dangling pointers, no double-frees. You write code and the compiler handles cleanup.
  • vs. Python/Java/Go: No garbage collector, no GC pauses, no unpredictable latency. Memory is freed deterministically, the instant it’s no longer needed.
  • vs. Rust: No borrow checker, no lifetime annotations, no fighting the compiler about references. The trade-off is that Miri’s approach has slightly more runtime overhead (those reference count increments and decrements aren’t free), and it can’t detect reference cycles automatically. But for the vast majority of programs, the ergonomics win is significant.
  • vs. Swift: Swift also uses automatic reference counting, but requires the programmer to think about weak and unowned references to break cycles. Miri’s Perceus approach aims to handle more of this automatically.

The deterministic cleanup is particularly important for Miri’s long-term vision. When you’re coordinating memory across CPU and GPU, you need tight control over object lifetimes — not a GC that might pause at the worst possible moment. Reference counting gives you that: each object manages its own lifetime, cleanup is immediate and predictable, and the same IncRef/DecRef semantics work regardless of where the code runs.

Code generation: the final transformation

MIR’s basic blocks become actual machine instructions. This is where the abstract finally becomes concrete.

Miri uses Cranelift as its code generation backend — a fast, lightweight code generator originally built for WebAssembly. Cranelift isn’t as aggressive at optimization as LLVM (the backend used by Rust, Clang, and Swift), but it compiles much faster. For a language still in alpha, where fast iteration on the compiler itself matters more than squeezing out every last CPU cycle, that’s the right trade-off.

The type mapping is simple by design: integers map to native integer types, floats to native float types, booleans to a single byte. Everything else is a pointer. Strings, lists, maps, structs, class instances — they all live on the heap, and the compiled code manipulates pointers to them.

Every heap-allocated object follows a uniform layout: a reference count stored just before the payload data. String literals are special — their reference count is set to a sentinel “immortal” value (the high bit is set), so they’re never freed. This is a small optimization that eliminates pointless reference counting on data that lives for the entire program.
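The immortal check is a one-line guard in front of the normal decrement. A sketch of that header logic; the field name and the exact bit are assumptions from the description above, not Miri's actual runtime code:

```rust
// High bit of the reference count marks an immortal object (string literals).
const IMMORTAL: u64 = 1 << 63;

/// The uniform header stored just before every heap object's payload.
struct Header { refcount: u64 }

/// Decrement the count; return true when the payload should be freed now.
fn dec_ref(h: &mut Header) -> bool {
    if h.refcount & IMMORTAL != 0 {
        return false; // literals skip reference counting entirely
    }
    h.refcount -= 1;
    h.refcount == 0
}

fn main() {
    let mut literal = Header { refcount: IMMORTAL };
    assert!(!dec_ref(&mut literal)); // never freed, count untouched

    let mut list = Header { refcount: 2 };
    assert!(!dec_ref(&mut list)); // one owner remains
    assert!(dec_ref(&mut list));  // last owner gone: free
}
```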

After Cranelift produces an object file, the system linker (cc) combines it with Miri’s runtime library — a separate Rust crate compiled as a static library. The runtime provides all the functions that compiled Miri code calls at execution time: string operations, collection management (creating lists, pushing elements, growing arrays), memory allocation, I/O.

The runtime is type-erased — it doesn’t know about Int or String, only about byte sizes. List<Int> and List<String> both become the same MiriList struct with a different elem_size field. One implementation handles all element types. This is the opposite of Rust’s monomorphization approach (which generates specialized code for each generic instantiation), and the trade-off is worth it: smaller binaries, simpler runtime, at the cost of losing some type-specific optimizations.
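In miniature, a type-erased list is just a byte buffer plus an element size. A sketch of the idea; the `elem_size` field follows the text, but the rest of the struct's shape is an assumption, and the real `MiriList` also carries its reference count header:

```rust
/// One list implementation for every element type: the element type is
/// erased down to a byte count.
struct MiriList {
    data: Vec<u8>,    // raw storage, elem_size bytes per element
    elem_size: usize, // 8 for Int, 8 for a String pointer, etc.
    len: usize,
}

impl MiriList {
    fn new(elem_size: usize) -> Self {
        MiriList { data: Vec::new(), elem_size, len: 0 }
    }

    /// Push copies `elem_size` raw bytes: the same code path regardless of type.
    fn push(&mut self, bytes: &[u8]) {
        assert_eq!(bytes.len(), self.elem_size);
        self.data.extend_from_slice(bytes);
        self.len += 1;
    }

    fn get(&self, i: usize) -> &[u8] {
        let start = i * self.elem_size;
        &self.data[start..start + self.elem_size]
    }
}

fn main() {
    // A List<Int> (8-byte elements) seen through the erased interface.
    let mut ints = MiriList::new(8);
    ints.push(&42i64.to_le_bytes());
    assert_eq!(ints.len, 1);
    assert_eq!(i64::from_le_bytes(ints.get(0).try_into().unwrap()), 42);
}
```

Only the compiled caller knows the bytes are an `i64`; the runtime just moves `elem_size`-sized chunks around, which is exactly why one implementation serves every instantiation.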

The GPU-first vision: why all of this matters

Everything I’ve described so far runs on the CPU. But Miri’s architecture was designed with a bigger ambition in mind: GPU-first computing.

The MIR already tracks an ExecutionModel for each function — Cpu, Async, GpuKernel, or GpuDevice. The type checker already registers GPU-specific built-in types like Dim3 and GpuContext. The reference counting memory model was specifically chosen because it works without a global garbage collector — a requirement for GPU execution where there’s no runtime thread managing memory.

The backend doesn’t support GPU code generation yet. But the foundation is deliberately designed so that when it does, the programmer’s experience will be seamless: write a function, annotate it with gpu, and the compiler handles the rest — memory transfers, kernel launches, synchronization. Not two separate languages stitched together with FFI, but one language that targets both CPU and GPU from the same source.

This is the bet: that the next generation of programming languages needs to treat GPU compute as a first-class citizen, not an afterthought bolted on through libraries.

The trade-offs that keep me up at night

Building a compiler forces you to make dozens of decisions where reasonable people would disagree. Here are the ones I think about most:

Cranelift over LLVM. LLVM produces better-optimized code, but it’s enormous, slow to compile, and a nightmare to link against. Cranelift trades some code quality for dramatically faster compilation. For a language in active development, where I’m rebuilding the compiler many times a day, that speed matters more. The backend trait abstraction means LLVM can be added later without touching the rest of the pipeline.

Reference counting over garbage collection. This was the biggest decision. GC would have been easier to implement and handles cycles automatically. But it introduces non-determinism (when does the GC run?), latency spikes, and a unified CPU/GPU memory model becomes much harder — you’d need GC on the CPU side and something else entirely for GPU kernels. Perceus-style automatic RC gives me deterministic destruction and one consistent memory model across both targets, at the cost of slightly more compiler complexity and the future challenge of cycle detection.

Significant whitespace over braces. This is a taste decision, but it has real consequences. The lexer is more complex because it has to track indentation and emit synthetic tokens. But the resulting code is cleaner, and — importantly — it parses more naturally for LLM-based tooling. When the code’s structure is visible in its indentation rather than hidden in braces, both humans and AI can read it more reliably.

Two-pass type checking over single-pass. A single pass would be simpler but would require forward declarations or specific ordering. Two passes let you write functions in any order, which is how most programmers think. The cost is an extra walk over the AST, but it’s fast enough that it’s never been a bottleneck.

Type erasure in the runtime over monomorphization. Monomorphization (generating specialized code for each concrete type) produces faster code. Type erasure (one implementation for all types, parameterized by size) produces smaller binaries and a simpler runtime. For a language still finding its feet, simplicity wins. If profiling later shows the type erasure overhead matters, monomorphization can be added for hot paths.

Multiple error reporting over fail-fast. The type checker collects all errors and presents them at once. The parser stops at the first error. I’d like to make the parser more resilient eventually, but error recovery in parsing is genuinely hard — you need heuristics for “where does the next valid statement probably start?” — and getting it wrong produces confusing phantom errors. The type checker’s approach is easier because type errors are more localized.

Every one of these decisions could have gone the other way, and the language would still work. That’s the thing about compiler design — there are very few objectively right answers, just trade-offs that align with your priorities. Miri’s priorities are clear: fast iteration, ergonomic code, GPU readiness, and the kind of simplicity that lets a small team (currently just me) keep the whole system in their head.

Years ago, I thought compilers were dark magic. Now I have one that takes a .mi file and spits out a native binary — and I can trace every step of that journey through code I wrote. The compiler is open source. Next up: making it talk to GPUs :)