Open-Source 80386 CPU Boots Doom Using Recovered Intel Microcode

  • The open-source 80386 CPU project z386 boots DOS 6 and 7, runs protected-mode software, and plays Doom on real FPGA hardware.
  • Unlike emulators, the open-source 80386 CPU is built around the actual recovered Intel microcode ROM: 2,560 entries, each 37 bits wide.
  • z386 performs roughly like a 70MHz cached 386, bridging the gap between historical hardware accuracy and practical usability.
  • Four months of evenings and weekends went into decoding dense, hand-tuned microcode that assumes significant hidden hardware context.

An Open-Source 80386 CPU That Actually Runs Doom

The open-source 80386 CPU known as z386 has reached a milestone that felt theoretical not long ago: it boots DOS, runs protected-mode programs, and yes, plays Doom. Built around the original Intel microcode rather than a software emulator, z386 is the work of developer nand2mario and represents one of the most architecturally faithful recreations of a classic x86 processor ever attempted on an FPGA.

Doom II running on z386. — nand2mario.github.io

This is the fifth post in a series that has methodically reverse-engineered the 80386’s internals — covering the multiplication and division datapath, the barrel shifter, protection and paging mechanisms, and the memory pipeline. Now the project is complete enough to run real workloads, which makes it the right moment to explain how the whole open-source 80386 CPU hangs together.

Why Microcode — and Why the 386 Is Hard

Most FPGA CPU projects take one of two approaches: a clean-room reimplementation that matches the instruction set architecture, or a cycle-accurate emulator. z386 does neither. Instead, it takes the recovered control ROM from the original 80386 silicon — 2,560 entries at 37 bits wide — and builds hardware that the microcode can actually drive. The goal isn’t to pretend to be a 386; it’s to give the original microcode a real machine to run on.
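To get a feel for how small that control ROM is by modern standards, the arithmetic can be sketched in a few lines. This is only a back-of-the-envelope illustration: the entry count and word width come from the figures above, while the function names are made up here and do not come from the z386 sources.

```python
# Figures quoted in the article: 2,560 microcode entries, each 37 bits wide.
ROM_ENTRIES = 2560
WORD_BITS = 37


def rom_size_bits(entries=ROM_ENTRIES, width=WORD_BITS):
    """Total storage needed for the control ROM, in bits."""
    return entries * width


def address_bits(entries=ROM_ENTRIES):
    """Minimum number of address bits needed to select any entry."""
    return (entries - 1).bit_length()


def fits_in_word(value, width=WORD_BITS):
    """True if an integer can be stored in one microcode word (hypothetical helper)."""
    return 0 <= value < (1 << width)


if __name__ == "__main__":
    bits = rom_size_bits()
    print(bits, "bits =", bits // 8, "bytes")
    print(address_bits(), "address bits")
```

Under those figures the whole ROM is 94,720 bits (11,840 bytes) and needs a 12-bit microcode address, which helps explain why each word has to be so dense.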

The project follows directly from z8086, an earlier effort that did the same thing for the 8086 using reenigne’s disassembly work. That project proved the concept. The open-source 80386 CPU, though, is a fundamentally harder problem. The instruction set is bigger, the internal state is richer, and the chip has to enforce segmentation, paging, privilege levels, and precise exception handling — all while the microcode assumes the existence of hardware mechanisms that Intel never publicly documented in full detail.

As nand2mario puts it, if the 8086 microcode reads like a straightforward C program, the 386 microcode reads like hand-tuned assembly: short, subtle, and packed with assumptions about hidden hardware state. Decoding that took roughly four months of evenings and weekends. The breakthrough came when reenigne and several collaborators extracted the 80386 microcode and shared their disassembly work — without that, this open-source 80386 CPU simply wouldn’t exist.

Eight Units, One Machine

At a high level, the open-source 80386 CPU is organised around eight cooperating functional units, mirroring Intel’s own block diagram closely enough that the original documentation still serves as a useful map.

The 80386 as eight cooperating units · Image: Intel, The Intel 80386 – Architecture and Implementation, Figure 8. — nand2mario.github.io

Those eight units are the prefetch unit, decoder, microcode sequencer, ALU and shifter, segmentation unit, protection unit, paging unit, and the bus interface and cache path. What’s striking is that this isn’t a clean linear pipeline. It’s better understood as several large, partly independent state machines that overlap in time. The prefetch unit fills its 16-byte code queue while execution is busy elsewhere. The decoder can prepare later instructions ahead of when they’re needed. Address translation can start before the bus is required. Intel’s own papers describe up to six instructions being in different phases of processing simultaneously — yet the execution unit still consumes exactly one micro-operation per cycle.

The same eight-unit organization on the 80386 die. Base · Image: Intel 80386 DX die, Wikimedia Commons. — nand2mario.github.io

That distinction matters historically. The 486, which arrived about four years after the 386 (1989 versus 1985), reorganised this design into a finer-grained pipeline explicitly aimed at one full instruction per clock. The 386 never quite gets there: even a simple register-to-register move takes at least two microcode cycles. z386 preserves that characteristic faithfully, making it a structurally honest recreation rather than a modernised approximation.

The Front End: Prefetch and Decode

Instruction prefetch sounds simple until you do the bandwidth arithmetic. A 1986 Intel paper by Jim Slager, “Performance Optimizations of the 80386”, lays out the math clearly: the average 386 instruction is about four bytes long and takes roughly four clock cycles to execute, which means steady-state operation needs approximately one byte of code fetched per clock. In practice, the prefetcher has to sustain burst bandwidth well above that average to smooth over variable-length instructions, taken branches, and data cycles that steal bus slots away from code fetch.
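The bandwidth arithmetic is simple enough to write down. The sketch below uses the averages quoted from Slager's paper; the `code_bus_share` parameter is an assumption introduced here to illustrate the effect of data cycles taking bus slots, not a figure from the paper or from z386.

```python
def steady_state_bytes_per_clock(avg_instr_bytes, avg_instr_clocks):
    """Average code bytes the execution unit consumes per clock."""
    return avg_instr_bytes / avg_instr_clocks


def required_fetch_rate(avg_instr_bytes, avg_instr_clocks, code_bus_share):
    """Bytes per clock the prefetcher must deliver during the clocks it owns the bus.

    code_bus_share is a hypothetical fraction (0 < share <= 1) of bus slots
    left over for code fetch after data accesses take theirs.
    """
    if not 0 < code_bus_share <= 1:
        raise ValueError("code_bus_share must be in (0, 1]")
    return steady_state_bytes_per_clock(avg_instr_bytes, avg_instr_clocks) / code_bus_share


if __name__ == "__main__":
    # Article figures: about four bytes and about four clocks per instruction.
    print(steady_state_bytes_per_clock(4, 4))
    # If only half of the bus slots were free for code, the burst rate doubles.
    print(required_fetch_rate(4, 4, 0.5))
```

With the quoted averages the steady-state demand is one byte per clock; if code fetch only had half the bus, the prefetcher would need two bytes per clock whenever it did get a slot.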

The decoder handles the full complexity of x86 variable-length encoding — prefix bytes, ModR/M fields, SIB bytes, immediates, and displacements — and maps each instruction to its microcode entry point. This is where the ROM/PLA-style decoding that Intel used becomes both a constraint and a guide. z386 recreates that decoding structure rather than replacing it with a lookup table, which keeps the recovered microcode’s assumptions intact.
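As a rough illustration of what the decoder has to pick apart, the ModR/M and SIB bytes each pack three fields into eight bits. The sketch below follows the documented x86 field layout (2-bit mod or scale, 3-bit reg or index, 3-bit rm or base); it is a software illustration only, and the names are hypothetical rather than taken from z386's decoder logic.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int  # bits 7..6: addressing mode
    reg: int  # bits 5..3: register or opcode extension
    rm: int   # bits 2..0: register or memory operand selector


class SIB(NamedTuple):
    scale: int  # multiplier 1, 2, 4 or 8, from bits 7..6
    index: int  # bits 5..3: index register
    base: int   # bits 2..0: base register


def split_modrm(byte):
    """Split a ModR/M byte into its three fields."""
    return ModRM((byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111)


def split_sib(byte):
    """Split a SIB byte, converting the 2-bit scale code to its multiplier."""
    return SIB(1 << ((byte >> 6) & 0b11), (byte >> 3) & 0b111, byte & 0b111)


def needs_sib(modrm, addr32=True):
    """With 32-bit addressing, a SIB byte follows when mod != 3 and rm == 4."""
    return addr32 and modrm.mod != 0b11 and modrm.rm == 0b100


if __name__ == "__main__":
    # 89 D8 encodes a register-to-register MOV; D8 is its ModR/M byte.
    print(split_modrm(0xD8))
    print(needs_sib(split_modrm(0x04)), split_sib(0x88))
```

For example, `0xD8` splits into mod 3, reg 3, rm 0 (a register-to-register form), while `0x04` has rm 4 with mod 0 and therefore pulls in a SIB byte.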

The microcode sequencer then takes over: fetching expanded microcode words, handling jumps and delay slots, managing faults, and knowing when to move to the next instruction. It’s the closest thing the open-source 80386 CPU has to a control unit in the classical sense, and it’s where most of the subtle behaviour lives.
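The idea of a jump with a delay slot can be shown with a toy sequencer. Everything here is a simplification made up for illustration: the micro-op names, the tuple format, and the choice of exactly one delay slot are assumptions, not the real 80386 microcode format or z386's sequencer.

```python
def run_microprogram(rom, entry, max_steps=100):
    """Return the ROM addresses executed, in order, for a toy microprogram.

    rom is a list of (op, arg) tuples with op in {"work", "jump", "end"}.
    A "jump" takes effect only after the following micro-op (its delay
    slot) has executed. One delay slot is an assumption of this sketch.
    """
    trace = []
    pc = entry
    pending = None  # jump target waiting for its delay slot to finish
    for _ in range(max_steps):
        op, arg = rom[pc]
        trace.append(pc)
        if pending is not None:
            # The micro-op just executed was the delay slot: take the jump now.
            next_pc, pending = pending, None
        else:
            next_pc = pc + 1
        if op == "jump":
            pending = arg
        elif op == "end":
            break
        pc = next_pc
    return trace


if __name__ == "__main__":
    rom = [
        ("work", None),  # 0
        ("jump", 5),     # 1: jump to address 5, after one delay slot
        ("work", None),  # 2: delay slot, still executes
        ("work", None),  # 3: skipped
        ("work", None),  # 4: skipped
        ("end", None),   # 5
    ]
    print(run_microprogram(rom, 0))
```

Running the example program visits addresses 0, 1, 2 and 5: the micro-op after the jump still executes before control transfers, which is the kind of detail that makes hand-tuned microcode easy to misread.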

Performance: Fast 386 or Slow 486?

In practical terms, z386 performs like a fast 386 running at around 70MHz — which puts it roughly in low-end 486 territory in terms of real-world throughput. That’s a deliberately interesting design point. Historical 386 chips topped out around 40MHz, but they typically relied on large external caches in the 32KB to 128KB range to stay fed. z386 instead uses a 16KB 4-way set-associative unified L1 cache implemented with FPGA-friendly structures, which keeps the clock high at the cost of somewhat worse cycles-per-instruction compared to the best historical 386 configurations.
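To make the cache geometry concrete, here is a minimal sketch of how an address would split into tag, set index, and byte offset for a 16KB 4-way set-associative cache. The capacity and associativity come from the paragraph above; the 16-byte line size and the 32-bit address width are assumptions chosen for illustration, since the article does not state them.

```python
CACHE_BYTES = 16 * 1024  # from the article
WAYS = 4                 # from the article
LINE_BYTES = 16          # assumption: line size is not given in the article


def geometry(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Return (sets, offset_bits, index_bits) for power-of-two parameters."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Split an address into (tag, set_index, byte_offset)."""
    sets, offset_bits, index_bits = geometry(cache_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(geometry())
    tag, index, offset = split_address(0x12345678)
    print(hex(tag), hex(index), hex(offset))
```

Under the assumed 16-byte line, the cache has 256 sets, and address `0x12345678` maps to tag `0x12345`, set `0x67`, offset `0x8`. A different line size would shift the boundaries but not the method.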

Compared to ao486, the well-known open-source 486-class FPGA core, z386 trades some raw performance for architectural authenticity. ao486 is a clean-room design optimised for compatibility and speed on FPGAs. z386 is trying to do something different: preserve the structural decisions Intel made in 1985, including the ones that weren’t optimal, because those decisions are part of what the recovered microcode was written to exploit.

What This Project Actually Demonstrates

It would be easy to file z386 under “impressive hobbyist project” and move on. That undersells what’s happening here. The ability to extract, disassemble, and then actually execute original manufacturer microcode on recreated hardware represents a meaningful form of digital preservation. The 80386 was the foundation of the 32-bit PC era — it introduced protected mode to the mass market, enabled DOS extenders like DOS/4GW that powered an entire generation of games, and defined the x86 architecture that still runs most of the world’s servers and desktops today.

Projects like z386 create something that no amount of documentation can fully replace: a running, testable implementation that exposes how the original hardware actually behaved, not just how Intel said it would behave. Every bug found during bring-up — and there were plenty — teaches something about the original design that no datasheet captures. The open-source 80386 CPU also sits in a broader trend of hardware archaeology gaining serious momentum, from the open hardware community’s work on classic architectures to academic efforts to preserve and understand legacy silicon before the people who designed it are no longer around to answer questions.

The project isn’t done. z386 isn’t a perfect open-source 80386 CPU yet, and nand2mario has been explicit about that. But it boots DOS, it runs Doom, and it does so by making the original Intel microcode do the work. That’s not emulation. That’s resurrection.

Source: https://nand2mario.github.io/posts/2026/z386/

Muhammad Zayn Emad