retroDE_ps2/docs/decisions/0007-ee-core-reality-checkpoint.md

# 0007 — EE Core Reality Checkpoint (Ch306)

Status: Accepted
Date: 2026-05-28
Chapter: Ch306 (strategic recon / design — no RTL)
Supersedes: nothing. Companion to 0006-vram-roadmap.md.
Authors: lead architect, with Codex co-review.

---

## 1. Executive Summary

`rtl/ee/ee_core_stub.sv` (2155 lines) is **a behavioral compatibility oracle, not a CPU.**

It is an interpreter-style multicycle FSM that has been grown chapter-by-chapter (Ch67 → Ch305) to boot `qbert.elf` ~1.49M instructions deep by adding, one blocker at a time, exactly the opcodes, syscall HLE cases, MMIO stubs, and testbench-side pokes that the next blocker demanded. It has been extraordinarily productive *as a discovery instrument*: it told us precisely which 67 instruction behaviors a real PS2 game touches during boot, which syscalls the EE kernel must service, and which MMIO regions matter. That is its value, and that value is real.

But it is now load-bearing in a way it was never designed to be. The owner and Codex have identified the key risk correctly: **we are about to confuse the oracle for the deliverable.** The stub mixes three layers that real hardware keeps strictly separate (CPU / BIOS-kernel / async-hardware), and several of the things that make qbert "boot" are fabrications: a `$v0=1` longjmp fib (Ch215) that *created* the BIOS treadmill we then chased for 50 chapters, an `$a0`-aware bit-17 syscall return (Ch294/0x7A) that fakes an interrupt that never fired, and a testbench poke (Ch299) that writes `1` into qbert's private global from inside the TB.

**Go / No-Go on a synthesizable R5900 subset: GO, with caveats.**

A deliberately-scoped multicycle R5900 subset (fetch/decode, 32-bit ALU, load/store, branches + delay slots, HI/LO, and the existing gpr128/MMI subset) is **straightforwardly synthesizable on Agilex 5 (DE25-Nano)**. There are no language-level blockers in the current RTL, the microarchitecture is a clean synchronous `always_ff` FSM with handshaked memory ports, and ~63 of the 67 decoded behaviors graduate essentially as-is. The path is **bounded and validatable**. The danger is not technical intractability — it is *layer confusion*: letting oracle hacks leak into the real core.

This document splits the work into two explicit, permanently-separate tracks and defines the graduation path.

- **Track A — EE Behavioral Oracle**: keep `ee_core_stub` as a *discovery-only* instrument. Its output is a living opcode/syscall/MMIO checklist. It is never the CPU.
- **Track B — Synthesizable EE Core**: a new, clean core built to the checklist Track A produces, validated against the existing ~50 focused EE TBs (re-pointed for full-width semantics).

---

## 2. The Three-Layer Separation

Real PS2 hardware keeps three things in three places: the **CPU** executes instructions; the **BIOS/kernel ROM** services syscalls and implements `longjmp`/`_ReturnFromException`; **async hardware** (INTC / DMAC / GS / VBLANK / SIF) produces the events and flags that kernel code polls. The stub collapses all three into one FSM. The table below re-classifies every feature in the inventories by where it actually belongs.

| Stub feature | Layer | Graduates to Track B CPU? | Where it really belongs |
|---|---|---|---|
| SPECIAL ALU/shift/HILO set (SLL…SRAV, ADD…SLTU, MFHI/MFLO, MULTU, DIVU) | (a) CPU-architectural | **Yes** | CPU core |
| Immediate ALU (ADDI…LUI), branches (BEQ/BNE/BLEZ/BGTZ + REGIMM BLTZ/BGEZ + BEQL/BNEL), jumps (J/JAL/JR/JALR) | (a) CPU-architectural | **Yes** | CPU core |
| Loads/stores (LB/LH/LW/LBU/LHU + multi-beat LD/LQ/SD/SQ), SB/SH/SW | (a) CPU-architectural | **Yes** | CPU core |
| MMI subset (PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH) + gpr128 shadow | (a) CPU-architectural | **Yes** (if MMI in scope) | CPU core |
| COP0 MFC0/MTC0/RFE/EI, SYNC, CACHE | (a) CPU-architectural (partial) | **Yes** (needs widening) | CPU core; RFE↔ERET to reconcile |
| SYSCALL **exception-entry mechanism** (EPC / Cause.ExcCode=Sys / vector) | (a) CPU-architectural | **Yes** (the *mechanism* only) | CPU core |
| SYSCALL **$v1 case table** (0x3C EndOfHeap, 0x3D InitMainThread, 0x40, 0x64 FlushCache, 0x6B, 0x77, 0x78, 0x79, 0x13, 0x17, 0x16, 0x12) | (b) BIOS/kernel HLE | **No** | PS2 BIOS ROM, or a dedicated EE-kernel HLE companion module between CPU and memory map |
| Ch199 `_ReturnFromException(2)` RFE-on-syscall-8 shortcut | (b) BIOS/kernel HLE | **No** | BIOS kernel exception-return path (ROM). The status-stack pop is architectural; *selecting it by syscall number* is kernel behavior |
| Ch215 `jmp_buf` restore FSM (hardcoded base `0xA000B1E0`, 12-slot libc layout, forced `$v0=1`) | (b) BIOS/kernel HLE | **No** | BIOS ROM `longjmp()`. **This `$v0=1` fib is the documented source of the Ch215 treadmill (Ch269).** It is a workaround, not behavior |
| Syscall 0x7A `$a0`-aware bit-17 readiness return | (c) async-hardware stand-in | **No** | INTC/DMAC-completion/event delivery (real interrupt fires the flag). Labeled "Not architectural truth" |
| Ch299 TB-side library-ready poke (`useg_shadow_mem[0x4CA70]=1` on qbert-specific arg guard) | (c) async-hardware stand-in | **No** | Memory side effect of the RegisterLibraryEntries (0x77) kernel callback. **Most fragile, ship-blocking hack in the inventory** |
| Syscall 0x12/0x16 (Add/EnableDmacHandler) registration | (b) BIOS/kernel HLE → (c) | **No** | Kernel handler table; the *enable* arms real INTC/DMAC dispatch (unbuilt hardware) |
| Syscall default-case halt (`retired_flag_halt` → S_HALT, expose $v1/$a0-$a3) | (c) TB-only scaffolding | **No** | Diagnostic only; real CPU vectors to kernel |
| Trace port cluster (`ev_*` + `retired_*` shadows) | (c) TB-only scaffolding | **No** (strip) | Test instrumentation; no hardware counterpart |
| Per-syscall runner observers (snapshots, tuple tables, $a0 counters) | (c) TB-only scaffolding | **No** | Passive measurement; correct to live in the TB |
| BIOS reset-vector LUI/ORI/JR trampoline + ELF `$readmemh` loader | (c) TB-only scaffolding | **No** | Real BIOS boot + program loader |

**The crisp rule:** the CPU core contains *faithful instructions and the exception-entry mechanism, and nothing else.* Every syscall service moves to a BIOS/HLE companion. Every fabricated flag moves to the async-hardware layer (and until that hardware exists, it stays in the oracle/TB — never in the real core).

---

## 3. Track A — EE Behavioral Oracle

**Role: discovery only. This is `ee_core_stub` as it exists today, plus the ELF runner harness.**

Track A continues exactly as Ch67→Ch305 did: when a new game/BIOS path blocks, Track A finds out *why* and *what is missing*, cheaply, by adding the minimum stub behavior to push past the blocker. It is allowed to lie (the `$v0=1` fib, the bit-17 fake, the TB poke) because its job is to *map the territory*, not to be the territory.

**Output: a living checklist.** Track A's deliverable is not silicon — it is three growing lists:

1. **Opcode checklist** — every instruction a real workload touches, with required fidelity (see §6).
2. **Syscall checklist** — every EE kernel service number, its observed arg shape, and its required return contract.
3. **MMIO checklist** — every device region touched (DMAC global/per-channel, INTC, timers, GIF, SIF), with the access pattern.

These lists are the *specification* Track B builds to. Every entry on them is evidence-backed by a real boot trace, which is worth more than any datasheet table because it tells us what *actually matters* for the games we run.
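One way to keep the checklist machine-checkable is to tag every entry with its §2 layer, so that "graduates to Track B" is derived rather than hand-maintained. The sketch below is illustrative only: `Layer`, `ChecklistEntry`, and the free-form `evidence` field are hypothetical names, not an existing project format.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CPU = "a"          # CPU-architectural
    KERNEL_HLE = "b"   # BIOS/kernel HLE
    ASYNC_HW = "c"     # async-hardware stand-in or TB-only scaffolding


@dataclass(frozen=True)
class ChecklistEntry:
    name: str       # mnemonic, syscall number, or MMIO region
    layer: Layer
    fidelity: str   # e.g. "faithful", "low32_approx", "hle_or_shim"
    evidence: str   # hypothetical free-form pointer to the observing chapter or trace

    @property
    def graduates(self) -> bool:
        # Only CPU-architectural entries may enter the Track B core.
        return self.layer is Layer.CPU


entries = [
    ChecklistEntry("DADDU", Layer.CPU, "low32_approx", "qbert boot trace"),
    ChecklistEntry("syscall 0x77", Layer.KERNEL_HLE, "hle_or_shim", "Ch299"),
    ChecklistEntry("syscall 0x7A bit-17", Layer.ASYNC_HW, "hle_or_shim", "Ch294"),
]
track_b = [entry.name for entry in entries if entry.graduates]
print(track_b)
```

Deriving `graduates` from the layer tag means an oracle hack cannot be marked as graduating without first being reclassified as CPU-architectural, which is exactly the review step the rule below demands.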

**The one inviolable rule:** Track A output must **never be mistaken for the CPU.** Specifically:
- An oracle hack (`$v0=1`, bit-17, TB poke) is a *flag that hardware is missing*, not a feature to copy. When Track B implements the real mechanism, the corresponding oracle hack must be **backed out**, and a TB must prove the real mechanism produces the same observable result the hack faked.
- Any conclusion drawn "after the Ch215 shim fires" must be labeled "under jmp_buf fallback semantics" (per the Ch269 finding). Track A conclusions downstream of a known fib are suspect by construction.

---

## 4. Track B — Synthesizable EE Core

**A new, clean RTL core (`rtl/ee/ee_core.sv`, distinct from `ee_core_stub.sv`), built deliberately to the Track A checklist.**

### 4.1 The first synthesizable subset (concrete)

Scope the first Track B core to exactly what qbert boot proves is needed, and no more:

- **Fetch / decode / retire**: handshaked instruction fetch over the existing BIU/memory-map ports; fully combinational decode (the `is_*` assign pile is fine).
- **32-bit integer ALU**: SLL/SRL/SRA/SLLV/SRLV/SRAV, ADD/ADDU/SUB/SUBU, AND/OR/XOR/NOR, SLT/SLTU, all immediate forms (ADDI/ADDIU/SLTI/SLTIU/ANDI/ORI/LUI). **Add the Arithmetic Overflow trap** for ADD/SUB/ADDI (the stub defers it; a real core must trap, Cause.ExcCode=12).
- **HI/LO**: MFHI/MFLO/MTHI/MTLO, MULTU (infers DSP), and DIVU **as a multi-cycle iterative divider FSM** (not the combinational `/`+`%` — see §4.3).
- **Load/store**: LB/LH/LW/LBU/LHU/SB/SH/SW with AdEL/AdES alignment exceptions, plus multi-beat LD/LQ/SD/SQ via the proven `sq_beat` counter pattern.
- **Branches + delay slots**: BEQ/BNE/BLEZ/BGTZ, REGIMM BLTZ/BGEZ, branch-likely BEQL/BNEL (squash semantics), jumps J/JAL/JR/JALR. Keep the `branch_pending` latch model.
- **128-bit GPR + MMI subset**: `gpr128[0:31]` and PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH. **Gate this behind a parameter** (`EE_ENABLE_MMI`) so a minimal build can fall back to a 32×32 regfile and save ~4096 FFs.
- **COP0**: MFC0/MTC0 for the 5 modeled regs + the proper **exception-entry mechanism** (EPC save, Cause.ExcCode, BEV vectoring) and **ERET** (reconciled against the stub's R3000-style RFE — R5900 uses EXL/ERL/EPC). SYNC and CACHE are faithful no-ops on a cacheless in-order core.

**Explicitly out of the first subset:** the syscall $v1 table (moves to a BIOS/HLE companion fed by the real SYSCALL exception), COP0 64-bit upper lanes beyond what MMI needs, FPU/COP1, VU0/VU1 macro-mode, and full TLB. These are later chapters or separate tracks.
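The delay-slot and branch-likely behavior in the subset above can be modeled with a single pending-branch latch. The following is a minimal behavioral sketch, not the stub's RTL: the tuple-based program format and the `run` helper are assumptions made for illustration, PCs are word indices, and a branch placed in a delay slot is not handled.

```python
def run(program, max_steps=32):
    """Execute a toy program and return the list of executed word indices."""
    pc = 0
    branch_pending = False   # models the single latch described in the text
    branch_target = 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        trace.append(pc)
        # If a branch is pending, this instruction is its delay slot.
        next_pc = branch_target if branch_pending else pc + 1
        branch_pending = False
        if op == "branch":               # args: (taken, target, likely)
            taken, target, likely = args
            if taken:
                branch_pending, branch_target = True, target
            elif likely:
                next_pc = pc + 2         # not-taken branch-likely squashes its delay slot
        pc = next_pc
    return trace


prog = [
    ("branch", True, 3, False),   # 0: ordinary branch, taken, target 3
    ("nop",),                     # 1: delay slot, always executes
    ("nop",),                     # 2: skipped by the taken branch
    ("branch", False, 0, True),   # 3: branch-likely, not taken
    ("nop",),                     # 4: delay slot, squashed
    ("nop",),                     # 5: execution resumes here
]
print(run(prog))
```

The only difference between BEQ/BNE and BEQL/BNEL in this model is the not-taken case: the ordinary branch still executes its delay slot, while the likely form skips it.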

### 4.2 Recommended microarchitecture: **start multicycle/interpreter-style, pipeline later**

Keep the current 8-state FSM shape (S_IFETCH_REQ → S_IFETCH_WAIT → S_EXECUTE → optional S_MEM_*; drop the two Ch215 shim states). **Reasons:**

1. **It already synthesizes cleanly.** The synthesizability assessment is unambiguous: clean synchronous `always_ff`, handshaked ports, no latches (both `unique case` blocks carry defaults), constant-bound loops. There are *no language-level blockers*.
2. **It is the correct altitude for first-silicon correctness.** A multicycle core has no hazards, no forwarding, no branch prediction — delay slots are a single `branch_pending` latch. This is the smallest correct design, and correctness-first is the only sane order when the goal is "prove a real R5900 RTL works."
3. **Pipelining is a pure-performance follow-on**, addable once the multicycle core passes the full TB suite and boots qbert. The R5900 is a dual-issue in-order pipeline; that is a *known, bounded* later effort, not a prerequisite for graduation.
4. **It matches the proven `iop_core_stub` shape**, so the platform integration patterns already exist.

Minimum ~4 cycles/instruction is acceptable for bring-up. The DE25-Nano has the headroom.

### 4.3 What must be stripped / gated for synthesis

From the synthesizability assessment, ranked:

- **STRIP_HW_DIVIDER=1 is mandatory** for any fit. The inferred combinational divider is the documented ~32 ns STA critical path (Ch162). Track B must replace it with a **multi-cycle sequential divider FSM** if DIVU semantics are needed (they are — qbert uses it).
- **Strip the trace port cluster** (`ev_valid/ev_subsys/ev_event/ev_arg0-3/ev_flags` + the `retired_*` shadow registers + the divu/multu trace arms). These are pure observability (~4×64 + 32 + several 32-bit FFs of dead weight) that force the synthesizer to keep otherwise-dead arg-computation logic. Replace with a thin, optional debug-readout port if needed.
- **Gate the gpr128 shadow** (`EE_ENABLE_MMI`). 32×128 = 4096 FFs is the dominant flop cost and Quartus will build it in ALMs (async multi-port read), not M20K. Keep only if MMI/quadword is in scope.
- **The CH215 jmp_buf FSM and the EE_SYSCALL_HLE dispatcher do not enter Track B at all.** In the stub they are param-gated OFF; in Track B they are simply absent — they move to the BIOS/HLE companion.
- `unique case`, constant-bound for-loops: **keep** (not blockers; defaults prevent latches).
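To illustrate what replaces the combinational `/`+`%`, here is a behavioral sketch of a restoring divider that produces one quotient bit per step, so each loop iteration corresponds to one cycle of a sequential FSM. It is a reference model under stated assumptions, not the Track B design: the function name is hypothetical, and the divide-by-zero result shown is simply what this algorithm yields, not a claim about R5900 silicon.

```python
MASK32 = 0xFFFFFFFF


def divu_iterative(dividend: int, divisor: int):
    """Restoring division over 32 steps. Returns (LO, HI) = (quotient, remainder)."""
    dividend &= MASK32
    divisor &= MASK32
    remainder = 0
    quotient = 0
    for bit in range(31, -1, -1):
        # Shift the next dividend bit into the partial remainder (needs 33 bits).
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient & MASK32, remainder & MASK32


print(divu_iterative(100, 7))
```

In hardware terms the loop body is one 33-bit compare/subtract per cycle, which is what removes the long combinational path. With a zero divisor this model returns an all-ones quotient and the dividend as remainder; whether Track B should match that is a checklist question for the oracle.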

---

## 5. Validation Strategy

**The existing ~50 focused `tb_ee_core_*` benches + the qbert boot path ARE Track B's compliance harness.** This is the single strongest asset we have, and it directly answers the owner's worry.

### 5.1 Why the existing suite transfers

The compliance inventory confirms **all 50 focused TBs are reusable** with only mechanical adaptation. The uniform pattern is port-driven and has three steps: (1) each TB hand-assembles a tiny program into the BIOS/bootstrap slots, (2) lets the DUT fetch/decode/execute *through the public memory-map ports* to a PASS-syscall halt, then (3) checks results. **Step 2 (execution) is already fully port-driven: there are no internal pokes to make the core run.** Many TBs embed an in-program BNE/BEQ-to-FAIL self-check, so the expected architectural behavior is encoded in the program itself and is checkable purely from observable halt-PC/RAM. There is a strict 1:1 opcode→TB discipline (Ch271–Ch293), so **there are no implemented-but-untested opcodes.**
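As an illustration of what "hand-assembles a tiny program" with an in-program self-check looks like, the sketch below encodes a few instructions from the standard MIPS R-type and I-type field layouts. The helper names and the specific program are hypothetical and are not taken from any existing TB; the PASS-syscall convention is left out because its details live in the benches.

```python
def r_type(funct, rs, rt, rd, sa=0):
    """SPECIAL-class instruction: opcode 0, fields rs/rt/rd/sa/funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def i_type(opcode, rs, rt, imm):
    """Immediate-class instruction: opcode/rs/rt plus a 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ADDIU, BNE, FUNCT_ADDU = 0x09, 0x05, 0x21
T0, T1, T2, T3 = 8, 9, 10, 11   # conventional MIPS register numbers

program = [
    i_type(ADDIU, 0, T0, 5),        # addiu $t0, $zero, 5
    i_type(ADDIU, 0, T1, 7),        # addiu $t1, $zero, 7
    r_type(FUNCT_ADDU, T0, T1, T2), # addu  $t2, $t0, $t1
    i_type(ADDIU, 0, T3, 12),       # addiu $t3, $zero, 12  (expected sum)
    i_type(BNE, T2, T3, 2),         # bne   $t2, $t3, +2 words (to a FAIL path)
    0x00000000,                     # nop in the branch delay slot
]
print([f"{word:08X}" for word in program])
```

Because the expected value is compared by the program itself, the only thing the bench has to observe is which halt the core reaches.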

### 5.2 The two required adaptations (both mechanical, both bounded)

1. **Hierarchical-peek → architectural readout.** Most TBs read the *post-halt* result via `u_core.regfile[...]` (and `u_ee_ram.mem[]`/`u_bios.mem[]` for stores). Against a renamed/synthesized core these peeks break. Fix: change each test program to **store its result register to a known RAM/MMIO address** and read it back through the map port. This is a per-TB swap that does not change the encoded expected behavior. Store-class TBs (memops, sb, sh, sd, sq, lq, ld) already verify partly through `u_ee_ram.mem[]` and are closest to a real memory boundary.

2. **Stub-accurate golden values → architecture-accurate golden values.** Several TBs deliberately encode *simplified* semantics: DADDU/DSUBU/DSLL are checked as low-32 only. The SQ/SD/LQ/LD TBs carry stale comments describing truncated widths even though those operations are now full-width via gpr128, so their width expectations need re-checking as well. Against a true 64/128-bit Track B core, the low-32 expectations would FAIL and **must be upgraded to full-width**. The TBs are reusable as scaffolding and as behavior encodings; their golden values need a width pass.
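The width pass in item 2 is easiest to see on a single carry out of bit 31. The sketch below contrasts a low-32 approximation with a full 64-bit DADDU; both function names are hypothetical, and how the stub actually stores its truncated result (for example, whether it sign-extends) is not modeled here.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def daddu_low32(a: int, b: int) -> int:
    # Stub-style approximation: only the low 32 bits are computed.
    return (a + b) & MASK32


def daddu_full(a: int, b: int) -> int:
    # Architectural DADDU: 64-bit wraparound add with no overflow trap.
    return (a + b) & MASK64


a, b = 0x00000000FFFFFFFF, 1
print(hex(daddu_low32(a, b)), hex(daddu_full(a, b)))
```

A golden value of `0x0` is correct for the low-32 model and wrong for a 64-bit core, which must produce `0x100000000`. Inputs that never carry out of bit 31 cannot tell the two apart, so upgraded TBs need at least one operand pair that does.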

### 5.3 Known coverage gaps to close (new TBs for Track B)

- **gpr128 invariant**: add a dedicated TB asserting `gpr128[i][31:0] === regfile[i]` directly (today only transitive via PCPYUD/etc.).
- **COP0 exception state**: EPC save/restore, ERET, Cause.ExcCode encoding — no focused TB today beyond BEV and Count. This is the *most important* new TB, because the SYSCALL exception-entry mechanism is the CPU's only legitimate connection to the kernel.
- **Arithmetic Overflow trap** for ADD/SUB/ADDI (stub defers it; Track B implements it).
- **DI positive semantics** (today only a negative/still-trapping companion in tb_ee_core_ei).
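For the overflow-trap TB, the expected trap condition can be computed with the standard two's-complement rule: signed overflow occurs when both operands share a sign and the result's sign differs. This is a reference sketch for generating golden values, with a hypothetical function name; it models the 32-bit ADD/ADDI case only, and `EXC_OV` uses the ExcCode 12 named in §4.1.

```python
EXC_OV = 12  # Arithmetic Overflow ExcCode, as stated in section 4.1


def add_with_overflow(a: int, b: int):
    """Return (result32, overflow) for a 32-bit signed add."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a + b) & 0xFFFFFFFF
    # Overflow iff the operands have the same sign and the result sign differs.
    overflow = (~(a ^ b) & (a ^ result) & 0x80000000) != 0
    return result, overflow


print(add_with_overflow(0x7FFFFFFF, 1))
```

On overflow the destination register must not be written and the core should take the exception instead, so a TB needs both a trapping case and a near-miss such as `0xFFFFFFFF + 1`, which wraps as unsigned but does not overflow as signed.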

### 5.4 Directly addressing the owner's worry

> *"Are we even able to verify a real R5900 RTL would work / model the hardware to finalize?"*

**Yes — and we are unusually well-positioned to, for three concrete reasons:**

1. **We have a behavioral golden model.** Track A (the stub) is, for the scoped subset, a working executable specification. Track B can be **co-simulated against Track A instruction-by-instruction**: run the same program through both, compare retire-by-retire (PC, GPR writeback, memory effects). Divergence is an immediate, localized bug report. This is the gold-standard CPU-verification methodology (lockstep against a reference model), and we already own the reference model.

2. **We have an evidence-backed requirements list.** We are not guessing what an R5900 needs — qbert's 1.49M-instruction boot trace *tells us* exactly the opcode/syscall/MMIO surface that matters. Track B's "done" is defined by a real workload, not a datasheet wishlist.

3. **We have a port-driven, near-complete compliance suite** (§5.1) that runs entirely through the public bus interface — i.e., it validates the core the same way the rest of the system will use it.
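The lockstep comparison in reason 1 reduces to finding the first retire record where the two traces disagree. A minimal sketch follows, assuming each core can export a per-instruction record of PC, GPR writeback, and memory write; the `Retire` record layout and the sample values are hypothetical, not an existing trace format.

```python
from itertools import zip_longest
from typing import NamedTuple, Optional, Tuple


class Retire(NamedTuple):
    pc: int
    gpr_index: Optional[int]              # None when no register is written
    gpr_value: Optional[int]
    mem_write: Optional[Tuple[int, int]]  # (address, data) or None


def first_divergence(oracle, dut):
    """Return (index, expected, actual) at the first mismatch, else None."""
    for index, (expected, actual) in enumerate(zip_longest(oracle, dut)):
        if expected != actual:
            return index, expected, actual
    return None


oracle_trace = [
    Retire(0xBFC00000, 8, 0x00000005, None),
    Retire(0xBFC00004, 9, 0x00000007, None),
    Retire(0xBFC00008, None, None, (0x00001000, 0x0000000C)),
]
dut_trace = list(oracle_trace)
dut_trace[1] = Retire(0xBFC00004, 9, 0x00000008, None)  # injected writeback bug
report = first_divergence(oracle_trace, dut_trace)
print(report)
```

Using `zip_longest` means a core that halts early or runs past the oracle is reported as a divergence at the first missing record instead of silently passing. Any prefix that runs through a known oracle fib would need to be excluded or labeled, per §3.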

**The honest qualifier:** "verify a *real R5900*" means verify the *scoped subset we implement*, in lockstep against the oracle and the TB suite, booting the workloads we target. It does **not** mean bit-exact cycle-accuracy against Sony silicon (multiply/divide latency, dual-issue timing, cache timing are not modeled and are out of scope for first-silicon). For a "boots and runs the game correctly" goal — which is the project goal — that scope is sufficient and verifiable. For a "cycle-perfect deterministic netplay" goal it is not, and we should not pretend otherwise.

---

## 6. Master Opcode / Feature Checklist

This is the deliverable Codex asked for: every decoded behavior, its fidelity, whether it is synthesizable, and whether it graduates to the Track B CPU core.

| Mnemonic | Encoding | Fidelity | Synth | Graduates |
|---|---|---|---|---|
| SLL | SPECIAL 0x00 | faithful | yes | **Yes** |
| SRL | SPECIAL 0x02 | faithful | yes | **Yes** |
| SRA | SPECIAL 0x03 | faithful | yes | **Yes** |
| SLLV | SPECIAL 0x04 | faithful | yes | **Yes** |
| SRLV | SPECIAL 0x06 | faithful | yes | **Yes** |
| SRAV | SPECIAL 0x07 | faithful | yes | **Yes** |
| JR | SPECIAL 0x08 | faithful | yes | **Yes** |
| JALR | SPECIAL 0x09 | faithful | yes | **Yes** |
| SYSCALL | SPECIAL 0x0C | hle_or_shim | needs_work | **No** (only the exception-entry mechanism graduates; the $v1 table is kernel HLE) |
| SYNC | SPECIAL 0x0F | faithful | yes | **Yes** |
| MFHI | SPECIAL 0x10 | faithful | yes | **Yes** |
| MFLO | SPECIAL 0x12 | faithful | yes | **Yes** |
| MULTU | SPECIAL 0x19 | faithful | yes | **Yes** (infers DSP; latency not modeled) |
| DIVU | SPECIAL 0x1B | faithful | needs_work | **Yes** (needs multi-cycle iterative divider; STRIP_HW_DIVIDER for fit) |
| DSLL | SPECIAL 0x38 | low32_approx | yes | **Yes** (needs full 64-bit shifter + DSLL32) |
| ADD | SPECIAL 0x20 | faithful | yes | **Yes** (needs overflow trap, ExcCode 12) |
| ADDU | SPECIAL 0x21 | faithful | yes | **Yes** |
| DADDU | SPECIAL 0x2D | low32_approx | yes | **Yes** (needs full 64-bit adder) |
| SUB | SPECIAL 0x22 | faithful | yes | **Yes** (needs overflow trap) |
| SUBU | SPECIAL 0x23 | faithful | yes | **Yes** |
| DSUBU | SPECIAL 0x2F | low32_approx | yes | **Yes** (needs full 64-bit subtract) |
| AND | SPECIAL 0x24 | faithful | yes | **Yes** |
| OR | SPECIAL 0x25 | faithful | yes | **Yes** |
| XOR | SPECIAL 0x26 | faithful | yes | **Yes** |
| NOR | SPECIAL 0x27 | faithful | yes | **Yes** |
| SLT | SPECIAL 0x2A | faithful | yes | **Yes** |
| SLTU | SPECIAL 0x2B | faithful | yes | **Yes** |
| BLTZ | REGIMM rt=0x00 | faithful | yes | **Yes** (BLTZAL link variant not modeled) |
| BGEZ | REGIMM rt=0x01 | faithful | yes | **Yes** (BGEZAL link variant not modeled) |
| J | 0x02 | faithful | yes | **Yes** |
| JAL | 0x03 | faithful | yes | **Yes** |
| BEQ | 0x04 | faithful | yes | **Yes** |
| BNE | 0x05 | faithful | yes | **Yes** |
| BLEZ | 0x06 | faithful | yes | **Yes** |
| BGTZ | 0x07 | faithful | yes | **Yes** |
| ADDI | 0x08 | faithful | yes | **Yes** (needs overflow trap) |
| ADDIU | 0x09 | faithful | yes | **Yes** |
| SLTI | 0x0A | faithful | yes | **Yes** |
| SLTIU | 0x0B | faithful | yes | **Yes** |
| ANDI | 0x0C | faithful | yes | **Yes** |
| ORI | 0x0D | faithful | yes | **Yes** |
| LUI | 0x0F | faithful | yes | **Yes** |
| MFC0 | COP0 rs=0x00 | low32_approx | yes | **Yes** (only 5 regs modeled; Count at full clock vs half) |
| MTC0 | COP0 rs=0x04 | low32_approx | yes | **Yes** (partial Status/Cause fields; Count write dropped) |
| RFE | COP0/CO funct 0x10 | faithful | yes | **Yes** (reconcile vs R5900 ERET) |
| EI | COP0/CO 0x42000038 | low32_approx | yes | **Yes** (should set Status.EIE; companion DI still traps) |
| LB | 0x20 | faithful | yes | **Yes** |
| LH | 0x21 | faithful | yes | **Yes** |
| LW | 0x23 | faithful | yes | **Yes** |
| LBU | 0x24 | faithful | yes | **Yes** |
| LHU | 0x25 | faithful | yes | **Yes** |
| LD | 0x37 | faithful | yes | **Yes** (full 64-bit via gpr128) |
| LQ | 0x1E | faithful | yes | **Yes** (full 128-bit via gpr128) |
| SB | 0x28 | faithful | yes | **Yes** |
| SH | 0x29 | faithful | yes | **Yes** |
| SW | 0x2B | faithful | yes | **Yes** |
| SD | 0x3F | faithful | yes | **Yes** (full 64-bit via gpr128; stale "beat1=0" comments) |
| SQ | 0x1F | faithful | yes | **Yes** (full 128-bit via gpr128; stale "beats 1-3=0" comments) |
| CACHE | 0x2F | hle_or_shim | yes | **Yes** (no-op correct for cacheless model) |
| PCPYLD | MMI2 sa 0x0E | faithful | yes | **Yes** (full-128) |
| PSUBB | MMI0 sa 0x09 | faithful | yes | **Yes** (full-128, no cross-byte borrow) |
| PNOR | MMI3 sa 0x13 | faithful | yes | **Yes** (full-128) |
| PAND | MMI2 sa 0x12 | faithful | yes | **Yes** (full-128) |
| PCPYUD | MMI3 sa 0x0E | faithful | yes | **Yes** (reads upper 64; drove gpr128) |
| PCPYH | MMI3 sa 0x1B | faithful | yes | **Yes** (full-128 halfword broadcast) |
| BEQL | 0x14 | faithful | yes | **Yes** (branch-likely squash) |
| BNEL | 0x15 | faithful | yes | **Yes** (branch-likely squash) |
| NOP | 0x00000000 | faithful | yes | **Yes** |

**Tally: 63 of 67 decoded behaviors graduate to the Track B CPU core essentially as-is.** (The table has 68 rows because NOP, `0x00000000`, is the all-zero encoding of SLL rather than a separate behavior.) SYSCALL is the only true non-graduate: only its exception-entry mechanism graduates, and the $v1 table is kernel HLE. CACHE graduates as an accepted no-op. The genuinely-approximate-but-graduating ops are DADDU/DSUBU/DSLL (need full 64-bit datapath) and MFC0/MTC0/EI (need fuller COP0 coverage). **The MMI/128-bit infrastructure is the strongest, most faithful part of the stub and is genuinely synthesizable.**
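The Encoding column above follows the usual MIPS field layout: primary opcode in bits 31:26, SPECIAL function in bits 5:0, REGIMM selector in `rt`, and, for the R5900 MMI class (primary opcode `0x1C`), a sub-class in bits 5:0 with the operation in the `sa` field. The sketch below maps a raw instruction word to the table's notation; `classify` is a hypothetical helper, not part of the stub.

```python
MMI_CLASSES = {0x08: "MMI0", 0x28: "MMI1", 0x09: "MMI2", 0x29: "MMI3"}


def classify(word: int) -> str:
    """Render an instruction word in the checklist's Encoding notation."""
    opcode = (word >> 26) & 0x3F
    funct = word & 0x3F
    if opcode == 0x00:
        return f"SPECIAL 0x{funct:02X}"
    if opcode == 0x01:
        return f"REGIMM rt=0x{(word >> 16) & 0x1F:02X}"
    if opcode == 0x10:
        rs = (word >> 21) & 0x1F
        if rs & 0x10:                      # CO bit set: function-coded COP0 op
            return f"COP0/CO funct 0x{funct:02X}"
        return f"COP0 rs=0x{rs:02X}"
    if opcode == 0x1C and funct in MMI_CLASSES:
        return f"{MMI_CLASSES[funct]} sa 0x{(word >> 6) & 0x1F:02X}"
    return f"0x{opcode:02X}"


for sample in (0x00000000, 0x0000000C, 0x42000038, 0x70000389):
    print(f"{sample:08X} -> {classify(sample)}")
```

The all-zero word classifies as `SPECIAL 0x00`, the same slot as SLL, and `0x42000038` lands in the COP0/CO group alongside RFE.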

---

## 7. Go / No-Go + Recommended Next Chapters

**Does a scoped R5900 subset fit Agilex 5 and pass the TBs? — YES, with the §4.3 caveats honored.**

- **Fit**: Agilex 5 has ample ALM capacity for a multicycle core of this size (the exact count depends on the DE25-Nano's device variant). The dominant cost is gpr128 (~4096 FFs), which is wasteful but not fatal, and gateable. MULTU infers DSP (fine). The divider must be stripped/replaced. The trace cluster should be stripped. No structural blocker.
- **TBs**: the suite passes in simulation against the stub as-is; a stripped/gated synthesis config (no trace, divider replaced, HLE absent) needs the §5.2 adaptations (architectural readout + full-width golden values) and the §5.3 new TBs. Hence **go_with_caveats**, not unqualified go.

### Track A — next chapters (discovery only)

- **Ch307**: Autopsy the next qbert wait loop (post-Ch294/0x7A unblock; the steady-state hot-PC, e.g. the suspected `0x00106154` region). Classify the gate (memory flag? MMIO poll? handler-fire?) the same way Ch294 did. **No RTL** — produce the checklist entry, not a hack, unless a one-shot stub is the cheapest way to see the *next* blocker.
- **Ch308 (A)**: Begin backing out fabrications into the async-hardware layer: replace the Ch299 TB poke with the real RegisterLibraryEntries (0x77) memory side effect, modeled in the HLE companion, and prove qbert still progresses. This *de-risks* Track B by validating the real mechanism in the cheap environment first.
- **Ch309 (A)**: Capture a full lockstep retire-trace export from the oracle for a fixed qbert prefix, to serve as Track B's co-sim golden reference (§5.4.1).

### Track B — next chapters (the real core)

- **Ch308 (B)**: Scaffold `rtl/ee/ee_core.sv` — a clean multicycle skeleton: fetch/decode/retire FSM + 32-bit ALU (SLL…SLTU, ADDI…LUI) + HI/LO + branches/delay slots. **Validate immediately against the existing ALU/shift/branch TBs** (tb_ee_core_shift, _varshift, _rtype_logic, _rtype_addu, _add_sub, _slt, _slti, _branch_zero, _jal, _jalr) re-pointed to architectural readout. No MMI, no MMIO, no syscalls yet.
- **Ch309 (B)**: Add load/store (LB…SW + multi-beat LD/LQ/SD/SQ) with AdEL/AdES, and the multi-cycle DIVU FSM (replacing the combinational divider). Validate against _memops, _lb/_lbu/_lh/_lhu/_sb/_sh, _ld/_lq/_sd/_sq, _align/_align_exc, _divu_mflo, _multu_mflo.
- **Ch310 (B)**: Add the COP0 exception-entry mechanism (EPC/Cause.ExcCode/BEV vectoring) + ERET, plus the gated gpr128/MMI subset. Add the **new** TBs: gpr128 invariant, COP0 exception state, overflow trap. Wire the SYSCALL exception to vector into the BIOS/HLE companion (not an internal $v1 switch). First lockstep co-sim run against the Ch309(A) golden trace.
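For the Ch310 (B) exception-entry work, the state changes can be written down as a small reference model before any RTL exists. The sketch below assumes the conventional MIPS layout (Status.EXL in bit 1, Status.BEV in bit 22, Cause.BD in bit 31, ExcCode in Cause bits 6:2, common vector at offset `0x180`); these bit positions and vector addresses are assumptions to be confirmed against the R5900 documentation, and the dictionary-based `cop0` state is purely illustrative.

```python
STATUS_EXL = 1 << 1
STATUS_BEV = 1 << 22
CAUSE_BD = 1 << 31
EXC_SYSCALL = 8


def enter_exception(cop0, pc, exc_code, in_delay_slot):
    """Return (new_cop0, vector) for a general exception taken at pc."""
    cop0 = dict(cop0)
    if not cop0["status"] & STATUS_EXL:
        # EPC points at the branch when the faulting instruction is its delay slot.
        cop0["epc"] = (pc - 4) & 0xFFFFFFFF if in_delay_slot else pc
        if in_delay_slot:
            cop0["cause"] |= CAUSE_BD
        else:
            cop0["cause"] &= ~CAUSE_BD
    cop0["cause"] = (cop0["cause"] & ~(0x1F << 2)) | (exc_code << 2)
    cop0["status"] |= STATUS_EXL
    base = 0xBFC00200 if cop0["status"] & STATUS_BEV else 0x80000000
    return cop0, base + 0x180


def eret(cop0):
    """Return (new_cop0, resume_pc): clear EXL and resume at EPC."""
    cop0 = dict(cop0)
    cop0["status"] &= ~STATUS_EXL
    return cop0, cop0["epc"]


state = {"status": STATUS_BEV, "cause": 0, "epc": 0}
state, vector = enter_exception(state, 0x00100010, EXC_SYSCALL, False)
print(hex(vector), hex(state["epc"]))
```

Note what is absent: nothing here inspects `$v1` or decides what the syscall does. Advancing EPC past the SYSCALL instruction before returning is conventionally the handler's job, which keeps that policy in the BIOS/HLE companion and out of the core.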

---

## 8. Risks / Rabbit-Holes to Avoid

**Be honest about what could make this unrecoverable — and what is merely hard.**

1. **THE primary risk: conflating oracle hacks into the real core.** This is the single thing that turns a bounded project into an unrecoverable one. If the `$v0=1` fib, the bit-17 fake, or a syscall stub leaks into `ee_core.sv`, Track B becomes a second oracle wearing a CPU costume — and we will chase phantom "blockers" (the Ch264–Ch268 thunk-chain hunt is the cautionary tale: 5 chapters chasing a treadmill that was *our own shim*, per Ch269). **Mitigation: the §2 rule is non-negotiable — the CPU core contains faithful instructions + exception entry, full stop. Every backed-out hack gets a TB proving the real mechanism reproduces the faked result.**

2. **GS fillrate and VU0/VU1 parallelism are separate mountains — do not let them contaminate the EE-core decision.** The EE *integer/MMI core* is tractable and is what this document scopes. The GS (rasterizer fillrate, VRAM bandwidth — see 0006-vram-roadmap) and the VUs (two SIMD vector coprocessors with their own microcode, macro/micro mode, and tight EE coupling) are each *larger* than the EE core and have their own roadmaps. **Risk: scope creep that bundles "boot qbert's CPU code" with "render qbert's graphics." Keep them separate; the EE core graduating does NOT imply the frame renders.** FPU/COP1 is a smaller but real adjacent piece, also deferred.

3. **Cycle-accuracy ambition.** If the goal silently drifts from "boots and runs correctly" to "cycle-perfect," the project becomes unbounded (multiply/divide latency, dual-issue scheduling, cache timing, bus contention). **Mitigation: §5.4 names the scope explicitly. First-silicon is behavioral correctness, not timing fidelity.**

4. **The divider critical path.** Known, measured (~32 ns, Ch162), and already gated. The only risk is *forgetting* to replace it with a sequential FSM when DIVU semantics are required. Tracked as a Ch309(B) deliverable.

5. **TB golden-value drift.** Several TBs encode stub-accurate (low-32 / truncated) golden values. If Track B is validated against *unmodified* stub TBs, a correct full-width core will FAIL spuriously, or worse, a buggy core will PASS against a too-lax expectation. **Mitigation: the §5.2 width pass is a prerequisite, not an afterthought.**

6. **Hierarchical-peek brittleness.** Not unrecoverable, but if ignored it blocks the entire compliance suite from running against the new core. Mechanical (§5.2.1) but must be budgeted.

**Bottom line: the EE core itself is tractable, bounded, and validatable.** We have a golden behavioral model, an evidence-backed requirements list, and a near-complete port-driven compliance suite — three assets most from-scratch CPU projects never have. The path is not a rabbit hole *provided* we hold the layer separation. The unrecoverable scenarios all share one root cause — letting the oracle and the CPU be the same artifact. This document exists to make sure they never are.