thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)

RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-29 20:10:50 -04:00


0007 — EE Core Reality Checkpoint (Ch306)

Status: Accepted
Date: 2026-05-28
Chapter: Ch306 (strategic recon / design; no RTL)
Supersedes: nothing. Companion to 0006-vram-roadmap.md.
Authors: lead architect, with Codex co-review.

1. Executive Summary

rtl/ee/ee_core_stub.sv (2155 lines) is a behavioral compatibility oracle, not a CPU.

It is an interpreter-style multicycle FSM that has been grown chapter-by-chapter (Ch67 → Ch305) to boot qbert.elf ~1.49M instructions deep by adding, one blocker at a time, exactly the opcodes, syscall HLE cases, MMIO stubs, and testbench-side pokes that the next blocker demanded. It has been extraordinarily productive as a discovery instrument: it told us precisely which 67 instruction behaviors a real PS2 game touches during boot, which syscalls the EE kernel must service, and which MMIO regions matter. That is its value, and that value is real.

But it is now load-bearing in a way it was never designed to be. The owner and Codex have called the key question correctly: we are about to confuse the oracle for the deliverable. The stub mixes three layers that real hardware keeps strictly separate (CPU / BIOS-kernel / async-hardware), and several of the things that make qbert "boot" are fabrications — a $v0=1 longjmp fib (Ch215) that created the BIOS treadmill we then chased for 50 chapters, an $a0-aware bit-17 syscall return (Ch294/0x7A) that fakes an interrupt that never fired, and a testbench poke (Ch299) that writes 1 into qbert's private global from inside the TB.

Go / No-Go on a synthesizable R5900 subset: GO, with caveats.

A deliberately-scoped multicycle R5900 subset (fetch/decode, 32-bit ALU, load/store, branches + delay slots, HI/LO, and the existing gpr128/MMI subset) is straightforwardly synthesizable on Agilex 5 (DE25-Nano). There are no language-level blockers in the current RTL, the microarchitecture is a clean synchronous always_ff FSM with handshaked memory ports, and ~63 of the 67 decoded behaviors graduate essentially as-is. The path is bounded and validatable. The danger is not technical intractability — it is layer confusion: letting oracle hacks leak into the real core.

This document splits the work into two explicit, permanently-separate tracks and defines the graduation path.

Track A — EE Behavioral Oracle: keep ee_core_stub as a discovery-only instrument. Its output is a living opcode/syscall/MMIO checklist. It is never the CPU.
Track B — Synthesizable EE Core: a new, clean core built to the checklist Track A produces, validated against the existing ~50 focused EE TBs (re-pointed for full-width semantics).

2. The Three-Layer Separation

Real PS2 hardware keeps three things in three places: the CPU executes instructions; the BIOS/kernel ROM services syscalls and implements longjmp/_ReturnFromException; async hardware (INTC / DMAC / GS / VBLANK / SIF) produces the events and flags that kernel code polls. The stub collapses all three into one FSM. The table below re-classifies every feature in the inventories by where it actually belongs.

Stub feature	Layer	Graduates to Track B CPU?	Where it really belongs
SPECIAL ALU/shift/HILO set (SLL…SRAV, ADD…SLTU, MFHI/MFLO, MULTU, DIVU)	(a) CPU-architectural	Yes	CPU core
Immediate ALU (ADDI…LUI), branches (BEQ/BNE/BLEZ/BGTZ + REGIMM BLTZ/BGEZ + BEQL/BNEL), jumps (J/JAL/JR/JALR)	(a) CPU-architectural	Yes	CPU core
Loads/stores (LB/LH/LW/LBU/LHU + multi-beat LD/LQ/SD/SQ), SB/SH/SW	(a) CPU-architectural	Yes	CPU core
MMI subset (PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH) + gpr128 shadow	(a) CPU-architectural	Yes (if MMI in scope)	CPU core
COP0 MFC0/MTC0/RFE/EI, SYNC, CACHE	(a) CPU-architectural (partial)	Yes (needs widening)	CPU core; RFE↔ERET to reconcile
SYSCALL exception-entry mechanism (EPC / Cause.ExcCode=Sys / vector)	(a) CPU-architectural	Yes (the mechanism only)	CPU core
SYSCALL $v1 case table (0x3C EndOfHeap, 0x3D InitMainThread, 0x40, 0x64 FlushCache, 0x6B, 0x77, 0x78, 0x79, 0x13, 0x17, 0x16, 0x12)	(b) BIOS/kernel HLE	No	PS2 BIOS ROM, or a dedicated EE-kernel HLE companion module between CPU and memory map
Ch199 `_ReturnFromException(2)` RFE-on-syscall-8 shortcut	(b) BIOS/kernel HLE	No	BIOS kernel exception-return path (ROM). The status-stack pop is architectural; selecting it by syscall number is kernel behavior
Ch215 `jmp_buf` restore FSM (hardcoded base `0xA000B1E0`, 12-slot libc layout, forced `$v0=1`)	(b) BIOS/kernel HLE	No	BIOS ROM `longjmp()`. This `$v0=1` fib is the documented source of the Ch215 treadmill (Ch269). It is a workaround, not behavior
Syscall 0x7A `$a0`-aware bit-17 readiness return	(c) async-hardware stand-in	No	INTC/DMAC-completion/event delivery (real interrupt fires the flag). Labeled "Not architectural truth"
Ch299 TB-side library-ready poke (`useg_shadow_mem[0x4CA70]=1` on qbert-specific arg guard)	(c) async-hardware stand-in	No	Memory side effect of the RegisterLibraryEntries (0x77) kernel callback. Most fragile, ship-blocking hack in the inventory
Syscall 0x12/0x16 (Add/EnableDmacHandler) registration	(b) BIOS/kernel HLE → (c)	No	Kernel handler table; the enable arms real INTC/DMAC dispatch (unbuilt hardware)
Syscall default-case halt (`retired_flag_halt` → S_HALT, expose $v1/$a0-$a3)	(d) TB-only scaffolding	No	Diagnostic only; real CPU vectors to kernel
Trace port cluster (`ev_` + `retired_` shadows)	(d) TB-only scaffolding	No (strip)	Test instrumentation; no hardware counterpart
Per-syscall runner observers (snapshots, tuple tables, $a0 counters)	(d) TB-only scaffolding	No	Passive measurement; correct to live in the TB
BIOS reset-vector LUI/ORI/JR trampoline + ELF `$readmemh` loader	(d) TB-only scaffolding	No	Real BIOS boot + program loader

The crisp rule: the CPU core contains faithful instructions and the exception-entry mechanism, and nothing else. Every syscall service moves to a BIOS/HLE companion. Every fabricated flag moves to the async-hardware layer (and until that hardware exists, it stays in the oracle/TB — never in the real core).

3. Track A — EE Behavioral Oracle

Role: discovery only. This is ee_core_stub as it exists today, plus the ELF runner harness.

Track A continues exactly as Ch67→Ch305 did: when a new game/BIOS path blocks, Track A finds out why and what is missing, cheaply, by adding the minimum stub behavior to push past the blocker. It is allowed to lie (the $v0=1 fib, the bit-17 fake, the TB poke) because its job is to map the territory, not to be the territory.

Output: a living checklist. Track A's deliverable is not silicon — it is three growing lists:

Opcode checklist — every instruction a real workload touches, with required fidelity (see §6).
Syscall checklist — every EE kernel service number, its observed arg shape, and its required return contract.
MMIO checklist — every device region touched (DMAC global/per-channel, INTC, timers, GIF, SIF), with the access pattern.

These lists are the specification Track B builds to. Every entry on them is evidence-backed by a real boot trace, which is worth more than any datasheet table because it tells us what actually matters for the games we run.

The one inviolable rule: Track A output must never be mistaken for the CPU. Specifically:

An oracle hack ($v0=1, bit-17, TB poke) is a flag that hardware is missing, not a feature to copy. When Track B implements the real mechanism, the corresponding oracle hack must be backed out, and a TB must prove the real mechanism produces the same observable result the hack faked.
Any conclusion drawn "after the Ch215 shim fires" must be labeled "under jmp_buf fallback semantics" (per the Ch269 finding). Track A conclusions downstream of a known fib are suspect by construction.

4. Track B — Synthesizable EE Core

A new, clean RTL core (rtl/ee/ee_core.sv, distinct from ee_core_stub.sv), built deliberately to the Track A checklist.

4.1 The first synthesizable subset (concrete)

Scope the first Track B core to exactly what qbert boot proves is needed, and no more:

Fetch / decode / retire: handshaked instruction fetch over the existing BIU/memory-map ports; fully combinational decode (the is_* assign pile is fine).
32-bit integer ALU: SLL/SRL/SRA/SLLV/SRLV/SRAV, ADD/ADDU/SUB/SUBU, AND/OR/XOR/NOR, SLT/SLTU, all immediate forms (ADDI/ADDIU/SLTI/SLTIU/ANDI/ORI/LUI). Add the Arithmetic Overflow trap for ADD/SUB/ADDI (the stub defers it; a real core must trap, Cause.ExcCode=12).
HI/LO: MFHI/MFLO/MTHI/MTLO, MULTU (infers DSP), and DIVU as a multi-cycle iterative divider FSM (not the combinational /+% — see §4.3).
Load/store: LB/LH/LW/LBU/LHU/SB/SH/SW with AdEL/AdES alignment exceptions, plus multi-beat LD/LQ/SD/SQ via the proven sq_beat counter pattern.
Branches + delay slots: BEQ/BNE/BLEZ/BGTZ, REGIMM BLTZ/BGEZ, branch-likely BEQL/BNEL (squash semantics), jumps J/JAL/JR/JALR. Keep the branch_pending latch model.
128-bit GPR + MMI subset: gpr128[0:31] and PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH. Gate this behind a parameter (EE_ENABLE_MMI) so a minimal build can fall back to a 32×32 regfile and save ~4096 FFs.
COP0: MFC0/MTC0 for the 5 modeled regs + the proper exception-entry mechanism (EPC save, Cause.ExcCode, BEV vectoring) and ERET (reconciled against the stub's R3000-style RFE — R5900 uses EXL/ERL/EPC). SYNC and CACHE are faithful no-ops on a cacheless in-order core.
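The Arithmetic Overflow trap named in the ALU bullet reduces to a sign comparison on the 32-bit operands and result. A minimal Python reference sketch, useful for deriving golden values for an overflow TB; the function names are illustrative and do not come from the RTL:

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32_with_overflow(a: int, b: int) -> Tuple[int, bool]:
    """Return (low 32 bits of a + b, signed-overflow flag)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Overflow iff both operands share a sign and the result's sign differs.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN32)
    return result, overflow


def sub32_with_overflow(a: int, b: int) -> Tuple[int, bool]:
    """Return (low 32 bits of a - b, signed-overflow flag)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    # Overflow iff the operands differ in sign and the result's sign differs from a.
    overflow = bool((a ^ b) & (a ^ result) & SIGN32)
    return result, overflow
```

In a core that traps, the overflow flag would suppress the register writeback and raise ExcCode 12 instead; ADDU/SUBU/ADDIU use the same sum with the flag ignored.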

Explicitly out of the first subset: the syscall $v1 table (moves to a BIOS/HLE companion fed by the real SYSCALL exception), COP0 64-bit upper lanes beyond what MMI needs, FPU/COP1, VU0/VU1 macro-mode, and full TLB. These are later chapters or separate tracks.

4.2 Recommended microarchitecture: start multicycle/interpreter-style, pipeline later

Keep the current 8-state FSM shape (S_IFETCH_REQ → S_IFETCH_WAIT → S_EXECUTE → optional S_MEM_*; drop the two Ch215 shim states). Reasons:

It already synthesizes cleanly. The synthesizability assessment is unambiguous: clean synchronous always_ff, handshaked ports, no latches (both unique case blocks carry defaults), constant-bound loops. There are no language-level blockers.
It is the correct altitude for first-silicon correctness. A multicycle core has no hazards, no forwarding, no branch prediction — delay slots are a single branch_pending latch. This is the smallest correct design, and correctness-first is the only sane order when the goal is "prove a real R5900 RTL works."
Pipelining is a pure-performance follow-on, addable once the multicycle core passes the full TB suite and boots qbert. The R5900 is a dual-issue in-order pipeline; that is a known, bounded later effort, not a prerequisite for graduation.
It matches the proven iop_core_stub shape, so the platform integration patterns already exist.

Minimum ~4 cycles/instruction is acceptable for bring-up. The DE25-Nano has the headroom.
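To make the "delay slots are a single branch_pending latch" claim concrete, here is a small Python model of PC sequencing at retire time. It is a sketch under assumptions: the field names mirror the branch_pending term used above, but the stub's actual signal names and its behavior for a branch placed in a delay slot are not taken from the RTL.

```python
from dataclasses import dataclass


@dataclass
class PcState:
    pc: int = 0
    branch_pending: bool = False  # set by a taken branch, consumed by its delay slot
    branch_target: int = 0


def retire(state: PcState, is_branch: bool = False, taken: bool = False,
           target: int = 0, likely: bool = False) -> int:
    """Advance the PC past one retired instruction; return the PC it executed at."""
    executed_pc = state.pc
    if state.branch_pending:
        # This instruction was the delay slot: redirect to the latched target.
        state.pc = state.branch_target
        state.branch_pending = False
    elif is_branch and taken:
        # Latch the target; the delay slot at pc + 4 still executes.
        state.branch_pending = True
        state.branch_target = target
        state.pc += 4
    elif is_branch and likely:
        # Branch-likely not taken: squash the delay slot by skipping it.
        state.pc += 8
    else:
        state.pc += 4
    return executed_pc
```

Because one instruction is in flight at a time, this latch is the only control-flow state the multicycle core needs; a pipelined follow-on would replace it with real hazard logic.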

4.3 What must be stripped / gated for synthesis

From the synthesizability assessment, ranked:

STRIP_HW_DIVIDER=1 is mandatory for any fit. The inferred combinational divider is the documented ~32 ns STA critical path (Ch162). Track B must replace it with a multi-cycle sequential divider FSM if DIVU semantics are needed (they are — qbert uses it).
Strip the trace port cluster (ev_valid/ev_subsys/ev_event/ev_arg0-3/ev_flags + the retired_* shadow registers + the divu/multu trace arms). These are pure observability (~4×64 + 32 + several 32-bit FFs of dead weight) that force the synthesizer to keep otherwise-dead arg-computation logic. Replace with a thin, optional debug-readout port if needed.
Gate the gpr128 shadow (EE_ENABLE_MMI). 32×128 = 4096 FFs is the dominant flop cost and Quartus will build it in ALMs (async multi-port read), not M20K. Keep only if MMI/quadword is in scope.
The CH215 jmp_buf FSM and the EE_SYSCALL_HLE dispatcher do not enter Track B at all. In the stub they are param-gated OFF; in Track B they are simply absent — they move to the BIOS/HLE companion.
unique case, constant-bound for-loops: keep (not blockers; defaults prevent latches).
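One common shape for the sequential divider is a restoring shift-subtract loop that produces one quotient bit per cycle. The Python sketch below models that loop as a reference for a DIVU TB; it is not the project's RTL, and the function name and the 32-iteration schedule are assumptions. Division by zero gets no special handling here, since the MIPS architecture leaves that result unpredictable.

```python
from typing import Tuple


def divu_iterative(dividend: int, divisor: int, width: int = 32) -> Tuple[int, int]:
    """Unsigned restoring division; returns (quotient for LO, remainder for HI)."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    remainder = 0
    quotient = 0
    # One loop iteration corresponds to one FSM cycle in a sequential divider.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtract: keep it only if the remainder stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient & mask, remainder & mask
```

Each iteration needs only a subtractor, a comparator, and two shift registers, which is what removes the long combinational path of an inferred `/` and `%`.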

5. Validation Strategy

The existing ~50 focused tb_ee_core_* benches + the qbert boot path ARE Track B's compliance harness. This is the single strongest asset we have, and it directly answers the owner's worry.

5.1 Why the existing suite transfers

The compliance inventory confirms all 50 focused TBs are reusable with only mechanical adaptation. The uniform pattern is port-driven and has three steps: (1) each TB hand-assembles a tiny program into the BIOS/bootstrap slots; (2) the DUT fetches, decodes, and executes it through the public memory-map ports until a PASS-syscall halt; (3) the TB checks results. Step 2 is already fully port-driven: there are no internal pokes to make the core run. Many TBs embed an in-program BNE/BEQ-to-FAIL self-check, so the expected architectural behavior is encoded in the program itself and is checkable purely from observable halt-PC/RAM. There is a strict 1:1 opcode→TB discipline (Ch271–Ch293), so there are no implemented-but-untested opcodes.
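For readers unfamiliar with hand-assembly, the sketch below shows how a tiny test program can be encoded into instruction words. The helper names and the example program are illustrative, not copied from any TB; the field layout and opcode numbers are the standard MIPS ones listed in the §6 checklist.

```python
def r_type(funct: int, rs: int = 0, rt: int = 0, rd: int = 0, sa: int = 0) -> int:
    """Encode a SPECIAL (opcode 0) R-type instruction."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Encode an I-type instruction with a 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ZERO, T0, T1 = 0, 8, 9  # conventional MIPS register numbers

program = [
    i_type(0x0F, ZERO, T0, 0x1234),     # LUI   $t0, 0x1234
    i_type(0x0D, T0, T0, 0x5678),       # ORI   $t0, $t0, 0x5678
    r_type(0x21, rs=T0, rt=ZERO, rd=T1),  # ADDU  $t1, $t0, $zero
    r_type(0x0C),                       # SYSCALL
]

# One 8-digit hex word per line, the form a $readmemh-style loader consumes.
hex_image = "\n".join(f"{word:08x}" for word in program)
```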

5.2 The two required adaptations (both mechanical, both bounded)

Hierarchical-peek → architectural readout. Most TBs read the post-halt result via u_core.regfile[...] (and u_ee_ram.mem[]/u_bios.mem[] for stores). Against a renamed/synthesized core these peeks break. Fix: change each test program to store its result register to a known RAM/MMIO address and read it back through the map port. This is a per-TB swap that does not change the encoded expected behavior. Store-class TBs (memops, sb, sh, sd, sq, lq, ld) already verify partly through u_ee_ram.mem[] and are closest to a real memory boundary.
Stub-accurate golden values → architecture-accurate golden values. Several TBs deliberately encode simplified semantics: DADDU/DSUBU/DSLL are checked as low-32 only. The SQ/SD/LQ/LD TBs still carry stale comments describing truncated widths, although those ops are now full-width via gpr128, so their expectations need to be re-checked rather than assumed. Against a true 64/128-bit Track B core, the low-32 expectations would FAIL and must be upgraded to full-width. The TBs are reusable as scaffolding and as behavior encodings; their golden values need a width pass.
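The width pass can be illustrated with two of the low32_approx ops. This Python sketch assumes that "low-32 only" means the stub's result is the true result truncated to 32 bits; how the stub fills the upper bits is not stated here, so treat the `_low32` helpers as a hypothetical model of the old golden values.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def daddu_low32(a: int, b: int) -> int:
    """Hypothetical stub-accurate golden value: sum truncated to 32 bits."""
    return (a + b) & MASK32


def daddu_full(a: int, b: int) -> int:
    """Architecture-accurate golden value: 64-bit wrapping sum."""
    return (a + b) & MASK64


def dsll_low32(value: int, sa: int) -> int:
    return (value << sa) & MASK32


def dsll_full(value: int, sa: int) -> int:
    """64-bit logical left shift; sa is the 5-bit shift amount (0..31)."""
    return ((value & MASK64) << (sa & 0x1F)) & MASK64
```

For example, `daddu_full(0xFFFFFFFF, 1)` carries into bit 32 while `daddu_low32` yields zero, so a TB that compares the whole register against the old golden value would reject a correct full-width core.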

5.3 Known coverage gaps to close (new TBs for Track B)

gpr128 invariant: add a dedicated TB asserting gpr128[i][31:0] === regfile[i] directly (today only transitive via PCPYUD/etc.).
COP0 exception state: EPC save/restore, ERET, Cause.ExcCode encoding — no focused TB today beyond BEV and Count. This is the most important new TB, because the SYSCALL exception-entry mechanism is the CPU's only legitimate connection to the kernel.
Arithmetic Overflow trap for ADD/SUB/ADDI (stub defers it; Track B implements it).
DI positive semantics (today only a negative/still-trapping companion in tb_ee_core_ei).
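As a starting point for the COP0 exception-state TB, the following Python sketch models the state changes such a bench would check. It uses the standard MIPS bit positions (Status.EXL bit 1, Status.BEV bit 22, Cause.ExcCode bits 6:2, Cause.BD bit 31) and the common-exception vectors 0x80000180 and 0xBFC00380. It is a simplified reference, not the Track B design: ERL, ErrorEPC, and the other R5900 exception levels are left out, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF
STATUS_EXL = 1 << 1
STATUS_BEV = 1 << 22
CAUSE_BD = 1 << 31
CAUSE_EXCCODE_SHIFT = 2
CAUSE_EXCCODE_MASK = 0x1F << CAUSE_EXCCODE_SHIFT
EXC_SYSCALL = 8
EXC_OVERFLOW = 12


@dataclass
class Cop0:
    status: int = STATUS_BEV  # BEV is set out of reset
    cause: int = 0
    epc: int = 0


def enter_exception(cop0: Cop0, pc: int, exc_code: int,
                    in_delay_slot: bool = False) -> int:
    """Update COP0 for a common exception and return the vector address."""
    if not (cop0.status & STATUS_EXL):
        if in_delay_slot:
            # EPC points at the branch so the pair re-executes on return.
            cop0.epc = (pc - 4) & MASK32
            cop0.cause |= CAUSE_BD
        else:
            cop0.epc = pc & MASK32
            cop0.cause &= MASK32 & ~CAUSE_BD
    cop0.cause = (cop0.cause & MASK32 & ~CAUSE_EXCCODE_MASK) | (
        exc_code << CAUSE_EXCCODE_SHIFT)
    cop0.status |= STATUS_EXL
    return 0xBFC00380 if cop0.status & STATUS_BEV else 0x80000180


def eret(cop0: Cop0) -> int:
    """Clear EXL and return the PC to resume at (ERL path omitted)."""
    cop0.status &= MASK32 & ~STATUS_EXL
    return cop0.epc
```

Note that for SYSCALL the saved EPC is the address of the SYSCALL itself; advancing past it is the kernel's job, which is one more reason the `$v1` table does not belong in the core.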

5.4 Directly addressing the owner's worry

"Are we even able to verify a real R5900 RTL would work / model the hardware to finalize?"

Yes — and we are unusually well-positioned to, for three concrete reasons:

We have a behavioral golden model. Track A (the stub) is, for the scoped subset, a working executable specification. Track B can be co-simulated against Track A instruction-by-instruction: run the same program through both, compare retire-by-retire (PC, GPR writeback, memory effects). Divergence is an immediate, localized bug report. This is the gold-standard CPU-verification methodology (lockstep against a reference model), and we already own the reference model.
We have an evidence-backed requirements list. We are not guessing what an R5900 needs — qbert's 1.49M-instruction boot trace tells us exactly the opcode/syscall/MMIO surface that matters. Track B's "done" is defined by a real workload, not a datasheet wishlist.
We have a port-driven, near-complete compliance suite (§5.1) that runs entirely through the public bus interface — i.e., it validates the core the same way the rest of the system will use it.
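The retire-by-retire comparison in the first point can be sketched as a small checker. The record fields (PC, GPR writeback, memory effect) follow the list given there; the record layout, names, and trace format are assumptions, since the oracle's export format is not defined yet.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Retire:
    pc: int
    gpr_index: Optional[int] = None  # destination register, if any
    gpr_value: int = 0
    mem_addr: Optional[int] = None   # store address, if any
    mem_data: int = 0


Divergence = Tuple[int, Optional[Retire], Optional[Retire]]


def first_divergence(oracle: Iterable[Retire],
                     dut: Iterable[Retire]) -> Optional[Divergence]:
    """Return (index, oracle_record, dut_record) at the first mismatch, else None."""
    # zip_longest pads the shorter trace with None, so an early halt is a mismatch.
    for index, (ref, got) in enumerate(zip_longest(oracle, dut)):
        if ref != got:
            return index, ref, got
    return None
```

Stopping at the first mismatch is deliberate: everything after it is downstream of the bug, so only the first divergent retire is a trustworthy report.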

The honest qualifier: "verify a real R5900" means verify the scoped subset we implement, in lockstep against the oracle and the TB suite, booting the workloads we target. It does not mean bit-exact cycle-accuracy against Sony silicon (multiply/divide latency, dual-issue timing, cache timing are not modeled and are out of scope for first-silicon). For a "boots and runs the game correctly" goal — which is the project goal — that scope is sufficient and verifiable. For a "cycle-perfect deterministic netplay" goal it is not, and we should not pretend otherwise.

6. Master Opcode / Feature Checklist

This is the deliverable Codex asked for: every decoded behavior, its fidelity, whether it is synthesizable, and whether it graduates to the Track B CPU core.

Mnemonic	Encoding	Fidelity	Synth	Graduates
SLL	SPECIAL 0x00	faithful	yes	Yes
SRL	SPECIAL 0x02	faithful	yes	Yes
SRA	SPECIAL 0x03	faithful	yes	Yes
SLLV	SPECIAL 0x04	faithful	yes	Yes
SRLV	SPECIAL 0x06	faithful	yes	Yes
SRAV	SPECIAL 0x07	faithful	yes	Yes
JR	SPECIAL 0x08	faithful	yes	Yes
JALR	SPECIAL 0x09	faithful	yes	Yes
SYSCALL	SPECIAL 0x0C	hle_or_shim	needs_work	No (only the exception-entry mechanism graduates; the $v1 table is kernel HLE)
SYNC	SPECIAL 0x0F	faithful	yes	Yes
MFHI	SPECIAL 0x10	faithful	yes	Yes
MFLO	SPECIAL 0x12	faithful	yes	Yes
MULTU	SPECIAL 0x19	faithful	yes	Yes (infers DSP; latency not modeled)
DIVU	SPECIAL 0x1B	faithful	needs_work	Yes (needs multi-cycle iterative divider; STRIP_HW_DIVIDER for fit)
DSLL	SPECIAL 0x38	low32_approx	yes	Yes (needs full 64-bit shifter + DSLL32)
ADD	SPECIAL 0x20	faithful	yes	Yes (needs overflow trap, ExcCode 12)
ADDU	SPECIAL 0x21	faithful	yes	Yes
DADDU	SPECIAL 0x2D	low32_approx	yes	Yes (needs full 64-bit adder)
SUB	SPECIAL 0x22	faithful	yes	Yes (needs overflow trap)
SUBU	SPECIAL 0x23	faithful	yes	Yes
DSUBU	SPECIAL 0x2F	low32_approx	yes	Yes (needs full 64-bit subtract)
AND	SPECIAL 0x24	faithful	yes	Yes
OR	SPECIAL 0x25	faithful	yes	Yes
XOR	SPECIAL 0x26	faithful	yes	Yes
NOR	SPECIAL 0x27	faithful	yes	Yes
SLT	SPECIAL 0x2A	faithful	yes	Yes
SLTU	SPECIAL 0x2B	faithful	yes	Yes
BLTZ	REGIMM rt=0x00	faithful	yes	Yes (BLTZAL link variant not modeled)
BGEZ	REGIMM rt=0x01	faithful	yes	Yes (BGEZAL link variant not modeled)
J	0x02	faithful	yes	Yes
JAL	0x03	faithful	yes	Yes
BEQ	0x04	faithful	yes	Yes
BNE	0x05	faithful	yes	Yes
BLEZ	0x06	faithful	yes	Yes
BGTZ	0x07	faithful	yes	Yes
ADDI	0x08	faithful	yes	Yes (needs overflow trap)
ADDIU	0x09	faithful	yes	Yes
SLTI	0x0A	faithful	yes	Yes
SLTIU	0x0B	faithful	yes	Yes
ANDI	0x0C	faithful	yes	Yes
ORI	0x0D	faithful	yes	Yes
LUI	0x0F	faithful	yes	Yes
MFC0	COP0 rs=0x00	low32_approx	yes	Yes (only 5 regs modeled; Count at full clock vs half)
MTC0	COP0 rs=0x04	low32_approx	yes	Yes (partial Status/Cause fields; Count write dropped)
RFE	COP0/CO funct 0x10	faithful	yes	Yes (reconcile vs R5900 ERET)
EI	COP0/CO 0x42000038	low32_approx	yes	Yes (should set Status.EIE; companion DI still traps)
LB	0x20	faithful	yes	Yes
LH	0x21	faithful	yes	Yes
LW	0x23	faithful	yes	Yes
LBU	0x24	faithful	yes	Yes
LHU	0x25	faithful	yes	Yes
LD	0x37	faithful	yes	Yes (full 64-bit via gpr128)
LQ	0x1E	faithful	yes	Yes (full 128-bit via gpr128)
SB	0x28	faithful	yes	Yes
SH	0x29	faithful	yes	Yes
SW	0x2B	faithful	yes	Yes
SD	0x3F	faithful	yes	Yes (full 64-bit via gpr128; stale "beat1=0" comments)
SQ	0x1F	faithful	yes	Yes (full 128-bit via gpr128; stale "beats 1-3=0" comments)
CACHE	0x2F	hle_or_shim	yes	Yes (no-op correct for cacheless model)
PCPYLD	MMI2 sa 0x0E	faithful	yes	Yes (full-128)
PSUBB	MMI0 sa 0x09	faithful	yes	Yes (full-128, no cross-byte borrow)
PNOR	MMI3 sa 0x13	faithful	yes	Yes (full-128)
PAND	MMI2 sa 0x12	faithful	yes	Yes (full-128)
PCPYUD	MMI3 sa 0x0E	faithful	yes	Yes (reads upper 64; drove gpr128)
PCPYH	MMI3 sa 0x1B	faithful	yes	Yes (full-128 halfword broadcast)
BEQL	0x14	faithful	yes	Yes (branch-likely squash)
BNEL	0x15	faithful	yes	Yes (branch-likely squash)
NOP	0x00000000	faithful	yes	Yes

Tally: 63 of 67 decoded behaviors graduate to the Track B CPU core essentially as-is. Of the remainder, SYSCALL is the only true non-graduate (only its exception-entry mechanism graduates; the $v1 table is kernel HLE), and CACHE graduates as an accepted no-op. The genuinely-approximate-but-graduating ops are DADDU/DSUBU/DSLL (need full 64-bit datapath) and MFC0/MTC0/EI (need fuller COP0 coverage). The MMI/128-bit infrastructure is the strongest, most faithful part of the stub and is genuinely synthesizable.
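The table's notes on the MMI rows ("no cross-byte borrow", "reads upper 64", "halfword broadcast") are easiest to pin down with a reference model. The Python sketch below treats a 128-bit GPR as an integer and follows the published EE instruction semantics for four of the ops; the function names are illustrative, and the model is a candidate source of golden values rather than a description of the stub's implementation.

```python
MASK64 = (1 << 64) - 1
BROADCAST16X4 = 0x0001000100010001  # replicates a halfword into four lanes


def pcpyld(rs: int, rt: int) -> int:
    """rd[127:64] = rs[63:0], rd[63:0] = rt[63:0]."""
    return ((rs & MASK64) << 64) | (rt & MASK64)


def pcpyud(rs: int, rt: int) -> int:
    """rd[127:64] = rt[127:64], rd[63:0] = rs[127:64]."""
    return (((rt >> 64) & MASK64) << 64) | ((rs >> 64) & MASK64)


def psubb(rs: int, rt: int) -> int:
    """Sixteen independent wrapping byte subtractions, no borrow between lanes."""
    out = 0
    for lane in range(16):
        shift = lane * 8
        diff = (((rs >> shift) & 0xFF) - ((rt >> shift) & 0xFF)) & 0xFF
        out |= diff << shift
    return out


def pcpyh(rt: int) -> int:
    """Broadcast rt[15:0] across the low 64 bits and rt[79:64] across the high 64."""
    low = rt & 0xFFFF
    high = (rt >> 64) & 0xFFFF
    return ((high * BROADCAST16X4) << 64) | (low * BROADCAST16X4)
```

The `psubb(0x0100, 0x0001)` case is a useful TB vector: lane 0 wraps to 0xFF while lane 1 stays 0x01, which a plain 128-bit subtract would get wrong.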

7. Go / No-Go + Recommended Next Chapters

Does a scoped R5900 subset fit Agilex 5 and pass the TBs? — YES, with the §4.3 caveats honored.

Fit: Agilex 5 has hundreds of K ALMs. The dominant cost is gpr128 (~4096 FFs) — wasteful but not fatal, and gateable. MULTU infers DSP (fine). The divider must be stripped/replaced. The trace cluster should be stripped. No structural blocker.
TBs: the suite passes in simulation against the stub as-is. Against a stripped/gated synthesis config (no trace, divider replaced, HLE absent) it needs the §5.2 adaptations (architectural readout + full-width golden values) and the §5.3 new TBs. Hence go_with_caveats, not unqualified go.

Track A — next chapters (discovery only)

Ch307: Autopsy the next qbert wait loop (post-Ch294/0x7A unblock; the steady-state hot-PC, e.g. the suspected 0x00106154 region). Classify the gate (memory flag? MMIO poll? handler-fire?) the same way Ch294 did. No RTL — produce the checklist entry, not a hack, unless a one-shot stub is the cheapest way to see the next blocker.
Ch308 (A): Begin backing out fabrications into the async-hardware layer: replace the Ch299 TB poke with the real RegisterLibraryEntries (0x77) memory side effect, modeled in the HLE companion, and prove qbert still progresses. This de-risks Track B by validating the real mechanism in the cheap environment first.
Ch309 (A): Capture a full lockstep retire-trace export from the oracle for a fixed qbert prefix, to serve as Track B's co-sim golden reference (§5.4.1).

Track B — next chapters (the real core)

Ch308 (B): Scaffold rtl/ee/ee_core.sv — a clean multicycle skeleton: fetch/decode/retire FSM + 32-bit ALU (SLL…SLTU, ADDI…LUI) + HI/LO + branches/delay slots. Validate immediately against the existing ALU/shift/branch TBs (tb_ee_core_shift, _varshift, _rtype_logic, _rtype_addu, _add_sub, _slt, _slti, _branch_zero, _jal, _jalr) re-pointed to architectural readout. No MMI, no MMIO, no syscalls yet.
Ch309 (B): Add load/store (LB…SW + multi-beat LD/LQ/SD/SQ) with AdEL/AdES, and the multi-cycle DIVU FSM (replacing the combinational divider). Validate against _memops, _lb/_lbu/_lh/_lhu/_sb/_sh, _ld/_lq/_sd/_sq, _align/_align_exc, _divu_mflo, _multu_mflo.
Ch310 (B): Add the COP0 exception-entry mechanism (EPC/Cause.ExcCode/BEV vectoring) + ERET, plus the gated gpr128/MMI subset. Add the new TBs: gpr128 invariant, COP0 exception state, overflow trap. Wire the SYSCALL exception to vector into the BIOS/HLE companion (not an internal $v1 switch). First lockstep co-sim run against the Ch309(A) golden trace.

8. Risks / Rabbit-Holes to Avoid

Be honest about what could make this unrecoverable — and what is merely hard.

THE primary risk: conflating oracle hacks into the real core. This is the single thing that turns a bounded project into an unrecoverable one. If the $v0=1 fib, the bit-17 fake, or a syscall stub leaks into ee_core.sv, Track B becomes a second oracle wearing a CPU costume — and we will chase phantom "blockers" (the Ch264–Ch268 thunk-chain hunt is the cautionary tale: 5 chapters chasing a treadmill that was our own shim, per Ch269). Mitigation: the §2 rule is non-negotiable — the CPU core contains faithful instructions + exception entry, full stop. Every backed-out hack gets a TB proving the real mechanism reproduces the faked result.
GS fillrate and VU0/VU1 parallelism are separate mountains — do not let them contaminate the EE-core decision. The EE integer/MMI core is tractable and is what this document scopes. The GS (rasterizer fillrate, VRAM bandwidth — see 0006-vram-roadmap) and the VUs (two SIMD vector coprocessors with their own microcode, macro/micro mode, and tight EE coupling) are each larger than the EE core and have their own roadmaps. Risk: scope creep that bundles "boot qbert's CPU code" with "render qbert's graphics." Keep them separate; the EE core graduating does NOT imply the frame renders. FPU/COP1 is a smaller but real adjacent piece, also deferred.
Cycle-accuracy ambition. If the goal silently drifts from "boots and runs correctly" to "cycle-perfect," the project becomes unbounded (multiply/divide latency, dual-issue scheduling, cache timing, bus contention). Mitigation: §5.4 names the scope explicitly. First-silicon is behavioral correctness, not timing fidelity.
The divider critical path. Known, measured (~32 ns, Ch162), and already gated. The only risk is forgetting to replace it with a sequential FSM when DIVU semantics are required. Tracked as a Ch309(B) deliverable.
TB golden-value drift. Several TBs encode stub-accurate (low-32 / truncated) golden values. If Track B is validated against unmodified stub TBs, a correct full-width core will FAIL spuriously, or worse, a buggy core will PASS against a too-lax expectation. Mitigation: the §5.2 width pass is a prerequisite, not an afterthought.
Hierarchical-peek brittleness. Not unrecoverable, but if ignored it blocks the entire compliance suite from running against the new core. Mechanical (§5.2.1) but must be budgeted.

Bottom line: the EE core itself is tractable, bounded, and validatable. We have a golden behavioral model, an evidence-backed requirements list, and a near-complete port-driven compliance suite — three assets most from-scratch CPU projects never have. The path is not a rabbit hole provided we hold the layer separation. The unrecoverable scenarios all share one root cause — letting the oracle and the CPU be the same artifact. This document exists to make sure they never are.
