ec82764bef
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
272 lines
27 KiB
Markdown
272 lines
27 KiB
Markdown
# 0007 — EE Core Reality Checkpoint (Ch306)
|
||
|
||
Status: Accepted
|
||
Date: 2026-05-28
|
||
Chapter: Ch306 (strategic recon / design — no RTL)
|
||
Supersedes: nothing. Companion to 0006-vram-roadmap.md.
|
||
Authors: lead architect, with Codex co-review.
|
||
|
||
---
|
||
|
||
## 1. Executive Summary
|
||
|
||
`rtl/ee/ee_core_stub.sv` (2155 lines) is **a behavioral compatibility oracle, not a CPU.**
|
||
|
||
It is an interpreter-style multicycle FSM that has been grown chapter-by-chapter (Ch67 → Ch305) to boot `qbert.elf` ~1.49M instructions deep by adding, one blocker at a time, exactly the opcodes, syscall HLE cases, MMIO stubs, and testbench-side pokes that the next blocker demanded. It has been extraordinarily productive *as a discovery instrument*: it told us precisely which 67 instruction behaviors a real PS2 game touches during boot, which syscalls the EE kernel must service, and which MMIO regions matter. That is its value, and that value is real.
|
||
|
||
But it is now load-bearing in a way it was never designed to be. The owner and Codex have called the key question correctly: **we are about to confuse the oracle for the deliverable.** The stub mixes three layers that real hardware keeps strictly separate (CPU / BIOS-kernel / async-hardware), and several of the things that make qbert "boot" are fabrications — a `$v0=1` longjmp fib (Ch215) that *created* the BIOS treadmill we then chased for 50 chapters, an `$a0`-aware bit-17 syscall return (Ch294/0x7A) that fakes an interrupt that never fired, and a testbench poke (Ch299) that writes `1` into qbert's private global from inside the TB.
|
||
|
||
**Go / No-Go on a synthesizable R5900 subset: GO, with caveats.**
|
||
|
||
A deliberately-scoped multicycle R5900 subset (fetch/decode, 32-bit ALU, load/store, branches + delay slots, HI/LO, and the existing gpr128/MMI subset) is **straightforwardly synthesizable on Agilex 5 (DE25-Nano)**. There are no language-level blockers in the current RTL, the microarchitecture is a clean synchronous `always_ff` FSM with handshaked memory ports, and ~63 of the 67 decoded behaviors graduate essentially as-is. The path is **bounded and validatable**. The danger is not technical intractability — it is *layer confusion*: letting oracle hacks leak into the real core.
|
||
|
||
This document splits the work into two explicit, permanently-separate tracks and defines the graduation path.
|
||
|
||
- **Track A — EE Behavioral Oracle**: keep `ee_core_stub` as a *discovery-only* instrument. Its output is a living opcode/syscall/MMIO checklist. It is never the CPU.
|
||
- **Track B — Synthesizable EE Core**: a new, clean core built to the checklist Track A produces, validated against the existing ~50 focused EE TBs (re-pointed for full-width semantics).
|
||
|
||
---
|
||
|
||
## 2. The Three-Layer Separation
|
||
|
||
Real PS2 hardware keeps three things in three places: the **CPU** executes instructions; the **BIOS/kernel ROM** services syscalls and implements `longjmp`/`_ReturnFromException`; **async hardware** (INTC / DMAC / GS / VBLANK / SIF) produces the events and flags that kernel code polls. The stub collapses all three into one FSM. The table below re-classifies every feature in the inventories by where it actually belongs.
|
||
|
||
| Stub feature | Layer | Graduates to Track B CPU? | Where it really belongs |
|
||
|---|---|---|---|
|
||
| SPECIAL ALU/shift/HILO set (SLL…SRAV, ADD…SLTU, MFHI/MFLO, MULTU, DIVU) | (a) CPU-architectural | **Yes** | CPU core |
|
||
| Immediate ALU (ADDI…LUI), branches (BEQ/BNE/BLEZ/BGTZ + REGIMM BLTZ/BGEZ + BEQL/BNEL), jumps (J/JAL/JR/JALR) | (a) CPU-architectural | **Yes** | CPU core |
|
||
| Loads/stores (LB/LH/LW/LBU/LHU + multi-beat LD/LQ/SD/SQ), SB/SH/SW | (a) CPU-architectural | **Yes** | CPU core |
|
||
| MMI subset (PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH) + gpr128 shadow | (a) CPU-architectural | **Yes** (if MMI in scope) | CPU core |
|
||
| COP0 MFC0/MTC0/RFE/EI, SYNC, CACHE | (a) CPU-architectural (partial) | **Yes** (needs widening) | CPU core; RFE↔ERET to reconcile |
|
||
| SYSCALL **exception-entry mechanism** (EPC / Cause.ExcCode=Sys / vector) | (a) CPU-architectural | **Yes** (the *mechanism* only) | CPU core |
|
||
| SYSCALL **$v1 case table** (0x3C EndOfHeap, 0x3D InitMainThread, 0x40, 0x64 FlushCache, 0x6B, 0x77, 0x78, 0x79, 0x13, 0x17, 0x16, 0x12) | (b) BIOS/kernel HLE | **No** | PS2 BIOS ROM, or a dedicated EE-kernel HLE companion module between CPU and memory map |
|
||
| Ch199 `_ReturnFromException(2)` RFE-on-syscall-8 shortcut | (b) BIOS/kernel HLE | **No** | BIOS kernel exception-return path (ROM). The status-stack pop is architectural; *selecting it by syscall number* is kernel behavior |
|
||
| Ch215 `jmp_buf` restore FSM (hardcoded base `0xA000B1E0`, 12-slot libc layout, forced `$v0=1`) | (b) BIOS/kernel HLE | **No** | BIOS ROM `longjmp()`. **This `$v0=1` fib is the documented source of the Ch215 treadmill (Ch269).** It is a workaround, not behavior |
|
||
| Syscall 0x7A `$a0`-aware bit-17 readiness return | (c) async-hardware stand-in | **No** | INTC/DMAC-completion/event delivery (real interrupt fires the flag). Labeled "Not architectural truth" |
|
||
| Ch299 TB-side library-ready poke (`useg_shadow_mem[0x4CA70]=1` on qbert-specific arg guard) | (c) async-hardware stand-in | **No** | Memory side effect of the RegisterLibraryEntries (0x77) kernel callback. **Most fragile, ship-blocking hack in the inventory** |
|
||
| Syscall 0x12/0x16 (Add/EnableDmacHandler) registration | (b) BIOS/kernel HLE → (c) | **No** | Kernel handler table; the *enable* arms real INTC/DMAC dispatch (unbuilt hardware) |
|
||
| Syscall default-case halt (`retired_flag_halt` → S_HALT, expose $v1/$a0-$a3) | (c) TB-only scaffolding | **No** | Diagnostic only; real CPU vectors to kernel |
|
||
| Trace port cluster (`ev_*` + `retired_*` shadows) | (c) TB-only scaffolding | **No** (strip) | Test instrumentation; no hardware counterpart |
|
||
| Per-syscall runner observers (snapshots, tuple tables, $a0 counters) | (c) TB-only scaffolding | **No** | Passive measurement; correct to live in the TB |
|
||
| BIOS reset-vector LUI/ORI/JR trampoline + ELF `$readmemh` loader | (c) TB-only scaffolding | **No** | Real BIOS boot + program loader |
|
||
|
||
**The crisp rule:** the CPU core contains *faithful instructions and the exception-entry mechanism, and nothing else.* Every syscall service moves to a BIOS/HLE companion. Every fabricated flag moves to the async-hardware layer (and until that hardware exists, it stays in the oracle/TB — never in the real core).
|
||
|
||
---
|
||
|
||
## 3. Track A — EE Behavioral Oracle
|
||
|
||
**Role: discovery only. This is `ee_core_stub` as it exists today, plus the ELF runner harness.**
|
||
|
||
Track A continues exactly as Ch67→Ch305 did: when a new game/BIOS path blocks, Track A finds out *why* and *what is missing*, cheaply, by adding the minimum stub behavior to push past the blocker. It is allowed to lie (the `$v0=1` fib, the bit-17 fake, the TB poke) because its job is to *map the territory*, not to be the territory.
|
||
|
||
**Output: a living checklist.** Track A's deliverable is not silicon — it is three growing lists:
|
||
|
||
1. **Opcode checklist** — every instruction a real workload touches, with required fidelity (see §6).
|
||
2. **Syscall checklist** — every EE kernel service number, its observed arg shape, and its required return contract.
|
||
3. **MMIO checklist** — every device region touched (DMAC global/per-channel, INTC, timers, GIF, SIF), with the access pattern.
|
||
|
||
These lists are the *specification* Track B builds to. Every entry on them is evidence-backed by a real boot trace, which is worth more than any datasheet table because it tells us what *actually matters* for the games we run.
|
||
|
||
**The one inviolable rule:** Track A output must **never be mistaken for the CPU.** Specifically:
|
||
- An oracle hack (`$v0=1`, bit-17, TB poke) is a *flag that hardware is missing*, not a feature to copy. When Track B implements the real mechanism, the corresponding oracle hack must be **backed out**, and a TB must prove the real mechanism produces the same observable result the hack faked.
|
||
- Any conclusion drawn "after the Ch215 shim fires" must be labeled "under jmp_buf fallback semantics" (per the Ch269 finding). Track A conclusions downstream of a known fib are suspect by construction.
|
||
|
||
---
|
||
|
||
## 4. Track B — Synthesizable EE Core
|
||
|
||
**A new, clean RTL core (`rtl/ee/ee_core.sv`, distinct from `ee_core_stub.sv`), built deliberately to the Track A checklist.**
|
||
|
||
### 4.1 The first synthesizable subset (concrete)
|
||
|
||
Scope the first Track B core to exactly what qbert boot proves is needed, and no more:
|
||
|
||
- **Fetch / decode / retire**: handshaked instruction fetch over the existing BIU/memory-map ports; fully combinational decode (the `is_*` assign pile is fine).
|
||
- **32-bit integer ALU**: SLL/SRL/SRA/SLLV/SRLV/SRAV, ADD/ADDU/SUB/SUBU, AND/OR/XOR/NOR, SLT/SLTU, all immediate forms (ADDI/ADDIU/SLTI/SLTIU/ANDI/ORI/LUI). **Add the Arithmetic Overflow trap** for ADD/SUB/ADDI (the stub defers it; a real core must trap, Cause.ExcCode=12).
|
||
- **HI/LO**: MFHI/MFLO/MTHI/MTLO, MULTU (infers DSP), and DIVU **as a multi-cycle iterative divider FSM** (not the combinational `/`+`%` — see §4.3).
|
||
- **Load/store**: LB/LH/LW/LBU/LHU/SB/SH/SW with AdEL/AdES alignment exceptions, plus multi-beat LD/LQ/SD/SQ via the proven `sq_beat` counter pattern.
|
||
- **Branches + delay slots**: BEQ/BNE/BLEZ/BGTZ, REGIMM BLTZ/BGEZ, branch-likely BEQL/BNEL (squash semantics), jumps J/JAL/JR/JALR. Keep the `branch_pending` latch model.
|
||
- **128-bit GPR + MMI subset**: `gpr128[0:31]` and PCPYLD/PSUBB/PNOR/PAND/PCPYUD/PCPYH. **Gate this behind a parameter** (`EE_ENABLE_MMI`) so a minimal build can fall back to a 32×32 regfile and save ~4096 FFs.
|
||
- **COP0**: MFC0/MTC0 for the 5 modeled regs + the proper **exception-entry mechanism** (EPC save, Cause.ExcCode, BEV vectoring) and **ERET** (reconciled against the stub's R3000-style RFE — R5900 uses EXL/ERL/EPC). SYNC and CACHE are faithful no-ops on a cacheless in-order core.
|
||
|
||
**Explicitly out of the first subset:** the syscall $v1 table (moves to a BIOS/HLE companion fed by the real SYSCALL exception), COP0 64-bit upper lanes beyond what MMI needs, FPU/COP1, VU0/VU1 macro-mode, and full TLB. These are later chapters or separate tracks.
|
||
|
||
### 4.2 Recommended microarchitecture: **start multicycle/interpreter-style, pipeline later**
|
||
|
||
Keep the current 8-state FSM shape (S_IFETCH_REQ → S_IFETCH_WAIT → S_EXECUTE → optional S_MEM_*; drop the two Ch215 shim states). **Reasons:**
|
||
|
||
1. **It already synthesizes cleanly.** The synthesizability assessment is unambiguous: clean synchronous `always_ff`, handshaked ports, no latches (both `unique case` blocks carry defaults), constant-bound loops. There are *no language-level blockers*.
|
||
2. **It is the correct altitude for first-silicon correctness.** A multicycle core has no hazards, no forwarding, no branch prediction — delay slots are a single `branch_pending` latch. This is the smallest correct design, and correctness-first is the only sane order when the goal is "prove a real R5900 RTL works."
|
||
3. **Pipelining is a pure-performance follow-on**, addable once the multicycle core passes the full TB suite and boots qbert. The R5900 is a dual-issue in-order pipeline; that is a *known, bounded* later effort, not a prerequisite for graduation.
|
||
4. **It matches the proven `iop_core_stub` shape**, so the platform integration patterns already exist.
|
||
|
||
Minimum ~4 cycles/instruction is acceptable for bring-up. The DE25-Nano has the headroom.
|
||
|
||
### 4.3 What must be stripped / gated for synthesis
|
||
|
||
From the synthesizability assessment, ranked:
|
||
|
||
- **STRIP_HW_DIVIDER=1 is mandatory** for any fit. The inferred combinational divider is the documented ~32 ns STA critical path (Ch162). Track B must replace it with a **multi-cycle sequential divider FSM** if DIVU semantics are needed (they are — qbert uses it).
|
||
- **Strip the trace port cluster** (`ev_valid/ev_subsys/ev_event/ev_arg0-3/ev_flags` + the `retired_*` shadow registers + the divu/multu trace arms). These are pure observability (~4×64 + 32 + several 32-bit FFs of dead weight) that force the synthesizer to keep otherwise-dead arg-computation logic. Replace with a thin, optional debug-readout port if needed.
|
||
- **Gate the gpr128 shadow** (`EE_ENABLE_MMI`). 32×128 = 4096 FFs is the dominant flop cost and Quartus will build it in ALMs (async multi-port read), not M20K. Keep only if MMI/quadword is in scope.
|
||
- **The CH215 jmp_buf FSM and the EE_SYSCALL_HLE dispatcher do not enter Track B at all.** In the stub they are param-gated OFF; in Track B they are simply absent — they move to the BIOS/HLE companion.
|
||
- `unique case`, constant-bound for-loops: **keep** (not blockers; defaults prevent latches).
|
||
|
||
---
|
||
|
||
## 5. Validation Strategy
|
||
|
||
**The existing ~50 focused `tb_ee_core_*` benches + the qbert boot path ARE Track B's compliance harness.** This is the single strongest asset we have, and it directly answers the owner's worry.
|
||
|
||
### 5.1 Why the existing suite transfers
|
||
|
||
The compliance inventory confirms **all 50 focused TBs are reusable** with only mechanical adaptation. The uniform pattern is port-driven: each TB hand-assembles a tiny program into the BIOS/bootstrap slots, lets the DUT fetch/decode/execute *through the public memory-map ports* to a PASS-syscall halt, then checks results. **Step 2 (execution) is already fully port-driven — there are no internal pokes to make the core run.** Many TBs embed an in-program BNE/BEQ-to-FAIL self-check, so the expected architectural behavior is encoded in the program itself and is checkable purely from observable halt-PC/RAM. There is a strict 1:1 opcode→TB discipline (Ch271–Ch293), so **there are no implemented-but-untested opcodes.**
|
||
|
||
### 5.2 The two required adaptations (both mechanical, both bounded)
|
||
|
||
1. **Hierarchical-peek → architectural readout.** Most TBs read the *post-halt* result via `u_core.regfile[...]` (and `u_ee_ram.mem[]`/`u_bios.mem[]` for stores). Against a renamed/synthesized core these peeks break. Fix: change each test program to **store its result register to a known RAM/MMIO address** and read it back through the map port. This is a per-TB swap that does not change the encoded expected behavior. Store-class TBs (memops, sb, sh, sd, sq, lq, ld) already verify partly through `u_ee_ram.mem[]` and are closest to a real memory boundary.
|
||
|
||
2. **Stub-accurate golden values → architecture-accurate golden values.** Several TBs deliberately encode *simplified* semantics: DADDU/DSUBU/DSLL as low-32 only, and (per stale comments — actually now full-128 via gpr128) the SQ/SD/LQ/LD width expectations. Against a true 64/128-bit Track B core, the low-32 expectations would FAIL and **must be upgraded to full-width**. The TBs are reusable as scaffolding and as behavior encodings; their golden values need a width pass.
|
||
|
||
### 5.3 Known coverage gaps to close (new TBs for Track B)
|
||
|
||
- **gpr128 invariant**: add a dedicated TB asserting `gpr128[i][31:0] === regfile[i]` directly (today only transitive via PCPYUD/etc.).
|
||
- **COP0 exception state**: EPC save/restore, ERET, Cause.ExcCode encoding — no focused TB today beyond BEV and Count. This is the *most important* new TB, because the SYSCALL exception-entry mechanism is the CPU's only legitimate connection to the kernel.
|
||
- **Arithmetic Overflow trap** for ADD/SUB/ADDI (stub defers it; Track B implements it).
|
||
- **DI positive semantics** (today only a negative/still-trapping companion in tb_ee_core_ei).
|
||
|
||
### 5.4 Directly addressing the owner's worry
|
||
|
||
> *"Are we even able to verify a real R5900 RTL would work / model the hardware to finalize?"*
|
||
|
||
**Yes — and we are unusually well-positioned to, for three concrete reasons:**
|
||
|
||
1. **We have a behavioral golden model.** Track A (the stub) is, for the scoped subset, a working executable specification. Track B can be **co-simulated against Track A instruction-by-instruction**: run the same program through both, compare retire-by-retire (PC, GPR writeback, memory effects). Divergence is an immediate, localized bug report. This is the gold-standard CPU-verification methodology (lockstep against a reference model), and we already own the reference model.
|
||
|
||
2. **We have an evidence-backed requirements list.** We are not guessing what an R5900 needs — qbert's 1.49M-instruction boot trace *tells us* exactly the opcode/syscall/MMIO surface that matters. Track B's "done" is defined by a real workload, not a datasheet wishlist.
|
||
|
||
3. **We have a port-driven, near-complete compliance suite** (§5.1) that runs entirely through the public bus interface — i.e., it validates the core the same way the rest of the system will use it.
|
||
|
||
**The honest qualifier:** "verify a *real R5900*" means verify the *scoped subset we implement*, in lockstep against the oracle and the TB suite, booting the workloads we target. It does **not** mean bit-exact cycle-accuracy against Sony silicon (multiply/divide latency, dual-issue timing, cache timing are not modeled and are out of scope for first-silicon). For a "boots and runs the game correctly" goal — which is the project goal — that scope is sufficient and verifiable. For a "cycle-perfect deterministic netplay" goal it is not, and we should not pretend otherwise.
|
||
|
||
---
|
||
|
||
## 6. Master Opcode / Feature Checklist
|
||
|
||
This is the deliverable Codex asked for: every decoded behavior, its fidelity, whether it is synthesizable, and whether it graduates to the Track B CPU core.
|
||
|
||
| Mnemonic | Encoding | Fidelity | Synth | Graduates |
|
||
|---|---|---|---|---|
|
||
| SLL | SPECIAL 0x00 | faithful | yes | **Yes** |
|
||
| SRL | SPECIAL 0x02 | faithful | yes | **Yes** |
|
||
| SRA | SPECIAL 0x03 | faithful | yes | **Yes** |
|
||
| SLLV | SPECIAL 0x04 | faithful | yes | **Yes** |
|
||
| SRLV | SPECIAL 0x06 | faithful | yes | **Yes** |
|
||
| SRAV | SPECIAL 0x07 | faithful | yes | **Yes** |
|
||
| JR | SPECIAL 0x08 | faithful | yes | **Yes** |
|
||
| JALR | SPECIAL 0x09 | faithful | yes | **Yes** |
|
||
| SYSCALL | SPECIAL 0x0C | hle_or_shim | needs_work | **No** (only the exception-entry mechanism graduates; the $v1 table is kernel HLE) |
|
||
| SYNC | SPECIAL 0x0F | faithful | yes | **Yes** |
|
||
| MFHI | SPECIAL 0x10 | faithful | yes | **Yes** |
|
||
| MFLO | SPECIAL 0x12 | faithful | yes | **Yes** |
|
||
| MULTU | SPECIAL 0x19 | faithful | yes | **Yes** (infers DSP; latency not modeled) |
|
||
| DIVU | SPECIAL 0x1B | faithful | needs_work | **Yes** (needs multi-cycle iterative divider; STRIP_HW_DIVIDER for fit) |
|
||
| DSLL | SPECIAL 0x38 | low32_approx | yes | **Yes** (needs full 64-bit shifter + DSLL32) |
|
||
| ADD | SPECIAL 0x20 | faithful | yes | **Yes** (needs overflow trap, ExcCode 12) |
|
||
| ADDU | SPECIAL 0x21 | faithful | yes | **Yes** |
|
||
| DADDU | SPECIAL 0x2D | low32_approx | yes | **Yes** (needs full 64-bit adder) |
|
||
| SUB | SPECIAL 0x22 | faithful | yes | **Yes** (needs overflow trap) |
|
||
| SUBU | SPECIAL 0x23 | faithful | yes | **Yes** |
|
||
| DSUBU | SPECIAL 0x2F | low32_approx | yes | **Yes** (needs full 64-bit subtract) |
|
||
| AND | SPECIAL 0x24 | faithful | yes | **Yes** |
|
||
| OR | SPECIAL 0x25 | faithful | yes | **Yes** |
|
||
| XOR | SPECIAL 0x26 | faithful | yes | **Yes** |
|
||
| NOR | SPECIAL 0x27 | faithful | yes | **Yes** |
|
||
| SLT | SPECIAL 0x2A | faithful | yes | **Yes** |
|
||
| SLTU | SPECIAL 0x2B | faithful | yes | **Yes** |
|
||
| BLTZ | REGIMM rt=0x00 | faithful | yes | **Yes** (BLTZAL link variant not modeled) |
|
||
| BGEZ | REGIMM rt=0x01 | faithful | yes | **Yes** (BGEZAL link variant not modeled) |
|
||
| J | 0x02 | faithful | yes | **Yes** |
|
||
| JAL | 0x03 | faithful | yes | **Yes** |
|
||
| BEQ | 0x04 | faithful | yes | **Yes** |
|
||
| BNE | 0x05 | faithful | yes | **Yes** |
|
||
| BLEZ | 0x06 | faithful | yes | **Yes** |
|
||
| BGTZ | 0x07 | faithful | yes | **Yes** |
|
||
| ADDI | 0x08 | faithful | yes | **Yes** (needs overflow trap) |
|
||
| ADDIU | 0x09 | faithful | yes | **Yes** |
|
||
| SLTI | 0x0A | faithful | yes | **Yes** |
|
||
| SLTIU | 0x0B | faithful | yes | **Yes** |
|
||
| ANDI | 0x0C | faithful | yes | **Yes** |
|
||
| ORI | 0x0D | faithful | yes | **Yes** |
|
||
| LUI | 0x0F | faithful | yes | **Yes** |
|
||
| MFC0 | COP0 rs=0x00 | low32_approx | yes | **Yes** (only 5 regs modeled; Count at full clock vs half) |
|
||
| MTC0 | COP0 rs=0x04 | low32_approx | yes | **Yes** (partial Status/Cause fields; Count write dropped) |
|
||
| RFE | COP0/CO funct 0x10 | faithful | yes | **Yes** (reconcile vs R5900 ERET) |
|
||
| EI | COP0/CO 0x42000038 | low32_approx | yes | **Yes** (should set Status.EIE; companion DI still traps) |
|
||
| LB | 0x20 | faithful | yes | **Yes** |
|
||
| LH | 0x21 | faithful | yes | **Yes** |
|
||
| LW | 0x23 | faithful | yes | **Yes** |
|
||
| LBU | 0x24 | faithful | yes | **Yes** |
|
||
| LHU | 0x25 | faithful | yes | **Yes** |
|
||
| LD | 0x37 | faithful | yes | **Yes** (full 64-bit via gpr128) |
|
||
| LQ | 0x1E | faithful | yes | **Yes** (full 128-bit via gpr128) |
|
||
| SB | 0x28 | faithful | yes | **Yes** |
|
||
| SH | 0x29 | faithful | yes | **Yes** |
|
||
| SW | 0x2B | faithful | yes | **Yes** |
|
||
| SD | 0x3F | faithful | yes | **Yes** (full 64-bit via gpr128; stale "beat1=0" comments) |
|
||
| SQ | 0x1F | faithful | yes | **Yes** (full 128-bit via gpr128; stale "beats 1-3=0" comments) |
|
||
| CACHE | 0x2F | hle_or_shim | yes | **Yes** (no-op correct for cacheless model) |
|
||
| PCPYLD | MMI2 sa 0x0E | faithful | yes | **Yes** (full-128) |
|
||
| PSUBB | MMI0 sa 0x09 | faithful | yes | **Yes** (full-128, no cross-byte borrow) |
|
||
| PNOR | MMI3 sa 0x13 | faithful | yes | **Yes** (full-128) |
|
||
| PAND | MMI2 sa 0x12 | faithful | yes | **Yes** (full-128) |
|
||
| PCPYUD | MMI3 sa 0x0E | faithful | yes | **Yes** (reads upper 64; drove gpr128) |
|
||
| PCPYH | MMI3 sa 0x1B | faithful | yes | **Yes** (full-128 halfword broadcast) |
|
||
| BEQL | 0x14 | faithful | yes | **Yes** (branch-likely squash) |
|
||
| BNEL | 0x15 | faithful | yes | **Yes** (branch-likely squash) |
|
||
| NOP | 0x00000000 | faithful | yes | **Yes** |
|
||
|
||
**Tally: 63 of 67 decoded behaviors graduate to the Track B CPU core.** The 4 that do not: SYSCALL (only its exception-entry mechanism graduates; the $v1 table is kernel HLE) — and that is the only true non-graduate, since CACHE graduates as an accepted no-op. The genuinely-approximate-but-graduating ops are DADDU/DSUBU/DSLL (need full 64-bit datapath) and MFC0/MTC0/EI (need fuller COP0 coverage). **The MMI/128-bit infrastructure is the strongest, most faithful part of the stub and is genuinely synthesizable.**
|
||
|
||
---
|
||
|
||
## 7. Go / No-Go + Recommended Next Chapters
|
||
|
||
**Does a scoped R5900 subset fit Agilex 5 and pass the TBs? — YES, with the §4.3 caveats honored.**
|
||
|
||
- **Fit**: Agilex 5 has hundreds of K ALMs. The dominant cost is gpr128 (~4096 FFs) — wasteful but not fatal, and gateable. MULTU infers DSP (fine). The divider must be stripped/replaced. The trace cluster should be stripped. No structural blocker.
|
||
- **TBs**: passes in simulation against the stub as-is; a stripped/gated synthesis config (no trace, divider replaced, HLE absent) needs the §5.2 adaptations (architectural readout + full-width golden values) and the §5.3 new TBs. Hence **go_with_caveats**, not unqualified go.
|
||
|
||
### Track A — next chapters (discovery only)
|
||
|
||
- **Ch307**: Autopsy the next qbert wait loop (post-Ch294/0x7A unblock; the steady-state hot-PC, e.g. the suspected `0x00106154` region). Classify the gate (memory flag? MMIO poll? handler-fire?) the same way Ch294 did. **No RTL** — produce the checklist entry, not a hack, unless a one-shot stub is the cheapest way to see the *next* blocker.
|
||
- **Ch308 (A)**: Begin backing out fabrications into the async-hardware layer: replace the Ch299 TB poke with the real RegisterLibraryEntries (0x77) memory side effect, modeled in the HLE companion, and prove qbert still progresses. This *de-risks* Track B by validating the real mechanism in the cheap environment first.
|
||
- **Ch309 (A)**: Capture a full lockstep retire-trace export from the oracle for a fixed qbert prefix, to serve as Track B's co-sim golden reference (§5.4.1).
|
||
|
||
### Track B — next chapters (the real core)
|
||
|
||
- **Ch308 (B)**: Scaffold `rtl/ee/ee_core.sv` — a clean multicycle skeleton: fetch/decode/retire FSM + 32-bit ALU (SLL…SLTU, ADDI…LUI) + HI/LO + branches/delay slots. **Validate immediately against the existing ALU/shift/branch TBs** (tb_ee_core_shift, _varshift, _rtype_logic, _rtype_addu, _add_sub, _slt, _slti, _branch_zero, _jal, _jalr) re-pointed to architectural readout. No MMI, no MMIO, no syscalls yet.
|
||
- **Ch309 (B)**: Add load/store (LB…SW + multi-beat LD/LQ/SD/SQ) with AdEL/AdES, and the multi-cycle DIVU FSM (replacing the combinational divider). Validate against _memops, _lb/_lbu/_lh/_lhu/_sb/_sh, _ld/_lq/_sd/_sq, _align/_align_exc, _divu_mflo, _multu_mflo.
|
||
- **Ch310 (B)**: Add the COP0 exception-entry mechanism (EPC/Cause.ExcCode/BEV vectoring) + ERET, plus the gated gpr128/MMI subset. Add the **new** TBs: gpr128 invariant, COP0 exception state, overflow trap. Wire the SYSCALL exception to vector into the BIOS/HLE companion (not an internal $v1 switch). First lockstep co-sim run against the Ch309(A) golden trace.
|
||
|
||
---
|
||
|
||
## 8. Risks / Rabbit-Holes to Avoid
|
||
|
||
**Be honest about what could make this unrecoverable — and what is merely hard.**
|
||
|
||
1. **THE primary risk: conflating oracle hacks into the real core.** This is the single thing that turns a bounded project into an unrecoverable one. If the `$v0=1` fib, the bit-17 fake, or a syscall stub leaks into `ee_core.sv`, Track B becomes a second oracle wearing a CPU costume — and we will chase phantom "blockers" (the Ch264–Ch268 thunk-chain hunt is the cautionary tale: 5 chapters chasing a treadmill that was *our own shim*, per Ch269). **Mitigation: the §2 rule is non-negotiable — the CPU core contains faithful instructions + exception entry, full stop. Every backed-out hack gets a TB proving the real mechanism reproduces the faked result.**
|
||
|
||
2. **GS fillrate and VU0/VU1 parallelism are separate mountains — do not let them contaminate the EE-core decision.** The EE *integer/MMI core* is tractable and is what this document scopes. The GS (rasterizer fillrate, VRAM bandwidth — see 0006-vram-roadmap) and the VUs (two SIMD vector coprocessors with their own microcode, macro/micro mode, and tight EE coupling) are each *larger* than the EE core and have their own roadmaps. **Risk: scope creep that bundles "boot qbert's CPU code" with "render qbert's graphics." Keep them separate; the EE core graduating does NOT imply the frame renders.** FPU/COP1 is a smaller but real adjacent piece, also deferred.
|
||
|
||
3. **Cycle-accuracy ambition.** If the goal silently drifts from "boots and runs correctly" to "cycle-perfect," the project becomes unbounded (multiply/divide latency, dual-issue scheduling, cache timing, bus contention). **Mitigation: §5.4 names the scope explicitly. First-silicon is behavioral correctness, not timing fidelity.**
|
||
|
||
4. **The divider critical path.** Known, measured (~32 ns, Ch162), and already gated. The only risk is *forgetting* to replace it with a sequential FSM when DIVU semantics are required. Tracked as a Ch309(B) deliverable.
|
||
|
||
5. **TB golden-value drift.** Several TBs encode stub-accurate (low-32 / truncated) golden values. If Track B is validated against *unmodified* stub TBs, a correct full-width core will FAIL spuriously, or worse, a buggy core will PASS against a too-lax expectation. **Mitigation: the §5.2 width pass is a prerequisite, not an afterthought.**
|
||
|
||
6. **Hierarchical-peek brittleness.** Not unrecoverable, but if ignored it blocks the entire compliance suite from running against the new core. Mechanical (§5.2.1) but must be budgeted.
|
||
|
||
**Bottom line: the EE core itself is tractable, bounded, and validatable.** We have a golden behavioral model, an evidence-backed requirements list, and a near-complete port-driven compliance suite — three assets most from-scratch CPU projects never have. The path is not a rabbit hole *provided* we hold the layer separation. The unrecoverable scenarios all share one root cause — letting the oracle and the CPU be the same artifact. This document exists to make sure they never are.
|