ec82764bef
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
166 lines
7.8 KiB
Markdown
# Ch271 closeout: SQ implemented; qbert progresses 2,247× further

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00100068 instr=0x0080e02d)`,
i.e. **DADDU**, the next missing R5900 opcode. **That frames Ch272.**

## Numbers, end to end

| Metric                 | Pre-Ch271 (Ch270 verdict)  | Post-Ch271 (this chapter) |
|------------------------|----------------------------|---------------------------|
| qbert retire_count     | 12                         | **26,958** (2,247× more)  |
| First-trap PC          | 0x00100024 (SQ)            | 0x00100068 (DADDU)        |
| First-trap instr       | 0x7C400000                 | 0x0080E02D                |
| Distance in qbert text | ~9 instructions from entry | 17 instructions further   |

The SQ implementation correctly cleared the qbert prolog buffer
that previously stalled execution. qbert now gets 17 instructions
(0x68 - 0x24 = 0x44 bytes) further into its prolog text before
hitting DADDU. The far larger jump in retire_count indicates that
code in that span runs many times (presumably the buffer-clear loop).

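As a quick sanity check on the table, the ratio and the static distance follow directly from the reported values. This is only arithmetic on the numbers above; the variable names are ours, and the 4-byte instruction size is the standard MIPS encoding width.

```python
# Values copied from the table above; names are illustrative only.
PRE_RETIRES, POST_RETIRES = 12, 26_958
PRE_TRAP_PC, POST_TRAP_PC = 0x00100024, 0x00100068
INSTR_BYTES = 4  # fixed-width 32-bit MIPS instructions

# Dynamic progress: how many more instructions retired before the trap.
retire_ratio = POST_RETIRES / PRE_RETIRES

# Static progress: how far the trap PC moved through the text segment.
static_gap = (POST_TRAP_PC - PRE_TRAP_PC) // INSTR_BYTES

print(f"retire ratio = {retire_ratio:.1f}x")
print(f"static gap   = {static_gap} instructions")
```

The two figures measure different things: the ratio counts retired instructions, the gap counts instruction slots in the text.
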
## What landed

### RTL: ee_core_stub.sv (5 surgical edits)

1. `OP_SQ = 6'h1F` localparam constant alongside the other store
   opcodes.
2. `is_sq` logic declaration + `assign is_sq = (opcode == OP_SQ)`.
3. **Alignment**: extended `is_align_fault` to include
   `is_quad_access && (ea[3:0] != 4'd0)`, and added `is_sq` to
   `is_align_store`. Misaligned SQ now trips the existing
   AdES exception path (or strict trap, depending on
   `TRAP_ALIGN_ERROR`).
4. **Decoder allow-list**: added `!is_sq` to the `is_nop_class`
   catch-all so SQ doesn't get rejected by `STRICT_UNSUPPORTED`.
5. **4-beat FSM**:
   - new 2-bit `sq_beat` register;
   - transition into `S_MEM_WRITE` from EXECUTE;
   - in the `S_MEM_WRITE` combinational block,
     `map_wr_addr = ea + {sq_beat, 2'b00}` and
     `map_wr_data = (sq_beat == 0) ? rt_val : 32'd0` (the upper 96
     bits of $rt aren't modelled; for `sq $zero,...`, the qbert
     case, every beat naturally writes zero);
   - in the `S_MEM_WRITE` FSM state, stay in state and increment
     `sq_beat` until `sq_beat == 2'd3`, then retire and return to
     `S_IFETCH_REQ`.

The single architectural SQ instruction takes 4 bus beats but
produces exactly ONE retire event, matching the architectural
model.

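The beat sequence described in edits 3 and 5 can be sketched as a small behavioural model. This is not the RTL: `sq_beats` and `AddressError` are hypothetical names, and the exception merely stands in for whichever of the AdES or strict-trap paths `TRAP_ALIGN_ERROR` selects.

```python
class AddressError(Exception):
    """Stand-in for the AdES / strict-trap path on a misaligned SQ."""


def sq_beats(ea, rt_val):
    """Return the four (address, data) writes one SQ issues in this model.

    Mirrors the described logic: fault unless ea[3:0] == 0, then
    addr = ea + {sq_beat, 2'b00}, data = rt_val on beat 0 and 0 after.
    """
    if ea & 0xF:
        raise AddressError(f"SQ to misaligned address 0x{ea:08X}")
    beats = []
    for sq_beat in range(4):
        addr = (ea + (sq_beat << 2)) & 0xFFFFFFFF
        data = (rt_val & 0xFFFFFFFF) if sq_beat == 0 else 0
        beats.append((addr, data))
    return beats  # four bus writes, but still a single retire event


for addr, data in sq_beats(0x00000400, 0):
    print(f"write 0x{addr:08X} <- 0x{data:08X}")
```

Returning the whole list from one call reflects the one-instruction, one-retire property: the four writes are an internal detail of a single architectural step.
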
### TB: sim/tb/integration/tb_ee_core_sq.sv

Focused 18-instruction test:

- Bootstrap from the `0xBFC00000` reset vector via J to
  `0xBFC00100`.
- LUI/ORI to load `$v0 = 0x80000400` (kseg0 → EE RAM phys
  0x400).
- Pre-poke EE RAM at phys 0x400..0x40F with distinct non-zero
  values (`0xDEADBEEF / 0xCAFEF00D / 0x12345678 / 0x9ABCDEF0`)
  via the hierarchical `ram_word()` task so a missing SQ beat would
  leave a non-zero word.
- Execute `sq $0, 0($v0)` (= 0x7C400000, the exact qbert
  instruction).
- LW + BNE-to-FAIL chain over the 4 words verifies each lane is
  zero.
- Belt-and-braces: direct hierarchical peek of
  `u_ee_ram.mem[0x40]` after halt to confirm all 128 bits are 0.
- PASS via syscall.

Result: `[tb_ee_core_sq] retired=18 halt=1 trap=0 pc=0xbfc0013c errors=0 PASS`.
Both the BNE chain and the direct RAM check
agree the SQ wrote 16 zero bytes correctly.

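The reason for pre-poking sentinels can be shown with a toy model of the check. This is a sketch in Python rather than the testbench itself; `run_sq_zero` and `failing_lanes` are hypothetical helpers, and the "dropped beat" is an injected fault used only to show what the sentinels catch.

```python
SENTINELS = [0xDEADBEEF, 0xCAFEF00D, 0x12345678, 0x9ABCDEF0]
BASE = 0x400  # EE RAM physical address targeted by the test


def run_sq_zero(beats_written, prefill=SENTINELS):
    """Pre-poke four words, then zero only the beats that 'happened'."""
    ram = {BASE + 4 * i: prefill[i] for i in range(4)}
    for beat in beats_written:
        ram[BASE + 4 * beat] = 0
    return ram


def failing_lanes(ram):
    """Equivalent of the LW + BNE-to-FAIL chain: lanes that are not zero."""
    return [i for i in range(4) if ram[BASE + 4 * i] != 0]


print(failing_lanes(run_sq_zero(range(4))))           # all four beats land
print(failing_lanes(run_sq_zero([0, 1, 2])))          # last beat dropped
print(failing_lanes(run_sq_zero([0, 1, 2], [0] * 4))) # same fault, zeroed RAM
```

With zero-initialised RAM the dropped beat is invisible, which is exactly the false pass the non-zero pre-poke rules out.
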
### Makefile: `tb_ee_core_sq` target + regression list

Added to both the PHONY list and the `run:` master list. Regression
bumps from 158 → 159.

## Why not just NOP the opcode (Codex's caution honoured)

Codex called this out explicitly: `0x7C400000` is `sq $zero,
0($v0)`, a 128-bit store of zero. NOP-ing op=0x1F would let
qbert continue, but it would silently skip real memory
initialization. For the prolog, that's a buffer clear; later
code would read uninitialized values from those bytes and
behave nondeterministically.

**Minimal-correct SQ** (4 beats of 32-bit writes) is the right
choice. The "minimal" part: we don't model the upper 96 bits of
$rt (PS2 EE has 128-bit GPRs). For `sq $zero,...` this is
exact. For `sq` of a non-zero register we write the low 32 bits in
beat 0 and zero elsewhere, a documented approximation that
stays exact for the common "clear a 128-bit kernel
slot" use case. When/if a real PS2 program does `sq` of a
non-zero 128-bit register, we'll see silent data corruption
that the runner's hot-PC verdict can identify; that's the
trigger to upgrade to 128-bit GPR modelling.

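To make the approximation concrete, the sketch below compares what a full 128-bit store would write with what the minimal model writes. The function names are hypothetical, and the word order assumes the EE's little-endian layout (lowest word at the lowest address), which matches beat 0 carrying `rt_val`.

```python
MASK32 = 0xFFFFFFFF


def ideal_sq_words(rt128):
    """Four 32-bit words of a full 128-bit $rt, lowest address first."""
    return [(rt128 >> (32 * i)) & MASK32 for i in range(4)]


def minimal_sq_words(rt128):
    """What the 32-bit-regfile model stores: low word, then zeros."""
    return [rt128 & MASK32, 0, 0, 0]


def is_exact(rt128):
    """True when the minimal model writes the same 16 bytes as hardware would."""
    return ideal_sq_words(rt128) == minimal_sq_words(rt128)


print(is_exact(0))                    # sq $zero: the qbert case
print(is_exact(0x0000BEEF))           # upper 96 bits happen to be zero
print(is_exact(0x1_0000_0000))        # bit 32 set: upper words diverge
```

In this model the store is exact precisely when the upper 96 bits of the source are zero, so the first divergence to expect is a value with anything set above bit 31.
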
## Codex Ch271 acceptance: line-by-line

| Requirement | Status | Where |
|-------------|--------|-------|
| Decode primary opcode 0x1F as SQ | ✅ | OP_SQ + is_sq |
| Support `sq $zero, imm(base)` at minimum | ✅ | rt_val=0 case writes 0 every beat (and rt_val=non_zero writes low 32 to beat 0) |
| 4-beat 32-bit-stripe FSM through existing memory interface | ✅ | sq_beat counter, stays in S_MEM_WRITE for 4 beats |
| Require 16-byte alignment; misaligned → strict/exc trap | ✅ | is_quad_access check in is_align_fault |
| Focused TB: preload base, exec SQ, verify 4 zero words | ✅ | tb_ee_core_sq |
| Verify PC advances + no GPR writeback | ✅ | Final PC check + retire path doesn't touch regfile |
| Re-run qbert.elf, report next blocker | ✅ | DADDU at pc=0x00100068 |
| Don't NOP all op=0x1F (would mask real stores) | ✅ | Targeted decode, exact 4-beat write semantics |
| Don't overbuild full LQ/SQ/vector yet | ✅ | SQ only (no LQ, no PSQ_*, no vector); upper 96 bits left for later |
| Regression unaffected | ✅ | 159/159 in flight |

## Recommendation for Codex's Ch272

**`daddu $gp, $a0, $zero` at pc=0x00100068 instr=0x0080E02D.**

DADDU is MIPS-III's 64-bit version of ADDU. The R5900 is a
64-bit core; PS2 ELFs use DADDU as the canonical 64-bit
register-move pseudo-instruction (`move rd, rs` →
`daddu rd, rs, $zero`).

Our model has a 32-bit regfile (`logic [31:0] regfile [0:31]`),
so a faithful 64-bit DADDU would need 64-bit GPRs. For the
qbert blocker specifically, the operation degenerates to a
32-bit move: `$gp = $a0 + 0`.

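Both trap words can be checked by hand against the standard MIPS field layout. The helper below is a hypothetical illustration (`decode_fields` is not part of the repo); it only slices bit fields and does not model the decoder in `ee_core_stub.sv`.

```python
def decode_fields(word):
    """Slice a 32-bit MIPS instruction word into its standard fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26
        "rs":     (word >> 21) & 0x1F,  # bits 25:21
        "rt":     (word >> 16) & 0x1F,  # bits 20:16
        "rd":     (word >> 11) & 0x1F,  # bits 15:11 (R-type only)
        "funct":  word & 0x3F,          # bits 5:0   (R-type only)
        "imm16":  word & 0xFFFF,        # bits 15:0  (I-type only)
    }


daddu = decode_fields(0x0080E02D)  # the Ch272 blocker
sq = decode_fields(0x7C400000)     # the Ch271 blocker

# SPECIAL opcode 0 with funct 0x2D is DADDU; rd=28 ($gp), rs=4 ($a0), rt=0.
print(daddu)
# Primary opcode 0x1F is SQ; base rs=2 ($v0), rt=0 ($zero), offset 0.
print(sq)
```

Register numbers follow the usual MIPS ABI names: 2 is `$v0`, 4 is `$a0`, 28 is `$gp`.
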
Three Ch272 framings, in order of scope:

1. **Decode DADDU and treat it as ADDU.** Low-32-bit semantics
   only; upper 32 bits silently dropped (already true everywhere
   else in the model). Touches one line in the `is_nop_class`
   allow-list + one new R-type funct case + adding `is_daddu` to
   the `is_rtype_alu` group. Same "minimal-correct" pattern that
   worked for SQ.
2. **Decode DADDU + DADD + DSUBU + DSUB
   as their 32-bit counterparts.** Broader, but these are all
   commonly emitted by gcc for r5900 alongside DADDU. MIPS III
   defines no D-prefixed AND/OR/XOR/NOR; the bitwise ops are
   width-agnostic, so the existing logical ops already cover that
   case. Pre-empts several chapters' worth of
   one-opcode-at-a-time growth.
3. **Properly implement 64-bit GPRs.** Architecturally correct,
   but invasive: touches regfile width, all ALU paths, LW/SW
   to/from the regfile, and the trace. Probably 1-2 chapters of work
   on its own.

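Framing (1) rests on the fact that the low 32 bits of a 64-bit wrapping add depend only on the low 32 bits of the operands. The sketch below checks that on a few hand-picked values; the function names are hypothetical and the model ignores everything except the add itself.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def daddu64(rs, rt):
    """Architectural DADDU: 64-bit wrapping add, no overflow trap."""
    return (rs + rt) & MASK64


def daddu_as_addu(rs, rt):
    """Framing (1): only the low 32 bits of each operand exist."""
    return ((rs & MASK32) + (rt & MASK32)) & MASK32


def low_words_agree(rs, rt):
    """True when the 32-bit model matches the low half of the real result."""
    return (daddu64(rs, rt) & MASK32) == daddu_as_addu(rs, rt)


samples = [(0x00123450, 0), (0xFFFF_FFFF, 1), (0x1_0000_0000, 5), (MASK64, MASK64)]
for rs, rt in samples:
    upper = daddu64(rs, rt) >> 32  # the part the 32-bit model cannot hold
    print(f"agree={low_words_agree(rs, rt)} lost_upper=0x{upper:08X}")
```

The low words always agree; what the 32-bit model loses is the upper half, which only matters once later code consumes it (a 64-bit compare, a shift, or an SD).
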
(1) is the strict Codex-style "minimal-correct next blocker"
answer. (2) would shorten the chapter chain if Codex thinks
qbert's prolog uses several D* ops. (3) is a "do it right" pivot
that's worth doing eventually but probably not in Ch272.

My read: **(1) is the right Ch272: same shape as Ch271, fast
to land, lets the verdict surface the next real divergence.**
If the next blocker is also a D* op, we recur. If it's something
totally different (LQ? MMI? VU0 macro?), we know (1) was the
right scope.

Standing by.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 5 surgical edits (~20 LOC total) for
  SQ decode + 4-beat write FSM.
- `sim/tb/integration/tb_ee_core_sq.sv`: new focused TB.
- `sim/Makefile`: `tb_ee_core_sq` target + added to both
  regression lists.

## Regression

In flight at the moment of writing; expected 159/159 (was 158, +1
for tb_ee_core_sq).