Ch275 closeout — SD as 2-beat 32-bit-stripe write; qbert clears the prologue, next blocker is DSLL
Status: Closed. Verdict from re-running qbert.elf:
elf_first_unsupported_opcode (pc=0x00112C54 instr=0x00094C38) —
DSLL (Doubleword Shift Left Logical), MIPS-III SPECIAL
funct 0x38. qbert ran through the SD prologue at 0x00112DAC,
retired 21 more instructions (the SD itself plus about 20 of the
function body), and trapped on a 64-bit shift inside the function logic.
Numbers
| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
What landed
RTL — surgical edits in ee_core_stub.sv
- localparam OP_SD = 6'h3F alongside OP_SQ.
- is_sd decode signal.
- Alignment: new is_dword_access = is_sd; extended is_align_fault with is_dword_access && (ea[2:0] != 3'd0); added is_sd to is_align_store. Misaligned SD trips the same AdES path as SW/SH/SQ.
- Decoder allow-list: !is_sd added to is_nop_class catch-all.
- FSM transition: new else if (is_sd) branch in EXECUTE that initializes sq_beat <= 0 and enters S_MEM_WRITE (reusing the SQ counter; SD only needs 2 beats, which fits in the 2-bit counter).
- S_MEM_WRITE comb: combined SQ + SD into one (is_sq || is_sd) branch. Same beat-indexed address + (sq_beat == 0) ? rt_val : 32'd0 data pattern.
- S_MEM_WRITE FSM: retire when (is_sq && beat==3) || (is_sd && beat==1), otherwise stay and increment.
7 surgical edits, ~12 LOC total. The reuse of sq_beat keeps
the FSM minimal.
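The shared SQ/SD write path described above can be sketched as a small behavioral model. This is a Python illustration, not the RTL: the function name store_beats is hypothetical, the beat address is assumed to be ea + 4 * sq_beat, and the 16-byte SQ alignment mask is an assumption (the text only states the 8-byte check for SD).

```python
# Behavioral sketch of the shared SQ/SD multi-beat store (not the RTL itself).

def store_beats(kind, ea, rt_val):
    """Return the (address, data) 32-bit writes issued for one store.

    kind   -- "sq" (4 beats) or "sd" (2 beats)
    ea     -- effective address as a RAM offset
    rt_val -- register value; only the low 32 bits exist in this regfile
    Raises ValueError on misalignment, standing in for the AdES trap.
    """
    # SD check mirrors ea[2:0] != 0; the SQ mask is an assumption.
    align_mask = 0xF if kind == "sq" else 0x7
    if ea & align_mask:
        raise ValueError(f"AdES: misaligned {kind} at 0x{ea:08X}")

    last_beat = 3 if kind == "sq" else 1   # retire condition per the FSM
    writes = []
    sq_beat = 0
    while True:
        # Beat 0 carries rt_val; every later beat writes zero.
        data = (rt_val & 0xFFFFFFFF) if sq_beat == 0 else 0
        writes.append((ea + 4 * sq_beat, data))
        if sq_beat == last_beat:
            break                           # retire
        sq_beat += 1                        # stay in S_MEM_WRITE
    return writes


sd_writes = store_beats("sd", 0x400, 0xABCD1234)
sq_writes = store_beats("sq", 0x400, 0xABCD1234)
```

With these inputs the SD case yields two writes (0x400 and 0x404) and the SQ case four, which is why a 2-bit counter is enough for both.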
Focused TB — tb_ee_core_sd.sv
- Bootstrap from 0xBFC00000 reset → 0xBFC00100.
- $v0 = 0x80000400 (kseg0 → EE-RAM phys 0x400).
- $ra = 0xABCD1234 (sentinel).
- Pre-poke phys 0x400/0x404 with 0xDEADBEEF/0xCAFEF00D.
- Execute sd $ra, 0($v0) (encoded via enc_i(OP_SD, 2, 31, 0)).
- LW + BNE chain verifies mem[0x400] = 0xABCD1234, mem[0x404] = 0.
- Direct hierarchical RAM peek confirms both 32-bit lanes inside the qword. PASS via syscall.
Result: retired=16 halt=1 trap=0 pc=0xbfc00134 errors=0 PASS.
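For reference, the instruction word and the expected memory image of this TB can be reproduced with a few lines of Python. This is a sketch under assumptions: the enc_i argument order (op, rs, rt, imm) is inferred from the call shown above, and kseg0_to_phys is a hypothetical helper name.

```python
OP_SD = 0x3F  # primary opcode for SD

def enc_i(op, rs, rt, imm):
    """Pack a MIPS I-type word: op[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)

def kseg0_to_phys(vaddr):
    """kseg0 is unmapped: physical address = virtual with the top 3 bits cleared."""
    return vaddr & 0x1FFFFFFF

# Register numbers: $v0 = 2 (base), $ra = 31 (store source).
sd_word = enc_i(OP_SD, 2, 31, 0)

# Pre-poked sentinels, then the two 32-bit beats of the SD.
mem = {0x400: 0xDEADBEEF, 0x404: 0xCAFEF00D}
base = kseg0_to_phys(0x80000400)
mem[base] = 0xABCD1234   # beat 0: low 32 bits of $ra
mem[base + 4] = 0        # beat 1: upper lane is zero with a 32-bit regfile

print(f"sd $ra, 0($v0) = 0x{sd_word:08X}")  # prints 0xFC5F0000
```

The pre-poked sentinels matter because they make the zero written to 0x404 observable: without them, an SD that silently skipped beat 1 would still pass.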
Makefile
- tb_ee_core_sd target.
- Added to both regression lists.
- Regression: 162 → 163.
qbert progression highlights
- The 21-retire delta from Ch274 to Ch275 means qbert ran the SD prologue, executed ~20 instructions of the function body, then hit DSLL.
- The trap PC 0x00112C54 is LOWER than the prologue PC 0x00112DAC by 0x158 bytes, so qbert's flow went forward through the prologue, then BACKWARD (a JAL to an earlier-defined function, or a loop branch). Either way, real function-call flow is happening.
- $a0 = $a3 = $v1 = 0x0012C2C0 at trap: the same pointer in multiple registers. Looks like a struct pointer passed to some library function.
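The arithmetic behind these observations is easy to recheck. The snippet below is only a convenience calculation over the numbers already quoted in this document; the variable names are illustrative.

```python
# retire_count after each chapter, taken from the Numbers table.
retire = [
    ("Ch271", 26958),
    ("Ch272", 26960),
    ("Ch273", 26980),
    ("Ch274", 26985),
    ("Ch275", 27006),
]

# Instructions gained by each chapter relative to the previous one.
deltas = [(cur[0], cur[1] - prev[1]) for prev, cur in zip(retire, retire[1:])]

prologue_pc = 0x00112DAC   # SD that blocked Ch274
trap_pc = 0x00112C54       # DSLL that blocks Ch275

back_bytes = prologue_pc - trap_pc
back_instrs = back_bytes // 4          # fixed 4-byte MIPS instructions

print(deltas)                          # [('Ch272', 2), ('Ch273', 20), ('Ch274', 5), ('Ch275', 21)]
print(hex(back_bytes), back_instrs)    # 0x158 86
```

So the trap sits 86 instruction slots before the prologue, while only 21 instructions retired in between, which is consistent with a backward control transfer rather than straight-line execution.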
Recommendation for Codex's Ch276
dsll $t1, $t1, 16 at PC 0x00112C54 — opcode SPECIAL,
rt=9, rd=9, sa=16, funct=0x38.
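The field values above can be cross-checked by decoding the trapped word with the standard MIPS R-type layout. A minimal sketch; decode_rtype is a hypothetical helper, not part of the repo.

```python
def decode_rtype(word):
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # 0 = SPECIAL
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "sa":     (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
    }

fields = decode_rtype(0x00094C38)
print(fields)
# {'opcode': 0, 'rs': 0, 'rt': 9, 'rd': 9, 'sa': 16, 'funct': 56}
```

Register 9 is $t1 and funct 56 is 0x38, matching dsll $t1, $t1, 16.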
Same shape as Ch272 DADDU: implement as SLL semantics for
the low 32 bits. PS2 EE is 64-bit; our regfile is 32-bit. DSLL's
5-bit sa field only encodes shifts of 0 to 31, and in that range
DSLL and SLL produce identical low-32-bit results. Shifts of 32
or more are a separate instruction (DSLL32, funct 0x3C), where the
low 32 bits become 0. DSLL with sa=16 here is firmly in the
SLL-equivalent range.
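The equivalence claim can be checked exhaustively over the shift range with a 64-bit model. This is a sketch in Python; the function names are illustrative, and the 32-bit regfile is modeled simply as the low half of the 64-bit value.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def dsll64(rt64, sa):
    """Architectural DSLL: 64-bit logical left shift, sa in 0..31."""
    return (rt64 << sa) & MASK64

def dsll32_64(rt64, sa):
    """Architectural DSLL32: 64-bit logical left shift by sa + 32."""
    return (rt64 << (sa + 32)) & MASK64

def sll32(rt32, sa):
    """What a 32-bit regfile computes when DSLL is mapped onto the SLL path."""
    return (rt32 << sa) & MASK32

samples = [0x00001234, 0xFFFFFFFF, 0x80000001, 0xDEADBEEFCAFEF00D]
mismatches = [
    (v, sa)
    for v in samples
    for sa in range(32)
    if dsll64(v, sa) & MASK32 != sll32(v & MASK32, sa)
]
dsll32_low_halves = {dsll32_64(v, sa) & MASK32 for v in samples for sa in range(32)}

print(mismatches)          # []
print(dsll32_low_halves)   # {0}
```

The low 32 bits of a left shift depend only on the low 32 bits of the input, which is why the upper half of the 64-bit register can be ignored here. The same argument does not carry over to right shifts such as DSRL or DSRA, where upper bits move into the low half.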
Minimal scope:
- localparam FUNC_DSLL = 6'h38.
- is_dsll decode signal + add to is_rtype_alu group.
- In rtype_alu_wb: else if (is_dsll) rtype_alu_wb = rt_val << shamt; (identical to SLL's path).
Focused TB pattern (mirrors tb_ee_core_daddu):
- Normal shift: dsll $t1, $t0, 16 with $t0 = 0x00001234 → $t1 = 0x12340000.
- Exact qbert encoding: dsll $t1, $t1, 16 (rt=rd=9, sa=16), encoded with enc_rtype and asserted to equal 0x00094C38.
- Edge cases: sa=0 (no shift), sa=31 (max valid SLL-equivalent shift). sa values 32+ would need DSLL32; defer until qbert hits one.
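The expected values for these TB cases can be precomputed with a short Python model. The enc_rtype signature used here (rs, rt, rd, sa, funct) is an assumption; the real TB helper may order its arguments differently, and the edge-case inputs are illustrative picks rather than values from the TB.

```python
FUNC_DSLL = 0x38

def enc_rtype(rs, rt, rd, sa, funct):
    """Pack a SPECIAL (opcode 0) R-type word. Argument order is assumed."""
    return ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11) \
        | ((sa & 0x1F) << 6) | (funct & 0x3F)

def dsll_low32(rt_val, sa):
    """Low 32 bits of DSLL, i.e. the SLL-path result for sa in 0..31."""
    return (rt_val << sa) & 0xFFFFFFFF

# Exact qbert encoding: dsll $t1, $t1, 16 ($t1 = register 9).
qbert_word = enc_rtype(0, 9, 9, 16, FUNC_DSLL)

# (input, sa, expected) vectors: normal shift plus the two edge cases.
vectors = [
    (0x00001234, 16, 0x12340000),  # normal shift from the list above
    (0x00001234, 0,  0x00001234),  # sa=0 leaves the value unchanged
    (0x00000003, 31, 0x80000000),  # sa=31: bit 0 lands in bit 31, bit 1 is lost
]
results = [dsll_low32(v, sa) for v, sa, _ in vectors]

print(f"0x{qbert_word:08X}")  # 0x00094C38
```

Asserting the encoder output against the trapped word guards against a TB that passes on a subtly different instruction than the one qbert actually executes.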
Likely follow-ons after DSLL: the remaining SPECIAL-funct doubleword shifts DSRL (0x3A), DSRA (0x3B), DSLL32 (0x3C), DSRL32 (0x3E), DSRA32 (0x3F), plus the primary opcodes DADDIU (0x19) and LD (0x37). Land each as the runner surfaces it. The opcode-growth cadence is now fast (~minutes per chapter); Codex can choose to fold multiple D-shifts into one chapter if qbert hits several in sequence.
Files changed
- rtl/ee/ee_core_stub.sv: 7 surgical edits.
- sim/tb/integration/tb_ee_core_sd.sv: new focused TB.
- sim/Makefile: target + both regression lists.
Regression
In flight at the moment of writing; expected 163/163 (was
162, +1 for tb_ee_core_sd).
Pattern summary across the qbert track
Ch271→Ch275: SQ → DADDU → SYSCALL HLE → BEQL → SD. Each chapter delivered:
- One opcode (or syscall family) added.
- 2-7 RTL edits, all surgical.
- One focused TB with pre/post register assertions.
- One re-run of qbert that reveals the next blocker.
- One regression bump.
retire_count progression: 12 → 26,958 → 26,960 → 26,980 → 26,985 → 27,006. The runner is doing exactly its job — surfacing the next concrete blocker in the order qbert actually needs them, never speculating about what to add next.