retroDE_ps2/docs/ch275_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

Ch275 closeout — SD as 2-beat 32-bit-stripe write; qbert clears the prologue, next blocker is DSLL

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C54 instr=0x00094C38) → DSLL (Doubleword Shift Left Logical), MIPS-III SPECIAL funct 0x38. qbert ran through the SD prologue at 0x00112DAC, retired 21 more instructions (the SD plus the 20 that follow it), and trapped on a 64-bit shift inside the function logic.

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |

What landed

RTL — surgical edits in ee_core_stub.sv

  1. localparam OP_SD = 6'h3F alongside OP_SQ.
  2. is_sd decode signal.
  3. Alignment: new is_dword_access = is_sd; extended is_align_fault with is_dword_access && (ea[2:0] != 3'd0); added is_sd to is_align_store. Misaligned SD trips the same AdES path as SW/SH/SQ.
  4. Decoder allow-list: !is_sd added to is_nop_class catch-all.
  5. FSM transition: new else if (is_sd) branch in EXECUTE that initializes sq_beat <= 0 and enters S_MEM_WRITE (reusing the SQ counter — SD only needs 2 beats, which fits in the 2-bit counter).
  6. S_MEM_WRITE comb: combined SQ + SD into one (is_sq || is_sd) branch. Same beat-indexed address + (sq_beat == 0) ? rt_val : 32'd0 data pattern.
  7. S_MEM_WRITE FSM: retire when (is_sq && beat==3) || (is_sd && beat==1), otherwise stay and increment.

7 surgical edits, ~12 LOC total. The reuse of sq_beat keeps the FSM minimal.
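
The store behavior described in edits 3, 6 and 7 can be sketched as a small behavioral model. This is a Python illustration, not the RTL: `sd_write`, `AddressError` and the dict-based memory are hypothetical names, and the beat address `ea + 4 * beat` is an assumption inferred from the TB checking phys 0x400 and 0x404.

```python
# Behavioral sketch only; names are illustrative and do not exist in the RTL.
class AddressError(Exception):
    """Stands in for the AdES (address error on store) trap."""


def sd_write(mem, ea, rt_val):
    """Model SD as two 32-bit beats into a word-addressed dict `mem`."""
    if ea & 0x7:                       # is_dword_access && ea[2:0] != 0
        raise AddressError(hex(ea))
    for beat in range(2):              # SD retires after beat 1
        addr = ea + 4 * beat           # assumed beat-indexed address
        # Beat 0 carries the 32-bit register value, later beats write zero.
        mem[addr] = (rt_val & 0xFFFFFFFF) if beat == 0 else 0
    return mem


mem = {0x400: 0xDEADBEEF, 0x404: 0xCAFEF00D}
sd_write(mem, 0x400, 0xABCD1234)
print({hex(a): hex(d) for a, d in mem.items()})
```

With the pre-poked values from the focused TB, the model leaves 0xABCD1234 at 0x400 and 0 at 0x404, which is the pattern the TB asserts.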

Focused TB — tb_ee_core_sd.sv

  • Bootstrap from 0xBFC00000 reset → 0xBFC00100.
  • $v0 = 0x80000400 (kseg0 → EE-RAM phys 0x400).
  • $ra = 0xABCD1234 (sentinel).
  • Pre-poke phys 0x400/0x404 with 0xDEADBEEF / 0xCAFEF00D.
  • Execute sd $ra, 0($v0) (encoded via enc_i(OP_SD, 2, 31, 0)).
  • LW + BNE chain verifies mem[0x400] = 0xABCD1234, mem[0x404] = 0.
  • Direct hierarchical RAM peek confirms both 32-bit lanes inside the qword. PASS via syscall.

Result: retired=16 halt=1 trap=0 pc=0xbfc00134 errors=0 PASS.
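
As a cross-check on the TB setup, the instruction word and the physical address can be recomputed by hand. The sketch below assumes `enc_i` takes its arguments in (op, rs, rt, imm) order, which matches the call quoted above but is not confirmed from the TB source; `kseg0_to_phys` is a hypothetical helper name.

```python
OP_SD = 0x3F  # primary opcode for SD, as in the localparam above


def enc_i(op, rs, rt, imm):
    """Build a MIPS I-type word; argument order is an assumption."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)


def kseg0_to_phys(vaddr):
    """kseg0 (0x80000000-0x9FFFFFFF) maps to physical by clearing the top 3 bits."""
    if not 0x80000000 <= vaddr < 0xA0000000:
        raise ValueError("not a kseg0 address")
    return vaddr & 0x1FFFFFFF


# sd $ra, 0($v0): base $v0 is r2, source $ra is r31, offset 0.
word = enc_i(OP_SD, 2, 31, 0)
print(hex(word), hex(kseg0_to_phys(0x80000400)))  # 0xfc5f0000 0x400
```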

Makefile

  • tb_ee_core_sd target.
  • Added to both regression lists.
  • Regression: 162 → 163.

qbert progression highlights

  • The 21-retire delta from Ch274 to Ch275 means qbert ran the SD prologue, executed ~20 instructions of the function body, then hit DSLL.
  • The trap PC 0x00112C54 is lower than the prologue PC 0x00112DAC by 0x158 bytes (86 instructions), so qbert's flow went forward through the prologue and then backward (a JAL to an earlier-defined function, or a loop branch). Either way, real function-call flow is happening.
  • $a0 = $a3 = $v1 = 0x0012C2C0 at trap — same pointer in multiple registers. Looks like a struct pointer passed to some library function.
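
The distance between the two PCs quoted above is simple to verify. The variable names in this short check are illustrative only.

```python
sd_pc = 0x00112DAC    # SD prologue PC (Ch274 blocker)
dsll_pc = 0x00112C54  # DSLL trap PC (Ch275 blocker)

delta_bytes = sd_pc - dsll_pc
delta_instrs = delta_bytes // 4   # fixed 4-byte MIPS instructions

print(hex(delta_bytes), delta_bytes, delta_instrs)  # 0x158 344 86
```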

Recommendation for Codex's Ch276

dsll $t1, $t1, 16 at PC 0x00112C54 — opcode SPECIAL, rt=9, rd=9, sa=16, funct=0x38.

Same shape as Ch272 DADDU: implement with SLL semantics for the low 32 bits. The PS2 EE is 64-bit but our regfile is 32-bit, and the low 32 bits of a left shift depend only on the low 32 bits of the source, so DSLL and SLL produce identical low-32-bit results for every encodable sa (0 to 31). Shifts of 32 to 63 use a separate instruction, DSLL32 (funct 0x3C), whose low 32 result bits are always 0. The sa=16 shift here is firmly in the SLL-equivalent range.
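
The low-32-bit equivalence can be checked exhaustively over the shift amount with a short model. This is a sketch of the arithmetic only; the function names are hypothetical and the sample values are arbitrary.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def dsll(rt64, sa):
    """64-bit DSLL: shift left by sa (0..31), keep 64 bits."""
    return (rt64 << sa) & MASK64


def dsll32(rt64, sa):
    """64-bit DSLL32: shift left by sa + 32, keep 64 bits."""
    return (rt64 << (sa + 32)) & MASK64


def sll_low32(rt32, sa):
    """What a 32-bit regfile computes with the SLL datapath."""
    return (rt32 << sa) & MASK32


samples = [0x0, 0x1, 0x1234, 0xABCD1234, 0xFFFFFFFF, 0x123456789ABCDEF0]
for value in samples:
    for sa in range(32):
        # Low 32 bits of DSLL match SLL applied to the low 32 source bits.
        assert dsll(value, sa) & MASK32 == sll_low32(value & MASK32, sa)
        # DSLL32 always clears the low 32 bits.
        assert dsll32(value, sa) & MASK32 == 0

print(hex(dsll(0x1234, 16) & MASK32))  # 0x12340000
```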

Minimal scope:

  1. localparam FUNC_DSLL = 6'h38.
  2. is_dsll decode signal + add to is_rtype_alu group.
  3. In rtype_alu_wb: else if (is_dsll) rtype_alu_wb = rt_val << shamt; (identical to SLL's path).

Focused TB pattern (mirrors tb_ee_core_daddu):

  • Normal shift: dsll $t1, $t0, 16 with $t0 = 0x00001234 → $t1 = 0x12340000.
  • Exact qbert encoding: dsll $t1, $t1, 16 (rt=rd=9, sa=16), encoded with enc_rtype and asserted to equal 0x00094C38.
  • Edge cases: sa=0 (no shift), sa=31 (max valid SLL-equivalent shift). sa values 32+ would need DSLL32; defer until qbert hits one.
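
The test vectors in the list above can be drafted in software before they are written into the TB. In this sketch the `enc_rtype` argument order (rs, rt, rd, sa, funct) is an assumption and may differ from the TB helper of the same name; the third vector's input value is chosen here for illustration.

```python
FUNC_DSLL = 0x38


def enc_rtype(rs, rt, rd, sa, funct):
    """Build a MIPS SPECIAL (opcode 0) R-type word; argument order is assumed."""
    return ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11) \
        | ((sa & 0x1F) << 6) | (funct & 0x3F)


def dsll_low32(rt_val, sa):
    """Expected low-32-bit writeback, identical to the SLL path."""
    return (rt_val << sa) & 0xFFFFFFFF


# Exact qbert encoding: dsll $t1, $t1, 16 (rt = rd = 9, sa = 16).
assert enc_rtype(0, 9, 9, 16, FUNC_DSLL) == 0x00094C38

# (source value, sa, expected low 32 bits)
vectors = [
    (0x00001234, 16, 0x12340000),  # normal shift
    (0x00001234, 0, 0x00001234),   # sa = 0, no shift
    (0x00000001, 31, 0x80000000),  # sa = 31, largest DSLL shift
]
for src, sa, expected in vectors:
    assert dsll_low32(src, sa) == expected

print("all DSLL vectors match")
```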

Likely follow-ons after DSLL: the SPECIAL functs DSRL (0x3A), DSRA (0x3B), DSLL32 (0x3C), DSRL32 (0x3E) and DSRA32 (0x3F), plus the primary opcodes DADDIU (0x19) and LD (0x37). Land each as the runner surfaces it. The opcode-growth cadence is now fast (~minutes per chapter); Codex can choose to fold multiple D-shifts into one chapter if qbert hits several in sequence.

Files changed

  • rtl/ee/ee_core_stub.sv — 7 surgical edits.
  • sim/tb/integration/tb_ee_core_sd.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight at the moment of writing; expected 163/163 (was 162, +1 for tb_ee_core_sd).

Pattern summary across the qbert track

Ch271→Ch275: SQ → DADDU → SYSCALL HLE → BEQL → SD. Each chapter:

  • One opcode (or syscall family) added.
  • 2-7 RTL edits, all surgical.
  • One focused TB with pre/post register assertions.
  • One re-run of qbert that reveals the next blocker.
  • One regression bump.

retire_count progression: 12 → 26,958 → 26,960 → 26,980 → 26,985 → 27,006. The runner is doing exactly its job — surfacing the next concrete blocker in the order qbert actually needs them, never speculating about what to add next.
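
The per-chapter gain in retired instructions follows directly from the progression above. In this small calculation the chapter labels are paired with the counts on the assumption that the leading 12 is the pre-Ch271 value.

```python
# retire_count values as listed in the progression above.
counts = [12, 26958, 26960, 26980, 26985, 27006]
chapters = ["Ch271 (SQ)", "Ch272 (DADDU)", "Ch273 (SYSCALL HLE)",
            "Ch274 (BEQL)", "Ch275 (SD)"]

# Instructions gained by each chapter = difference between neighbors.
deltas = [after - before for before, after in zip(counts, counts[1:])]

for name, gained in zip(chapters, deltas):
    print(f"{name}: +{gained}")
```

The SQ chapter accounts for almost all of the growth (+26,946); the four chapters after it add 2, 20, 5 and 21 retires respectively.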