retroDE_ps2/docs/ch276_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch276 closeout — DSLL as SLL low-32-bit; qbert progresses 10 retires, next blocker is BNEL

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C7C instr=0x54400019) → BNEL (Branch on Not Equal Likely), MIPS-II opcode 0x15. This is exactly the follow-on Codex predicted in the Ch274 closeout: "Likely follow-on after BEQL: BNEL."
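
The field split behind that verdict can be checked by hand. The following is a minimal Python sketch, assuming the standard MIPS I-type layout (opcode in bits 31:26, rs in 25:21, rt in 20:16, signed 16-bit word offset in 15:0). `decode_itype` and `branch_target` are hypothetical helper names for illustration, not part of the runner.

```python
def decode_itype(instr: int) -> dict:
    """Split a 32-bit MIPS I-type word into its fields."""
    imm = instr & 0xFFFF
    if imm & 0x8000:          # branch offsets are signed 16-bit word counts
        imm -= 0x10000
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "imm": imm,
    }


def branch_target(pc: int, imm: int) -> int:
    """Branch target is relative to the delay-slot address (pc + 4)."""
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


fields = decode_itype(0x54400019)
print({k: hex(v) for k, v in fields.items()})
print(hex(branch_target(0x00112C7C, fields["imm"])))
```

Opcode 0x15 with rs=2 ($v0), rt=0 ($0), and an offset of 25 words matches the reported BNEL.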

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| Post-Ch276 (DSLL) | BNEL at 0x00112C7C | 27,016 |

What landed

RTL — 4 surgical edits in ee_core_stub.sv

  1. localparam FUNC_DSLL = 6'h38 alongside FUNC_SLL.
  2. is_dsll logic decl + assign is_dsll = is_special && (func == FUNC_DSLL).
  3. Added is_dsll to the is_rtype_alu group.
  4. Added is_dsll to the is_sll arm of rtype_alu_wb: else if (is_sll || is_dsll) rtype_alu_wb = rt_val << shamt.

The arm reuses SLL's writeback path because for any valid sa < 32 the low 32 bits of DSLL and SLL are identical. About 4 LOC of real change — mirrors Ch272 DADDU's "implement 64-bit opcode as 32-bit equivalent" pattern.
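
The low-32 equivalence can be spot-checked in software. This is a sketch of the arithmetic only, not of the RTL: `dsll64` and `sll32` are hypothetical reference functions, and the sample values are arbitrary.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def dsll64(rt_val: int, sa: int) -> int:
    """Architectural DSLL: 64-bit logical left shift, sa in 0..31."""
    return (rt_val << sa) & MASK64


def sll32(rt_val: int, sa: int) -> int:
    """32-bit model of SLL: shift the low word and truncate."""
    return ((rt_val & MASK32) << sa) & MASK32


SAMPLES = [0x1234, 0x40000001, 0x80000001, 0xABCD1234, 0xDEADBEEF_80000001]

# The low word of a left shift depends only on the low word of the input,
# so the two results must agree in bits 31:0 for every encodable sa.
mismatches = [
    (hex(v), sa)
    for v in SAMPLES
    for sa in range(32)
    if dsll64(v, sa) & MASK32 != sll32(v, sa)
]
print("mismatches:", mismatches)
```

The check says nothing about bits 63:32, which the 32-bit model does not carry.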

Focused TB — tb_ee_core_dsll.sv

Four cases:

  1. Exact qbert encoding: dsll $t1, $t1, 16 (rt=rd=9, sa=16). Built via enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL) and asserted to equal 0x00094C38 (the literal qbert instruction). With $t1 = 0x1234 → $t1 = 0x12340000.
  2. Low-bit shift: dsll $t2, $t3, 1 with $t3 = 0x40000001 → $t2 = 0x80000002.
  3. Wrap-out (low-32 truncation): dsll $t4, $t5, 1 with $t5 = 0x80000001 → $t4 = 0x00000002. Proves bit 31 falls off in our 32-bit model (in a faithful 64-bit model it would move to bit 32; our model has nowhere to put it).
  4. sa=0 identity: dsll $t6, $t7, 0 with $t7 = 0xABCD1234 → $t6 = 0xABCD1234.

Result: retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS.
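
The encoding and the four expected values can be reproduced outside the simulator. In this sketch the Python `enc_rtype` is a hypothetical stand-in for the TB helper of the same name; its argument order (op, rs, rt, rd, sa, func) is an assumption inferred from the call quoted in case 1.

```python
OP_SPCL = 0x00    # SPECIAL opcode
FUNC_DSLL = 0x38  # DSLL function code


def enc_rtype(op: int, rs: int, rt: int, rd: int, sa: int, func: int) -> int:
    """Pack MIPS R-type fields into a 32-bit instruction word."""
    return ((op & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16
            | (rd & 0x1F) << 11 | (sa & 0x1F) << 6 | (func & 0x3F))


def dsll_low32(rt_val: int, sa: int) -> int:
    """DSLL as modelled here: left shift truncated to 32 bits."""
    return (rt_val << sa) & 0xFFFFFFFF


# (input register value, sa, expected result) for TB cases 1..4
CASES = [
    (0x00001234, 16, 0x12340000),  # exact qbert encoding
    (0x40000001, 1, 0x80000002),   # low-bit shift
    (0x80000001, 1, 0x00000002),   # bit 31 shifted out
    (0xABCD1234, 0, 0xABCD1234),   # sa=0 identity
]

print(hex(enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)))
for value, sa, expected in CASES:
    assert dsll_low32(value, sa) == expected
```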

Makefile + regression

  • tb_ee_core_dsll target.
  • Added to both the PHONY list and the master run: list.
  • Regression: 163 → 164.

qbert progression detail

10-retire delta from Ch275 (27,006 → 27,016). The DSLL retires at 0x00112C54, then qbert executes ~9 more instructions before hitting BNEL at 0x00112C7C — that's 10 PCs over 40 bytes (0x28), so a tight straight-line block with no branches between. Likely a switch-statement entry or function-body case dispatcher.
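
The address arithmetic is small enough to verify directly. A sketch, assuming fixed 4-byte instructions and no taken branches between the two PCs:

```python
DSLL_PC = 0x00112C54
BNEL_PC = 0x00112C7C
INSTR_BYTES = 4

span_bytes = BNEL_PC - DSLL_PC            # distance from DSLL to the BNEL trap
retired_in_span = span_bytes // INSTR_BYTES  # DSLL itself plus the 9 that follow

print(hex(span_bytes), retired_in_span)
# The straight-line count matches the retire_count delta between chapters.
assert 27_006 + retired_in_span == 27_016
```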

$a0 = 0x80808080 at the trap is interesting — that's a canonical "byte-broadcast" sentinel (e.g. ~(uint32 0x7F7F7F7F)), often used by stdlib string ops to detect zero/high bytes in parallel. qbert may be calling something like strlen or memchr internally.
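
For reference, the widely used word-at-a-time zero-byte test that relies on this constant looks like the sketch below. Whether qbert's code actually uses this idiom is not confirmed; `has_zero_byte` is a hypothetical name.

```python
ONES = 0x01010101
HIGHS = 0x80808080
MASK32 = 0xFFFFFFFF


def has_zero_byte(word: int) -> bool:
    """True if any of the four bytes in a 32-bit word is 0x00."""
    word &= MASK32
    # Subtracting 1 from each byte borrows into bit 7 only for a zero byte
    # (or a byte that already had bit 7 set, which ~word masks out).
    return (((word - ONES) & MASK32) & (~word & MASK32) & HIGHS) != 0


print(has_zero_byte(0x41424300))  # "ABC\0" packed into one word
print(has_zero_byte(0x41424344))  # "ABCD", no terminator in this word
```

A strlen-style loop would apply this test to one aligned word per iteration and fall back to a byte scan once it returns true.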

Recommendation for Codex's Ch277 — BNEL

bnel $v0, $0, +25*4 at PC 0x00112C7C, opcode 0x15 — the exact follow-on Codex predicted from BEQL.

Same shape as Ch274 BEQL:

  • Decode opcode 6'h15 as BNEL.
  • BNEL TAKEN when rs != rt (same as BNE).
  • BNEL NOT-TAKEN: squash the delay slot.

Reuse the existing Ch274 is_beql_squash infrastructure:

  1. localparam OP_BNEL = 6'h15.
  2. is_bnel decode signal.
  3. Add is_bnel to is_branch group.
  4. Extend branch_taken with (is_bnel && (rs_val != rt_val)).
  5. Replace is_beql_squash with a more general is_branch_likely_squash, which fires when a likely-branch is NOT taken:
    is_branch_likely_squash = (is_beql && (rs_val != rt_val))
                           || (is_bnel && (rs_val == rt_val));
    
    Update retire_advance to use the new name.
  6. Add !is_bnel to is_nop_class allow-list.

Focused TB mirrors tb_ee_core_beql: BNEL taken (delay fires), BNEL not-taken (delay squashed), BNE cross-check (delay always fires). ~5 LOC + the TB.
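
The taken/squash relationship the TB needs to cover can be written as a small reference model. This is a behavioural sketch in Python, not the RTL; `branch_outcome` is a hypothetical name, and the mnemonic strings are only labels for the three instructions discussed above.

```python
def branch_outcome(kind: str, rs_val: int, rt_val: int) -> tuple[bool, bool]:
    """Return (taken, delay_slot_squashed) for BEQL, BNEL, or BNE."""
    equal = rs_val == rt_val
    if kind == "beql":
        taken, likely = equal, True
    elif kind in ("bnel", "bne"):
        taken, likely = not equal, kind == "bnel"
    else:
        raise ValueError(f"unsupported branch kind: {kind}")
    # Only likely-branches squash, and only when they are NOT taken.
    return taken, likely and not taken


# The three focused-TB scenarios, plus the BEQL mirror case.
print(branch_outcome("bnel", 1, 0))  # taken, delay slot executes
print(branch_outcome("bnel", 0, 0))  # not taken, delay slot squashed
print(branch_outcome("bne", 0, 0))   # not taken, delay slot still executes
print(branch_outcome("beql", 1, 0))  # not taken, delay slot squashed
```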

Likely follow-ons after BNEL: BLEZL/BGTZL (0x16/0x17) and REGIMM-likely family (BLTZL/BGEZL at REGIMM rt=0x02/0x03, BLTZALL/BGEZALL at rt=0x12/0x13). Same squash mechanism for all of them. Codex may want to fold multiple branch-likely variants into one chapter now that the pattern is well-locked.

Files changed

  • rtl/ee/ee_core_stub.sv — 4 surgical edits (~4 LOC).
  • sim/tb/integration/tb_ee_core_dsll.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 164/164.

Pattern review

Six qbert-driven chapters (Ch271→Ch276):

  • Ch271 SQ — 5 RTL edits, 4-beat write
  • Ch272 DADDU — 4 RTL edits, ALU low-32
  • Ch273 SYSCALL HLE — 2 RTL edits, gated dispatcher
  • Ch274 BEQL — 6 RTL edits, branch + squash
  • Ch275 SD — 7 RTL edits, 2-beat write (reuses SQ counter)
  • Ch276 DSLL — 4 RTL edits, ALU low-32 (reuses SLL path)

The chapters have not shrunk monotonically by edit count (Ch274 and Ch275 needed 6 and 7 RTL edits), but each leans more heavily on paths that already exist. Ch276 is among the smallest: 4 edits, all pattern reuse from Ch272 and Ch275. The qbert track is well-trained, the runner correctly surfaces the next blocker each time, and the incremental cadence holds.