Ch276 closeout — DSLL as SLL low-32-bit; qbert progresses 10 retires, next blocker is BNEL
Status: Closed. Verdict from re-running qbert.elf:
elf_first_unsupported_opcode (pc=0x00112C7C instr=0x54400019) —
BNEL (Branch on Not Equal Likely), MIPS-II opcode 0x15.
Exactly the follow-on Codex predicted in the Ch274 closeout:
"Likely follow-on after BEQL: BNEL."
Numbers
| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| Post-Ch276 (DSLL) | BNEL at 0x00112C7C | 27,016 |
What landed
RTL — 4 surgical edits in ee_core_stub.sv
- `localparam FUNC_DSLL = 6'h38` alongside `FUNC_SLL`.
- `is_dsll` logic decl + `assign is_dsll = is_special && (func == FUNC_DSLL)`.
- Added `is_dsll` to the `is_rtype_alu` group.
- Added `is_dsll` to the `is_sll` arm of `rtype_alu_wb`: `else if (is_sll || is_dsll) rtype_alu_wb = rt_val << shamt`.
The arm reuses SLL's writeback path because the low 32 bits of
DSLL and SLL are identical for every encodable shift amount
(sa is a 5-bit field, so sa < 32). About 4 LOC of real change,
mirroring Ch272 DADDU's "implement 64-bit opcode as 32-bit
equivalent" pattern.
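The low-32 equivalence can be checked with a small software model. This is a minimal Python sketch, not the RTL: `dsll64`, `sll_low32`, and `low32_matches` are hypothetical helper names, and the sample values are chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def dsll64(rt_val: int, sa: int) -> int:
    """Reference 64-bit DSLL: shift left inside a 64-bit register (sa in 0..31)."""
    return (rt_val << sa) & MASK64

def sll_low32(rt_val: int, sa: int) -> int:
    """32-bit datapath model: only the low 32 bits of the source and result exist."""
    return ((rt_val & MASK32) << sa) & MASK32

def low32_matches(rt_val: int, sa: int) -> bool:
    """True when the low 32 bits of the 64-bit result equal the 32-bit result."""
    return (dsll64(rt_val, sa) & MASK32) == sll_low32(rt_val, sa)

# Illustrative sample values, including ones with non-zero upper halves.
samples = [0x0, 0x1234, 0x40000001, 0x80000001, 0xABCD1234,
           0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0]
all_match = all(low32_matches(v, sa) for v in samples for sa in range(32))
print(all_match)
```

The match holds because a left shift never moves information from the upper half of the source into the lower half of the result, so the low 32 result bits depend only on the low 32 source bits.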
Focused TB — tb_ee_core_dsll.sv
Four cases:
- Exact qbert encoding: `dsll $t1, $t1, 16` (rt=rd=9, sa=16). Built via `enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)` and asserted to equal `0x00094C38` (the literal qbert instruction). With `$t1 = 0x1234` → `$t1 = 0x12340000`.
- Low-bit shift: `dsll $t2, $t3, 1` with `$t3 = 0x40000001` → `$t2 = 0x80000002`.
- Wrap-out (low-32 truncation): `dsll $t4, $t5, 1` with `$t5 = 0x80000001` → `$t4 = 0x00000002`. Proves bit 31 falls off in our 32-bit model (in a faithful 64-bit model it would move to bit 32; our model has nowhere to put it).
- sa=0 identity: `dsll $t6, $t7, 0` with `$t7 = 0xABCD1234` → `$t6 = 0xABCD1234`.
Result: retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS.
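The encoding assertion and the four expected values can be cross-checked outside the simulator. The sketch below is a Python mirror written for illustration: this `enc_rtype` assumes the argument order `(op, rs, rt, rd, sa, func)` implied by the call above, and `dsll_model` is a hypothetical stand-in for the 32-bit writeback, not the testbench code.

```python
OP_SPCL = 0x00    # SPECIAL major opcode
FUNC_DSLL = 0x38  # DSLL function code, as in the RTL localparam

def enc_rtype(op: int, rs: int, rt: int, rd: int, sa: int, func: int) -> int:
    """Pack MIPS R-type fields into a 32-bit instruction word."""
    return (((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16)
            | ((rd & 0x1F) << 11) | ((sa & 0x1F) << 6) | (func & 0x3F))

def dsll_model(rt_val: int, sa: int) -> int:
    """32-bit model of the shared SLL/DSLL arm: shift, then keep the low 32 bits."""
    return (rt_val << sa) & 0xFFFFFFFF

# (source value, shift amount, expected destination) for the four TB cases.
CASES = [
    (0x00001234, 16, 0x12340000),  # exact qbert encoding
    (0x40000001, 1, 0x80000002),   # low-bit shift
    (0x80000001, 1, 0x00000002),   # wrap-out: bit 31 is lost
    (0xABCD1234, 0, 0xABCD1234),   # sa=0 identity
]

qbert_word = enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)
results = [dsll_model(src, sa) == want for src, sa, want in CASES]
print(hex(qbert_word), results)
```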
Makefile + regression
- `tb_ee_core_dsll` target.
- Added to both PHONY list and `run:` master.
- Regression: 163 → 164.
qbert progression detail
10-retire delta from Ch275 (27,006 → 27,016). The DSLL retires at 0x00112C54, then qbert executes ~9 more instructions before hitting BNEL at 0x00112C7C — that's 10 PCs over 40 bytes (0x28), so a tight straight-line block with no branches between. Likely a switch-statement entry or function-body case dispatcher.
$a0 = 0x80808080 at the trap is interesting — that's a
canonical "byte-broadcast" sentinel (e.g. ~(uint32 0x7F7F7F7F)),
often used by stdlib string ops to detect zero/high bytes in
parallel. qbert may be calling something like strlen or
memchr internally.
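For context, this is the well-known word-at-a-time zero-byte test that uses the `0x80808080` constant. It is a generic Python illustration of the technique, not code recovered from qbert; `has_zero_byte` is a hypothetical name.

```python
ONES = 0x01010101   # 0x01 broadcast to every byte
HIGHS = 0x80808080  # 0x80 broadcast to every byte
MASK32 = 0xFFFFFFFF

def has_zero_byte(word: int) -> bool:
    """Return True if any of the four bytes in a 32-bit word is 0x00.

    Subtracting 0x01 from each byte borrows into bit 7 only for bytes that
    were zero (or had bit 7 set); masking with ~word discards the latter.
    """
    word &= MASK32
    borrowed = (word - ONES) & MASK32      # wrap like 32-bit hardware
    return (borrowed & ~word & HIGHS) != 0

print(has_zero_byte(0x41424300), has_zero_byte(0x41424344))
```

A `strlen`-style loop can apply this test to four bytes per load and drop to a byte-wise scan only once it reports a hit.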
Recommendation for Codex's Ch277 — BNEL
bnel $v0, $0, +25*4 at PC 0x00112C7C, opcode 0x15 — the
exact follow-on Codex predicted from BEQL.
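The field breakdown of the trapping word can be reproduced with a few shifts. This is a small Python sketch; `decode_itype` and `branch_target` are hypothetical helper names, and the target calculation assumes the standard MIPS rule (delay-slot address plus the sign-extended offset shifted left by 2).

```python
def decode_itype(instr: int) -> dict:
    """Split a 32-bit MIPS I-type word into opcode, rs, rt, and signed imm."""
    imm = instr & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "imm": imm,
    }

def branch_target(pc: int, imm: int) -> int:
    """Branch target = address of the delay slot (pc + 4) + (imm << 2)."""
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF

fields = decode_itype(0x54400019)
target = branch_target(0x00112C7C, fields["imm"])
print(fields, hex(target))
```

For `0x54400019` this gives opcode `0x15`, rs=2 (`$v0`), rt=0 (`$0`), and imm=25, which matches the disassembly above; under the stated rule the taken target works out to `0x00112CE4`.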
Same shape as Ch274 BEQL:
- Decode opcode `6'h15` as BNEL.
- BNEL TAKEN when `rs != rt` (same as BNE).
- BNEL NOT-TAKEN: squash the delay slot.
Reuse the existing Ch274 is_beql_squash infrastructure:
- `localparam OP_BNEL = 6'h15`.
- `is_bnel` decode signal.
- Add `is_bnel` to `is_branch` group.
- Extend `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
- Replace `is_beql_squash` with a more general `is_branch_likely_squash`. The squash fires when a likely-branch is NOT taken: `is_branch_likely_squash = (is_beql && (rs_val != rt_val)) || (is_bnel && (rs_val == rt_val));`. Update `retire_advance` to use the new name.
- Add `!is_bnel` to `is_nop_class` allow-list.
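The taken and squash conditions are easy to get backwards, so a small truth-table model helps. This is a Python sketch of the intended behavior, not the RTL; `branch_taken` and `branch_likely_squash` are hypothetical function names that mirror the signals named in the list above.

```python
def branch_taken(is_beql: bool, is_bnel: bool, rs_val: int, rt_val: int) -> bool:
    """BEQL is taken on equality, BNEL on inequality."""
    return (is_beql and rs_val == rt_val) or (is_bnel and rs_val != rt_val)

def branch_likely_squash(is_beql: bool, is_bnel: bool, rs_val: int, rt_val: int) -> bool:
    """The delay slot is squashed only when a likely-branch is NOT taken."""
    return (is_beql and rs_val != rt_val) or (is_bnel and rs_val == rt_val)

# Enumerate both opcodes against equal and unequal operands.
rows = []
for name, is_beql, is_bnel in [("beql", True, False), ("bnel", False, True)]:
    for rs_val, rt_val in [(5, 5), (5, 7)]:
        rows.append((name, rs_val == rt_val,
                     branch_taken(is_beql, is_bnel, rs_val, rt_val),
                     branch_likely_squash(is_beql, is_bnel, rs_val, rt_val)))
for row in rows:
    print(row)  # (opcode, operands_equal, taken, squash)
```

In every row exactly one of taken and squash is true, which is the invariant a focused TB could assert. A plain BNE sets neither `is_beql` nor `is_bnel`, so its delay slot is never squashed.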
Focused TB mirrors tb_ee_core_beql: BNEL taken (delay fires),
BNEL not-taken (delay squashed), BNE cross-check (delay always
fires). ~5 LOC + the TB.
Likely follow-ons after BNEL: BLEZL/BGTZL (0x16/0x17) and
REGIMM-likely family (BLTZL/BGEZL at REGIMM rt=0x02/0x03,
BLTZALL/BGEZALL at rt=0x12/0x13). Same squash mechanism for
all of them. Codex may want to fold multiple branch-likely
variants into one chapter now that the pattern is well-locked.
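If several branch-likely variants are folded into one chapter, the decode reduces to a small lookup. The Python sketch below is illustrative only: the table and function names are hypothetical, the BEQL opcode `0x14` and the REGIMM major opcode `0x01` come from the standard MIPS encoding rather than this note, and the rest are the values listed above.

```python
OP_REGIMM = 0x01

# Major-opcode branch-likely instructions.
LIKELY_OPCODES = {0x14: "BEQL", 0x15: "BNEL", 0x16: "BLEZL", 0x17: "BGTZL"}

# REGIMM branch-likely instructions, selected by the rt field.
LIKELY_REGIMM_RT = {0x02: "BLTZL", 0x03: "BGEZL", 0x12: "BLTZALL", 0x13: "BGEZALL"}

def branch_likely_name(instr: int):
    """Return the branch-likely mnemonic for a 32-bit word, or None."""
    opcode = (instr >> 26) & 0x3F
    if opcode in LIKELY_OPCODES:
        return LIKELY_OPCODES[opcode]
    if opcode == OP_REGIMM:
        rt = (instr >> 16) & 0x1F
        return LIKELY_REGIMM_RT.get(rt)
    return None

print(branch_likely_name(0x54400019))  # the qbert blocker word
```

One shared "is likely-branch" decode like this would let a single squash signal cover the whole family, with only the taken condition differing per mnemonic.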
Files changed
- `rtl/ee/ee_core_stub.sv`: 4 surgical edits (~4 LOC).
- `sim/tb/integration/tb_ee_core_dsll.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.
Regression
In flight; expected 164/164.
Pattern review
Six qbert-driven chapters (Ch271→Ch276):
- Ch271 SQ — 5 RTL edits, 4-beat write
- Ch272 DADDU — 4 RTL edits, ALU low-32
- Ch273 SYSCALL HLE — 2 RTL edits, gated dispatcher
- Ch274 BEQL — 6 RTL edits, branch + squash
- Ch275 SD — 7 RTL edits, 2-beat write (reuses SQ counter)
- Ch276 DSLL — 4 RTL edits, ALU low-32 (reuses SLL path)
The chapters stay small as the patterns lock in, though not monotonically: the RTL edit counts run 5, 4, 2, 6, 7, 4. Ch276 is among the smallest, at about 4 LOC of pure pattern reuse from Ch272 + Ch275. The qbert track is well-trained, the runner correctly surfaces the next blocker each time, and the incremental cadence holds.