Ch274 closeout — BEQL with squash-on-not-taken; qbert lands in a function prologue, next blocker is SD
Status: Closed. Verdict from re-running qbert.elf:
elf_first_unsupported_opcode (pc=0x00112DAC instr=0xFFBF0020) —
SD (Store Doubleword, MIPS-III). qbert passed the C++
constructor walker's BEQL correctly, JAL'd into a function at
PC 0x00112DAC, and trapped on the very first instruction of
that function — the canonical sd $ra, 0x20($sp) register-save
prologue.
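The trap word can be checked by hand. The following is a minimal Python sketch, not part of the repo, that splits the reported `instr=0xFFBF0020` into standard MIPS I-type fields. The helper name `decode_itype` is hypothetical.

```python
def decode_itype(instr: int) -> dict:
    """Split a 32-bit MIPS I-type word into its four fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "rs": (instr >> 21) & 0x1F,      # bits 25..21, base register for stores
        "rt": (instr >> 16) & 0x1F,      # bits 20..16, register being stored
        "imm": instr & 0xFFFF,           # bits 15..0, byte offset
    }


fields = decode_itype(0xFFBF0020)
print({name: hex(value) for name, value in fields.items()})
```

Opcode 0x3F is SD, register 29 is `$sp`, register 31 is `$ra`, and the offset is 0x20, which matches `sd $ra, 0x20($sp)`.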
Numbers
| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
The 5-retire delta covers: BEQL squash → addiu $v0, $v1, 4 →
lw $a0, 0($v0) → addiu $a1, $v0, 4 → jal 0x00112DAC. The
first instruction of the called function (SD) then traps and
does not retire. The ~77 KB PC jump to 0x00112DAC (0x12CD8
bytes from the JAL at 0x001000D4) confirms the BEQL squash
worked: qbert's $a0 was NOT clobbered to 0 by the squashed
delay slot, the LW loaded the real constructor-pointer, and the
JAL dispatched correctly.
What landed
RTL — surgical edits in ee_core_stub.sv
- Opcode: `localparam OP_BEQL = 6'h14` alongside `OP_BEQ`.
- Decode: `is_beql` signal + `assign is_beql = (opcode == OP_BEQL)`.
- Branch logic: BEQL added to the `is_branch` group and to `branch_taken` (same `(rs_val == rt_val)` condition as BEQ).
- New signal `is_beql_squash`: `is_beql && (rs_val != rt_val)`. This is the load-bearing case.
- `retire_advance`: when `is_beql_squash` is true, `next_pc <= pc + 32'd8` (skip the delay slot directly); `new_branch_pending` stays low so no stale target leaks. Existing BEQ/BNE/jump path unchanged.
- Decoder allow-list: added `!is_beql` to the `is_nop_class` catch-all so BEQL doesn't get strict-trap'd.
About 6 LOC of real change.
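The squash-on-not-taken rule can be stated as a small behavioural model. The Python sketch below is illustrative only and is not a transcription of `ee_core_stub.sv`. The function `branch_step` and its return tuple are hypothetical, and it assumes the usual MIPS rule that a branch target is the delay-slot address plus the sign-extended offset shifted left by 2.

```python
OP_BEQ = 0x04   # standard MIPS opcode for BEQ
OP_BEQL = 0x14  # branch-likely variant, 6'h14 in the RTL


def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_step(opcode: int, rs_val: int, rt_val: int, pc: int, imm16: int):
    """Return (next_pc, branch_pending, target) for BEQ or BEQL at pc.

    branch_pending=True means the delay slot at pc+4 runs first and the
    PC is redirected to target afterwards.
    """
    taken = rs_val == rt_val
    if opcode == OP_BEQL and not taken:
        # Branch-likely, not taken: skip the delay slot entirely.
        return (pc + 8) & 0xFFFFFFFF, False, None
    # BEQ (either outcome) and taken BEQL both execute the delay slot.
    target = (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF, taken, target if taken else None


# BEQL not-taken at qbert's 0x001000C0: next PC is 0x001000C8.
squashed = branch_step(OP_BEQL, 1, 2, 0x001000C0, 0x0010)
# Plain BEQ with the same operands still falls into the delay slot.
plain = branch_step(OP_BEQ, 1, 2, 0x001000C0, 0x0010)
print(hex(squashed[0]), hex(plain[0]))
```

The only difference between the two opcodes is the not-taken case, which is why the focused TB needs a BEQ cross-check to tell them apart.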
Focused TB — tb_ee_core_beql.sv
Three cases per Codex's spec:
- BEQL taken (`$t0 == $t1`): branch reaches target; delay slot DOES execute (writes a sentinel into `$t5`). Cross-checked by `$t6 = 0xCAFE` at the target.
- BEQL not-taken (`$t2 != $t3`): delay slot squashed. `$t7 = 0x2222` at PC+8 proves we landed correctly past the squash. Inline BNE chain verifies `$t5` was NOT clobbered by the squashed delay slot (`$t5` stays at its pre-BEQL `0xBEEF0000` value).
- BEQ not-taken cross-check (same operands): plain BEQ's delay slot DOES execute, so `$t5` gets `0xCAB` ORed into the low 16 bits (`$t5 = 0xBABE0CAB`). Proves BEQL's squash differs from BEQ's no-squash behavior.
Encoding gotcha caught during TB authoring: my initial delay
slots used ori $t5, $0, ... (clobbers $t5 regardless of
prior value) instead of ori $t5, $t5, ... (ORs into $t5,
preserving high bits). The first build FAILED the Case-3 check
with $t5=0x00000CAB instead of 0xBABE0CAB. Fixed by changing
the rs field to RT5 so the delay slot ORs into the existing
value — making both "delay-fired" and "delay-squashed" cases
distinguishable by the high half-word.
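The difference between the two encodings is easy to reproduce outside the simulator. This is a small Python sketch with hypothetical helpers `encode_ori` and `exec_ori`. The pre-seeded value `0xBABE0000` is an assumption inferred from the expected `0xBABE0CAB` result.

```python
OP_ORI = 0x0D  # standard MIPS opcode for ORI
T5 = 13        # $t5 is general-purpose register 13


def encode_ori(rt: int, rs: int, imm: int) -> int:
    """Encode `ori rt, rs, imm` as a 32-bit I-type word."""
    return (OP_ORI << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def exec_ori(regs: list, instr: int) -> None:
    """Execute an ORI word against a 32-entry register list."""
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF  # ORI zero-extends its immediate
    if rt != 0:           # $0 is hard-wired to zero
        regs[rt] = (regs[rs] | imm) & 0xFFFFFFFF


def run_probe(rs: int) -> int:
    """Seed $t5, run one ORI delay-slot probe, return the new $t5."""
    regs = [0] * 32
    regs[T5] = 0xBABE0000  # assumed pre-branch sentinel
    exec_ori(regs, encode_ori(T5, rs, 0xCAB))
    return regs[T5]


print(hex(run_probe(0)))   # rs = $0: the high half-word is lost
print(hex(run_probe(T5)))  # rs = $t5: the high half-word survives
```

With `rs = $0` the probe overwrites the sentinel. With `rs = $t5` the high half-word records the prior state and the low half-word records whether the slot fired.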
Result: retired=21 halt=1 trap=0 pc=0xbfc00158 errors=0 PASS.
Makefile + regression
- `tb_ee_core_beql` target.
- Added to both the PHONY list and the `run:` master list.
- Regression: 161 → 162.
qbert disassembly around the new blocker (PC 0x00112DAC)
The JAL at 0x001000D4 calls into a function at 0x00112DAC.
That function's prologue is:
0x00112DAC: 0xFFBF0020 sd $ra, 0x20($sp) <-- TRAP (opcode 0x3F, MIPS-III SD)
SD (Store Doubleword) is the MIPS-III 64-bit cousin of SW.
PS2 ELFs use it everywhere in function prologues to save
64-bit register values ($ra, $s*) onto the stack.
Recommendation for Codex's Ch275
Implement SD as a 2-beat 32-bit-stripe write FSM, mirroring Ch271's SQ pattern but smaller:
- Decode: opcode `6'h3F` → `is_sd`.
- Alignment: SD requires 8-byte alignment (`ea[2:0] == 0`). Misaligned → AdES path (same as existing SW alignment).
- FSM: reuse the `sq_beat` counter (or add `sd_beat`); 2 beats this time. Beat 0 writes `rt_val` (low 32 bits of $rt) at EA; beat 1 writes 0 at EA+4 (upper 32 bits of $rt not modelled, the same approximation we made for SQ beats 1-3).
- For `sd $ra,...`: real PS2 callees later `LD` to restore 64-bit `$ra`. Our model's upper 32 bits are always 0, so the round-trip works as long as the function doesn't do 64-bit math on `$ra` itself (rare).
Focused TB shape (mirrors tb_ee_core_sq):
- Pre-poke RAM target with non-zero junk.
- Execute `sd $rt, 0(base)` with `$rt` non-zero in low 32 bits.
- LW + BNE chain verifies `mem[base+0] = rt_val_low` and `mem[base+4] = 0`.
- Direct hierarchical RAM peek for belt-and-braces.
This is structurally identical to Ch271 with 4 → 2 beats
and 16 → 8 byte alignment. Should be ~30 minutes of work.
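To make the expected memory image concrete, here is a minimal Python sketch of the recommended 2-beat behaviour and of the checks the focused TB would make. It is a reference model under stated assumptions, not the planned RTL. `sd_two_beat`, `AddressError`, the word-addressed dict standing in for RAM, and the base address are all hypothetical.

```python
class AddressError(Exception):
    """Stands in for the AdES (address error on store) exception."""


def sd_two_beat(mem: dict, ea: int, rt_val: int) -> list:
    """Model SD as two 32-bit stripe writes; return the (addr, word) beats."""
    if ea & 0x7:  # SD requires ea[2:0] == 0
        raise AddressError(f"misaligned SD at 0x{ea:08X}")
    beats = [
        (ea, rt_val & 0xFFFFFFFF),  # beat 0: low 32 bits of $rt
        (ea + 4, 0),                # beat 1: upper half not modelled, write 0
    ]
    for addr, word in beats:
        mem[addr] = word
    return beats


# TB shape: pre-poke junk, store, then check both words.
base = 0x00200000  # hypothetical 8-byte-aligned RAM address
mem = {base: 0xDEADBEEF, base + 4: 0xDEADBEEF}
sd_two_beat(mem, base, 0x12345678)
print(hex(mem[base]), hex(mem[base + 4]))
```

Pre-poking both words with junk matters: if beat 1 never fires, `mem[base+4]` keeps the junk value instead of reading back as 0.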
Likely follow-on after SD: LD (Load Doubleword, opcode
0x37). When the called function eventually returns, it'll
LD $ra, 0x20($sp) to restore the saved register; our
model needs the corresponding 2-beat read path. Codex may
want to fold SD+LD into one chapter since they're symmetric.
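The SD/LD symmetry can be illustrated with a round trip through the same kind of memory model. This Python sketch assumes that LD, like the proposed SD, runs as two 32-bit beats and that the model drops the upper word because registers are only tracked to 32 bits. That LD behaviour is an assumption, not something decided here. The helper names and the stack pointer value are hypothetical; the return address is the JAL at 0x001000D4 plus 8.

```python
def sd_two_beat(mem: dict, ea: int, rt_val: int) -> None:
    """Two-beat SD: low word at EA, zero at EA+4 (upper half not modelled)."""
    if ea & 0x7:
        raise ValueError("SD requires 8-byte alignment")
    mem[ea] = rt_val & 0xFFFFFFFF
    mem[ea + 4] = 0


def ld_two_beat(mem: dict, ea: int) -> int:
    """Two-beat LD: beat 0 supplies the register value, beat 1 is dropped."""
    if ea & 0x7:
        raise ValueError("LD requires 8-byte alignment")
    low = mem[ea]        # beat 0: becomes the 32-bit register value
    _high = mem[ea + 4]  # beat 1: read but discarded in this model
    return low & 0xFFFFFFFF


sp = 0x01FFFE00  # hypothetical stack pointer
ra = 0x001000DC  # link address of the JAL at 0x001000D4
stack = {}
sd_two_beat(stack, sp + 0x20, ra)        # prologue: sd $ra, 0x20($sp)
restored = ld_two_beat(stack, sp + 0x20)  # epilogue: ld $ra, 0x20($sp)
print(hex(restored))
```

Because both halves of the pair make the same 32-bit approximation, the saved `$ra` survives the round trip unchanged.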
Files changed
- `rtl/ee/ee_core_stub.sv`: 6 surgical edits.
- `sim/tb/integration/tb_ee_core_beql.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.
Regression
In flight at the moment of writing; expected 162/162 (was
161, +1 for tb_ee_core_beql).
Process notes
- Cross-check via BEQ in the same TB. Codex specifically asked for the BEQ cross-check, and it caught a real difference: Case 3 (BEQ not-taken) writes the `$t5` low bits while Case 2 (BEQL not-taken) does NOT. Without the cross-check, a regression where BEQL accidentally behaved like BEQ would silently pass on the "PC landed at PC+8" check alone.
- OR-INTO vs OR-FROM-ZERO encoding bugs are easy to make. My first TB pass had `ori $rt, $0, imm` (overwriting), which loses info about whether the delay slot fired. Always use `ori $rt, $rt, imm` (or similar accumulating op) in delay-slot probes so "did it fire?" is observable by a bitwise comparison rather than a value comparison.
- The per-chapter footprint stays small. Ch271 SQ took 5 edits + a TB. Ch272 DADDU took 4 + a TB. Ch273 SYSCALL HLE took 2 + a TB (plus a runner update). Ch274 BEQL is 6 + a TB. Each is a 1-day chapter at most. The qbert progression is now `12 → 26,958 → 26,960 → 26,980 → 26,985` retires; the runner is doing its job.