Ch279 closeout — LQ as single-beat low-word load; next blocker is PSUBB (MMI0)
Status: Closed. Verdict from re-running qbert.elf:
elf_first_unsupported_opcode (pc=0x00112C90 instr=0x712A1248) —
opcode 0x1C (MMI) + funct 0x08 (MMI0 sub-table) + sa 0x09
= PSUBB (Parallel Subtract Byte). qbert ran LQ + one more
instruction, then trapped on the byte-wise SIMD subtract that
sits at the heart of its stdlib byte-walker.
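The field breakdown above can be reproduced with a few shifts and masks. This is a minimal Python sketch, assuming the standard MIPS R-type field layout; `decode_fields` is a hypothetical helper and not part of the repo's tooling.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "rs": (instr >> 21) & 0x1F,      # bits 25..21
        "rt": (instr >> 16) & 0x1F,      # bits 20..16
        "rd": (instr >> 11) & 0x1F,      # bits 15..11
        "sa": (instr >> 6) & 0x1F,       # bits 10..6
        "funct": instr & 0x3F,           # bits 5..0
    }


fields = decode_fields(0x712A1248)
print({name: hex(value) for name, value in fields.items()})
```

For the trapped word this gives opcode 0x1C, funct 0x08, sa 0x09, with rs=9 ($t1), rt=10 ($t2), rd=2 ($v0).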
Numbers
| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch277 (BNEL) | PCPYLD at 0x00112C84 | 27,017 |
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
2-retire delta: LQ + the next instruction (probably another register move) before PSUBB. The chain qbert is running here is the canonical SIMD byte-walker — load a 128-bit chunk, do a byte-wise compare/subtract against a sentinel, mask, test.
What landed
RTL: 5 surgical edits in ee_core_stub.sv
- `localparam OP_LQ = 6'h1E` alongside `OP_LW`.
- `is_lq` decode signal.
- Alignment: extended `is_quad_access = is_sq || is_lq` so the existing 16-byte alignment fault `ea[3:0] != 0` covers LQ too. Misaligned LQ trips the AdEL path (it's a load, so the existing `is_align_store` group correctly doesn't include it; the exception code is AdEL, not AdES).
- FSM transition: added `|| is_lq` to the LW/LB/LBU/LH/LHU loads list. The existing `S_MEM_REQ → S_MEM_WAIT` path handles the 32-bit read; `S_MEM_WAIT`'s default writeback `regfile[rt_idx] <= map_rd_data` fires for LQ because none of `is_lb/lbu/lh/lhu` match (the if-else chain falls through to the default LW arm).
- `!is_lq` added to `is_nop_class` catch-all.
5 surgical edits total. The "reuse LW path" decision keeps the chapter small.
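The behaviour described above (16-byte alignment check, then the ordinary 32-bit read path) can be captured in a small software model. The Python sketch below is illustrative only: `AddressErrorLoad`, `lw`, and `lq_low_word` are hypothetical names, and the memory layout assumes little-endian words with the first pattern at the lowest address.

```python
class AddressErrorLoad(Exception):
    """Hypothetical stand-in for the AdEL exception."""


def lw(mem, ea: int) -> int:
    """32-bit little-endian load; requires 4-byte alignment."""
    if ea & 0x3:
        raise AddressErrorLoad(hex(ea))
    return int.from_bytes(mem[ea:ea + 4], "little")


def lq_low_word(mem, ea: int) -> int:
    """LQ as modelled here: 16-byte alignment check, then the LW read path."""
    if ea & 0xF:
        raise AddressErrorLoad(hex(ea))
    return lw(mem, ea)


# Pre-poke 0x400..0x40F with four distinct words (lowest address first).
mem = bytearray(0x410)
for i, word in enumerate((0xAABBCCDD, 0x11112222, 0x33334444, 0x55556666)):
    mem[0x400 + 4 * i:0x404 + 4 * i] = word.to_bytes(4, "little")

snapshot = bytes(mem)
value = lq_low_word(mem, 0x400)
print(hex(value))
```

In this model an address such as 0x404 is fine for `lw` but raises for `lq_low_word`, which mirrors the `ea[3:0] != 0` fault.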
Focused TB — tb_ee_core_lq.sv
Cases:
- Exact qbert encoding shape: `lq $t1, 0($a1)` built via `enc_i(OP_LQ, RA1, RT1, 0)` and asserted to equal `0x78A90000`. (We use this assertion to lock the encoding even though the actual exec uses `lq $t1, 0($v0)` with a different base: same opcode shape, different register index.)
- Value check: pre-poke phys 0x400..0x40F with 4 distinct patterns (`0xAABBCCDD / 0x11112222 / 0x33334444 / 0x55556666`) so a buggy implementation reading the wrong lane would fail. Verify `$t1 = 0xAABBCCDD` (the low 32 of the qword).
- LW cross-check: LW at the same EA reads the same value. Confirms LQ is decoded as a "single-beat low-word load" consistent with the existing LW path.
- No-modify check: post-halt hierarchical RAM peek confirms all 4 lanes still hold the pre-pokes (LQ doesn't write).
Result: retired=13 halt=1 trap=0 pc=0xbfc00128 errors=0 PASS.
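The encoding assertion from the first case can be checked outside the simulator. The sketch below re-creates an I-type encoder in Python; this `enc_i` is a hypothetical stand-in for the TB helper of the same name (assumed argument order: opcode, rs, rt, immediate), and `REG_A1` / `REG_T1` are illustrative constants using the standard MIPS register numbers.

```python
OP_LQ = 0x1E   # from the chapter: localparam OP_LQ = 6'h1E
REG_A1 = 5     # $a1 in the standard MIPS register numbering
REG_T1 = 9     # $t1


def enc_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction: opcode | rs | rt | 16-bit immediate."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


word = enc_i(OP_LQ, REG_A1, REG_T1, 0)   # lq $t1, 0($a1)
print(f"0x{word:08X}")
```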
Makefile + regression
- `tb_ee_core_lq` target.
- Added to both regression lists.
- Regression: 166 → 167.
Recommendation for Codex's Ch280 — PSUBB
PSUBB at PC 0x00112C90, instr 0x712A1248:
- opcode 0x1C (MMI)
- funct 0x08 (MMI0 sub-table)
- sa 0x09 (PSUBB within MMI0)
- rs=$t1, rt=$t2, rd=$v0
- → `psubb $v0, $t1, $t2`
Architectural: rd[7+8i:8i] = rs[7+8i:8i] - rt[7+8i:8i] for
i ∈ [0..15], 16 parallel byte subtractions with no carry/borrow
between byte lanes.
For our 32-bit model: 4 parallel byte subtractions on the low 32 bits.
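As a cross-check for the RTL, the lane-wise semantics can be written as a short reference model. This Python sketch is an assumption-labelled golden model, not code from the repo; `psubb` and its `width_bits` parameter are hypothetical names, with 128 for the architectural width and 32 for the narrowed model.

```python
def psubb(rs: int, rt: int, width_bits: int = 128) -> int:
    """Parallel subtract byte: each 8-bit lane is rs - rt modulo 256."""
    result = 0
    for lane in range(width_bits // 8):
        shift = 8 * lane
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        # Masking to 8 bits discards the borrow, so it never reaches the next lane.
        result |= ((a - b) & 0xFF) << shift
    return result


narrow = psubb(0x10203040, 0x01010101, width_bits=32)
print(f"0x{narrow:08X}")
```

With `width_bits=32` the loop covers exactly the four low lanes, which is the behaviour the 32-bit model needs.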
Implementation outline (mirrors Ch278 PCPYLD's narrow-decode):
localparam FUNC_MMI0 = 6'h08.localparam MMI0_PSUBB = 5'h09.is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB).- Add to
is_rtype_alugroup. - New writeback arm:
(Each byte sub is naturally modulo-256, no carry between lanes — that's the SIMD semantic.)
else if (is_psubb) begin rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0]; rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8]; rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16]; rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24]; end - Add
!is_psubbtois_nop_classallow-list.
Focused TB:
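- Identity check: `psubb $rd, $rs, $0` → `$rd = $rs` (each byte minus 0).
- Lane-isolation check: `psubb $rd, $rs, $rt` with `$rs = 0x10203040`, `$rt = 0x01010101` → `$rd = 0x0F1F2F3F` (each byte lane subtracts 1 independently).
- Wrap check: `psubb $rd, $rs, $rt` with `$rs = 0x00010203`, `$rt = 0x01010101` → `$rd = 0xFF000102` (byte 3 wraps modulo 256).
- Borrow check: neither vector above underflows a low lane, so a plain 32-bit subtract produces the same results. To prove there is no inter-lane borrow, add a vector such as `$rs = 0x01000100`, `$rt = 0x00010001` → `$rd = 0x01FF01FF` (a 32-bit subtract would give `0x00FF00FF`).
- Exact qbert encoding assertion against `0x712A1248`.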
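Before committing vectors to the TB, it can help to check them against a reference and against plain 32-bit subtraction, since a vector only exercises lane isolation when the two results differ. The Python sketch below does this; `psubb32` is a hypothetical reference model, the identity vector's `$rs` value is an arbitrary choice, and the vector `0x01000100 - 0x00010001` is included as one where a low lane underflows.

```python
MASK32 = 0xFFFFFFFF


def psubb32(rs: int, rt: int) -> int:
    """Four independent byte subtractions on the low 32 bits."""
    out = 0
    for shift in (0, 8, 16, 24):
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        out |= ((a - b) & 0xFF) << shift
    return out


# (name, rs, rt, expected psubb result)
vectors = [
    ("identity", 0x10203040, 0x00000000, 0x10203040),
    ("lane", 0x10203040, 0x01010101, 0x0F1F2F3F),
    ("wrap", 0x00010203, 0x01010101, 0xFF000102),
    ("borrow", 0x01000100, 0x00010001, 0x01FF01FF),
]

distinguishing = []
for name, rs, rt, expected in vectors:
    got = psubb32(rs, rt)
    assert got == expected, name
    plain = (rs - rt) & MASK32  # what a plain 32-bit subtract would produce
    if got != plain:
        distinguishing.append(name)

print(distinguishing)
```

Only the vectors collected in `distinguishing` would catch an implementation that accidentally wired PSUBB to the ordinary subtract path.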
~4 edits.
Likely follow-ons in this byte-walker context: PCEQB (parallel compare equal byte) and PMFHL/LH (parallel move from HI/LO low halves). The string-walker pattern is:
- LQ a chunk of memory.
- PSUBB or PCEQB against a sentinel.
- PMFHL or some other reduction.
- Branch.
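The four steps above can be sketched in software to show why a byte-wise subtract is useful for sentinel search. This Python illustration is an assumption: how qbert's actual routine reduces and branches is not known from the trace, and `splat_byte`, `psubb128`, and `find_sentinel` are hypothetical names. It requires the buffer length to be a multiple of 16.

```python
def splat_byte(value: int, lanes: int = 16) -> int:
    """Replicate one byte across every lane of a 128-bit value."""
    return int.from_bytes(bytes([value & 0xFF]) * lanes, "little")


def psubb128(rs: int, rt: int) -> int:
    """16 independent byte subtractions, each modulo 256."""
    out = 0
    for shift in range(0, 128, 8):
        out |= ((((rs >> shift) & 0xFF) - ((rt >> shift) & 0xFF)) & 0xFF) << shift
    return out


def find_sentinel(buf: bytes, sentinel: int = 0) -> int:
    """Return the index of the first sentinel byte, or -1 if absent."""
    assert len(buf) % 16 == 0, "sketch only handles whole 16-byte chunks"
    pattern = splat_byte(sentinel)
    for base in range(0, len(buf), 16):
        chunk = int.from_bytes(buf[base:base + 16], "little")  # step 1: load
        diff = psubb128(chunk, pattern)                        # step 2: subtract
        for lane in range(16):                                 # step 3: reduce
            if (diff >> (8 * lane)) & 0xFF == 0:
                return base + lane                             # step 4: branch out
    return -1


index = find_sentinel(b"hello, qbert!\x00\x00\x00")
print(index)
```

A lane of the difference is zero exactly when the corresponding input byte equals the sentinel, which is what the reduction step tests.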
Files changed
- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_lq.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.
Regression
In flight; expected 167/167.
Pattern review
9 qbert chapters. The MMI sub-decode pattern from Ch278 is about to be reused (PSUBB shares the same shape: MMI prefix + funct + sa selector). Anticipated: PSUBB in 4 edits, mirror of PCPYLD.
| Chapter | Blocker | Edits | Pattern |
|---|---|---|---|
| Ch271 SQ | SQ | 5 | NEW 4-beat write |
| Ch272 DADDU | DADDU | 4 | NEW ALU-low-32 |
| Ch273 SYSCALL HLE | SYSCALL #60 | 2 | NEW gated dispatcher |
| Ch274 BEQL | BEQL | 6 | NEW branch+squash |
| Ch275 SD | SD | 7 | REUSE SQ counter |
| Ch276 DSLL | DSLL | 4 | REUSE DADDU |
| Ch277 BNEL | BNEL | 6 | REUSE BEQL squash |
| Ch278 PCPYLD | PCPYLD | 4 | NEW MMI narrow-decode |
| Ch279 LQ | LQ | 5 | REUSE LW path |
The runner-pick-next-blocker loop is producing one chapter per sub-half-day. The qbert track is on rails.