Ch280 closeout — PSUBB byte-wise SIMD; next blocker is PNOR
Status: Closed. Verdict from re-running qbert.elf:
elf_first_unsupported_opcode (pc=0x00112C94 instr=0x70091CE9) —
opcode 0x1C (MMI) + funct 0x29 (MMI3) + sa 0x13 = PNOR
(Parallel Not-OR). qbert's byte-walker advanced past PSUBB on the
first try.
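The field breakdown in the verdict can be reproduced on the host with a few shifts and masks. The following is a minimal Python sketch, not code from this repo: `RType` and `decode_rtype` are hypothetical names, and the bit positions assume the standard MIPS R-type layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, sa 10:6, funct 5:0).

```python
from typing import NamedTuple


class RType(NamedTuple):
    """The six fields of a 32-bit MIPS R-type instruction word (hypothetical helper)."""
    opcode: int
    rs: int
    rt: int
    rd: int
    sa: int
    funct: int


def decode_rtype(word: int) -> RType:
    """Split a 32-bit instruction word into its R-type fields."""
    return RType(
        opcode=(word >> 26) & 0x3F,  # bits 31:26
        rs=(word >> 21) & 0x1F,      # bits 25:21
        rt=(word >> 16) & 0x1F,      # bits 20:16
        rd=(word >> 11) & 0x1F,      # bits 15:11
        sa=(word >> 6) & 0x1F,       # bits 10:6
        funct=word & 0x3F,           # bits 5:0
    )


# The trapping instruction reported by the qbert.elf re-run.
fields = decode_rtype(0x70091CE9)
print({name: hex(value) for name, value in fields._asdict().items()})
```

For `0x70091CE9` this yields opcode 0x1C, funct 0x29 and sa 0x13, with rs = 0, rt = 9 and rd = 3.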
Numbers
| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| Post-Ch280 (PSUBB) | PNOR at 0x00112C94 | 27,021 |
The one-retire delta means PSUBB itself retired and PNOR is the very next instruction.
What landed
RTL — 5 surgical edits in ee_core_stub.sv
- Constants: `FUNC_MMI0 = 6'h08` and `MMI0_PSUBB = 5'h09`.
- Decode: `is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB)`. Three-way AND keeps the decode narrow: any other op=0x1C/funct=0x08 sub-instruction (PADDW, PADDH, PADDB, ...) continues to strict-trap.
- `is_rtype_alu` group: added `is_psubb`.
- `rtype_alu_wb` arm: 4 independent byte subtracts:

  ```systemverilog
  else if (is_psubb) begin
      rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
      rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
      rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
      rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
  end
  ```

  Each lane is naturally modulo-256; no carry between bytes.
- `is_nop_class` allow: `!is_psubb` added.
5 LOC of real change.
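The lane-wise behavior of the writeback arm can be cross-checked against a host-side reference model. This is a hedged Python sketch under the same 32-bit model as the RTL, not code from the repo; `psubb32` and `sub32` are hypothetical names introduced only for illustration.

```python
MASK32 = 0xFFFFFFFF


def psubb32(rs_val: int, rt_val: int) -> int:
    """Reference model: four independent byte subtracts, each wrapping modulo 256."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (rs_val >> shift) & 0xFF
        b = (rt_val >> shift) & 0xFF
        # Masking the difference to 8 bits discards the borrow, so it can
        # never reach the neighbouring lane.
        result |= ((a - b) & 0xFF) << shift
    return result


def sub32(rs_val: int, rt_val: int) -> int:
    """Plain 32-bit subtract, for contrast: the borrow ripples across bytes."""
    return (rs_val - rt_val) & MASK32


print(hex(psubb32(0x12345600, 0x00000001)))  # lane-wise: byte 1 is untouched
print(hex(sub32(0x12345600, 0x00000001)))    # scalar: byte 1 absorbs the borrow
```

The two results differ only in byte 1 (0x56 versus 0x55), which is the difference between a SIMD byte subtract and a scalar one.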
Focused TB — tb_ee_core_psubb.sv
Three cases:
- Distinct lanes (qbert encoding shape): `$t1 = 0x10203040`, `$t2 = 0x01020304` → `$v0 = 0x0F1E2D3C`. Encoder output asserted to equal `0x712A1248` (qbert's literal instruction).
- All-wrap: `$t3 = 0`, `$t4 = 0x01020304` → `$t5 = 0xFFFEFDFC`. Proves all 4 byte lanes underflow independently to 0xFx.
- No cross-byte borrow: `$t6 = 0x12345600`, `$t7 = 0x00000001` → `$t8 = 0x123456FF`. The low byte borrows (0x00 - 0x01 = 0xFF) but must not propagate into byte 1. Byte 1 stays at 0x56 (= 0x56 - 0x00). This is the critical SIMD property.
Result: retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS.
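The encoder-output assertion in the first case can be mirrored on the host. The sketch below is an assumption-laden Python illustration, not the TB's actual encoder: `encode_mmi` is a hypothetical helper, and the register numbers use the standard MIPS convention ($v0 = 2, $t1 = 9, $t2 = 10).

```python
OPCODE_MMI = 0x1C   # primary opcode shared by all MMI instructions
FUNC_MMI0 = 0x08    # MMI0 sub-group selector (funct field)
MMI0_PSUBB = 0x09   # PSUBB selector inside MMI0 (sa field)


def encode_mmi(funct: int, sa: int, rs: int, rt: int, rd: int) -> int:
    """Pack an MMI instruction word (hypothetical helper, R-type layout)."""
    for name, value, width in (("funct", funct, 6), ("sa", sa, 5),
                               ("rs", rs, 5), ("rt", rt, 5), ("rd", rd, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((OPCODE_MMI << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (sa << 6) | funct)


# psubb $v0, $t1, $t2 with standard register numbers.
V0, T1, T2 = 2, 9, 10
word = encode_mmi(FUNC_MMI0, MMI0_PSUBB, rs=T1, rt=T2, rd=V0)
print(hex(word))
```

Range-checking each field before packing keeps a bad register number from silently corrupting a neighbouring field, which matters when the encoder output is itself the thing being asserted.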
Makefile + regression
- `tb_ee_core_psubb` target.
- Added to both regression lists.
- Regression: 167 → 168.
Recommendation for Codex's Ch281 — PNOR
0x70091CE9 at PC 0x00112C94:
- opcode 0x1C (MMI)
- funct 0x29 (MMI3 sub-group)
- sa 0x13 (PNOR within MMI3)
- rs=$zero, rt=$t1, rd=$v1
- → `pnor $v1, $0, $t1`
Architectural: 128-bit rd = ~(rs | rt). For our 32-bit model:
$rd[31:0] = ~($rs[31:0] | $rt[31:0]) — bit-identical to the
existing standard NOR (SPECIAL funct 0x27). The only difference
between PNOR and NOR is the architectural width.
With rs = $zero, PNOR reduces to the canonical MIPS "NOT" idiom:
pnor $rd, $0, $rt ≡ not $rd, $rt.
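The claim that the 32-bit model is bit-identical to standard NOR can be sanity-checked on the host. This is a minimal Python sketch, not repo code; `pnor128`, `nor32` and `low32_matches` are hypothetical names, and the 128-bit function follows the architectural definition stated above.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def pnor128(rs_val: int, rt_val: int) -> int:
    """Architectural PNOR: bitwise NOR across the full 128-bit register."""
    return ~(rs_val | rt_val) & MASK128


def nor32(rs_val: int, rt_val: int) -> int:
    """32-bit model: the same expression as standard NOR."""
    return ~(rs_val | rt_val) & MASK32


def low32_matches(rs_val: int, rt_val: int) -> bool:
    """True when the low word of the 128-bit result equals the 32-bit model."""
    return (pnor128(rs_val, rt_val) & MASK32) == nor32(rs_val & MASK32,
                                                       rt_val & MASK32)


# With rs = 0 the operation degenerates to a bitwise NOT of rt.
print(hex(nor32(0, 0xAAAAAAAA)))
print(low32_matches(0x0123456789ABCDEF_F0F0F0F0_12345678, 0xDEADBEEF))
```

Because NOR has no cross-bit dependency, truncating the operands before or after the operation gives the same low word, which is why reusing the existing NOR arm is safe in a 32-bit model.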
Implementation outline (mirrors Ch278 PCPYLD + Ch280 PSUBB):
- `localparam FUNC_MMI3 = 6'h29`.
- `localparam MMI3_PNOR = 5'h13`.
- `is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR)`.
- Add to `is_rtype_alu`.
- Reuse the existing NOR writeback arm: `else if (is_nor || is_pnor) rtype_alu_wb = ~(rs_val | rt_val);`
- Add `!is_pnor` to `is_nop_class` allow-list.
~4 LOC.
Focused TB:
- Exact qbert encoding asserted == `0x70091CE9`.
- NOT-of-zero: `pnor $rd, $0, $0` → `$rd = 0xFFFFFFFF`.
- NOT-of-pattern: `pnor $rd, $0, 0xAAAAAAAA` → `$rd = 0x55555555`.
- General NOR: `pnor $rd, 0xF0F0F0F0, 0x0F0F0F0F` → `$rd = 0`.
Likely follow-ons after PNOR: byte-walker reductions like PMFHL (move from HI/LO), or another mask op like PAND (MMI2 sa=0x12) / POR (MMI3 sa=0x12). Codex may want to consider folding the bitwise MMI family (PAND/POR/PXOR/PNOR) into one chapter since they're all reuses of existing ALU arms.
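If the bitwise family is folded into one chapter, the narrow decode amounts to a four-entry lookup keyed on (funct, sa). The Python sketch below illustrates that shape only; it is not a proposal for the RTL structure. `exec_bitwise_mmi` and `BITWISE_MMI` are hypothetical names. The PAND, POR and PNOR selectors come from this note, while `FUNC_MMI2 = 0x09` and PXOR at MMI2 sa=0x13 are assumptions that should be verified against the EE core instruction manual before use.

```python
MASK32 = 0xFFFFFFFF
OPCODE_MMI = 0x1C
FUNC_MMI2 = 0x09  # ASSUMPTION: not stated in this note, verify before use
FUNC_MMI3 = 0x29

# (funct, sa) -> (mnemonic, 32-bit model of the operation)
BITWISE_MMI = {
    (FUNC_MMI2, 0x12): ("PAND", lambda a, b: a & b),
    (FUNC_MMI2, 0x13): ("PXOR", lambda a, b: a ^ b),  # ASSUMPTION: sa value
    (FUNC_MMI3, 0x12): ("POR", lambda a, b: a | b),
    (FUNC_MMI3, 0x13): ("PNOR", lambda a, b: ~(a | b)),
}


def exec_bitwise_mmi(word: int, rs_val: int, rt_val: int):
    """Return (mnemonic, result), or None where the RTL would strict-trap."""
    if (word >> 26) & 0x3F != OPCODE_MMI:
        return None
    key = (word & 0x3F, (word >> 6) & 0x1F)  # (funct, sa)
    entry = BITWISE_MMI.get(key)
    if entry is None:
        return None  # any other MMI sub-instruction stays unsupported here
    name, op = entry
    return name, op(rs_val, rt_val) & MASK32


print(exec_bitwise_mmi(0x70091CE9, 0, 0xAAAAAAAA))
```

Keying on both funct and sa preserves the narrow-decode property: an MMI word outside the table (for example the PSUBB encoding) falls through to the unsupported path instead of aliasing onto a bitwise op.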
Files changed
- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_psubb.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.
Regression
In flight; expected 168/168.
Pattern review (10 qbert chapters)
| Ch | Blocker | Edits | Pattern |
|---|---|---|---|
| 271 | SQ (first) | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD) |
10 chapters in, qbert at 27,021 retires, regression at 168. SIMD byte-walker pattern is locking in: LQ → PSUBB → PNOR (likely → PMFHL → branch). Each chapter is now ~4-5 LOC + a TB; cadence holds at sub-half-day per chapter.