retroDE_ps2/docs/ch280_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

Ch280 closeout — PSUBB byte-wise SIMD; next blocker is PNOR

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C94 instr=0x70091CE9) — opcode 0x1C (MMI) + funct 0x29 (MMI3) + sa 0x13 = PNOR (Parallel Not-OR). qbert's byte-walker advanced past PSUBB on the first try.

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| Post-Ch280 (PSUBB) | PNOR at 0x00112C94 | 27,021 |

1-retire delta — PSUBB itself retired, PNOR is the next instruction.

What landed

RTL — 5 surgical edits in ee_core_stub.sv

  1. Constants: FUNC_MMI0 = 6'h08 and MMI0_PSUBB = 5'h09.
  2. Decode: is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB). Three-way AND keeps the decode narrow — any other op=0x1C/funct=0x08 sub-instruction (PADDW, PADDH, PADDB, ...) continues to strict-trap.
  3. is_rtype_alu group: added is_psubb.
  4. rtype_alu_wb arm: 4 independent byte subtracts:
    else if (is_psubb) begin
        rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
        rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
        rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
        rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
    end
    
    Each lane is naturally modulo-256; no borrow propagates between bytes.
  5. is_nop_class allow: !is_psubb added.

5 LOC of real change.
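
As a cross-check on the narrow decode in step 2, here is a minimal Python sketch of the same three-way match. It is a software model, not the RTL: the helper names `encode_mmi` and `is_psubb` and the constant `MMI0_PADDB` are hypothetical, and the PADDB sa value (0x08) is an assumption recalled from the EE instruction set rather than taken from this document.

```python
OPCODE_MMI = 0x1C   # major opcode shared by all MMI instructions
FUNC_MMI0 = 0x08    # funct value selecting the MMI0 sub-group
MMI0_PSUBB = 0x09   # sa value selecting PSUBB within MMI0
MMI0_PADDB = 0x08   # assumption: sa value for PADDB, used only as a negative example


def encode_mmi(funct, sa, rs, rt, rd):
    """Pack an MMI R-type word: opcode | rs | rt | rd | sa | funct."""
    return (OPCODE_MMI << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def is_psubb(instr):
    """Three-way match: opcode, funct and sa must all agree."""
    opcode = (instr >> 26) & 0x3F
    funct = instr & 0x3F
    sa = (instr >> 6) & 0x1F
    return opcode == OPCODE_MMI and funct == FUNC_MMI0 and sa == MMI0_PSUBB


# psubb $v0, $t1, $t2 uses rs=9, rt=10, rd=2 in the standard MIPS numbering.
psubb_word = encode_mmi(FUNC_MMI0, MMI0_PSUBB, rs=9, rt=10, rd=2)
paddb_word = encode_mmi(FUNC_MMI0, MMI0_PADDB, rs=9, rt=10, rd=2)
print(hex(psubb_word), is_psubb(psubb_word), is_psubb(paddb_word))
```

The encoded PSUBB word comes out as 0x712A1248, and the sibling word with a different sa value is rejected, which is the behaviour the strict-trap path relies on.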

Focused TB — tb_ee_core_psubb.sv

Three cases:

  1. Distinct lanes (qbert encoding shape): $t1 = 0x10203040, $t2 = 0x01020304 → $v0 = 0x0F1E2D3C. Encoder-output asserted to equal 0x712A1248 (qbert's literal instruction).
  2. All-wrap: $t3 = 0, $t4 = 0x01020304 → $t5 = 0xFFFEFDFC. Proves all 4 byte lanes underflow independently to 0xFx.
  3. No cross-byte borrow: $t6 = 0x12345600, $t7 = 0x00000001 → $t8 = 0x123456FF. The low byte borrows (0x00 - 0x01 = 0xFF) but must not propagate into byte 1. Byte 1 stays at 0x56 (= 0x56 - 0x00). This is the critical SIMD property.

Result: retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS.
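
The three TB cases can be reproduced with a small software model of the 32-bit PSUBB arm. This is a sketch for checking expected values by hand, assuming the same low-32-bit view the stub uses; `psubb32` is a hypothetical name, not something in the repo.

```python
def psubb32(rs_val, rt_val):
    """Subtract rt from rs in four independent 8-bit lanes (low 32 bits only)."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (rs_val >> shift) & 0xFF
        b = (rt_val >> shift) & 0xFF
        # Masking to 8 bits wraps modulo 256 and discards the borrow,
        # so nothing leaks into the neighbouring lane.
        result |= ((a - b) & 0xFF) << shift
    return result


cases = [
    (0x10203040, 0x01020304, 0x0F1E2D3C),  # distinct lanes
    (0x00000000, 0x01020304, 0xFFFEFDFC),  # every lane wraps
    (0x12345600, 0x00000001, 0x123456FF),  # borrow stays in byte 0
]
for rs_val, rt_val, expected in cases:
    assert psubb32(rs_val, rt_val) == expected

# For contrast: a plain 32-bit subtract lets the borrow ripple into byte 1.
assert (0x12345600 - 0x00000001) & 0xFFFFFFFF == 0x123455FF
```

The final assertion shows why case 3 matters: a scalar subtract would produce 0x123455FF, so a decode that accidentally reused the SUBU arm would fail exactly that case.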

Makefile + regression

  • tb_ee_core_psubb target.
  • Added to both regression lists.
  • Regression: 167 → 168.

Recommendation for Codex's Ch281 — PNOR

0x70091CE9 at PC 0x00112C94:

  • opcode 0x1C (MMI)
  • funct 0x29 (MMI3 sub-group)
  • sa 0x13 (PNOR within MMI3)
  • rs=$zero, rt=$t1, rd=$v1
  • pnor $v1, $0, $t1
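
The field breakdown above can be checked mechanically. The sketch below assumes the standard MIPS R-type layout (6/5/5/5/5/6 bits); `decode_rtype` and the small register-name map are hypothetical helpers for illustration.

```python
# Only the registers that appear in this note; standard MIPS numbering.
REG_NAMES = {0: "$zero", 2: "$v0", 3: "$v1", 9: "$t1", 10: "$t2"}


def decode_rtype(instr):
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


fields = decode_rtype(0x70091CE9)
for name in ("opcode", "funct", "sa"):
    print(f"{name:6s} = 0x{fields[name]:02X}")
for name in ("rs", "rt", "rd"):
    print(f"{name:6s} = {REG_NAMES[fields[name]]}")
```

Running the same helper on 0x712A1248 gives the MMI0/PSUBB fields from Ch280, which is a quick way to confirm that a trap report's `instr` value matches the opcode named in the verdict.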

Architectural: 128-bit rd = ~(rs | rt). For our 32-bit model: $rd[31:0] = ~($rs[31:0] | $rt[31:0]), bit-identical to the existing standard NOR (SPECIAL funct 0x27). The only difference between PNOR and NOR is the architectural width.

With rs = $zero, PNOR is the canonical MIPS "NOT" pseudo-instruction: pnor $rd, $0, $rt is equivalent to not $rd, $rt.

Implementation outline (mirrors Ch278 PCPYLD + Ch280 PSUBB):

  1. localparam FUNC_MMI3 = 6'h29.
  2. localparam MMI3_PNOR = 5'h13.
  3. is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR).
  4. Add to is_rtype_alu.
  5. Reuse the existing NOR writeback arm:
    else if (is_nor || is_pnor) rtype_alu_wb = ~(rs_val | rt_val);
    
  6. Add !is_pnor to is_nop_class allow-list.

~4 LOC.

Focused TB:

  • Exact qbert encoding asserted == 0x70091CE9.
  • NOT-of-zero: pnor $rd, $0, $0 → $rd = 0xFFFFFFFF.
  • NOT-of-pattern: pnor $rd, $0, 0xAAAAAAAA → $rd = 0x55555555.
  • General NOR: pnor $rd, 0xF0F0F0F0, 0x0F0F0F0F → $rd = 0.
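
The expected values for the proposed PNOR cases can be derived from a small model of the 32-bit arm. This is a sketch under the same low-32-bit assumption as the stub; `pnor32` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def pnor32(rs_val, rt_val):
    """Low 32 bits of PNOR: bitwise NOT of (rs OR rt)."""
    # Python integers are unbounded, so mask after the invert to stay in 32 bits.
    return ~(rs_val | rt_val) & MASK32


# The three value cases proposed for the focused TB.
assert pnor32(0x00000000, 0x00000000) == 0xFFFFFFFF  # NOT-of-zero
assert pnor32(0x00000000, 0xAAAAAAAA) == 0x55555555  # NOT-of-pattern
assert pnor32(0xF0F0F0F0, 0x0F0F0F0F) == 0x00000000  # general NOR

# With rs = 0 the operation collapses to a plain bitwise NOT of rt.
for value in (0x00000000, 0x12345678, 0xFFFFFFFF):
    assert pnor32(0, value) == (~value & MASK32)
```

The last loop covers the form qbert actually issues (rs = $zero), so it doubles as a check on the NOT pseudo-instruction reading.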

Likely follow-ons after PNOR: byte-walker reductions like PMFHL (move from HI/LO), or another mask op like PAND (MMI2 sa=0x12) / POR (MMI3 sa=0x12). Codex may want to consider folding the bitwise MMI family (PAND/POR/PXOR/PNOR) into one chapter since they're all reuses of existing ALU arms.
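
If the bitwise family is folded into one chapter, the decode reduces to a four-entry lookup on (funct, sa). The sketch below illustrates that shape in Python. Two values are assumptions recalled from the EE instruction set and not stated in this document, so they should be confirmed before use: the MMI2 funct value (0x09) and the PXOR sa value (0x13). The names `BITWISE_MMI` and `bitwise_mmi` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
FUNC_MMI2 = 0x09  # assumption: MMI2 funct value, not stated in this document
FUNC_MMI3 = 0x29

# (funct, sa) -> (mnemonic, 32-bit operation)
BITWISE_MMI = {
    (FUNC_MMI2, 0x12): ("PAND", lambda a, b: a & b),
    (FUNC_MMI3, 0x12): ("POR", lambda a, b: a | b),
    (FUNC_MMI2, 0x13): ("PXOR", lambda a, b: a ^ b),  # assumption: sa value
    (FUNC_MMI3, 0x13): ("PNOR", lambda a, b: ~(a | b)),
}


def bitwise_mmi(funct, sa, rs_val, rt_val):
    """Return (mnemonic, result) for a bitwise MMI op, or None if unsupported."""
    entry = BITWISE_MMI.get((funct, sa))
    if entry is None:
        return None  # stands in for the strict-trap path
    name, op = entry
    return name, op(rs_val, rt_val) & MASK32


print(bitwise_mmi(FUNC_MMI3, 0x13, 0x00000000, 0xAAAAAAAA))
print(bitwise_mmi(FUNC_MMI2, 0x12, 0xFF00FF00, 0x0FF00FF0))
```

Keying on the (funct, sa) pair keeps each decode as narrow as the PSUBB one: any pair not in the table still falls through to the trap path.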

Files changed

  • rtl/ee/ee_core_stub.sv — 5 surgical edits.
  • sim/tb/integration/tb_ee_core_psubb.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 168/168.

Pattern review (10 qbert chapters)

| Ch | Blocker | Edits | Pattern |
|---|---|---|---|
| 271 | SQ first | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD) |

10 chapters in, qbert at 27,021 retires, regression at 168. SIMD byte-walker pattern is locking in: LQ → PSUBB → PNOR (likely → PMFHL → branch). Each chapter is now ~4-5 LOC + a TB; cadence holds at sub-half-day per chapter.