retroDE_ps2/docs/ch281_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch281 closeout — MMI3/PNOR (canonical NOT); next blocker is PAND

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C98 instr=0x70431489) — opcode 0x1C (MMI) + funct 0x09 (MMI2) + sa 0x12 = PAND (Parallel AND). qbert is now deep into the SIMD byte-walker's mask-and-reduce stage: PSUBB → PNOR → PAND.
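The field breakdown of the trapping word can be cross-checked with a few lines of Python. This is a minimal sketch of standard MIPS R-type field extraction; `decode_rtype` is a hypothetical helper name, not something from the repo.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "rs": (instr >> 21) & 0x1F,      # bits 25..21
        "rt": (instr >> 16) & 0x1F,      # bits 20..16
        "rd": (instr >> 11) & 0x1F,      # bits 15..11
        "sa": (instr >> 6) & 0x1F,       # bits 10..6
        "funct": instr & 0x3F,           # bits 5..0
    }

# The word reported by elf_first_unsupported_opcode at pc=0x00112C98.
fields = decode_rtype(0x70431489)
print({name: hex(value) for name, value in fields.items()})
```

For 0x70431489 this yields opcode 0x1C, funct 0x09 and sa 0x12, with rs=2, rt=3, rd=2, which matches the PAND reading above.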

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| Post-Ch280 (PSUBB) | PNOR at 0x00112C94 | 27,021 |
| Post-Ch281 (PNOR) | PAND at 0x00112C98 | 27,022 |

1-retire delta — PNOR retired, PAND traps next.

What landed

RTL — 5 surgical edits in ee_core_stub.sv

  1. Constants: FUNC_MMI3 = 6'h29, MMI3_PNOR = 5'h13.
  2. Decode: is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR). Same three-way AND as Ch278/Ch280.
  3. is_rtype_alu group: added is_pnor.
  4. Writeback (REUSE): extended the existing NOR arm to else if (is_nor || is_pnor) rtype_alu_wb = ~(rs_val | rt_val). Architectural 128-bit PNOR collapses to a regular 32-bit bitwise NOR for the low lane.
  5. is_nop_class allow: !is_pnor added.

5 LOC of real change. Pure pattern reuse from Ch280 PSUBB (same MMI narrow-decode shape) plus reuse of the existing NOR writeback arm.
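Why the low-lane collapse in edit 4 is safe can be shown with a small Python model. This is a sketch under the assumption that the stub only tracks bits 31..0 of each GPR; `pnor128`, `nor32` and `low_lane_matches` are illustrative names, not RTL signals.

```python
import random

MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1

def pnor128(rs: int, rt: int) -> int:
    """Architectural PNOR: bitwise NOR across the full 128-bit register."""
    return ~(rs | rt) & MASK128

def nor32(rs: int, rt: int) -> int:
    """Low-lane model: the same NOR restricted to bits 31..0."""
    return ~(rs | rt) & MASK32

def low_lane_matches(trials: int = 1000, seed: int = 0) -> bool:
    """Check that the low 32 bits of the 128-bit result equal the 32-bit NOR."""
    rng = random.Random(seed)
    for _ in range(trials):
        rs, rt = rng.getrandbits(128), rng.getrandbits(128)
        if pnor128(rs, rt) & MASK32 != nor32(rs & MASK32, rt & MASK32):
            return False
    return True

print(low_lane_matches())
```

NOR is purely bitwise, so no bit of the result depends on any other bit position. That is the property that lets the existing 32-bit NOR arm stand in for the low lane.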

Focused TB — tb_ee_core_pnor.sv

Three cases:

  1. qbert exact encoding: pnor $v1, $zero, $t1. Encoder asserted == 0x70091CE9. With $t1 = 0x12345678 → $v1 = ~0x12345678 = 0xEDCBA987.
  2. NOT-of-zero: pnor $t2, $0, $0 → 0xFFFFFFFF. Both operands zero; result is all-ones.
  3. General NOR: $t3 = 0xF0F0F0F0, $t4 = 0x0F0F0F0F → $t5 = ~(0xF0F0F0F0 | 0x0F0F0F0F) = ~0xFFFFFFFF = 0. Locks in the "general two-operand NOR" path even though qbert's specific usage is the NOT-pseudo form.

Result: retired=22 halt=1 trap=0 pc=0xbfc0014c errors=0 PASS.
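The encoding and the three expected values in the cases above can be reproduced outside the simulator. The following is a sketch, not the TB's actual encoder: `encode_mmi` and the `REG` table are hypothetical, and the register numbers follow the o32-style naming implied by the 0x70091CE9 encoding ($t1 = 9).

```python
OPCODE_MMI = 0x1C
FUNC_MMI3 = 0x29
MMI3_PNOR = 0x13

# Register numbers (assumed o32-style names, consistent with the asserted encoding).
REG = {"$zero": 0, "$v0": 2, "$v1": 3, "$t1": 9, "$t2": 10,
       "$t3": 11, "$t4": 12, "$t5": 13}

def encode_mmi(funct: int, sa: int, rd: str, rs: str, rt: str) -> int:
    """Assemble an MMI-class R-type word from its fields."""
    return ((OPCODE_MMI << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (sa << 6) | funct)

def nor32(a: int, b: int) -> int:
    """32-bit low-lane NOR, as modelled by the stub."""
    return ~(a | b) & 0xFFFFFFFF

# Case 1: qbert's exact instruction and its NOT-pseudo result.
print(hex(encode_mmi(FUNC_MMI3, MMI3_PNOR, "$v1", "$zero", "$t1")))
print(hex(nor32(0, 0x12345678)))
# Case 2: NOT-of-zero.
print(hex(nor32(0, 0)))
# Case 3: general two-operand NOR.
print(hex(nor32(0xF0F0F0F0, 0x0F0F0F0F)))
```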

Makefile + regression

  • tb_ee_core_pnor target.
  • Added to both regression lists.
  • Regression: 168 → 169.

qbert's SIMD byte-walker — pipeline shape now clear

Five MMI and load/store chapters (Ch271 SQ plus Ch278–Ch281, which include Ch279 LQ) have surfaced the full byte-walker shape:

0x00112C88: lq     $t1, 0($a1)           ; Ch279 — load 128-bit chunk
0x00112C8C: <one instr that never surfaced as a blocker>
0x00112C90: psubb  $v0, $t1, $t2         ; Ch280 — per-byte subtract
0x00112C94: pnor   $v1, $zero, $t1       ; Ch281 — ~$t1 (mask gen)
0x00112C98: pand   $v0, $v0, $v1         ; Ch282 — mask the result
... reduction continues ...

This is the classic "find a zero byte" or "detect sentinel byte" SIMD loop — PSUBB against a key, PNOR to invert the bits, PAND with a mask to isolate the lanes where the condition holds, then PMFHL or similar to reduce to a single GPR for a branch test.
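The subtract, invert, and mask sequence can be modelled on a 32-bit lane in Python. This is a sketch of the well-known zero-byte test, not a trace of qbert: it assumes $t2 holds 0x01 in every byte and that a later step keeps only the high bit of each byte (0x80), neither of which has been confirmed from the binary. The function names are illustrative.

```python
def psubb(a: int, b: int, width: int = 4) -> int:
    """Per-byte wrapping subtract over `width` bytes, with no borrow between lanes."""
    out = 0
    for i in range(width):
        shift = 8 * i
        lane = (((a >> shift) & 0xFF) - ((b >> shift) & 0xFF)) & 0xFF
        out |= lane << shift
    return out

def zero_byte_mask(x: int, width: int = 4) -> int:
    """Return 0x80 in every byte position where x holds a zero byte."""
    full = (1 << (8 * width)) - 1
    ones = int.from_bytes(b"\x01" * width, "little")   # assumed contents of $t2
    highs = int.from_bytes(b"\x80" * width, "little")  # assumed final mask
    diff = psubb(x, ones, width)   # PSUBB step: x - 0x01 per byte
    inv = ~x & full                # PNOR with $zero: ~x
    return diff & inv & highs      # PAND step, then keep the per-byte high bit

print(hex(zero_byte_mask(0x41004243)))  # byte 2 is zero
print(hex(zero_byte_mask(0x41424344)))  # no zero byte
```

Because the subtract is per byte, each lane is tested independently: a byte b yields 0x80 only when b is 0x00, since b - 1 then wraps to 0xFF while ~b is also 0xFF.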

Recommendation for Codex's Ch282 — PAND

0x70431489 at PC 0x00112C98:

  • opcode 0x1C (MMI)
  • funct 0x09 (MMI2)
  • sa 0x12 (PAND within MMI2)
  • rs=$v0, rt=$v1, rd=$v0
  • pand $v0, $v0, $v1

Architectural: 128-bit $rd = $rs & $rt. For our 32-bit model: bit-identical to standard AND (SPECIAL funct 0x24). Same shape as PNOR/NOR — different opcode, reused writeback arm.

Implementation outline (mirrors Ch281 PNOR exactly):

  1. localparam MMI2_PAND = 5'h12.
  2. is_pand = is_mmi && (func == FUNC_MMI2) && (shamt == MMI2_PAND). The MMI2 funct constant already exists from Ch278.
  3. Add to is_rtype_alu.
  4. Reuse the existing AND writeback arm:
    else if (is_and || is_pand) rtype_alu_wb = rs_val & rt_val;
    
  5. Add !is_pand to is_nop_class.

~4 LOC.

Focused TB:

  • Exact qbert encoding asserted == 0x70431489.
  • General AND case: pand $rd, 0xFFFFFFFF, 0xAAAAAAAA → 0xAAAAAAAA.
  • All-zero case: pand $rd, 0xFFFFFFFF, 0x00000000 → 0.
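The proposed vectors can be sanity-checked before the TB is written. This is a minimal sketch with hypothetical helpers (`encode_mmi`, `and32`, `PAND_VECTORS`); it models only the low 32-bit lane, as the stub does, and takes register numbers directly ($v0 = 2, $v1 = 3).

```python
OPCODE_MMI = 0x1C
FUNC_MMI2 = 0x09
MMI2_PAND = 0x12
V0, V1 = 2, 3  # register numbers for $v0 and $v1

def encode_mmi(funct: int, sa: int, rd: int, rs: int, rt: int) -> int:
    """Assemble an MMI-class R-type word from numeric fields."""
    return ((OPCODE_MMI << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (sa << 6) | funct)

def and32(a: int, b: int) -> int:
    """32-bit low-lane AND."""
    return a & b & 0xFFFFFFFF

# (rs_val, rt_val, expected rd_val) for the two value cases listed above.
PAND_VECTORS = [
    (0xFFFFFFFF, 0xAAAAAAAA, 0xAAAAAAAA),
    (0xFFFFFFFF, 0x00000000, 0x00000000),
]

print(hex(encode_mmi(FUNC_MMI2, MMI2_PAND, rd=V0, rs=V0, rt=V1)))
for rs_val, rt_val, expected in PAND_VECTORS:
    print(hex(rs_val), hex(rt_val), and32(rs_val, rt_val) == expected)
```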

Likely follow-ons after PAND: PMFHL (move from HI/LO low halves) for the reduction, since the byte-walker needs to fold the masked vector down to a scalar for branching; or PEXTLW (Parallel Extend Lower from Word) if the reduction takes a different shape.

Pattern review (11 chapters)

| Ch | Blocker | Edits | Pattern |
|---|---|---|---|
| 271 | SQ first | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD) |
| 281 | PNOR | 5 | REUSE MMI narrow + reuse NOR arm |

5 NEW patterns + 6 REUSE chapters. The reuse density continues to climb — Ch282 PAND will be the most-reused chapter yet (MMI narrow-decode + standard-AND writeback, both already in place).

Files changed

  • rtl/ee/ee_core_stub.sv — 5 surgical edits.
  • sim/tb/integration/tb_ee_core_pnor.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 169/169.