retroDE_ps2/docs/ch282_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch282 closeout — PAND; next blocker is PCPYUD (the first "upper-half" MMI op)

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112CA0 instr=0x704923A9) — opcode 0x1C (MMI) + funct 0x29 (MMI3) + sa 0x0E = PCPYUD (Parallel Copy Upper Doubleword). This is the first MMI op that reads from the architectural upper 64 bits of a source register — a place our 32-bit-GPR model has never been able to represent.
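The field split behind that verdict can be checked by hand. The following is a minimal sketch in Python (a software model, not part of the RTL or the testbench); `decode_fields` is a hypothetical helper name, and the field layout is the standard MIPS R-type layout.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its standard fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs":     (instr >> 21) & 0x1F,  # bits 25:21
        "rt":     (instr >> 16) & 0x1F,  # bits 20:16
        "rd":     (instr >> 11) & 0x1F,  # bits 15:11
        "sa":     (instr >> 6) & 0x1F,   # bits 10:6
        "funct":  instr & 0x3F,          # bits 5:0
    }

# The trapping word reported by the qbert run.
fields = decode_fields(0x704923A9)
for name, value in fields.items():
    print(f"{name:6s} = 0x{value:02X}")
```

The opcode, funct, and sa fields come out as 0x1C, 0x29, and 0x0E, and the register fields are rs=2 ($v0), rt=9 ($t1), rd=4 ($a0).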

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch280 (PSUBB) | PNOR at 0x00112C94 | 27,021 |
| Post-Ch281 (PNOR) | PAND at 0x00112C98 | 27,022 |
| Post-Ch282 (PAND) | PCPYUD at 0x00112CA0 | 27,024 |

2-retire delta (27,022 to 27,024): PAND retired, plus one instruction at PC 0x00112C9C (not decoded here; probably another byte-broadcast or comparison), then PCPYUD traps at 0x00112CA0.

What landed

RTL — 5 surgical edits in ee_core_stub.sv

  1. localparam MMI2_PAND = 5'h12 alongside MMI2_PCPYLD.
  2. is_pand = is_mmi && (func == FUNC_MMI2) && (shamt == MMI2_PAND).
  3. Added is_pand to is_rtype_alu.
  4. Reused the existing AND writeback: if (is_and || is_pand) rtype_alu_wb = rs_val & rt_val.
  5. !is_pand added to is_nop_class.

Highest-reuse chapter yet — MMI narrow-decode + AND writeback arm both already in place from prior chapters.

Focused TB — tb_ee_core_pand.sv

Three cases:

  1. Exact qbert encoding: pand $v0, $v0, $v1 (rs=2, rt=3, rd=2, sa=0x12, funct=0x09). Encoder asserted 0x70431489. $v0 = 0xFFFFFFFF & 0xAAAAAAAA = 0xAAAAAAAA.
  2. Disjoint masks: 0xF0F0F0F0 & 0x0F0F0F0F = 0 (proves pure bitwise AND).
  3. Zero-mask: 0xDEADBEEF & 0 = 0.
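As a cross-check outside the simulator, the encoding and the three expected values can be reproduced with a small Python model. This is a sketch only: `encode_mmi` and `pand_low32` are hypothetical names, and `pand_low32` mirrors the low-32-bit approximation of the stub rather than the full 128-bit architectural PAND.

```python
OPCODE_MMI = 0x1C   # major opcode for the MMI class
FUNCT_MMI2 = 0x09   # funct value selecting the MMI2 sub-table
MMI2_PAND = 0x12    # sa value selecting PAND within MMI2

def encode_mmi(rs: int, rt: int, rd: int, sa: int, funct: int) -> int:
    """Pack R-type fields into a 32-bit MMI instruction word."""
    return (OPCODE_MMI << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

def pand_low32(rs_val: int, rt_val: int) -> int:
    """PAND as the stub models it: a bitwise AND over the low 32 bits only."""
    return (rs_val & rt_val) & 0xFFFFFFFF

# Case 1: pand $v0, $v0, $v1, the exact qbert encoding.
word = encode_mmi(rs=2, rt=3, rd=2, sa=MMI2_PAND, funct=FUNCT_MMI2)
assert word == 0x70431489

# The three (rs, rt, expected) triples from the focused TB.
cases = [
    (0xFFFFFFFF, 0xAAAAAAAA, 0xAAAAAAAA),
    (0xF0F0F0F0, 0x0F0F0F0F, 0x00000000),
    (0xDEADBEEF, 0x00000000, 0x00000000),
]
for rs_val, rt_val, expected in cases:
    assert pand_low32(rs_val, rt_val) == expected
```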

Result: retired=24 halt=1 trap=0 pc=0xbfc00154 errors=0 PASS.

Makefile + regression

  • tb_ee_core_pand target.
  • Added to both regression lists.
  • Regression: 169 → 170.

Ch283 framing — PCPYUD: a fork in the road

Decoded: pcpyud $a0, $v0, $t1 (rs=$v0, rt=$t1, rd=$a0).

  • Architectural: $rd[63:0] = $rs[127:64]; $rd[127:64] = $rt[127:64]. Takes the upper 64 bits of both source operands: the upper 64 of rs becomes the lower 64 of rd, and the upper 64 of rt stays in the upper 64 of rd.

The fundamental problem: every prior chapter has lived inside a "low 32 bits only" approximation. The upper 96 bits of every GPR are silently 0 in our model — never written by SQ/SD/PCPYLD/PSUBB/PNOR/PAND. PCPYUD is the first op that reads from that upper half, so the question becomes unavoidable:

  • Option A (preserve the approximation): implement PCPYUD as $rd = 0 always. This is honest in the narrow sense that the op reads from a region we don't model, which is always zero by construction. But qbert will see all-zero PCPYUD results and may falsely conclude that it found a sentinel byte on every iteration of the byte-walker. That is silent divergence: the next 5-10 chapters of blockers might be illusory (cascading from the wrong PCPYUD result) rather than real qbert needs.
  • Option B (keep the trap; do not NOP-class PCPYUD): leave it trapping and surface this as the "model boundary" that warrants a real 128-bit-GPR pivot in a future chapter. qbert would not continue past retire_count 27,024 until that pivot happens.
  • Option C (implement 128-bit GPRs): faithful, but a big cross-cutting change (regfile width, every ALU arm, every load/store writeback) and multiple chapters of work. It buys real semantic correctness but breaks the "one op per chapter" cadence we've held since Ch271.
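The gap between Option A and Option C can be sketched with a small Python model. It assumes the PCPYUD definition as I understand it from the EE Core instruction set (rd[63:0] = rs[127:64], rd[127:64] = rt[127:64]); the function names and operand values are hypothetical and chosen only to make each half visible.

```python
MASK64 = (1 << 64) - 1
MASK32 = 0xFFFFFFFF

def pcpyud_128(rs_val: int, rt_val: int) -> int:
    """Assumed architectural PCPYUD on 128-bit operands (the Option C behaviour)."""
    rs_hi = (rs_val >> 64) & MASK64   # upper doubleword of rs
    rt_hi = (rt_val >> 64) & MASK64   # upper doubleword of rt
    return (rt_hi << 64) | rs_hi      # rt_hi lands in rd[127:64], rs_hi in rd[63:0]

def pcpyud_low32_model(rs_val: int, rt_val: int) -> int:
    """Option A: the regfile keeps only bits [31:0], so both upper halves read as 0."""
    return pcpyud_128(rs_val & MASK32, rt_val & MASK32)

# Made-up operands with distinct patterns in each doubleword.
rs_val = (0x1111222233334444 << 64) | 0x5555666677778888
rt_val = (0xAAAABBBBCCCCDDDD << 64) | 0xEEEEFFFF00001111

full = pcpyud_128(rs_val, rt_val)
approx = pcpyud_low32_model(rs_val, rt_val)

print(f"128-bit GPRs : 0x{full:032X}")
print(f"low-32 model : 0x{approx:032X}")
```

Under the low-32 model the result is zero for every input, so anything downstream that tests the PCPYUD result sees the same value regardless of what was loaded.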

My read: at minimum, do NOT silently NOP-class to 0. The qbert byte-walker's correctness depends on the upper 8 bytes of every LQ. Even if we land "Option B" first (keep the trap), the next chapter genuinely should be the 128-bit GPR pivot.
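To make the "upper 8 bytes of every LQ" point concrete, here is a hedged Python sketch of what a 16-byte load carries and what a low-32-bit register keeps. The buffer contents and helper names are hypothetical illustrations, not data or code taken from qbert.

```python
MASK64 = (1 << 64) - 1

def lq_128(mem: bytes) -> int:
    """Model LQ: assemble 16 little-endian bytes into one 128-bit value."""
    if len(mem) != 16:
        raise ValueError("LQ loads exactly 16 bytes")
    return int.from_bytes(mem, "little")

def low32_bytes(value: int) -> bytes:
    """Bytes that survive in a register modelled as 32 bits wide."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")

def upper64_bytes(value: int) -> bytes:
    """Bytes held in the architectural upper doubleword, bits [127:64]."""
    return ((value >> 64) & MASK64).to_bytes(8, "little")

# Hypothetical quadword of string data with its NUL terminator at byte offset 12.
quad = b"ABCDEFGHIJKL\x00MNO"
reg = lq_128(quad)

kept = low32_bytes(reg)       # bytes 0..3 only
upper = upper64_bytes(reg)    # bytes 8..15, where the terminator lives

print("low-32 model keeps :", kept)
print("upper doubleword   :", upper)
print("terminator visible in low-32 model:", b"\x00" in kept)
print("terminator visible in upper half  :", b"\x00" in upper)
```

In this example the terminator sits entirely in the half of the register that the current model discards, which is the kind of case a byte-walker would get wrong.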

This is the right moment to step back and frame the broader question with Codex. The MMI-narrow-decode cadence has worked beautifully for ops where low-32-bit semantics happen to suffice (PCPYLD, PSUBB, PNOR, PAND). It hits a wall at upper-half ops. Either:

  1. Bite the 128-bit GPR bullet now (Ch283 = "expand regfile to 128 bits + propagate through every LQ/SQ/SD/PCPYLD/... writeback").
  2. Accept that qbert is "as far as we can get" without 128-bit GPRs and pivot to a different ELF (homebrew that's 32-bit-clean) or to hardware-facing deliverables.

I'd recommend (1) as the next move: qbert has been a productive test vector, and the SIMD byte-walker shape is universal across PS2 stdlib code, so future game ELFs will hit the same wall.

Files changed

  • rtl/ee/ee_core_stub.sv — 5 surgical edits.
  • sim/tb/integration/tb_ee_core_pand.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 170/170.

Pattern review (12 chapters)

| Ch | Blocker | Edits | Pattern |
|---|---|---|---|
| 271 | SQ first qbert | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD new) |
| 281 | PNOR | 5 | REUSE MMI narrow + NOR arm |
| 282 | PAND | 5 | REUSE MMI narrow + AND arm |

5 NEW patterns + 7 REUSE chapters. The reuse density is at its peak right now, but Ch283 PCPYUD is signaling that the "low-32-only" approximation has reached its natural boundary. Codex's framing on whether to widen the regfile or pivot elsewhere will set the direction for the next stretch.