thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch279 closeout — LQ as single-beat low-word load; next blocker is PSUBB (MMI0)

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C90 instr=0x712A1248) — opcode 0x1C (MMI) + funct 0x08 (MMI0 sub-table) + sa 0x09 = PSUBB (Parallel Subtract Byte). qbert ran LQ + one more instruction, then trapped on the byte-wise SIMD subtract that sits at the heart of its stdlib byte-walker.

Numbers

| Chapter | Blocker | qbert retire_count |
|---|---|---|
| Post-Ch277 (BNEL) | PCPYLD at 0x00112C84 | 27,017 |
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |

2-retire delta: LQ + the next instruction (probably another register move) before PSUBB. The chain qbert is running here is the canonical SIMD byte-walker — load a 128-bit chunk, do a byte-wise compare/subtract against a sentinel, mask, test.

What landed

RTL — 5 surgical edits in ee_core_stub.sv

  1. localparam OP_LQ = 6'h1E alongside OP_LW.
  2. is_lq decode signal.
  3. Alignment: extended is_quad_access = is_sq || is_lq so the existing 16-byte alignment fault ea[3:0] != 0 covers LQ too. Misaligned LQ trips the AdEL path (it's a load, so the existing is_align_store group correctly doesn't include it — exception code is ADEL not ADES).
  4. FSM transition: added || is_lq to the LW/LB/LBU/LH/LHU loads list. The existing S_MEM_REQ → S_MEM_WAIT path handles the 32-bit read; S_MEM_WAIT's default writeback regfile[rt_idx] <= map_rd_data fires for LQ because none of is_lb/lbu/lh/lhu match (the if-else chain falls through to the default LW arm).
  5. !is_lq added to is_nop_class catch-all.

5 surgical edits total. The "reuse LW path" decision keeps the chapter small.
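The behaviour those edits give LQ can be sketched as a small software reference model. This is a hedged illustration, not the RTL: `lq_low_word` and `AddressErrorLoad` are hypothetical names, and it assumes little-endian memory so that the low word of the qword sits at the lowest address.

```python
import struct


class AddressErrorLoad(Exception):
    """Stand-in for the AdEL exception raised on a misaligned load."""


def lq_low_word(mem, ea):
    """Model LQ as a single-beat load of the low 32 bits of the qword."""
    if ea & 0xF:  # same test as ea[3:0] != 0 in the alignment check
        raise AddressErrorLoad(f"misaligned LQ at {ea:#x}")
    # Assumption: little-endian memory, low word at the lowest address.
    return int.from_bytes(mem[ea:ea + 4], "little")


# Four distinct words at 0x400..0x40F, so a wrong-lane read is visible.
mem = bytearray(0x1000)
mem[0x400:0x410] = struct.pack(
    "<4I", 0xAABBCCDD, 0x11112222, 0x33334444, 0x55556666
)
low = lq_low_word(mem, 0x400)
print(hex(low))
```

A 16-byte-aligned address returns the low word only; any address with a nonzero low nibble raises the load-side exception rather than a store-side one.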

Focused TB — tb_ee_core_lq.sv

Cases:

  1. Exact qbert encoding shape: lq $t1, 0($a1) built via enc_i(OP_LQ, RA1, RT1, 0) and asserted to equal 0x78A90000. (We use this assertion to lock the encoding even though the actual exec uses lq $t1, 0($v0) with a different base — same opcode shape, different register index.)
  2. Value check: pre-poke phys 0x400..0x40F with 4 distinct patterns (0xAABBCCDD / 0x11112222 / 0x33334444 / 0x55556666) so a buggy implementation reading the wrong lane would fail. Verify $t1 = 0xAABBCCDD (the low 32 of the qword).
  3. LW cross-check: LW at the same EA reads the same value. Confirms LQ is decoded as a "single-beat low-word load" consistent with the existing LW path.
  4. No-modify check: post-halt hierarchical RAM peek confirms all 4 lanes still hold the pre-pokes (LQ doesn't write).

Result: retired=13 halt=1 trap=0 pc=0xbfc00128 errors=0 PASS.
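The encoding assertion in case 1 can be reproduced outside the simulator. The sketch below assumes `enc_i` takes its arguments in (opcode, rs, rt, imm16) order and that `RA1` and `RT1` are register numbers 5 ($a1) and 9 ($t1); the Python function is a hypothetical mirror of the TB helper, not a copy of it.

```python
OP_LQ = 0x1E  # 6'h1E in the RTL
RA1 = 5       # $a1, the base register (assumed value)
RT1 = 9       # $t1, the destination register (assumed value)


def enc_i(op, rs, rt, imm):
    """Pack a MIPS I-type instruction word: op | rs | rt | imm16."""
    return (
        ((op & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


word = enc_i(OP_LQ, RA1, RT1, 0)
print(f"{word:#010x}")
```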

Makefile + regression

  • tb_ee_core_lq target.
  • Added to both regression lists.
  • Regression: 166 → 167.

Recommendation for Codex's Ch280 — PSUBB

PSUBB at PC 0x00112C90, instr 0x712A1248:

  • opcode 0x1C (MMI)
  • funct 0x08 (MMI0 sub-table)
  • sa 0x09 (PSUBB within MMI0)
  • rs=$t1, rt=$t2, rd=$v0
  • psubb $v0, $t1, $t2
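The field breakdown above can be checked mechanically. A minimal sketch using the standard MIPS R-type bit positions; `decode_fields` is a hypothetical helper name.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


fields = decode_fields(0x712A1248)
for name, value in fields.items():
    print(f"{name:6s} = {value:#04x}")
```

Register numbers 9, 10, and 2 correspond to $t1, $t2, and $v0 in the naming this document uses.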

Architectural: rd[7+8i:8i] = rs[7+8i:8i] - rt[7+8i:8i] for i ∈ [0..15], 16 parallel byte subtractions with no carry/borrow between byte lanes.

For our 32-bit model: 4 parallel byte subtractions on the low 32 bits.
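As a reference for TB expected values, the lane-wise semantic can be written as a short software model. This is a sketch under the definition given above; `psubb` with a `lanes` parameter is a hypothetical helper (16 lanes for the architectural 128-bit form, 4 lanes for the 32-bit model).

```python
def psubb(rs, rt, lanes=16):
    """Byte-wise subtract: each lane is (rs_byte - rt_byte) mod 256."""
    result = 0
    for i in range(lanes):
        a = (rs >> (8 * i)) & 0xFF
        b = (rt >> (8 * i)) & 0xFF
        # Masking per lane discards any borrow, so lanes never interact.
        result |= ((a - b) & 0xFF) << (8 * i)
    return result


full = psubb(0x00112233445566778899AABBCCDDEEFF, 0x01 * int("01" * 16, 16))
low32 = psubb(0xCCDDEEFF, 0x01010101, lanes=4)
print(f"{full:#034x}")
print(f"{low32:#010x}")
```

The low 32 bits of the 16-lane result match the 4-lane result on the same low words, which is the property the 32-bit model relies on.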

Implementation outline (mirrors Ch278 PCPYLD's narrow-decode):

  1. localparam FUNC_MMI0 = 6'h08.
  2. localparam MMI0_PSUBB = 5'h09.
  3. is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB).
  4. Add to is_rtype_alu group.
  5. New writeback arm:
    else if (is_psubb) begin
        rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
        rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
        rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
        rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
    end
    
    (Each byte sub is naturally modulo-256, no carry between lanes — that's the SIMD semantic.)
  6. Add !is_psubb to is_nop_class allow-list.

Focused TB:

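  • Identity check: psubb $rd, $rs, $0 → $rd = $rs (each byte minus 0).
  • Lane-isolation check: psubb $rd, $rs, $rt with $rs = 0x10203040, $rt = 0x01010101 → $rd = 0x0F1F2F3F (checks that each byte lane computes its own difference; no lane borrows here, so a plain 32-bit subtract gives the same value).
  • Wrap check: psubb $rd, $rs, $rt with $rs = 0x00010203, $rt = 0x01010101 → $rd = 0xFF000102 (byte 3 wraps modulo 256). Because the wrap is in the top byte, a plain 32-bit subtract also yields 0xFF000102, so this vector alone cannot show that a borrow out of bit 7 stays out of byte 1. A vector that wraps in byte 0 does: for example $rs = 0x03020100, $rt = 0x01010101 → $rd = 0x020100FF, where a 32-bit subtract gives 0x0200FFFF.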
  • Exact qbert encoding assertion against 0x712A1248.
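One way to vet candidate vectors is to compute both the lane-wise result and a plain 32-bit subtract, and keep at least one vector where the two differ. The sketch below does that; `psubb32` and `sub32` are hypothetical helper names, and the last vector (0x03020100) is an illustrative addition chosen so that the wrap happens in byte 0.

```python
def psubb32(rs, rt):
    """Four independent byte subtractions on the low 32 bits."""
    out = 0
    for i in range(4):
        a = (rs >> (8 * i)) & 0xFF
        b = (rt >> (8 * i)) & 0xFF
        out |= ((a - b) & 0xFF) << (8 * i)
    return out


def sub32(rs, rt):
    """Ordinary 32-bit subtract, where borrows cross byte boundaries."""
    return (rs - rt) & 0xFFFFFFFF


vectors = [
    ("identity", 0x10203040, 0x00000000),
    ("lane", 0x10203040, 0x01010101),
    ("wrap-top", 0x00010203, 0x01010101),
    ("wrap-low", 0x03020100, 0x01010101),  # illustrative extra vector
]

for name, rs, rt in vectors:
    lanewise, plain = psubb32(rs, rt), sub32(rs, rt)
    tag = "distinguishes" if lanewise != plain else "same as 32-bit sub"
    print(f"{name:9s} psubb={lanewise:#010x} sub={plain:#010x} {tag}")
```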

Roughly 4 edits, as for PCPYLD.

Likely follow-ons in this byte-walker context: PCEQB (parallel compare equal byte) and PMFHL.LH (parallel move from HI/LO, lower halfwords). The string-walker pattern is:

  1. LQ a chunk of memory.
  2. PSUBB or PCEQB against a sentinel.
  3. PMFHL or some other reduction.
  4. Branch.
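The four steps can be sketched in software to show what the instruction sequence is computing. This is an illustration of the general pattern only, not qbert's actual routine: `find_sentinel` is a hypothetical function, the reduction step is a simple per-lane zero test rather than any specific MMI instruction, and little-endian byte order is assumed.

```python
def psubb(rs, rt, lanes=16):
    """Byte-wise subtract with no borrow between lanes."""
    out = 0
    for i in range(lanes):
        diff = ((rs >> (8 * i)) & 0xFF) - ((rt >> (8 * i)) & 0xFF)
        out |= (diff & 0xFF) << (8 * i)
    return out


def find_sentinel(data, sentinel, chunk=16):
    """Return the index of the first sentinel byte, or -1 if absent."""
    splat = int.from_bytes(bytes([sentinel]) * chunk, "little")
    pad = bytes([sentinel ^ 0xFF])  # padding that can never match
    for base in range(0, len(data), chunk):
        block = data[base:base + chunk].ljust(chunk, pad)
        word = int.from_bytes(block, "little")   # 1. load a chunk
        diff = psubb(word, splat, chunk)         # 2. subtract the sentinel
        for lane in range(chunk):                # 3. reduce: find a zero lane
            if ((diff >> (8 * lane)) & 0xFF) == 0:
                return base + lane               # 4. branch out on a hit
    return -1


print(find_sentinel(b"hello\x00world", 0))
```

A lane of the difference is zero exactly when the data byte equals the sentinel, which is why a byte-wise subtract (or compare) followed by a reduction is enough to locate a terminator.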

Files changed

  • rtl/ee/ee_core_stub.sv — 5 surgical edits.
  • sim/tb/integration/tb_ee_core_lq.sv — new focused TB.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 167/167.

Pattern review

9 qbert chapters. The MMI sub-decode pattern from Ch278 is about to be reused (PSUBB shares the same shape: MMI prefix + funct + sa selector). Anticipated: PSUBB in 4 edits, mirror of PCPYLD.

| Chapter | Blocker | Edits | Pattern |
|---|---|---|---|
| Ch271 SQ | SQ | 5 | NEW 4-beat write |
| Ch272 DADDU | DADDU | 4 | NEW ALU-low-32 |
| Ch273 SYSCALL HLE | SYSCALL #60 | 2 | NEW gated dispatcher |
| Ch274 BEQL | BEQL | 6 | NEW branch+squash |
| Ch275 SD | SD | 7 | REUSE SQ counter |
| Ch276 DSLL | DSLL | 4 | REUSE DADDU |
| Ch277 BNEL | BNEL | 6 | REUSE BEQL squash |
| Ch278 PCPYLD | PCPYLD | 4 | NEW MMI narrow-decode |
| Ch279 LQ | LQ | 5 | REUSE LW path |

The runner-pick-next-blocker loop is producing roughly one chapter in under half a day. The qbert track is on rails.