thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)

RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-29 20:10:50 -04:00

Ch279 closeout — LQ as single-beat low-word load; next blocker is PSUBB (MMI0)

Status: Closed. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00112C90 instr=0x712A1248) — opcode 0x1C (MMI) + funct 0x08 (MMI0 sub-table) + sa 0x09 = PSUBB (Parallel Subtract Byte). qbert ran LQ + one more instruction, then trapped on the byte-wise SIMD subtract that sits at the heart of its stdlib byte-walker.

Numbers

Chapter	Blocker	qbert retire_count
Post-Ch277 (BNEL)	PCPYLD at 0x00112C84	27,017
Post-Ch278 (PCPYLD)	LQ at 0x00112C88	27,018
Post-Ch279 (LQ)	PSUBB at 0x00112C90	27,020

2-retire delta (27,018 → 27,020): LQ plus the instruction at 0x00112C8C (probably another register move) retired before PSUBB trapped. The chain qbert is running here is the canonical SIMD byte-walker: load a 128-bit chunk, do a byte-wise compare/subtract against a sentinel, mask, test.

What landed

RTL — 5 surgical edits in `ee_core_stub.sv`

localparam OP_LQ = 6'h1E alongside OP_LW.
is_lq decode signal.
Alignment: extended is_quad_access = is_sq || is_lq so the existing 16-byte alignment fault ea[3:0] != 0 covers LQ too. A misaligned LQ takes the AdEL path: it is a load, so the existing is_align_store group correctly excludes it and the exception code is AdEL, not AdES.
FSM transition: added || is_lq to the LW/LB/LBU/LH/LHU loads list. The existing S_MEM_REQ → S_MEM_WAIT path handles the 32-bit read; S_MEM_WAIT's default writeback regfile[rt_idx] <= map_rd_data fires for LQ because none of is_lb/lbu/lh/lhu match (the if-else chain falls through to the default LW arm).
!is_lq added to is_nop_class catch-all.

5 surgical edits total. The "reuse LW path" decision keeps the chapter small.
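
The behavior these edits give LQ in the 32-bit stub can be sketched as a small Python reference model. This is an illustration under assumptions, not the RTL: `lw`, `lq_low_word`, `AddressErrorLoad`, and the `mem` bytearray are hypothetical names, memory is assumed little-endian and byte-addressed, and the pattern 0xAABBCCDD is assumed to sit in the lowest word of the quadword.

```python
class AddressErrorLoad(Exception):
    """Stands in for the AdEL exception taken on a misaligned load."""


def lw(mem: bytearray, ea: int) -> int:
    """32-bit little-endian load: the existing path that LQ reuses."""
    return int.from_bytes(mem[ea:ea + 4], "little")


def lq_low_word(mem: bytearray, ea: int) -> int:
    """LQ as modeled in the stub: 16-byte alignment check, then one 32-bit read."""
    if ea & 0xF:  # mirrors the ea[3:0] != 0 fault condition
        raise AddressErrorLoad(hex(ea))
    return lw(mem, ea)  # only the low word of the quadword reaches the register


# Pre-poke one quadword at 0x400 with four distinct word patterns.
mem = bytearray(0x1000)
patterns = (0xAABBCCDD, 0x11112222, 0x33334444, 0x55556666)
for i, word in enumerate(patterns):
    mem[0x400 + 4 * i:0x404 + 4 * i] = word.to_bytes(4, "little")

print(hex(lq_low_word(mem, 0x400)))  # 0xaabbccdd
```

Because `lq_low_word` delegates to `lw` after the alignment check, an aligned LQ and an LW at the same address always agree in this model, which is the property the "reuse LW path" decision relies on.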

Focused TB — `tb_ee_core_lq.sv`

Cases:

Exact qbert encoding shape: lq $t1, 0($a1) built via enc_i(OP_LQ, RA1, RT1, 0) and asserted to equal 0x78A90000. (We use this assertion to lock the encoding even though the actual exec uses lq $t1, 0($v0) with a different base — same opcode shape, different register index.)
Value check: pre-poke phys 0x400..0x40F with 4 distinct patterns (0xAABBCCDD / 0x11112222 / 0x33334444 / 0x55556666) so a buggy implementation reading the wrong lane would fail. Verify $t1 = 0xAABBCCDD (the low 32 of the qword).
LW cross-check: LW at the same EA reads the same value. Confirms LQ is decoded as a "single-beat low-word load" consistent with the existing LW path.
No-modify check: post-halt hierarchical RAM peek confirms all 4 lanes still hold the pre-pokes (LQ doesn't write).

Result: retired=13 halt=1 trap=0 pc=0xbfc00128 errors=0 PASS.
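
The encoding assertion in the first case can be reproduced outside the simulator. The sketch below assumes `enc_i` packs a standard MIPS I-type word in the argument order (opcode, rs/base, rt, imm16); the Python `enc_i` and the `REG` table are illustrative stand-ins, not the testbench's own helper.

```python
OP_LQ = 0x1E

# Standard MIPS register numbers for the registers mentioned in this chapter.
REG = {"v0": 2, "a1": 5, "t1": 9}


def enc_i(opcode: int, rs: int, rt: int, imm16: int) -> int:
    """Pack an I-type instruction: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm16 & 0xFFFF)
    )


# lq $t1, 0($a1): the shape locked by the testbench assertion.
assert enc_i(OP_LQ, REG["a1"], REG["t1"], 0) == 0x78A90000

# lq $t1, 0($v0): same opcode and rt, different base register.
print(hex(enc_i(OP_LQ, REG["v0"], REG["t1"], 0)))  # 0x78490000
```

The two words differ only in bits [25:21], which is why asserting one encoding is enough to lock the opcode decode for both.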

Makefile + regression

tb_ee_core_lq target.
Added to both regression lists.
Regression: 166 → 167.

Recommendation for Codex's Ch280 — PSUBB

PSUBB at PC 0x00112C90, instr 0x712A1248:

opcode 0x1C (MMI)
funct 0x08 (MMI0 sub-table)
sa 0x09 (PSUBB within MMI0)
rs=$t1, rt=$t2, rd=$v0
→ psubb $v0, $t1, $t2
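
The field breakdown above can be checked with a few shifts and masks. This is a minimal sketch using the standard MIPS R-type layout; `decode_rtype` is a hypothetical helper name, not something from the repository.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS word into R-type fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


fields = decode_rtype(0x712A1248)
print({name: hex(value) for name, value in fields.items()})
# {'opcode': '0x1c', 'rs': '0x9', 'rt': '0xa', 'rd': '0x2', 'sa': '0x9', 'funct': '0x8'}
```

Register numbers 9, 10, and 2 are $t1, $t2, and $v0 in the standard MIPS naming, matching `psubb $v0, $t1, $t2`.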

Architectural: rd[7+8i:8i] = rs[7+8i:8i] - rt[7+8i:8i] for i ∈ [0..15], 16 parallel byte subtractions with no carry/borrow between byte lanes.

For our 32-bit model: 4 parallel byte subtractions on the low 32 bits.
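
To see why narrowing to 4 lanes is safe, here is a small Python sketch of the lane-wise subtract with the lane count as a parameter. The function name `psubb` and the operand values are illustrative assumptions; the point is that, because no borrow crosses a lane boundary, the 4-lane result always equals the low 32 bits of the 16-lane result.

```python
def psubb(rs: int, rt: int, lanes: int) -> int:
    """Byte-wise subtract over `lanes` 8-bit lanes, each wrapping modulo 256."""
    out = 0
    for i in range(lanes):
        a = (rs >> (8 * i)) & 0xFF
        b = (rt >> (8 * i)) & 0xFF
        out |= ((a - b) & 0xFF) << (8 * i)  # no borrow leaves the lane
    return out


rs128 = 0x0F0E0D0C0B0A09080706050403020100
rt128 = 0x01010101010101010101010101010101

full = psubb(rs128, rt128, 16)                             # architectural width
low = psubb(rs128 & 0xFFFFFFFF, rt128 & 0xFFFFFFFF, 4)     # 32-bit model

assert full & 0xFFFFFFFF == low
print(f"{low:08X}")  # 020100FF
```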

Implementation outline (mirrors Ch278 PCPYLD's narrow-decode):

localparam FUNC_MMI0 = 6'h08.
localparam MMI0_PSUBB = 5'h09.
is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB).
Add to is_rtype_alu group.

New writeback arm:

else if (is_psubb) begin
    rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
    rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
    rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
    rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
end

(Each byte sub is naturally modulo-256, no carry between lanes — that's the SIMD semantic.)

Add !is_psubb to is_nop_class allow-list.

Focused TB:

Identity check: psubb $rd, $rs, $0 → $rd = $rs (each byte minus 0).
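Lane-arithmetic check: psubb $rd, $rs, $rt with $rs = 0x10203040, $rt = 0x01010101 → $rd = 0x0F1F2F3F (each byte subtracts correctly; no lane borrows here, so a plain 32-bit subtract gives the same value and this case alone does not prove lane isolation).
Wrap check: psubb $rd, 0x00010203, 0x01010101 → $rd = 0xFF000102 (byte 3 wraps 0x00 - 0x01 to 0xFF, modulo 256). The underflow is in the top lane, so a plain 32-bit subtract also yields 0xFF000102.
Borrow-isolation check: psubb $rd, 0x03020100, 0x01010101 → $rd = 0x020100FF, where a plain 32-bit subtract would give 0x0200FFFF (proves the borrow out of byte 0 does not reach byte 1).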
Exact qbert encoding assertion against 0x712A1248.
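
When choosing vectors for this TB, it helps to compare the lane-wise result against a plain 32-bit subtract, since a vector only catches a "forgot to split the lanes" bug if the two differ. The sketch below is a hypothetical golden model (`psubb32` and `sub32` are illustrative names); the third vector is an added example in which a low lane underflows.

```python
def psubb32(rs: int, rt: int) -> int:
    """Four independent byte subtractions on a 32-bit value."""
    out = 0
    for shift in (0, 8, 16, 24):
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        out |= ((a - b) & 0xFF) << shift
    return out


def sub32(rs: int, rt: int) -> int:
    """Plain 32-bit subtract: what a buggy, non-SIMD datapath would compute."""
    return (rs - rt) & 0xFFFFFFFF


vectors = [
    (0x10203040, 0x01010101),  # no lane underflows
    (0x00010203, 0x01010101),  # only the top lane underflows
    (0x03020100, 0x01010101),  # lowest lane underflows
]

for rs, rt in vectors:
    p, s = psubb32(rs, rt), sub32(rs, rt)
    verdict = "distinguishes" if p != s else "same as plain sub"
    print(f"{rs:08X} - {rt:08X}: psubb={p:08X} sub32={s:08X} ({verdict})")
# 10203040 - 01010101: psubb=0F1F2F3F sub32=0F1F2F3F (same as plain sub)
# 00010203 - 01010101: psubb=FF000102 sub32=FF000102 (same as plain sub)
# 03020100 - 01010101: psubb=020100FF sub32=0200FFFF (distinguishes)
```

Only a vector in which some lane below the top one underflows separates the two datapaths, so at least one such vector belongs in the focused TB.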

~4 edits.

Likely follow-ons in this byte-walker context: PCEQB (parallel compare for equal byte) and PMFHL.LH (parallel move from HI/LO, lower halfwords). The string-walker pattern is:

LQ a chunk of memory.
PSUBB or PCEQB against a sentinel.
PMFHL or some other reduction.
Branch.
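
The four steps above can be sketched in Python to show the shape of the loop. This is a hypothetical illustration of the general pattern, not a reconstruction of qbert's code: `pceqb` models the per-byte compare (0xFF in each equal lane, 0x00 otherwise), `find_sentinel` is an invented name, the buffer length is assumed to be a multiple of 16, and the reduction step is done with ordinary integer operations rather than PMFHL.

```python
def pceqb(rs: int, rt: int, lanes: int = 16) -> int:
    """Per-byte equality: a lane becomes 0xFF where rs and rt match, else 0x00."""
    out = 0
    for i in range(lanes):
        if (rs >> (8 * i)) & 0xFF == (rt >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out


def find_sentinel(buf: bytes, sentinel: int = 0) -> int:
    """Return the index of the first sentinel byte, or -1 if none is found.

    Assumes len(buf) is a multiple of 16 (one quadword per iteration).
    """
    splat = int.from_bytes(bytes([sentinel]) * 16, "little")
    for base in range(0, len(buf), 16):
        chunk = int.from_bytes(buf[base:base + 16], "little")  # 1. LQ a chunk
        mask = pceqb(chunk, splat)                             # 2. compare lanes
        if mask:                                               # 3-4. reduce, branch
            lowest_bit = (mask & -mask).bit_length() - 1
            return base + lowest_bit // 8
    return -1


text = b"hello, qbert!".ljust(32, b"\x00")
print(find_sentinel(text))  # 13
```

The appeal of the pattern is that the loop tests 16 bytes per iteration and only falls back to per-byte work once a chunk is known to contain the sentinel.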

Files changed

rtl/ee/ee_core_stub.sv — 5 surgical edits.
sim/tb/integration/tb_ee_core_lq.sv — new focused TB.
sim/Makefile — target + both regression lists.

Regression

In flight; expected 167/167.

Pattern review

9 qbert chapters. The MMI sub-decode pattern from Ch278 is about to be reused (PSUBB shares the same shape: MMI prefix + funct + sa selector). Anticipated: PSUBB in 4 edits, mirror of PCPYLD.

Chapter	Blocker	Edits	Pattern
Ch271 SQ	SQ	5	NEW 4-beat write
Ch272 DADDU	DADDU	4	NEW ALU-low-32
Ch273 SYSCALL HLE	SYSCALL #60	2	NEW gated dispatcher
Ch274 BEQL	BEQL	6	NEW branch+squash
Ch275 SD	SD	7	REUSE SQ counter
Ch276 DSLL	DSLL	4	REUSE DADDU
Ch277 BNEL	BNEL	6	REUSE BEQL squash
Ch278 PCPYLD	PCPYLD	4	NEW MMI narrow-decode
Ch279 LQ	LQ	5	REUSE LW path

The runner-pick-next-blocker loop is producing one chapter in under half a day. The qbert track is on rails.
