retroDE_ps2/docs/ch279_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch279 closeout — LQ as single-beat low-word load; next blocker is PSUBB (MMI0)
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C90 instr=0x712A1248)`
opcode `0x1C` (MMI) + funct `0x08` (MMI0 sub-table) + sa `0x09`
= **PSUBB** (Parallel Subtract Byte). qbert ran LQ + one more
instruction, then trapped on the byte-wise SIMD subtract that
sits at the heart of its stdlib byte-walker.
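
The field breakdown above can be reproduced with a few shifts and masks. The following is a minimal Python sketch, assuming the standard MIPS R-type field layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, sa 10:6, funct 5:0). `decode_rtype` is a hypothetical helper name used only for illustration; it is not part of the repo.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs":     (instr >> 21) & 0x1F,  # bits 25:21
        "rt":     (instr >> 16) & 0x1F,  # bits 20:16
        "rd":     (instr >> 11) & 0x1F,  # bits 15:11
        "sa":     (instr >> 6) & 0x1F,   # bits 10:6
        "funct":  instr & 0x3F,          # bits 5:0
    }

# The trapping word reported by the runner.
fields = decode_rtype(0x712A1248)
print({name: hex(value) for name, value in fields.items()})
```

For `0x712A1248` this gives opcode 0x1C, funct 0x08 and sa 0x09, with rs=9 ($t1), rt=10 ($t2) and rd=2 ($v0).
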
## Numbers
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch277 (BNEL) | PCPYLD at 0x00112C84 | 27,017 |
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| **Post-Ch279 (LQ)** | **PSUBB at 0x00112C90** | **27,020** |
2-retire delta: LQ at 0x00112C88 plus the instruction at
0x00112C8C (probably another register move) retire before PSUBB
traps. The chain qbert is running here is the canonical SIMD
byte-walker: load a 128-bit chunk, do a byte-wise
compare/subtract against a sentinel, mask, test.
## What landed
### RTL — 5 surgical edits in `ee_core_stub.sv`
1. `localparam OP_LQ = 6'h1E` alongside `OP_LW`.
2. `is_lq` decode signal.
3. **Alignment**: extended `is_quad_access = is_sq || is_lq`
so the existing 16-byte alignment fault `ea[3:0] != 0` covers
LQ too. Misaligned LQ trips the AdEL path: LQ is a load, so
the existing `is_align_store` group correctly excludes it and
the exception code is AdEL, not AdES.
4. **FSM transition**: added `|| is_lq` to the LW/LB/LBU/LH/LHU
loads list. The existing `S_MEM_REQ → S_MEM_WAIT` path
handles the 32-bit read; `S_MEM_WAIT`'s default writeback
`regfile[rt_idx] <= map_rd_data` fires for LQ because none
of is_lb/lbu/lh/lhu match (the if-else chain falls through
to the default LW arm).
5. `!is_lq` added to `is_nop_class` catch-all.
5 surgical edits total. The "reuse LW path" decision keeps the
chapter small.
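
The behaviour these edits give LQ can be summarised in a small software model. This is a sketch only, assuming a flat little-endian byte array for RAM. `lq_low_word` and `AddressErrorLoad` are hypothetical names standing in for the RTL's load path and AdEL exception; neither exists in the repo. The four data words are the same patterns the focused TB pokes at 0x400.

```python
import struct

class AddressErrorLoad(Exception):
    """Stands in for the AdEL exception path (hypothetical name)."""

def lq_low_word(ram: bytearray, ea: int) -> int:
    """Model LQ as a single-beat load of the low 32 bits of the qword."""
    if ea & 0xF:  # same condition as the RTL's ea[3:0] != 0
        raise AddressErrorLoad(hex(ea))
    # Little-endian: the low word of the qword sits at the lowest address.
    return struct.unpack_from("<I", ram, ea)[0]

ram = bytearray(0x1000)
struct.pack_into("<4I", ram, 0x400,
                 0xAABBCCDD, 0x11112222, 0x33334444, 0x55556666)
t1 = lq_low_word(ram, 0x400)
print(hex(t1))
```

Only the word at the base address reaches the register; the upper 96 bits of the quadword are never read in this model.
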
### Focused TB — `tb_ee_core_lq.sv`
Cases:
1. **Exact qbert encoding shape**: `lq $t1, 0($a1)` built via
`enc_i(OP_LQ, RA1, RT1, 0)` and asserted to equal
`0x78A90000`. (We use this assertion to lock the encoding
even though the actual exec uses `lq $t1, 0($v0)` with a
different base — same opcode shape, different register
index.)
2. **Value check**: pre-poke phys 0x400..0x40F with 4 distinct
patterns (`0xAABBCCDD / 0x11112222 / 0x33334444 / 0x55556666`)
so a buggy implementation reading the wrong lane would fail.
Verify `$t1 = 0xAABBCCDD` (the low 32 of the qword).
3. **LW cross-check**: LW at the same EA reads the same value.
Confirms LQ is decoded as a "single-beat low-word load"
consistent with the existing LW path.
4. **No-modify check**: post-halt hierarchical RAM peek
confirms all 4 lanes still hold the pre-pokes (LQ doesn't
write).
Result: `retired=13 halt=1 trap=0 pc=0xbfc00128 errors=0 PASS`.
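
The encoding assertion in case 1 can be cross-checked outside the simulator. The sketch below assumes the TB's `enc_i` takes its arguments in the order (opcode, rs, rt, immediate), as the call `enc_i(OP_LQ, RA1, RT1, 0)` suggests. The Python function is an illustrative stand-in, not the TB helper itself, and the register constants are plain MIPS register numbers.

```python
OP_LQ = 0x1E
A1, T1, V0 = 5, 9, 2  # MIPS register numbers for $a1, $t1, $v0

def enc_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type word: op[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (((op & 0x3F) << 26) | ((rs & 0x1F) << 21)
            | ((rt & 0x1F) << 16) | (imm & 0xFFFF))

tb_word = enc_i(OP_LQ, A1, T1, 0)     # lq $t1, 0($a1), the TB's form
qbert_word = enc_i(OP_LQ, V0, T1, 0)  # lq $t1, 0($v0), the form qbert executes
print(f"{tb_word:#010x} {qbert_word:#010x}")
```

The two words differ only in the rs field (bits 25:21), which is why asserting on the `$a1` form still locks the opcode and rt placement.
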
### Makefile + regression
- `tb_ee_core_lq` target.
- Added to both regression lists.
- Regression: 166 → **167**.
## Recommendation for Codex's Ch280 — PSUBB
PSUBB at PC `0x00112C90`, instr `0x712A1248`:
- opcode 0x1C (MMI)
- funct 0x08 (MMI0 sub-table)
- sa 0x09 (PSUBB within MMI0)
- rs=$t1, rt=$t2, rd=$v0
- `psubb $v0, $t1, $t2`
Architectural: `rd[7+8i:8i] = rs[7+8i:8i] - rt[7+8i:8i]` for
i ∈ [0..15], 16 parallel byte subtractions with no carry/borrow
between byte lanes.
For our 32-bit model: 4 parallel byte subtractions on the low
32 bits.
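
A small reference model makes the lane semantics concrete. This is a sketch under the definition above; `psubb` here is a hypothetical Python function, and the operands are illustrative values chosen so that a low byte borrows. They are not taken from qbert.

```python
def psubb(rs: int, rt: int, lanes: int = 16) -> int:
    """Byte-wise subtract: each lane wraps modulo 256, no borrow crosses lanes."""
    out = 0
    for i in range(lanes):
        a = (rs >> (8 * i)) & 0xFF
        b = (rt >> (8 * i)) & 0xFF
        out |= ((a - b) & 0xFF) << (8 * i)
    return out

# Illustrative operands: bytes 0 and 2 compute 0x00 - 0x01 and must wrap.
rs, rt = 0x01000100, 0x00010001
narrow = psubb(rs, rt, lanes=4)  # the 32-bit model: 4 lanes
plain = (rs - rt) & 0xFFFFFFFF   # ordinary 32-bit subtract, for contrast
print(f"{narrow:#010x} {plain:#010x}")
```

Here the lane-wise result is `0x01FF01FF` while an ordinary 32-bit subtract gives `0x00FF00FF`; the difference is the borrow that PSUBB does not propagate.
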
Implementation outline (mirrors Ch278 PCPYLD's narrow-decode):
1. `localparam FUNC_MMI0 = 6'h08`.
2. `localparam MMI0_PSUBB = 5'h09`.
3. `is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB)`.
4. Add to `is_rtype_alu` group.
5. New writeback arm:
```sv
else if (is_psubb) begin
rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
end
```
(Each byte sub is naturally modulo-256, no carry between
lanes — that's the SIMD semantic.)
6. Add `!is_psubb` to `is_nop_class` allow-list.
Focused TB:
- Identity check: `psubb $rd, $rs, $0` → `$rd = $rs` (each byte
minus 0).
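- Lane-isolation check: `psubb $rd, $rs, $rt` with `$rs =
0x10203040`, `$rt = 0x01010101` → `$rd = 0x0F1F2F3F` (checks
that each lane uses its own operand bytes; no lane borrows here,
so a plain 32-bit subtract would give the same result).
- Wrap check: `psubb $rd, $rs, $rt` with `$rs = 0x03020100`,
`$rt = 0x01010101` → `$rd = 0x020100FF` (byte 0 wraps to 0xFF
and its borrow does not reach byte 1; a plain 32-bit subtract
would give `0x0200FFFF`).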
- Exact qbert encoding assertion against `0x712A1248`.
~4 LOC change.
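
The exact-encoding assertion and the narrow decode from step 3 can be checked against each other in software. The sketch below assumes the standard MIPS R-type layout. `enc_mmi` and the Python `is_psubb` are hypothetical helpers that mirror the outline; they are not code from the repo.

```python
OP_MMI, FUNC_MMI0, MMI0_PSUBB = 0x1C, 0x08, 0x09
V0, T1, T2 = 2, 9, 10  # MIPS register numbers for $v0, $t1, $t2

def enc_mmi(rs: int, rt: int, rd: int, sa: int, funct: int) -> int:
    """Pack an MMI word: opcode 0x1C, then rs, rt, rd, sa, funct."""
    return ((OP_MMI << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (sa << 6) | funct)

def is_psubb(instr: int) -> bool:
    """Narrow decode: MMI opcode, MMI0 funct, PSUBB sa selector."""
    return (((instr >> 26) & 0x3F) == OP_MMI
            and (instr & 0x3F) == FUNC_MMI0
            and ((instr >> 6) & 0x1F) == MMI0_PSUBB)

# psubb $v0, $t1, $t2
word = enc_mmi(rs=T1, rt=T2, rd=V0, sa=MMI0_PSUBB, funct=FUNC_MMI0)
print(f"{word:#010x}", is_psubb(word))
```

The packed word matches the trapping instruction `0x712A1248`, and changing only the sa field makes the predicate false, which is the property the narrow decode relies on.
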
**Likely follow-ons** in this byte-walker context: **PCEQB**
(parallel compare for equal byte) and **PMFHL.LH** (parallel
move from HI/LO, lower halfwords). The string-walker pattern is:
1. LQ a chunk of memory.
2. PSUBB or PCEQB against a sentinel.
3. PMFHL or some other reduction.
4. Branch.
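
The four steps can be illustrated with a plain software loop. This sketch only shows the shape of the pattern; how qbert's routine actually reduces and tests the lanes is not established here. `find_sentinel` is a hypothetical name, and a NUL sentinel is an assumption.

```python
def find_sentinel(buf: bytes, sentinel: int = 0) -> int:
    """Return the index of the first sentinel byte, or -1, 16 bytes per step."""
    for base in range(0, len(buf), 16):
        chunk = buf[base:base + 16]                       # 1. LQ-style chunk load
        diff = [(b - sentinel) & 0xFF for b in chunk]     # 2. PSUBB-style lane subtract
        hits = [i for i, d in enumerate(diff) if d == 0]  # 3. reduce: lanes that matched
        if hits:                                          # 4. branch on the reduced result
            return base + hits[0]
    return -1

print(find_sentinel(b"hello, qbert walker!\x00tail"))
```

A lane of the difference is zero exactly when that byte equals the sentinel, which is why a byte-wise subtract (or compare) followed by a reduction is enough to drive the branch.
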
## Files changed
- `rtl/ee/ee_core_stub.sv` — 5 surgical edits.
- `sim/tb/integration/tb_ee_core_lq.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Regression
In flight; expected **167/167**.
## Pattern review
9 qbert chapters. The MMI sub-decode pattern from Ch278 is
about to be reused (PSUBB shares the same shape: MMI prefix,
funct, and sa selector). Anticipated: PSUBB in 4 edits, mirror
of PCPYLD.
| Chapter | Blocker | Edits | Pattern |
|---------|---------|-------|---------|
| Ch271 SQ | SQ | 5 | NEW 4-beat write |
| Ch272 DADDU | DADDU | 4 | NEW ALU-low-32 |
| Ch273 SYSCALL HLE | SYSCALL #60 | 2 | NEW gated dispatcher |
| Ch274 BEQL | BEQL | 6 | NEW branch+squash |
| Ch275 SD | SD | 7 | REUSE SQ counter |
| Ch276 DSLL | DSLL | 4 | REUSE DADDU |
| Ch277 BNEL | BNEL | 6 | REUSE BEQL squash |
| Ch278 PCPYLD | PCPYLD | 4 | NEW MMI narrow-decode |
| **Ch279 LQ** | **LQ** | **5** | **REUSE LW path** |
The runner-pick-next-blocker loop is producing a chapter in
under half a day each. The qbert track is on rails.