retroDE_ps2/docs/ch280_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch280 closeout — PSUBB byte-wise SIMD; next blocker is PNOR
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C94 instr=0x70091CE9)`, which decodes as
opcode `0x1C` (MMI) + funct `0x29` (MMI3) + sa `0x13` = **PNOR**
(Parallel NOR). qbert's byte-walker got past PSUBB on the first try.
## Numbers
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| **Post-Ch280 (PSUBB)** | **PNOR at 0x00112C94** | **27,021** |

1-retire delta: PSUBB itself retired, and PNOR is the next instruction.
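
A quick consistency check on the table, as a minimal Python sketch (Python is only the illustration language here, not part of the repo's tooling). The PC gap between successive blockers should equal the retire delta, assuming straight-line code between them (no taken branches, an assumption the note doesn't state) and 4-byte instructions.

```python
# Blocker PCs and retire counts, copied from the table above.
ROWS = [
    ("Post-Ch278", 0x00112C88, 27_018),
    ("Post-Ch279", 0x00112C90, 27_020),
    ("Post-Ch280", 0x00112C94, 27_021),
]

def pc_vs_retire_deltas(rows):
    """Pair each step's instruction count (from the PC gap) with its retire delta.

    Assumes straight-line code: every retired instruction advances the PC by 4.
    """
    out = []
    for (_, pc_a, n_a), (name, pc_b, n_b) in zip(rows, rows[1:]):
        out.append((name, (pc_b - pc_a) // 4, n_b - n_a))
    return out

for name, by_pc, by_retire in pc_vs_retire_deltas(ROWS):
    print(f"{name}: {by_pc} instr by PC gap, {by_retire} by retire count")
```

Both steps agree: LQ plus the instruction at `0x00112C8C` account for Ch279's 2 retires, and PSUBB alone for Ch280's 1.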
## What landed
### RTL — 5 surgical edits in `ee_core_stub.sv`
1. **Constants**: `FUNC_MMI0 = 6'h08` and `MMI0_PSUBB = 5'h09`.
2. **Decode**: `is_psubb = is_mmi && (func == FUNC_MMI0) &&
(shamt == MMI0_PSUBB)`. Three-way AND keeps the decode narrow
— any other op=0x1C/funct=0x08 sub-instruction (PADDW, PADDH,
PADDB, ...) continues to strict-trap.
3. **`is_rtype_alu` group**: added `is_psubb`.
4. **`rtype_alu_wb` arm**: 4 independent byte subtracts:
   ```sv
   else if (is_psubb) begin
     // 32-bit model: 4 independent 8-bit lanes (the low 4 of PSUBB's 16).
     // Each 8-bit result is truncated mod 256, so a lane's borrow is dropped.
     rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
     rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
     rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
     rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
   end
   ```
   Each lane is naturally modulo-256, so no borrow crosses a byte boundary.
5. **`is_nop_class` allow**: `!is_psubb` added.

Five small edits in total; the writeback arm is the largest.
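
For cross-checking the RTL, here is a minimal Python golden-model sketch of what the five edits do. `decode_is_psubb` and `psubb32` are hypothetical names, not functions in the repo. The sketch mirrors the three-way decode AND and the four mod-256 byte lanes of the 32-bit model.

```python
OP_MMI = 0x1C       # primary opcode for the EE MMI group
FUNC_MMI0 = 0x08    # funct selecting the MMI0 sub-group
MMI0_PSUBB = 0x09   # sa selecting PSUBB within MMI0

def decode_is_psubb(word: int) -> bool:
    """Narrow decode: opcode, funct and sa must all match (other MMI0 ops trap)."""
    opcode = (word >> 26) & 0x3F
    sa = (word >> 6) & 0x1F
    funct = word & 0x3F
    return opcode == OP_MMI and funct == FUNC_MMI0 and sa == MMI0_PSUBB

def psubb32(rs: int, rt: int) -> int:
    """PSUBB restricted to a 32-bit model: 4 byte lanes, no cross-lane borrow."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        result |= ((a - b) & 0xFF) << shift  # wraps mod 256; borrow discarded
    return result

print(decode_is_psubb(0x712A1248))           # True: qbert's PSUBB word
print(decode_is_psubb(0x712A1208))           # False: same fields, sa=0x08 (PADDB)
print(hex(psubb32(0x12345600, 0x00000001)))  # 0x123456ff
```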
### Focused TB — `tb_ee_core_psubb.sv`
Three cases:
1. **Distinct lanes (qbert encoding shape)**: `$t1 = 0x10203040`,
`$t2 = 0x01020304` → `$v0 = 0x0F1E2D3C`. The encoder output is
asserted to equal `0x712A1248` (qbert's literal `psubb $v0, $t1, $t2`).
2. **All-wrap**: `$t3 = 0`, `$t4 = 0x01020304` → `$t5 = 0xFFFEFDFC`.
Proves all 4 byte lanes underflow independently to 0xFx.
3. **No cross-byte borrow**: `$t6 = 0x12345600`, `$t7 = 0x00000001`
→ `$t8 = 0x123456FF`. The low byte borrows (0x00 - 0x01 =
0xFF) but **must not propagate into byte 1**, which stays at 0x56
(= 0x56 - 0x00). A plain 32-bit subtract would give `0x123455FF`
instead. This is the critical SIMD property.

Result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.
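
The TB's encoding assertion can be reproduced with a hypothetical Python helper (`encode_mmi` is illustrative, not the TB's actual encoder). It packs the standard R-type fields opcode[31:26], rs[25:21], rt[20:16], rd[15:11], sa[10:6], funct[5:0], using o32-style register numbers to match the names in this note. The same helper also rebuilds Ch281's PNOR word.

```python
OP_MMI = 0x1C
REG = {"$zero": 0, "$v0": 2, "$v1": 3, "$t1": 9, "$t2": 10}  # o32-style numbering

def encode_mmi(rd: str, rs: str, rt: str, funct: int, sa: int) -> int:
    """Pack an MMI R-type word: opcode | rs | rt | rd | sa | funct."""
    return ((OP_MMI << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (sa << 6) | funct)

psubb = encode_mmi("$v0", "$t1", "$t2", funct=0x08, sa=0x09)   # psubb $v0, $t1, $t2
pnor = encode_mmi("$v1", "$zero", "$t1", funct=0x29, sa=0x13)  # pnor  $v1, $0, $t1
print(hex(psubb), hex(pnor))  # 0x712a1248 0x70091ce9
```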
### Makefile + regression
- `tb_ee_core_psubb` target.
- Added to both regression lists.
- Regression: 167 → **168**.
## Recommendation for Codex's Ch281 — PNOR
`0x70091CE9` at PC `0x00112C94`:
- opcode 0x1C (MMI)
- funct 0x29 (MMI3 sub-group)
- sa 0x13 (PNOR within MMI3)
- rs=$zero, rt=$t1, rd=$v1
- → `pnor $v1, $0, $t1`
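
Going the other way, a small Python disassembler sketch recovers the mnemonic and operands from the raw word. `disassemble` and `MMI_TABLE` are hypothetical; the table lists only the two MMI entries this note covers, and register names follow the o32 convention the note uses.

```python
# o32-style register names, matching the naming used in this note.
REG_NAMES = [
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra",
]

# (funct, sa) -> mnemonic under opcode 0x1C; only the entries this note names.
MMI_TABLE = {(0x08, 0x09): "psubb", (0x29, 0x13): "pnor"}

def disassemble(word: int) -> str:
    """Decode an MMI R-type word into 'mnemonic rd, rs, rt'."""
    if ((word >> 26) & 0x3F) != 0x1C:
        return "not-mmi"
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    name = MMI_TABLE.get((word & 0x3F, (word >> 6) & 0x1F))
    if name is None:
        return "unsupported"  # the core would strict-trap here
    return f"{name} {REG_NAMES[rd]}, {REG_NAMES[rs]}, {REG_NAMES[rt]}"

print(disassemble(0x70091CE9))  # pnor $v1, $zero, $t1
print(disassemble(0x712A1248))  # psubb $v0, $t1, $t2
```
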
Architectural: 128-bit `rd = ~(rs | rt)`. For our 32-bit model:
`$rd[31:0] = ~($rs[31:0] | $rt[31:0])` — **bit-identical to the
existing standard NOR** (SPECIAL funct 0x27). The only difference
between PNOR and NOR is the architectural width.
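With `rs = $zero`, PNOR reduces to a bitwise NOT, the MMI counterpart of the
standard MIPS `not` pseudo-instruction (which assembles to `nor $rd, $rt, $zero`).
In our 32-bit model, `pnor $rd, $0, $rt` ≡ `not $rd, $rt`.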
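
The width argument can be checked numerically. This minimal Python sketch (hypothetical helper names) models PNOR at its architectural 128 bits and NOR truncated to the 32-bit model, then confirms the low 32 bits agree on random inputs and that `rs = 0` yields a bitwise NOT.

```python
import random

MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1

def nor32(rs: int, rt: int) -> int:
    """Standard NOR (SPECIAL funct 0x27) as seen by the 32-bit model."""
    return ~(rs | rt) & MASK32

def pnor128(rs: int, rt: int) -> int:
    """Architectural PNOR: a single 128-bit bitwise NOR."""
    return ~(rs | rt) & MASK128

rng = random.Random(280)
for _ in range(1000):
    rs, rt = rng.getrandbits(128), rng.getrandbits(128)
    # Bitwise ops never mix bit positions, so the low 32 result bits
    # depend only on the low 32 input bits.
    assert pnor128(rs, rt) & MASK32 == nor32(rs & MASK32, rt & MASK32)

# rs = $zero turns NOR into NOT.
print(hex(nor32(0, 0xAAAAAAAA)))  # 0x55555555
```
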
Implementation outline (mirrors Ch278 PCPYLD + Ch280 PSUBB):
1. `localparam FUNC_MMI3 = 6'h29`.
2. `localparam MMI3_PNOR = 5'h13`.
3. `is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR)`.
4. Add to `is_rtype_alu`.
5. **Reuse the existing NOR writeback arm**:
   ```sv
   // 32-bit model: PNOR's result is bit-identical to a standard NOR.
   else if (is_nor || is_pnor) rtype_alu_wb = ~(rs_val | rt_val);
   ```
6. Add `!is_pnor` to `is_nop_class` allow-list.
~4 LOC.
Focused TB:
- Exact qbert encoding asserted == `0x70091CE9`.
- NOT-of-zero: `pnor $rd, $0, $0` → `$rd = 0xFFFFFFFF`.
- NOT-of-pattern: `pnor $rd, $0, $rt` with `$rt = 0xAAAAAAAA` → `$rd = 0x55555555`.
- General NOR: `pnor $rd, $rs, $rt` with `$rs = 0xF0F0F0F0`, `$rt = 0x0F0F0F0F` → `$rd = 0`.
**Likely follow-ons after PNOR**: **PMFHL** (parallel move from HI/LO), or
another mask op such as **PAND** (MMI2 sa=0x12) or **POR** (MMI3 sa=0x12).
Codex may want to fold the bitwise MMI family (PAND/POR/PXOR/PNOR) into one
chapter, since each is a reuse of an existing ALU arm (AND/OR/XOR/NOR).
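
One way to picture the fold-in: a single table mapping each bitwise MMI (funct, sa) pair onto an ALU function the core already has. This is a Python sketch, assuming the EE MMI encodings MMI2 = funct 0x09 and MMI3 = funct 0x29, with PXOR at MMI2 sa=0x13; `BITWISE_MMI` and `exec_bitwise_mmi` are hypothetical names, not RTL signals.

```python
import operator

MASK32 = (1 << 32) - 1

# Existing 32-bit ALU ops (standard SPECIAL-group AND/OR/XOR/NOR).
ALU = {
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "nor": lambda a, b: ~(a | b) & MASK32,
}

# (MMI sub-group funct, sa) -> (mnemonic, existing ALU arm it reuses).
BITWISE_MMI = {
    (0x09, 0x12): ("PAND", "and"),  # MMI2
    (0x09, 0x13): ("PXOR", "xor"),  # MMI2
    (0x29, 0x12): ("POR", "or"),    # MMI3
    (0x29, 0x13): ("PNOR", "nor"),  # MMI3
}

def exec_bitwise_mmi(funct: int, sa: int, rs: int, rt: int) -> int:
    """Dispatch a bitwise MMI op to its shared arm; anything else strict-traps."""
    entry = BITWISE_MMI.get((funct, sa))
    if entry is None:
        raise ValueError(f"strict trap: funct={funct:#04x} sa={sa:#04x}")
    _name, arm = entry
    return ALU[arm](rs & MASK32, rt & MASK32)

print(hex(exec_bitwise_mmi(0x09, 0x12, 0xFF00FF00, 0x0F0F0F0F)))  # PAND -> 0xf000f00
```

In this sketch each additional op is a table entry rather than new datapath, which is the argument for batching them.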
## Files changed
- `rtl/ee/ee_core_stub.sv` — 5 surgical edits.
- `sim/tb/integration/tb_ee_core_psubb.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Regression
In flight; expected **168/168**.
## Pattern review (10 qbert chapters)
| Ch | Instruction | Edits | Pattern |
|----|-------------|-------|---------|
| 271 | SQ | 5 | NEW 4-beat write (first qbert chapter) |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| **280** | **PSUBB** | **5** | **REUSE MMI narrow (byte-SIMD)** |

10 chapters in: qbert is at 27,021 retires and the regression at 168.
The SIMD byte-walker pattern is settling in: LQ → PSUBB → PNOR
(likely → PMFHL → branch). Each chapter is now ~4-5 small edits plus a
TB, and the cadence is holding at under half a day per chapter.