# Ch280 closeout — PSUBB byte-wise SIMD; next blocker is PNOR

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C94 instr=0x70091CE9)`:
opcode `0x1C` (MMI) + funct `0x29` (MMI3) + sa `0x13` = **PNOR**
(Parallel Not-OR). qbert's byte-walker advanced past PSUBB on the
first try.

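The field split quoted in the verdict can be reproduced mechanically. The following is a minimal stand-alone sketch, assuming the standard MIPS R-type field layout; the module and function names are hypothetical and are not taken from `ee_core_stub.sv`. It slices the trapped word and checks the opcode, funct and sa values named above, plus the register fields.

```sv
// Hypothetical stand-alone check; not part of ee_core_stub.sv.
module mmi_field_decode_sketch;
  // Word reported by elf_first_unsupported_opcode at pc=0x00112C94.
  localparam logic [31:0] TRAP_WORD = 32'h70091CE9;

  // Standard MIPS R-type field slices.
  function automatic logic [5:0] f_opcode(input logic [31:0] w); return w[31:26]; endfunction
  function automatic logic [4:0] f_rs    (input logic [31:0] w); return w[25:21]; endfunction
  function automatic logic [4:0] f_rt    (input logic [31:0] w); return w[20:16]; endfunction
  function automatic logic [4:0] f_rd    (input logic [31:0] w); return w[15:11]; endfunction
  function automatic logic [4:0] f_sa    (input logic [31:0] w); return w[10:6];  endfunction
  function automatic logic [5:0] f_funct (input logic [31:0] w); return w[5:0];   endfunction

  initial begin
    assert (f_opcode(TRAP_WORD) == 6'h1C) else $error("opcode is not MMI");
    assert (f_funct(TRAP_WORD)  == 6'h29) else $error("funct is not MMI3");
    assert (f_sa(TRAP_WORD)     == 5'h13) else $error("sa is not PNOR");
    assert (f_rs(TRAP_WORD)     == 5'd0)  else $error("rs is not $zero");
    assert (f_rt(TRAP_WORD)     == 5'd9)  else $error("rt is not $t1");
    assert (f_rd(TRAP_WORD)     == 5'd3)  else $error("rd is not $v1");
    $display("opcode=%h funct=%h sa=%h rs=%0d rt=%0d rd=%0d",
             f_opcode(TRAP_WORD), f_funct(TRAP_WORD), f_sa(TRAP_WORD),
             f_rs(TRAP_WORD), f_rt(TRAP_WORD), f_rd(TRAP_WORD));
    $finish;
  end
endmodule
```
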
## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| **Post-Ch280 (PSUBB)** | **PNOR at 0x00112C94** | **27,021** |

The delta is a single retire: PSUBB itself retired, and PNOR is the very next instruction (0x00112C90 + 4).

## What landed
### RTL — 5 surgical edits in `ee_core_stub.sv`

1. **Constants**: `FUNC_MMI0 = 6'h08` and `MMI0_PSUBB = 5'h09`.
2. **Decode**: `is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB)`.
   The three-way AND keeps the decode narrow: any other
   op=0x1C/funct=0x08 sub-instruction (PADDW, PADDH, PADDB, ...)
   continues to strict-trap.
3. **`is_rtype_alu` group**: added `is_psubb`.
4. **`rtype_alu_wb` arm**: 4 independent byte subtracts over the
   model's low 32 bits:
   ```sv
   else if (is_psubb) begin
     rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
     rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
     rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
     rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
   end
   ```
   Each lane wraps modulo 256 on its own; no borrow crosses a byte
   boundary.
5. **`is_nop_class` allow**: `!is_psubb` added.

5 LOC of real change.

### Focused TB — `tb_ee_core_psubb.sv`

Three cases:

1. **Distinct lanes (qbert encoding shape)**: `$t1 = 0x10203040`,
   `$t2 = 0x01020304` → `$v0 = 0x0F1E2D3C`. The encoder output is
   asserted to equal `0x712A1248` (qbert's literal instruction).
2. **All-wrap**: `$t3 = 0`, `$t4 = 0x01020304` → `$t5 = 0xFFFEFDFC`.
   Proves that all 4 byte lanes underflow independently (to 0xFF,
   0xFE, 0xFD and 0xFC).
3. **No cross-byte borrow**: `$t6 = 0x12345600`, `$t7 = 0x00000001`
   → `$t8 = 0x123456FF`. The low byte borrows (0x00 - 0x01 =
   0xFF) but **must not propagate into byte 1**. Byte 1 stays
   at 0x56 (= 0x56 - 0x00), whereas a plain 32-bit subtract would
   give 0x123455FF. This is the critical SIMD property.

Result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.

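The three expected values above can be cross-checked against a lane-loop reference model. This is a minimal sketch, not the code in `tb_ee_core_psubb.sv`: the module name and the `psubb32` helper are hypothetical, and it only models the low 32 bits, as the RTL arm does.

```sv
// Hypothetical reference model; not the code in tb_ee_core_psubb.sv.
module psubb_ref_sketch;
  // Byte-lane subtract over 32 bits, written as a loop over the 4 lanes.
  // Each 8-bit difference is truncated to its own lane, so no borrow
  // can reach the neighbouring byte.
  function automatic logic [31:0] psubb32(input logic [31:0] a,
                                          input logic [31:0] b);
    logic [31:0] r;
    for (int i = 0; i < 4; i++)
      r[8*i +: 8] = a[8*i +: 8] - b[8*i +: 8];
    return r;
  endfunction

  initial begin
    // Case 1: distinct lanes.
    assert (psubb32(32'h10203040, 32'h01020304) == 32'h0F1E2D3C)
      else $error("distinct-lanes mismatch");
    // Case 2: every lane wraps.
    assert (psubb32(32'h00000000, 32'h01020304) == 32'hFFFEFDFC)
      else $error("all-wrap mismatch");
    // Case 3: the borrow in byte 0 must stay in byte 0.
    assert (psubb32(32'h12345600, 32'h00000001) == 32'h123456FF)
      else $error("cross-byte borrow leaked");
    // Contrast: a scalar 32-bit subtract does propagate the borrow.
    assert ((32'h12345600 - 32'h00000001) == 32'h123455FF)
      else $error("scalar contrast mismatch");
    $display("psubb32 sketch: all checks evaluated");
    $finish;
  end
endmodule
```

The loop form computes the same result as the four explicit slices in the RTL arm; it is shown here only because it makes the per-lane truncation easy to see.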
### Makefile + regression

- `tb_ee_core_psubb` target.
- Added to both regression lists.
- Regression: 167 → **168**.

## Recommendation for Codex's Ch281 — PNOR

`0x70091CE9` at PC `0x00112C94`:
- opcode 0x1C (MMI)
- funct 0x29 (MMI3 sub-group)
- sa 0x13 (PNOR within MMI3)
- rs=$zero, rt=$t1, rd=$v1
- → `pnor $v1, $0, $t1`

Architectural: 128-bit `rd = ~(rs | rt)`. For our 32-bit model:
`$rd[31:0] = ~($rs[31:0] | $rt[31:0])`, which is **bit-identical to the
existing standard NOR** (SPECIAL funct 0x27). The only difference
between PNOR and NOR is the architectural width.

With `rs = $zero`, PNOR plays the role of the canonical MIPS `not`
pseudo-instruction; in the 32-bit model,
`pnor $rd, $0, $rt` ≡ `not $rd, $rt`.

Implementation outline (mirrors Ch278 PCPYLD + Ch280 PSUBB):

1. `localparam FUNC_MMI3 = 6'h29`.
2. `localparam MMI3_PNOR = 5'h13`.
3. `is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR)`.
4. Add `is_pnor` to `is_rtype_alu`.
5. **Reuse the existing NOR writeback arm**:
   ```sv
   else if (is_nor || is_pnor) rtype_alu_wb = ~(rs_val | rt_val);
   ```
6. Add `!is_pnor` to the `is_nop_class` allow-list.

~4 LOC.

Focused TB:
- Exact qbert encoding asserted == `0x70091CE9`.
- NOT-of-zero: `$rt = 0`, `pnor $rd, $0, $rt` → `$rd = 0xFFFFFFFF`.
- NOT-of-pattern: `$rt = 0xAAAAAAAA`, `pnor $rd, $0, $rt` → `$rd = 0x55555555`.
- General NOR: `$rs = 0xF0F0F0F0`, `$rt = 0x0F0F0F0F`, `pnor $rd, $rs, $rt` → `$rd = 0`.

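The proposed TB vectors can be sanity-checked ahead of the RTL change. The sketch below assumes the standard MIPS R-type packing and the usual register numbering ($zero=0, $v0=2, $v1=3, $t1=9, $t2=10); `mmi_encode` and `pnor32` are hypothetical helpers, not the encoder used by the existing testbenches.

```sv
// Hypothetical pre-check of the Ch281 vectors; helper names are illustrative.
module pnor_ref_sketch;
  // Pack an MMI-class (opcode 0x1C) R-type word.
  function automatic logic [31:0] mmi_encode(input logic [4:0] rs, rt, rd, sa,
                                             input logic [5:0] funct);
    return {6'h1C, rs, rt, rd, sa, funct};
  endfunction

  // 32-bit model of PNOR: identical to the scalar NOR expression.
  function automatic logic [31:0] pnor32(input logic [31:0] a,
                                         input logic [31:0] b);
    return ~(a | b);
  endfunction

  initial begin
    // pnor $v1, $0, $t1 (MMI3 funct 0x29, sa 0x13)
    assert (mmi_encode(5'd0, 5'd9, 5'd3, 5'h13, 6'h29) == 32'h70091CE9)
      else $error("PNOR encoding mismatch");
    // psubb $v0, $t1, $t2 (MMI0 funct 0x08, sa 0x09), from Ch280
    assert (mmi_encode(5'd9, 5'd10, 5'd2, 5'h09, 6'h08) == 32'h712A1248)
      else $error("PSUBB encoding mismatch");

    assert (pnor32(32'h00000000, 32'h00000000) == 32'hFFFFFFFF)
      else $error("NOT-of-zero mismatch");
    assert (pnor32(32'h00000000, 32'hAAAAAAAA) == 32'h55555555)
      else $error("NOT-of-pattern mismatch");
    assert (pnor32(32'hF0F0F0F0, 32'h0F0F0F0F) == 32'h00000000)
      else $error("general NOR mismatch");
    $display("pnor sketch: all checks evaluated");
    $finish;
  end
endmodule
```
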
**Likely follow-ons after PNOR**: byte-walker reductions like
**PMFHL** (move from HI/LO), or another mask op like **PAND**
(MMI2 sa=0x12) / **POR** (MMI3 sa=0x12). Codex may want to
consider folding the bitwise MMI family (PAND/POR/PXOR/PNOR) into
one chapter, since all four reuse existing ALU arms.

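If the bitwise family is folded into one chapter, the decode could take roughly the following shape. This is only a sketch under stated assumptions: the function and constant names are hypothetical, and the MMI2 funct value (0x09) and the PXOR slot (MMI2, sa=0x13) are not stated in this note and should be confirmed against the EE Core instruction set manual before use.

```sv
// Hypothetical folded decode for PAND/PXOR/POR/PNOR; names are illustrative.
module mmi_bitwise_sketch;
  localparam logic [5:0] FUNC_MMI2 = 6'h09;  // assumption, verify in the manual
  localparam logic [5:0] FUNC_MMI3 = 6'h29;  // as used for PNOR in this note
  localparam logic [4:0] SA_12     = 5'h12;  // PAND (MMI2) / POR (MMI3)
  localparam logic [4:0] SA_13     = 5'h13;  // PXOR (MMI2, assumed) / PNOR (MMI3)

  // Returns {hit, result}. hit=0 means the (funct, sa) pair is not a
  // bitwise MMI op and should keep strict-trapping.
  function automatic logic [32:0] mmi_bitwise(input logic [5:0]  func,
                                              input logic [4:0]  sa,
                                              input logic [31:0] a,
                                              input logic [31:0] b);
    case ({func, sa})
      {FUNC_MMI2, SA_12}: return {1'b1, a & b};     // PAND
      {FUNC_MMI2, SA_13}: return {1'b1, a ^ b};     // PXOR
      {FUNC_MMI3, SA_12}: return {1'b1, a | b};     // POR
      {FUNC_MMI3, SA_13}: return {1'b1, ~(a | b)};  // PNOR
      default:            return {1'b0, 32'h0};
    endcase
  endfunction

  initial begin
    assert (mmi_bitwise(FUNC_MMI2, SA_12, 32'hF0F0F0F0, 32'hFF00FF00)
            == {1'b1, 32'hF000F000}) else $error("PAND mismatch");
    assert (mmi_bitwise(FUNC_MMI2, SA_13, 32'hFFFF0000, 32'hFF00FF00)
            == {1'b1, 32'h00FFFF00}) else $error("PXOR mismatch");
    assert (mmi_bitwise(FUNC_MMI3, SA_12, 32'hF0F0F0F0, 32'h0F0F0F0F)
            == {1'b1, 32'hFFFFFFFF}) else $error("POR mismatch");
    assert (mmi_bitwise(FUNC_MMI3, SA_13, 32'h00000000, 32'hAAAAAAAA)
            == {1'b1, 32'h55555555}) else $error("PNOR mismatch");
    // Any other sub-op (here MMI3 sa=0x00) must not be claimed.
    assert (mmi_bitwise(FUNC_MMI3, 5'h00, 32'h0, 32'h0)
            == {1'b0, 32'h0}) else $error("unexpected hit");
    $display("mmi_bitwise sketch: all checks evaluated");
    $finish;
  end
endmodule
```

Keeping the match on the exact (funct, sa) pair preserves the narrow-decode property from Ch278 and Ch280: everything outside the four listed pairs still falls through to the trap path.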
## Files changed

- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_psubb.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **168/168**.

## Pattern review (10 qbert chapters)

| Ch | Blocker | Edits | Pattern |
|----|---------|-------|---------|
| 271 | SQ (first) | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| **280** | **PSUBB** | **5** | **REUSE MMI narrow (byte-SIMD)** |

10 chapters in, qbert is at 27,021 retires and the regression is at 168.
The SIMD byte-walker pattern is locking in: LQ → PSUBB → PNOR
(likely → PMFHL → branch). Each chapter is now ~4-5 LOC plus a
TB; cadence holds at under half a day per chapter.