# Ch279 closeout: LQ as single-beat low-word load; next blocker is PSUBB (MMI0)

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C90 instr=0x712A1248)`:
opcode `0x1C` (MMI) + funct `0x08` (MMI0 sub-table) + sa `0x09`
= **PSUBB** (Parallel Subtract Byte). qbert ran LQ + one more
instruction, then trapped on the byte-wise SIMD subtract that
sits at the heart of its stdlib byte-walker.

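The field breakdown of the trapping word can be re-derived by hand. The following is a minimal Python sketch, not part of the repo; the helper name `decode_rtype` is hypothetical, and it assumes the standard MIPS R-type bit layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, sa 10:6, funct 5:0).

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields (hypothetical helper)."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


fields = decode_rtype(0x712A1248)
# opcode 0x1c (MMI), funct 0x8 (MMI0), sa 0x9 (PSUBB); rs=9, rt=10, rd=2
print({name: hex(value) for name, value in fields.items()})
```

Register numbers 9, 10 and 2 are `$t1`, `$t2` and `$v0` in the usual MIPS naming, which matches `psubb $v0, $t1, $t2`.
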
## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch277 (BNEL) | PCPYLD at 0x00112C84 | 27,017 |
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| **Post-Ch279 (LQ)** | **PSUBB at 0x00112C90** | **27,020** |

2-retire delta: LQ plus the instruction at 0x00112C8C (probably
another register move) retire before the PSUBB trap. The chain qbert
is running here is the canonical SIMD byte-walker: load a 128-bit
chunk, do a byte-wise compare/subtract against a sentinel, mask, test.

## What landed

### RTL: 5 surgical edits in `ee_core_stub.sv`

1. `localparam OP_LQ = 6'h1E` alongside `OP_LW`.
2. `is_lq` decode signal.
3. **Alignment**: extended `is_quad_access = is_sq || is_lq`
   so the existing 16-byte alignment fault `ea[3:0] != 0` covers
   LQ too. A misaligned LQ trips the AdEL path: it's a load, so
   the existing `is_align_store` group correctly doesn't include
   it, and the exception code is AdEL, not AdES.
4. **FSM transition**: added `|| is_lq` to the LW/LB/LBU/LH/LHU
   loads list. The existing `S_MEM_REQ → S_MEM_WAIT` path
   handles the 32-bit read; `S_MEM_WAIT`'s default writeback
   `regfile[rt_idx] <= map_rd_data` fires for LQ because none
   of is_lb/lbu/lh/lhu match (the if-else chain falls through
   to the default LW arm).
5. `!is_lq` added to `is_nop_class` catch-all.

5 surgical edits total. The "reuse LW path" decision keeps the
chapter small.

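As a cross-check on the intended behaviour, here is a small Python reference model of the stub's LQ as described above: fault when `ea[3:0] != 0`, otherwise return only the low 32-bit word. It is a sketch under assumptions, not the RTL: the names `lq_low_word` and `AddressErrorLoad` are hypothetical, memory is assumed little-endian, and the four data patterns are borrowed from the focused TB described in this closeout.

```python
class AddressErrorLoad(Exception):
    """Stands in for the AdEL exception path (hypothetical name)."""


def lq_low_word(mem: bytes, ea: int) -> int:
    """Model of the stub's LQ: 16-byte alignment check, then a single 32-bit read."""
    if ea & 0xF:  # same condition as ea[3:0] != 0
        raise AddressErrorLoad(f"misaligned LQ at {ea:#x}")
    return int.from_bytes(mem[ea:ea + 4], "little")  # low word only


# Pre-poke 0x400..0x40F with four distinct words, lowest word first (assumed layout).
mem = bytearray(0x410)
words = [0xAABBCCDD, 0x11112222, 0x33334444, 0x55556666]
for i, word in enumerate(words):
    mem[0x400 + 4 * i:0x400 + 4 * i + 4] = word.to_bytes(4, "little")

snapshot = bytes(mem)
value = lq_low_word(mem, 0x400)
print(f"{value:#010x}")  # 0xaabbccdd

try:
    lq_low_word(mem, 0x404)
    misaligned_faulted = False
except AddressErrorLoad:
    misaligned_faulted = True
```

The model never writes `mem`, which mirrors the TB's no-modify check.
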
### Focused TB: `tb_ee_core_lq.sv`

Cases:
1. **Exact qbert encoding shape**: `lq $t1, 0($a1)` built via
   `enc_i(OP_LQ, RA1, RT1, 0)` and asserted to equal
   `0x78A90000`. (We use this assertion to lock the encoding
   even though the instruction qbert actually executes is
   `lq $t1, 0($v0)`, with a different base: same opcode shape,
   different register index.)
2. **Value check**: pre-poke phys 0x400..0x40F with 4 distinct
   patterns (`0xAABBCCDD / 0x11112222 / 0x33334444 / 0x55556666`)
   so a buggy implementation reading the wrong lane would fail.
   Verify `$t1 = 0xAABBCCDD` (the low 32 bits of the qword).
3. **LW cross-check**: LW at the same EA reads the same value.
   Confirms LQ is decoded as a "single-beat low-word load"
   consistent with the existing LW path.
4. **No-modify check**: post-halt hierarchical RAM peek
   confirms all 4 lanes still hold the pre-poked values (LQ
   doesn't write).

Result: `retired=13 halt=1 trap=0 pc=0xbfc00128 errors=0 PASS`.

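The encoding assertion in case 1 can be reproduced outside the simulator. The sketch below is a Python stand-in for the TB's `enc_i`; the argument order (opcode, rs, rt, imm) is inferred from the call shown above, and the register numbers use the standard MIPS mapping (`$v0` = 2, `$a1` = 5, `$t1` = 9), both of which are assumptions about the TB's helper rather than a copy of it.

```python
def enc_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a MIPS I-type instruction word (stand-in for the TB helper)."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


OP_LQ = 0x1E
V0, A1, T1 = 2, 5, 9  # standard MIPS register numbers

tb_word = enc_i(OP_LQ, A1, T1, 0)     # lq $t1, 0($a1)
qbert_word = enc_i(OP_LQ, V0, T1, 0)  # lq $t1, 0($v0)
print(f"{tb_word:#010x} {qbert_word:#010x}")  # 0x78a90000 0x78490000
```

The two words differ only in the rs field, which is why the TB encoding still locks the opcode shape.
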
### Makefile + regression

- `tb_ee_core_lq` target.
- Added to both regression lists.
- Regression: 166 → **167**.

## Recommendation for Codex's Ch280: PSUBB

PSUBB at PC `0x00112C90`, instr `0x712A1248`:
- opcode 0x1C (MMI)
- funct 0x08 (MMI0 sub-table)
- sa 0x09 (PSUBB within MMI0)
- rs=$t1, rt=$t2, rd=$v0
- → `psubb $v0, $t1, $t2`

Architectural: `rd[7+8i:8i] = rs[7+8i:8i] - rt[7+8i:8i]` for
i ∈ [0..15], 16 parallel byte subtractions with no carry/borrow
between byte lanes.

For our 32-bit model: 4 parallel byte subtractions on the low
32 bits.

Implementation outline (mirrors Ch278 PCPYLD's narrow-decode):

1. `localparam FUNC_MMI0 = 6'h08`.
2. `localparam MMI0_PSUBB = 5'h09`.
3. `is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB)`.
4. Add to `is_rtype_alu` group.
5. New writeback arm:
   ```sv
   else if (is_psubb) begin
       rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
       rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
       rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
       rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
   end
   ```
   (Each byte sub is naturally modulo-256, with no carry between
   lanes; that's the SIMD semantic.)
6. Add `!is_psubb` to `is_nop_class` allow-list.

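Focused TB:
- Identity check: `psubb $rd, $rs, $0` → `$rd = $rs` (each byte
  minus 0).
- No-borrow check: `psubb $rd, $rs, $rt` with `$rs =
  0x10203040`, `$rt = 0x01010101` → `$rd = 0x0F1F2F3F` (checks
  that every lane is wired to its own subtract; no lane borrows
  here, so a plain 32-bit subtract gives the same value).
- Wrap check: `$rs = 0x00010203`, `$rt = 0x01010101` → `$rd =
  0xFF000102` (byte 3 wraps modulo 256). The borrow sits in the
  top lane, so this vector alone still matches a plain 32-bit
  subtract.
- Lane-isolation check: `$rs = 0x03020100`, `$rt = 0x01010101`
  → `$rd = 0x020100FF` (byte 0 wraps and the borrow does not
  reach byte 1; a plain 32-bit subtract would give `0x0200FFFF`,
  so this is the vector that proves there is no inter-lane
  carry/borrow).
- Exact qbert encoding assertion against `0x712A1248`.

~4 LOC change.
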
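Before committing TB vectors, it helps to check which of them a wrong implementation (a plain 32-bit subtract) would still pass. The Python sketch below does that; `psubb32` and `sub32` are hypothetical helper names, and the last vector, with the borrow in byte 0, is an illustrative one chosen here because it is the case where the two results differ.

```python
def psubb32(rs: int, rt: int) -> int:
    """Four independent byte subtracts on a 32-bit value."""
    out = 0
    for shift in (0, 8, 16, 24):
        out |= (((rs >> shift) - (rt >> shift)) & 0xFF) << shift
    return out


def sub32(rs: int, rt: int) -> int:
    """Plain 32-bit subtract: the bug a lane-isolation vector must expose."""
    return (rs - rt) & 0xFFFFFFFF


vectors = [
    (0x10203040, 0x00000000, 0x10203040),  # identity
    (0x10203040, 0x01010101, 0x0F1F2F3F),  # no lane borrows
    (0x00010203, 0x01010101, 0xFF000102),  # borrow only in the top lane
    (0x03020100, 0x01010101, 0x020100FF),  # borrow in byte 0
]

discriminating = []
for rs, rt, expected in vectors:
    assert psubb32(rs, rt) == expected
    discriminating.append(psubb32(rs, rt) != sub32(rs, rt))

print(discriminating)  # [False, False, False, True]
```

Only a vector whose borrow originates below the top lane separates PSUBB from an ordinary subtract, so at least one such vector belongs in the TB.
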
**Likely follow-ons** in this byte-walker context: **PCEQB**
(parallel compare equal byte) and **PMFHL.LH** (parallel move
from HI/LO, lower halfwords). The string-walker pattern is:
1. LQ a chunk of memory.
2. PSUBB or PCEQB against a sentinel.
3. PMFHL or some other reduction.
4. Branch.

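To make the four-step pattern concrete, here is a software analogue in Python. It is only a sketch of the general technique, not a reconstruction of qbert's code: the function names are hypothetical, the sentinel is assumed to be the NUL terminator, and the buffer is assumed to be padded to a multiple of 16 bytes.

```python
from typing import Optional


def first_match_lane(chunk: bytes, sentinel: int) -> Optional[int]:
    """Steps 2-3: byte-wise subtract the sentinel, then reduce to the first zero lane."""
    diff = [(b - sentinel) & 0xFF for b in chunk]  # PSUBB analogue
    for lane, d in enumerate(diff):
        if d == 0:
            return lane
    return None


def strlen_by_chunks(buf: bytes) -> int:
    """Walk the buffer 16 bytes at a time until a chunk contains the terminator."""
    if len(buf) % 16:
        raise ValueError("buffer must be a multiple of 16 bytes")
    for base in range(0, len(buf), 16):      # step 1: LQ analogue
        lane = first_match_lane(buf[base:base + 16], 0x00)
        if lane is not None:                 # step 4: branch out of the loop
            return base + lane
    raise ValueError("no terminator found")


text = b"qbert byte-walker test"
length = strlen_by_chunks(text.ljust(32, b"\x00"))
print(length == len(text))  # True
```

With the current single-beat LQ only the low word of each chunk is loaded, so a walker of this shape would see 4 of the 16 lanes until wider loads land.
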
## Files changed

- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_lq.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **167/167**.

## Pattern review

9 qbert chapters. The MMI sub-decode pattern from Ch278 is
about to be reused (PSUBB shares the same shape: MMI prefix
+ funct + sa selector). Anticipated: PSUBB in 4 edits, mirror
of PCPYLD.

| Chapter | Blocker | Edits | Pattern |
|---------|---------|-------|---------|
| Ch271 SQ | SQ | 5 | NEW 4-beat write |
| Ch272 DADDU | DADDU | 4 | NEW ALU-low-32 |
| Ch273 SYSCALL HLE | SYSCALL #60 | 2 | NEW gated dispatcher |
| Ch274 BEQL | BEQL | 6 | NEW branch+squash |
| Ch275 SD | SD | 7 | REUSE SQ counter |
| Ch276 DSLL | DSLL | 4 | REUSE DADDU |
| Ch277 BNEL | BNEL | 6 | REUSE BEQL squash |
| Ch278 PCPYLD | PCPYLD | 4 | NEW MMI narrow-decode |
| **Ch279 LQ** | **LQ** | **5** | **REUSE LW path** |

The runner-pick-next-blocker loop is producing one chapter in
under half a day. The qbert track is on rails.