# Ch278 closeout: MMI2/PCPYLD (narrow, one sub-instruction only); next blocker is LQ

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C88 instr=0x78A90000)`:
**LQ** (Load Quadword, opcode 0x1E, R5900 EE), the 128-bit load
symmetric to Ch271's SQ. qbert ran the PCPYLD and trapped on
the next instruction, which is the matching 128-bit load.

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch276 (DSLL) | BNEL at 0x00112C7C | 27,016 |
| Post-Ch277 (BNEL) | PCPYLD at 0x00112C84 | 27,017 |
| **Post-Ch278 (PCPYLD)** | **LQ at 0x00112C88** | **27,018** |

1-retire delta: PCPYLD retired, LQ trapped before retiring.
Same compact "one opcode at a time" cadence; qbert's stdlib
byte-walker is showing us each MIPS-III/MMI feature it touches
in textbook order.

## What landed

### RTL: 4 surgical edits in `ee_core_stub.sv`

1. **Opcode/sub-instruction constants**:
   ```sv
   localparam OP_MMI = 6'h1C;       // primary opcode, instr[31:26]
   localparam FUNC_MMI2 = 6'h09;    // funct field, instr[5:0]
   localparam MMI2_PCPYLD = 5'h0E;  // sa field, instr[10:6]
   ```
2. **Narrow decode**: `is_pcpyld = is_mmi && (func == FUNC_MMI2)
   && (shamt == MMI2_PCPYLD)`. Three-way AND on the opcode, funct,
   and sa fields; any OTHER op=0x1C instruction continues to fall
   through to strict-trap.
3. **Added to `is_rtype_alu` group** so the existing R-type
   writeback path handles it.
4. **`rtype_alu_wb`**: `else if (is_pcpyld) rtype_alu_wb = rt_val`.
   Architectural `rd[63:0] = rt[63:0]`, of which the low 32 bits
   are the only observable effect in our 32-bit model.
5. **`is_nop_class` allow**: added `&& !is_pcpyld` to the
   catch-all so other MMI sub-instructions still trap. Critical
   per Codex's caution: do NOT NOP-class the whole MMI opcode.

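The narrow decode can be cross-checked off-target with a small host-side model. The Python sketch below is not part of the repo: the function names (`fields`, `is_pcpyld`, `pcpyld_low32`) are hypothetical, the `$zero` guard is an assumption about the regfile, and it mirrors only the three-way AND and the low-32 writeback described in the list above.

```python
# Hypothetical host-side reference model; constants mirror ee_core_stub.sv.
OP_MMI = 0x1C       # primary opcode, bits 31:26
FUNC_MMI2 = 0x09    # funct field, bits 5:0
MMI2_PCPYLD = 0x0E  # sa field, bits 10:6


def fields(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its fields."""
    return {
        "op": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "func": instr & 0x3F,
    }


def is_pcpyld(instr: int) -> bool:
    """Three-way AND on opcode, funct and sa; anything else must not match."""
    f = fields(instr)
    return f["op"] == OP_MMI and f["func"] == FUNC_MMI2 and f["sa"] == MMI2_PCPYLD


def pcpyld_low32(regs: list, instr: int) -> None:
    """Apply the only effect visible in a 32-bit regfile: rd[31:0] = rt[31:0]."""
    f = fields(instr)
    if f["rd"] != 0:  # assumption: $zero stays hard-wired to 0
        regs[f["rd"]] = regs[f["rt"]] & 0xFFFFFFFF


QBERT_PCPYLD = 0x71295389
# Same word with a different sa value: still op=0x1C/func=0x09, but not PCPYLD.
OTHER_MMI2 = (QBERT_PCPYLD & ~(0x1F << 6)) | (0x12 << 6)

regs = [0] * 32
regs[9] = 0xBBBBBBBB  # $t1
pcpyld_low32(regs, QBERT_PCPYLD)
print(is_pcpyld(QBERT_PCPYLD), is_pcpyld(OTHER_MMI2), hex(regs[10]))
```

The second word differs from the qbert instruction only in the sa field, which is the case the strict-trap fallthrough has to keep rejecting.
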
### Focused TB: `tb_ee_core_pcpyld.sv`

Two cases:
1. **Exact qbert encoding**: `pcpyld $t2, $t1, $t1` (rs=rt=$t1
   in the actual qbert instruction; see process note below).
   Built via `enc_rtype` and asserted to equal `0x71295389`.
   With `$t1 = 0xBBBBBBBB`, verifies `$t2 = 0xBBBBBBBB`.
2. **Distinct rs/rt sentinels** (the rd<-rt proof):
   `pcpyld $t3, $a0, $a1` with `$a0 = 0xDEADBEEF`,
   `$a1 = 0xCAFEF00D`. Verifies `$t3 = 0xCAFEF00D` (rt) and
   explicitly NOT `0xDEADBEEF` (rs). Locks in the
   architectural rd-takes-from-rt semantics for the low 32
   bits.

Result: `retired=21 halt=1 trap=0 pc=0xbfc00148 errors=0 PASS`.

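For readers without the TB at hand, the encoder check can be reproduced on the host. This is a minimal Python sketch, not the TB's code: the real `enc_rtype` signature is not shown in this note, so the argument order here is an assumption, and the register numbers follow the o32-style names used in this document.

```python
def enc_rtype(op: int, rs: int, rt: int, rd: int, sa: int, func: int) -> int:
    """Pack R-type fields into a 32-bit word (argument order is an assumption)."""
    for name, value, width in (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
                               ("rd", rd, 5), ("sa", sa, 5), ("func", func, 6)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | func


# o32-style register numbers used by the two TB cases.
A0, A1, T1, T2, T3 = 4, 5, 9, 10, 11
OP_MMI, FUNC_MMI2, MMI2_PCPYLD = 0x1C, 0x09, 0x0E

# Case 1: pcpyld $t2, $t1, $t1 must reproduce the word fetched from qbert.
case1 = enc_rtype(OP_MMI, T1, T1, T2, MMI2_PCPYLD, FUNC_MMI2)
assert case1 == 0x71295389, hex(case1)

# Case 2: pcpyld $t3, $a0, $a1 (distinct rs/rt sentinels).
case2 = enc_rtype(OP_MMI, A0, A1, T3, MMI2_PCPYLD, FUNC_MMI2)
print(hex(case1), hex(case2))
```

Asserting on the packed word before running the instruction is what ties the test to the exact bytes qbert executes rather than to the author's reading of them.
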
### Makefile + regression

- `tb_ee_core_pcpyld` target.
- Added to both regression lists.
- Regression: 165 → **166**.

## Process note: decode mistake caught by encoder assertion

My initial decode of qbert's `0x71295389` claimed
`pcpyld $t2, $a1, $t1`, reading the rs field as `$a1=5`. That
was wrong: bits 25:21 of `0x71295389` are `01001 = 9 = $t1`.
The actual instruction is `pcpyld $t2, $t1, $t1` (rs=rt=$t1).

The error was caught by the TB's `enc_rtype` assertion: the
first run produced `0x70A95389` instead of the expected
`0x71295389`, and the inline `$error` exposed the difference.
**The encoder-output assertion pattern (`enc_rtype(...) ===
0x...`) has now been exercised in Ch272 (DADDU, clean),
Ch276 (DSLL, clean), and Ch278 (PCPYLD, misdecode caught).**
Always including the assertion is paying off.

The corrected decode does not change the implemented
semantics. The low 32 bits of `$rd` come from `$rt`, and rt was
`$t1` in both readings; only the rs field, which architecturally
feeds the unmodelled upper half of `$rd`, was misread. So
Codex's "rd <= rt_val" implementation is correct regardless.

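When the assertion fires, XOR-ing the expected and produced words shows which field was misread. The sketch below is a hypothetical host-side helper (the names `FIELDS`, `field`, and `differing_fields` are not from the repo), applied to the two words quoted in this note.

```python
# R-type field layout: name -> (low bit, width).
FIELDS = {"op": (26, 6), "rs": (21, 5), "rt": (16, 5),
          "rd": (11, 5), "sa": (6, 5), "func": (0, 6)}


def field(word: int, name: str) -> int:
    """Extract one named field from a 32-bit instruction word."""
    lo, width = FIELDS[name]
    return (word >> lo) & ((1 << width) - 1)


def differing_fields(expected: int, actual: int) -> dict:
    """Return {field: (expected_value, actual_value)} for every field that differs."""
    return {name: (field(expected, name), field(actual, name))
            for name in FIELDS
            if field(expected, name) != field(actual, name)}


expected = 0x71295389  # word fetched from qbert
actual = 0x70A95389    # word produced by the first (wrong) enc_rtype call
print(hex(expected ^ actual))              # 0x1800000: only bits 24 and 23 differ
print(differing_fields(expected, actual))  # {'rs': (9, 5)}
```

Both differing bits fall inside 25:21, so the mismatch is confined to rs (9 versus 5), matching the `$t1` versus `$a1` misread.
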
## qbert disassembly check (Ch279 framing)

The trap at PC 0x00112C88 is one word past PCPYLD (0x00112C84
+ 4):

```
0x00112C84: 0x71295389  pcpyld $t2, $t1, $t1
0x00112C88: 0x78A90000  lq     $t1, 0($a1)    <-- next blocker
```

LQ is the 128-bit load: `rt[127:0] = mem[base+imm][127:0]`. In
our 32-bit register model, `$rt[31:0] = mem[base+imm][31:0]`
(low 32 bits only; upper 96 unrepresentable). This is the
symmetric counterpart to **Ch271 SQ**.

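The LQ reading of the trapping word can be checked by splitting it as an I-type instruction. A minimal Python sketch follows; `decode_itype` is a hypothetical helper, not a repo function.

```python
def decode_itype(instr: int) -> dict:
    """Split an I-type word into opcode, base, rt and sign-extended offset."""
    imm = instr & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "op": (instr >> 26) & 0x3F,
        "base": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "offset": imm,
    }


OP_LQ = 0x1E  # R5900 Load Quadword

d = decode_itype(0x78A90000)
assert d["op"] == OP_LQ
print(d)  # {'op': 30, 'base': 5, 'rt': 9, 'offset': 0}
```

Register 5 is `$a1` and register 9 is `$t1`, which agrees with the `lq $t1, 0($a1)` disassembly.
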
## Recommendation for Codex's Ch279: LQ

Symmetric to SQ. Two possible implementation shapes:

**(A) Minimal: single 32-bit read at EA, writeback to $rt.**
- 16-byte alignment required (`ea[3:0] == 0`); misaligned →
  AdEL (the load-side address error; AdES is the store-side
  code). Note that, per the EE core documentation as I
  understand it, real R5900 hardware masks `ea[3:0]` to zero for
  LQ/SQ instead of faulting, so this trap is a strictness choice
  of the stub rather than architectural behavior.
- Reuse the existing S_MEM_REQ → S_MEM_WAIT → writeback FSM
  that LW uses. The single-word read returns the low 32 bits.
- Upper 96 bits of `$rt` aren't modelled in our regfile, so
  there's nothing to do with the high beats.
- Documented approximation: same as SQ, only the architectural
  low 32 bits are observable.
- ~4 RTL edits.

**(B) Symmetric: 4-beat read FSM reading 32 bits per beat.**
- Mirrors Ch271's SQ structure exactly.
- All 4 reads issued; the implementation discards beats 1-3
  (since we have no GPR storage for them).
- ~8 RTL edits.
- Slightly more uniform with SQ but no observable behavior
  difference from (A).

**My read: (A)**, because the upper 96 bits are unrepresentable.
A 4-beat read costs sim cycles for zero benefit. We can revisit
if/when 128-bit GPRs are added.

Implementation outline for (A):
1. `localparam OP_LQ = 6'h1E`.
2. `is_lq` decode signal.
3. Add 16-byte alignment check: extend `is_align_fault` with
   `is_quad_load_access && (ea[3:0] != 0)` (or just extend
   `is_quad_access` to cover both SQ and LQ).
4. Add LQ to the FSM transition: `else if (is_lq) state <=
   S_MEM_REQ`. Reuse the existing `S_MEM_WAIT` writeback path.
5. Hook LQ into the LW/LB/LBU writeback case as a "word load
   with 16-byte aligned EA".
6. Add `!is_lq` to `is_nop_class` allow-list.

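The alignment predicate in step 3 is easy to pin down with a host-side model before touching RTL. The Python sketch below uses hypothetical names (`effective_address`, `quad_align_fault`) and assumes a 32-bit EA that wraps on overflow; it models the stub's strict check, not hardware behavior.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base_val: int, offset: int) -> int:
    """EA = base + sign-extended offset, wrapped to 32 bits (assumed width)."""
    return (base_val + offset) & MASK32


def quad_align_fault(ea: int) -> bool:
    """Strict model check: a quadword access faults unless ea[3:0] == 0."""
    return (ea & 0xF) != 0


# (base register value, signed 16-bit offset) pairs to probe.
probes = ((0x00100000, 0), (0x00100000, 16), (0x00100004, 0), (0x00100010, -8))
for base_val, offset in probes:
    ea = effective_address(base_val, offset)
    print(f"ea=0x{ea:08X} fault={quad_align_fault(ea)}")
```

Because the same predicate applies to SQ, extending one shared quad-access signal keeps the load and store checks from drifting apart.
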
Focused TB mirrors `tb_ee_core_sq` shape: pre-poke RAM with
distinct non-zero values, execute `lq $rt, 0($base)`, verify
`$rt = low 32 bits of mem[base]`. Cross-check that an LW at
the same EA returns the same value (proving LQ degenerates to
LW in our model for the observable lane).

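The expected TB values can be derived from a small memory model first. This Python sketch assumes little-endian byte order (as on the EE) and a flat `bytearray` RAM; `lw`, `lq_low32`, and `AddressError` are hypothetical stand-ins, not repo code.

```python
import struct


class AddressError(Exception):
    """Stand-in for the model's alignment trap."""


def lw(mem: bytearray, ea: int) -> int:
    """32-bit little-endian load with 4-byte alignment check."""
    if ea & 0x3:
        raise AddressError(f"LW misaligned: 0x{ea:08X}")
    return struct.unpack_from("<I", mem, ea)[0]


def lq_low32(mem: bytearray, ea: int) -> int:
    """Option (A): check 16-byte alignment, then read only the low word."""
    if ea & 0xF:
        raise AddressError(f"LQ misaligned: 0x{ea:08X}")
    return struct.unpack_from("<I", mem, ea)[0]


mem = bytearray(64)
# Pre-poke one quadword at 0x10 with four distinct non-zero words.
struct.pack_into("<4I", mem, 0x10, 0x11111111, 0x22222222, 0x33333333, 0x44444444)

assert lq_low32(mem, 0x10) == 0x11111111      # low lane only
assert lq_low32(mem, 0x10) == lw(mem, 0x10)   # LQ degenerates to LW here
print(hex(lq_low32(mem, 0x10)))
```

Using four different words makes a wrong-lane read visible: if the model ever returned beat 1, 2, or 3, the assertion against `0x11111111` would fail.
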
## Files changed

- `rtl/ee/ee_core_stub.sv`: 4 surgical edits.
- `sim/tb/integration/tb_ee_core_pcpyld.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **166/166**.

## Pattern review

Eight qbert chapters now. The pattern continues to compress.
RTL edits per chapter (qbert track):

| Chapter | RTL edits | New vs. reuse |
|---------|-----------|---------------|
| Ch271 SQ | 5 | NEW 4-beat write |
| Ch272 DADDU | 4 | NEW ALU-low-32 |
| Ch273 SYSCALL HLE | 2 | NEW gated dispatcher |
| Ch274 BEQL | 6 | NEW branch+squash |
| Ch275 SD | 7 | REUSE SQ counter |
| Ch276 DSLL | 4 | REUSE DADDU |
| Ch277 BNEL | 6 | REUSE BEQL squash (generalized) |
| **Ch278 PCPYLD** | **4** | **NEW MMI narrow-decode** |

Ch279 LQ should be ~4 edits (reuse LW path + new alignment).