# Ch283 closeout — 128-bit GPR shadow + PCPYUD (the upper-half MMI op)
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00113378 instr=0xdfbf0000)`:
opcode 0x37 = **LD (Load Doubleword)**, encoding `ld $ra, 0($sp)`
(rs field = 29, rt field = 31).
This is the end-of-function return-address restore pattern, hit
*after* the byte-walker PCPYUD path completes and the function
returns. qbert retire_count: 27,024 → **27,067** (+43).

Ch283 introduced the architectural seam Codex framed as the right
middle path between "fake PCPYUD as zero" (silent divergence) and
"widen the whole EE core to 128 bits" (multi-chapter cross-cutting
work): a parallel **128-bit GPR shadow** (`gpr128`) that LQ/SQ/SD and
every MMI op now flow through, while the legacy 32-bit `regfile`
remains the canonical scalar surface.
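
The blocker word can be checked by hand. The sketch below splits a 32-bit MIPS instruction word into the standard I-type fields; the helper name `decode_fields` is illustrative only and is not part of the repo tooling.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS I-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26
        "rs": (word >> 21) & 0x1F,      # bits 25:21, base register for loads
        "rt": (word >> 16) & 0x1F,      # bits 20:16, load target register
        "imm": word & 0xFFFF,           # bits 15:0, offset
    }


blocker = decode_fields(0xDFBF0000)
print({name: hex(value) for name, value in blocker.items()})
```

For `0xdfbf0000` this gives opcode 0x37, rs = 29 (`$sp`), rt = 31 (`$ra`), and offset 0.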
## What landed (architectural summary)

The EE core now has two parallel GPR stores:

| storage | width (bits) | who writes it | who reads it |
|---|---|---|---|
| `regfile [0:31]` | 32 | every scalar op (unchanged) | scalar decode, branches, ALU operands |
| `gpr128 [0:31]` | 128 | every scalar op (via mirror, zero-extended); MMI ops; LQ | MMI ops needing upper bits; SQ/SD per-beat sources |

**Invariant:** `gpr128[i][31:0] === regfile[i]` always. Scalar writes
zero-extend into `gpr128[i][127:32]`; MMI/LQ writes can land non-zero
bits there. The model treats scalar ops as clearing the upper bits of
their destination: Codex framed it as "define upper bits
conservatively," and zero is the conservative answer.
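
The invariant can be captured in a small behavioral model. This is a sketch for reasoning about the rule, not the RTL: the class and method names (`GprModel`, `write_scalar`, `write_wide`) are hypothetical, and `$0` handling is left out for brevity.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


class GprModel:
    """Behavioral model of the two parallel GPR stores."""

    def __init__(self):
        self.regfile = [0] * 32  # 32-bit scalar surface
        self.gpr128 = [0] * 32   # 128-bit shadow

    def write_scalar(self, idx: int, value32: int) -> None:
        # Scalar write: mirror into the shadow, zero-extended ({96'd0, Y}).
        value = value32 & MASK32
        self.regfile[idx] = value
        self.gpr128[idx] = value

    def write_wide(self, idx: int, value128: int) -> None:
        # MMI/LQ write: full 128 bits to the shadow, low 32 to regfile.
        value = value128 & MASK128
        self.gpr128[idx] = value
        self.regfile[idx] = value & MASK32

    def invariant_holds(self) -> bool:
        # gpr128[i][31:0] must equal regfile[i] for every register.
        return all((wide & MASK32) == narrow
                   for wide, narrow in zip(self.gpr128, self.regfile))


model = GprModel()
model.write_wide(8, (0xAABBCCDD << 64) | 0x11223344)  # upper bits land
model.write_scalar(9, 0xDEADBEEF)                     # upper bits are zero
model.write_scalar(8, 0x55)                           # scalar write clears them
print(hex(model.gpr128[8]), hex(model.gpr128[9]), model.invariant_holds())
```

Writing a scalar after a wide write is the interesting case: the shadow entry drops back to a zero-extended 32-bit value, so no stale upper bits survive.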
## RTL: surgical edits in `ee_core_stub.sv`

1. **Declaration + reset:** `logic [127:0] gpr128 [0:31];` next to
   `regfile`. Reset clears all 32 to 128'd0.
2. **Read helpers:** `rs128_val` / `rt128_val` next to `rs_val` /
   `rt_val`, both with the `$0 → 0` guard.
3. **Scalar-write mirrors:** every existing `regfile[X] <= Y` now
   has a paired `gpr128[X] <= {96'd0, Y}`. Sites touched: SYSCALL HLE
   (3), I-type ALU writeback, R-type ALU writeback, MFHI/MFLO,
   JAL/JALR link, MFC0, Ch215 jmp_buf restore (12) + final $v0,
   LW/LB/LBU/LH/LHU load returns. The load path was refactored to
   compute `load_wb` once and write both stores.
4. **MMI 128-bit writeback:** a new `rtype_alu128_wb` combinational
   block computes the full 128-bit MMI result for PCPYLD/PSUBB/PNOR/
   PAND/PCPYUD. The R-type writeback site picks between the full
   128-bit value (when `is_mmi_wb`) and the zero-extended scalar
   value (every other R-type op). The existing 32-bit `rtype_alu_wb`
   still lands the correct low 32 into `regfile`.
5. **LQ 4-beat FSM:** `is_lq` now takes a dedicated dispatch arm
   that initializes `sq_beat <= 0` and re-uses S_MEM_REQ/S_MEM_WAIT
   four times. Beat N's `map_rd_addr = ea + N*4`. Each beat captures
   `map_rd_data` into the matching 32-bit lane of `gpr128[rt]`. The
   last beat mirrors `gpr128[rt][31:0]` to `regfile[rt]` and retires
   once. Replaces the Ch279 single-beat LW-style approximation.
6. **SQ/SD per-beat source upgrade:** beats now pull from
   `gpr128[rt][lane]` instead of "low 32 then zero": SQ emits all
   four lanes, SD emits the low two.
7. **PCPYUD decode + arms:** `localparam MMI3_PCPYUD = 5'h0E`,
   `is_pcpyud` decode (MMI3 / sa 0x0E), added to `is_rtype_alu` and
   to the `is_nop_class` exclusion. The low-32 arm in `rtype_alu_wb`
   uses `rt128_val[95:64]` (= low 32 of $rt's upper doubleword); the
   full 128-bit arm in `rtype_alu128_wb` is `{rs128[127:64],
   rt128[127:64]}`.
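
The beat arithmetic in items 5 and 6 can be sketched as a behavioral model. Assumptions not stated in the source: lane N occupies bits `[32*N+31 : 32*N]` of the 128-bit value, and SQ/SD beats use the same `ea + N*4` addressing as LQ. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lq_assemble(read_word, ea: int) -> int:
    """Model the 4-beat LQ: beat N reads ea + N*4 into lane N."""
    value = 0
    for beat in range(4):
        word = read_word(ea + beat * 4) & MASK32
        value |= word << (32 * beat)  # assumed lane order: lane 0 is bits 31:0
    return value


def store_beats(value128: int, ea: int, num_beats: int) -> list:
    """Model SQ (num_beats=4) or SD (num_beats=2) as (address, data) pairs."""
    return [(ea + beat * 4, (value128 >> (32 * beat)) & MASK32)
            for beat in range(num_beats)]


# Toy word-addressed memory standing in for the map_rd_* interface.
memory = {0x100: 0x33323130, 0x104: 0x37363534,
          0x108: 0x00000000, 0x10C: 0xAABBCCDD}

quad = lq_assemble(memory.__getitem__, 0x100)
sq_writes = store_beats(quad, 0x200, 4)  # all four lanes
sd_writes = store_beats(quad, 0x300, 2)  # low two lanes only
print(hex(quad), sq_writes, sd_writes)
```

An LQ followed by an SQ of the same register reproduces all four words, which is the property the old "low 32 then zero" sourcing could not provide.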
## Focused TB: `tb_ee_core_pcpyud.sv`

Three cases:

1. **Exact qbert encoding asserted:** `0x704923A9`, i.e. `pcpyud $a0,
   $v0, $t1`, with $v0 and $t1 set by scalar LUI+ORI (upper halves
   architecturally 0). PCPYUD's low-32 result = 0, exactly what
   qbert sees on every byte-walker iteration.
2. **PCPYLD-then-PCPYUD round-trip.** `pcpyld $t2, $t0, $t1` puts
   $t0[31:0] = 0xAABBCCDD into `gpr128[$t2][95:64]`. `pcpyud $t3,
   $t2, $t2` then extracts $t2's upper-D into both halves of $t3.
   Verified: `regfile[$t3] == 0xAABBCCDD` *and* peeked
   `gpr128[$t3][127:64] == 0x00000000_AABBCCDD`. Proves the gpr128
   shadow is actually carrying upper bits.
3. **PCPYUD with rt=$0.** Exercises the rs-upper-D path alone. $t5
   low = 0, gpr128[$t5][127:64] inherits $t2's upper-D.

Result: `retired=23 halt=1 trap=0 pc=0xbfc00150 errors=0 PASS`.
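
The three cases can be reproduced with a few lines of reference modeling. This is a sketch under stated assumptions: `pcpyud` uses the operand ordering given for the 128-bit arm in the RTL section, `pcpyld` is assumed to be `{rs[63:0], rt[63:0]}` (consistent with the effect described in case 2), the MMI opcode 0x1C and MMI3 function code 0x29 are taken from the asserted encoding, the `$t1` value in case 2 is arbitrary, and case 3 is assumed to use `$t2` as rs.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def upper(value128: int) -> int:
    return (value128 >> 64) & MASK64


def lower(value128: int) -> int:
    return value128 & MASK64


def pcpyld(rs128: int, rt128: int) -> int:
    # Assumed: rd = {rs[63:0], rt[63:0]}
    return (lower(rs128) << 64) | lower(rt128)


def pcpyud(rs128: int, rt128: int) -> int:
    # Ordering as documented for rtype_alu128_wb: {rs[127:64], rt[127:64]}
    return (upper(rs128) << 64) | upper(rt128)


def encode_mmi3(rd: int, rs: int, rt: int, sa: int) -> int:
    # opcode 0x1C (MMI), function 0x29 (MMI3), sa selects the op
    return (0x1C << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | 0x29


V0, A0, T1 = 2, 4, 9  # standard MIPS register numbers

# Case 1: qbert encoding; scalar-set sources have zero upper halves.
encoding = encode_mmi3(rd=A0, rs=V0, rt=T1, sa=0x0E)
case1_low = pcpyud(0x12345678, 0x9ABCDEF0) & MASK32

# Case 2: PCPYLD moves $t0's low word up, PCPYUD brings it back down.
t2 = pcpyld(0xAABBCCDD, 0x11223344)
t3 = pcpyud(t2, t2)

# Case 3: rt = $0, so only the rs upper doubleword contributes.
t5 = pcpyud(t2, 0)

print(hex(encoding), hex(case1_low), hex(t3), hex(t5))
```

The encoding helper reproduces `0x704923A9` from the fields alone, which is a quick cross-check that the sa value 0x0E and the register numbers in case 1 agree with the word the TB asserts.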
## Makefile + regression

- `tb_ee_core_pcpyud` target with build + run rules.
- Added to both the PHONY target list (line 407) and the `run:`
  master list (line 2510), per the dual-list rule.
- Regression: 170 → **171**.
## qbert progression

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch281 (PNOR) | PAND at 0x00112C98 | 27,022 |
| Post-Ch282 (PAND) | PCPYUD at 0x00112CA0 | 27,024 |
| **Post-Ch283 (PCPYUD)** | **LD at 0x00113378** | **27,067** |

That is +43 retires past Ch282. qbert finished the byte-walker MMI
sequence (`LQ → PSUBB → PNOR → PAND → PCPYUD → reduce/branch`),
returned from that branch, did a chunk of follow-on work, then hit
`ld $ra, 0($sp)`, the end-of-function return-address restore. LD is
the read-side of SD and is now the Ch284 candidate.

Side-effect check: the new full-128-bit LQ feeds real upper-half
data into PCPYUD. The fact that qbert advanced through the PCPYUD
site and 43 more instructions means the byte-walker's downstream
logic accepted the actual data (not zero) and made a real branch
decision based on it. Snapshot at halt:

- `$a0 = 0x33323130`: ASCII `"0123"` when read as little-endian
  bytes, which strongly suggests qbert is mid-string processing (the
  byte-walker did its job).
- `$v1 = 0x0012c2c6`, `$a1 = 0x0011c326`, `$a2/$a3 = 0x0012c2c0`.

This is the first chapter where the qbert run produces visible
*content-shaped* state (ASCII bytes in registers) rather than just
opcode-blocker telemetry.
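
The `$a0` reading and the retire delta are easy to reproduce. The sketch assumes little-endian byte order for the register contents, which is what makes the value read as `"0123"`; the helper name is illustrative.

```python
def reg_to_ascii(value32: int) -> str:
    """Interpret a 32-bit register value as four little-endian ASCII bytes."""
    return (value32 & 0xFFFFFFFF).to_bytes(4, "little").decode("ascii")


a0_text = reg_to_ascii(0x33323130)
retire_delta = 27_067 - 27_024
print(a0_text, retire_delta)  # prints: 0123 43
```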
## Pattern review (13 chapters)

| Ch | Blocker | Edits | Pattern |
|-----|--------------|-------|---------|
| 271 | SQ | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD new) |
| 281 | PNOR | 5 | REUSE MMI narrow + NOR arm |
| 282 | PAND | 5 | REUSE MMI narrow + AND arm |
| **283** | **PCPYUD + gpr128** | **architectural** | **NEW 128-bit shadow** |

Ch283 breaks the surgical-one-opcode cadence because it has to: this
is the first chapter whose blocker the "low-32-only" approximation
could not absorb. The MMI narrow-decode pattern from Ch278 still works
(PCPYUD adds the same 3-way is_mmi+func+sa decode), but the
*writeback* now needs full-128 storage, which retroactively forced
LQ/SQ/SD/PCPYLD/PSUBB/PNOR/PAND to also flow through `gpr128`.

That's a one-time investment. Future MMI ops that need upper bits
(PCPYH, PINTEH, PCEQB, PMADDH, etc.) can ride the existing seam:
read `rs128_val`/`rt128_val`, write `rtype_alu128_wb`. No more
architectural work is needed to add upper-half ops.
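
To illustrate what "riding the seam" means, a future op reduces to a pure function from two 128-bit sources to one 128-bit result. The sketch below models PCEQB (per-byte compare for equality, 0xFF on match and 0x00 otherwise) as a reference function. It is a hypothetical golden model, not something the source says has been written, and the function name is an assumption.

```python
def pceqb(rs128: int, rt128: int) -> int:
    """Reference model: 16 independent byte lanes, 0xFF where bytes match."""
    result = 0
    for lane in range(16):
        shift = 8 * lane
        if ((rs128 >> shift) & 0xFF) == ((rt128 >> shift) & 0xFF):
            result |= 0xFF << shift
    return result


ALL_ONES = (1 << 128) - 1
same = pceqb(0x1234, 0x1234)   # every lane matches
differ = pceqb(0x00FF, 0x0000)  # only lane 0 differs
print(hex(same), hex(differ))
```

In the RTL, the equivalent logic would read `rs128_val`/`rt128_val` and drive `rtype_alu128_wb`, with the low 32 bits of the same result landing in `regfile`.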
## Files changed

- `rtl/ee/ee_core_stub.sv`: declarations, 36 scalar-write mirrors,
  MMI 128-bit writeback, PCPYUD decode, LQ 4-beat FSM, and SQ/SD
  per-beat sources.
- `sim/tb/integration/tb_ee_core_pcpyud.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.
## Regression

**171/171 PASS** (was 170/170 in Ch282).