# Ch282 closeout: PAND; next blocker is PCPYUD (the first "upper-half" MMI op)

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112CA0 instr=0x704923A9)`:
opcode `0x1C` (MMI) + funct `0x29` (MMI3) + sa `0x0E` =
**PCPYUD** (Parallel Copy **Upper** Doubleword). This is the
first MMI op that reads from the architectural **upper 64
bits** of a source register, a region our 32-bit-GPR model has
never been able to represent.

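The field breakdown above can be checked by hand with a few shifts and masks. The following is a minimal sketch, assuming the standard MIPS R-type layout; `decode_rtype` is a hypothetical helper name and not something from the repo.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11
        "sa": (instr >> 6) & 0x1F,       # bits 10:6
        "funct": instr & 0x3F,           # bits 5:0
    }


# The trapping word reported by the qbert.elf run.
fields = decode_rtype(0x704923A9)
for name, value in fields.items():
    print(f"{name:6s} = 0x{value:02X}")
```

The register numbers that come out (rs=2, rt=9, rd=4) correspond to `$v0`, `$t1`, and `$a0` in the usual MIPS ABI naming.
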
## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch280 (PSUBB) | PNOR at 0x00112C94 | 27,021 |
| Post-Ch281 (PNOR) | PAND at 0x00112C98 | 27,022 |
| **Post-Ch282 (PAND)** | **PCPYUD at 0x00112CA0** | **27,024** |

2-retire delta: PAND retired, plus one instruction at PC
0x00112C9C (probably another byte-broadcast or comparison),
then PCPYUD traps.

## What landed

### RTL: 5 surgical edits in `ee_core_stub.sv`

1. `localparam MMI2_PAND = 5'h12` alongside MMI2_PCPYLD.
2. `is_pand = is_mmi && (func == FUNC_MMI2) && (shamt == MMI2_PAND)`.
3. Added `is_pand` to `is_rtype_alu`.
4. **Reused** the existing AND writeback: `if (is_and || is_pand) rtype_alu_wb = rs_val & rt_val`.
5. `!is_pand` added to `is_nop_class`.

Highest-reuse chapter yet: the MMI narrow-decode and the AND writeback
arm were both already in place from prior chapters.

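As a software cross-check of edits 1, 2, and 4, the decode predicate and the reused AND arm can be mirrored in a few lines of Python. This is only a sketch of the behavior described above, not the RTL itself; the constant values for `OPCODE_MMI` and `FUNC_MMI2` are taken from the encodings quoted in this note, and the function shapes are assumptions.

```python
OPCODE_MMI = 0x1C  # major opcode shared by the MMI ops
FUNC_MMI2 = 0x09   # funct value selecting the MMI2 sub-table
MMI2_PAND = 0x12   # sa value selecting PAND inside MMI2


def is_pand(instr: int) -> bool:
    """Mirror of the narrow decode: opcode, funct, and sa must all match."""
    opcode = (instr >> 26) & 0x3F
    shamt = (instr >> 6) & 0x1F
    func = instr & 0x3F
    return opcode == OPCODE_MMI and func == FUNC_MMI2 and shamt == MMI2_PAND


def rtype_alu_wb(instr: int, rs_val: int, rt_val: int):
    """Low-32-bit writeback value for PAND, or None for any other word."""
    if is_pand(instr):
        return (rs_val & rt_val) & 0xFFFFFFFF
    return None


print(is_pand(0x70431489))  # the qbert PAND word
print(is_pand(0x704923A9))  # the PCPYUD word that now traps
```

Matching on all three fields is what keeps the decode "narrow": other MMI words, including the PCPYUD blocker, fall through and still trap.
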
### Focused TB: `tb_ee_core_pand.sv`

Three cases:

1. **Exact qbert encoding**: `pand $v0, $v0, $v1` (rs=2, rt=3,
   rd=2, sa=0x12, funct=0x09). Encoder asserted `0x70431489`.
   `$v0 = 0xFFFFFFFF & 0xAAAAAAAA = 0xAAAAAAAA`.
2. **Disjoint masks**: `0xF0F0F0F0 & 0x0F0F0F0F = 0` (rules out
   OR and XOR, which would both give `0xFFFFFFFF`).
3. **Zero-mask**: `0xDEADBEEF & 0 = 0`.

Result: `retired=24 halt=1 trap=0 pc=0xbfc00154 errors=0 PASS`.

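The encoder assertion and the three expected values can be reproduced outside the simulator. The sketch below is a plain Python restatement of the cases listed above; `encode_mmi` is a hypothetical helper, and the vectors are the ones quoted in this section rather than anything read out of the TB source.

```python
def encode_mmi(rs: int, rt: int, rd: int, sa: int, funct: int) -> int:
    """Assemble an MMI-class R-type word (major opcode 0x1C)."""
    return (0x1C << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


# Case 1 encoding: pand $v0, $v0, $v1
word = encode_mmi(rs=2, rt=3, rd=2, sa=0x12, funct=0x09)

# (rs_val, rt_val, expected low-32 result) for the three TB cases.
vectors = [
    (0xFFFFFFFF, 0xAAAAAAAA, 0xAAAAAAAA),  # exact qbert encoding case
    (0xF0F0F0F0, 0x0F0F0F0F, 0x00000000),  # disjoint masks
    (0xDEADBEEF, 0x00000000, 0x00000000),  # zero mask
]
results = [(a & b) == want for a, b, want in vectors]

print(hex(word), results)
```
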
### Makefile + regression

- `tb_ee_core_pand` target.
- Added to both regression lists.
- Regression: 169 → **170**.

## Ch283 framing: PCPYUD, a fork in the road

**Decoded**: `pcpyud $a0, $v0, $t1` (rs=$v0, rt=$t1, rd=$a0).
- Architectural: `$rd[63:0] = $rs[127:64]; $rd[127:64] = $rt[127:64]`.
  It takes the upper 64 bits of both source operands: the upper 64
  of rs becomes the lower 64 of rd, and the upper 64 of rt stays in
  the upper 64 of rd.

**The fundamental problem**: every prior chapter has lived
inside a "low 32 bits only" approximation. The upper 96 bits
of every GPR are silently 0 in our model, never written by
SQ/SD/PCPYLD/PSUBB/PNOR/PAND. PCPYUD is the first op that
**reads** from that upper half, so the question becomes
unavoidable:

- **Option A (preserve the approximation)**: implement PCPYUD
  as `$rd = 0` always. The honest framing is "this op reads from a
  region we don't model, which is always zero by construction." qbert
  will see all-zero PCPYUD results and **may falsely conclude
  it found a sentinel byte every iteration** of the
  byte-walker. Silent divergence; the next 5-10 chapters of
  blockers might be illusory (cascading from the wrong PCPYUD
  result) rather than real qbert needs.
- **Option B (keep the trap; do not NOP-class PCPYUD)**: leave it
  trapping and surface this as the "model boundary" that warrants
  a real-128-bit-GPR pivot in a future chapter. qbert won't
  continue past 27,024 until that pivot happens.
- **Option C (implement 128-bit GPRs)**: faithful but a big
  cross-cutting change (regfile width, every ALU arm, every
  load/store writeback). Multiple chapters of work. Real
  semantic correctness, but breaks the "one op per chapter"
  cadence we've held since Ch271.

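To make the Option A risk concrete, here is a small sketch comparing a 128-bit reference PCPYUD against the same op fed from a low-32-only register model. It assumes the operand mapping `rd[63:0] = rs[127:64]`, `rd[127:64] = rt[127:64]` as I understand it from the EE instruction set documentation; the quadword values are made up for illustration and `low32_model` is a hypothetical stand-in for what the current stub retains.

```python
MASK64 = (1 << 64) - 1


def pcpyud(rs: int, rt: int) -> int:
    """128-bit reference (assumed mapping): upper rs -> lower rd, upper rt -> upper rd."""
    upper_rs = (rs >> 64) & MASK64
    upper_rt = (rt >> 64) & MASK64
    return (upper_rt << 64) | upper_rs


def low32_model(value: int) -> int:
    """What a low-32-only GPR keeps: bits 31:0, with the upper 96 bits read as 0."""
    return value & 0xFFFFFFFF


# Illustrative 16-byte quadwords, as an LQ might have loaded them.
rs = 0x1122334455667788_99AABBCCDDEEFF00
rt = 0x0123456789ABCDEF_FEDCBA9876543210

reference = pcpyud(rs, rt)                            # what real hardware would produce
option_a = pcpyud(low32_model(rs), low32_model(rt))   # what the stub would produce

print(f"reference = 0x{reference:032X}")
print(f"option_a  = 0x{option_a:032X}")
```

Under the low-32 model the result is zero no matter what was loaded, which is exactly the condition under which a byte-walker could believe it has found a zero sentinel on every iteration.
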
**My read**: at minimum, do NOT silently NOP-class to 0. The
qbert byte-walker's correctness depends on the upper 8 bytes
of every LQ. Even if we land "Option B" first (keep the trap),
the next chapter genuinely should be the 128-bit GPR pivot.

This is the right moment to step back and frame the broader
question with Codex. The MMI-narrow-decode cadence has worked
beautifully for ops where low-32-bit semantics happen to
suffice (PCPYLD, PSUBB, PNOR, PAND). It hits a wall at
upper-half ops. Either:

1. Bite the 128-bit GPR bullet now (Ch283 = "expand regfile
   to 128 bits + propagate through every LQ/SQ/SD/PCPYLD/...
   writeback").
2. Accept that qbert is "as far as we can get" without 128-bit
   GPRs and pivot to a different ELF (homebrew that's
   32-bit-clean) or to hardware-facing deliverables.

I'd recommend (1) as the next move: qbert has been a
productive test vector, and the SIMD byte-walker shape is
universal across PS2 stdlib code. Future game ELFs will hit
the same wall.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_pand.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **170/170**.

## Pattern review (12 chapters)

| Ch | Blocker | Edits | Pattern |
|----|---------|-------|---------|
| 271 SQ | first qbert | 5 | NEW 4-beat write |
| 272 DADDU | | 4 | NEW ALU-low-32 |
| 273 SYSCALL HLE | | 2 | NEW gated dispatcher |
| 274 BEQL | | 6 | NEW branch+squash |
| 275 SD | | 7 | REUSE SQ counter |
| 276 DSLL | | 4 | REUSE DADDU |
| 277 BNEL | | 6 | REUSE BEQL squash |
| 278 PCPYLD | | 4 | NEW MMI narrow-decode |
| 279 LQ | | 5 | REUSE LW path |
| 280 PSUBB | | 5 | REUSE MMI narrow (byte-SIMD new) |
| 281 PNOR | | 5 | REUSE MMI narrow + NOR arm |
| **282 PAND** | | **5** | **REUSE MMI narrow + AND arm** |

5 NEW patterns + 7 REUSE chapters. The reuse density is at
its peak right now, but Ch283 PCPYUD is signaling that the
"low-32-only" approximation has reached its natural boundary.
Codex's framing on whether to widen the regfile or pivot
elsewhere will set the direction for the next stretch.