retroDE_ps2/docs/ch277_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch277 closeout — BNEL squash-on-not-taken; qbert hits MMI (PCPYLD) one instruction later
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C84 instr=0x71295389)`
opcode `0x1C` (R5900 EE **MMI**) + funct `0x09` (MMI2 sub-group)
+ sa `0x0E` = **PCPYLD** (Parallel Copy Lower Doubleword). qbert
ran the BNEL correctly (squashed not-taken — PC went 0xC7C →
0xC84 = +8 bytes, confirming the squash path), then trapped on
the very next instruction, an MMI/PCPYLD.
## Numbers
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| Post-Ch276 (DSLL) | BNEL at 0x00112C7C | 27,016 |
| **Post-Ch277 (BNEL)** | **PCPYLD at 0x00112C84** | **27,017** |
1-retire delta — BNEL itself retired (the squash path), then
PCPYLD trapped before retiring.
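The deltas behind the table can be recomputed in a few lines of Python. This is a throwaway check rather than repo tooling; the variable names are illustrative and the counts are copied from the table above.

```python
# retire_count after each chapter, copied from the Numbers table.
retire_counts = [
    ("Post-Ch274 (BEQL)", 26_985),
    ("Post-Ch275 (SD)", 27_006),
    ("Post-Ch276 (DSLL)", 27_016),
    ("Post-Ch277 (BNEL)", 27_017),
]

# Difference between each chapter and the one before it.
deltas = [
    later[1] - earlier[1]
    for earlier, later in zip(retire_counts, retire_counts[1:])
]

for (name, _), delta in zip(retire_counts[1:], deltas):
    print(f"{name}: +{delta} retires")
```

The last entry is the 1-retire delta: only the BNEL itself retired before the next trap.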
## What landed
### RTL — surgical edits in `ee_core_stub.sv`
1. `localparam OP_BNEL = 6'h15` alongside `OP_BNE`/`OP_BEQL`.
2. `is_bnel` decode signal.
3. Added `is_bnel` to the `is_branch` group.
4. Extended `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
5. **Generalized the squash signal**: renamed `is_beql_squash`
to `is_branch_likely_squash`. A branch-likely instruction squashes
its delay slot on the NOT-TAKEN condition, so the signal now covers
BEQL (squash on `rs != rt`) and BNEL (squash on `rs == rt`):
```sv
assign is_branch_likely_squash =
       (is_beql && (rs_val != rt_val))    // Ch274: BEQL not-taken
    || (is_bnel && (rs_val == rt_val));   // Ch277: BNEL not-taken
```
`retire_advance` updated to reference the new name. Adding
BLEZL/BGTZL/REGIMM-likely later is now a one-line OR-extension.
6. Added `!is_bnel` to the `is_nop_class` allow-list.
About 6 LOC of real change. Pure pattern-reuse from Ch274.
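As a cross-check on the squash logic, here is a minimal Python reference model of branch-likely behaviour. It is a sketch, not code from the repo: `branch_likely` and `next_retire_pc` are hypothetical helper names, and the model covers only BEQL/BNEL as described above.

```python
from typing import Tuple


def branch_likely(op: str, rs_val: int, rt_val: int) -> Tuple[bool, bool]:
    """Return (taken, squash_delay_slot) for a BEQL or BNEL."""
    if op == "BEQL":
        taken = rs_val == rt_val
    elif op == "BNEL":
        taken = rs_val != rt_val
    else:
        raise ValueError(f"not a modelled branch-likely op: {op}")
    # Branch-likely nullifies the delay slot exactly when not taken.
    return taken, not taken


def next_retire_pc(pc: int, squash: bool) -> int:
    """PC of the next instruction to retire after the branch at `pc`."""
    # Squashed: the delay slot at pc+4 is skipped, so pc+8 retires next.
    # Otherwise the delay slot at pc+4 retires next.
    return pc + 8 if squash else pc + 4


taken, squash = branch_likely("BNEL", 3, 3)   # equal operands: not taken
print(hex(next_retire_pc(0x00112C7C, squash)))
```

With equal operands the model reports not-taken plus squash, and the next retiring PC is `0x112c84`, matching the 0xC7C → 0xC84 step seen in the qbert run.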
### Focused TB — `tb_ee_core_bnel.sv`
Three cases mirroring `tb_ee_core_beql`:
1. **BNEL TAKEN** (`$t0 = 5`, `$t1 = 7`, differ → taken): branch
reaches target; delay slot executes (writes a sentinel into
`$t5`). Cross-check: `$t6 = 0xCAFE` at target.
2. **BNEL NOT-TAKEN** (`$t2 = $t3 = 3`, equal → squash): delay
slot squashed. Inline BNE chain verifies `$t5` stays at
`0xBEEF0000` (the OR-INTO probe didn't execute). `$t7 = 0x2222`
at PC+8.
3. **BNE NOT-TAKEN cross-check** (same operands): plain BNE's
delay slot DOES execute → `$t5 = 0xBABE0CAB`. Proves BNEL
differs.
Result: `retired=21 halt=1 trap=0 pc=0xbfc00158 errors=0 PASS`.
### Makefile + regression
- `tb_ee_core_bnel` target.
- Added to both PHONY list and `run:` master.
- Regression: 164 → **165**.
## Recommendation for Codex's Ch278 — PCPYLD (MMI2)
**`pcpyld $t2, $t1, $t1`** at PC `0x00112C84`, instr `0x71295389`.
Decoded:
- opcode `0x1C` (MMI prefix)
- funct `0x09` (MMI2 sub-group selector)
- sa `0x0E` (PCPYLD sub-instruction)
- rs `9` (`$t1`), rt `9` (`$t1`), rd `10` (`$t2`)
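The field split can be checked mechanically. The sketch below assumes only the standard MIPS R-type layout (opcode 31:26, rs 25:21, rt 20:16, rd 15:11, sa 10:6, funct 5:0); `decode_rtype` is an illustrative helper, not something from the repo.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its standard fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


fields = decode_rtype(0x71295389)
for name, value in fields.items():
    print(f"{name:6s} = {value:#04x}")
```

For `0x71295389` this yields opcode `0x1C`, funct `0x09`, sa `0x0E`, rd 10, and rs = rt = 9.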
PCPYLD architectural semantics (R5900 EE, 128-bit MMI):
```
rd[127:64] = rs[63:0] // upper 64 of rd = lower 64 of rs
rd[63:0] = rt[63:0] // lower 64 of rd = lower 64 of rt
```
For our **32-bit register model**:
- We can't represent `rd[127:64]` (no upper bits).
- `rd[63:0] = rt[63:0]` collapses to `$rd[31:0] = $rt[31:0]`
(lower 32 bits).
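To make the approximation concrete, the following Python sketch puts the full 128-bit semantics next to the 32-bit collapse. Function names are hypothetical, and the 32-bit variant reflects the model described above rather than any existing RTL.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def pcpyld_128(rs: int, rt: int) -> int:
    """Architectural PCPYLD: rd[127:64] = rs[63:0], rd[63:0] = rt[63:0]."""
    return ((rs & MASK64) << 64) | (rt & MASK64)


def pcpyld_32(rs: int, rt: int) -> int:
    """32-bit register model: only rt[31:0] survives; rs is not observable."""
    return rt & MASK32


rs, rt = 0x1111_2222_3333_4444, 0x8080_8080_8080_8080
full = pcpyld_128(rs, rt)
approx = pcpyld_32(rs, rt)
print(f"{full:#034x}")
print(f"{approx:#010x}")
```

The approximation agrees with the architectural result on bits 31:0 and drops everything above, which is the limitation the RTL comment should call out.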
**Minimal Ch278 scope**:
1. Decode the MMI2/PCPYLD path: opcode `0x1C` + funct `0x09` +
sa `0x0E` → set `is_pcpyld`.
2. Add to `is_rtype_alu` group.
3. In `rtype_alu_wb`: `else if (is_pcpyld) rtype_alu_wb = rt_val;`
(low 32 bits of $rt → $rd).
4. Add `!is_pcpyld` to `is_nop_class` allow-list.
Document the approximation explicitly in the RTL: upper bits of
$rd (which would carry $rs's lower 64 in a real EE) are not
modelled. For qbert's specific call pattern at this PC, the
data being shuffled is likely 128-bit packed bytes for a
strlen-style byte-walker (`$a0 = 0x80808080` is the classic
"detect high bit per byte" mask); the **low 32 bits** are the
relevant observable.
**Important Codex caution**: do NOT NOP-class the entire MMI
opcode (`0x1C`). MMI has ~80 sub-instructions (MMI0/MMI1/MMI2/
MMI3 sub-tables); some are real data movement (PCPYLD, PCPYUD,
PCPYH), some are arithmetic (PADDB, PSUBB, PMULTW), some are
SIMD compares (PCEQB, PCEQH). Each needs its own decode arm or
careful approximation. The qbert track is fine with one
sub-instruction per chapter — same incremental cadence we've
maintained throughout.
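One way to picture the caution is as an allow-list keyed on the sub-instruction rather than on the opcode. This Python sketch is illustrative only: the table and `classify_mmi` are hypothetical, and keying every entry on (funct, sa) is a simplification, since sa only selects a sub-instruction when funct names one of the MMI0 to MMI3 sub-groups.

```python
OP_MMI = 0x1C
FUNCT_MMI2 = 0x09
SA_PCPYLD = 0x0E

# Only sub-instructions with a real decode arm get an entry.
SUPPORTED_MMI = {
    (FUNCT_MMI2, SA_PCPYLD): "PCPYLD",
}


def classify_mmi(instr: int) -> str:
    """Name a supported MMI sub-instruction, or report a trap."""
    opcode = (instr >> 26) & 0x3F
    if opcode != OP_MMI:
        return "not_mmi"
    funct = instr & 0x3F
    sa = (instr >> 6) & 0x1F
    # Anything not explicitly listed traps instead of silently retiring.
    return SUPPORTED_MMI.get((funct, sa), "trap_unsupported")


print(classify_mmi(0x71295389))  # the qbert PCPYLD word
print(classify_mmi(0x71290009))  # same opcode/funct, sa = 0
```

An unlisted sub-instruction keeps surfacing as the next blocker, which preserves the one-sub-instruction-per-chapter cadence.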
**Likely follow-ons** after PCPYLD: any other MMI op qbert's
byte-walker uses. Common candidates given the `0x80808080`
sentinel: **PCEQB** (parallel compare equal byte, in the MMI1
sub-group) and **PMFHL** (parallel move from HI/LO).
## Files changed
- `rtl/ee/ee_core_stub.sv` — 6 surgical edits.
- `sim/tb/integration/tb_ee_core_bnel.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Regression
In flight; expected **165/165**.
## Pattern review
Seven qbert chapters (Ch271 to Ch277). The qbert-driven track keeps
producing one chapter per blocker at sub-half-day cadence:
| Chapter | Blocker cleared | retire_count |
|---------|-----------------|--------------|
| Ch271 | SQ (init) | 12 → 26,958 |
| Ch272 | DADDU | → 26,960 |
| Ch273 | SYSCALL HLE | → 26,980 |
| Ch274 | BEQL | → 26,985 |
| Ch275 | SD | → 27,006 |
| Ch276 | DSLL | → 27,016 |
| **Ch277** | **BNEL** | **→ 27,017** |
The MMI surface (PCPYLD and likely siblings) will broaden the
opcode count quickly — that's expected when a real program
starts using SIMD-style operations for stdlib-class work.