# Ch277 closeout: BNEL squash-on-not-taken; qbert hits MMI (PCPYLD) one instruction later

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C84 instr=0x71295389)`:
opcode `0x1C` (R5900 EE **MMI**), funct `0x09` (MMI2 sub-group),
sa `0x0E` = **PCPYLD** (Parallel Copy Lower Doubleword). qbert
ran the BNEL correctly (squashed not-taken: PC went 0xC7C →
0xC84 = +8 bytes, confirming the squash path), then trapped on
the very next instruction, an MMI/PCPYLD.

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| Post-Ch276 (DSLL) | BNEL at 0x00112C7C | 27,016 |
| **Post-Ch277 (BNEL)** | **PCPYLD at 0x00112C84** | **27,017** |

1-retire delta: BNEL itself retired (via the squash path), then
PCPYLD trapped before retiring.

## What landed

### RTL: surgical edits in `ee_core_stub.sv`

1. `localparam OP_BNEL = 6'h15` alongside `OP_BNE`/`OP_BEQL`.
2. `is_bnel` decode signal.
3. Added `is_bnel` to the `is_branch` group.
4. Extended `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
5. **Generalized the squash signal**: renamed `is_beql_squash`
   to `is_branch_likely_squash`. A branch-likely squashes its
   delay slot on the NOT-TAKEN condition, so the signal now covers
   BEQL (squash on `rs != rt`) and BNEL (squash on `rs == rt`):

   ```sv
   assign is_branch_likely_squash =
          (is_beql && (rs_val != rt_val))   // Ch274: BEQL not-taken
       || (is_bnel && (rs_val == rt_val));  // Ch277: BNEL not-taken
   ```

   `retire_advance` updated to reference the new name. Adding
   BLEZL/BGTZL/REGIMM-likely later is now a one-line OR-extension.
6. Added `!is_bnel` to the `is_nop_class` allow-list.

About 6 LOC of real change. Pure pattern-reuse from Ch274.

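The squash rule can be cross-checked against a small software model. The Python sketch below is not the RTL: `branch_outcome` and `retire_trace` are hypothetical helper names, the branch target is a placeholder, and the opcode values other than `OP_BNEL` are the standard MIPS encodings rather than values quoted from `ee_core_stub.sv`.

```python
# Standard MIPS primary opcodes; OP_BNEL matches the localparam listed above.
OP_BEQ, OP_BNE = 0x04, 0x05
OP_BEQL, OP_BNEL = 0x14, 0x15


def branch_outcome(opcode, rs_val, rt_val):
    """Return (taken, squash) for a two-register compare branch."""
    equal = rs_val == rt_val
    if opcode in (OP_BEQ, OP_BEQL):
        taken = equal
    elif opcode in (OP_BNE, OP_BNEL):
        taken = not equal
    else:
        raise ValueError(f"not a modelled branch opcode: {opcode:#x}")
    likely = opcode in (OP_BEQL, OP_BNEL)
    # Branch-likely squashes the delay slot only when the branch is not taken.
    return taken, likely and not taken


def retire_trace(pc, opcode, rs_val, rt_val, target):
    """Return (retired_pcs, next_pc) for the branch plus its delay slot."""
    taken, squash = branch_outcome(opcode, rs_val, rt_val)
    retired = [pc] if squash else [pc, pc + 4]
    next_pc = target if taken else pc + 8
    return retired, next_pc


# Equal operands at the qbert BNEL site; the target value is a placeholder.
retired, next_pc = retire_trace(0x00112C7C, OP_BNEL, 3, 3, target=0x00112C00)
print([hex(p) for p in retired], hex(next_pc))
```

With equal operands only the branch itself retires and fetch resumes at PC+8, which is the 0xC7C → 0xC84 step seen in the qbert run. A plain BNE with the same operands retires both the branch and its delay slot.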
### Focused TB: `tb_ee_core_bnel.sv`

Three cases mirroring `tb_ee_core_beql`:

1. **BNEL TAKEN** (`$t0 = 5`, `$t1 = 7`, differ → taken): branch
   reaches target; delay slot executes (writes a sentinel into
   `$t5`). Cross-check: `$t6 = 0xCAFE` at target.
2. **BNEL NOT-TAKEN** (`$t2 = $t3 = 3`, equal → squash): delay
   slot squashed. Inline BNE chain verifies `$t5` stays at
   `0xBEEF0000` (the OR-INTO probe didn't execute). `$t7 = 0x2222`
   at PC+8.
3. **BNE NOT-TAKEN cross-check** (same operands): plain BNE's
   delay slot DOES execute → `$t5 = 0xBABE0CAB`. Proves BNEL
   differs.

Result: `retired=21 halt=1 trap=0 pc=0xbfc00158 errors=0 PASS`.

### Makefile + regression

- `tb_ee_core_bnel` target.
- Added to both PHONY list and `run:` master.
- Regression: 164 → **165**.

## Recommendation for Codex's Ch278: PCPYLD (MMI2)

**`pcpyld $t2, $t1, $t1`** at PC `0x00112C84`, instr `0x71295389`.

Decoded:
- opcode `0x1C` (MMI prefix)
- funct `0x09` (MMI2 sub-group selector)
- sa `0x0E` (PCPYLD sub-instruction)
- rs `9` (`$t1`), rt `9` (`$t1`), rd `10` (`$t2`)

Because rs and rt name the same register, this instance copies the
lower doubleword of `$t1` into both halves of `$t2`.

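The field split can be reproduced with a few shifts and masks. A minimal Python sketch follows; `decode_fields` is a hypothetical helper, and the bit positions are the standard MIPS R-type layout that the MMI encodings reuse. For `0x71295389` both rs and rt come out as 9.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11
        "sa": (instr >> 6) & 0x1F,       # bits 10:6
        "funct": instr & 0x3F,           # bits 5:0
    }


fields = decode_fields(0x71295389)
for name, value in fields.items():
    print(f"{name:6s} = 0x{value:02X}")
```

Re-encoding the fields (`0x1C << 26 | 9 << 21 | 9 << 16 | 10 << 11 | 0x0E << 6 | 0x09`) gives back `0x71295389`, which is a quick way to sanity-check a hand decode.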
PCPYLD architectural semantics (R5900 EE, 128-bit MMI):

```
rd[127:64] = rs[63:0]    // upper 64 of rd = lower 64 of rs
rd[63:0]   = rt[63:0]    // lower 64 of rd = lower 64 of rt
```

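As a reference point for a future testbench, the full-width behaviour can be modelled with Python's arbitrary-precision integers. This is a sketch of the semantics stated above, not code from the repository; `pcpyld` and the operand values are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def pcpyld(rs, rt):
    """Return the 128-bit rd: rd[127:64] = rs[63:0], rd[63:0] = rt[63:0]."""
    return ((rs & MASK64) << 64) | (rt & MASK64)


# 128-bit operands; the upper halves are deliberately different so it is
# visible that PCPYLD ignores them.
rs = 0x1111111111111111_AAAAAAAAAAAAAAAA
rt = 0x2222222222222222_BBBBBBBBBBBBBBBB
rd = pcpyld(rs, rt)
print(f"{rd:032X}")
```

When rs and rt are the same register, the result is that register's lower doubleword repeated in both halves.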
For our **32-bit register model**:
- We can't represent `rd[127:64]` (no upper bits).
- `rd[63:0] = rt[63:0]` collapses to `$rd[31:0] = $rt[31:0]`
  (lower 32 bits).

**Minimal Ch278 scope**:
1. Decode the MMI2/PCPYLD path: opcode `0x1C` + funct `0x09` +
   sa `0x0E` → set `is_pcpyld`.
2. Add to `is_rtype_alu` group.
3. In `rtype_alu_wb`: `else if (is_pcpyld) rtype_alu_wb = rt_val;`
   (low 32 bits of `$rt` → `$rd`).
4. Add `!is_pcpyld` to `is_nop_class` allow-list.

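The approximation in step 3 can be compared against the full-width result to make explicit what is kept and what is dropped. The sketch below is illustrative only: `pcpyld_full` and `pcpyld_approx32` are hypothetical names, and the operand values are arbitrary.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def pcpyld_full(rs, rt):
    """128-bit architectural result."""
    return ((rs & MASK64) << 64) | (rt & MASK64)


def pcpyld_approx32(rt_val):
    """What the proposed rtype_alu_wb arm writes: the 32-bit rt value."""
    return rt_val & MASK32


rs = 0x0123456789ABCDEF
rt = 0x8080808080808080
full = pcpyld_full(rs, rt)
approx = pcpyld_approx32(rt & MASK32)  # the 32-bit model only holds rt[31:0]

# The low 32 bits agree; the remaining 96 bits are simply not modelled.
assert approx == full & MASK32
print(f"kept: 0x{approx:08X}  dropped: 0x{full >> 32:024X}")
```

Anything downstream that reads bits 32 and above of `$rd` would see a mismatch, which is why the approximation is worth documenting in the RTL.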
Document the approximation explicitly in the RTL: upper bits of
`$rd` (which would carry `$rs`'s lower 64 in a real EE) are not
modelled. For qbert's specific call pattern at this PC, the
data being shuffled is likely 128-bit packed bytes for a
strlen-style byte-walker (`$a0 = 0x80808080` is the classic
"detect high bit per byte" mask); the **low 32 bits** are the
relevant observable.

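For context, this is the word-at-a-time zero-byte test that a `0x80808080` mask usually belongs to. Whether qbert's routine uses exactly this form is an assumption; the sketch only shows why the mask suggests a strlen-style walker, and `has_zero_byte` is a hypothetical name.

```python
ONES = 0x01010101   # 0x01 in every byte
HIGHS = 0x80808080  # high bit of every byte
MASK32 = 0xFFFFFFFF


def has_zero_byte(word):
    """True if any of the four bytes in a 32-bit word is 0x00."""
    word &= MASK32
    # Subtracting 1 from each byte borrows into the high bit only for a
    # 0x00 byte (or a byte that already had its high bit set, which the
    # ~word term filters out).
    return ((word - ONES) & ~word & HIGHS) != 0


print(has_zero_byte(0x41424300))  # "ABC\0" packed into one word
print(has_zero_byte(0x41424344))  # "ABCD", no terminator yet
```

A 128-bit MMI version would apply the same idea to sixteen bytes at once, which is consistent with PCPYLD being used to widen a 64-bit constant to 128 bits.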
**Important Codex caution**: do NOT NOP-class the entire MMI
opcode (`0x1C`). MMI has ~80 sub-instructions (MMI0/MMI1/MMI2/
MMI3 sub-tables); some are real data movement (PCPYLD, PCPYUD,
PCPYH), some are arithmetic (PADDB, PSUBB, PMULTW), some are
SIMD compares (PCEQB, PCEQH). Each needs its own decode arm or
careful approximation. The qbert track is fine with one
sub-instruction per chapter, the same incremental cadence we've
maintained throughout.

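One way to picture the caution is a decode table keyed on the sub-fields, where anything not explicitly listed still traps. The Python sketch below is a model of that policy, not the RTL; `SUPPORTED_MMI`, `UnsupportedOpcode`, and `classify_mmi` are hypothetical names, and only PCPYLD is listed because it is the only MMI encoding decoded in this note.

```python
MMI_OPCODE = 0x1C

# (funct, sa) pairs with a dedicated decode arm. Keying on sa is only
# meaningful for the MMI0/MMI1/MMI2/MMI3 sub-table functs.
SUPPORTED_MMI = {
    (0x09, 0x0E): "PCPYLD",  # MMI2 / PCPYLD
}


class UnsupportedOpcode(Exception):
    """Raised instead of silently treating an unknown MMI op as a NOP."""


def classify_mmi(instr):
    """Return the mnemonic for a supported MMI instruction, else trap."""
    if (instr >> 26) & 0x3F != MMI_OPCODE:
        raise ValueError("not an MMI-prefixed instruction")
    funct = instr & 0x3F
    sa = (instr >> 6) & 0x1F
    name = SUPPORTED_MMI.get((funct, sa))
    if name is None:
        raise UnsupportedOpcode(f"MMI funct=0x{funct:02X} sa=0x{sa:02X}")
    return name


print(classify_mmi(0x71295389))
```

Each later chapter would add one entry, so an unsupported sub-instruction keeps surfacing as a trap at a known PC rather than as silent data corruption.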
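**Likely follow-ons** after PCPYLD: any other MMI op qbert's
byte-walker uses. Common candidates given the `0x80808080`
sentinel: **PCEQB** (parallel compare equal byte) and **PMFHL**
(parallel move from HI/LO). Neither is in the MMI2 sub-table
(PCEQB sits in MMI1; PMFHL is selected directly by the MMI funct
field), so each will need its own decode path.
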
## Files changed

- `rtl/ee/ee_core_stub.sv`: 6 surgical edits.
- `sim/tb/integration/tb_ee_core_bnel.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **165/165**.

## Pattern review

Seven qbert chapters (Ch271–Ch277). The qbert-driven track keeps
producing one chapter per blocker at sub-half-day cadence:

| Chapter | Blocker | retire_count |
|---------|---------|--------------|
| Ch271 SQ | (init) | 12 → 26,958 |
| Ch272 DADDU | | → 26,960 |
| Ch273 SYSCALL HLE | | → 26,980 |
| Ch274 BEQL | | → 26,985 |
| Ch275 SD | | → 27,006 |
| Ch276 DSLL | | → 27,016 |
| **Ch277 BNEL** | | **→ 27,017** |

The MMI surface (PCPYLD and likely siblings) will broaden the
opcode count quickly; that's expected when a real program
starts using SIMD-style operations for stdlib-class work.

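The per-chapter progress implied by the table can be read off as retire-count deltas. A small sketch, with the counts copied from the table above (the variable names are arbitrary):

```python
# Retire count reached after each chapter, in order.
retire_after = [
    ("Ch271 SQ", 26958),
    ("Ch272 DADDU", 26960),
    ("Ch273 SYSCALL HLE", 26980),
    ("Ch274 BEQL", 26985),
    ("Ch275 SD", 27006),
    ("Ch276 DSLL", 27016),
    ("Ch277 BNEL", 27017),
]

# How many additional instructions each chapter unlocked over the previous one.
deltas = [
    (name, count - prev)
    for (_, prev), (name, count) in zip(retire_after, retire_after[1:])
]
for name, gained in deltas:
    print(f"{name:18s} +{gained}")
```

After the initial jump in Ch271, every chapter advances the run by at most a couple of dozen retires before the next unsupported opcode appears.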