retroDE_ps2/docs/ch276_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch276 closeout — DSLL as SLL low-32-bit; qbert progresses 10 retires, next blocker is BNEL
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C7C instr=0x54400019)`
**BNEL** (Branch on Not Equal Likely), MIPS-II opcode 0x15.
Exactly the follow-on Codex predicted in the Ch274 closeout:
*"Likely follow-on after BEQL: BNEL."*
## Numbers
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| **Post-Ch276 (DSLL)** | **BNEL at 0x00112C7C** | **27,016** |
## What landed
### RTL — 4 surgical edits in `ee_core_stub.sv`
1. `localparam FUNC_DSLL = 6'h38` alongside `FUNC_SLL`.
2. `is_dsll` logic decl + `assign is_dsll = is_special && (func == FUNC_DSLL)`.
3. Added `is_dsll` to the `is_rtype_alu` group.
4. Added `is_dsll` to the `is_sll` arm of `rtype_alu_wb`:
`else if (is_sll || is_dsll) rtype_alu_wb = rt_val << shamt`.
The arm reuses SLL's writeback path because for any valid
`sa < 32` the low 32 bits of DSLL and SLL are identical. About
4 LOC of real change — mirrors Ch272 DADDU's "implement
64-bit opcode as 32-bit equivalent" pattern.
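The low-32 equivalence can be checked with a small Python reference model. This is a sketch, not the RTL: `dsll64`, `sll32`, and `low32_matches` are hypothetical helper names, and the sample register values are arbitrary.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def dsll64(rt_val64: int, sa: int) -> int:
    """Architectural DSLL: 64-bit logical left shift by the 5-bit sa field."""
    return (rt_val64 << sa) & MASK64


def sll32(rt_val32: int, sa: int) -> int:
    """32-bit model used by the stub: shift, then keep bits 31:0."""
    return (rt_val32 << sa) & MASK32


def low32_matches(rt_val64: int) -> bool:
    """True if DSLL and the 32-bit shift agree on bits 31:0 for every sa."""
    return all(
        dsll64(rt_val64, sa) & MASK32 == sll32(rt_val64 & MASK32, sa)
        for sa in range(32)  # sa is a 5-bit field, so 0..31
    )


# Arbitrary samples, including values with nonzero upper 32 bits.
samples = [0x0, 0x1234, 0x80000001, 0xABCD1234, 0xDEADBEEF80000001, MASK64]
assert all(low32_matches(v) for v in samples)
```

Only bits 63:32 differ between the two models: `dsll64(0x80000001, 1)` is `0x100000002`, while the 32-bit model keeps `0x00000002`.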
### Focused TB — `tb_ee_core_dsll.sv`
Four cases:
1. **Exact qbert encoding**: `dsll $t1, $t1, 16` (rt=rd=9, sa=16).
Built via `enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)` and
asserted to equal `0x00094C38` (the literal qbert instruction).
With `$t1 = 0x1234` → `$t1 = 0x12340000`.
2. **Low-bit shift**: `dsll $t2, $t3, 1` with `$t3 = 0x40000001`
→ `$t2 = 0x80000002`.
3. **Wrap-out (low-32 truncation)**: `dsll $t4, $t5, 1` with
`$t5 = 0x80000001` → `$t4 = 0x00000002`. Proves bit 31 falls
off in our 32-bit model (in a faithful 64-bit model it would
move to bit 32; our model has nowhere to put it).
4. **sa=0 identity**: `dsll $t6, $t7, 0` with `$t7 = 0xABCD1234`
→ `$t6 = 0xABCD1234`.
Result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.
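The four cases can be mirrored in a few lines of Python as a cross-check on the expected values. This is a sketch under assumptions: the Python `enc_rtype` assumes the argument order `(op, rs, rt, rd, sa, func)`, which the TB call suggests but does not prove (rt and rd are both 9), and `dsll_low32` is a hypothetical stand-in for the stub's writeback value.

```python
OP_SPCL = 0x00    # SPECIAL major opcode
FUNC_DSLL = 0x38  # DSLL function field


def enc_rtype(op, rs, rt, rd, sa, func):
    """Pack a MIPS R-type instruction word (assumed argument order)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | func


def dsll_low32(rt_val, sa):
    """DSLL as modelled by the stub: left shift, low 32 bits only."""
    return (rt_val << sa) & 0xFFFFFFFF


# Case 1 encoding: dsll $t1, $t1, 16 (rt = rd = 9, sa = 16).
assert enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL) == 0x00094C38

# (input, sa, expected) for the four TB cases.
cases = [
    (0x00001234, 16, 0x12340000),  # exact qbert encoding
    (0x40000001, 1, 0x80000002),   # low-bit shift
    (0x80000001, 1, 0x00000002),   # wrap-out: bit 31 is dropped
    (0xABCD1234, 0, 0xABCD1234),   # sa = 0 identity
]
for rt_val, sa, expected in cases:
    assert dsll_low32(rt_val, sa) == expected
```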
### Makefile + regression
- `tb_ee_core_dsll` target.
- Added to both PHONY list and `run:` master.
- Regression: 163 → **164**.
## qbert progression detail
The delta from Ch275 is 10 retires (27,006 → 27,016). DSLL retires
at 0x00112C54, then qbert retires 9 more instructions before
trapping on BNEL at 0x00112C7C. That is 10 retired PCs
(0x00112C54 through 0x00112C78) over 40 bytes (0x28), which
matches the retire delta exactly, so the block is tight and
straight-line with no taken branches in between.
Likely a switch-statement entry or function-body case dispatcher.
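The straight-line claim is just address arithmetic, shown here as a short Python check using only the PCs and retire counts reported above (the constant names are illustrative).

```python
DSLL_PC = 0x00112C54
BNEL_PC = 0x00112C7C
INSTR_BYTES = 4  # fixed-width MIPS instructions

span_bytes = BNEL_PC - DSLL_PC
retired_between = span_bytes // INSTR_BYTES  # PCs from DSLL up to, not including, BNEL
retire_delta = 27_016 - 27_006               # Post-Ch276 minus Post-Ch275

assert span_bytes == 0x28
assert retired_between == retire_delta == 10
```

If a branch had been taken inside the block, the retire delta and the PC span would not line up this way.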
`$a0 = 0x80808080` at the trap is interesting: it is the
canonical per-byte high-bit mask (the 32-bit complement of
`0x7F7F7F7F`), often used by stdlib string ops to detect zero
or high-bit bytes in parallel. qbert may be calling something
like `strlen` or `memchr` internally.
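For reference, this is the well-known word-at-a-time zero-byte test that uses this mask, sketched in Python. Whether qbert's code actually does this is speculation; the function names are hypothetical.

```python
ONES = 0x01010101   # 0x01 broadcast to every byte
HIGHS = 0x80808080  # 0x80 broadcast to every byte (the value seen in $a0)


def has_zero_byte(word: int) -> bool:
    """True if any of the four bytes of a 32-bit word is 0x00."""
    # Subtracting 1 from each byte borrows into bit 7 only for a zero byte
    # (or a byte >= 0x81); `& ~word` rejects bytes whose bit 7 was already set.
    return ((word - ONES) & ~word & HIGHS) != 0


def has_high_bit_byte(word: int) -> bool:
    """True if any byte has bit 7 set (for example non-ASCII data)."""
    return (word & HIGHS) != 0


assert not has_zero_byte(0x41424344)  # "ABCD": no terminator
assert has_zero_byte(0x41420044)      # one zero byte
assert has_zero_byte(0x00000000)
assert not has_zero_byte(0x80808080)  # high-bit bytes are not zero bytes
assert has_high_bit_byte(0x41C24344)
```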
## Recommendation for Codex's Ch277 — BNEL
**`bnel $v0, $0, +25*4`** at PC `0x00112C7C`, opcode 0x15 — the
exact follow-on Codex predicted from BEQL.
Same shape as Ch274 BEQL:
- Decode opcode `6'h15` as BNEL.
- BNEL TAKEN when `rs != rt` (same as BNE).
- BNEL NOT-TAKEN: squash the delay slot.
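The trapped word can be decoded by hand to confirm the fields. A minimal Python sketch follows; `decode_itype` and `branch_target` are hypothetical helpers, and the target address is derived from the standard MIPS rule (delay-slot PC plus the sign-extended offset times 4), not taken from a trace.

```python
def decode_itype(instr: int) -> dict:
    """Split a MIPS I-type word into op, rs, rt, and a signed 16-bit immediate."""
    imm = instr & 0xFFFF
    if imm & 0x8000:  # sign-extend the branch offset
        imm -= 0x10000
    return {
        "op": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "imm": imm,
    }


def branch_target(pc: int, imm: int) -> int:
    """Taken target: address of the delay slot plus (offset << 2)."""
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


fields = decode_itype(0x54400019)
assert fields == {"op": 0x15, "rs": 2, "rt": 0, "imm": 25}  # bnel $v0, $0, +25*4
assert branch_target(0x00112C7C, fields["imm"]) == 0x00112CE4
```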
Reuse the existing Ch274 `is_beql_squash` infrastructure:
1. `localparam OP_BNEL = 6'h15`.
2. `is_bnel` decode signal.
3. Add `is_bnel` to `is_branch` group.
4. Extend `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
5. Replace `is_beql_squash` with a more general
`is_branch_likely_squash`. The squash fires when a likely
branch is NOT taken:
```
// squash the delay slot when a likely branch is not taken
is_branch_likely_squash = (is_beql && (rs_val != rt_val))
|| (is_bnel && (rs_val == rt_val));
```
Update `retire_advance` to use the new name.
6. Add `!is_bnel` to `is_nop_class` allow-list.
Focused TB mirrors `tb_ee_core_beql`: BNEL taken (delay fires),
BNEL not-taken (delay squashed), BNE cross-check (delay always
fires). ~5 LOC + the TB.
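The three TB cases reduce to a small truth table. The Python model below is a sketch of the architectural behaviour, not of the RTL; `branch_outcome` is a hypothetical name.

```python
def branch_outcome(kind: str, rs_val: int, rt_val: int) -> tuple:
    """Return (taken, delay_slot_executes) for BEQ/BNE and their likely forms."""
    if kind in ("beq", "beql"):
        taken = rs_val == rt_val
    elif kind in ("bne", "bnel"):
        taken = rs_val != rt_val
    else:
        raise ValueError(f"unsupported branch kind: {kind}")
    likely = kind.endswith("l")
    # Likely branches squash the delay slot when not taken;
    # ordinary branches always execute it.
    delay_slot_executes = taken if likely else True
    return taken, delay_slot_executes


assert branch_outcome("bnel", 1, 0) == (True, True)    # BNEL taken: delay fires
assert branch_outcome("bnel", 0, 0) == (False, False)  # BNEL not taken: squashed
assert branch_outcome("bne", 0, 0) == (False, True)    # BNE cross-check
assert branch_outcome("bne", 1, 0) == (True, True)
```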
Likely follow-ons after BNEL: **BLEZL/BGTZL** (0x16/0x17) and
**REGIMM-likely** family (BLTZL/BGEZL at REGIMM rt=0x02/0x03,
BLTZALL/BGEZALL at rt=0x12/0x13). Same `squash` mechanism for
all of them. Codex may want to fold multiple branch-likely
variants into one chapter now that the pattern is well-locked.
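If the variants are folded into one chapter, the decode side is a pair of small lookup tables. The Python sketch below classifies an instruction word using the encodings listed above, plus BEQL at opcode 0x14; `branch_likely_name` is a hypothetical helper, not an existing function in the repo.

```python
OP_REGIMM = 0x01

# Major-opcode branch-likely instructions.
LIKELY_BY_OP = {0x14: "BEQL", 0x15: "BNEL", 0x16: "BLEZL", 0x17: "BGTZL"}

# REGIMM branch-likely instructions, selected by the rt field.
LIKELY_BY_REGIMM_RT = {
    0x02: "BLTZL",
    0x03: "BGEZL",
    0x12: "BLTZALL",
    0x13: "BGEZALL",
}


def branch_likely_name(instr: int):
    """Return the branch-likely mnemonic for an instruction word, else None."""
    op = (instr >> 26) & 0x3F
    if op == OP_REGIMM:
        rt = (instr >> 16) & 0x1F
        return LIKELY_BY_REGIMM_RT.get(rt)
    return LIKELY_BY_OP.get(op)


assert branch_likely_name(0x54400019) == "BNEL"  # the current qbert blocker
assert branch_likely_name(0x00094C38) is None    # DSLL is not a branch
```

All eight share the same rule: the delay slot is squashed exactly when the branch is not taken, so only the taken condition differs per mnemonic.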
## Files changed
- `rtl/ee/ee_core_stub.sv` — 4 surgical edits (~4 LOC).
- `sim/tb/integration/tb_ee_core_dsll.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Regression
In flight; expected **164/164**.
## Pattern review
Six qbert-driven chapters (Ch271→Ch276):
- Ch271 SQ — 5 RTL edits, 4-beat write
- Ch272 DADDU — 4 RTL edits, ALU low-32
- Ch273 SYSCALL HLE — 2 RTL edits, gated dispatcher
- Ch274 BEQL — 6 RTL edits, branch + squash
- Ch275 SD — 7 RTL edits, 2-beat write (reuses SQ counter)
- **Ch276 DSLL — 4 RTL edits, ALU low-32 (reuses SLL path)**
Edit counts have not shrunk monotonically (Ch273 took 2 edits,
Ch275 took 7), but each chapter leans more on locked-in patterns.
Ch276 is among the smallest at 4 edits and about 4 LOC: pure
pattern-reuse from Ch272 + Ch275.
The qbert track is well-trained, the runner correctly surfaces
the next blocker each time, and the incremental cadence holds.