# Ch276 closeout: DSLL as SLL low-32-bit; qbert progresses 10 retires, next blocker is BNEL

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C7C instr=0x54400019)`, i.e.
**BNEL** (Branch on Not Equal Likely), MIPS-II opcode 0x15.

This is exactly the follow-on Codex predicted in the Ch274 closeout:
*"Likely follow-on after BEQL: BNEL."*

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| **Post-Ch276 (DSLL)** | **BNEL at 0x00112C7C** | **27,016** |

## What landed

### RTL: 4 surgical edits in `ee_core_stub.sv`

1. `localparam FUNC_DSLL = 6'h38` alongside `FUNC_SLL`.
2. `is_dsll` logic declaration + `assign is_dsll = is_special && (func == FUNC_DSLL)`.
3. Added `is_dsll` to the `is_rtype_alu` group.
4. Added `is_dsll` to the `is_sll` arm of `rtype_alu_wb`:
   `else if (is_sll || is_dsll) rtype_alu_wb = rt_val << shamt`.
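
The arm reuses SLL's writeback path for the following reason. The 5-bit `sa` field
always encodes a shift in 0..31, and for every such shift the low
32 bits of DSLL and SLL are identical. A left shift only moves bits
upward, so the low 32 bits of the result depend only on the low
32 bits of the source. Architecturally, the two results differ only in
the upper half: SLL sign-extends, while DSLL keeps the shifted high bits.
Our 32-bit model stores neither. About 4 LOC of real change, mirroring
Ch272 DADDU's "implement 64-bit opcode as 32-bit equivalent" pattern.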
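
To double-check the equivalence argument, here is a small Python reference model. It compares the architectural MIPS64 results of SLL and DSLL against what the 32-bit stub writes back, sweeping every legal `sa`. This is a sketch under our own assumptions. The helper names are hypothetical and not part of the repo, and the model covers only the ALU result, not the stub's datapath.

```python
# Hypothetical reference model: architectural SLL/DSLL vs. the 32-bit stub.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
UPPER32 = MASK64 ^ MASK32  # 0xFFFF_FFFF_0000_0000


def sext32(x: int) -> int:
    """Sign-extend a 32-bit value to 64 bits, as MIPS64 does for 32-bit ops."""
    x &= MASK32
    return x | UPPER32 if x & 0x8000_0000 else x


def sll_mips64(rt: int, sa: int) -> int:
    """Architectural SLL: shift the low word, keep 32 bits, sign-extend."""
    return sext32((rt & MASK32) << sa)


def dsll_mips64(rt: int, sa: int) -> int:
    """Architectural DSLL: shift the full 64-bit register, keep 64 bits."""
    return ((rt & MASK64) << sa) & MASK64


def dsll_stub(rt_low32: int, sa: int) -> int:
    """What the 32-bit stub writes back for DSLL: the SLL expression."""
    return ((rt_low32 & MASK32) << sa) & MASK32


# Exhaustive over the 5-bit sa field, for a few 64-bit source values.
SAMPLES = [0x0, 0x1234, 0x4000_0001, 0x8000_0001, 0xABCD_1234,
           0xFFFF_FFFF_8000_0001, 0x0123_4567_89AB_CDEF]
for rt in SAMPLES:
    for sa in range(32):
        low = dsll_mips64(rt, sa) & MASK32
        assert low == sll_mips64(rt, sa) & MASK32
        assert low == dsll_stub(rt & MASK32, sa)

# The upper halves can differ, which the 32-bit stub never observes.
assert dsll_mips64(0xFFFF_FFFF_8000_0001, 1) == 0xFFFF_FFFF_0000_0002
assert sll_mips64(0xFFFF_FFFF_8000_0001, 1) == 0x2
```

The sweep is cheap because `sa` has only 32 values. The last two asserts show exactly the information the stub discards.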

### Focused TB: `tb_ee_core_dsll.sv`

Four cases:

1. **Exact qbert encoding**: `dsll $t1, $t1, 16` (rt=rd=9, sa=16).
   Built via `enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)` and
   asserted to equal `0x00094C38` (the literal qbert instruction).
   With `$t1 = 0x1234` → `$t1 = 0x12340000`.
2. **Low-bit shift**: `dsll $t2, $t3, 1` with `$t3 = 0x40000001`
   → `$t2 = 0x80000002`.
3. **Wrap-out (low-32 truncation)**: `dsll $t4, $t5, 1` with
   `$t5 = 0x80000001` → `$t4 = 0x00000002`. This confirms that bit 31
   falls off in our 32-bit model. A faithful 64-bit model would move it
   to bit 32, but ours has nowhere to put it.
4. **sa=0 identity**: `dsll $t6, $t7, 0` with `$t7 = 0xABCD1234`
   → `$t6 = 0xABCD1234`.

Result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.
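
The four cases can be reproduced outside the simulator as a quick golden check. This Python sketch assumes `enc_rtype` takes `(op, rs, rt, rd, sa, func)` in the order implied by the TB call above. It is a hypothetical re-creation, not the TB's own helper.

```python
OP_SPCL = 0x00    # SPECIAL major opcode
FUNC_DSLL = 0x38  # DSLL function code


def enc_rtype(op: int, rs: int, rt: int, rd: int, sa: int, func: int) -> int:
    """Pack R-type fields: op[31:26] rs[25:21] rt[20:16] rd[15:11] sa[10:6] func[5:0]."""
    return ((op & 0x3F) << 26 | (rs & 0x1F) << 21 | (rt & 0x1F) << 16
            | (rd & 0x1F) << 11 | (sa & 0x1F) << 6 | (func & 0x3F))


def dsll_low32(rt_val: int, sa: int) -> int:
    """The stub's DSLL result: rt_val << sa, truncated to 32 bits."""
    return (rt_val << sa) & 0xFFFF_FFFF


# (name, source value, sa, expected destination value) per TB case
CASES = [
    ("exact qbert encoding", 0x0000_1234, 16, 0x1234_0000),
    ("low-bit shift",        0x4000_0001,  1, 0x8000_0002),
    ("wrap-out",             0x8000_0001,  1, 0x0000_0002),
    ("sa=0 identity",        0xABCD_1234,  0, 0xABCD_1234),
]

# Case 1 also checks the encoder against the literal qbert word.
assert enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL) == 0x0009_4C38

for name, rt_val, sa, expected in CASES:
    got = dsll_low32(rt_val, sa)
    assert got == expected, f"{name}: got {got:#010x}, expected {expected:#010x}"
```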

### Makefile + regression

- `tb_ee_core_dsll` target.
- Added to both the PHONY list and the `run:` master target.
- Regression: 163 → **164**.

## qbert progression detail

10-retire delta from Ch275 (27,006 → 27,016). DSLL retires at
0x00112C54, then nine more instructions retire (0x00112C58 through
0x00112C78) before the core traps on BNEL at 0x00112C7C. The gap is
40 bytes (0x28), i.e. 10 instruction words, which matches the delta
exactly. That is consistent with a tight straight-line block with no
taken branches in between. Possibly a switch-statement entry or a case
dispatcher inside a function body.
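
The retire arithmetic can be checked directly. This is a minimal sketch, assuming straight-line execution from the DSLL to the BNEL. The numbers suggest that, but the runner output alone does not prove it.

```python
DSLL_PC = 0x0011_2C54   # first instruction unlocked by Ch276
BNEL_PC = 0x0011_2C7C   # next unsupported opcode: traps, does not retire

# Under the straight-line assumption, every word from DSLL_PC up to
# (but not including) BNEL_PC retires exactly once.
retired_pcs = list(range(DSLL_PC, BNEL_PC, 4))

assert BNEL_PC - DSLL_PC == 0x28             # 40 bytes
assert len(retired_pcs) == 10               # DSLL + 9 more
assert 27_016 - 27_006 == len(retired_pcs)  # matches the measured delta
```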

`$a0 = 0x80808080` at the trap is interesting. It is the canonical
byte-broadcast mask (0x80 in every byte, i.e. `~0x7F7F7F7F` as a 32-bit
value), which word-at-a-time string routines use to test four bytes for
zero in parallel. qbert may be calling something like `strlen` or
`memchr` internally.
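
For context, here is the classic zero-byte test that makes 0x80808080 show up in string code. The Python sketch below is illustrative only. We have not disassembled the routine around the trap PC, and `has_zero_byte`/`strlen_wordwise` are hypothetical names, not qbert symbols.

```python
LOW_BYTES = 0x0101_0101   # 0x01 in every byte
HIGH_BITS = 0x8080_8080   # 0x80 in every byte, i.e. ~0x7F7F7F7F as uint32
MASK32 = 0xFFFF_FFFF


def has_zero_byte(word: int) -> bool:
    """Word-at-a-time test: True iff some byte of the 32-bit word is 0x00."""
    return ((word - LOW_BYTES) & ~word & HIGH_BITS & MASK32) != 0


def strlen_wordwise(buf: bytes) -> int:
    """strlen over a buffer, checking 4 bytes per step until a NUL appears."""
    i = 0
    while i + 4 <= len(buf):
        if has_zero_byte(int.from_bytes(buf[i:i + 4], "little")):
            break  # a NUL is somewhere in this word; locate it bytewise
        i += 4
    while i < len(buf) and buf[i] != 0:
        i += 1
    return i


assert HIGH_BITS == ~0x7F7F_7F7F & MASK32
```

The expression flags a word only when it truly contains a zero byte. This is why a routine built on it can skip four bytes per iteration and fall back to a bytewise scan only for the final word.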

## Recommendation for Codex's Ch277: BNEL

**`bnel $v0, $0, +25*4`** at PC `0x00112C7C`, opcode 0x15, the exact
follow-on Codex predicted from BEQL. The offset is relative to the
delay-slot PC, so the taken target is `0x00112C80 + 0x64 = 0x00112CE4`.
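
The field split and target computation for the trapping word can be sketched in Python. The helper names are hypothetical. The only assumption beyond the standard MIPS I-type layout is that the sign-extended word offset is added to the delay-slot PC.

```python
def decode_itype(instr: int) -> dict:
    """Split a MIPS I-type word into fields; imm is sign-extended from 16 bits."""
    imm = instr & 0xFFFF
    if imm & 0x8000:
        imm -= 0x1_0000
    return {
        "op": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "imm": imm,
    }


def branch_target(pc: int, imm: int) -> int:
    """Branch target: delay-slot PC (pc + 4) plus the word offset imm * 4."""
    return (pc + 4 + (imm << 2)) & 0xFFFF_FFFF


fields = decode_itype(0x5440_0019)
assert fields == {"op": 0x15, "rs": 2, "rt": 0, "imm": 25}  # bnel $v0, $zero, +25
assert branch_target(0x0011_2C7C, fields["imm"]) == 0x0011_2CE4
```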

Same shape as Ch274 BEQL:

- Decode opcode `6'h15` as BNEL.
- BNEL taken when `rs != rt` (same condition as BNE): the delay slot executes.
- BNEL not taken: squash the delay slot.

Reuse the existing Ch274 `is_beql_squash` infrastructure:

1. `localparam OP_BNEL = 6'h15`.
2. `is_bnel` decode signal.
3. Add `is_bnel` to the `is_branch` group.
4. Extend `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
5. Replace `is_beql_squash` with a more general
   `is_branch_likely_squash`. The squash fires when a likely branch is
   **not** taken:
   ```systemverilog
   // Squash the delay slot of a branch-likely that is NOT taken.
   assign is_branch_likely_squash = (is_beql && (rs_val != rt_val))
                                 || (is_bnel && (rs_val == rt_val));
   ```
   Update `retire_advance` to use the new name.
6. Add `!is_bnel` to the `is_nop_class` allow-list.

The focused TB mirrors `tb_ee_core_beql` with three checks: BNEL taken (delay slot fires),
BNEL not taken (delay slot squashed), and a BNE cross-check (delay slot
always fires). About 5 LOC plus the TB.
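
Below is a behavioral reference for what the TB should observe, written as a Python sketch. It assumes the 32-bit register model used throughout. `resolve` and `BranchOutcome` are hypothetical names, and `is_branch_likely_squash` mirrors the proposed RTL expression so the two definitions can be cross-checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchOutcome:
    taken: bool          # control transfers to the branch target
    delay_retires: bool  # False: the delay-slot instruction is squashed


# Taken condition for each equality branch (the *L forms are "likely").
CONDITIONS = {
    "beq":  lambda rs, rt: rs == rt,
    "bne":  lambda rs, rt: rs != rt,
    "beql": lambda rs, rt: rs == rt,
    "bnel": lambda rs, rt: rs != rt,
}


def resolve(op: str, rs_val: int, rt_val: int) -> BranchOutcome:
    """Ordinary branches always run the delay slot; likely ones only when taken."""
    taken = CONDITIONS[op](rs_val, rt_val)
    likely = op.endswith("l")
    return BranchOutcome(taken=taken, delay_retires=taken or not likely)


def is_branch_likely_squash(op: str, rs_val: int, rt_val: int) -> bool:
    """Mirror of the proposed RTL term: squash when a likely branch is NOT taken."""
    return ((op == "beql" and rs_val != rt_val)
            or (op == "bnel" and rs_val == rt_val))


# Cross-check both formulations on equal and unequal operands.
for op in CONDITIONS:
    for rs_val, rt_val in ((5, 5), (5, 6)):
        outcome = resolve(op, rs_val, rt_val)
        assert outcome.delay_retires == (not is_branch_likely_squash(op, rs_val, rt_val))
```

The three TB scenarios correspond to `resolve("bnel", a, b)` with `a != b` and with `a == b`, plus `resolve("bne", a, a)`.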

Likely follow-ons after BNEL: **BLEZL/BGTZL** (opcodes 0x16/0x17) and the
**REGIMM-likely** family (BLTZL/BGEZL at REGIMM rt=0x02/0x03,
BLTZALL/BGEZALL at rt=0x12/0x13). The same squash mechanism covers
all of them. Codex may therefore want to fold several branch-likely variants
into one chapter now that the pattern is well-locked.
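
If the family is folded into one chapter, decode can stay table-driven. This Python sketch maps `(opcode, REGIMM rt)` to the taken condition and derives the squash bit from it. Several simplifications are assumed:

- Operands are compared as signed 32-bit values to match the stub's 32-bit registers (the real R5900 compares 64-bit registers).
- The `$ra` write performed by BLTZALL/BGEZALL is not modeled.
- All names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

OP_REGIMM = 0x01
Cond = Callable[[int, int], bool]

# (major opcode, REGIMM rt field or None) -> (mnemonic, taken condition).
# Conditions receive rs/rt already sign-extended from 32 bits.
BRANCH_LIKELY: Dict[Tuple[int, Optional[int]], Tuple[str, Cond]] = {
    (0x14, None):      ("beql",    lambda rs, rt: rs == rt),
    (0x15, None):      ("bnel",    lambda rs, rt: rs != rt),
    (0x16, None):      ("blezl",   lambda rs, rt: rs <= 0),
    (0x17, None):      ("bgtzl",   lambda rs, rt: rs > 0),
    (OP_REGIMM, 0x02): ("bltzl",   lambda rs, rt: rs < 0),
    (OP_REGIMM, 0x03): ("bgezl",   lambda rs, rt: rs >= 0),
    (OP_REGIMM, 0x12): ("bltzall", lambda rs, rt: rs < 0),  # $ra write not modeled
    (OP_REGIMM, 0x13): ("bgezall", lambda rs, rt: rs >= 0),  # $ra write not modeled
}


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFF_FFFF
    return x - (1 << 32) if x & 0x8000_0000 else x


def classify(instr: int) -> Optional[Tuple[str, Cond]]:
    """Return (mnemonic, condition) for a branch-likely word, else None."""
    op = (instr >> 26) & 0x3F
    rt_field = (instr >> 16) & 0x1F
    return BRANCH_LIKELY.get((op, rt_field if op == OP_REGIMM else None))


def likely_squash(instr: int, rs_val: int, rt_val: int) -> Optional[bool]:
    """True if the delay slot must be squashed; None if not a branch-likely."""
    entry = classify(instr)
    if entry is None:
        return None
    _, cond = entry
    return not cond(to_signed32(rs_val), to_signed32(rt_val))
```

One table entry per variant keeps the RTL equivalent to a single "likely and not taken" term, rather than a growing OR of per-opcode cases.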

## Files changed

- `rtl/ee/ee_core_stub.sv`: 4 surgical edits (~4 LOC).
- `sim/tb/integration/tb_ee_core_dsll.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **164/164**.

## Pattern review

Six qbert-driven chapters (Ch271→Ch276):

- Ch271 SQ: 5 RTL edits, 4-beat write
- Ch272 DADDU: 4 RTL edits, ALU low-32
- Ch273 SYSCALL HLE: 2 RTL edits, gated dispatcher
- Ch274 BEQL: 6 RTL edits, branch + squash
- Ch275 SD: 7 RTL edits, 2-beat write (reuses SQ counter)
- **Ch276 DSLL: 4 RTL edits, ALU low-32 (reuses SLL path)**

Edit counts are not monotonic: Ch274 and Ch275 needed 6 and 7 edits
to introduce new mechanisms. Once a pattern is locked in, however, each reuse is
cheap. Ch276 is pure pattern reuse from Ch272 + Ch275, at about 4 LOC.
The qbert loop is working as intended: the runner correctly surfaces the
next blocker each time, and the incremental cadence holds.