# Ch274 closeout — BEQL with squash-on-not-taken; qbert lands in a function prologue, next blocker is SD

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112DAC instr=0xFFBF0020)`, which is
**SD** (Store Doubleword, MIPS-III). qbert passed the C++
constructor walker's BEQL correctly, JAL'd into a function at
PC `0x00112DAC`, and trapped on the very first instruction of
that function: the canonical `sd $ra, 0x20($sp)` register-save
prologue.

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| **Post-Ch274 (BEQL)** | **SD at 0x00112DAC** | **26,985** |

The 5-retire delta covers: BEQL squash → `addiu $v0, $v1, 4` →
`lw $a0, 0($v0)` → `addiu $a1, $v0, 4` → `jal 0x00112DAC` →
first instruction of the called function (SD, traps). The
~77 KB PC jump (0x12CD8 bytes, from the JAL at `0x001000D4` to
`0x00112DAC`) confirms the BEQL squash worked:
qbert's `$a0` was NOT clobbered to 0 by the squashed delay
slot, the LW loaded the real constructor-pointer, and the JAL
dispatched correctly.

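
The deltas in the table and the size of the PC jump can be re-derived with a few lines of Python. This is only a sanity check on the quoted numbers: the dictionary layout and variable names are illustrative and not part of the runner, and the JAL address is the `0x001000D4` call site quoted in the disassembly section of this note.

```python
# Retire counts copied from the table above (names are illustrative).
retire_counts = {
    "Post-Ch271 (SQ)": 26_958,
    "Post-Ch272 (DADDU)": 26_960,
    "Post-Ch273 (SYSCALL HLE)": 26_980,
    "Post-Ch274 (BEQL)": 26_985,
}

# Per-chapter progress: how many more instructions retire after each fix.
counts = list(retire_counts.values())
deltas = [later - earlier for earlier, later in zip(counts, counts[1:])]

# Distance covered by the JAL: call site to callee entry point.
JAL_PC = 0x001000D4
CALLEE_PC = 0x00112DAC
jump_bytes = CALLEE_PC - JAL_PC

print("retire deltas:", deltas)
print("jump distance:", hex(jump_bytes), "=", jump_bytes, "bytes")
```

The last delta is the one this chapter bought; the jump distance is the byte offset between the call site and the trapping prologue.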
## What landed
### RTL — surgical edits in `ee_core_stub.sv`

1. **Opcode**: `localparam OP_BEQL = 6'h14` alongside `OP_BEQ`.
2. **Decode**: `is_beql` signal + `assign is_beql = (opcode == OP_BEQL)`.
3. **Branch logic**: BEQL added to `is_branch` group and to
   `branch_taken` (same `(rs_val == rt_val)` condition as BEQ).
4. **New signal `is_beql_squash`**:
   `is_beql && (rs_val != rt_val)`, the load-bearing case.
5. **`retire_advance`**: when `is_beql_squash` is true,
   `next_pc <= pc + 32'd8` (skip the delay slot directly);
   `new_branch_pending` stays low so no stale target leaks.
   Existing BEQ/BNE/jump path unchanged.
6. **Decoder allow-list**: added `!is_beql` to the `is_nop_class`
   catch-all so BEQL doesn't get strict-trap'd.

About 6 LOC of real change.

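
The branch-likely behaviour described in the edits above can be written down as a small software reference model. This is a minimal sketch in Python, not the RTL in `ee_core_stub.sv`: the function names and the tuple return shape are hypothetical, and the branch target uses the standard MIPS formula (`pc + 4 + (sign_extend(imm16) << 2)`), which this note does not spell out.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Standard MIPS branch target: relative to the delay-slot address."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def beql_outcome(pc: int, rs_val: int, rt_val: int, imm16: int):
    """Return (delay_slot_executes, next_pc_after_slot) for a BEQL at pc."""
    if rs_val == rt_val:
        # Taken: the delay slot at pc+4 executes, then control reaches the target.
        return True, branch_target(pc, imm16)
    # Not taken: the delay slot is squashed and execution resumes at pc+8.
    return False, (pc + 8) & 0xFFFFFFFF


def beq_outcome(pc: int, rs_val: int, rt_val: int, imm16: int):
    """Plain BEQ for contrast: the delay slot executes either way."""
    if rs_val == rt_val:
        return True, branch_target(pc, imm16)
    return True, (pc + 8) & 0xFFFFFFFF


# Not-taken BEQL at qbert's 0x001000C0 (operand values are made up).
print(beql_outcome(0x001000C0, 1, 2, 0x0010))
```

In the not-taken case both branches continue at `pc + 8`; the only observable difference is whether the instruction at `pc + 4` ran, which is why the focused TB has to probe the delay slot's side effect and not just the landing PC.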
### Focused TB — `tb_ee_core_beql.sv`

Three cases per Codex's spec:

1. **BEQL taken** (`$t0 == $t1`): branch reaches target;
   delay slot DOES execute (writes a sentinel into `$t5`).
   Cross-checked by `$t6 = 0xCAFE` at the target.
2. **BEQL not-taken** (`$t2 != $t3`): delay slot squashed.
   `$t7 = 0x2222` at PC+8 proves we landed correctly past the
   squash. **Inline BNE chain verifies `$t5` was NOT clobbered
   by the squashed delay slot** (`$t5` stays at its pre-BEQL
   `0xBEEF0000` value).
3. **BEQ not-taken cross-check** (same operands): plain BEQ's
   delay slot DOES execute, so `$t5` gets `0xCAB` ORed into the
   low 16 bits (`$t5 = 0xBABE0CAB`). Proves BEQL's squash
   differs from BEQ's no-squash behavior.

Encoding gotcha caught during TB authoring: my initial delay
slots used `ori $t5, $0, ...` (clobbers `$t5` regardless of
prior value) instead of `ori $t5, $t5, ...` (ORs into `$t5`,
preserving high bits). The first build FAILED the Case-3 check
with `$t5=0x00000CAB` instead of `0xBABE0CAB`. Fixed by changing
the rs field to RT5 so the delay slot ORs into the existing
value, making both "delay-fired" and "delay-squashed" cases
distinguishable by the high half-word.

Result: `retired=21 halt=1 trap=0 pc=0xbfc00158 errors=0 PASS`.

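
The difference between the overwriting and accumulating probes is easy to see in a few lines of Python. This is an illustrative model of `ori` only, not TB code: `classify_probe` is a hypothetical helper, and the `0xBABE0000` pre-branch marker for Case 3 is inferred from the `0xBABE0CAB` result quoted above, so treat it as an assumption.

```python
def ori(rs_val: int, imm16: int) -> int:
    """MIPS ORI: zero-extend the 16-bit immediate and OR it with rs."""
    return (rs_val | (imm16 & 0xFFFF)) & 0xFFFFFFFF


def classify_probe(before: int, after: int, imm16: int) -> str:
    """Classify what happened to the probe register across the branch.

    Assumes `before` has none of the imm16 bits set, otherwise a fired
    accumulating probe would be indistinguishable from a squashed one.
    """
    if after == before:
        return "squashed"
    if after == ori(before, imm16):
        return "fired"
    return "clobbered"


PROBE_IMM = 0x0CAB

# Case 3 with the fixed probe `ori $t5, $t5, imm`: high half survives.
print(hex(ori(0xBABE0000, PROBE_IMM)))
# Case 3 with the original probe `ori $t5, $0, imm`: high half is lost.
print(hex(ori(0x00000000, PROBE_IMM)))
# Case 2 (BEQL not-taken): $t5 keeps its pre-branch value.
print(classify_probe(0xBEEF0000, 0xBEEF0000, PROBE_IMM))
```

With the accumulating form, the high half-word identifies which case seeded the register and the low half-word shows whether the delay slot ran, so one register carries both facts.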
### Makefile + regression

- `tb_ee_core_beql` target.
- Added to both PHONY list and `run:` master.
- Regression: 161 → **162**.

## qbert disassembly around the new blocker (PC 0x00112DAC)

The JAL at `0x001000D4` calls into a function at `0x00112DAC`.
That function's prologue begins with:

```
0x00112DAC:  0xFFBF0020  sd $ra, 0x20($sp)   <-- TRAP (opcode 0x3F, MIPS-III SD)
```

**SD** (Store Doubleword) is the MIPS-III 64-bit cousin of SW.
PS2 ELFs use it everywhere in function prologues to save
64-bit register values (`$ra`, `$s*`) onto the stack.

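
The trapping word can be decoded by hand to confirm the disassembly. The sketch below splits a MIPS I-type instruction into its standard fields; `decode_itype` is a hypothetical helper written for this note, not something in the repo's tooling.

```python
def decode_itype(instr: int) -> dict:
    """Split a 32-bit MIPS I-type word into opcode, rs, rt and signed offset."""
    imm = instr & 0xFFFF
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,  # base register for loads/stores
        "rt": (instr >> 16) & 0x1F,  # data register for stores
        "imm": imm - 0x10000 if imm & 0x8000 else imm,  # sign-extended offset
    }


fields = decode_itype(0xFFBF0020)
print({name: hex(value) for name, value in fields.items()})
```

For `0xFFBF0020` this gives opcode `0x3F` (SD), rs = 29 (`$sp`), rt = 31 (`$ra`) and offset `0x20`, matching `sd $ra, 0x20($sp)`.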
## Recommendation for Codex's Ch275

**Implement SD as a 2-beat 32-bit-stripe write FSM**, mirroring
Ch271's SQ pattern but smaller:

- **Decode**: opcode `6'h3F` → `is_sd`.
- **Alignment**: SD requires 8-byte alignment (`ea[2:0] == 0`).
  Misaligned → AdES path (same as existing SW alignment).
- **FSM**: reuse the `sq_beat` counter (or add `sd_beat`); 2
  beats this time. Beat 0 writes `rt_val` (low 32 bits of `$rt`)
  at EA; beat 1 writes 0 at EA+4 (upper 32 bits of `$rt` not
  modelled, the same approximation we made for SQ beats 1-3).
- **For `sd $ra,...`**: real PS2 callees later `LD` to restore
  64-bit `$ra`. Our model's upper 32 bits are always 0, so
  the round-trip works as long as the function doesn't do
  64-bit math on `$ra` itself (rare).

Focused TB shape (mirrors `tb_ee_core_sq`):

- Pre-poke RAM target with non-zero junk.
- Execute `sd $rt, 0(base)` with `$rt` non-zero in low 32 bits.
- LW + BNE chain verifies `mem[base+0] = rt_val_low` and
  `mem[base+4] = 0`.
- Direct hierarchical RAM peek for belt-and-braces.

This is structurally identical to Ch271 with `4 → 2` beats
and `16 → 8` byte alignment. Should be ~30 minutes of work.

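
As a cross-check for the proposed FSM and TB, the intended memory effect of one SD can be modelled in software. This is a hedged sketch of the recommendation above, not the Ch275 implementation: `sd_beats`, `run_sd` and `AddressErrorStore` are hypothetical names, and memory is a plain dict of 32-bit words keyed by byte address.

```python
class AddressErrorStore(Exception):
    """Stands in for the AdES exception path (illustrative only)."""


def sd_beats(ea: int, rt_val: int):
    """Yield the (address, 32-bit data) writes for one SD, in beat order."""
    if ea & 0x7:
        raise AddressErrorStore(f"misaligned SD at {ea:#010x}")
    yield ea, rt_val & 0xFFFFFFFF  # beat 0: low word of $rt at EA
    yield ea + 4, 0                # beat 1: upper word not modelled, write 0


def run_sd(mem: dict, ea: int, rt_val: int) -> None:
    """Apply both beats of an SD to a word-addressed memory model."""
    for addr, data in sd_beats(ea, rt_val):
        mem[addr] = data


# Mirror the TB shape: pre-poke junk, store, then check both words.
mem = {0x1000: 0xDEADBEEF, 0x1004: 0xDEADBEEF}
run_sd(mem, 0x1000, 0x001000DC)
print(hex(mem[0x1000]), hex(mem[0x1004]))

try:
    run_sd(mem, 0x1004, 0x1)  # 4-byte aligned only, so this must fault
    misaligned_trapped = False
except AddressErrorStore:
    misaligned_trapped = True
```

Pre-poking both words with junk matters in the model for the same reason it does in the TB: without it, a missing beat 1 would look identical to a correct write of 0 at EA+4.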
Likely follow-on after SD: **LD** (Load Doubleword, opcode
`0x37`). When the called function eventually returns, it'll
`LD $ra, 0x20($sp)` to restore the saved register; our
model needs the corresponding 2-beat read path. Codex may
want to fold SD+LD into one chapter since they're symmetric.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 6 surgical edits.
- `sim/tb/integration/tb_ee_core_beql.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight at the moment of writing; expected **162/162** (was
161, +1 for `tb_ee_core_beql`).

## Process notes

- **Cross-check via BEQ in the same TB.** Codex specifically
  asked for the BEQ cross-check, and it caught a real
  difference: Case 3 (BEQ not-taken) writes `$t5` low bits
  while Case 2 (BEQL not-taken) does NOT. Without the
  cross-check, a regression where BEQL accidentally behaved like
  BEQ would silently pass on the "PC landed at PC+8" check
  alone.
- **OR-INTO vs OR-FROM-ZERO encoding bugs are easy to make.**
  My first TB pass had `ori $rt, $0, imm` (overwriting),
  which loses info about whether the delay slot fired. Always
  use `ori $rt, $rt, imm` (or similar accumulating op) in
  delay-slot probes so "did it fire?" is observable by a
  bitwise comparison rather than a value comparison.
- **The pattern continues to compress.** Ch271 SQ took 5
  edits + a TB. Ch272 DADDU took 4 + a TB. Ch273 SYSCALL HLE
  took 2 + a TB (plus a runner update). Ch274 BEQL is 6 + a
  TB. Each is a 1-day chapter at most. The qbert progression
  is now `12 → 26,958 → 26,960 → 26,980 → 26,985 retires`;
  the runner is doing its job.