# Ch275 closeout: SD as 2-beat 32-bit-stripe write; qbert clears the prologue, next blocker is DSLL

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C54 instr=0x00094C38)`,
which is **DSLL** (Doubleword Shift Left Logical), MIPS-III SPECIAL
funct 0x38. qbert ran through the SD prologue at `0x00112DAC`,
retired 21 more instructions in total (the SD plus roughly 20 of
the function body), and trapped on a 64-bit shift inside the
function logic.

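As a quick cross-check of the trap word, the sketch below splits `0x00094C38` into the standard MIPS R-type fields. The helper name `decode_rtype` is hypothetical and is not part of the runner; only the field layout (6/5/5/5/5/6 bits) is taken from the MIPS encoding.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # 0 means SPECIAL
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


# The word reported by elf_first_unsupported_opcode.
fields = decode_rtype(0x00094C38)
print({name: hex(value) for name, value in fields.items()})
```

The decoded fields (opcode 0, rt=9, rd=9, sa=16, funct=0x38) agree with reading the trap as `dsll $t1, $t1, 16`.
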
## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| **Post-Ch275 (SD)** | **DSLL at 0x00112C54** | **27,006** |

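The per-chapter gain is easier to see as deltas. The following is a small sketch that recomputes them from the table values; the names `retire_counts` and `retire_deltas` are illustrative only.

```python
# retire_count at each trap, copied from the table above.
retire_counts = {
    "Post-Ch271 (SQ)": 26_958,
    "Post-Ch272 (DADDU)": 26_960,
    "Post-Ch273 (SYSCALL HLE)": 26_980,
    "Post-Ch274 (BEQL)": 26_985,
    "Post-Ch275 (SD)": 27_006,
}


def retire_deltas(counts: dict) -> list:
    """Return (chapter, instructions gained since the previous chapter)."""
    names = list(counts)
    return [
        (names[i], counts[names[i]] - counts[names[i - 1]])
        for i in range(1, len(names))
    ]


for chapter, gained in retire_deltas(retire_counts):
    print(f"{chapter}: +{gained}")
```
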
## What landed

### RTL: surgical edits in `ee_core_stub.sv`

1. `localparam OP_SD = 6'h3F` alongside `OP_SQ`.
2. `is_sd` decode signal.
3. **Alignment**: new `is_dword_access = is_sd`; extended
   `is_align_fault` with `is_dword_access && (ea[2:0] != 3'd0)`;
   added `is_sd` to `is_align_store`. A misaligned SD trips the
   same AdES path as SW/SH/SQ.
4. **Decoder allow-list**: `!is_sd` added to the `is_nop_class`
   catch-all.
5. **FSM transition**: new `else if (is_sd)` branch in EXECUTE
   that initializes `sq_beat <= 0` and enters S_MEM_WRITE
   (reusing the SQ counter; SD needs only 2 beats, which fits
   in the 2-bit counter).
6. **S_MEM_WRITE comb**: combined SQ + SD into one
   `(is_sq || is_sd)` branch. Both use the same beat-indexed
   address and the `(sq_beat == 0) ? rt_val : 32'd0` data
   pattern, so beat 0 carries `rt_val` and every later beat
   writes zero.
7. **S_MEM_WRITE FSM**: retire when
   `(is_sq && beat==3) || (is_sd && beat==1)`, otherwise stay
   and increment.

7 surgical edits, ~12 LOC total. The reuse of `sq_beat` keeps
the FSM minimal.

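The behaviour described by edits 3, 6 and 7 can be sketched as a small software model. This is not the RTL: `sd_write_beats` is a hypothetical helper, and it assumes "beat-indexed address" means beat `n` targets `ea + 4*n`.

```python
def sd_write_beats(ea: int, rt_val: int) -> list:
    """Model SD as two 32-bit write beats.

    Assumption: beat n writes to ea + 4*n. Returns [(address, data), ...].
    """
    if ea & 0x7:
        # Mirrors is_dword_access && (ea[2:0] != 0): address error on store.
        raise ValueError(f"AdES: misaligned SD at 0x{ea:08X}")
    beats = []
    for beat in range(2):  # SD retires once beat == 1 has been written
        # Beat 0 carries the 32-bit register value, later beats write zero.
        data = (rt_val & 0xFFFFFFFF) if beat == 0 else 0
        beats.append((ea + 4 * beat, data))
    return beats


for addr, data in sd_write_beats(0x400, 0xABCD1234):
    print(f"write32 [0x{addr:03X}] <= 0x{data:08X}")
```

Under these assumptions the model writes `0xABCD1234` to 0x400 and zero to 0x404, which is the pattern the focused TB checks.
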
### Focused TB: `tb_ee_core_sd.sv`

- Bootstrap from 0xBFC00000 reset → 0xBFC00100.
- `$v0 = 0x80000400` (kseg0 → EE-RAM phys 0x400).
- `$ra = 0xABCD1234` (sentinel).
- Pre-poke phys 0x400/0x404 with `0xDEADBEEF` / `0xCAFEF00D`.
- Execute `sd $ra, 0($v0)` (encoded via `enc_i(OP_SD, 2, 31, 0)`).
- LW + BNE chain verifies `mem[0x400] = 0xABCD1234`,
  `mem[0x404] = 0`.
- Direct hierarchical RAM peek confirms both 32-bit lanes
  inside the qword. PASS via syscall.

Result: `retired=16 halt=1 trap=0 pc=0xbfc00134 errors=0 PASS`.

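For reference, here is a Python sketch of the two address and encoding facts the TB relies on. It assumes the TB's `enc_i` takes `(opcode, rs, rt, imm)` in that order, which matches `sd $ra, 0($v0)` with rs=2 and rt=31; `kseg0_to_phys` is a hypothetical helper name.

```python
OP_SD = 0x3F  # same value as the localparam in ee_core_stub.sv


def enc_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Encode a MIPS I-type instruction word (assumed argument order)."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


def kseg0_to_phys(vaddr: int) -> int:
    """kseg0 is unmapped: the physical address is the low 29 bits."""
    return vaddr & 0x1FFFFFFF


print(f"sd $ra, 0($v0) = 0x{enc_i(OP_SD, 2, 31, 0):08X}")
print(f"phys = 0x{kseg0_to_phys(0x80000400):X}")
```
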
### Makefile

- `tb_ee_core_sd` target.
- Added to both regression lists.
- Regression: 162 → **163**.

## qbert progression highlights

- The 21-retire delta from Ch274 to Ch275 means qbert ran the
  SD prologue, executed ~20 instructions of the function body,
  then hit DSLL.
- The trap PC `0x00112C54` is LOWER than the prologue PC
  `0x00112DAC` by 0x158 bytes (86 instruction words), so qbert's
  flow went forward through the prologue, then BACKWARD (a JAL
  to an earlier-defined function, or a loop branch). Either way,
  real control flow is happening.
- `$a0 = $a3 = $v1 = 0x0012C2C0` at trap: the same pointer in
  multiple registers. It looks like a struct pointer passed to
  some library function.

## Recommendation for Codex's Ch276

**`dsll $t1, $t1, 16`** at PC `0x00112C54`: opcode SPECIAL,
rt=9, rd=9, sa=16, funct=0x38.

Same shape as Ch272 DADDU: implement as SLL semantics for
the low 32 bits. The PS2 EE is 64-bit; our regfile is 32-bit.
DSLL's `sa` field is 5 bits, so it always shifts by 0 to 31, and
for those amounts DSLL and SLL produce identical low-32-bit
results. Shifts of 32 or more are a separate instruction, DSLL32
(funct 0x3C), and there the low 32 bits become 0. The `sa=16`
here is firmly in the SLL-equivalent range.

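The equivalence claim can be checked with a short model. This is a sketch, not the RTL: the function names are illustrative, and `sll32` models the 32-bit regfile described above rather than the sign-extending SLL of a full 64-bit MIPS core.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def dsll64(rt: int, sa: int) -> int:
    """Architectural DSLL: 64-bit logical left shift, sa in 0..31."""
    return (rt << sa) & MASK64


def dsll32_64(rt: int, sa: int) -> int:
    """Architectural DSLL32: 64-bit logical left shift by sa + 32."""
    return (rt << (sa + 32)) & MASK64


def sll32(rt: int, sa: int) -> int:
    """SLL as seen by a 32-bit regfile: shift and keep the low 32 bits."""
    return ((rt & MASK32) << sa) & MASK32


def low32_matches(rt: int, sa: int) -> bool:
    """True when DSLL and SLL agree on the low 32 bits of the result."""
    return (dsll64(rt, sa) & MASK32) == sll32(rt, sa)


samples = [0x00001234, 0xFFFFFFFF, 0x80000001, 0x0012C2C0]
all_match = all(low32_matches(v, sa) for v in samples for sa in range(32))
print("DSLL low 32 bits match SLL for sa 0..31:", all_match)
```

The match holds because the low 32 bits of a left shift depend only on the low 32 bits of the input. What the 32-bit model drops is everything shifted into bits 63..32.
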
Minimal scope:

1. `localparam FUNC_DSLL = 6'h38`.
2. `is_dsll` decode signal + add to `is_rtype_alu` group.
3. In `rtype_alu_wb`: `else if (is_dsll) rtype_alu_wb = rt_val << shamt;`
   (identical to SLL's path).

Focused TB pattern (mirrors `tb_ee_core_daddu`):

- Normal shift: `dsll $t1, $t0, 16` with `$t0 = 0x00001234` →
  `$t1 = 0x12340000`.
- Exact qbert encoding: `dsll $t1, $t1, 16` (rt=rd=9, sa=16),
  encoded with `enc_rtype` and asserted to equal `0x00094C38`.
- Edge cases: sa=0 (no shift), sa=31 (max valid SLL-equivalent
  shift). sa values 32+ would need DSLL32; defer until qbert
  hits one.

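The TB vectors above can be precomputed with a small reference model. This sketch assumes `enc_rtype` takes `(rs, rt, rd, sa, funct)`; the real TB helper may order its arguments differently, and `dsll_low32` is a hypothetical name for the proposed write-back expression.

```python
FUNC_DSLL = 0x38


def enc_rtype(rs: int, rt: int, rd: int, sa: int, funct: int) -> int:
    """Encode a SPECIAL (opcode 0) R-type word. Argument order is assumed."""
    return (
        ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | ((rd & 0x1F) << 11)
        | ((sa & 0x1F) << 6)
        | (funct & 0x3F)
    )


def dsll_low32(rt_val: int, sa: int) -> int:
    """Model of `rt_val << shamt` truncated to the 32-bit regfile."""
    return (rt_val << sa) & 0xFFFFFFFF


# Expected values for the three TB cases listed above.
qbert_word = enc_rtype(0, 9, 9, 16, FUNC_DSLL)
print(f"dsll $t1, $t1, 16 = 0x{qbert_word:08X}")
print(f"normal shift      = 0x{dsll_low32(0x00001234, 16):08X}")
print(f"sa=0              = 0x{dsll_low32(0x00001234, 0):08X}")
print(f"sa=31             = 0x{dsll_low32(0x00000003, 31):08X}")
```
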
Likely follow-ons after DSLL: the SPECIAL-funct shifts **DSRL**
(0x3A), **DSRA** (0x3B), **DSLL32** (0x3C), **DSRL32** (0x3E), and
**DSRA32** (0x3F), plus the primary opcodes **DADDIU** (0x19) and
**LD** (0x37). Land each as the runner surfaces it. The
opcode-growth cadence is now fast (~minutes per chapter); Codex
can choose to fold multiple D-shifts into one chapter if qbert
hits several in sequence.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 7 surgical edits.
- `sim/tb/integration/tb_ee_core_sd.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight at the moment of writing; expected **163/163** (was
162, +1 for `tb_ee_core_sd`).

## Pattern summary across the qbert track

Ch271→Ch275: SQ → DADDU → SYSCALL HLE → BEQL → SD. Each chapter
consists of:

- One opcode (or syscall family) added.
- 2-7 RTL edits, all surgical.
- One focused TB with pre/post register assertions.
- One re-run of qbert that reveals the next blocker.
- One regression bump.

retire_count progression: 12 → 26,958 → 26,960 → 26,980 →
26,985 → 27,006. The runner is doing exactly its job:
surfacing each concrete blocker in the order qbert actually
hits it, never speculating about what to add next.