thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch275 closeout — SD as 2-beat 32-bit-stripe write; qbert clears the prologue, next blocker is DSLL
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00112C54 instr=0x00094C38)`
**DSLL** (Doubleword Shift Left Logical), MIPS-III SPECIAL
funct 0x38. qbert got through the SD prologue store at `0x00112DAC`,
retired 20 more instructions of the function body (21 retires beyond
Ch274, counting the SD itself), and trapped on a 64-bit shift inside
the function logic.
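To cross-check the verdict, the trapping word can be split into the standard MIPS R-type fields. This is a minimal sketch. `decode_rtype` is a hypothetical helper and is not part of the repo's tooling.

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its fields (standard MIPS layout)."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # 0 = SPECIAL
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,       # shift amount
        "funct": instr & 0x3F,           # 0x38 = DSLL
    }

print(decode_rtype(0x00094C38))
# {'opcode': 0, 'rs': 0, 'rt': 9, 'rd': 9, 'sa': 16, 'funct': 56}
```

`funct` 56 is 0x38 and register 9 is `$t1`, so the word decodes to `dsll $t1, $t1, 16`.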
## Numbers
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| **Post-Ch275 (SD)** | **DSLL at 0x00112C54** | **27,006** |
## What landed
### RTL — surgical edits in `ee_core_stub.sv`
1. `localparam OP_SD = 6'h3F` alongside OP_SQ.
2. `is_sd` decode signal.
3. **Alignment**: new `is_dword_access = is_sd`; extended
`is_align_fault` with `is_dword_access && (ea[2:0] != 3'd0)`;
added `is_sd` to `is_align_store`. Misaligned SD trips the
same AdES path as SW/SH/SQ.
4. **Decoder allow-list**: `!is_sd` added to `is_nop_class`
catch-all.
5. **FSM transition**: new `else if (is_sd)` branch in EXECUTE
that initializes `sq_beat <= 0` and enters S_MEM_WRITE
(reusing the SQ counter — SD only needs 2 beats, which fits
in the 2-bit counter).
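6. **S_MEM_WRITE comb**: combined SQ + SD into one
`(is_sq || is_sd)` branch with the same beat-indexed address and
`(sq_beat == 0) ? rt_val : 32'd0` data pattern. For SD, beat 0
stores `rt_val` to the low word and beat 1 stores zero to the high
word, because the 32-bit regfile has no upper half to write.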
7. **S_MEM_WRITE FSM**: retire when `(is_sq && beat==3) ||
(is_sd && beat==1)`, otherwise stay and increment.
7 surgical edits, ~12 LOC total. The reuse of `sq_beat` keeps
the FSM minimal.
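The alignment check and the shared write loop can be modeled in a few lines. This is a minimal sketch under stated assumptions. `is_align_fault` and `mem_write_beats` are hypothetical Python helpers, not the RTL signals. The addressing is assumed to be one 32-bit word per beat at `ea + 4*beat`. SQ's alignment rule is not spelled out in this note, so only its beat count is modeled.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF

# Access width in bytes for the stores whose alignment rule is modeled here.
ALIGN_BYTES = {"SH": 2, "SW": 4, "SD": 8}
# Retire conditions from edit 7: SQ retires on beat 3, SD on beat 1.
LAST_BEAT = {"SQ": 3, "SD": 1}

def is_align_fault(op: str, ea: int) -> bool:
    """True if the store would take the AdES path (ea not a multiple of its width)."""
    return (ea & (ALIGN_BYTES[op] - 1)) != 0

def mem_write_beats(op: str, ea: int, rt_val: int) -> List[Tuple[int, int]]:
    """(address, data) pairs emitted by the shared S_MEM_WRITE loop."""
    writes = []
    sq_beat = 0  # 2-bit counter in RTL: SD uses 0..1, SQ uses 0..3
    while True:
        # Only beat 0 carries rt_val; later beats write zero.
        data = (rt_val & MASK32) if sq_beat == 0 else 0
        writes.append((ea + 4 * sq_beat, data))
        if sq_beat == LAST_BEAT[op]:
            return writes  # retire
        sq_beat += 1

print(is_align_fault("SD", 0x404))  # True: 4-byte but not 8-byte aligned
for addr, data in mem_write_beats("SD", 0x400, 0xABCD1234):
    print(hex(addr), hex(data))
# 0x400 0xabcd1234
# 0x404 0x0
```

Because both opcodes reuse one loop, adding SD only required a second retire condition.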
### Focused TB — `tb_ee_core_sd.sv`
- Bootstrap from 0xBFC00000 reset → 0xBFC00100.
- `$v0 = 0x80000400` (kseg0 → EE-RAM phys 0x400).
- `$ra = 0xABCD1234` (sentinel).
- Pre-poke phys 0x400/0x404 with `0xDEADBEEF` / `0xCAFEF00D`.
- Execute `sd $ra, 0($v0)` (encoded via `enc_i(OP_SD, 2, 31, 0)`).
- LW + BNE chain verifies `mem[0x400] = 0xABCD1234`,
`mem[0x404] = 0`.
- Direct hierarchical RAM peek confirms both 32-bit lanes
inside the qword. PASS via syscall.
Result: `retired=16 halt=1 trap=0 pc=0xbfc00134 errors=0 PASS`.
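The TB's expected memory image follows from a little arithmetic: encode the store, translate the kseg0 address, and apply the two beats. This is a minimal sketch. The Python `enc_i` is a hypothetical twin of the TB helper, and its `(op, rs, rt, imm)` argument order is an assumption inferred from `enc_i(OP_SD, 2, 31, 0)` naming `$v0` as base and `$ra` as source.

```python
MASK32 = 0xFFFFFFFF
OP_SD = 0x3F
V0, RA = 2, 31  # MIPS register numbers for $v0 and $ra

def enc_i(op: int, rs: int, rt: int, imm: int) -> int:
    """I-type encoder; (op, rs, rt, imm) order is assumed to match the TB helper."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)

def kseg0_to_phys(vaddr: int) -> int:
    """kseg0 (0x80000000-0x9FFFFFFF) is an unmapped window onto physical memory."""
    if not 0x80000000 <= vaddr <= 0x9FFFFFFF:
        raise ValueError(f"not a kseg0 address: {vaddr:#x}")
    return vaddr - 0x80000000

def run_sd_scenario():
    mem = {0x400: 0xDEADBEEF, 0x404: 0xCAFEF00D}  # pre-poked words
    regs = {V0: 0x80000400, RA: 0xABCD1234}
    instr = enc_i(OP_SD, V0, RA, 0)               # sd $ra, 0($v0)

    rs, rt = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F
    offset = instr & 0xFFFF
    if offset & 0x8000:                            # sign-extend the 16-bit offset
        offset -= 0x10000
    pa = kseg0_to_phys((regs[rs] + offset) & MASK32)
    mem[pa] = regs[rt] & MASK32                    # beat 0: low word
    mem[pa + 4] = 0                                # beat 1: stub has no upper half
    return instr, mem

instr, mem = run_sd_scenario()
print(f"{instr:#010x}")  # 0xfc5f0000
```

Both pre-poked sentinels get overwritten, which is why the TB seeds them with distinct non-zero values.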
### Makefile
- `tb_ee_core_sd` target.
- Added to both regression lists.
- Regression: 162 → **163**.
## qbert progression highlights
- The 21-retire delta from Ch274 to Ch275 means qbert ran the
SD prologue, executed ~20 instructions of the function body,
then hit DSLL.
- The trap PC `0x00112C54` is LOWER than the prologue PC
`0x00112DAC` by exactly 0x158 bytes (86 instructions), so qbert's
flow went forward through the prologue, then BACKWARD (a JAL to an
earlier-defined function, or a loop branch). Either way, real
function-call flow is happening.
- `$a0 = $a3 = $v1 = 0x0012C2C0` at trap — same pointer in
multiple registers. Looks like a struct pointer passed to
some library function.
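The numbers behind these observations can be reproduced directly from the table. Sequential execution only moves the PC forward, so reaching a PC 86 instructions lower within 21 retires requires at least one taken backward control transfer. This is a minimal sketch. The names are illustrative and the values are copied from this note.

```python
# retire_count after each chapter, copied from the Numbers table.
RETIRE = [
    ("Post-Ch271 (SQ)", 26_958),
    ("Post-Ch272 (DADDU)", 26_960),
    ("Post-Ch273 (SYSCALL HLE)", 26_980),
    ("Post-Ch274 (BEQL)", 26_985),
    ("Post-Ch275 (SD)", 27_006),
]

def retire_deltas():
    """Extra retires each chapter bought before the next blocker."""
    return [(cur[0], cur[1] - prev[1]) for prev, cur in zip(RETIRE, RETIRE[1:])]

PROLOGUE_PC, TRAP_PC = 0x00112DAC, 0x00112C54
back_bytes = PROLOGUE_PC - TRAP_PC
back_insns = back_bytes // 4  # MIPS instructions are 4 bytes

for name, delta in retire_deltas():
    print(f"{name}: +{delta}")
print(f"trap is {back_bytes:#x} bytes ({back_insns} instructions) below the SD")
```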
## Recommendation for Codex's Ch276
**`dsll $t1, $t1, 16`** at PC `0x00112C54` — opcode SPECIAL,
rt=9, rd=9, sa=16, funct=0x38.
Same shape as Ch272 DADDU: implement as SLL semantics for the low
32 bits. The PS2 EE is 64-bit but our regfile is 32-bit. A left
shift only moves bits upward, so the low 32 bits of a DSLL result
depend only on the low 32 bits of `rt`; and because DSLL's `sa`
field is 5 bits (0-31), every DSLL produces the same low word as
SLL. Shift amounts of 32-63 are encoded as DSLL32 (funct 0x3C),
which leaves the low 32 bits zero. DSLL with `sa=16` here is firmly
in the SLL-equivalent range.
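The equivalence claim can be spot-checked against a full 64-bit reference, using arbitrary upper halves rather than only sign-extended ones. This is a minimal sketch. It is a randomized check, not a proof, and the helper names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def dsll(x: int, sa: int) -> int:
    """MIPS-III DSLL on a full 64-bit value; sa is the 5-bit field (0..31)."""
    return (x << sa) & MASK64

def dsll32(x: int, sa: int) -> int:
    """MIPS-III DSLL32: shift by sa + 32."""
    return (x << (sa + 32)) & MASK64

def sll_low32(x: int, sa: int) -> int:
    """What a 32-bit regfile computes: shift only the low word."""
    return ((x & MASK32) << sa) & MASK32

def check(trials: int = 10_000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(64)  # arbitrary upper half, not just sign extension
        sa = rng.randrange(32)
        if (dsll(x, sa) & MASK32) != sll_low32(x, sa):
            return False
        if (dsll32(x, sa) & MASK32) != 0:
            return False
    return True

print(check())  # True
```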
Minimal scope:
1. `localparam FUNC_DSLL = 6'h38`.
2. `is_dsll` decode signal + add to `is_rtype_alu` group.
3. In `rtype_alu_wb`: `else if (is_dsll) rtype_alu_wb = rt_val << shamt;`
(identical to SLL's path).
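The three steps above amount to one more match in the decoder feeding an unchanged shift datapath. This is a minimal sketch of that decode-and-writeback step against a 32-entry, 32-bit regfile. `step_shift` and the list-based regfile are hypothetical. In the stub, `is_dsll` would join `is_rtype_alu` and select SLL's path in `rtype_alu_wb`.

```python
MASK32 = 0xFFFFFFFF
FUNC_SLL, FUNC_DSLL = 0x00, 0x38

def step_shift(regs: list, instr: int) -> bool:
    """Execute one SLL/DSLL word; return False for anything outside this subset
    (standing in for the stub's unsupported-opcode trap)."""
    opcode = (instr >> 26) & 0x3F
    funct = instr & 0x3F
    is_sll = opcode == 0 and funct == FUNC_SLL
    is_dsll = opcode == 0 and funct == FUNC_DSLL
    if not (is_sll or is_dsll):
        return False
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    shamt = (instr >> 6) & 0x1F
    if rd != 0:  # $zero is hardwired
        regs[rd] = (regs[rt] << shamt) & MASK32  # identical path for SLL and DSLL
    return True

regs = [0] * 32
regs[9] = 0x00001234                  # arbitrary test value for $t1
assert step_shift(regs, 0x00094C38)   # dsll $t1, $t1, 16 (the qbert word)
print(hex(regs[9]))                   # 0x12340000
```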
Focused TB pattern (mirrors `tb_ee_core_daddu`):
- Normal shift: `dsll $t1, $t0, 16` with `$t0 = 0x00001234` →
`$t1 = 0x12340000`.
- Exact qbert encoding: `dsll $t1, $t1, 16` (rt=rd=9, sa=16),
encoded with `enc_rtype` and asserted to equal `0x00094C38`.
- Edge cases: sa=0 (no shift), sa=31 (max valid SLL-equivalent
shift). sa values 32+ would need DSLL32; defer until qbert
hits one.
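Golden values for those TB cases can be generated ahead of time. This is a minimal sketch. The Python `enc_rtype` is a hypothetical twin of the TB helper and its argument order is an assumption. The `$t1` input for the qbert-encoding row is a test value, not qbert's actual register state. The `sa=31` row uses `0x3` so that one set bit is shifted out.

```python
MASK32 = 0xFFFFFFFF
FUNC_DSLL = 0x38
T0, T1 = 8, 9  # MIPS register numbers for $t0 and $t1

def enc_rtype(rs: int, rt: int, rd: int, sa: int, funct: int) -> int:
    """SPECIAL (opcode 0) R-type encoder; argument order is an assumption."""
    return (((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11)
            | ((sa & 0x1F) << 6) | (funct & 0x3F))

def dsll_vectors():
    """(label, instr, rt_val, expected $t1) rows for a focused DSLL TB."""
    return [
        ("normal", enc_rtype(0, T0, T1, 16, FUNC_DSLL), 0x00001234, 0x12340000),
        ("qbert", enc_rtype(0, T1, T1, 16, FUNC_DSLL), 0x00001234, 0x12340000),
        ("sa=0", enc_rtype(0, T0, T1, 0, FUNC_DSLL), 0x00001234, 0x00001234),
        ("sa=31", enc_rtype(0, T0, T1, 31, FUNC_DSLL), 0x00000003, 0x80000000),
    ]

for label, instr, rt_val, expect in dsll_vectors():
    sa = (instr >> 6) & 0x1F
    assert ((rt_val << sa) & MASK32) == expect, label
    print(f"{label:<6} instr=0x{instr:08X} rt=0x{rt_val:08X} -> 0x{expect:08X}")
```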
Likely follow-ons after DSLL: the SPECIAL-funct shifts **DSRL** (0x3A),
**DSRA** (0x3B), **DSLL32** (0x3C), **DSRL32** (0x3E), **DSRA32** (0x3F),
plus the primary opcodes **DADDIU** (0x19) and **LD** (0x37). Land each
as the runner surfaces it. The opcode-growth cadence is now fast
(~minutes per chapter), so Codex can choose to fold multiple D-shifts
into one chapter if qbert hits several in sequence.
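One caution when folding these together: unlike DSLL, the right shifts pull upper-half bits into the low word, so with a 32-bit regfile their low-32 result depends on what the stub assumes the upper half holds. This is a minimal sketch showing DSRL diverging between a zero upper half (what the stub's SD stores) and a sign-extended one (what MIPS-III 32-bit operations leave in a 64-bit register). The dictionaries just restate the codes listed above.

```python
MASK32 = 0xFFFFFFFF

# SPECIAL (opcode 0) funct codes and primary opcodes named in the follow-on list.
SPECIAL_FUNCT = {"DSLL": 0x38, "DSRL": 0x3A, "DSRA": 0x3B,
                 "DSLL32": 0x3C, "DSRL32": 0x3E, "DSRA32": 0x3F}
PRIMARY_OPCODE = {"DADDIU": 0x19, "LD": 0x37}

def dsrl_low32(x64: int, sa: int) -> int:
    """Low word of a 64-bit logical right shift (sa in 0..31)."""
    return (x64 >> sa) & MASK32

low = 0x80000000
zero_upper = low                       # upper half 0
sign_upper = (0xFFFFFFFF << 32) | low  # upper half as sign extension leaves it
print(hex(dsrl_low32(zero_upper, 4)), hex(dsrl_low32(sign_upper, 4)))
# 0x8000000 0xf8000000
```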
## Files changed
- `rtl/ee/ee_core_stub.sv` — 7 surgical edits.
- `sim/tb/integration/tb_ee_core_sd.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Regression
In flight at the moment of writing; expected **163/163** (was
162, +1 for `tb_ee_core_sd`).
## Pattern summary across the qbert track
Ch271→Ch275: SQ → DADDU → SYSCALL HLE → BEQL → SD. Each chapter
consists of:
- One opcode (or syscall family) added.
- 2-7 RTL edits, all surgical.
- One focused TB with pre/post register assertions.
- One re-run of qbert that reveals the next blocker.
- One regression bump.
retire_count progression: 12 → 26,958 → 26,960 → 26,980 →
26,985 → 27,006. The runner is doing exactly its job —
surfacing the next concrete blocker in the order qbert
actually needs them, never speculating about what to add
next.