# Ch275 closeout: SD as 2-beat 32-bit-stripe write; qbert clears the prologue, next blocker is DSLL

**Status:** Closed.

**Verdict from re-running qbert.elf:** `elf_first_unsupported_opcode (pc=0x00112C54 instr=0x00094C38)`, which is **DSLL** (Doubleword Shift Left Logical), MIPS-III SPECIAL funct 0x38. qbert ran through the SD prologue at `0x00112DAC`, retired 21 more instructions in total (the SD prologue plus roughly 20 instructions of the function body), and trapped on a 64-bit shift inside the function logic.

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|--------------------|
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| **Post-Ch275 (SD)** | **DSLL at 0x00112C54** | **27,006** |

## What landed

### RTL: surgical edits in `ee_core_stub.sv`

1. `localparam OP_SD = 6'h3F` alongside OP_SQ.
2. `is_sd` decode signal.
3. **Alignment**: new `is_dword_access = is_sd`; extended `is_align_fault` with `is_dword_access && (ea[2:0] != 3'd0)`; added `is_sd` to `is_align_store`. Misaligned SD trips the same AdES path as SW/SH/SQ.
4. **Decoder allow-list**: `!is_sd` added to the `is_nop_class` catch-all.
5. **FSM transition**: new `else if (is_sd)` branch in EXECUTE that initializes `sq_beat <= 0` and enters S_MEM_WRITE. This reuses the SQ counter; SD only needs 2 beats, which fits in the 2-bit counter.
6. **S_MEM_WRITE comb**: combined SQ + SD into one `(is_sq || is_sd)` branch. Same beat-indexed address and `(sq_beat == 0) ? rt_val : 32'd0` data pattern, so beat 0 writes `rt_val` and every later beat writes zero.
7. **S_MEM_WRITE FSM**: retire when `(is_sq && beat==3) || (is_sd && beat==1)`, otherwise stay and increment.

Seven surgical edits, ~12 LOC total. Reusing `sq_beat` keeps the FSM minimal.

### Focused TB: `tb_ee_core_sd.sv`

- Bootstrap from 0xBFC00000 reset → 0xBFC00100.
- `$v0 = 0x80000400` (kseg0 → EE-RAM phys 0x400).
- `$ra = 0xABCD1234` (sentinel).
- Pre-poke phys 0x400/0x404 with `0xDEADBEEF` / `0xCAFEF00D`.
- Execute `sd $ra, 0($v0)` (encoded via `enc_i(OP_SD, 2, 31, 0)`).
- LW + BNE chain verifies `mem[0x400] = 0xABCD1234`, `mem[0x404] = 0`.
- Direct hierarchical RAM peek confirms both 32-bit lanes inside the qword.
- PASS via syscall.

Result: `retired=16 halt=1 trap=0 pc=0xbfc00134 errors=0 PASS`.

### Makefile

- `tb_ee_core_sd` target.
- Added to both regression lists.
- Regression: 162 → **163**.

## qbert progression highlights

- The 21-retire delta from Ch274 to Ch275 means qbert ran the SD prologue, executed ~20 instructions of the function body, then hit DSLL.
- The trap PC `0x00112C54` is LOWER than the prologue PC `0x00112DAC` by 0x158 bytes, so qbert's flow went forward through the prologue, then BACKWARD (a JAL to an earlier-defined function, or a loop branch). Either way, real function-call flow is happening.
- `$a0 = $a3 = $v1 = 0x0012C2C0` at trap: the same pointer in multiple registers. Looks like a struct pointer passed to some library function.

## Recommendation for Codex's Ch276

**`dsll $t1, $t1, 16`** at PC `0x00112C54`: opcode SPECIAL, rt=9, rd=9, sa=16, funct=0x38.

Same shape as Ch272 DADDU: implement as SLL semantics for the low 32 bits. PS2 EE is 64-bit; our regfile is 32-bit. DSLL's 5-bit `sa` field only encodes shifts of 0..31, and for those DSLL and SLL produce identical low-32-bit results. Shifts of 32 or more are encoded as DSLL32 (funct 0x3C), where the low 32 bits become 0; the `sa=16` here is firmly in the SLL-equivalent range.

Minimal scope:

1. `localparam FUNC_DSLL = 6'h38`.
2. `is_dsll` decode signal + add to `is_rtype_alu` group.
3. In `rtype_alu_wb`: `else if (is_dsll) rtype_alu_wb = rt_val << shamt;` (identical to SLL's path).

Focused TB pattern (mirrors `tb_ee_core_daddu`):

- Normal shift: `dsll $t1, $t0, 16` with `$t0 = 0x00001234` → `$t1 = 0x12340000`.
- Exact qbert encoding: `dsll $t1, $t1, 16` (rt=rd=9, sa=16), encoded with `enc_rtype` and asserted to equal `0x00094C38`.
- Edge cases: sa=0 (no shift), sa=31 (max valid SLL-equivalent shift). Shift amounts of 32+ need DSLL32; defer until qbert hits one.

Likely follow-ons after DSLL: **DSRL** (funct 0x3A), **DSRA** (funct 0x3B), **DSLL32** (funct 0x3C), **DSRL32** (funct 0x3E), **DSRA32** (funct 0x3F), **DADDIU** (opcode 0x19), **LD** (opcode 0x37). Land each as the runner surfaces it. The opcode-growth cadence is now fast (minutes per chapter); Codex can choose to fold multiple D-shifts into one chapter if qbert hits several in sequence.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 7 surgical edits.
- `sim/tb/integration/tb_ee_core_sd.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight at the moment of writing; expected **163/163** (was 162, +1 for `tb_ee_core_sd`).

## Pattern summary across the qbert track

Ch271→Ch275: SQ → DADDU → SYSCALL HLE → BEQL → SD. Each chapter consists of:

- One opcode (or syscall family) added.
- 2-7 RTL edits, all surgical.
- One focused TB with pre/post register assertions.
- One re-run of qbert that reveals the next blocker.
- One regression bump.

retire_count progression: 12 → 26,958 → 26,960 → 26,980 → 26,985 → 27,006.

The runner is doing exactly its job: surfacing the next concrete blocker in the order qbert actually needs them, never speculating about what to add next.
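
As a software cross-check of the DSLL recommendation above, the Python sketch below packs the R-type fields quoted there and compares a 64-bit DSLL against a 32-bit SLL on the low word. It is a minimal model under stated assumptions, not repo code: `enc_rtype` is a hypothetical Python stand-in for the TB helper of the same name (its real signature is not shown in these notes), and `dsll64` / `sll32` are illustrative names.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
FUNC_DSLL = 0x38  # SPECIAL funct for DSLL, as quoted in the recommendation


def enc_rtype(rs, rt, rd, sa, funct):
    """Pack a MIPS R-type word with opcode SPECIAL (0).

    Hypothetical stand-in for the TB's enc_rtype; argument order is an assumption.
    """
    return (
        ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | ((rd & 0x1F) << 11)
        | ((sa & 0x1F) << 6)
        | (funct & 0x3F)
    )


def dsll64(rt_val, sa):
    """Architectural DSLL: 64-bit logical left shift by sa (0..31)."""
    return (rt_val << sa) & MASK64


def sll32(rt_val, sa):
    """What a 32-bit regfile computes when DSLL is routed down the SLL path."""
    return (rt_val << sa) & MASK32


# dsll $t1, $t1, 16 -> rs=0, rt=9, rd=9, sa=16
qbert_word = enc_rtype(0, 9, 9, 16, FUNC_DSLL)

# Low-32 equivalence over every encodable shift amount and a few sample values.
samples = [0x0, 0x1234, 0xABCD1234, 0xFFFFFFFF, 0x123456789ABCDEF0]
low32_match = all(
    (dsll64(v, sa) & MASK32) == sll32(v & MASK32, sa)
    for v in samples
    for sa in range(32)
)

print(f"{qbert_word:#010x}", low32_match)
```

The equivalence holds because a left shift never moves bits from the upper word into the lower one, so the low 32 bits of the result depend only on the low 32 bits of the source. The same does not apply to right shifts such as DSRL or DSRA, where upper-word bits do reach the low word.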