# Ch276 closeout: DSLL as SLL low-32-bit; qbert progresses 10 retires, next blocker is BNEL

**Status:** Closed.

**Verdict from re-running qbert.elf:** `elf_first_unsupported_opcode (pc=0x00112C7C instr=0x54400019)`. That is **BNEL** (Branch on Not Equal Likely), MIPS-II opcode 0x15, exactly the follow-on Codex predicted in the Ch274 closeout: *"Likely follow-on after BEQL: BNEL."*

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 |
| Post-Ch274 (BEQL) | SD at 0x00112DAC | 26,985 |
| Post-Ch275 (SD) | DSLL at 0x00112C54 | 27,006 |
| **Post-Ch276 (DSLL)** | **BNEL at 0x00112C7C** | **27,016** |

## What landed

### RTL: 4 surgical edits in `ee_core_stub.sv`

1. `localparam FUNC_DSLL = 6'h38` alongside `FUNC_SLL`.
2. `is_dsll` logic decl + `assign is_dsll = is_special && (func == FUNC_DSLL)`.
3. Added `is_dsll` to the `is_rtype_alu` group.
4. Added `is_dsll` to the `is_sll` arm of `rtype_alu_wb`: `else if (is_sll || is_dsll) rtype_alu_wb = rt_val << shamt`. The arm reuses SLL's writeback path because for any valid `sa < 32` the low 32 bits of DSLL and SLL are identical.

About 4 LOC of real change. This mirrors Ch272 DADDU's "implement 64-bit opcode as 32-bit equivalent" pattern.

### Focused TB: `tb_ee_core_dsll.sv`

Four cases:

1. **Exact qbert encoding**: `dsll $t1, $t1, 16` (rt=rd=9, sa=16). Built via `enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)` and asserted to equal `0x00094C38` (the literal qbert instruction). With `$t1 = 0x1234` → `$t1 = 0x12340000`.
2. **Low-bit shift**: `dsll $t2, $t3, 1` with `$t3 = 0x40000001` → `$t2 = 0x80000002`.
3. **Wrap-out (low-32 truncation)**: `dsll $t4, $t5, 1` with `$t5 = 0x80000001` → `$t4 = 0x00000002`. Proves bit-31 falls off in our 32-bit model (in a faithful 64-bit model it would move to bit 32; our model has nowhere to put it).
4. **sa=0 identity**: `dsll $t6, $t7, 0` with `$t7 = 0xABCD1234` → `$t6 = 0xABCD1234`.
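As a cross-check outside the simulator, the low-32 equivalence and the four TB cases can be reproduced in a small Python reference model. This is a sketch under stated assumptions, not the TB itself: `enc_rtype` here is a hypothetical Python stand-in for the TB helper of the same name, and `sll32`, `dsll64`, and `dsll_low32` are illustrative names that do not exist in the RTL.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

OP_SPCL = 0x00    # SPECIAL major opcode
FUNC_DSLL = 0x38  # matches FUNC_DSLL = 6'h38 in the RTL


def enc_rtype(op, rs, rt, rd, sa, func):
    """Pack an R-type word (hypothetical stand-in for the TB helper).

    Field layout: op[31:26] rs[25:21] rt[20:16] rd[15:11] sa[10:6] func[5:0].
    """
    return (((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16)
            | ((rd & 0x1F) << 11) | ((sa & 0x1F) << 6) | (func & 0x3F))


def sll32(rt_val, sa):
    """32-bit logical left shift, as the 32-bit stub datapath computes it."""
    return (rt_val << sa) & MASK32


def dsll64(rt_val, sa):
    """Faithful 64-bit DSLL: shift the full doubleword, keep 64 bits."""
    return (rt_val << sa) & MASK64


def dsll_low32(rt_val, sa):
    """Low 32 bits of the faithful 64-bit result."""
    return dsll64(rt_val, sa) & MASK32


# The four focused-TB cases: (rt_val, sa, expected low-32 result).
TB_CASES = [
    (0x00001234, 16, 0x12340000),  # exact qbert encoding
    (0x40000001, 1, 0x80000002),   # low-bit shift
    (0x80000001, 1, 0x00000002),   # bit 31 leaves the low word
    (0xABCD1234, 0, 0xABCD1234),   # sa=0 identity
]

if __name__ == "__main__":
    print(hex(enc_rtype(OP_SPCL, 0, 9, 9, 16, FUNC_DSLL)))
    for rt_val, sa, expected in TB_CASES:
        print(hex(rt_val), sa, hex(dsll_low32(rt_val, sa)), hex(expected))
```

The point of keeping `dsll64` separate from `sll32` is that the equivalence is checked rather than assumed: for every 5-bit `sa`, the low word of the 64-bit shift depends only on the low word of the source, so upper-half garbage in the source cannot leak into the 32-bit result.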
Focused TB result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.

### Makefile + regression

- `tb_ee_core_dsll` target.
- Added to both PHONY list and `run:` master.
- Regression: 163 → **164**.

## qbert progression detail

10-retire delta from Ch275 (27,006 → 27,016). The DSLL retires at 0x00112C54, then qbert executes 9 more instructions before hitting BNEL at 0x00112C7C. That is 10 PCs over 40 bytes (0x28), so a tight straight-line block with no taken branches in between. Likely a switch-statement entry or function-body case dispatcher.

`$a0 = 0x80808080` at the trap is interesting: that's a canonical "byte-broadcast" sentinel (e.g. `~0x7F7F7F7F` as a `uint32`), often used by stdlib string ops to detect zero/high bytes in parallel. qbert may be calling something like `strlen` or `memchr` internally.

## Recommendation for Codex's Ch277: BNEL

**`bnel $v0, $0, +25*4`** at PC `0x00112C7C`, opcode 0x15, the exact follow-on Codex predicted from BEQL. Same shape as Ch274 BEQL:

- Decode opcode `6'h15` as BNEL.
- BNEL TAKEN when `rs != rt` (same as BNE).
- BNEL NOT-TAKEN: squash the delay slot.

Reuse the existing Ch274 `is_beql_squash` infrastructure:

1. `localparam OP_BNEL = 6'h15`.
2. `is_bnel` decode signal.
3. Add `is_bnel` to `is_branch` group.
4. Extend `branch_taken` with `(is_bnel && (rs_val != rt_val))`.
5. Replace `is_beql_squash` with a more general `is_branch_likely_squash`. The squash fires when a likely-branch is NOT taken, so each term uses the inverse of that branch's taken condition:

   ```
   is_branch_likely_squash = (is_beql && (rs_val != rt_val)) ||
                             (is_bnel && (rs_val == rt_val));
   ```

   Update `retire_advance` to use the new name.
6. Add `!is_bnel` to `is_nop_class` allow-list.

Focused TB mirrors `tb_ee_core_beql`: BNEL taken (delay fires), BNEL not-taken (delay squashed), BNE cross-check (delay always fires). ~5 LOC + the TB.
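The taken/squash rules above can be sanity-checked with a small Python model before touching the RTL. This is only a sketch: `decode_itype`, `branch_outcome`, and `branch_target` are hypothetical helper names, not signals or functions in `ee_core_stub.sv`, and the model covers just the BEQ/BNE/BEQL/BNEL opcodes.

```python
OP_BEQ, OP_BNE = 0x04, 0x05    # ordinary branches: delay slot always executes
OP_BEQL, OP_BNEL = 0x14, 0x15  # branch-likely: delay slot squashed if not taken


def decode_itype(instr):
    """Split an I-type word into (op, rs, rt, sign-extended 16-bit immediate)."""
    op = (instr >> 26) & 0x3F
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    simm = imm - 0x10000 if imm & 0x8000 else imm
    return op, rs, rt, simm


def branch_outcome(op, rs_val, rt_val):
    """Return (taken, squash_delay_slot) for the four modelled opcodes."""
    equal = rs_val == rt_val
    if op in (OP_BEQ, OP_BEQL):
        taken = equal
    elif op in (OP_BNE, OP_BNEL):
        taken = not equal
    else:
        raise ValueError(f"opcode {op:#x} is not modelled in this sketch")
    likely = op in (OP_BEQL, OP_BNEL)
    # Squash only applies to the likely variants, and only when not taken.
    return taken, likely and not taken


def branch_target(pc, simm):
    """Target is relative to the delay-slot address (pc + 4), offset in words."""
    return (pc + 4 + (simm << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    op, rs, rt, simm = decode_itype(0x54400019)  # the trapping qbert word
    print(hex(op), rs, rt, simm)
    print(branch_outcome(op, 1, 0))  # $v0 != $0: taken, delay slot runs
    print(branch_outcome(op, 0, 0))  # $v0 == $0: not taken, delay slot squashed
```

The three proposed TB scenarios map directly onto `branch_outcome`: BNEL with unequal operands, BNEL with equal operands, and BNE with equal operands as the cross-check that a plain branch never squashes.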
Likely follow-ons after BNEL: **BLEZL/BGTZL** (0x16/0x17) and the **REGIMM-likely** family (BLTZL/BGEZL at REGIMM rt=0x02/0x03, BLTZALL/BGEZALL at rt=0x12/0x13). Same `squash` mechanism for all of them. Codex may want to fold multiple branch-likely variants into one chapter now that the pattern is well-locked.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 4 surgical edits (~4 LOC).
- `sim/tb/integration/tb_ee_core_dsll.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **164/164**.

## Pattern review

Six qbert-driven chapters (Ch271→Ch276):

- Ch271 SQ: 5 RTL edits, 4-beat write
- Ch272 DADDU: 4 RTL edits, ALU low-32
- Ch273 SYSCALL HLE: 2 RTL edits, gated dispatcher
- Ch274 BEQL: 6 RTL edits, branch + squash
- Ch275 SD: 7 RTL edits, 2-beat write (reuses SQ counter)
- **Ch276 DSLL: 4 RTL edits, ALU low-32 (reuses SLL path)**

The chapters have trended smaller as the patterns lock in, though not monotonically by edit count (Ch273 took 2 edits, Ch275 took 7). Ch276 is among the smallest yet: pure pattern reuse from Ch272 and Ch275. The qbert track is well-trained, the runner correctly surfaces the next blocker each time, and the incremental cadence holds.