# Ch280 closeout: PSUBB byte-wise SIMD; next blocker is PNOR

**Status:** Closed.

**Verdict from re-running qbert.elf:** `elf_first_unsupported_opcode (pc=0x00112C94 instr=0x70091CE9)`: opcode `0x1C` (MMI) + funct `0x29` (MMI3) + sa `0x13` = **PNOR** (Parallel Not-OR). qbert's byte-walker advanced past PSUBB on the first try.

## Numbers

| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch278 (PCPYLD) | LQ at 0x00112C88 | 27,018 |
| Post-Ch279 (LQ) | PSUBB at 0x00112C90 | 27,020 |
| **Post-Ch280 (PSUBB)** | **PNOR at 0x00112C94** | **27,021** |

1-retire delta: PSUBB itself retired, PNOR is the next instruction.

## What landed

### RTL: 5 surgical edits in `ee_core_stub.sv`

1. **Constants**: `FUNC_MMI0 = 6'h08` and `MMI0_PSUBB = 5'h09`.
2. **Decode**: `is_psubb = is_mmi && (func == FUNC_MMI0) && (shamt == MMI0_PSUBB)`. Three-way AND keeps the decode narrow: any other op=0x1C/funct=0x08 sub-instruction (PADDW, PADDH, PADDB, ...) continues to strict-trap.
3. **`is_rtype_alu` group**: added `is_psubb`.
4. **`rtype_alu_wb` arm**: 4 independent byte subtracts:
   ```sv
   else if (is_psubb) begin
       rtype_alu_wb[ 7: 0] = rs_val[ 7: 0] - rt_val[ 7: 0];
       rtype_alu_wb[15: 8] = rs_val[15: 8] - rt_val[15: 8];
       rtype_alu_wb[23:16] = rs_val[23:16] - rt_val[23:16];
       rtype_alu_wb[31:24] = rs_val[31:24] - rt_val[31:24];
   end
   ```
   Each lane is naturally modulo-256; no carry between bytes.
5. **`is_nop_class` allow**: `!is_psubb` added.

5 LOC of real change.

### Focused TB: `tb_ee_core_psubb.sv`

Three cases:

1. **Distinct lanes (qbert encoding shape)**: `$t1 = 0x10203040`, `$t2 = 0x01020304` → `$v0 = 0x0F1E2D3C`. Encoder-output asserted to equal `0x712A1248` (qbert's literal instruction).
2. **All-wrap**: `$t3 = 0`, `$t4 = 0x01020304` → `$t5 = 0xFFFEFDFC`. Proves all 4 byte lanes underflow independently to 0xFx.
3. **No cross-byte borrow**: `$t6 = 0x12345600`, `$t7 = 0x00000001` → `$t8 = 0x123456FF`.
   The low byte borrows (0x00 - 0x01 = 0xFF) but **must not propagate into byte 1**. Byte 1 stays at 0x56 (= 0x56 - 0x00). This is the critical SIMD property.

Result: `retired=28 halt=1 trap=0 pc=0xbfc00164 errors=0 PASS`.

### Makefile + regression

- `tb_ee_core_psubb` target.
- Added to both regression lists.
- Regression: 167 → **168**.

## Recommendation for Codex's Ch281: PNOR

`0x70091CE9` at PC `0x00112C94`:

- opcode 0x1C (MMI)
- funct 0x29 (MMI3 sub-group)
- sa 0x13 (PNOR within MMI3)
- rs=$zero, rt=$t1, rd=$v1
- → `pnor $v1, $0, $t1`

Architectural: 128-bit `rd = ~(rs | rt)`. For our 32-bit model: `$rd[31:0] = ~($rs[31:0] | $rt[31:0])`, which is **bit-identical to the existing standard NOR** (SPECIAL funct 0x27). The only difference between PNOR and NOR is the architectural width.

With `rs = $zero`, PNOR is the canonical MIPS "NOT" pseudo-instruction: `pnor $rd, $0, $rt` ≡ `not $rd, $rt`.

Implementation outline (mirrors Ch278 PCPYLD + Ch280 PSUBB):

1. `localparam FUNC_MMI3 = 6'h29`.
2. `localparam MMI3_PNOR = 5'h13`.
3. `is_pnor = is_mmi && (func == FUNC_MMI3) && (shamt == MMI3_PNOR)`.
4. Add to `is_rtype_alu`.
5. **Reuse the existing NOR writeback arm**:
   ```sv
   else if (is_nor || is_pnor)
       rtype_alu_wb = ~(rs_val | rt_val);
   ```
6. Add `!is_pnor` to `is_nop_class` allow-list.

~4 LOC.

Focused TB:

- Exact qbert encoding asserted == `0x70091CE9`.
- NOT-of-zero: `pnor $rd, $0, $0` → `$rd = 0xFFFFFFFF`.
- NOT-of-pattern: `pnor $rd, $0, 0xAAAAAAAA` → `$rd = 0x55555555`.
- General NOR: `pnor $rd, 0xF0F0F0F0, 0x0F0F0F0F` → `$rd = 0`.

**Likely follow-ons after PNOR**: byte-walker reductions like **PMFHL** (move from HI/LO), or another mask op like **PAND** (MMI2 sa=0x12) / **POR** (MMI3 sa=0x12). Codex may want to consider folding the bitwise MMI family (PAND/POR/PXOR/PNOR) into one chapter since they're all reuses of existing ALU arms.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 5 surgical edits.
- `sim/tb/integration/tb_ee_core_psubb.sv`: new focused TB.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **168/168**.

## Pattern review (10 qbert chapters)

| Ch | Blocker | Edits | Pattern |
|----|---------|-------|---------|
| 271 | SQ (first) | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| **280** | **PSUBB** | **5** | **REUSE MMI narrow (byte-SIMD)** |

10 chapters in, qbert at 27,021 retires, regression at 168. SIMD byte-walker pattern is locking in: LQ → PSUBB → PNOR (likely → PMFHL → branch). Each chapter is now ~4-5 LOC + a TB; cadence holds at sub-half-day per chapter.
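
The encodings and test vectors quoted in this closeout can be cross-checked outside the simulator with a small software reference model. The sketch below is not part of the repo: the names `decode_fields`, `psubb32`, and `pnor32` are hypothetical, and it models only the low 32 bits, on the assumption that this matches the 32-bit register model used by `ee_core_stub.sv`.

```python
MASK32 = 0xFFFFFFFF


def decode_fields(instr):
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "sa": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


def psubb32(rs_val, rt_val):
    """Byte-wise subtract over four 8-bit lanes (low 32 bits only).

    Each lane wraps modulo 256 and never borrows from its neighbour.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        a = (rs_val >> shift) & 0xFF
        b = (rt_val >> shift) & 0xFF
        result |= ((a - b) & 0xFF) << shift
    return result


def pnor32(rs_val, rt_val):
    """Bitwise NOR truncated to 32 bits (same result as standard NOR)."""
    return ~(rs_val | rt_val) & MASK32


if __name__ == "__main__":
    # PSUBB word from the focused TB: MMI opcode, MMI0 funct, sa = 0x09.
    psubb = decode_fields(0x712A1248)
    assert (psubb["opcode"], psubb["funct"], psubb["sa"]) == (0x1C, 0x08, 0x09)

    # PNOR word reported by the qbert run: MMI opcode, MMI3 funct, sa = 0x13.
    pnor = decode_fields(0x70091CE9)
    assert (pnor["opcode"], pnor["funct"], pnor["sa"]) == (0x1C, 0x29, 0x13)

    # The three PSUBB TB cases.
    assert psubb32(0x10203040, 0x01020304) == 0x0F1E2D3C
    assert psubb32(0x00000000, 0x01020304) == 0xFFFEFDFC
    assert psubb32(0x12345600, 0x00000001) == 0x123456FF

    # The three proposed PNOR TB cases.
    assert pnor32(0, 0) == 0xFFFFFFFF
    assert pnor32(0, 0xAAAAAAAA) == 0x55555555
    assert pnor32(0xF0F0F0F0, 0x0F0F0F0F) == 0
```

A model like this is mainly useful for generating expected values when the bitwise MMI family is folded into one chapter: each new op would need only one more small function and a handful of vectors.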