# Ch273 closeout: minimal EE syscall HLE; qbert clears its kernel-call prolog, next blocker is BEQL

**Status:** Closed. Codex's spec implemented exactly: minimal HLE dispatcher for three crt0 syscalls (`EndOfHeap`, `InitMainThread`, `FlushCache`), gated behind a parameter so existing TBs are unaffected.

**Verdict from re-running qbert.elf:** `elf_first_unsupported_opcode (pc=0x001000C0 instr=0x50600004)`, which is **BEQL** (branch on equal likely), MIPS-II. That frames Ch274.

## Numbers across the opcode/syscall chapters

| Chapter | Blocker | qbert retire_count | Verdict |
|---------|---------|--------------------|---------|
| Ch270 (init) | SQ at 0x00100024 | 12 | first_unsupported_opcode |
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 | first_unsupported_opcode |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 | `elf_halted` |
| **Post-Ch273 (SYSCALL HLE)** | **BEQL at 0x001000C0** | **26,980** | **`elf_first_unsupported_opcode`** |

20 more retires this chapter: all 3 syscalls dispatched, the prolog used the returns to set up `$sp` and a small initializer-table walker, and the trap fires at the FIRST instruction the crt0 emits that we don't decode: `BEQL`.

## What landed

### RTL: 2 surgical additions in `ee_core_stub.sv`

1. **Parameter**: `EE_SYSCALL_HLE_ENABLE` (default `1'b0`) + `SYSCALL_HEAP_END` (default `32'h001E_0000`). Default-off so every existing TB whose `syscall` is a "halt-PASS-marker" (addi/slti/etc.) keeps its semantics.
2. **Dispatcher**: new `else if (EE_SYSCALL_HLE_ENABLE)` branch after the Ch199 special case. `case (regfile[3])` on `$v1`:

| `$v1` | name | `$v0` returned | resume |
|-------|----------------|--------------------|----------|
| 0x3C | EndOfHeap | `SYSCALL_HEAP_END` | PC + 4 |
| 0x3D | InitMainThread | 0 | PC + 4 |
| 0x64 | FlushCache | 0 | PC + 4 |
| other | (unhandled) | (none) | **halt** |

`pc <= pc + 4` (per Codex's correction: this is normal user-code SYSCALL resume, NOT RFE; RFE is Ch199's path).
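The dispatch table above can be cross-checked against a small behavioural reference model. The sketch below is written in Python purely for illustration; it is not the RTL. The function name `syscall_hle`, the `HLE_TABLE` layout, and the choice to hold the PC on halt are assumptions, and the Ch199 special case is deliberately left out.

```python
# Behavioural reference model of the syscall HLE dispatch table.
# Hypothetical helper for illustration only; the real logic lives in
# ee_core_stub.sv and also has a Ch199 special case that is not modelled here.

SYSCALL_HEAP_END = 0x001E_0000  # default parameter value from the closeout

V0, V1 = 2, 3  # MIPS register numbers for $v0 and $v1

# $v1 selector -> (name, value returned in $v0)
HLE_TABLE = {
    0x3C: ("EndOfHeap", SYSCALL_HEAP_END),
    0x3D: ("InitMainThread", 0),
    0x64: ("FlushCache", 0),
}


def syscall_hle(pc, regfile, hle_enable=False):
    """Model one SYSCALL retire.

    Returns (next_pc, halted). regfile is a 32-entry list updated in place.
    Assumption: on halt the model leaves the PC at the syscall instruction.
    """
    if not hle_enable:
        # Default-off: syscall keeps its legacy halt-marker behaviour.
        return pc, True
    entry = HLE_TABLE.get(regfile[V1])
    if entry is None:
        # Unhandled selector: halt so the runner can report $v1/$a0..$a3.
        return pc, True
    _name, ret = entry
    regfile[V0] = ret
    # Normal user-mode resume: PC + 4, no exception-return path.
    return (pc + 4) & 0xFFFF_FFFF, False
```

For example, calling `syscall_hle(0x00100070, regs, hle_enable=True)` with `regs[3] = 0x3C` writes `SYSCALL_HEAP_END` into `regs[2]` and resumes at `0x00100074`, while a selector such as `0x7777` halts and leaves `$v0` untouched.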
### Focused TB: `tb_ee_core_syscall_hle`

Four cases:

1. `syscall` with `$v1=0x3C` → verify `$v0 = 0x001E0000`
2. `syscall` with `$v1=0x3D` → verify `$v0 = 0`
3. `syscall` with `$v1=0x64` → verify `$v0 = 0`
4. `syscall` with `$v1=0x7777` → verify HALT (PASS marker)

Independent verification: captures `$v0` at the cycle AFTER each known syscall retires AND runs a `BNE $v0, expected, FAIL` chain. Both must agree. Final PC + `$v1=0x7777` post-halt confirms we landed on the unhandled-syscall path correctly.

Result: `retired=17 halt=1 trap=0 errors=0 PASS`.

### Runner update: `tb_ee_core_elf_runner.sv`

- Wires `EE_SYSCALL_HLE_ENABLE=1` on the ee_core_stub.
- Halt-time SUMMARY now includes the live register snapshot:
  ```
  saw_halt = 1 at_pc=0x... $v1=0x... $a0=0x... $a1=0x... $a2=0x... $a3=0x...
  ```
- New verdict shape `elf_first_unhandled_syscall` when the halt is on a `0x0000000C` instruction with unknown `$v1`. (For this qbert run, the dispatcher handled all 3 and the trap was a separate opcode issue, but the verdict shape is ready for whenever the next unknown SYSCALL surfaces.)

### Makefile

- `tb_ee_core_syscall_hle` target.
- Added to both regression lists.
- Regression: 160 → **161**.
## Codex Ch273 acceptance: line-by-line

| Requirement | Status |
|-------------|--------|
| Minimal HLE handler in ee_core_stub for normal user-mode SYSCALL | ✅ |
| $v1=0x3C EndOfHeap → conservative top-of-RAM, PC+=4 | ✅ |
| $v1=0x3D InitMainThread → success ($v0=0), no scheduler mutation, PC+=4 | ✅ |
| $v1=0x64 FlushCache → no-op success, PC+=4 | ✅ |
| **Not RFE: PC = syscall PC + 4** | ✅ |
| Unhandled $v1 still halts; TB can read $v1/$a0-$a3 for verdict | ✅ |
| Focused TB: 3 syscalls in sequence + 1 unknown-fallback | ✅ |
| Regression unchanged for default-off | ✅ |
| Re-run qbert, report next blocker | ✅ |

## qbert disassembly around the new blocker

```
0x001000A0: lui   $v0, 0x0013        ; $v0 = 0x00130000
0x001000A4: addiu $v0, $v0, 0xC800   ; $v0 = 0x0012C800
0x001000A8: lw    $v1, 0($v0)        ; $v1 = mem[0x0012C800]
0x001000AC: bne   $v1, $0, +7*4      ; skip ahead if non-zero
0x001000B0: nop                      ; delay
0x001000B4: lui   $v0, 0x0013
0x001000B8: addiu $v0, $v0, 0xC944   ; $v0 = 0x0012C944
0x001000BC: lw    $v1, 0($v0)        ; $v1 = mem[0x0012C944] (= 0 per halt $v1=0)
0x001000C0: beql  $v1, $0, +4*4      ; <-- TRAPS HERE
0x001000C4: addiu $a0, $0, 0         ; delay slot (squashed if BEQL not taken)
0x001000C8: addiu $v0, $v1, 4
0x001000CC: lw    $a0, 0($v0)
0x001000D0: addiu $a1, $v0, 4
0x001000D4: jal
```

This is the C++ static-constructor walker (or a similar initialization table). The BEQL checks whether the table head pointer is null, and **branch-likely semantics are load-bearing**: the delay slot at `0x001000C4` clobbers `$a0` to 0 only if the branch is taken. If we naïvely decode BEQL as plain BEQ, the delay slot would execute on the not-taken path too, silently corrupting `$a0`.

## Recommendation for Codex's Ch274

**Implement BEQL with proper "squash on not-taken" semantics.** MIPS-II "branch likely" family: BEQL (0x14), BNEL (0x15), BLEZL (0x16), BGTZL (0x17), and REGIMM BLTZL/BGEZL/BLTZALL/BGEZALL.
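The difference between BEQ and BEQL can be made concrete with a small reference model. The Python sketch below is illustrative only and is not the RTL; the helper names `decode_branch` and `step_branch` are hypothetical. It decodes the trapping word `0x50600004` and reports, for each outcome, whether the delay slot runs and where execution continues afterwards.

```python
# Reference model contrasting BEQ with BEQL (branch on equal likely).
# Hypothetical helpers for illustration; only the opcode values and the
# squash-on-not-taken rule come from the MIPS definition of these branches.

OP_BEQ = 0x04
OP_BEQL = 0x14


def decode_branch(instr):
    """Split an I-type word into (opcode, rs, rt, signed 16-bit offset)."""
    opcode = (instr >> 26) & 0x3F
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    imm = instr & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000  # sign-extend
    return opcode, rs, rt, imm


def step_branch(pc, instr, regfile):
    """Return (run_delay_slot, pc_after) for a BEQ or BEQL located at pc.

    pc_after is where execution continues once the branch (and its delay
    slot, if it runs) has retired.
    """
    opcode, rs, rt, imm = decode_branch(instr)
    if opcode not in (OP_BEQ, OP_BEQL):
        raise ValueError("not a BEQ/BEQL instruction")
    taken = regfile[rs] == regfile[rt]
    target = (pc + 4 + (imm << 2)) & 0xFFFF_FFFF
    fallthrough = (pc + 8) & 0xFFFF_FFFF
    if taken:
        # Both forms execute the delay slot, then jump to the target.
        return True, target
    # Not taken: BEQ still runs the delay slot, BEQL squashes it.
    return opcode == OP_BEQ, fallthrough
```

With `$v1 = 0`, `step_branch(0x001000C0, 0x50600004, regs)` gives `(True, 0x001000D4)`: the delay slot runs and control reaches the `jal`. With `$v1 != 0` it gives `(False, 0x001000C8)`, whereas the same fields encoded as a plain BEQ (`0x10600004`) give `(True, 0x001000C8)`, which is exactly the `$a0` clobber the closeout warns about.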
Compilers (especially older PS2 SDK gcc with `-fmove-loop-invariants` or default for-loops) emit these as the canonical loop branch.

Three Ch274 framings, in order of scope:

1. **BEQL only.** Smallest change. Decode `is_beql`, share `branch_taken` logic with BEQ (rs==rt), but unlike BEQ, when not taken: PC += 8 (skip both the branch and its delay slot), no delay-slot execute. Adds `is_branch_likely` distinction in the retire/PC-advance logic.
2. **BEQL + BNEL** (the two most common). BNEL is the inverse condition (rs!=rt); same likely semantics. They decode as primary opcodes `0x14` (BEQL) and `0x15` (BNEL).
3. **Full branch-likely family.** BEQL/BNEL/BLEZL/BGTZL + REGIMM variants. Bigger surface; usually you only need 1–2 of these per chapter until qbert/a later ELF surfaces another.

**My read: (1), BEQL only.** Same one-question-one-chapter pattern. The next blocker after BEQL might or might not be BNEL; let the runner pick.

The implementation hook: existing ee_core_stub has `branch_pending` + `instr_in_delay_slot` + a `branch_taken` combinational signal. For BEQL we need to gate "set branch_pending + queue delay-slot execution" on `branch_taken`, and on not-taken just `pc <= pc + 8` directly (skip the delay slot). Probably a 5–8 line change.

Focused TB: 3 cases mirroring Ch272 shape:

- BEQL taken: `$v1==$0`, target reached, delay slot executed (writes $a0 to a sentinel value).
- BEQL not-taken: `$v1!=$0`, target NOT reached, delay slot squashed (sentinel value NOT written; the original $a0 preserved).
- Cross-check vs BEQ: identical inputs through a BEQ should produce different $a0 on the not-taken case (BEQ's delay slot fires).

## Files changed

- `rtl/ee/ee_core_stub.sv`: 2 surgical additions (parameter + dispatcher case statement, ~30 LOC).
- `sim/tb/integration/tb_ee_core_syscall_hle.sv`: new focused TB.
- `sim/tb/integration/tb_ee_core_elf_runner.sv`: enable `EE_SYSCALL_HLE_ENABLE`; new halt-time register snapshot; `elf_first_unhandled_syscall` verdict shape.
- `sim/Makefile`: target + both regression lists.

## Regression

In flight; expected **161/161** (was 160, +1 for `tb_ee_core_syscall_hle`).

## Process notes

- **Codex's PC+4 correction was right.** My initial closeout draft for Ch272 suggested "RFE-style return"; Codex caught it. RFE is for the Ch199 `_ReturnFromException` path; normal user-mode `syscall` resumes at PC+4, no Status stack pop. Filed this in the memory entry so a future chapter doesn't repeat the same wrong assumption.
- **Parameter gating is the right call.** Existing TBs that use `syscall` as a halt-PASS-marker would have broken if their `$v1` happened to be 0x3C/0x3D/0x64. Gating preserved 160 passing tests trivially; only the ELF runner opts in.
- **The verdict shape now distinguishes 4 halts**: trap (strict opcode), unmapped MMIO, halt-on-syscall (with $v1/$a0..$a3), halt-on-other (unexpected). The runner is becoming a real triage tool.