RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ch273 closeout — minimal EE syscall HLE; qbert clears its kernel-call prolog, next blocker is BEQL
Status: Closed. Codex's spec implemented exactly: minimal
HLE dispatcher for three crt0 syscalls (EndOfHeap,
InitMainThread, FlushCache), gated behind a parameter so
existing TBs are unaffected. Verdict from re-running
qbert.elf: elf_first_unsupported_opcode (pc=0x001000C0 instr=0x50600004) — BEQL (branch on equal likely), MIPS-II.
That frames Ch274.
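As a quick sanity check on the verdict, the trapping word can be split into its I-type fields by hand. The sketch below is a small Python helper (the name `decode_i_type` is illustrative, not something in the repo) that applies the standard MIPS I-type layout to `0x50600004` and computes the branch target from the reported PC.

```python
def decode_i_type(instr: int) -> dict:
    """Split a 32-bit MIPS I-type word into opcode, rs, rt and a signed imm."""
    imm = instr & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "imm": imm,
    }


fields = decode_i_type(0x50600004)
pc = 0x001000C0
# Branch offsets are in words, relative to the delay-slot address (pc + 4).
target = pc + 4 + (fields["imm"] << 2)

print(fields)        # opcode 0x14 (BEQL), rs=3 ($v1), rt=0 ($0), imm=4
print(hex(target))   # 0x1000d4
```

Opcode 0x14 with rs=3 and rt=0 is `beql $v1, $0, +4`, which matches the disassembly further down.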
Numbers across the opcode/syscall chapters
| Chapter | Blocker | qbert retire_count | Verdict |
|---|---|---|---|
| Ch270 (init) | SQ at 0x00100024 | 12 | first_unsupported_opcode |
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 | first_unsupported_opcode |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 | elf_halted |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 | elf_first_unsupported_opcode |
20 more retires this chapter (26,960 → 26,980): all 3 syscalls dispatched, the
prolog used the returns to set up $sp and a small initializer-table walker,
and the trap fires at the first instruction in crt0 that we don't
decode: BEQL.
What landed
RTL — 2 surgical additions in ee_core_stub.sv
- Parameter: EE_SYSCALL_HLE_ENABLE (default 1'b0) + SYSCALL_HEAP_END (default 32'h001E_0000). Default-off so every existing TB whose syscall is a "halt-PASS-marker" (addi/slti/etc.) keeps its semantics.
- Dispatcher: new else if (EE_SYSCALL_HLE_ENABLE) branch after the Ch199 special case. case (regfile[3]) on $v1:

| $v1 | name | $v0 returned | resume |
|---|---|---|---|
| 0x3C | EndOfHeap | SYSCALL_HEAP_END | PC + 4 |
| 0x3D | InitMainThread | 0 | PC + 4 |
| 0x64 | FlushCache | 0 | PC + 4 |
| other | (unhandled) | (none) | halt |

Resume is pc <= pc + 4 (per Codex's correction: this is normal user-code SYSCALL resume, NOT RFE; RFE is Ch199's path).
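The dispatch table can be captured as a small behavioral model, which is handy as a golden reference when reading TB results. This is a Python sketch, not the RTL: the function name `syscall_hle` is hypothetical, the Ch199 special case is not modeled, and leaving the PC unchanged on halt is an assumption made only for the model.

```python
SYSCALL_HEAP_END = 0x001E_0000   # default value of the parameter
V0, V1 = 2, 3                    # MIPS register numbers for $v0 and $v1

# $v1 code -> value returned in $v0
HLE_RETURNS = {
    0x3C: SYSCALL_HEAP_END,  # EndOfHeap
    0x3D: 0,                 # InitMainThread
    0x64: 0,                 # FlushCache
}


def syscall_hle(regfile: list, pc: int, hle_enable: bool = True):
    """Model a SYSCALL at `pc`. Returns (new_pc, halted); may write regfile[$v0]."""
    code = regfile[V1]
    if hle_enable and code in HLE_RETURNS:
        regfile[V0] = HLE_RETURNS[code]
        return pc + 4, False     # normal user-mode resume, not an RFE-style return
    # HLE disabled or unknown code: SYSCALL keeps its halt behaviour.
    # Assumption: PC is left at the SYSCALL for the halt-time snapshot.
    return pc, True


regs = [0] * 32
regs[V1] = 0x3C
print(syscall_hle(regs, 0x00100070), hex(regs[V0]))  # (1048692, False) 0x1e0000
```

With `hle_enable=False` every code halts, which is the property that keeps the existing halt-PASS-marker TBs unchanged.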
Focused TB — tb_ee_core_syscall_hle
Four cases:
- syscall with $v1=0x3C → verify $v0 = 0x001E0000
- syscall with $v1=0x3D → verify $v0 = 0
- syscall with $v1=0x64 → verify $v0 = 0
- syscall with $v1=0x7777 → verify HALT (PASS marker)
Independent verification: captures $v0 at the cycle AFTER each
known syscall retires AND runs a BNE $v0, expected, FAIL chain.
Both must agree. Final PC + $v1=0x7777 post-halt confirms we
landed on the unhandled-syscall path correctly.
Result: retired=17 halt=1 trap=0 errors=0 PASS.
Runner update — tb_ee_core_elf_runner.sv
- Wires EE_SYSCALL_HLE_ENABLE=1 on the ee_core_stub.
- Halt-time SUMMARY now includes the live register snapshot: saw_halt = 1 at_pc=0x... $v1=0x... $a0=0x... $a1=0x... $a2=0x... $a3=0x...
- New verdict shape elf_first_unhandled_syscall when the halt is on a 0x0000000C instruction with unknown $v1. (For this qbert run, the dispatcher handled all 3 and the trap was a separate opcode issue, but the verdict shape is ready for whenever the next unknown SYSCALL surfaces.)
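The verdict selection described above can be sketched as a pure function. This is a Python illustration under assumptions: the function name `classify_stop`, its arguments, and the priority order (trap first, then SYSCALL halt, then any other halt) are hypothetical and may not match how the runner actually orders its checks. Only the three verdict strings come from this chapter's notes.

```python
SYSCALL_WORD = 0x0000000C
HANDLED_SYSCALLS = frozenset({0x3C, 0x3D, 0x64})  # codes the HLE dispatcher covers


def classify_stop(trap: bool, halt: bool, instr: int, v1: int):
    """Map the stop condition to a verdict string (assumed priority order)."""
    if trap:
        return "elf_first_unsupported_opcode"
    if halt and instr == SYSCALL_WORD and v1 not in HANDLED_SYSCALLS:
        return "elf_first_unhandled_syscall"
    if halt:
        return "elf_halted"
    return None  # still running; the runner's real behaviour here is not modeled


# This chapter's qbert run: the stop was an opcode trap on BEQL.
print(classify_stop(trap=True, halt=False, instr=0x50600004, v1=0))
```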
Makefile
tb_ee_core_syscall_hletarget.- Added to both regression lists.
- Regression: 160 → 161.
Codex Ch273 acceptance — line-by-line
| Requirement | Status |
|---|---|
| Minimal HLE handler in ee_core_stub for normal user-mode SYSCALL | ✅ |
| $v1=0x3C EndOfHeap → conservative top-of-RAM, PC+=4 | ✅ |
| $v1=0x3D InitMainThread → success ($v0=0), no scheduler mutation, PC+=4 | ✅ |
| $v1=0x64 FlushCache → no-op success, PC+=4 | ✅ |
| Not RFE — PC = syscall PC + 4 | ✅ |
| Unhandled $v1 still halts; TB can read $v1/$a0-$a3 for verdict | ✅ |
| Focused TB: 3 syscalls in sequence + 1 unknown-fallback | ✅ |
| Regression unchanged for default-off | ✅ |
| Re-run qbert, report next blocker | ✅ |
qbert disassembly around the new blocker
0x001000A0: lui $v0, 0x0013 ; $v0 = 0x00130000
0x001000A4: addiu $v0, $v0, 0xC800 ; $v0 = 0x0012C800
0x001000A8: lw $v1, 0($v0) ; $v1 = mem[0x0012C800]
0x001000AC: bne $v1, $0, +7*4 ; skip ahead if non-zero
0x001000B0: nop ; delay
0x001000B4: lui $v0, 0x0013
0x001000B8: addiu $v0, $v0, 0xC944 ; $v0 = 0x0012C944
0x001000BC: lw $v1, 0($v0) ; $v1 = mem[0x0012C944] (= 0 per halt $v1=0)
0x001000C0: beql $v1, $0, +4*4 ; <-- TRAPS HERE
0x001000C4: addiu $a0, $0, 0 ; delay slot (squashed if BEQL not taken)
0x001000C8: addiu $v0, $v1, 4
0x001000CC: lw $a0, 0($v0)
0x001000D0: addiu $a1, $v0, 4
0x001000D4: jal <constructor table walker>
This is the C++ static-constructor walker (or a similar
initialization table). The BEQL checks whether the table head
pointer is null — and branch-likely semantics are
load-bearing: the delay slot at 0x001000C4 clobbers $a0
to 0 only if the branch is taken. If we naïvely decode BEQL as
plain BEQ, the delay slot would execute on the not-taken path
too, silently corrupting $a0.
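The difference is easy to see in a two-line model of the branch plus its delay slot. This Python sketch (the function name `a0_after_branch` is illustrative) models only `beq`/`beql $v1, $0, target` followed by the delay slot `addiu $a0, $0, 0`, and reports the resulting `$a0`.

```python
def a0_after_branch(likely: bool, v1: int, a0_before: int) -> int:
    """$a0 after `beq/beql $v1, $0, target` + delay slot `addiu $a0, $0, 0`."""
    taken = (v1 == 0)
    # BEQ: the delay slot always executes.
    # BEQL: the delay slot executes only when the branch is taken.
    delay_slot_runs = taken or not likely
    return 0 if delay_slot_runs else a0_before


SENTINEL = 0xA5A5
print(a0_after_branch(likely=True,  v1=0, a0_before=SENTINEL))       # 0      (BEQL taken)
print(hex(a0_after_branch(likely=True,  v1=1, a0_before=SENTINEL)))  # 0xa5a5 (BEQL not taken)
print(a0_after_branch(likely=False, v1=1, a0_before=SENTINEL))       # 0      (BEQ not taken)
```

The last two lines are the case that a BEQ-style decode would get wrong.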
Recommendation for Codex's Ch274
Implement BEQL with proper "squash on not-taken" semantics.
MIPS-II "branch likely" family: BEQL (0x14), BNEL (0x15), BLEZL
(0x16), BGTZL (0x17), and REGIMM BLTZL/BGEZL/BLTZALL/BGEZALL.
Compilers (especially older PS2 SDK gcc with -fmove-loop-invariants
or default for-loops) emit these as the canonical loop branch.
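For reference, the whole family can be recognized from the primary opcode plus, for REGIMM, the rt field. The Python sketch below uses the standard MIPS encodings (REGIMM is opcode 0x01; the rt codes 0x02/0x03/0x12/0x13 are not listed in this note and are added from the ISA); the helper name `branch_likely_name` is hypothetical. It does not check that rt is zero for BLEZL/BGTZL.

```python
OPCODE_REGIMM = 0x01
LIKELY_OPCODES = {0x14: "BEQL", 0x15: "BNEL", 0x16: "BLEZL", 0x17: "BGTZL"}
LIKELY_REGIMM_RT = {0x02: "BLTZL", 0x03: "BGEZL", 0x12: "BLTZALL", 0x13: "BGEZALL"}


def branch_likely_name(instr: int):
    """Return the branch-likely mnemonic for `instr`, or None if it is not one."""
    opcode = (instr >> 26) & 0x3F
    if opcode in LIKELY_OPCODES:
        return LIKELY_OPCODES[opcode]
    if opcode == OPCODE_REGIMM:
        return LIKELY_REGIMM_RT.get((instr >> 16) & 0x1F)
    return None


print(branch_likely_name(0x50600004))  # BEQL  (the qbert blocker)
print(branch_likely_name(0x10600004))  # None  (plain BEQ, opcode 0x04)
print(branch_likely_name(0x04620004))  # BLTZL (REGIMM, rt=0x02)
```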
Three Ch274 framings, in order of scope:
1. BEQL only. Smallest change. Decode is_beql, share branch_taken logic with BEQ (rs==rt), but unlike BEQ, when not taken: PC += 8 (skip both the branch and its delay slot), no delay-slot execute. Adds is_branch_likely distinction in the retire/PC-advance logic.
2. BEQL + BNEL (the two most common). BNEL is the inverse condition (rs!=rt); same likely semantics. They surface as opcodes 0x14 (BEQL) and 0x15 (BNEL).
3. Full branch-likely family. BEQL/BNEL/BLEZL/BGTZL + REGIMM variants. Bigger surface; usually you only need 1–2 of these per chapter until qbert/a later ELF surfaces another.
My read: (1) — BEQL only. Same one-question-one-chapter pattern. The next blocker after BEQL might or might not be BNEL; let the runner pick.
The implementation hook: existing ee_core_stub has
branch_pending + instr_in_delay_slot + a branch_taken
combinational signal. For BEQL we need to gate "set
branch_pending + queue delay-slot execution" on branch_taken,
and on not-taken just pc <= pc + 8 directly (skip the delay
slot). Probably a 5–8 line change.
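The intended PC sequencing can be written down as a tiny reference function before touching the RTL. This is a Python sketch of the architectural behaviour only. It does not model `branch_pending` or `instr_in_delay_slot` cycle by cycle, and the function name `pcs_after_branch` is illustrative.

```python
def pcs_after_branch(pc: int, taken: bool, likely: bool, target: int) -> list:
    """PCs executed after a branch at `pc`, up to the first post-branch instruction."""
    if taken:
        return [pc + 4, target]   # delay slot runs, then the branch target
    if likely:
        return [pc + 8]           # branch-likely not taken: delay slot is squashed
    return [pc + 4, pc + 8]       # ordinary branch not taken: delay slot, then fall through


PC, TARGET = 0x001000C0, 0x001000D4   # the qbert BEQL and its +4*4 target
print([hex(p) for p in pcs_after_branch(PC, taken=True,  likely=True,  target=TARGET)])
print([hex(p) for p in pcs_after_branch(PC, taken=False, likely=True,  target=TARGET)])
print([hex(p) for p in pcs_after_branch(PC, taken=False, likely=False, target=TARGET)])
```

Only the middle case differs from existing BEQ handling, which is consistent with the change being small.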
Focused TB: 3 cases mirroring the Ch272 shape:
- BEQL taken: $v1==$0, target reached, delay slot executed (writes a sentinel value to $a0).
- BEQL not-taken: $v1!=$0, target NOT reached, delay slot squashed (sentinel value NOT written; the original $a0 preserved).
- Cross-check vs BEQ: identical inputs through a BEQ should produce a different $a0 in the not-taken case (BEQ's delay slot fires).
Files changed
- rtl/ee/ee_core_stub.sv: 2 surgical additions (parameter + dispatcher case statement, ~30 LOC).
- sim/tb/integration/tb_ee_core_syscall_hle.sv: new focused TB.
- sim/tb/integration/tb_ee_core_elf_runner.sv: enable EE_SYSCALL_HLE_ENABLE; new halt-time register snapshot; elf_first_unhandled_syscall verdict shape.
- sim/Makefile: target + both regression lists.
Regression
In flight; expected 161/161 (was 160, +1 for
tb_ee_core_syscall_hle).
Process notes
- Codex's PC+4 correction was right. My initial closeout draft for Ch272 suggested "RFE-style return"; Codex caught it. RFE is for the Ch199 _ReturnFromException path; normal user-mode syscall resumes at PC+4, no Status stack pop. Filed this in the memory entry so a future chapter doesn't repeat the same wrong assumption.
- Parameter gating is the right call. Existing TBs that use syscall as a halt-PASS-marker would have broken if their $v1 happened to be 0x3C/0x3D/0x64. Gating preserved 160 passing tests trivially; only the ELF runner opts in.
- The verdict shape now distinguishes 4 halts: trap (strict opcode), unmapped MMIO, halt-on-syscall (with $v1/$a0..$a3), halt-on-other (unexpected). The runner is becoming a real triage tool.