Ch294 closeout — wait-loop autopsy; verdict = qbert_waiting_on_memory_flag
Status: Closed. Observation-only chapter per Codex's framing.
Named verdict: qbert_waiting_on_memory_flag — specifically,
qbert is waiting on a syscall-returned status word with bit 17
(0x00020000) set. Our HLE returns 0 unconditionally → bit 17 never
appears → loop runs forever.
No RTL changes. No new TBs. Two artifacts produced: the disassembly + runtime-trace analysis below, and the Ch295 framing proposal at the bottom.
The wait loop, fully decoded
Disassembly: 0x00112400..0x00112480
0x00112400: 0x24020001 addiu $v0, $zero, 1
0x00112404: 0x3c048000 lui $a0, 0x8000
0x00112408: 0x0c044264 jal 0x00110990 ← syscall 0x7A wrapper
0x0011240c: 0xae22c020 sw $v0, -16352($s1) (delay slot)
0x00112410: 0x14400021 bne $v0, $zero, 0x00112498
0x00112414: 0xae020008 sw $v0, 8($s0) (delay slot)
0x00112418: 0x3c100002 lui $s0, 0x2 ; $s0 = 0x00020000 (the mask!)
0x0011241c: 0x00000000 nop
─── LOOP TOP ───────────────────────────────────────────────────
0x00112420: 0x0c044264 jal 0x00110990 ← call wrapper
0x00112424: 0x24040004 addiu $a0, $zero, 4 (delay slot — $a0 = 4)
0x00112428: 0x00501024 and $v0, $v0, $s0 ; $v0 &= 0x00020000
0x0011242c: 0x1040fffc beq $v0, $zero, 0x00112420 ← HOT BRANCH
─── exit-of-loop continues from 0x00112430 ────────────────────
0x00112430: 0x24040002 addiu $a0, $zero, 2 (BEQ delay slot; also runs on each loop-back, then $a0 is reset to 4 at 0x00112424)
0x00112434: 0x0c044264 jal 0x00110990 ; one more 0x7A call (different $a0)
0x00112438: 0x3c110013 lui $s1, 0x13
0x0011243c: 0x2630c000 addiu $s0, $s1, -16384 ; $s0 = 0x0012C000
...
The called function at 0x00110990
0x00110990: 0x2403007a addiu $v1, $zero, 122 ; $v1 = 0x7A
0x00110994: 0x0000000c syscall ; ← syscall 0x7A
0x00110998: 0x03e00008 jr $ra
0x0011099c: 0x00000000 nop ; (delay slot)
A 4-instruction syscall-0x7A wrapper. Zero memory access. Just sets
$v1 = 0x7A and traps. Whatever arg is in $a0 at call-time gets
threaded through.
A neighboring wrapper at 0x00110980 does the same for syscall
0x71 (= 113) — not exercised by this wait loop but visible in the
disassembly.
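The branch and call targets in the listings above can be re-derived from the raw instruction words. The following is a minimal sketch, not the `/tmp/ch294_disasm.py` tool itself; the helper names are hypothetical and only the standard MIPS J-type and I-type target rules are assumed.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jal_target(pc: int, word: int) -> int:
    """J-type target: upper 4 bits of PC+4, then the 26-bit index shifted left by 2."""
    return ((pc + 4) & 0xF000_0000) | ((word & 0x03FF_FFFF) << 2)


def branch_target(pc: int, word: int) -> int:
    """I-type branch target: PC + 4 + (sign-extended 16-bit offset << 2)."""
    return (pc + 4 + (sign_extend16(word) << 2)) & 0xFFFF_FFFF


def lui_value(word: int) -> int:
    """Value loaded by LUI: the 16-bit immediate placed in the upper half-word."""
    return (word & 0xFFFF) << 16


print(hex(jal_target(0x00112420, 0x0C044264)))     # loop-top JAL  -> 0x110990
print(hex(branch_target(0x0011242C, 0x1040FFFC)))  # hot BEQ       -> 0x112420
print(hex(branch_target(0x00112410, 0x14400021)))  # init BNE      -> 0x112498
print(hex(lui_value(0x3C100002)))                  # mask in $s0   -> 0x20000
```

The last line confirms that the mask loaded into $s0 is exactly bit 17 (1 << 17 = 0x00020000).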
Runtime confirmation (from trace files)
After re-running qbert.elf with the current model:
| PC | IFETCH count | Notes |
|---|---|---|
| 0x00112420 (loop-top JAL) | 181,494 | matches syscall_0x7A count=181494 exactly |
| 0x00112424 (addiu delay) | 181,494 | (same) |
| 0x00112428 (AND) | 181,494 | (same) |
| 0x0011242C (BEQ) | 181,493 | one fewer: consistent with the watchdog ending the run mid-iteration, after the final AND was fetched but before the final BEQ. Either way, ~181k iterations confirmed. |
| 0x00110990 (wrapper) | 181,494 | matches |
| 0x00110994 (syscall) | 181,494 | matches |
Map-event region breakdown across the full 1.66M-retire run:
| Region | Count | Meaning |
|---|---|---|
| REGION_USEG_SHADOW (0x0B) | 1,677,113 | qbert's own code+data (almost all IFETCH-side) |
| REGION_BIOS (0x00) | 4 | initial trampoline (before ELF entry) |
| REGION_EE_DMAC_PASSIVE (0x0E) | 1 | one access during Ch288's per-channel init |
| REGION_EE_DMAC_CTRL (0x0D) | 1 | one access during Ch287's D_STAT init |
The wait loop performs ZERO MMIO accesses. Not INTC, not D_STAT, not GS CSR, not BIU, not GS_PRIV. The only data traffic in the loop is the syscall return value through $v0.
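The per-PC consistency check behind the first table can be sketched as follows. This is an illustration only: the `(kind, pc)` event tuples, the helper names, and the trace itself are assumptions (the real trace-file format is not shown in this document), and the trace below is synthetic, not data from the qbert run.

```python
from collections import Counter

LOOP_PCS = (0x00112420, 0x00112424, 0x00112428, 0x0011242C)
SYSCALL_PC = 0x00110994

# Fetch order of one loop iteration, including the wrapper and the BEQ delay slot.
ITERATION = [0x00112420, 0x00112424, 0x00110990, 0x00110994,
             0x00110998, 0x0011099C, 0x00112428, 0x0011242C, 0x00112430]


def ifetch_counts(events):
    """Count IFETCH events per PC; events is an iterable of (kind, pc) tuples."""
    return Counter(pc for kind, pc in events if kind == "IFETCH")


def loop_counts_consistent(counts, slack=1):
    """True if every loop PC was fetched within `slack` of the syscall count."""
    syscalls = counts[SYSCALL_PC]
    return all(abs(counts[pc] - syscalls) <= slack for pc in LOOP_PCS)


def synthetic_trace(full_iterations, truncated_len):
    """Build a synthetic trace: N full iterations plus one cut short (watchdog)."""
    pcs = ITERATION * full_iterations + ITERATION[:truncated_len]
    return [("IFETCH", pc) for pc in pcs]


# Two full iterations, then a third that stops right after the AND is fetched.
counts = ifetch_counts(synthetic_trace(full_iterations=2, truncated_len=7))
print(counts[0x00112420], counts[0x0011242C], counts[SYSCALL_PC])  # 3 2 3
print(loop_counts_consistent(counts))                              # True
```

The synthetic run reproduces the shape seen in the table: the BEQ count trails the other loop PCs by one when the run is cut off mid-iteration.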
Verdict, per Codex's 5-verdict enum
qbert_waiting_on_memory_flag is the closest match — though
strictly the polled state is a syscall-returned bitmask, not a
direct memory read. The "memory" being polled is the kernel's
internal state, surfaced via the syscall 0x7A return value.
Specifically: bit 17 (0x00020000) of the value returned by
syscall 0x7A($a0=4).
Other verdicts ruled out:
- qbert_waiting_on_dmac_handler: qbert is NOT polling D_STAT or D_PCR. (Although the wait might exit when the registered DMAC handler at 0x00112AB0 fires and sets some kernel state that syscall 0x7A surfaces. That's an indirect dependency.)
- qbert_waiting_on_vblank: qbert is NOT polling GS CSR or any VBLANK-related MMIO.
- qbert_waiting_on_thread_scheduler: possible secondary interpretation if syscall 0x7A is a sema/event-flag poll, but there's no thread-switch primitive being called.
- qbert_wait_loop_unknown: definitely not unknown; we have full decoding.
What is syscall 0x7A really?
Two earlier chapters introduced syscall 0x7A as a stub. At Ch292 we labeled it "likely SyncDCache" because of the proximity to MIPS SYNC. The Ch294 autopsy makes that label questionable. A real SyncDCache wouldn't be invoked 181k+ times in a tight poll, and SyncDCache returns void or a status code with bit 17 having no defined meaning.
The observed shape — (small int $a0) → (bitmask $v0) polled in
a loop — fits better with one of:
- GsGetIMR/iGsGetIMR/GsPutIMR: GS Interrupt Mask Register access. Bit 17 in some kernel-layered GS-IMR-related word could correspond to "VSYNC complete" or "GS finish."
- PollSema/iPollSema: semaphore-state poll. $a0 would be a sema handle; the return is a status word with one of the bits indicating "released."
- A multiplexed GetEvent/iGetEvent: kernel event-channel query. $a0 is a channel selector; return is a bitmask of pending events.
- A kernel-internal status word that the SyncDCache call also returns alongside the cache-sync side effect. Bit 17 would be some "subsystem ready" flag.
In all four cases, the structural fact is the same: qbert is waiting for a kernel-managed bit that the HLE doesn't currently update. The exact SDK name is less important than: "what should make bit 17 set?"
Notable: the call at 0x00112408 (BEFORE the wait loop) uses
$a0 = 0x80000000, and qbert expects $v0 = 0 (BNE not-taken
falls into the wait). With our HLE returning 0, qbert correctly
takes the "init OK" path and enters the wait. So this is not a
case where syscall 0x7A's HLE is wrong universally — it's only
wrong for the $a0 = 4 polling call, where qbert wants a
non-zero specific bit.
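The asymmetry between the two call sites can be shown with a small Python model. It is a sketch under stated assumptions: the function names are hypothetical, the HLE is reduced to "return 0 for any $a0" as described in this chapter, and the watchdog is an arbitrary iteration cap rather than the real watchdog mechanism.

```python
BIT17_MASK = 0x0002_0000  # mask loaded into $s0 at 0x00112418


def hle_syscall_7a(a0: int) -> int:
    """Current HLE behavior per this chapter: $v0 = 0 regardless of $a0."""
    return 0


def init_check_passes(syscall) -> bool:
    """Call site at 0x00112408: $a0 = 0x80000000; BNE falls through when $v0 == 0."""
    return syscall(0x8000_0000) == 0


def run_wait_loop(syscall, watchdog: int):
    """Loop at 0x00112420: poll syscall($a0=4) until bit 17 is set.

    Returns the number of calls made before the loop exits, or None if the
    (assumed) watchdog cap is reached first.
    """
    for calls in range(1, watchdog + 1):
        if syscall(4) & BIT17_MASK:
            return calls
    return None


print(init_check_passes(hle_syscall_7a))             # True: init path is fine
print(run_wait_loop(hle_syscall_7a, watchdog=1000))  # None: the poll never exits
```

A return value of 0 is the right answer for the first call site and the wrong one for the second, which is why the fix only needs to be selector-specific.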
Ch295 framing — the gate is named, now decide how to open it
Three concrete strategies for Codex to weigh:
Strategy A: Bit-17-flipper HLE patch (cheapest)
After N calls to syscall 0x7A with $a0 = 4, the dispatcher
returns $v0 with bit 17 set (0x00020000). Lets qbert progress.
Risk: bit 17 may not be the only thing qbert checks; downstream
code might check additional bits (different $a0 selectors,
different bit masks). Empirically cheap; one experiment.
Sub-question for Codex: should bit 17 set on every call, or only after N calls? Setting it always might cause downstream "saw the ready bit, now go process the event" code to misbehave (e.g., it might try to read a "completed" event that doesn't exist). Setting after N might let qbert see one "no" then a "yes" — matching realistic interrupt-arrival semantics.
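Strategy A and its sub-question can be sketched in Python as a stateful dispatcher. This is a hypothetical model, not the HLE code: the class name, the `threshold` parameter, and the reading of "after N calls" as "from the Nth poll onward" are all assumptions made for illustration.

```python
BIT17_MASK = 0x0002_0000


class Syscall7AFlipper:
    """Hypothetical HLE patch: set bit 17 for $a0 == 4 from the Nth poll onward."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold  # assumed meaning: first poll that sees bit 17
        self.poll_calls = 0

    def __call__(self, a0: int) -> int:
        if a0 != 4:
            return 0  # other selectors (e.g. 0x80000000 at init) keep returning 0
        self.poll_calls += 1
        return BIT17_MASK if self.poll_calls >= self.threshold else 0


def run_wait_loop(syscall, watchdog: int):
    """Poll syscall($a0=4) until bit 17 is set; None if the cap is hit first."""
    for calls in range(1, watchdog + 1):
        if syscall(4) & BIT17_MASK:
            return calls
    return None


after_n = Syscall7AFlipper(threshold=10)
print(after_n(0x8000_0000))                 # 0: init call is unaffected
print(run_wait_loop(after_n, watchdog=1000))  # 10: nine "no" answers, then "yes"

always = Syscall7AFlipper(threshold=1)
print(run_wait_loop(always, watchdog=1000))   # 1: bit 17 visible on the first poll
```

With `threshold=1` the model covers the "set on every call" variant, so both options in the sub-question can be compared by changing one parameter.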
Strategy B: Identify the real SDK semantics (correct path)
Look up PS2 SDK syscall 122 / 0x7A in the canonical kernel
sources (ps2sdk's EE-side ee/kernel/include/kernel.h or similar;
qbert's syscalls execute on the EE, not the IOP). The
syscall name + arg-shape + return-shape will tell us what kernel
state to model. If it's GsGetIMR, we need a GS IMR register;
if it's PollSema, a sema table; if it's GetEvent, an event-
channel table.
This is more correct but requires more upfront work. The disassembly is rich enough that the SDK name is probably identifiable. Codex likely knows it or can look it up.
Strategy C: Wire DMAC-completion to bit 17 (interpretive)
The handler registered in Ch290/291 (at 0x00112AB0, for DMAC ch5 SIF0) was never invoked. Hypothesis: the wait loop is qbert asking "has my DMAC-ch5-SIF0 handler run yet?" If we can fire that handler — even just once — bit 17 might set as a side effect. This requires modeling interrupt delivery: COP0 Status → Cause IP → vector to handler.
Strategy C is correct architecturally but is multiple chapters' worth of work (interrupt delivery isn't modeled at all yet). Don't pivot to this without confirming the hypothesis first.
Recommendation for Codex
Try Strategy A as a one-experiment chapter: HLE patches
syscall 0x7A($a0=4) to return $v0 = 0x00020000 after, say, the
10th call. If qbert progresses past the wait and the next blocker
is informative, great. If qbert misbranches into garbage, fall
back to Strategy B (look up the SDK semantics) and we'll
know which bit-17 source to model.
The disassembly evidence makes Strategy A safe to try: bit 17 is the only thing the wait loop checks; there's no other "consumer" state that depends on the value being a specific channel-bitmask encoding. Setting bit 17 alone should make the wait exit cleanly.
Files
- /tmp/ch294_disasm.py: focused R5900 disassembler used to produce the listings above. Not committed; one-shot diagnostic.
- This closeout document.
Pattern review (24 chapters; first investigation chapter since Ch263..Ch269)
| Era | Chapters | Description |
|---|---|---|
| Opcode-blocker | Ch271..Ch286 | R5900 opcodes, one per chapter |
| MMIO stubs | Ch287..Ch288 | DMAC ctrl + per-channel |
| Syscall HLE | Ch273, 285, 289..291, 293 | $v0=0 narrow extensions |
| Narrow NOP-class | Ch286 (EI), Ch292 (SYNC) | side-effect-free accepts |
| Investigation | Ch294 | wait-loop autopsy, no RTL change |
The Ch263..Ch269 BIOS-treadmill autopsies established the "investigation chapter" pattern: spend a chapter understanding a steady-state loop before deciding what to change. Ch294 is the qbert-side analog and produces the same artifact: a named gate plus a concrete next-step proposal.
Regression
Unchanged at 176/176 — no RTL or TB changes in Ch294.