retroDE_ps2/docs/ch294_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch294 closeout — wait-loop autopsy; verdict = qbert_waiting_on_memory_flag

Status: Closed. Observation-only chapter per Codex's framing. Named verdict: qbert_waiting_on_memory_flag — specifically, qbert is waiting on a syscall-returned status word with bit 17 (0x00020000) set. Our HLE returns 0 unconditionally → bit 17 never appears → loop runs forever.

No RTL changes. No new TBs. Two artifacts produced: the disassembly + runtime-trace analysis below, and the Ch295 framing proposal at the bottom.

The wait loop, fully decoded

Disassembly: 0x00112400..0x00112480

0x00112400: 0x24020001   addiu  $v0, $zero, 1
0x00112404: 0x3c048000   lui    $a0, 0x8000
0x00112408: 0x0c044264   jal    0x00110990         ← syscall 0x7A wrapper
0x0011240c: 0xae22c020   sw     $v0, -16352($s1)   (delay slot)
0x00112410: 0x14400021   bne    $v0, $zero, 0x00112498
0x00112414: 0xae020008   sw     $v0, 8($s0)        (delay slot)
0x00112418: 0x3c100002   lui    $s0, 0x2           ; $s0 = 0x00020000 (the mask!)
0x0011241c: 0x00000000   nop
─── LOOP TOP ───────────────────────────────────────────────────
0x00112420: 0x0c044264   jal    0x00110990         ← call wrapper
0x00112424: 0x24040004   addiu  $a0, $zero, 4      (delay slot — $a0 = 4)
0x00112428: 0x00501024   and    $v0, $v0, $s0      ; $v0 &= 0x00020000
0x0011242c: 0x1040fffc   beq    $v0, $zero, 0x00112420  ← HOT BRANCH
─── 0x00112430 is the BEQ delay slot (fetched every iteration); loop exit continues from here ───
0x00112430: 0x24040002   addiu  $a0, $zero, 2
0x00112434: 0x0c044264   jal    0x00110990         ; one more 0x7A call (different $a0)
0x00112438: 0x3c110013   lui    $s1, 0x13
0x0011243c: 0x2630c000   addiu  $s0, $s1, -16384   ; $s0 = 0x0012C000
...
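The jump and branch targets annotated in the listing can be re-derived from the raw instruction words. The following is a minimal sketch, not the `/tmp/ch294_disasm.py` tool itself; the helper names `jal_target` and `branch_target` are hypothetical, and the encodings assumed are the standard MIPS J-type and I-type formats.

```python
def jal_target(pc: int, word: int) -> int:
    """Target of a MIPS J/JAL: top 4 bits of the delay-slot PC, then index << 2."""
    return ((pc + 4) & 0xF000_0000) | ((word & 0x03FF_FFFF) << 2)


def branch_target(pc: int, word: int) -> int:
    """Target of a MIPS conditional branch: delay-slot PC + sign-extended imm16 << 2."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit offset
        imm -= 0x1_0000
    return (pc + 4 + (imm << 2)) & 0xFFFF_FFFF


# Cross-check against the annotations in the listing above.
assert jal_target(0x00112420, 0x0C044264) == 0x00110990      # wrapper entry
assert branch_target(0x0011242C, 0x1040FFFC) == 0x00112420   # hot branch -> loop top
assert branch_target(0x00112410, 0x14400021) == 0x00112498   # init-failure path
assert (0x0002 << 16) == (1 << 17) == 0x00020000             # lui $s0, 0x2 -> bit-17 mask
```

All four checks agree with the listing: the BEQ offset of -4 words is taken relative to the delay-slot address 0x00112430, which is why it lands back on the JAL at 0x00112420.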

The called function at 0x00110990

0x00110990: 0x2403007a   addiu  $v1, $zero, 122   ; $v1 = 0x7A
0x00110994: 0x0000000c   syscall                  ; ← syscall 0x7A
0x00110998: 0x03e00008   jr     $ra
0x0011099c: 0x00000000   nop                      ; (delay slot)

A 4-instruction syscall-0x7A wrapper. Zero memory access. Just sets $v1 = 0x7A and traps. Whatever arg is in $a0 at call-time gets threaded through.

A neighboring wrapper at 0x00110980 does the same for syscall 0x71 (= 113) — not exercised by this wait loop but visible in the disassembly.

Runtime confirmation (from trace files)

After re-running qbert.elf with the current model:

| PC | IFETCH count | Notes |
|---|---|---|
| 0x00112420 (loop-top JAL) | 181,494 | matches syscall_0x7A count=181494 exactly |
| 0x00112424 (addiu delay) | 181,494 | (same) |
| 0x00112428 (AND) | 181,494 | (same) |
| 0x0011242C (BEQ) | 181,493 | one fewer than the other loop PCs; consistent with the watchdog ending the run mid-iteration, after the final AND fetch but before the final BEQ fetch. Either way: ~181k iterations confirmed. |
| 0x00110990 (wrapper) | 181,494 | matches |
| 0x00110994 (syscall) | 181,494 | matches |
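The off-by-one on the BEQ row is what a run cut off mid-iteration would produce. Below is a small sketch of that cross-check on a synthetic trace. The trace format (a flat list of fetched PCs), the `synthetic_trace` helper, and the iteration counts are all assumptions for illustration; the real trace files are not reproduced here.

```python
from collections import Counter

# Fetch order of one loop iteration, per the disassembly (hypothetical flat-PC trace).
LOOP_BODY = [
    0x00112420,  # jal wrapper
    0x00112424,  # addiu $a0, $zero, 4 (delay slot)
    0x00110990,  # addiu $v1, $zero, 0x7A
    0x00110994,  # syscall
    0x00110998,  # jr $ra
    0x0011099C,  # nop (delay slot)
    0x00112428,  # and $v0, $v0, $s0
    0x0011242C,  # beq (hot branch)
    0x00112430,  # beq delay slot
]


def synthetic_trace(full_iters: int, partial_len: int) -> list[int]:
    """Return full_iters complete iterations plus a truncated final one."""
    return LOOP_BODY * full_iters + LOOP_BODY[:partial_len]


# Watchdog fires after the AND of the 6th iteration, before its BEQ is fetched.
counts = Counter(synthetic_trace(full_iters=5, partial_len=7))

assert counts[0x00112420] == 6   # JAL seen in every iteration, including the cut one
assert counts[0x00112428] == 6   # AND likewise
assert counts[0x0011242C] == 5   # BEQ is one short, the same shape as 181,494 vs 181,493
```

The same `Counter` pass over the real IFETCH stream would give the per-PC numbers in the table, assuming the trace can be reduced to a sequence of fetched PCs.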

Map-event region breakdown across the full 1.66M-retire run:

| Region | Count | Meaning |
|---|---|---|
| REGION_USEG_SHADOW (0x0B) | 1,677,113 | qbert's own code+data (almost all IFETCH-side) |
| REGION_BIOS (0x00) | 4 | initial trampoline (before ELF entry) |
| REGION_EE_DMAC_PASSIVE (0x0E) | 1 | one access during Ch288's per-channel init |
| REGION_EE_DMAC_CTRL (0x0D) | 1 | one access during Ch287's D_STAT init |

The wait loop performs ZERO MMIO accesses. Not INTC, not D_STAT, not GS CSR, not BIU, not GS_PRIV. The only data traffic in the loop is the syscall return value through $v0.

Verdict, per Codex's 5-verdict enum

qbert_waiting_on_memory_flag is the closest match — though strictly the polled state is a syscall-returned bitmask, not a direct memory read. The "memory" being polled is the kernel's internal state, surfaced via the syscall 0x7A return value.

Specifically: bit 17 (0x00020000) of the value returned by syscall 0x7A($a0=4).
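To make the failure mode concrete, here is a minimal behavioral model of the wait loop against the current HLE. It is a sketch only: `hle_syscall_7a` and `run_wait_loop` are hypothetical names, and the iteration cap stands in for the watchdog.

```python
from typing import Callable, Optional

BIT17_MASK = 0x0002 << 16   # value built by `lui $s0, 0x2`


def hle_syscall_7a(a0: int) -> int:
    """Current HLE behavior as described in this chapter: always return 0."""
    return 0


def run_wait_loop(syscall: Callable[[int], int], max_iters: int) -> Optional[int]:
    """Model 0x00112420..0x0011242C; return the exiting iteration, or None on watchdog."""
    for iteration in range(1, max_iters + 1):
        v0 = syscall(4)          # jal wrapper, $a0 = 4 from the delay slot
        if v0 & BIT17_MASK:      # and $v0, $v0, $s0 ; beq $v0, $zero, loop_top
            return iteration
    return None                  # stand-in for the watchdog firing


assert run_wait_loop(hle_syscall_7a, 1_000) is None        # never exits with $v0 = 0
assert run_wait_loop(lambda a0: 0x00020000, 1_000) == 1    # exits at once if bit 17 is set
```

Any return value with bit 17 set exits the loop regardless of the other bits, since the AND discards everything else before the branch.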

Other verdicts ruled out:

  • qbert_waiting_on_dmac_handler — qbert is NOT polling D_STAT or D_PCR. (Although the wait might exit when the registered DMAC handler at 0x00112AB0 fires and sets some kernel state that syscall 0x7A surfaces. That's an indirect dependency.)
  • qbert_waiting_on_vblank — qbert is NOT polling GS CSR or any VBLANK-related MMIO.
  • qbert_waiting_on_thread_scheduler — possible secondary interpretation if syscall 0x7A is a sema/event-flag poll, but there's no thread-switch primitive being called.
  • qbert_wait_loop_unknown — definitely not unknown; we have full decoding.

What is syscall 0x7A really?

Two earlier chapters introduced syscall 0x7A as a stub. At Ch292 we labeled it "likely SyncDCache" because of the proximity to MIPS SYNC. The Ch294 autopsy makes that label questionable. A real SyncDCache wouldn't be invoked 181k+ times in a tight poll, and SyncDCache returns void or a status code with bit 17 having no defined meaning.

The observed shape, (small int $a0) → (bitmask $v0) polled in a loop, fits better with one of:

  1. GsGetIMR / iGsGetIMR / GsPutIMR — GS Interrupt Mask Register access. Bit 17 in some kernel-layered GS-IMR-related word could correspond to "VSYNC complete" or "GS finish."
  2. PollSema / iPollSema — semaphore-state poll. $a0 would be a sema handle; the return is a status word with one of the bits indicating "released."
  3. A multiplexed GetEvent / iGetEvent — kernel event-channel query. $a0 is a channel selector; return is a bitmask of pending events.
  4. A kernel-internal status word that the SyncDCache call also returns alongside the cache-sync side effect. Bit 17 would be some "subsystem ready" flag.

In all four cases, the structural fact is the same: qbert is waiting for a kernel-managed bit that the HLE doesn't currently update. The exact SDK name is less important than: "what should make bit 17 set?"

Notable: the call at 0x00112408 (BEFORE the wait loop) uses $a0 = 0x80000000, and qbert expects $v0 = 0 (BNE not-taken falls into the wait). With our HLE returning 0, qbert correctly takes the "init OK" path and enters the wait. So this is not a case where syscall 0x7A's HLE is wrong universally — it's only wrong for the $a0 = 4 polling call, where qbert wants a non-zero specific bit.

Ch295 framing — the gate is named, now decide how to open it

Three concrete strategies for Codex to weigh:

Strategy A: Bit-17-flipper HLE patch (cheapest)

After N calls to syscall 0x7A with $a0 = 4, the dispatcher returns $v0 with bit 17 set (0x00020000). Lets qbert progress. Risk: bit 17 may not be the only thing qbert checks; downstream code might check additional bits (different $a0 selectors, different bit masks). Empirically cheap; one experiment.

Sub-question for Codex: should bit 17 be set on every call, or only after N calls? Setting it always might cause downstream "saw the ready bit, now go process the event" code to misbehave (e.g., it might try to read a "completed" event that doesn't exist). Setting it after N calls would let qbert see at least one "no" before the "yes", which matches realistic interrupt-arrival semantics.
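A behavioral sketch of Strategy A follows, written as a Python model rather than as the actual HLE dispatcher. The class name `Syscall7AStub`, the `threshold` parameter, and the choice to count only `$a0 = 4` calls are assumptions for illustration.

```python
BIT17_MASK = 0x00020000


class Syscall7AStub:
    """Hypothetical Strategy A model: bit 17 appears on the Nth $a0 = 4 call."""

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold
        self.poll_calls = 0      # counts only the $a0 = 4 polling calls

    def __call__(self, a0: int) -> int:
        if a0 != 4:
            return 0             # keep the existing behavior, e.g. $a0 = 0x80000000
        self.poll_calls += 1
        return BIT17_MASK if self.poll_calls >= self.threshold else 0


stub = Syscall7AStub(threshold=10)
assert stub(0x80000000) == 0                     # init call still takes the "init OK" path
results = [stub(4) for _ in range(10)]
assert results[:9] == [0] * 9                    # nine "no" answers
assert results[9] == BIT17_MASK                  # the 10th call reports bit 17
```

Keeping the non-polling selectors at 0 preserves the pre-loop call at 0x00112408, which the text above notes already behaves correctly. Whether the bit should stay set after the threshold, as it does here, is part of the open sub-question.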

Strategy B: Identify the real SDK semantics (correct path)

Look up PS2 SDK syscall 122 / 0x7A in the canonical kernel sources (ps2sdk's ee/kernel/include/kernel.h or similar; qbert runs on the EE, so the EE-side kernel headers are the relevant ones rather than the IOP ones). The syscall name + arg-shape + return-shape will tell us what kernel state to model. If it's GsGetIMR, we need a GS IMR register; if it's PollSema, a sema table; if it's GetEvent, an event-channel table.

This is more correct but requires more upfront work. The disassembly is rich enough that the SDK name is probably identifiable. Codex likely knows it or can look it up.

Strategy C: Wire DMAC-completion to bit 17 (interpretive)

The handler registered in Ch290/291 (at 0x00112AB0, for DMAC ch5 SIF0) was never invoked. Hypothesis: the wait loop is qbert asking "has my DMAC-ch5-SIF0 handler run yet?" If we can fire that handler — even just once — bit 17 might set as a side effect. This requires modeling interrupt delivery: COP0 Status → Cause IP → vector to handler.

Strategy C is correct architecturally but is multiple chapters' worth of work (interrupt delivery isn't modeled at all yet). Don't pivot to this without confirming the hypothesis first.
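For scoping, the "COP0 Status → Cause IP" gate reduces to one predicate. The sketch below is an assumption-laden model, not taken from the retroDE_ps2 RTL: it assumes the usual MIPS IE/EXL/ERL bits, the R5900-specific EIE bit (Status bit 16), interrupt lines at IP2/IP3/IP7, and the DMAC routed to IP3. All of these should be checked against the R5900 documentation before being relied on.

```python
# Assumed COP0 Status bit positions (standard MIPS plus the R5900 EIE bit).
STATUS_IE = 1 << 0     # global interrupt enable
STATUS_EXL = 1 << 1    # exception level: blocks interrupts while set
STATUS_ERL = 1 << 2    # error level: blocks interrupts while set
STATUS_EIE = 1 << 16   # R5900 enable bit toggled by EI/DI (assumption)

# Assumed interrupt lines shared by Status.IM and Cause.IP: bits 10, 11, 15.
IP_LINES = (1 << 10) | (1 << 11) | (1 << 15)
IP3_DMAC = 1 << 11     # assumed routing of the DMAC interrupt


def interrupt_pending(status: int, cause: int) -> bool:
    """True when an unmasked interrupt line is raised and delivery is enabled."""
    enabled = (
        bool(status & STATUS_IE)
        and bool(status & STATUS_EIE)
        and not status & (STATUS_EXL | STATUS_ERL)
    )
    return enabled and bool(status & cause & IP_LINES)


ready = STATUS_IE | STATUS_EIE | IP3_DMAC               # IM3 unmasked, delivery enabled
assert interrupt_pending(ready, IP3_DMAC)               # DMAC line raised: deliver
assert not interrupt_pending(ready | STATUS_EXL, IP3_DMAC)     # already in a handler
assert not interrupt_pending(STATUS_IE | STATUS_EIE, IP3_DMAC) # IM3 masked
```

Even this small predicate needs Status and Cause state, a vector target, and a return path before the handler at 0x00112AB0 could run, which is the multi-chapter cost noted above.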

Recommendation for Codex

Try Strategy A as a one-experiment chapter: HLE patches syscall 0x7A($a0=4) to return $v0 = 0x00020000 after, say, the 10th call. If qbert progresses past the wait and the next blocker is informative, great. If qbert misbranches into garbage, fall back to Strategy B (look up the SDK semantics) and we'll know which bit-17 source to model.

The disassembly evidence makes Strategy A safe to try: bit 17 is the only thing the wait loop checks; there's no other "consumer" state that depends on the value being a specific channel-bitmask encoding. Setting bit 17 alone should make the wait exit cleanly.

Files

  • /tmp/ch294_disasm.py — focused R5900 disassembler used to produce the listings above. Not committed; one-shot diagnostic.
  • This closeout document.

Pattern review (24 chapters; first investigation chapter since Ch263..Ch269)

| Era | Chapters | Description |
|---|---|---|
| Opcode-blocker | Ch271..Ch286 | R5900 opcodes, one per chapter |
| MMIO stubs | Ch287..Ch288 | DMAC ctrl + per-channel |
| Syscall HLE | Ch273, 285, 289..291, 293 | $v0=0 narrow extensions |
| Narrow NOP-class | Ch286 (EI), Ch292 (SYNC) | side-effect-free accepts |
| Investigation | Ch294 | wait-loop autopsy, no RTL change |

The Ch263..Ch269 BIOS-treadmill autopsies established the "investigation chapter" pattern: spend a chapter understanding a steady-state loop before deciding what to change. Ch294 is the qbert-side analog and produces the same artifact: a named gate plus a concrete next-step proposal.

Regression

Unchanged at 176/176 — no RTL or TB changes in Ch294.