retroDE_ps2/docs/ch294_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch294 closeout — wait-loop autopsy; verdict = `qbert_waiting_on_memory_flag`
**Status:** Closed. Observation-only chapter per Codex's framing.
**Named verdict:** `qbert_waiting_on_memory_flag` — specifically,
qbert is waiting on a **syscall-returned status word** with bit 17
(0x00020000) set. Our HLE returns 0 unconditionally → bit 17 never
appears → loop runs forever.
No RTL changes. No new TBs. Two artifacts produced: the
disassembly + runtime-trace analysis below, and the Ch295 framing
proposal at the bottom.
## The wait loop, fully decoded
### Disassembly: `0x00112400..0x00112480`
```
0x00112400: 0x24020001 addiu $v0, $zero, 1
0x00112404: 0x3c048000 lui $a0, 0x8000
0x00112408: 0x0c044264 jal 0x00110990 ← syscall 0x7A wrapper
0x0011240c: 0xae22c020 sw $v0, -16352($s1) (delay slot)
0x00112410: 0x14400021 bne $v0, $zero, 0x00112498
0x00112414: 0xae020008 sw $v0, 8($s0) (delay slot)
0x00112418: 0x3c100002 lui $s0, 0x2 ; $s0 = 0x00020000 (the mask!)
0x0011241c: 0x00000000 nop
─── LOOP TOP ───────────────────────────────────────────────────
0x00112420: 0x0c044264 jal 0x00110990 ← call wrapper
0x00112424: 0x24040004 addiu $a0, $zero, 4 (delay slot — $a0 = 4)
0x00112428: 0x00501024 and $v0, $v0, $s0 ; $v0 &= 0x00020000
0x0011242c: 0x1040fffc beq $v0, $zero, 0x00112420 ← HOT BRANCH
─── BEQ delay slot; execution falls through here on loop exit ─
0x00112430: 0x24040002 addiu $a0, $zero, 2 (delay slot of the BEQ, so it runs every iteration; $a0 is reloaded at 0x00112424)
0x00112434: 0x0c044264 jal 0x00110990 ; one more 0x7A call ($a0 = 2)
0x00112438: 0x3c110013 lui $s1, 0x13 (delay slot)
0x0011243c: 0x2630c000 addiu $s0, $s1, -16384 ; $s0 = 0x0012C000
...
```
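The branch and jump targets in the listing can be cross-checked by hand. A minimal sketch, assuming the standard MIPS encoding rules (a signed 16-bit word offset relative to the delay-slot address for BEQ/BNE, and a 26-bit word index within the current 256 MiB segment for JAL). The helper names are illustrative and are not taken from `ch294_disasm.py`:

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, word: int) -> int:
    """Target of a BEQ/BNE at address pc: delay-slot address plus offset * 4."""
    return (pc + 4 + (sign_extend16(word) << 2)) & 0xFFFFFFFF


def jump_target(pc: int, word: int) -> int:
    """Target of a J/JAL at address pc: top 4 bits of pc+4, then index * 4."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)


# Words and addresses copied from the listing above.
print(hex(branch_target(0x0011242C, 0x1040FFFC)))  # 0x112420 (loop top)
print(hex(branch_target(0x00112410, 0x14400021)))  # 0x112498 (BNE escape)
print(hex(jump_target(0x00112420, 0x0C044264)))    # 0x110990 (wrapper)
print(hex(0x2 << 16))                              # 0x20000 (lui $s0, 0x2 = bit 17)
```

The hot branch's offset of `0xfffc` is -4 words, which lands back on the JAL at the loop top.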
### The called function at `0x00110990`
```
0x00110990: 0x2403007a addiu $v1, $zero, 122 ; $v1 = 0x7A
0x00110994: 0x0000000c syscall ; ← syscall 0x7A
0x00110998: 0x03e00008 jr $ra
0x0011099c: 0x00000000 nop ; (delay slot)
```
A 4-instruction syscall-0x7A wrapper. Zero memory access. Just sets
`$v1 = 0x7A` and traps. Whatever arg is in `$a0` at call-time gets
threaded through.
A neighboring wrapper at `0x00110980` does the same for syscall
0x71 (= 113) — not exercised by this wait loop but visible in the
disassembly.
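To make the loop's dependence on the syscall return value concrete, here is a behavioural sketch of the four loop instructions with the syscall passed in as a callable. It models only what the disassembly shows; `hle_syscall_7a`, `run_wait_loop`, and the `watchdog` parameter are hypothetical names, not the project's actual HLE interface:

```python
BIT17_MASK = 0x00020000  # value held in $s0 after `lui $s0, 0x2`


def hle_syscall_7a(a0: int) -> int:
    """Current HLE behaviour as described in this chapter: $v0 = 0 for any $a0."""
    return 0


def run_wait_loop(syscall, watchdog: int):
    """Model 0x00112420..0x0011242C.

    Returns (exited, calls_made). `watchdog` bounds the number of iterations
    so the model terminates even when the polled bit never appears.
    """
    for calls in range(1, watchdog + 1):
        v0 = syscall(4)        # jal wrapper, with $a0 = 4 set in the delay slot
        if v0 & BIT17_MASK:    # and + beq: keep looping while the masked value is 0
            return True, calls
    return False, watchdog


print(run_wait_loop(hle_syscall_7a, watchdog=1000))      # (False, 1000)
print(run_wait_loop(lambda a0: BIT17_MASK, watchdog=1000))  # (True, 1)
```

With the return-0 stub the loop only ever ends at the watchdog, which is the behaviour the runtime trace shows.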
## Runtime confirmation (from trace files)
After re-running qbert.elf with the current model:
| PC | IFETCH count | Notes |
|-----------|--------------|-------|
| 0x00112420 (loop-top JAL) | 181,494 | matches `syscall_0x7A count=181494` exactly |
| 0x00112424 (addiu delay) | 181,494 | (same) |
| 0x00112428 (AND) | 181,494 | (same) |
| 0x0011242C (BEQ) | 181,493 | one fewer than the other loop PCs: consistent with the watchdog ending the run mid-iteration, after the 181,494th AND was fetched but before that iteration's BEQ. Either way, ~181k iterations confirmed. |
| 0x00110990 (wrapper) | 181,494 | matches |
| 0x00110994 (syscall) | 181,494 | matches |
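The per-PC counts above amount to a histogram over the instruction-fetch stream. The sketch below shows that tally on a synthetic stream. The trace file format is not described in this document, so the input is a plain list of PCs, and `ifetch_histogram` and `iteration_bounds` are hypothetical helper names:

```python
from collections import Counter

# The four loop-body PCs from the listing (wrapper PCs omitted for brevity).
LOOP_PCS = (0x00112420, 0x00112424, 0x00112428, 0x0011242C)


def ifetch_histogram(pcs):
    """Count how many times each PC appears in an instruction-fetch stream."""
    return Counter(pcs)


def iteration_bounds(hist, loop_pcs=LOOP_PCS):
    """Return (complete, started) iteration counts for the loop body."""
    counts = [hist[pc] for pc in loop_pcs]
    return min(counts), max(counts)


# Synthetic stream: 3 complete iterations, then a 4th cut off before the BEQ.
synthetic = list(LOOP_PCS) * 3 + list(LOOP_PCS[:3])
hist = ifetch_histogram(synthetic)
print(iteration_bounds(hist))  # (3, 4)
```

Applied to the table's numbers, the same calculation gives 181,493 complete iterations and 181,494 started, which is the off-by-one seen on the BEQ row.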
**Map-event region breakdown across the full 1.66M-retire run:**
| Region | Count | Meaning |
|--------|-------|---------|
| REGION_USEG_SHADOW (0x0B) | 1,677,113 | qbert's own code+data (almost all IFETCH-side) |
| REGION_BIOS (0x00) | 4 | initial trampoline (before ELF entry) |
| REGION_EE_DMAC_PASSIVE (0x0E) | 1 | one access during Ch288's per-channel init |
| REGION_EE_DMAC_CTRL (0x0D) | 1 | one access during Ch287's D_STAT init |
**The wait loop performs ZERO MMIO accesses.** Not INTC, not D_STAT,
not GS CSR, not BIU, not GS_PRIV. The only data traffic in the
loop is the syscall return value through $v0.
## Verdict, per Codex's 5-verdict enum
**`qbert_waiting_on_memory_flag`** is the closest match — though
strictly the polled state is a *syscall-returned bitmask*, not a
direct memory read. The "memory" being polled is the kernel's
internal state, surfaced via the syscall 0x7A return value.
Specifically: **bit 17 (0x00020000) of the value returned by
`syscall 0x7A($a0=4)`.**
Other verdicts ruled out:
- `qbert_waiting_on_dmac_handler` — qbert is NOT polling D_STAT or
D_PCR. (Although the wait *might* exit when the registered DMAC
handler at 0x00112AB0 fires and sets some kernel state that
syscall 0x7A surfaces. That's an indirect dependency.)
- `qbert_waiting_on_vblank` — qbert is NOT polling GS CSR or any
VBLANK-related MMIO.
- `qbert_waiting_on_thread_scheduler` — possible secondary
interpretation if syscall 0x7A is a sema/event-flag poll, but
there's no thread-switch primitive being called.
- `qbert_wait_loop_unknown` — definitely not unknown; we have full
decoding.
## What is syscall 0x7A really?
Two earlier chapters introduced syscall 0x7A as a stub. At Ch292
we labeled it "likely SyncDCache" because of the proximity to MIPS
SYNC. **The Ch294 autopsy makes that label questionable.** A real
SyncDCache wouldn't be invoked 181k+ times in a tight poll, and
SyncDCache returns void or a status code with bit 17 having no
defined meaning.
The observed shape, `(small int $a0)` → `(bitmask $v0)` polled in
a loop, fits better with one of:
1. **`GsGetIMR` / `iGsGetIMR` / `GsPutIMR`** — GS Interrupt Mask
Register access. Bit 17 in some kernel-layered GS-IMR-related
word could correspond to "VSYNC complete" or "GS finish."
2. **`PollSema` / `iPollSema`** — semaphore-state poll. $a0 would
be a sema handle; the return is a status word with one of the
bits indicating "released."
3. **A multiplexed `GetEvent` / `iGetEvent`** — kernel
event-channel query. $a0 is a channel selector; return is a
bitmask of pending events.
4. **A kernel-internal status word** that the SyncDCache call
*also* returns alongside the cache-sync side effect. Bit 17
would be some "subsystem ready" flag.
In all four cases, the structural fact is the same: **qbert is
waiting for a kernel-managed bit that the HLE doesn't currently
update**. The exact SDK name is less important than: "what should
make bit 17 set?"
Notable: the call at `0x00112408` (BEFORE the wait loop) uses
`$a0 = 0x80000000`, and qbert *expects $v0 = 0* (BNE not-taken
falls into the wait). With our HLE returning 0, qbert correctly
takes the "init OK" path and enters the wait. So the syscall 0x7A
HLE is not wrong universally; the only call shown to be wrong so
far is the `$a0 = 4` poll, where qbert needs bit 17 set in the
result.
## Ch295 framing — the gate is named, now decide how to open it
Three concrete strategies for Codex to weigh:
### Strategy A: Bit-17-flipper HLE patch (cheapest)
After N calls to syscall 0x7A with `$a0 = 4`, the dispatcher
returns `$v0` with bit 17 set (0x00020000). Lets qbert progress.
Risk: bit 17 may not be the *only* thing qbert checks; downstream
code might check additional bits (different `$a0` selectors,
different bit masks). Empirically cheap; one experiment.
Sub-question for Codex: should bit 17 set on every call, or only
after N calls? Setting it always might cause downstream "saw the
ready bit, now go process the event" code to misbehave (e.g., it
might try to read a "completed" event that doesn't exist).
Setting after N might let qbert see one "no" then a "yes" —
matching realistic interrupt-arrival semantics.
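A sketch of the "set after N calls" variant, to make the proposed behaviour precise. This is an illustration only: the class name, the `threshold` parameter, and the choice to count only `$a0 = 4` calls are assumptions, and the real dispatcher's interface is not shown in this document:

```python
BIT17 = 0x00020000
POLL_SELECTOR = 4  # the $a0 value used by the wait loop


class Syscall7AFlipper:
    """Hypothetical HLE stub: bit 17 appears from the Nth poll call onward."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self.poll_calls = 0

    def __call__(self, a0: int) -> int:
        if a0 != POLL_SELECTOR:
            # Leave every other selector on the existing return-0 behaviour,
            # e.g. the $a0 = 0x80000000 call where qbert expects $v0 = 0.
            return 0
        self.poll_calls += 1
        return BIT17 if self.poll_calls >= self.threshold else 0


hle = Syscall7AFlipper(threshold=10)
results = [hle(4) for _ in range(10)]
print([hex(r) for r in results[-2:]])  # ['0x0', '0x20000']
```

Keeping the counter per-selector means the pre-loop `$a0 = 0x80000000` call neither advances the count nor sees the bit, so the "init OK" path is unaffected.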
### Strategy B: Identify the real SDK semantics (correct path)
Look up PS2 SDK syscall 122 / 0x7A in the canonical kernel
sources (ps2sdk's EE-side ee/kernel/include/kernel.h or similar;
the syscall is issued from EE code, so the IOP headers are the
wrong place to look). The
syscall name + arg-shape + return-shape will tell us what kernel
state to model. If it's `GsGetIMR`, we need a GS IMR register;
if it's `PollSema`, a sema table; if it's `GetEvent`, an event-
channel table.
This is more correct but requires more upfront work. The
disassembly is rich enough that the SDK name is probably
identifiable; Codex likely knows it or can look it up.
### Strategy C: Wire DMAC-completion to bit 17 (interpretive)
The handler registered in Ch290/291 (at 0x00112AB0, for DMAC ch5
SIF0) was never invoked. **Hypothesis:** the wait loop is qbert
asking "has my DMAC-ch5-SIF0 handler run yet?" If we can fire
that handler — even just once — bit 17 might set as a side
effect. This requires modeling interrupt delivery:
COP0 Status → Cause IP → vector to handler.
Strategy C is architecturally correct but is multiple chapters'
worth of work (interrupt delivery isn't modeled at all yet).
Don't pivot to this without confirming the hypothesis first.
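For scale, the first step of that delivery chain (deciding whether a pending interrupt is taken) can be sketched as below. This assumes the generic MIPS COP0 layout: IE, EXL, and ERL in Status bits 0..2, with the IM mask and IP pending fields in bits 15:8 of Status and Cause respectively. Any extra gating the R5900 imposes, and which IP line the DMAC drives, would need checking against the EE core documentation. The function name is hypothetical:

```python
STATUS_IE = 1 << 0    # global interrupt enable
STATUS_EXL = 1 << 1   # exception level: set while handling an exception
STATUS_ERL = 1 << 2   # error level
IP_IM_FIELD = 0xFF00  # bits 15:8, IM in Status and IP in Cause


def interrupt_taken(status: int, cause: int) -> bool:
    """Generic MIPS rule: enabled, not already in an exception, and an
    unmasked interrupt line is pending."""
    if not (status & STATUS_IE):
        return False
    if status & (STATUS_EXL | STATUS_ERL):
        return False
    return bool(status & cause & IP_IM_FIELD)


# Example with line 3 (bit 11) pending and unmasked; the line choice is
# illustrative, not a statement about the EE's wiring.
print(interrupt_taken(status=0x0801, cause=0x0800))  # True
print(interrupt_taken(status=0x0803, cause=0x0800))  # False (EXL set)
```

Vectoring to the handler, saving EPC, and returning via ERET would all sit on top of this check, which is why the strategy spans multiple chapters.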
## Recommendation for Codex
Try **Strategy A** as a one-experiment chapter: HLE patches
syscall 0x7A($a0=4) to return `$v0 = 0x00020000` after, say, the
10th call. If qbert progresses past the wait and the next blocker
is informative, great. If qbert misbranches into garbage, fall
back to **Strategy B** (look up the SDK semantics) and we'll
know which bit-17 source to model.
The disassembly evidence makes Strategy A safe to try as far as the
wait loop itself goes: bit 17 is the only thing the loop checks, and
no other "consumer" state inside it depends on the value being a
specific channel-bitmask encoding. Setting bit 17 alone should make
the wait exit cleanly; what downstream code then expects is the open
risk already noted under Strategy A.
## Files
- `/tmp/ch294_disasm.py` — focused R5900 disassembler used to
produce the listings above. Not committed; one-shot diagnostic.
- This closeout document.
## Pattern review (24 chapters; first investigation chapter since Ch263..Ch269)
| Era | Chapters | Description |
|-----|----------|-------------|
| Opcode-blocker | Ch271..Ch286 | R5900 opcodes, one per chapter |
| MMIO stubs | Ch287..Ch288 | DMAC ctrl + per-channel |
| Syscall HLE | Ch273, 285, 289..291, 293 | $v0=0 narrow extensions |
| Narrow NOP-class | Ch286 (EI), Ch292 (SYNC) | side-effect-free accepts |
| **Investigation** | **Ch294** | **wait-loop autopsy, no RTL change** |
The Ch263..Ch269 BIOS-treadmill autopsies established the
"investigation chapter" pattern: spend a chapter understanding a
steady-state loop before deciding what to change. Ch294 is the
qbert-side analog and produces the same artifact: a *named gate*
plus a *concrete next-step proposal*.
## Regression
Unchanged at **176/176** — no RTL or TB changes in Ch294.