# Ch294 closeout: wait-loop autopsy; verdict = `qbert_waiting_on_memory_flag`

**Status:** Closed. Observation-only chapter, per Codex's framing.

**Named verdict:** `qbert_waiting_on_memory_flag`; specifically,
qbert is waiting on a **syscall-returned status word** with bit 17
(0x00020000) set. Our HLE returns 0 unconditionally → bit 17 never
appears → the loop runs forever.

No RTL changes. No new TBs. Two artifacts produced: the
disassembly + runtime-trace analysis below, and the Ch295 framing
proposal at the bottom.

## The wait loop, fully decoded

### Disassembly: `0x00112400..0x00112480`

```
0x00112400: 0x24020001 addiu $v0, $zero, 1
0x00112404: 0x3c048000 lui $a0, 0x8000
0x00112408: 0x0c044264 jal 0x00110990 ← syscall 0x7A wrapper
0x0011240c: 0xae22c020 sw $v0, -16352($s1) (delay slot)
0x00112410: 0x14400021 bne $v0, $zero, 0x00112498
0x00112414: 0xae020008 sw $v0, 8($s0) (delay slot)
0x00112418: 0x3c100002 lui $s0, 0x2 ; $s0 = 0x00020000 (the mask!)
0x0011241c: 0x00000000 nop
─── LOOP TOP ───────────────────────────────────────────────────
0x00112420: 0x0c044264 jal 0x00110990 ← call wrapper
0x00112424: 0x24040004 addiu $a0, $zero, 4 (delay slot: $a0 = 4)
0x00112428: 0x00501024 and $v0, $v0, $s0 ; $v0 &= 0x00020000
0x0011242c: 0x1040fffc beq $v0, $zero, 0x00112420 ← HOT BRANCH
0x00112430: 0x24040002 addiu $a0, $zero, 2 (BEQ delay slot; runs every iteration)
─── loop exit falls through to 0x00112434 with $a0 = 2 ────────
0x00112434: 0x0c044264 jal 0x00110990 ; one more 0x7A call (different $a0)
0x00112438: 0x3c110013 lui $s1, 0x13 (delay slot)
0x0011243c: 0x2630c000 addiu $s0, $s1, -16384 ; $s0 = 0x0012C000
...
```

Note that `0x00112430` sits in the BEQ's delay slot, so it executes
on every iteration, not only on loop exit. This is harmless: the JAL
delay slot at `0x00112424` resets `$a0 = 4` before every call. On
exit, execution falls through to `0x00112434` with `$a0 = 2` already
loaded.

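The targets and offsets annotated in the listing follow from the standard MIPS I-type and J-type encodings. The sketch below re-derives them in Python. It is a hypothetical helper written for illustration, not the `/tmp/ch294_disasm.py` script itself, and it covers only the field arithmetic, not full opcode decoding.

```python
def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    return imm - 0x1_0000 if imm & 0x8000 else imm


def fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its common fields."""
    return {
        "op": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": sign_extend16(word & 0xFFFF),   # I-type immediate / branch offset
        "target": word & 0x03FF_FFFF,          # J-type instr_index
    }


def jal_target(pc: int, word: int) -> int:
    """J/JAL: top 4 bits of (pc + 4) concatenated with instr_index << 2."""
    return ((pc + 4) & 0xF000_0000) | (fields(word)["target"] << 2)


def branch_target(pc: int, word: int) -> int:
    """BEQ/BNE: word offset relative to the delay-slot address (pc + 4)."""
    return (pc + 4 + (fields(word)["imm"] << 2)) & 0xFFFF_FFFF


if __name__ == "__main__":
    print(hex(jal_target(0x00112420, 0x0C044264)))        # → 0x110990 (wrapper)
    print(hex(branch_target(0x0011242C, 0x1040FFFC)))     # → 0x112420 (HOT BRANCH)
    print(hex(branch_target(0x00112410, 0x14400021)))     # → 0x112498 (skip-wait BNE)
    print(fields(0xAE22C020)["imm"])                      # → -16352 (sw offset)
    print(hex((0x13 << 16) + fields(0x2630C000)["imm"]))  # → 0x12c000 ($s0 after exit)
```

Both branch forms are computed from `pc + 4` (the delay-slot address). This is why the BEQ at `0x0011242C`, with offset -4, lands back on `0x00112420`.
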
### The called function at `0x00110990`

```
0x00110990: 0x2403007a addiu $v1, $zero, 122 ; $v1 = 0x7A
0x00110994: 0x0000000c syscall ; ← syscall 0x7A
0x00110998: 0x03e00008 jr $ra
0x0011099c: 0x00000000 nop ; (delay slot)
```

A 4-instruction syscall-0x7A wrapper. Zero memory access: it just
sets `$v1 = 0x7A` and traps. Whatever argument is in `$a0` at call
time is passed through to the syscall unchanged.

A neighboring wrapper at `0x00110980` does the same for syscall
0x71 (= 113). It is not exercised by this wait loop but is visible
in the disassembly.

## Runtime confirmation (from trace files)

After re-running qbert.elf with the current model:

| PC | IFETCH count | Notes |
|-----------|--------------|-------|
| 0x00112420 (loop-top JAL) | 181,494 | matches `syscall_0x7A count=181494` exactly |
| 0x00112424 (ADDIU, delay slot) | 181,494 | (same) |
| 0x00112428 (AND) | 181,494 | (same) |
| 0x0011242C (BEQ) | 181,493 | one fewer: the watchdog stopped the run mid-iteration, after the final AND was fetched but before its BEQ. The loop never exited; ~181k iterations confirmed. |
| 0x00110990 (wrapper) | 181,494 | matches |
| 0x00110994 (syscall) | 181,494 | matches |

**Map-event region breakdown across the full 1.66M-retire run:**

| Region | Count | Meaning |
|--------|-------|---------|
| REGION_USEG_SHADOW (0x0B) | 1,677,113 | qbert's own code+data (almost all IFETCH-side) |
| REGION_BIOS (0x00) | 4 | initial trampoline (before ELF entry) |
| REGION_EE_DMAC_PASSIVE (0x0E) | 1 | one access during Ch288's per-channel init |
| REGION_EE_DMAC_CTRL (0x0D) | 1 | one access during Ch287's D_STAT init |

**The wait loop performs ZERO MMIO accesses.** Not INTC, not D_STAT,
not GS CSR, not BIU, not GS_PRIV. In fact the loop body, wrapper
included, contains no loads or stores at all; the only value it
consumes is the syscall return value in `$v0`.

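The IFETCH counts and the zero-MMIO claim both reduce to a small trace pass. The following is a minimal sketch. It assumes a hypothetical trace of `(pc, kind, addr)` records with `kind` in `IFETCH` / `LOAD` / `STORE`. This chapter does not show the real trace format, so the record shape and the `audit_wait_loop` name are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical trace record: (pc, kind, addr), kind in {"IFETCH", "LOAD", "STORE"}.
Record = Tuple[int, str, int]

# Every PC the wait loop executes: loop body incl. the BEQ delay slot, plus the 0x7A wrapper.
LOOP_PCS = set(range(0x00112420, 0x00112434, 4)) | set(range(0x00110990, 0x001109A0, 4))


def audit_wait_loop(trace: Iterable[Record]) -> Tuple[Counter, List[Record]]:
    """Count IFETCHes per loop PC and collect any data access issued from a loop PC."""
    fetches: Counter = Counter()
    data_accesses: List[Record] = []
    for pc, kind, addr in trace:
        if pc not in LOOP_PCS:
            continue                                # outside the loop: ignored here
        if kind == "IFETCH":
            fetches[pc] += 1
        else:
            data_accesses.append((pc, kind, addr))  # any RAM/MMIO touch lands here
    return fetches, data_accesses


if __name__ == "__main__":
    # Synthetic trace: two loop iterations in fetch order, no loads/stores.
    one_iter = [0x00112420, 0x00112424, 0x00110990, 0x00110994, 0x00110998,
                0x0011099C, 0x00112428, 0x0011242C, 0x00112430]
    counts, data = audit_wait_loop((pc, "IFETCH", pc) for pc in one_iter * 2)
    print(counts[0x00112420], data)  # → 2 []
```

Filtering on the issuing PC rather than on the target address means an access from the loop would be caught whichever region it hit. On a real trace, a non-empty `data_accesses` list would contradict the zero-MMIO finding.
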
## Verdict, per Codex's 5-verdict enum

**`qbert_waiting_on_memory_flag`** is the closest match, though
strictly the polled state is a *syscall-returned bitmask*, not a
direct memory read. The "memory" being polled is the kernel's
internal state, surfaced via the syscall 0x7A return value.

Specifically: **bit 17 (0x00020000) of the value returned by
`syscall 0x7A($a0=4)`.**

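Read as C, the loop is `while (!(syscall_0x7A(4) & 0x00020000)) ;` (names illustrative). The Python model below sketches that control flow. `hle_syscall_7a` is a hypothetical stand-in for the simulator's HLE, not its real interface, and `watchdog` bounds the run the way the simulator's watchdog did. The model shows why an HLE that always returns 0 can never leave the loop, and that no bit other than 17 affects the exit.

```python
from typing import Callable, Optional

BIT17_MASK = 0x0002_0000  # value built by `lui $s0, 0x2` at 0x00112418


def run_wait_loop(hle_syscall_7a: Callable[[int], int], watchdog: int) -> Optional[int]:
    """Model of the loop at 0x00112420..0x00112430.

    Calls the (hypothetical) HLE with $a0 = 4 until bit 17 of the return is set.
    Returns how many calls were made when the loop exits, or None if the
    watchdog bound is hit first.
    """
    for calls in range(1, watchdog + 1):
        v0 = hle_syscall_7a(4)   # jal 0x00110990, $a0 = 4 set in the delay slot
        if v0 & BIT17_MASK:      # and $v0, $v0, $s0 ; beq $v0, $zero, LOOP TOP
            return calls
    return None


def current_hle(a0: int) -> int:
    """Today's HLE for syscall 0x7A: $v0 = 0 regardless of $a0."""
    return 0


if __name__ == "__main__":
    print(run_wait_loop(current_hle, watchdog=181_494))        # → None (spins to the bound)
    print(run_wait_loop(lambda a0: 0xFFFF_0000, watchdog=10))  # → 1 (bit 17 among others)
    print(run_wait_loop(lambda a0: 0x0005_0000, watchdog=10))  # → None (bits 16 and 18 only)
```

Extra bits are ignored, and neighboring bits alone do not help. The exit depends on bit 17 alone.
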
Other verdicts ruled out:

- `qbert_waiting_on_dmac_handler`: qbert is NOT polling D_STAT or
  D_PCR. (The wait *might* still exit when the registered DMAC
  handler at 0x00112AB0 fires and sets some kernel state that
  syscall 0x7A surfaces; that would be an indirect dependency.)
- `qbert_waiting_on_vblank`: qbert is NOT polling GS CSR or any
  VBLANK-related MMIO.
- `qbert_waiting_on_thread_scheduler`: a possible secondary
  interpretation if syscall 0x7A is a sema/event-flag poll, but
  no thread-switch primitive is being called.
- `qbert_wait_loop_unknown`: definitely not unknown; we have the
  full decoding.

## What is syscall 0x7A really?

Two earlier chapters introduced syscall 0x7A as a stub. At Ch292
we labeled it "likely SyncDCache" because of its proximity to MIPS
SYNC. **The Ch294 autopsy makes that label questionable.** A real
SyncDCache wouldn't be invoked 181k+ times in a tight poll, and
SyncDCache returns void or a status code in which bit 17 has no
defined meaning.

The observed shape, `(small int $a0)` → `(bitmask $v0)` polled in
a loop, fits better with one of:

1. **`GsGetIMR` / `iGsGetIMR` / `GsPutIMR`**: GS Interrupt Mask
   Register access. Bit 17 in some kernel-layered GS-IMR-related
   word could correspond to "VSYNC complete" or "GS finish."
2. **`PollSema` / `iPollSema`**: semaphore-state poll. `$a0` would
   be a sema handle; the return is a status word with one of the
   bits indicating "released."
3. **A multiplexed `GetEvent` / `iGetEvent`**: kernel
   event-channel query. `$a0` is a channel selector; the return is
   a bitmask of pending events.
4. **A kernel-internal status word** that the SyncDCache call
   *also* returns alongside the cache-sync side effect. Bit 17
   would be some "subsystem ready" flag.

In all four cases, the structural fact is the same: **qbert is
waiting for a kernel-managed bit that the HLE doesn't currently
update**. The exact SDK name matters less than the question: "what
should make bit 17 set?"

Notable: the call at `0x00112408` (BEFORE the wait loop) uses
`$a0 = 0x80000000`, and its result feeds the BNE at `0x00112410`.
With our HLE returning 0, the BNE is not taken and qbert takes what
appears to be the "init OK" path into the wait. A non-zero return
would instead branch to `0x00112498` and skip the wait loop
entirely. So this is not a case where syscall 0x7A's HLE is wrong
universally; it is only wrong for the `$a0 = 4` polling call, where
qbert wants one specific non-zero bit.

## Ch295 framing: the gate is named, now decide how to open it

Three concrete strategies for Codex to weigh:

### Strategy A: Bit-17-flipper HLE patch (cheapest)

After N calls to syscall 0x7A with `$a0 = 4`, the dispatcher
returns `$v0` with bit 17 set (0x00020000), letting qbert progress.
Risk: bit 17 may not be the *only* thing qbert checks. Downstream
code might check additional bits (different `$a0` selectors,
different bit masks), e.g. the `$a0 = 2` call at `0x00112434` right
after the loop, which the HLE will still answer with 0. Empirically
cheap; one experiment.

Sub-question for Codex: should bit 17 be set on every call, or only
after N calls? Setting it always might cause downstream "saw the
ready bit, now go process the event" code to misbehave (e.g., it
might try to read a "completed" event that doesn't exist). Setting
it only after N calls lets qbert see at least one "no" before a
"yes", matching realistic interrupt-arrival semantics.

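The following is a minimal sketch of the Strategy A dispatcher. It assumes a hypothetical HLE hook that receives the syscall number and `$a0` and returns the value to place in `$v0`. The class name is illustrative, and `threshold` is the open "N" from the sub-question. Every other (syscall, `$a0`) pair keeps today's `$v0 = 0` behavior.

```python
from collections import defaultdict

BIT17 = 0x0002_0000


class Syscall7AFlipper:
    """Hypothetical HLE hook: syscall 0x7A($a0=4) reports bit 17 after `threshold` calls."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.calls = defaultdict(int)  # per-$a0 call counters, useful for diagnostics

    def __call__(self, syscall_no: int, a0: int) -> int:
        if syscall_no != 0x7A:
            return 0                   # every other syscall: unchanged $v0 = 0 stub
        self.calls[a0] += 1
        if a0 == 4 and self.calls[a0] > self.threshold:
            return BIT17               # only the polled bit, nothing else
        return 0                       # $a0 = 0x80000000, 2, ...: still 0


if __name__ == "__main__":
    hle = Syscall7AFlipper(threshold=10)
    polls = [hle(0x7A, 4) for _ in range(12)]
    print(polls.index(BIT17) + 1)      # → 11 (first poll that sees bit 17)
    print(hle(0x7A, 2), dict(hle.calls))
```

Returning exactly `0x00020000` rather than all-ones keeps the patch minimal. This matters if other bits turn out to be consumed downstream. The per-selector counters also record how often the `$a0 = 2` and `$a0 = 0x80000000` calls occur once qbert moves past the wait.
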
### Strategy B: Identify the real SDK semantics (correct path)

Look up PS2 SDK syscall 122 / 0x7A in the canonical kernel
sources (ps2sdk's EE-side `ee/kernel/include/kernel.h` or similar;
the syscall is issued by the R5900, so the IOP kernel headers are
the wrong place to look). The syscall name + argument shape +
return shape will tell us what kernel state to model. If it's
`GsGetIMR`, we need a GS IMR register; if it's `PollSema`, a sema
table; if it's `GetEvent`, an event-channel table.

This is more correct but requires more upfront work. The
disassembly is rich enough that the SDK name is probably
identifiable; Codex likely knows it or can look it up.

### Strategy C: Wire DMAC-completion to bit 17 (interpretive)

The handler registered in Ch290/291 (at 0x00112AB0, for DMAC ch5
SIF0) was never invoked. **Hypothesis:** the wait loop is qbert
asking "has my DMAC-ch5-SIF0 handler run yet?" If we can fire
that handler, even just once, bit 17 might get set as a side
effect. This requires modeling interrupt delivery:
COP0 Status → Cause IP → vector to handler.

Strategy C is architecturally correct but is several chapters'
worth of work (interrupt delivery isn't modeled at all yet).
Don't pivot to this without confirming the hypothesis first.

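The first two links of that delivery chain can be sketched as two predicates. This is a simplified model under stated assumptions, not a spec. It assumes the usual R5900 COP0 layout:

- Status.IE at bit 0, EXL at bit 1, ERL at bit 2, IM[7:0] at bits 15:8, and EIE (the bit EI/DI toggle) at bit 16.
- Cause.IP[7:0] at bits 15:8, with the DMAC on INT1, i.e. IP3 at bit 11.
- D_STAT per-channel status bits CIS at bits 9:0 and mask bits CIM at bits 25:16, with channel 5 = SIF0.

Verify these positions against the EE manuals before using them in RTL.

```python
# All bit positions below are assumptions to check against the EE Core / DMAC manuals.
STATUS_IE = 1 << 0
STATUS_EXL = 1 << 1
STATUS_ERL = 1 << 2
STATUS_EIE = 1 << 16
IP3_DMAC = 1 << 11        # INT1 (DMAC): same position in Cause.IP and Status.IM
DMAC_CH5_SIF0 = 5


def dmac_int1_asserted(d_stat: int) -> bool:
    """INT1 is raised when some channel has both its CIS and CIM bits set."""
    cis = d_stat & 0x3FF             # bits 9:0, channel interrupt status
    cim = (d_stat >> 16) & 0x3FF     # bits 25:16, channel interrupt mask
    return (cis & cim) != 0


def cpu_takes_interrupt(status: int, cause: int) -> bool:
    """Taken when globally enabled (IE and EIE), not already in an exception
    (EXL/ERL clear), and some pending IP bit is unmasked by its IM bit."""
    enabled = (status & STATUS_IE) and (status & STATUS_EIE)
    in_exception = status & (STATUS_EXL | STATUS_ERL)
    pending = (status & cause) & 0xFF00   # IM[7:0] vs IP[7:0], both at bits 15:8
    return bool(enabled and not in_exception and pending)


if __name__ == "__main__":
    d_stat = (1 << DMAC_CH5_SIF0) | (1 << (16 + DMAC_CH5_SIF0))  # ch5 done, unmasked
    cause = IP3_DMAC if dmac_int1_asserted(d_stat) else 0
    status = STATUS_IE | STATUS_EIE | IP3_DMAC                    # IM3 set, after EI
    print(cpu_takes_interrupt(status, cause))                     # → True
    print(cpu_takes_interrupt(status | STATUS_EXL, cause))        # → False
```

Only the "should an interrupt be taken" decision is modeled here. The multi-chapter part is the rest: saving EPC, setting EXL, vectoring, and getting from there to the handler registered at 0x00112AB0.
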
## Recommendation for Codex

Try **Strategy A** as a one-experiment chapter: patch the HLE so
that syscall 0x7A($a0=4) returns `$v0 = 0x00020000` after, say, the
10th call. If qbert progresses past the wait and the next blocker
is informative, great. If qbert misbranches into garbage, fall
back to **Strategy B** (look up the SDK semantics) and we'll
know which bit-17 source to model.

The disassembly evidence makes Strategy A safe to try: bit 17 is
the only thing the wait loop checks, and the masked `$v0` is dead
once the loop exits (nothing reads it before the `$a0 = 2` call at
`0x00112434` overwrites it). No other "consumer" state depends on
the value being a specific channel-bitmask encoding, so setting
bit 17 alone should make the wait exit cleanly.

## Files

- `/tmp/ch294_disasm.py`: focused R5900 disassembler used to
  produce the listings above. Not committed; one-shot diagnostic.
- This closeout document.

## Pattern review (24 chapters; first investigation chapter since Ch263..Ch269)

| Era | Chapters | Description |
|-----|----------|-------------|
| Opcode-blocker | Ch271..Ch286 | R5900 opcodes, one per chapter |
| MMIO stubs | Ch287..Ch288 | DMAC ctrl + per-channel |
| Syscall HLE | Ch273, 285, 289..291, 293 | `$v0=0` narrow extensions |
| Narrow NOP-class | Ch286 (EI), Ch292 (SYNC) | side-effect-free accepts |
| **Investigation** | **Ch294** | **wait-loop autopsy, no RTL change** |

The Ch263..Ch269 BIOS-treadmill autopsies established the
"investigation chapter" pattern: spend a chapter understanding a
steady-state loop before deciding what to change. Ch294 is the
qbert-side analog and produces the same artifacts: a *named gate*
and a *concrete next-step proposal*.

## Regression

Unchanged at **176/176**: no RTL or TB changes in Ch294.