# Ch298 closeout: 2nd wait-loop autopsy; verdict `qbert2_waiting_on_registered_library_state`

**Status:** Closed. Observation-only chapter per Codex's framing.
**Named verdict:** `qbert2_waiting_on_registered_library_state`
(fallback: `qbert2_waiting_on_memory_flag`). qbert polls memory
location `0x001329C0` for a non-zero value; nothing in the model
ever sets it.

No RTL changes. Artifacts: the disassembly + runtime-trace
analysis below, and the Ch299 framing proposal at the end.

## The wait loop, fully decoded

### Caller (0x00111308..0x00111314)

```
0x00111308: 0x0c044950  jal   0x00112540
0x0011130c: 0x0000202d  daddu $a0, $zero, $zero       ; $a0 = 0 (JAL delay slot)
0x00111310: 0x1040fffd  beq   $v0, $zero, 0x00111308  ← LOOP BRANCH (TAKEN 144,089×)
0x00111314: 0x3c048000  lui   $a0, 0x8000             ; BEQ delay slot: runs every iteration and on exit
```

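The two control-flow targets in the caller can be re-derived from the raw instruction words. The sketch below is a minimal check, assuming standard MIPS J-type and I-type encodings; the helper names are illustrative and are not taken from `/tmp/ch294_disasm.py`.

```python
def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x1_0000 if value & 0x8000 else value


def jal_target(pc: int, word: int) -> int:
    """J-type target: top 4 bits of PC+4, then the 26-bit index shifted left by 2."""
    return ((pc + 4) & 0xF000_0000) | ((word & 0x03FF_FFFF) << 2)


def branch_target(pc: int, word: int) -> int:
    """I-type branch target: PC+4 plus the signed 16-bit offset shifted left by 2."""
    return (pc + 4 + (sign_extend16(word & 0xFFFF) << 2)) & 0xFFFF_FFFF


# Words and PCs copied from the caller listing.
print(hex(jal_target(0x00111308, 0x0C044950)))     # 0x112540
print(hex(branch_target(0x00111310, 0x1040FFFD)))  # 0x111308
```

Both results agree with the listing: the JAL lands on the leaf entry, and the BEQ branches back to the JAL.
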
### Leaf (0x00112540..0x00112554), called 144,089 times

```
0x00112540: 0x3c020013  lui   $v0, 0x13          ; $v0 = 0x00130000
0x00112544: 0x00042080  sll   $a0, $a0, 2        ; $a0 <<= 2 (= 0 since $a0_arg=0)
0x00112548: 0x8c43c01c  lw    $v1, -16356($v0)   ; $v1 = *(0x0012C01C)
0x0011254c: 0x00832021  addu  $a0, $a0, $v1      ; $a0 = $v1 (since $a0_arg=0)
0x00112550: 0x03e00008  jr    $ra                ; return
0x00112554: 0x8c820000  lw    $v0, 0($a0)        ; delay slot: $v0 = *($a0) = *(*(0x0012C01C))
```

**Effective gate:** `$v0 = *(*(0x0012C01C))`. Caller's branch:
`beq $v0, $zero, top` → loop while `*(*(0x0012C01C)) == 0`.

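The gate can be modelled in a few lines to make the spin condition concrete. This is a sketch under stated assumptions, not the RTL's behaviour: memory is a plain dict of 32-bit words keyed by address, unmapped reads return 0, and `spin` and its `poke_at` parameter are hypothetical helpers standing in for an external setter.

```python
GATE = 0x001329C0
POINTER_SLOT = 0x0012C01C


def sign_extend16(value: int) -> int:
    return value - 0x1_0000 if value & 0x8000 else value


def leaf(mem: dict, a0: int) -> int:
    """Mirror the six leaf instructions; returns the value left in $v0."""
    v0 = 0x13 << 16                                    # lui  $v0, 0x13
    a0 = (a0 << 2) & 0xFFFF_FFFF                       # sll  $a0, $a0, 2
    slot = (v0 + sign_extend16(0xC01C)) & 0xFFFF_FFFF  # -16356($v0)
    v1 = mem.get(slot, 0)                              # lw   $v1, -16356($v0)
    a0 = (a0 + v1) & 0xFFFF_FFFF                       # addu $a0, $a0, $v1
    return mem.get(a0, 0)                              # lw   $v0, 0($a0)


def spin(mem: dict, max_iterations: int, poke_at=None):
    """Run the caller loop; return the iteration at which it exits, or None."""
    for i in range(max_iterations):
        if poke_at is not None and i == poke_at:
            mem[GATE] = 1                              # hypothetical external setter
        if leaf(mem, 0) != 0:                          # beq $v0, $zero, top
            return i
    return None


mem = {POINTER_SLOT: GATE, GATE: 0}
print(spin(dict(mem), 1000))              # None: nothing ever sets the flag
print(spin(dict(mem), 1000, poke_at=10))  # 10: exits once the flag is non-zero
```
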
## Runtime data (from trace files)

### IFETCH counts

| PC | Count | Role |
|----|-------|------|
| 0x00111308 (caller JAL) | 144,089 | wait loop top |
| 0x0011130c (delay $a0=0) | 144,089 | JAL delay slot |
| 0x00111310 (caller BEQ) | 144,089 | wait loop branch |
| 0x00111314 (lui, exit slot) | 144,089 | BEQ delay slot |
| 0x00112540..0x00112554 (leaf) | 144,089 each | leaf body (jr+ds) |

**144,089 iterations** of the wait loop. The leaf is a 6-instruction
function reached via JAL from the caller; each iteration is 10
instructions (4 caller + 6 leaf).

(Note: 0x00112540 shows **288,178** in the raw count, 2× the others.
On further examination, this is because 0x00112540 is also reached
from a *separate* code path elsewhere in qbert, unrelated to this
wait loop. It doesn't affect the analysis.)

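The per-iteration arithmetic can be checked directly from the counts above. The snippet is a small worked calculation using only numbers quoted in this section; the variable names are illustrative.

```python
ITERATIONS = 144_089
CALLER_INSTRUCTIONS = 4        # jal, daddu, beq, lui
LEAF_INSTRUCTIONS = 6          # lui, sll, lw, addu, jr, lw
RAW_LEAF_ENTRY_COUNT = 288_178  # raw IFETCH count at 0x00112540

per_iteration = CALLER_INSTRUCTIONS + LEAF_INSTRUCTIONS
loop_fetches = ITERATIONS * per_iteration

print(per_iteration)                      # 10
print(loop_fetches)                       # 1440890
print(RAW_LEAF_ENTRY_COUNT / ITERATIONS)  # 2.0
```
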
### Map-event addresses

Top read addresses (consistent with the ~144k loop iterations):

| Address | Reads | Meaning |
|---------|-------|---------|
| 0x0012C01C | 144,090 | pointer storage (read each iteration; value = 0x001329C0) |
| 0x001329C0 | 144,089 | **the polled flag** (read each iteration; value always 0) |
| 0x00112540..0x00112554 | 144,089 each | leaf IFETCHes |
| 0x00111308..0x00111314 | 144,089 each | caller IFETCHes |

### Writes to the polled address

```
cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...
cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...
```

**Two writes total, both writing 0.** Both happened during init,
before the wait loop started. After that, the flag is read 144,089
times and never written. **qbert itself zeroed the flag, then
entered the loop expecting an external agent to set it.**

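A filter of this kind is easy to reproduce. The sketch below assumes the trace is whitespace-separated with the field order seen in the two lines quoted above (`cycle <n> MEM WRITE <addr> <value> ...`); the rest of the trace format is not shown in this closeout, so `writes_to` is a hypothetical helper and not the tool that was actually used.

```python
def writes_to(lines, target):
    """Return (cycle, value) for every MEM WRITE line that hits `target`."""
    hits = []
    for line in lines:
        fields = line.split()
        # Skip anything that does not look like "cycle N MEM WRITE addr value".
        if len(fields) < 6 or fields[0] != "cycle" or fields[2:4] != ["MEM", "WRITE"]:
            continue
        cycle = int(fields[1])
        addr = int(fields[4], 16)
        value = int(fields[5], 16)
        if addr == target:
            hits.append((cycle, value))
    return hits


# The two write lines quoted above, verbatim.
sample = [
    "cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
    "cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
]

hits = writes_to(sample, 0x001329C0)
print(hits)                                   # [(39739, 0), (98576, 0)]
print(all(value == 0 for _, value in hits))   # True
```
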
### Map-event region breakdown (full run)

| Region | Reads/writes | Notes |
|--------|--------------|-------|
| USEG_SHADOW (0x0B) | 1,773,235 | qbert's own code+data |
| BIOS (0x00) | 4 | early trampoline |
| DMAC_CTRL (0x0D) | 1 | Ch287 stub init |
| DMAC_PASSIVE (0x0E) | 1 | Ch288 stub init |

**Still zero INTC / GS / BIU / general-MMIO traffic.** Same as
Ch294's first-loop autopsy: the wait is 100% software-side, with no
hardware-side polling.

## Syscall 0x7A bucketing (per Codex's instrumentation request)

```
syscall_0x7A_split = count_a0_4=1
                     count_a0_0x80000000=1
                     count_a0_other=2
                     last_a0=0x80000002
                     first_v0=0 last_v0=0
```

**The wait loop does NOT call syscall 0x7A.** The leaf at
0x00112540 is pure memory reads. The 4 total 0x7A calls (1+1+2)
all happened earlier, in qbert's init sequence, NOT in this wait
loop. The 0x80000002 shape Codex flagged in Ch297 is an
init-side call, not a polling-loop call.

So Codex's hypothesis "the wait may be polling 0x7A with
$a0=0x80000002 for a different bit" is **falsified**. The Ch295 0x7A
unblock doesn't need broadening to fix this wait; that's a
separate concern.

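The three buckets in the split are simple to express. The following sketch assumes the bucketing rule implied by the counter names; the call sequence is hypothetical except for the final value `0x80000002`, which is the `last_a0` reported above, and `0x80000001` is a placeholder for the one "other" call whose `$a0` this closeout does not record.

```python
from collections import Counter


def bucket(a0: int) -> str:
    """Classify a syscall 0x7A argument into the three reported buckets."""
    if a0 == 4:
        return "count_a0_4"
    if a0 == 0x8000_0000:
        return "count_a0_0x80000000"
    return "count_a0_other"


# Hypothetical ordering and placeholder value; only the last entry is from the log.
calls = [4, 0x8000_0000, 0x8000_0001, 0x8000_0002]

split = Counter(bucket(a0) for a0 in calls)
print(dict(split))
print(hex(calls[-1]))  # last_a0
```

With these inputs the counts come out as 1, 1, and 2, matching the shape of the reported split.
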
## Verdict, per Codex's enum

| Verdict | Fit? |
|---------|------|
| `qbert2_waiting_on_syscall_7a_bit` | **No**: the loop body doesn't issue any syscalls; the wait is pure memory polling. |
| `qbert2_waiting_on_memory_flag` | **Yes**: generic fit; the gate is a memory location, not MMIO. |
| `qbert2_waiting_on_mmio` | **No**: 0x001329C0 is EE RAM (region 0x0B), not MMIO. |
| `qbert2_waiting_on_registered_library_state` | **Yes, best fit**: the gate sits at qbert's global ctx + 0x100 (0x001328C0 + 0x100 = 0x001329C0); Ch297 just registered two library entries via syscall 0x77; the "library is ready" flag pattern matches what the registration callback would set. |
| `qbert2_wait_loop_unknown` | **No**: fully decoded. |

**Pick: `qbert2_waiting_on_registered_library_state`.** The gate
sits within the registration context that Ch297's syscall 0x77
calls were setting up. qbert expects whatever registers the
library to also set the "ready" flag; our HLE returns $v0=0 and
writes nothing.

## What the address 0x001329C0 means

- qbert's global ctx pointer (threaded through
  0x78/0x12/0x16/0x7A/0x79) is **0x001328C0**.
- The gate is **0x001329C0 = global_ctx + 0x100**, in the same data
  region.
- Likely an offset into a kernel-context / library-management
  struct.

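The address relationships in the list above reduce to two small calculations: the `lui`/`lw` pair in the leaf resolves to the pointer slot, and the gate sits 0x100 bytes past the global ctx. The check below uses only addresses from this closeout; the constant names are illustrative.

```python
GLOBAL_CTX = 0x001328C0
GATE = 0x001329C0
POINTER_SLOT = 0x0012C01C


def sign_extend16(value: int) -> int:
    return value - 0x1_0000 if value & 0x8000 else value


# lui $v0, 0x13 followed by lw $v1, -16356($v0): base plus signed 16-bit offset.
lui_base = 0x13 << 16
slot = (lui_base + sign_extend16(0xC01C)) & 0xFFFF_FFFF

gate_offset = GATE - GLOBAL_CTX

print(hex(slot))         # 0x12c01c
print(hex(gate_offset))  # 0x100
```
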
## Ch299 framing: name the gate value first

Per Codex's "name the branch mask and expected return value first"
discipline:

- **Source:** memory at `*(*(0x0012C01C))`, i.e. `*(0x001329C0)`
  (the pointer slot at 0x0012C01C holds 0x001329C0).
- **Mask:** none; full 32-bit `!= 0` test.
- **Expected value:** any non-zero value.
- **Setter:** TBD. Nothing in our model currently writes a non-zero
  value to 0x001329C0. The setter would be the kernel callback that
  syscall 0x77 (RegisterLibraryEntries) registered, OR the
  library-ready-callback mechanism.

### Three Ch299 strategies

**A. TB-poke the gate (cheap experiment).** Modify
`tb_ee_core_elf_runner.sv` to write 1 to memory address
0x001329C0 at a fixed cycle (e.g., cycle 200,000: after init is
done but before the watchdog). Lets qbert progress. Inelegant but
falsifiable.

**B. Extend syscall 0x77 HLE to write the status word.** The
proper PS2 kernel `RegisterLibraryEntries(buf, ...)` writes a
"ready" flag somewhere derived from the buf pointer + library
ID. If the layout is `buf->status` at a known offset, the HLE can
write a non-zero value there before returning $v0=0. This requires
identifying the exact offset that maps to 0x001329C0 from
$a0=0x001DFD50 (Ch297's first call). The difference
0x001329C0 - 0x001DFD50 is negative, so 0x001329C0 is **below**
0x001DFD50. The gate therefore probably points to a kernel-managed
status block, not the registration record. Not trivial without SDK
semantics.

**C. Architectural: wire interrupt delivery.** If the Ch290/291
DMAC handler at 0x00112AB0 fires and that handler writes to
0x001329C0, the gate opens. Requires modeling DMAC completion →
COP0 Cause/Status → handler invocation. Multi-chapter.

**My recommendation: Strategy A** (TB-poke). It's the cheapest
falsifiable experiment and matches the "Strategy A first" pattern
that worked in Ch295. If qbert progresses meaningfully, the gate's
semantic role is confirmed and Ch300+ can pursue B or C for
architectural correctness. If qbert misbranches or crashes, we
roll back and pivot.

Specifically for Ch299: the TB writes `mem[0x001329C0/16] |= (1<<0)`
(or any non-zero value at lane 0) at cycle ~200,000. The runner
observer can confirm via a new "tb_poked_gate" counter.

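The address arithmetic behind Strategies A and B can be sanity-checked before touching the testbench. This is a sketch only: it assumes the TB memory is an array of 16-byte words (suggested by the `/16` index) with lane 0 in bits 31:0, which is an assumption about `tb_ee_core_elf_runner.sv` rather than something this closeout states; `poke` and `read32` are hypothetical helpers.

```python
GATE = 0x001329C0
REG_BUF = 0x001DFD50   # $a0 of Ch297's first syscall 0x77 call
WORD_BYTES = 16        # assumed TB memory word width

# Strategy A: which 16-byte word and which 32-bit lane hold the gate.
word_index, byte_in_word = divmod(GATE, WORD_BYTES)
lane = byte_in_word // 4


def poke(mem: dict, index: int, lane: int, value: int) -> None:
    """OR a 32-bit value into one lane of a 128-bit word (assumed layout)."""
    mem[index] = mem.get(index, 0) | ((value & 0xFFFF_FFFF) << (32 * lane))


def read32(mem: dict, addr: int) -> int:
    """Read a 32-bit value back through the same assumed layout."""
    index, byte = divmod(addr, WORD_BYTES)
    return (mem.get(index, 0) >> (8 * byte)) & 0xFFFF_FFFF


mem = {}
poke(mem, word_index, lane, 1)

# Strategy B: signed distance from the registration buffer to the gate.
delta = GATE - REG_BUF

print(hex(word_index), lane)   # 0x1329c 0
print(read32(mem, GATE))       # 1
print(delta < 0, hex(-delta))  # True 0xad390
```

Under these assumptions the gate is 16-byte aligned, so the poke lands in lane 0 as the Ch299 plan states, and the gate sits 0xAD390 bytes below the registration buffer.
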
## Files

- `/tmp/ch294_disasm.py`: disassembler retargeted to
  0x00112520..0x001125A0, then to 0x001112E0..0x00111360 to find the
  caller. Same one-shot diagnostic from Ch294, retargeted by
  editing the LO/HI constants.
- This closeout.

## Pattern review (28 chapters; second autopsy)

The Ch293→Ch294→Ch295 cycle (inflection → autopsy → unblock) is
repeating cleanly at Ch297→Ch298→Ch299. Ch298 produces the same
artifact format as Ch294: a *named gate* + a *concrete next-step
proposal*.

| Inflection | Autopsy | Unblock |
|------------|---------|---------|
| Ch293 (1.66M retires, hot_pc=0x0011242C) | Ch294 (syscall 0x7A bit-17 poll) | Ch295 ($a0-aware HLE) |
| Ch297 (1.47M retires, hot_pc=0x00112554) | **Ch298 (memory poll at 0x001329C0)** | **Ch299 (TB-poke OR HLE write)** |

The cycle's reliability (two clean iterations now) suggests this
is the right structure for the "post-opcode-era" phase of qbert.
Each cycle adds ~1.5M retires of progress.

## Regression

Unchanged at **176/176**: no RTL or TB changes in Ch298.