retroDE_ps2/docs/ch298_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch298 closeout — 2nd wait-loop autopsy; verdict qbert2_waiting_on_registered_library_state

Status: Closed. Observation-only chapter per Codex's framing. Named verdict: qbert2_waiting_on_registered_library_state (fallback: qbert2_waiting_on_memory_flag). qbert polls memory location 0x001329C0 for a non-zero value; nothing in the model ever sets it.

No RTL changes. Artifacts: the disassembly + runtime-trace analysis below, and the Ch299 framing proposal at the end.

The wait loop, fully decoded

Caller (0x00111308..0x00111314)

0x00111308: 0x0c044950   jal    0x00112540
0x0011130c: 0x0000202d   daddu  $a0, $zero, $zero     ; $a0 = 0 (delay slot)
0x00111310: 0x1040fffd   beq    $v0, $zero, 0x00111308  ← LOOP BRANCH (TAKEN 144,089×)
0x00111314: 0x3c048000   lui    $a0, 0x8000           ; BEQ delay slot: runs every iteration, and on exit
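
The jump and branch targets in the caller listing can be re-derived from the raw instruction words. The following is a minimal sketch, assuming standard MIPS J-type and I-type encodings; the helper names `jal_target` and `branch_target` are illustrative and are not taken from the project's disassembler.

```python
def jal_target(pc: int, word: int) -> int:
    """J-type target: top 4 bits of PC+4, then the 26-bit index shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)


def branch_target(pc: int, word: int) -> int:
    """I-type branch target: PC+4 plus the sign-extended 16-bit offset times 4."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return (pc + 4 + imm * 4) & 0xFFFFFFFF


# Raw words copied from the caller listing.
print(hex(jal_target(0x00111308, 0x0C044950)))     # 0x112540 (the leaf)
print(hex(branch_target(0x00111310, 0x1040FFFD)))  # 0x111308 (loop top)
```

Both results agree with the listing: the JAL lands on the leaf and the BEQ branches back to the JAL.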

Leaf (0x00112540..0x00112554) — called 144,089 times

0x00112540: 0x3c020013   lui    $v0, 0x13             ; $v0 = 0x00130000
0x00112544: 0x00042080   sll    $a0, $a0, 2           ; $a0 <<= 2 (= 0 since $a0_arg=0)
0x00112548: 0x8c43c01c   lw     $v1, -16356($v0)      ; $v1 = *(0x0012C01C)
0x0011254c: 0x00832021   addu   $a0, $a0, $v1         ; $a0 = $v1 (since $a0_arg=0)
0x00112550: 0x03e00008   jr     $ra                   ; return
0x00112554: 0x8c820000   lw     $v0, 0($a0)           ; delay slot: $v0 = *($a0) = *(*(0x0012C01C))

Effective gate: $v0 = *(*(0x0012C01C)). Caller's branch: beq $v0, $zero, top → loop while *(*(0x0012C01C)) == 0.
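
The double dereference can be modelled in a few lines. This is a sketch only: `mem` is a hypothetical sparse dictionary keyed by byte address and holding 32-bit words, and `leaf` and `gate_open` are illustrative names, not part of the RTL or the testbench.

```python
def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate, as lw does with its offset."""
    return imm - 0x10000 if imm & 0x8000 else imm


def leaf(mem: dict, a0: int) -> int:
    """Model of the leaf at 0x00112540: returns *(*(0x0012C01C) + (a0 << 2))."""
    v0 = 0x13 << 16                              # lui  $v0, 0x13
    a0 = (a0 << 2) & 0xFFFFFFFF                  # sll  $a0, $a0, 2
    v1 = mem.get(v0 + sign_extend16(0xC01C), 0)  # lw   $v1, -16356($v0)
    a0 = (a0 + v1) & 0xFFFFFFFF                  # addu $a0, $a0, $v1
    return mem.get(a0, 0)                        # lw   $v0, 0($a0) (jr delay slot)


def gate_open(mem: dict) -> bool:
    """The caller passes $a0 = 0 and loops while the result is zero."""
    return leaf(mem, 0) != 0


# Values observed in the trace: the pointer holds 0x001329C0 and the flag is 0.
mem = {0x0012C01C: 0x001329C0, 0x001329C0: 0}
print(gate_open(mem))   # False: the BEQ is taken and the loop repeats
mem[0x001329C0] = 1     # any non-zero write would open the gate
print(gate_open(mem))   # True
```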

Runtime data (from trace files)

IFETCH counts

| PC | Count | Role |
| --- | --- | --- |
| 0x00111308 (caller JAL) | 144,089 | wait loop top |
| 0x0011130c (delay $a0=0) | 144,089 | |
| 0x00111310 (caller BEQ) | 144,089 | wait loop branch |
| 0x00111314 (lui, exit slot) | 144,089 | |
| 0x00112540..0x00112554 (leaf) | 144,089 each | leaf body (jr+ds) |

144,089 iterations of the wait loop. The leaf is a 6-instruction function reached via JAL from caller; each iteration is 10 instructions (4 caller + 6 leaf).
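
As a quick arithmetic check, the per-iteration instruction count and the total number of fetches attributable to the loop follow directly from the table. The constant names below are illustrative.

```python
ITERATIONS = 144_089
CALLER_INSNS = 4   # 0x00111308..0x00111314, each fetched once per iteration
LEAF_INSNS = 6     # 0x00112540..0x00112554

per_iteration = CALLER_INSNS + LEAF_INSNS
total_fetches = ITERATIONS * per_iteration

print(per_iteration)   # 10
print(total_fetches)   # 1440890
```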

(Note: the raw count for 0x00112540 is 288,178, twice the others. On further examination, 0x00112540 is also reached from a separate code path elsewhere in qbert, unrelated to this wait loop. This doesn't affect the analysis.)

Map-event addresses

Top read addresses (matches 144k loop iterations):

| Address | Reads | Meaning |
| --- | --- | --- |
| 0x0012C01C | 144,090 | pointer storage (read each iteration; value = 0x001329C0) |
| 0x001329C0 | 144,089 | the polled flag (read each iteration; value always 0) |
| 0x00112540..0x00112554 | 144,089 each | leaf IFETCHes |
| 0x00111308..0x00111314 | 144,089 each | caller IFETCHes |

Writes to the polled address

cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...
cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...

Two writes total, both writing 0. Both happened during init, before the wait loop started. After that, the flag is read 144,089 times and never written. qbert itself zeroed the flag, then entered the loop expecting an external agent to set it.
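
A scan of this kind can be scripted. The sketch below assumes the trace line shape shown in the two excerpts above (`cycle N MEM WRITE <addr> <value> ...`) and ignores any trailing fields; the regular expression and the `writes_to` helper are illustrative, not the project's actual tooling.

```python
import re

# Assumed line shape, inferred from the two excerpts; trailing fields are ignored.
WRITE_RE = re.compile(
    r"cycle\s+(\d+)\s+MEM WRITE\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)"
)


def writes_to(lines, addr):
    """Return (cycle, value) for every MEM WRITE whose address equals addr."""
    hits = []
    for line in lines:
        m = WRITE_RE.search(line)
        if m and int(m.group(2), 16) == addr:
            hits.append((int(m.group(1)), int(m.group(3), 16)))
    return hits


sample = [
    "cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
    "cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
]
hits = writes_to(sample, 0x001329C0)
print(hits)                               # [(39739, 0), (98576, 0)]
print(any(value != 0 for _, value in hits))  # False: nothing ever sets the flag
```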

Map-event region breakdown (full run)

| Region | Reads/writes | Notes |
| --- | --- | --- |
| USEG_SHADOW (0x0B) | 1,773,235 | qbert's own code+data |
| BIOS (0x00) | 4 | early trampoline |
| DMAC_CTRL (0x0D) | 1 | Ch287 stub init |
| DMAC_PASSIVE (0x0E) | 1 | Ch288 stub init |

Still zero INTC / GS / BIU / general-MMIO traffic. Same as Ch294's first-loop autopsy: the wait is 100% software-side, no hardware-side polling.

Syscall 0x7A bucketing (per Codex's instrumentation request)

syscall_0x7A_split = count_a0_4=1
                     count_a0_0x80000000=1
                     count_a0_other=2
                     last_a0=0x80000002
                     first_v0=0  last_v0=0

The wait loop does NOT call syscall 0x7A. The leaf at 0x00112540 is pure memory reads. The 4 total 0x7A calls (1+1+2) all happened earlier in qbert's init sequence, NOT in this wait loop. The 0x80000002 shape Codex flagged in Ch297 is an init-side call, not a polling-loop call.
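
The bucketing behind the split can be sketched as follows. The bucket rules are inferred from the counter names in the log, and the `calls` sequence is hypothetical: only the bucket counts and `last_a0` come from the log, while the order of the calls and the value of the other "other" call are assumptions made for illustration.

```python
from collections import Counter


def bucket_a0(a0_values):
    """Bucket syscall 0x7A calls by their $a0 argument."""
    counts = Counter()
    for a0 in a0_values:
        if a0 == 4:
            counts["count_a0_4"] += 1
        elif a0 == 0x80000000:
            counts["count_a0_0x80000000"] += 1
        else:
            counts["count_a0_other"] += 1
    return counts


# Hypothetical call sequence that reproduces the logged counts and last_a0.
calls = [4, 0x80000000, 0x80000002, 0x80000002]
split = bucket_a0(calls)

print(dict(split))
print(sum(split.values()))   # 4 calls in total, all before the wait loop
print(hex(calls[-1]))        # 0x80000002
```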

So Codex's hypothesis, "the wait may be polling 0x7A with $a0 = 0x80000002 for a different bit", is falsified. The Ch295 0x7A unblock doesn't need broadening to fix this wait; whether it needs broadening at all is a separate concern.

Verdict, per Codex's enum

| Verdict | Fit? |
| --- | --- |
| qbert2_waiting_on_syscall_7a_bit | No: the loop body doesn't issue any syscalls; the wait is pure memory polling. |
| qbert2_waiting_on_memory_flag | Yes: generic fit; the gate is a memory location, not MMIO. |
| qbert2_waiting_on_mmio | No: 0x001329C0 is EE RAM (region 0x0B), not MMIO. |
| qbert2_waiting_on_registered_library_state | Yes, best fit: the gate sits at qbert's global ctx + 0x100 (0x001328C0 + 0x100 = 0x001329C0); Ch297 just registered two library entries via syscall 0x77; the "library is ready" flag pattern matches what the registration callback would set. |
| qbert2_wait_loop_unknown | No, fully decoded. |

Pick: qbert2_waiting_on_registered_library_state. The gate sits within the registration context that Ch297's syscall 0x77 calls were setting up. qbert expects whatever registers the library to also set the "ready" flag — our HLE returns $v0=0 and writes nothing.

What the address 0x001329C0 means

  • qbert's global ctx pointer (threaded through 0x78/0x12/0x16/0x7A/0x79) is 0x001328C0.
  • The gate is 0x001329C0 = global_ctx + 0x100 — same data region.
  • Likely an offset into a kernel-context / library-management struct.

Ch299 framing — name the gate value first

Per Codex's "name the branch mask and expected return value first" discipline:

  • Source: memory at *(*(0x0012C01C)) = *(0x001329C0).
  • Mask: none — full 32-bit != 0 test.
  • Expected value: any non-zero value.
  • Setter: TBD — nothing in our model currently writes to 0x001329C0. The setter would be the kernel-callback that syscall 0x77 (RegisterLibraryEntries) registered, OR the library-ready-callback mechanism.

Three Ch299 strategies

A. TB-poke the gate (cheap experiment). Modify tb_ee_core_elf_runner.sv to write 1 to memory address 0x001329C0 at a fixed cycle (e.g., cycle 200,000 — after init is done but before the watchdog). Lets qbert progress. Inelegant but falsifiable.

B. Extend syscall 0x77 HLE to write the status word. The proper PS2 kernel RegisterLibraryEntries(buf, ...) writes a "ready" flag somewhere derived from the buf pointer + library ID. If the layout is buf->status at a known offset, the HLE can write a non-zero value there before returning $v0=0. This requires identifying the exact offset that maps to 0x001329C0 from $a0 = 0x001DFD50 (Ch297's first call). The difference 0x001329C0 - 0x001DFD50 = ... is negative, so 0x001329C0 lies below 0x001DFD50. The gate therefore probably sits in a kernel-managed status block, not in the registration record. Not trivial without SDK semantics.
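
The two offsets in play can be worked out directly from the addresses quoted in this closeout. This is plain arithmetic on those values, with illustrative constant names; it says nothing about the real kernel struct layout.

```python
GATE = 0x001329C0        # polled flag
GLOBAL_CTX = 0x001328C0  # qbert's global ctx pointer
REG_BUF = 0x001DFD50     # $a0 of Ch297's first syscall 0x77 call

offset_from_ctx = GATE - GLOBAL_CTX
offset_from_buf = GATE - REG_BUF

print(hex(offset_from_ctx))   # 0x100
print(hex(offset_from_buf))   # -0xad390
```

The gate is a small positive offset from the global ctx but roughly 0xAD390 bytes below the registration buffer, which is why a fixed buf->status offset looks unlikely.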

C. Architectural — wire interrupt delivery. If the Ch290/291 DMAC handler at 0x00112AB0 fires and that handler writes to 0x001329C0, the gate opens. Requires modeling DMAC completion → COP0 Cause/Status → handler invocation. Multi-chapter.

My recommendation: Strategy A (TB-poke). It's the cheapest falsifiable experiment, matches Ch295's "Strategy A first" pattern that worked. If qbert progresses meaningfully, the gate's semantic role is confirmed and Ch300+ can pursue B or C for architectural correctness. If qbert misbranches or crashes, we roll back and pivot.

Specifically for Ch299: the TB writes mem[0x001329C0/16] |= (1<<0) (or any non-zero value at lane 0) at cycle ~200,000. The runner observer can confirm via a new "tb_poked_gate" counter.
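
The index and lane arithmetic for the poke can be checked with a small model. The sketch assumes, from the `/16` in the index expression, that the TB memory is an array of 128-bit (16-byte) little-endian words with four 32-bit lanes; `poke_gate` and the dictionary-backed `mem` are hypothetical stand-ins, not the actual tb_ee_core_elf_runner.sv code.

```python
WORD_BYTES = 16   # assumed: one memory entry per 128-bit word
LANE_BYTES = 4    # assumed: four 32-bit lanes per entry


def poke_gate(mem: dict, byte_addr: int, value: int = 1):
    """OR a value into the 32-bit lane that contains byte_addr."""
    index, byte_off = divmod(byte_addr, WORD_BYTES)
    lane = byte_off // LANE_BYTES
    mem[index] = mem.get(index, 0) | (value << (32 * lane))
    return index, lane


mem = {}
index, lane = poke_gate(mem, 0x001329C0)
print(hex(index), lane, hex(mem[index]))   # 0x1329c 0 0x1
```

Because 0x001329C0 is 16-byte aligned, the flag lands in lane 0 of entry 0x1329C, matching the `(1<<0)` form above.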

Files

  • /tmp/ch294_disasm.py — disassembler retargeted to 0x00112520..0x001125A0 then 0x001112E0..0x00111360 to find the caller. Same one-shot diagnostic from Ch294, retargeted by editing LO/HI constants.
  • This closeout.

Pattern review (28 chapters; second autopsy)

The Ch293→Ch294→Ch295 cycle (inflection → autopsy → unblock) is repeating cleanly at Ch297→Ch298→Ch299. Ch298 produces the same artifact format as Ch294: a named gate + a concrete next-step proposal.

| Inflection | Autopsy | Unblock |
| --- | --- | --- |
| Ch293 (1.66M retires, hot_pc=0x0011242C) | Ch294 (syscall 0x7A bit-17 poll) | Ch295 ($a0-aware HLE) |
| Ch297 (1.47M retires, hot_pc=0x00112554) | Ch298 (memory poll at 0x001329C0) | Ch299 (TB-poke OR HLE write) |

The cycle's reliability (two clean iterations now) suggests this is the right structure for the "post-opcode-era" phase of qbert. Each cycle adds ~1.5M retires of progress.

Regression

Unchanged at 176/176 — no RTL or TB changes in Ch298.