retroDE_ps2/docs/ch298_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch298 closeout — 2nd wait-loop autopsy; verdict `qbert2_waiting_on_registered_library_state`
**Status:** Closed. Observation-only chapter per Codex's framing.
**Named verdict:** `qbert2_waiting_on_registered_library_state`
(fallback: `qbert2_waiting_on_memory_flag`). qbert polls memory
location `0x001329C0` for a non-zero value; nothing in the model
ever writes a non-zero value there.
No RTL changes. Artifacts: the disassembly + runtime-trace
analysis below, and the Ch299 framing proposal at the end.
## The wait loop, fully decoded
### Caller (0x00111308..0x00111314)
```
0x00111308: 0x0c044950 jal 0x00112540
0x0011130c: 0x0000202d daddu $a0, $zero, $zero ; $a0 = 0 (delay slot)
0x00111310: 0x1040fffd beq $v0, $zero, 0x00111308 ← LOOP BRANCH (TAKEN 144,089×)
0x00111314: 0x3c048000 lui $a0, 0x8000 ; BEQ delay slot (executes whether or not the branch is taken)
```
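The two control-flow targets in the caller can be re-derived from the raw instruction words. The following is a minimal sketch using the standard MIPS J-type and branch encodings; the helper names `jal_target` and `branch_target` are illustrative and not part of the repo's tooling.

```python
def jal_target(pc, word):
    """J/JAL target: top 4 bits of the delay-slot PC, plus the 26-bit index << 2."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

def branch_target(pc, word):
    """Conditional-branch target: delay-slot PC + sign-extended imm16 << 2."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit offset
        imm -= 0x10000
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF

print(hex(jal_target(0x00111308, 0x0C044950)))     # 0x112540
print(hex(branch_target(0x00111310, 0x1040FFFD)))  # 0x111308
```

Both results agree with the listing: the JAL lands on the leaf entry, and the BEQ offset of -3 words returns to the JAL at the loop top.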
### Leaf (0x00112540..0x00112554) — called 144,089 times
```
0x00112540: 0x3c020013 lui $v0, 0x13 ; $v0 = 0x00130000
0x00112544: 0x00042080 sll $a0, $a0, 2 ; $a0 <<= 2 (= 0 since $a0_arg=0)
0x00112548: 0x8c43c01c lw $v1, -16356($v0) ; $v1 = *(0x0012C01C)
0x0011254c: 0x00832021 addu $a0, $a0, $v1 ; $a0 = $v1 (since $a0_arg=0)
0x00112550: 0x03e00008 jr $ra ; return
0x00112554: 0x8c820000 lw $v0, 0($a0) ; delay slot: $v0 = *($a0) = *(*(0x0012C01C))
```
**Effective gate:** `$v0 = *(*(0x0012C01C))`. Caller's branch:
`beq $v0, $zero, top` → loop while `*(*(0x0012C01C)) == 0`.
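To make the double dereference concrete, here is a small Python model of the leaf and the caller loop. It is a sketch under stated assumptions: memory is a sparse dict keyed by byte address, unset locations read as 0, and the names `lw`, `leaf`, and `wait_loop` are hypothetical helpers, not code from the TB or the HLE.

```python
PTR_ADDR = 0x0012C01C   # pointer storage read by the leaf
GATE_ADDR = 0x001329C0  # value observed at PTR_ADDR in the trace

def lw(mem, addr):
    """32-bit load from a sparse, address-keyed memory model (unset reads as 0)."""
    return mem.get(addr & 0xFFFFFFFF, 0)

def leaf(mem, a0):
    """Mirror of the leaf at 0x00112540; returns the value left in $v0."""
    v0 = 0x13 << 16                  # lui  $v0, 0x13
    a0 = (a0 << 2) & 0xFFFFFFFF      # sll  $a0, $a0, 2
    v1 = lw(mem, v0 - 16356)         # lw   $v1, -16356($v0)
    a0 = (a0 + v1) & 0xFFFFFFFF      # addu $a0, $a0, $v1
    return lw(mem, a0)               # lw   $v0, 0($a0)  (jr delay slot)

def wait_loop(mem, max_iters):
    """Caller loop: spin while leaf(0) == 0. Returns spins before exit, or None."""
    for spins in range(max_iters):
        if leaf(mem, 0) != 0:
            return spins
    return None

mem = {PTR_ADDR: GATE_ADDR, GATE_ADDR: 0}
print(wait_loop(mem, 1000))   # None: the gate never opens
mem[GATE_ADDR] = 1
print(wait_loop(mem, 1000))   # 0: exits on the first check
```

With the flag held at 0 the model spins until its iteration budget runs out, which is the behaviour seen in the trace; any non-zero value at the pointed-to address releases it.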
## Runtime data (from trace files)
### IFETCH counts
| PC | Count | Role |
|----|-------|------|
| 0x00111308 (caller JAL) | 144,089 | wait loop top |
| 0x0011130c (delay $a0=0) | 144,089 | |
| 0x00111310 (caller BEQ) | 144,089 | wait loop branch |
| 0x00111314 (lui, BEQ delay slot) | 144,089 | executed every iteration |
| 0x00112540..0x00112554 (leaf) | 144,089 each | leaf body (jr+ds) |

**144,089 iterations** of the wait loop. The leaf is a 6-instruction
function reached via JAL from the caller; each iteration is 10
instructions (4 caller + 6 leaf).

Note: the raw count for 0x00112540 is **288,178**, exactly 2× the
others. On closer examination, 0x00112540 is also reached from a
*separate* code path elsewhere in qbert, unrelated to this wait
loop, so this does not affect the analysis.
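The per-iteration arithmetic is easy to sanity-check. This short sketch uses only the counts quoted in this section; the variable names are illustrative.

```python
ITERATIONS = 144_089      # IFETCH count at each loop PC
CALLER_INSNS = 4          # 0x00111308..0x00111314
LEAF_INSNS = 6            # 0x00112540..0x00112554
RAW_LEAF_ENTRY = 288_178  # raw IFETCH count at 0x00112540

per_iteration = CALLER_INSNS + LEAF_INSNS
total_instructions = ITERATIONS * per_iteration

print(per_iteration, total_instructions)   # 10 1440890
print(RAW_LEAF_ENTRY == 2 * ITERATIONS)    # True
```

So the loop alone accounts for about 1.44M fetched instructions, and the raw count at the leaf entry is exactly double the iteration count.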
### Map-event addresses
Top read addresses (counts match the ~144k loop iterations):
| Address | Reads | Meaning |
|---------|-------|---------|
| 0x0012C01C | 144,090 | pointer storage (read each iteration; value = 0x001329C0) |
| 0x001329C0 | 144,089 | **the polled flag** (read each iteration; value always 0) |
| 0x00112540..0x00112554 | 144,089 each | leaf IFETCHes |
| 0x00111308..0x00111314 | 144,089 each | caller IFETCHes |
### Writes to the polled address
```
cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...
cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...
```
**Two writes total, both writing 0.** Both happened during init,
before the wait loop started. After that, the flag is read 144,089
times and never written. **qbert itself zeroed the flag, then
entered the loop expecting an external agent to set it.**
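The "only zero writes" check can be repeated mechanically over a trace. The sketch below assumes the line layout of the two excerpts above (`cycle <n> MEM WRITE <addr> <data> ...`); the trailing fields are elided in this document, so the regex deliberately ignores them, and `writes_to` is a hypothetical helper rather than an existing script.

```python
import re

# Matches the leading fields of a MEM WRITE trace line; trailing fields are ignored.
LINE_RE = re.compile(
    r"cycle\s+(\d+)\s+MEM\s+WRITE\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)"
)

def writes_to(lines, addr):
    """Return (cycle, value) for every MEM WRITE that targets addr."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and int(m.group(2), 16) == addr:
            hits.append((int(m.group(1)), int(m.group(3), 16)))
    return hits

sample = [
    "cycle 39739 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
    "cycle 98576 MEM WRITE 0x00000000001329c0 0x0000000000000000 ...",
]
hits = writes_to(sample, 0x001329C0)
print(hits)                                   # [(39739, 0), (98576, 0)]
print(any(value != 0 for _, value in hits))   # False
```

Running the same filter over a full trace file (one line per event) would confirm whether any later run ever stores a non-zero value to the gate.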
### Map-event region breakdown (full run)
| Region | Reads/writes | Notes |
|--------|--------------|-------|
| USEG_SHADOW (0x0B) | 1,773,235 | qbert's own code+data |
| BIOS (0x00) | 4 | early trampoline |
| DMAC_CTRL (0x0D) | 1 | Ch287 stub init |
| DMAC_PASSIVE (0x0E) | 1 | Ch288 stub init |
**Still zero INTC / GS / BIU / general-MMIO traffic.** Same as
Ch294's first-loop autopsy: the wait is 100% software-side, no
hardware-side polling.
## Syscall 0x7A bucketing (per Codex's instrumentation request)
```
syscall_0x7A_split = count_a0_4=1
count_a0_0x80000000=1
count_a0_other=2
last_a0=0x80000002
first_v0=0 last_v0=0
```
**The wait loop does NOT call syscall 0x7A.** The leaf at
0x00112540 is pure memory reads. The 4 total 0x7A calls (1+1+2)
all happened earlier in qbert's init sequence, NOT in this wait
loop. The 0x80000002 shape Codex flagged in Ch297 is an
init-side call, not a polling-loop call.
So Codex's hypothesis "the wait may be polling 0x7A with $a0=
0x80000002 for a different bit" is **falsified**. The Ch295 0x7A
unblock doesn't need broadening to fix this wait — that's a
separate concern.
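For reference, the bucketing reported above can be reproduced with a few lines of Python. This is only a sketch of the counting rule: the call order and the first "other" value are not recorded in this closeout, so `UNKNOWN_OTHER` is a placeholder and the `calls` list is hypothetical. Only the bucket names, the totals, and `last_a0=0x80000002` come from the instrumentation output.

```python
from collections import Counter

def bucket_a0(a0):
    """Map a syscall 0x7A $a0 value onto the three instrumentation buckets."""
    if a0 == 0x4:
        return "count_a0_4"
    if a0 == 0x80000000:
        return "count_a0_0x80000000"
    return "count_a0_other"

UNKNOWN_OTHER = 0x12345678  # placeholder: the real value is not in this closeout

# Hypothetical call sequence, chosen only to reproduce the reported totals.
calls = [0x4, 0x80000000, UNKNOWN_OTHER, 0x80000002]

counts = Counter(bucket_a0(a0) for a0 in calls)
print(dict(counts))
print(hex(calls[-1]))   # 0x80000002
```

The point of the split is that `0x80000002` lands in the "other" bucket and appears only among these four init-side calls, never inside the wait loop.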
## Verdict, per Codex's enum
| Verdict | Fit? |
|---------|------|
| `qbert2_waiting_on_syscall_7a_bit` | **No** — the loop body doesn't issue any syscalls; the wait is pure memory polling. |
| `qbert2_waiting_on_memory_flag` | **Yes** — generic fit; the gate is a memory location, not MMIO. |
| `qbert2_waiting_on_mmio` | **No** — 0x001329C0 is EE RAM (region 0x0B), not MMIO. |
| `qbert2_waiting_on_registered_library_state` | **Yes — best fit** — the gate sits at qbert's global ctx + 0x100 (0x001328C0 + 0x100 = 0x001329C0); Ch297 just registered two library entries via syscall 0x77; the "library is ready" flag pattern matches what the registration callback would set. |
| `qbert2_wait_loop_unknown` | No, fully decoded. |

**Pick: `qbert2_waiting_on_registered_library_state`.** The gate
sits within the registration context that Ch297's syscall 0x77
calls were setting up. qbert expects whatever registers the
library to also set the "ready" flag, but our HLE returns $v0=0
and writes nothing.
## What the address 0x001329C0 means
- qbert's global ctx pointer (threaded through 0x78/0x12/0x16/0x7A/
0x79) is **0x001328C0**.
- The gate is **0x001329C0 = global_ctx + 0x100** — same data
region.
- Likely an offset into a kernel-context / library-management
struct.
## Ch299 framing — name the gate value first
Per Codex's "name the branch mask and expected return value first"
discipline:
- **Source:** memory at `*(*(0x0012C01C))` = `*(0x001329C0)`.
- **Mask:** none — full 32-bit `!= 0` test.
- **Expected value:** any non-zero value.
- **Setter:** TBD — nothing in our model currently writes to
0x001329C0. The setter would be the kernel-callback that
syscall 0x77 (RegisterLibraryEntries) registered, OR the
library-ready-callback mechanism.
### Three Ch299 strategies
**A. TB-poke the gate (cheap experiment).** Modify
`tb_ee_core_elf_runner.sv` to write 1 to memory address
0x001329C0 at a fixed cycle (e.g., cycle 200,000 — after init is
done but before the watchdog). Lets qbert progress. Inelegant but
falsifiable.
**B. Extend syscall 0x77 HLE to write the status word.** The
proper PS2 kernel `RegisterLibraryEntries(buf, ...)` writes a
"ready" flag somewhere derived from the buf pointer + library
ID. If the layout is `buf->status` at a known offset, the HLE can
write a non-zero value there before returning $v0=0. This requires
identifying the exact offset that maps $a0=0x001DFD50 (Ch297's
first call) onto 0x001329C0. The difference 0x001329C0 -
0x001DFD50 is negative, so 0x001329C0 lies **below** 0x001DFD50;
the gate probably sits in a kernel-managed status block, not the
registration record. Not trivial without SDK semantics.
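The offsets discussed for Strategy B can be checked directly. This sketch uses only the three addresses quoted in this closeout; `signed_offset` and the constant names are illustrative.

```python
GATE = 0x001329C0        # polled flag
GLOBAL_CTX = 0x001328C0  # qbert's global ctx pointer
REG_BUF = 0x001DFD50     # $a0 of Ch297's first syscall 0x77 call

def signed_offset(target, base):
    """Signed byte distance from base to target."""
    return target - base

print(hex(signed_offset(GATE, GLOBAL_CTX)))  # 0x100
print(hex(signed_offset(GATE, REG_BUF)))     # -0xad390
```

The gate is +0x100 from the global ctx but 0xAD390 bytes below the registration buffer, which is why a simple `buf->status` field at a small positive offset does not explain it.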
**C. Architectural — wire interrupt delivery.** If the Ch290/291
DMAC handler at 0x00112AB0 fires and that handler writes to
0x001329C0, the gate opens. Requires modeling DMAC completion →
COP0 Cause/Status → handler invocation. Multi-chapter.
**My recommendation: Strategy A** (TB-poke). It's the cheapest
falsifiable experiment, matches Ch295's "Strategy A first" pattern
that worked. If qbert progresses meaningfully, the gate's
semantic role is confirmed and Ch300+ can pursue B or C for
architectural correctness. If qbert misbranches or crashes, we
roll back and pivot.
Specifically for Ch299: the TB writes `mem[0x001329C0/16] |= (1<<0)`
(or any non-zero value at lane 0) at cycle ~200,000. The runner
observer can confirm via a new "tb_poked_gate" counter.
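The indexing in that poke can be modelled in Python before touching the TB. This sketch assumes 128-bit (16-byte) memory words, as the `/16` index implies, and little-endian byte lanes within a word; both are assumptions about the TB memory array, and `poke_or` and `read32` are hypothetical helpers.

```python
WORD_BYTES = 16  # assumed 128-bit memory words, implied by the /16 index

def poke_or(mem, byte_addr, mask):
    """OR mask into the word holding byte_addr, at that address's byte lane."""
    index, lane = divmod(byte_addr, WORD_BYTES)
    mem[index] = mem.get(index, 0) | (mask << (8 * lane))
    return index, lane

def read32(mem, byte_addr):
    """32-bit little-endian read at byte_addr (assumed not to cross a word)."""
    index, lane = divmod(byte_addr, WORD_BYTES)
    return (mem.get(index, 0) >> (8 * lane)) & 0xFFFFFFFF

mem = {}
index, lane = poke_or(mem, 0x001329C0, 1 << 0)
print(hex(index), lane, read32(mem, 0x001329C0))   # 0x1329c 0 1
```

Under these assumptions the gate is byte lane 0 of word 0x1329C, so OR-ing bit 0 into that word makes the 32-bit load in the leaf's delay slot return 1.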
## Files
- `/tmp/ch294_disasm.py` — disassembler retargeted to
0x00112520..0x001125A0 then 0x001112E0..0x00111360 to find the
caller. Same one-shot diagnostic from Ch294, retargeted by
editing LO/HI constants.
- This closeout.
## Pattern review (28 chapters; second autopsy)
The Ch293→Ch294→Ch295 cycle (inflection → autopsy → unblock) is
repeating cleanly at Ch297→Ch298→Ch299. Ch298 produces the same
artifact format as Ch294: a *named gate* + a *concrete next-step
proposal*.
| Inflection | Autopsy | Unblock |
|------------|---------|---------|
| Ch293 (1.66M retires, hot_pc=0x0011242C) | Ch294 (syscall 0x7A bit-17 poll) | Ch295 ($a0-aware HLE) |
| Ch297 (1.47M retires, hot_pc=0x00112554) | **Ch298 (memory poll at 0x001329C0)** | **Ch299 (TB-poke OR HLE write)** |
The cycle's reliability (two clean iterations now) suggests this
is the right structure for the "post-opcode-era" phase of qbert.
Each cycle adds ~1.5M retires of progress.
## Regression
Unchanged at **176/176** — no RTL or TB changes in Ch298.