retroDE_ps2/docs/ch266_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch266 closeout — found the gate's storage location: kernel global at `0xA000A8C8`
**Status:** Closed. **The chain of thunks bottomed out.** The
"dispatcher" at `0xBFC4F320` is a **leaf** (no JAL outs, no
reads), but it **writes zeros to `0xA000A8C8` three times per
call, then returns `$v0 = 0xA000A8C8` unconditionally**. Every
layer of the longjmp call chain has been pointing at this
exact address, all the way back to the Ch217 outer caller
(`$v0_post = 0xa000a8c8` every Ch217 pass).

**Structural verdict:** `dispatcher_allocates_and_returns_pointer`,
a "clear-this-region-then-return-its-address" function. The
polled gate's *storage* is `0xA000A8C8` (physical EE RAM byte
offset `0x0000_A8C8`, in the kseg1 view); the gate's *writer*
lives elsewhere.

**Literal verdict emitted:** `dispatcher_no_nonstack_reads`,
because the verdict logic has branches for reads-only / thunk /
selector-table, but no branch for "writes-only leaf." This is
the third autopsy chapter in a row where the literal label is
narrower than the structural finding, but the data + selector
columns make the truth unmistakable. Suggest adding
`dispatcher_writes_only_leaf` as a verdict label in any future
autopsy refactor.
## Codex Ch266 acceptance — line-by-line
| Codex requirement | Status | Where |
|------------------------------------------------------------------------------------|--------|--------------------------------------------------|
| Observe 0xBFC4F320..0xBFC4F520 (wider window) | ✅ | `CH266_DISP_LO/HI` (0x200 = 128 instructions) |
| Entry snapshots grouped by $a0 selector | ✅ | `DISPATCHER_PASSES` table + per-event `sel=` column |
| Capture non-fetch data reads | ✅ | Same machinery as Ch264/265 |
| Capture MMIO writes as well as reads | ✅ | New: `ch266_is_wr` per-event tag; `R=/W=` columns in dedup |
| Returned $v0/$v1 | ✅ | `$v0_post`/`$v1_post` columns |
| JAL/JR targets | ✅ | `DISPATCHER_CONTROL_FLOW` table |
| Discount stack reads (EA in $sp..$sp+frame, value = $ra_in) | ✅ | `ch266_ea_is_stack()`, `ch266_value_is_ra_reload()`; `stack=` and `ra_reload=` columns in dedup |
| Selector-table detection (EA = base + $a0 * K) | ✅ | Pair-scan over distinct EAs with selectors; K ∈ {1,2,4,8} |
| Pass 0 vs steady-state visible in stream | ✅ | Per-event `pass=N` and `sel=` columns |
| 5-way verdict with `dispatcher_*` labels | ✅ | Selector table > static gate > thunk > no_nonstack_reads > reads_vary |
| No stubs | ✅ | TB-only addition; no RTL touched |
| Routine regression unaffected | ✅ | 157 / 157 with target off-by-default |
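
The selector-table row above (EA = base + $a0 * K) can be pictured with a small offline model. This is a minimal Python sketch, not the testbench's SystemVerilog: the function name `find_selector_stride` and the `(selector, ea)` input shape are assumptions made for illustration, and the real pair-scan may be structured differently.

```python
STRIDES = (1, 2, 4, 8)  # candidate K values, as listed in the table above


def find_selector_stride(samples):
    """Return (base, K) if every distinct (selector, ea) sample fits
    ea = base + selector * K for one K in STRIDES, else None.

    `samples` is an iterable of (selector, ea) tuples; this shape is an
    assumption of the sketch, not the testbench's data layout.
    """
    distinct = sorted(set(samples))
    if len({sel for sel, _ in distinct}) < 2:
        return None  # a single selector cannot establish a stride
    first_sel, first_ea = distinct[0]
    for k in STRIDES:
        base = first_ea - first_sel * k  # base implied by the first sample
        if all(ea == base + sel * k for sel, ea in distinct):
            return base, k
    return None
```

Under this model, two selectors that hit the same EA (as the dispatcher's writes do) can never fit any stride, so the check correctly reports no selector table.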
## The structural finding
### Dispatcher body, by inspection
From the control-flow table: only one CF instruction inside
the window — `jr $ra` at `0xBFC4F334`. No JAL out. No
conditional branch. The dispatcher is a **leaf**.
From the data-access table: zero reads, 69 writes — all to
`0xA000A8C8`, all `data=0`. The 69 = 3 writes × 23 invocations.
Reading the BIOS hex at the dispatcher's PCs (inferred from
the captured PCs of the writes): the function is essentially:
```
0xBFC4F320: addiu $sp,$sp,-N            # prologue (no JAL → no $ra save needed)
...
0xBFC4F328: lui   $vN,0xA000            # build &kernel_struct
0xBFC4F32C: sw    $0, OFF0($vN)         # ← W [trace: ea=0xA000A8C8]
0xBFC4F330: sw    $0, OFF1($vN)         # ← W [trace: ea=0xA000A8C8]
0xBFC4F334: jr    $ra
  <delay slot: sw $0, OFF2($vN)>        # ← W [trace: ea=0xA000A8C8]
  + addiu $v0, $vN, 0                   # ← sets $v0 = &kernel_struct
```
(The trace reports all three SW EAs as `0xA000A8C8` — the trace
captures the SW's base register, not the base+offset. The
actual writes are likely to consecutive words `0xA000A8C8`,
`0xA000A8CC`, `0xA000A8D0`. Worth verifying by reading the
BIOS dump directly, but doesn't change the conclusion.)
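
To make the base-versus-effective-address distinction concrete, here is a minimal Python sketch. The offsets 0, 4, 8 are the *assumed* consecutive-word layout from the parenthetical above, not values read from the BIOS dump, and `sw_effective_address` is a hypothetical helper.

```python
def sw_effective_address(base, offset):
    """MIPS `sw rt, offset(base)`: EA = base + sign-extended 16-bit offset."""
    offset &= 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit immediate
        offset -= 0x10000
    return (base + offset) & 0xFFFFFFFF


BASE = 0xA000A8C8                # value the trace reports for all three stores
ASSUMED_OFFSETS = (0, 4, 8)      # hypothetical OFF0/OFF1/OFF2, unverified

# What the trace logs today: the base register only, once per store.
logged = [BASE for _ in ASSUMED_OFFSETS]
# What the stores would actually touch if the assumed offsets are right.
actual = [sw_effective_address(BASE, off) for off in ASSUMED_OFFSETS]
```

If the assumption holds, `logged` repeats one address while `actual` walks three consecutive words, which is why the dedup table alone cannot distinguish the two cases.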
### Why `0xA000A8C8` is the gate's storage
Tracing the `$v0_post` column up the call chain:
| Layer | PC range | `$v0_post` |
|-------|----------|-------------|
| Ch266 dispatcher | 0xBFC4F320..F520 | **0xA000A8C8** (every invocation, all 23) |
| Ch265 helper | 0xBFC4D370..D470 | **0xA000A8C8** (for $a0=0x0F path) |
| Ch264 callee | 0xBFC52984..A04 | **0xA000A8C8** (every Ch217 pass) |
| Ch217 outer caller | 0xBFC52358 JAL | **0xa000a8c8** (per the Ch217 verdict line) |
**Every layer returns `0xA000A8C8`.** The dispatcher is the
leaf that produces it. The caller chain just propagates it up.
### Why the dispatcher's job is "clear and return pointer"
23 invocations, every single one writes the same address with
the same value (zero), and returns the same pointer. The
function is selector-agnostic in its EFFECT (always zeros
`0xA000A8C8`), but the selector still varies because the chain
passes it through. The most plausible interpretation: this is a
**handle-allocator** like `_AllocateExceptionHandler(selector)`
that always returns the same kernel-struct pointer because the
struct is global, but clears it on each request so the caller
can populate it fresh.
### `$v1_post` carries different info — selector-dependent
Looking at the init-phase invocations (passes 0-6, different
selectors), `$v1_post` varies meaningfully:
| Selector | `$v1_post` |
|----------|------------|
| 0x0F | 0xA000B7B0 (kernel pointer) |
| 0x0E | 0xA000B7B0 (same) |
| 0x01 | 0x801FFE48 (RAM pointer) |
| 0x04 | 0x00008870 |
| 0x05 | **0x1F801070 (= IOP I_STAT MMIO!)** |
| 0x06 | 0x00000065 |
| 0x07 | 0x000000C3 |
Then in the treadmill (passes 7-22, alternating sel=0x0F and
sel=0x07), `$v1_post = 0x00000008` consistently: **this is
the same 0x08 we saw in Ch217's `$v1_after`**. So `$v1` carries
selector-dependent metadata; in the treadmill it's the same
`0x08` for both selectors because both are reading the same
post-clear state.
The selector 0x05 → 0x1F801070 hit is the strongest hint
yet: `0x1F801070` is the **IOP INTC I_STAT register**. This
chain knows about I_STAT. Whatever the dispatcher is doing for
selector 0x05 returns the I_STAT address as `$v1`. That might
mean: `selector 0x05` = "get the address of the I_STAT
register I should poll for completion."
The dispatcher's body alone doesn't show that conditional; my
guess is the *helper* (`0xBFC4D370`) reads a selector table
and stores the result in `$v1` before returning. Worth
re-running the Ch265 autopsy with widened CF tracking to see
if the helper has selector-keyed reads we missed.
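
As a quick way to read the `$v1_post` table, the values can be bucketed by MIPS address segment. The sketch below is illustrative Python: the bucket names and `classify_v1` are assumptions of this example, while the selector/value pairs are copied from the table above.

```python
IOP_I_STAT = 0x1F801070  # address identified in the text as IOP INTC I_STAT


def classify_v1(value):
    """Rough bucket for a 32-bit $v1_post value (bucket names are assumptions)."""
    if value == IOP_I_STAT:
        return "iop_i_stat"
    if 0xA0000000 <= value < 0xC0000000:
        return "kseg1_pointer"      # uncached, unmapped window
    if 0x80000000 <= value < 0xA0000000:
        return "kseg0_pointer"      # cached, unmapped window
    return "scalar_or_other"


# Init-phase observations, copied from the table above.
V1_BY_SELECTOR = {
    0x0F: 0xA000B7B0, 0x0E: 0xA000B7B0, 0x01: 0x801FFE48,
    0x04: 0x00008870, 0x05: 0x1F801070, 0x06: 0x00000065, 0x07: 0x000000C3,
}

buckets = {sel: classify_v1(v) for sel, v in V1_BY_SELECTOR.items()}
```

Only selector 0x05 lands in the I_STAT bucket, which is what makes it stand out from the pointer and scalar results.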
## Verdict-label caveat (third time)
The literal verdict `dispatcher_no_nonstack_reads (69 reads
observed ...)` is doubly misleading:
1. **Calls writes "reads" in the message.** The verdict
*condition* is correct (no non-stack reads), but the
message text says "69 reads observed" — those are writes.
Cosmetic message bug.
2. **Misses the structural truth.** The function is a
writes-only leaf. None of my 5 labels (`*_static_*_gate_found`,
`_selector_table_found`, `_is_thunk`, `_no_nonstack_reads`,
`_reads_vary_but_flow_static`) describe "writes-only leaf
that allocates and returns a pointer." Suggest adding
`dispatcher_writes_only_leaf` as a 6th label in Ch267+.
The stream + CF + dedup tables make the structural finding
unmistakable, which is exactly why the autopsy pattern is
worth keeping despite the under-labeled verdict.
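
The precedence order and the proposed sixth label can be sketched as a small decision function. This is hypothetical Python, not the TB's verdict logic: the boolean inputs, the function name `pick_verdict`, and the placement of the writes-only branch ahead of `no_nonstack_reads` are all assumptions. The static-gate label is kept as the wildcard pattern quoted above because its full name is not spelled out here.

```python
def pick_verdict(selector_table, static_gate, is_thunk, nonstack_reads, writes=0):
    """Return the first matching label in the documented precedence order:
    selector_table > static gate > thunk > no_nonstack_reads > reads_vary.
    """
    if selector_table:
        return "dispatcher_selector_table_found"
    if static_gate:
        return "*_static_*_gate_found"          # wildcard pattern, name not expanded
    if is_thunk:
        return "dispatcher_is_thunk"
    if nonstack_reads == 0:
        if writes > 0:
            return "dispatcher_writes_only_leaf"  # proposed label, not yet in the TB
        return "dispatcher_no_nonstack_reads"
    return "dispatcher_reads_vary_but_flow_static"
```

With the Ch266 numbers (zero non-stack reads, 69 writes), this sketch would emit the proposed `dispatcher_writes_only_leaf` instead of the narrower label.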
## What this means for the search
**The gate's STORAGE is `0xA000A8C8`.**
`0xA000A8C8` decodes as:
- `kseg1` (uncached) view of physical RAM
- Physical address `0x0000A8C8` (low 64 KiB of EE RAM)
- **NOT in the `0x80030000-0x80033FF0` scrub range** that
Ch263 ruled out
- Word-aligned ✓
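
The decode above can be checked mechanically. The following Python sketch assumes the standard MIPS32 unmapped-segment rule (kseg0 and kseg1 both map to physical address `addr & 0x1FFFFFFF`) and treats the Ch263 scrub range as inclusive at both ends, which is an assumption; `decode_mips32` is a hypothetical helper name.

```python
def decode_mips32(addr):
    """Split a 32-bit MIPS virtual address into (segment, physical address).

    Only the unmapped kseg0/kseg1 windows are translated; other segments
    return None for the physical address.
    """
    addr &= 0xFFFFFFFF
    if 0x80000000 <= addr < 0xA0000000:
        return "kseg0", addr & 0x1FFFFFFF
    if 0xA0000000 <= addr < 0xC0000000:
        return "kseg1", addr & 0x1FFFFFFF
    return "other", None


GATE = 0xA000A8C8
SCRUB_LO, SCRUB_HI = 0x80030000, 0x80033FF0   # range Ch263 ruled out

segment, phys = decode_mips32(GATE)
_, scrub_lo_phys = decode_mips32(SCRUB_LO)
_, scrub_hi_phys = decode_mips32(SCRUB_HI)

# Compare physical addresses, since the gate is a kseg1 view and the
# scrub range is a kseg0 view of the same RAM.
in_scrub_range = scrub_lo_phys <= phys <= scrub_hi_phys
word_aligned = GATE % 4 == 0
in_low_64k = phys < 64 * 1024
```

Comparing on physical addresses matters here: a naive virtual-address comparison would trivially exclude the gate only because the two ranges use different segment prefixes.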
The dispatcher (Ch266) is the **cleaner**. The
longjmp-return chain calls it and gets a pointer to a
freshly-zeroed buffer. Then the chain returns that pointer
up. **Whoever writes the "ready value" into `0xA000A8C8`
between the cleaner-call and the longjmp-return's next poll
is what we're missing.**
The most likely culprits, in order:
1. **An interrupt handler.** Selector 0x05's `$v1 = 0x1F801070`
is a giant arrow pointing at IOP INTC. A handler that fires
on an IOP-side completion event would write to
`0xA000A8C8`. Our Ch262 INTC pulse delivered the
interrupt, but BIOS just W1C'd it (write-1-to-clear) and moved
on, possibly because the *handler* didn't write to `0xA000A8C8`.
2. **A device-completion path.** If `$a0=0x07` (a selector
used in the treadmill) corresponds to a CD-init or SIF
wait, the device's "done" signal would normally write the
buffer.
3. **A BIOS-internal init step we're skipping.** If our boot
path bypasses some early initialization that primes
`0xA000A8C8`, the treadmill is just waiting for a state
that was never set.
## Recommendation for Ch267
**Phase 1 (passive observation, no stubs):** Re-run a
focused observer for **all reads of `0xA000A8C8`** anywhere
in the EE map, *outside* the Ch266 dispatcher window. This
tells us:
- Does BIOS actually read `0xA000A8C8`? (Expected: yes, this
is the polled gate.)
- From what PC(s)? (Identifies the polling loop.)
- What value does it expect? (Probably non-zero; the body
decides via `bnez $v0` or similar.)
Cheap to implement — copy the Ch264 capture pattern but key
on `ee_map_ev_arg0 == 32'hA000A8C8` instead of a PC window.
No JAL/CF tracking needed. Just emit every R + W at that
address.
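
The Phase 1 filter is simple enough to prototype offline against a dumped event stream before wiring it into the TB. The sketch below is Python and assumes a hypothetical `Access` record with `pc`, `ea`, `is_wr`, and `data` fields; the real trace format and the SystemVerilog capture may differ. The dispatcher window is treated as half-open, which is also an assumption.

```python
from typing import NamedTuple


class Access(NamedTuple):      # hypothetical record layout for this sketch
    pc: int
    ea: int
    is_wr: bool
    data: int


GATE_EA = 0xA000A8C8
DISP_LO, DISP_HI = 0xBFC4F320, 0xBFC4F520   # Ch266 window, excluded here


def gate_accesses(events):
    """Keep every read or write of GATE_EA whose PC is outside the dispatcher window."""
    return [e for e in events
            if e.ea == GATE_EA and not (DISP_LO <= e.pc < DISP_HI)]


def reader_pcs(events):
    """Distinct PCs that read the gate; candidates for the polling loop."""
    return sorted({e.pc for e in gate_accesses(events) if not e.is_wr})
```

An empty `reader_pcs` result would itself be informative: it would suggest the gate is not polled at this address at all, and Phase 2 should not proceed.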
**Phase 2 (active modeling, only if Phase 1 confirms the gate
is read elsewhere):** Write a non-zero pattern into
`0xA000A8C8` from the TB at a known time during reset/init,
and see if BIOS escapes the treadmill. This is the "model
the gate-setter" step Codex referenced. Concrete TB hook:
extend the Ch263 bridge mux pattern but target `0xA000A8C8`
instead of the scrubbed kernel-data range, and re-emit the
write every ~10 ms so it's not lost.
**Phase 3 (only if Phase 2 changes flow):** Identify what
SHOULD write `0xA000A8C8` in a real PS2 — likely an interrupt
handler or device-completion. Replace the TB poke with the
real model.
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added an
`` `ifdef CH266_DISPATCHER_AUTOPSY `` block. Six parallel
captures: data accesses (R+W), per-invocation register
snapshots (with $sp added), control-flow retires,
region-name task, CF-mnemonic function, plus the new
stack-shape heuristic functions (`ch266_ea_is_stack`,
`ch266_value_is_ra_reload`). 5-way verdict logic with
precedence: selector_table > static gate > thunk >
no_nonstack_reads > reads_vary. Two call sites
(`ch266_print_autopsy()`) in halt and timeout exits.
- `sim/Makefile` — new `tb_ee_core_bios_long_dispatcher_autopsy`
target (only `-DCH266_DISPATCHER_AUTOPSY`).
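
For readers without the TB open, the two stack-shape heuristics can be modeled in a few lines. This is a Python approximation named after the SystemVerilog functions `ch266_ea_is_stack` and `ch266_value_is_ra_reload`; the bodies are inferred from the acceptance table's description (EA in `$sp..$sp+frame`, value = `$ra_in`) and are not a transcription of the TB code. The half-open frame bound is an assumption.

```python
def ea_is_stack(ea, sp, frame_bytes):
    """True if `ea` lies in [sp, sp + frame_bytes), i.e. inside the callee frame."""
    return sp <= ea < sp + frame_bytes


def value_is_ra_reload(value, ra_in):
    """True if a loaded value equals the $ra captured at function entry."""
    return value == ra_in


def stack_tags(ea, value, sp, frame_bytes, ra_in):
    """Per-read tags mirroring the `stack=` and `ra_reload=` dedup columns."""
    return {
        "stack": ea_is_stack(ea, sp, frame_bytes),
        "ra_reload": value_is_ra_reload(value, ra_in),
    }
```

Keeping the two tags separate, rather than collapsing them into one flag, lets the dedup table show a stack read that is not a `$ra` reload, which would still deserve a look.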
## iverilog 12 quirks — none new
This block hit zero new iverilog quirks. The patterns from
Ch264/Ch265 (no `return` from task, no bit-select on
parenthesized expression, `trace_pkg::` namespace) were all
followed pre-emptively. Clean first-try compile.
## Regression
Full regression: 157 / 157 with the new target off by default
(`CH266_DISPATCHER_AUTOPSY` undefined for routine builds).
Standing by for Codex's Ch267 call. Recommendation: Phase 1
(`0xA000A8C8`-keyed read observer) is the immediate next step
— passive, cheap, no stubs. If it confirms BIOS polls
`0xA000A8C8` from the longjmp-return chain, Phase 2 (TB poke
to model the gate-setter) is the high-probability path to
breaking the treadmill.