# Ch266 closeout — found the gate's storage location: kernel global at `0xA000A8C8`

**Status:** Closed. **The chain of thunks bottomed out.** The
"dispatcher" at `0xBFC4F320` is a **leaf** (no JAL outs, no
reads), but it **writes zeros to `0xA000A8C8` three times per
call, then returns `$v0 = 0xA000A8C8` unconditionally**. Every
layer of the longjmp call chain has been pointing at this
exact address, all the way back to the Ch217 outer caller
(`$v0_post = 0xa000a8c8` every Ch217 pass).

**Structural verdict:** `dispatcher_allocates_and_returns_pointer`,
a "clear-this-region-then-return-its-address" function. The
polled gate's *storage* is `0xA000A8C8` (physical EE RAM byte
offset `0x0000_A8C8`, in the kseg1 view); the gate's *writer*
lives elsewhere.

**Literal verdict emitted:** `dispatcher_no_nonstack_reads`,
because the verdict logic has branches for reads-only / thunk /
selector-table, but no branch for "writes-only leaf." This is
the third autopsy chapter in a row where the literal label is
narrower than the structural finding, but the data + selector
columns make the truth unmistakable. Suggest adding
`dispatcher_writes_only_leaf` as a verdict label in any future
autopsy refactor.

## Codex Ch266 acceptance: line-by-line

| Codex requirement | Status | Where |
|-------------------|--------|-------|
| Observe 0xBFC4F320..0xBFC4F520 (wider window) | ✅ | `CH266_DISP_LO/HI` (0x200 bytes = 128 instructions) |
| Entry snapshots grouped by $a0 selector | ✅ | `DISPATCHER_PASSES` table + per-event `sel=` column |
| Capture non-fetch data reads | ✅ | Same machinery as Ch264/265 |
| Capture MMIO writes as well as reads | ✅ | New: `ch266_is_wr` per-event tag; `R=/W=` columns in dedup |
| Returned $v0/$v1 | ✅ | `$v0_post`/`$v1_post` columns |
| JAL/JR targets | ✅ | `DISPATCHER_CONTROL_FLOW` table |
| Discount stack reads (EA in $sp..$sp+frame, value = $ra_in) | ✅ | `ch266_ea_is_stack()`, `ch266_value_is_ra_reload()`; `stack=` and `ra_reload=` columns in dedup |
| Selector-table detection (EA = base + $a0 * K) | ✅ | Pair-scan over distinct EAs with selectors; K ∈ {1,2,4,8} |
| Pass 0 vs steady-state visible in stream | ✅ | Per-event `pass=N` and `sel=` columns |
| 5-way verdict with `dispatcher_*` labels | ✅ | Precedence: selector table > static gate > thunk > no_nonstack_reads > reads_vary |
| No stubs | ✅ | TB-only addition; no RTL touched |
| Routine regression unaffected | ✅ | 157 / 157 with target off-by-default |

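The selector-table row (EA = base + $a0 * K) can be modeled offline. The following is a minimal Python sketch, not the TB's SystemVerilog: the function name `find_selector_table` and the sample inputs are hypothetical, and it uses a simplified consistency check (one shared base across all reads) rather than the TB's pair-scan.

```python
STRIDES = (1, 2, 4, 8)  # candidate K values, per the acceptance table


def find_selector_table(reads):
    """Return (base, K) if every read fits EA = base + selector * K, else None.

    `reads` is an iterable of (ea, selector) pairs for non-stack reads.
    """
    distinct = sorted(set(reads))
    if len({sel for _, sel in distinct}) < 2:
        return None  # need at least two selectors to infer a stride
    for k in STRIDES:
        # If the reads index one table, subtracting selector*K yields one base.
        bases = {ea - sel * k for ea, sel in distinct}
        if len(bases) == 1:
            return bases.pop(), k
    return None


# Synthetic table of 4-byte entries at a made-up base address.
table_like = [(0x1000 + 4 * s, s) for s in (0x01, 0x05, 0x0F)]
# Ch266-shaped input: the same EA regardless of selector.
same_ea = [(0xA000A8C8, s) for s in (0x0F, 0x07)]

print(find_selector_table(table_like))  # a (base, K) tuple
print(find_selector_table(same_ea))     # None: no stride explains it
```

An access pattern like Ch266's, where the EA never moves with the selector, fails every stride and falls through to the lower-precedence verdicts.
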
## The structural finding

### Dispatcher body, by inspection

From the control-flow table: only one CF instruction inside
the window, `jr $ra` at `0xBFC4F334`. No JAL out. No
conditional branch. The dispatcher is a **leaf**.

From the data-access table: zero reads, 69 writes, all to
`0xA000A8C8`, all `data=0`. The 69 = 3 writes × 23 invocations.

Reconstructing the body from the captured PCs of the writes
(inferred from the trace, not yet read from the BIOS dump),
the function is essentially:

```
0xBFC4F320: addiu $sp,$sp,-N        prologue (no JAL → no $ra save needed)
...
0xBFC4F328: lui   $vN,0xA000        build &kernel_struct
0xBFC4F32C: sw    $0, OFF0($vN)     ← W  [trace: ea=0xA000A8C8]
0xBFC4F330: sw    $0, OFF1($vN)     ← W  [trace: ea=0xA000A8C8]
0xBFC4F334: jr    $ra
  <delay slot: sw $0, OFF2($vN)>    ← W  [trace: ea=0xA000A8C8]
  + addiu $v0, $vN, 0               ← sets $v0 = &kernel_struct
```

(The trace reports all three SW EAs as `0xA000A8C8` because it
captures the SW's base register, not the base+offset. The
actual writes are likely to consecutive words `0xA000A8C8`,
`0xA000A8CC`, `0xA000A8D0`. Worth verifying by reading the
BIOS dump directly, but doesn't change the conclusion.)

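To check the base+offset question against the dump, each `sw` word can be decoded by hand. The sketch below is a small Python decoder for the standard MIPS `sw rt, imm(rs)` encoding (opcode `0x2B`, 16-bit sign-extended offset). The instruction words in the example are hand-assembled for illustration and are not taken from the BIOS; the function name `decode_sw` is hypothetical.

```python
def decode_sw(word, base_value):
    """Decode a MIPS `sw rt, imm(rs)` word and compute its effective address.

    Returns (rs, rt, signed_offset, ea), or None if the word is not an SW.
    `base_value` is the runtime value of register rs.
    """
    if (word >> 26) & 0x3F != 0x2B:  # SW opcode
        return None
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend 16 bits
    ea = (base_value + offset) & 0xFFFFFFFF
    return rs, rt, offset, ea


# Hand-assembled examples: sw $zero, 0/4/8($v0) with $v0 = 0xA000A8C8.
for word in (0xAC400000, 0xAC400004, 0xAC400008):
    rs, rt, offset, ea = decode_sw(word, 0xA000A8C8)
    print(f"sw ${rt}, {offset}(${rs}) -> ea=0x{ea:08X}")
```

If the three real words decode with offsets 0, 4 and 8, the consecutive-word reading holds; if all three carry offset 0, the trace EAs are literal.
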
### Why `0xA000A8C8` is the gate's storage

Tracing the `$v0_post` column up the call chain:

| Layer | PC range | `$v0_post` |
|-------|----------|------------|
| Ch266 dispatcher | 0xBFC4F320..F520 | **0xA000A8C8** (every invocation, all 23) |
| Ch265 helper | 0xBFC4D370..D470 | **0xA000A8C8** (for $a0=0x0F path) |
| Ch264 callee | 0xBFC52984..A04 | **0xA000A8C8** (every Ch217 pass) |
| Ch217 outer caller | 0xBFC52358 JAL | **0xa000a8c8** (per the Ch217 verdict line) |

**Every layer returns `0xA000A8C8`.** The dispatcher is the
leaf that produces it. The caller chain just propagates it up.

### Why the dispatcher's job is "clear and return pointer"

23 invocations, every single one writes the same address with
the same value (zero), and returns the same pointer. The
function is selector-agnostic in its EFFECT (always zeros
`0xA000A8C8`), but the selector still varies because the chain
passes it through. The most plausible interpretation: this is a
**handle-allocator** like `_AllocateExceptionHandler(selector)`
that always returns the same kernel-struct pointer because the
struct is global, but clears it on each request so the caller
can populate it fresh.

### `$v1_post` carries different info: selector-dependent

Looking at the init-phase invocations (passes 0–6, different
selectors), `$v1_post` varies meaningfully:

| Selector | `$v1_post` |
|----------|------------|
| 0x0F | 0xA000B7B0 (kernel pointer) |
| 0x0E | 0xA000B7B0 (same) |
| 0x01 | 0x801FFE48 (RAM pointer) |
| 0x04 | 0x00008870 |
| 0x05 | **0x1F801070 (= IOP I_STAT MMIO!)** |
| 0x06 | 0x00000065 |
| 0x07 | 0x000000C3 |

Then in the treadmill (passes 7–22, alternating sel=0x0F and
sel=0x07), `$v1_post = 0x00000008` consistently. **This is
the same 0x08 we saw in Ch217's `$v1_after`**. So `$v1` carries
selector-dependent metadata; in the treadmill it's the same
`0x08` for both selectors, presumably because both paths see
the same post-clear state.

The selector 0x05 → 0x1F801070 hit is the strongest hint
yet: `0x1F801070` is the **IOP INTC I_STAT register**. This
chain knows about I_STAT. Whatever the chain is doing for
selector 0x05 leaves the I_STAT address in `$v1`. That might
mean: `selector 0x05` = "get the address of the I_STAT
register I should poll for completion."

The dispatcher's body alone doesn't show that conditional (it
is a leaf with no reads and no branches); my guess is the
*helper* (`0xBFC4D370`) reads a selector table and stores the
result in `$v1` before returning. Worth re-running the Ch265
autopsy with widened CF tracking to see if the helper has
selector-keyed reads we missed.

## Verdict-label caveat (third time)

The literal verdict `dispatcher_no_nonstack_reads (69 reads
observed ...)` is doubly misleading:

1. **Calls writes "reads" in the message.** The verdict
   *condition* is correct (no non-stack reads), but the
   message text says "69 reads observed" when those are
   writes. Cosmetic message bug.
2. **Misses the structural truth.** The function is a
   writes-only leaf. None of my 5 labels (`*_static_*_gate_found`,
   `_selector_table_found`, `_is_thunk`, `_no_nonstack_reads`,
   `_reads_vary_but_flow_static`) describes "writes-only leaf
   that allocates and returns a pointer." Suggest adding
   `dispatcher_writes_only_leaf` as a 6th label in Ch267+.

The stream + CF + dedup tables make the structural finding
unmistakable, which is exactly why the autopsy pattern is
worth keeping despite the under-labeled verdict.

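One way the proposed sixth label and the message fix could slot in, sketched in Python rather than the TB's SystemVerilog. The function names and the exact condition (no non-stack reads, at least one write, no JAL) are assumptions for illustration; the existing five-way logic is treated as a black box that has already produced a label.

```python
def refine_verdict(existing_verdict, nonstack_reads, writes, jal_count):
    """Upgrade the fallback label when the window is a writes-only leaf.

    Any higher-precedence verdict from the existing logic is left untouched.
    """
    if existing_verdict != "dispatcher_no_nonstack_reads":
        return existing_verdict
    if nonstack_reads == 0 and writes > 0 and jal_count == 0:
        return "dispatcher_writes_only_leaf"
    return existing_verdict


def verdict_message(verdict, reads, writes):
    """Report reads and writes separately so writes are never called reads."""
    return f"{verdict} ({reads} reads, {writes} writes observed)"


# Ch266 numbers from this closeout: 0 reads, 69 writes, no JAL out.
label = refine_verdict("dispatcher_no_nonstack_reads", 0, 69, 0)
print(verdict_message(label, 0, 69))
```

Keeping the refinement as a post-pass means the existing precedence order does not need to change.
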
## What this means for the search

**The gate's STORAGE is `0xA000A8C8`.**

`0xA000A8C8` decodes as:

- `kseg1` (uncached) view of physical RAM
- Physical address `0x0000A8C8` (low 64 KiB of EE RAM)
- **NOT in the `0x80030000-0x80033FF0` scrub range** that
  Ch263 ruled out
- Word-aligned ✓

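The decode above follows the standard MIPS segment layout, where kseg0 and kseg1 both map to physical memory by masking off the top three address bits. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and treating the scrub-range bounds as inclusive is an assumption.

```python
# Physical view of the Ch263 scrub range 0x80030000-0x80033FF0
# (bounds treated as inclusive here, which is an assumption).
SCRUB_LO, SCRUB_HI = 0x00030000, 0x00033FF0


def decode_ee_address(vaddr):
    """Classify a 32-bit virtual address by MIPS segment."""
    if vaddr < 0x80000000:
        segment, phys = "kuseg", None  # TLB-mapped; not resolved here
    elif vaddr < 0xA0000000:
        segment, phys = "kseg0", vaddr & 0x1FFFFFFF  # cached, unmapped
    elif vaddr < 0xC0000000:
        segment, phys = "kseg1", vaddr & 0x1FFFFFFF  # uncached, unmapped
    else:
        segment, phys = "kseg2/3", None  # TLB-mapped; not resolved here
    return {
        "segment": segment,
        "phys": phys,
        "word_aligned": vaddr % 4 == 0,
        "in_scrub_range": phys is not None and SCRUB_LO <= phys <= SCRUB_HI,
    }


info = decode_ee_address(0xA000A8C8)
print(info["segment"], hex(info["phys"]), info["word_aligned"], info["in_scrub_range"])
# prints: kseg1 0xa8c8 True False
```

The same physical word is also reachable through the cached alias `0x8000A8C8`, which matters when choosing what an observer keys on.
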
The dispatcher (Ch266) is the **cleaner**. The
longjmp-return chain calls it and gets a pointer to a
freshly-zeroed buffer. Then the chain returns that pointer
up. **Whoever writes the "ready value" into `0xA000A8C8`
between the cleaner-call and the longjmp-return's next poll
is what we're missing.**

The most likely culprits, in order:

1. **An interrupt handler.** Selector 0x05's `$v1 = 0x1F801070`
   is a giant arrow pointing at IOP INTC. A handler that fires
   on an IOP-side completion event would write to
   `0xA000A8C8`. Our Ch262 INTC pulse delivered the
   interrupt but BIOS just W1C'd it and moved on, possibly
   because the *handler* didn't write to `0xA000A8C8`.
2. **A device-completion path.** If `$a0=0x07` (a selector
   used in the treadmill) corresponds to a CD-init or SIF
   wait, the device's "done" signal would normally write the
   buffer.
3. **A BIOS-internal init step we're skipping.** If our boot
   path bypasses some early initialization that primes
   `0xA000A8C8`, the treadmill is just waiting for a state
   that was never set.

## Recommendation for Ch267

**Phase 1 (passive observation, no stubs):** Re-run a
focused observer for **all reads of `0xA000A8C8`** (logging
writes too) anywhere in the EE map, *outside* the Ch266
dispatcher window. This tells us:

- Does BIOS actually read `0xA000A8C8`? (Expected: yes, this
  is the polled gate.)
- From what PC(s)? (Identifies the polling loop.)
- What value does it expect? (Probably non-zero; the body
  decides via `bnez $v0` or similar.)

Cheap to implement: copy the Ch264 capture pattern but key
on `ee_map_ev_arg0 == 32'hA000A8C8` instead of a PC window.
No JAL/CF tracking needed. Just emit every R + W at that
address.

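The Phase 1 filter can be prototyped offline against a dumped event stream before touching the TB. The Python sketch below assumes events arrive as `(pc, rw, ea, data)` tuples, which is a hypothetical format, and the sample events are synthetic. It also matches the cached kseg0 alias `0x8000A8C8` in addition to the kseg1 address, on the assumption that BIOS might poll through either view; drop that if only the exact `0xA000A8C8` match is wanted.

```python
from collections import Counter

GATE_PHYS = 0x0000A8C8
DISP_LO, DISP_HI = 0xBFC4F320, 0xBFC4F520  # Ch266 window, treated as half-open


def unmapped_phys(vaddr):
    """Physical address for kseg0/kseg1 virtual addresses, else None."""
    if 0x80000000 <= vaddr < 0xC0000000:
        return vaddr & 0x1FFFFFFF
    return None


def gate_accesses(events):
    """Count (pc, rw, data) for gate accesses outside the dispatcher window."""
    hits = Counter()
    for pc, rw, ea, data in events:
        if DISP_LO <= pc < DISP_HI:
            continue  # the cleaner's own writes are already understood
        if unmapped_phys(ea) == GATE_PHYS:
            hits[(pc, rw, data)] += 1
    return hits


# Synthetic events; the PCs outside the dispatcher window are made up.
events = [
    (0xBFC4F32C, "W", 0xA000A8C8, 0),  # dispatcher write, skipped
    (0xBFC00100, "R", 0xA000A8C8, 0),  # poll via kseg1
    (0xBFC00100, "R", 0xA000A8C8, 0),
    (0xBFC00200, "R", 0x8000A8C8, 0),  # poll via the kseg0 alias
    (0xBFC00300, "R", 0xA000B7B0, 0),  # unrelated address
]
for (pc, rw, data), count in sorted(gate_accesses(events).items()):
    print(f"pc=0x{pc:08X} {rw} data=0x{data:08X} x{count}")
```

Grouping by PC gives the polling-loop candidates directly, and the `data` field shows which values the loop has actually seen.
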
**Phase 2 (active modeling, only if Phase 1 confirms the gate
is read elsewhere):** Write a non-zero pattern into
`0xA000A8C8` from the TB at a known time during reset/init,
and see if BIOS escapes the treadmill. This is the "model
the gate-setter" step Codex referenced. Concrete TB hook:
extend the Ch263 bridge mux pattern but target `0xA000A8C8`
instead of the scrubbed kernel-data range, and re-emit the
write every ~10 ms so it's not lost.

**Phase 3 (only if Phase 2 changes flow):** Identify what
SHOULD write `0xA000A8C8` in a real PS2 (likely an interrupt
handler or device-completion). Replace the TB poke with the
real model.

## Files changed

- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added
  `` `ifdef CH266_DISPATCHER_AUTOPSY `` block. Six parallel
  captures: data accesses (R+W), per-invocation register
  snapshots (with $sp added), control-flow retires,
  region-name task, CF-mnemonic function, plus the new
  stack-shape heuristic functions (`ch266_ea_is_stack`,
  `ch266_value_is_ra_reload`). 5-way verdict logic with
  precedence: selector_table > static gate > thunk >
  no_nonstack_reads > reads_vary. Two call sites
  (`ch266_print_autopsy()`) in halt and timeout exits.
- `sim/Makefile`: new `tb_ee_core_bios_long_dispatcher_autopsy`
  target (only `-DCH266_DISPATCHER_AUTOPSY`).

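For reference, the stack-shape heuristic can be modeled in a few lines. This Python sketch mirrors the names `ch266_ea_is_stack` and `ch266_value_is_ra_reload` but is not the TB code: the frame size, the half-open window, and the rule that a read is discounted when either test holds are all assumptions.

```python
FRAME_BYTES = 0x40  # hypothetical frame size; the TB's actual bound may differ


def ea_is_stack(ea, sp, frame_bytes=FRAME_BYTES):
    """True when the EA falls inside [$sp, $sp + frame)."""
    return sp <= ea < sp + frame_bytes


def value_is_ra_reload(value, ra_in):
    """True when the loaded value equals the $ra seen at function entry."""
    return value == ra_in


def nonstack_reads(reads, sp, ra_in):
    """Keep only reads that look like neither a stack access nor an $ra reload."""
    return [
        (ea, value)
        for ea, value in reads
        if not (ea_is_stack(ea, sp) or value_is_ra_reload(value, ra_in))
    ]


# Synthetic reads: one epilogue-style $ra reload and one genuine data read.
reads = [(0x801FFE40, 0xBFC4D3A0), (0xBFC60010, 0x00001234)]
kept = nonstack_reads(reads, sp=0x801FFE30, ra_in=0xBFC4D3A0)
print([(hex(ea), hex(value)) for ea, value in kept])
```

Only what survives this filter feeds the selector-table and static-gate checks, which is why a function with zero surviving reads lands on the `no_nonstack_reads` fallback.
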
## iverilog 12 quirks: none new

This block hit zero new iverilog quirks. The patterns from
Ch264/Ch265 (no `return` from task, no bit-select on
parenthesized expression, `trace_pkg::` namespace) were all
followed pre-emptively. Clean first-try compile.

## Regression

Full regression: 157 / 157 with the new target off by default
(`CH266_DISPATCHER_AUTOPSY` undefined for routine builds).

Standing by for Codex's Ch267 call. Recommendation: Phase 1
(`0xA000A8C8`-keyed read observer) is the immediate next step
(passive, cheap, no stubs). If it confirms BIOS polls
`0xA000A8C8` from the longjmp-return chain, Phase 2 (TB poke
to model the gate-setter) is the high-probability path to
breaking the treadmill.