retroDE_ps2/docs/ch266_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch266 closeout — found the gate's storage location: kernel global at 0xA000A8C8

Status: Closed. The chain of thunks bottomed out. The "dispatcher" at 0xBFC4F320 is a leaf — no JAL outs, no reads — but it writes zeros to 0xA000A8C8 three times per call, then returns $v0 = 0xA000A8C8 unconditionally. Every layer of the longjmp call chain has been pointing at this exact address, all the way back to the Ch217 outer caller ($v0_post = 0xa000a8c8 every Ch217 pass).

Structural verdict: dispatcher_allocates_and_returns_pointer — a "clear-this-region-then-return-its-address" function. The polled gate's storage is 0xA000A8C8 (physical EE RAM byte offset 0x0000_A8C8, in the kseg1 view); the gate's writer lives elsewhere.

Literal verdict emitted: dispatcher_no_nonstack_reads — because the verdict logic has branches for reads-only / thunk / selector-table, but no branch for "writes-only leaf." This is the third autopsy chapter in a row where the literal label is narrower than the structural finding, but the data + selector columns make the truth unmistakable. Suggest adding dispatcher_writes_only_leaf as a verdict label in any future autopsy refactor.

Codex Ch266 acceptance — line-by-line

| Codex requirement | Status | Where |
|---|---|---|
| Observe 0xBFC4F320..0xBFC4F520 (wider window) | | CH266_DISP_LO/HI (0x200 = 128 instructions) |
| Entry snapshots grouped by $a0 selector | | DISPATCHER_PASSES table + per-event sel= column |
| Capture non-fetch data reads | | Same machinery as Ch264/265 |
| Capture MMIO writes as well as reads | | New: ch266_is_wr per-event tag; R=/W= columns in dedup |
| Returned $v0/$v1 | | $v0_post/$v1_post columns |
| JAL/JR targets | | DISPATCHER_CONTROL_FLOW table |
| Discount stack reads (EA in $sp..$sp+frame, value = $ra_in) | | ch266_ea_is_stack(), ch266_value_is_ra_reload(); stack= and ra_reload= columns in dedup |
| Selector-table detection (EA = base + $a0 * K) | | Pair-scan over distinct EAs with selectors; K ∈ {1,2,4,8} |
| Pass 0 vs steady-state visible in stream | | Per-event pass=N and sel= columns |
| 5-way verdict with dispatcher_* labels | | Selector table > static gate > thunk > no_nonstack_reads |
| No stubs | | TB-only addition; no RTL touched |
| Routine regression unaffected | | 157 / 157 with target off-by-default |
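The selector-table row above (EA = base + $a0 * K) can be illustrated with a small offline pair-scan. This is a minimal Python sketch, not the testbench code: the function name, the `(selector, ea)` tuple layout, and the example addresses are all assumptions made for illustration.

```python
from itertools import combinations

def find_selector_stride(reads, strides=(1, 2, 4, 8)):
    """Pair-scan (selector, ea) read events for EA = base + selector * K.

    Returns (base, K) for the first consistent pair, or None.
    A single matching pair is weak evidence; a real check would want
    several selectors to agree on the same base.
    """
    distinct = sorted(set(reads))
    for (s1, ea1), (s2, ea2) in combinations(distinct, 2):
        if s1 == s2 or ea1 == ea2:
            continue  # same selector or same EA carries no stride information
        for k in strides:
            if ea2 - ea1 == (s2 - s1) * k:
                return (ea1 - s1 * k) & 0xFFFFFFFF, k
    return None

# Hypothetical reads from a word-indexed table at 0xA000B000 (made-up address).
table_reads = [(0x0F, 0xA000B03C), (0x07, 0xA000B01C), (0x01, 0xA000B004)]
result = find_selector_stride(table_reads)
```

For the Ch266 dispatcher the input list would be empty (zero reads), so a scan of this shape returns None, which is consistent with the selector-table label not firing.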

The structural finding

Dispatcher body, by inspection

From the control-flow table: only one CF instruction inside the window — jr $ra at 0xBFC4F334. No JAL out. No conditional branch. The dispatcher is a leaf.

From the data-access table: zero reads, 69 writes — all to 0xA000A8C8, all data=0. The 69 = 3 writes × 23 invocations.
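The 69 = 3 × 23 arithmetic can be cross-checked against a dedup-style tally. The sketch below rebuilds a synthetic event stream from the counts quoted above (it is not the captured trace) and collapses it the way a dedup table would; the tuple layout and function name are assumptions.

```python
from collections import Counter

def dedup(events):
    """Collapse (is_write, ea, data) events into counts per (kind, ea, data)."""
    return Counter(("W" if is_write else "R", ea, data)
                   for is_write, ea, data in events)

WRITES_PER_CALL = 3
INVOCATIONS = 23

# Synthetic stream: every call writes zero to the same reported EA three times.
events = [(True, 0xA000A8C8, 0)] * (WRITES_PER_CALL * INVOCATIONS)
table = dedup(events)
read_count = sum(n for (kind, _, _), n in table.items() if kind == "R")
```

A single dedup row with count 69 and a read count of zero is the shape the data-access table reports.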

Reading the BIOS hex at the dispatcher's PCs (inferred from the captured PCs of the writes): the function is essentially:

0xBFC4F320: addiu $sp,$sp,-N           prologue (no JAL → no $ra save needed)
...
0xBFC4F328: lui $vN,0xA000             build &kernel_struct
0xBFC4F32C: sw   $0, OFF0($vN)         ← W [trace: ea=0xA000A8C8]
0xBFC4F330: sw   $0, OFF1($vN)         ← W [trace: ea=0xA000A8C8]
0xBFC4F334: jr   $ra
              <delay slot: sw $0, OFF2($vN)>  ← W [trace: ea=0xA000A8C8]
              + addiu $v0, $vN, 0     ← sets $v0 = &kernel_struct

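(The trace reports all three SW EAs as 0xA000A8C8 because it captures the SW's base register, not base+offset. The actual writes are likely to consecutive words 0xA000A8C8, 0xA000A8CC, 0xA000A8D0. Worth verifying by reading the BIOS dump directly, but it doesn't change the conclusion. The listing is also schematic about the return: a MIPS branch delay slot holds exactly one instruction, so the third sw and the addiu that sets $v0 cannot both sit behind jr $ra; one of them must retire before the jump.)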
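To make the base-versus-effective-address distinction concrete, here is a minimal sketch of how a MIPS store forms its EA (base register plus sign-extended 16-bit immediate). The offsets 0, 4, 8 are the assumption stated in the parenthetical, not values read from the BIOS dump.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate, as MIPS loads and stores do."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def store_ea(base, imm16):
    """Effective address of `sw rt, imm16(base)` on a 32-bit address space."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

BASE = 0xA000A8C8  # value the trace reports for all three stores

# Assumed offsets for three consecutive words.
eas = [store_ea(BASE, off) for off in (0, 4, 8)]
```

If the offsets are as assumed, the trace would need to log `store_ea(...)` rather than the base register to show three distinct addresses.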

Why 0xA000A8C8 is the gate's storage

Tracing the $v0_post column up the call chain:

| Layer | PC range | $v0_post |
|---|---|---|
| Ch266 dispatcher | 0xBFC4F320..F520 | 0xA000A8C8 (every invocation, all 23) |
| Ch265 helper | 0xBFC4D370..D470 | 0xA000A8C8 (for $a0=0x0F path) |
| Ch264 callee | 0xBFC52984..A04 | 0xA000A8C8 (every Ch217 pass) |
| Ch217 outer caller | 0xBFC52358 JAL | 0xa000a8c8 (per the Ch217 verdict line) |

Every layer returns 0xA000A8C8. The dispatcher is the leaf that produces it. The caller chain just propagates it up.

Why the dispatcher's job is "clear and return pointer"

23 invocations, every single one writes the same address with the same value (zero), and returns the same pointer. The function is selector-agnostic in its EFFECT (always zeros 0xA000A8C8), but the selector still varies because the chain passes it through. The most plausible interpretation: this is a handle-allocator like _AllocateExceptionHandler(selector) that always returns the same kernel-struct pointer because the struct is global, but clears it on each request so the caller can populate it fresh.

$v1_post carries different info — selector-dependent

Looking at the init-phase invocations (passes 0-6, different selectors), $v1_post varies meaningfully:

| Selector | $v1_post |
|---|---|
| 0x0F | 0xA000B7B0 (kernel pointer) |
| 0x0E | 0xA000B7B0 (same) |
| 0x01 | 0x801FFE48 (RAM pointer) |
| 0x04 | 0x00008870 |
| 0x05 | 0x1F801070 (= IOP I_STAT MMIO!) |
| 0x06 | 0x00000065 |
| 0x07 | 0x000000C3 |
Then in the treadmill (passes 7-22, alternating sel=0x0F and sel=0x07), $v1_post = 0x00000008 consistently. This is the same 0x08 we saw in Ch217's $v1_after. So $v1 carries selector-dependent metadata; in the treadmill it's the same 0x08 for both selectors because both are reading the same post-clear state.

The selector 0x05 → 0x1F801070 hit is the strongest hint yet: 0x1F801070 is the IOP INTC I_STAT register. This chain knows about I_STAT. Whatever the dispatcher is doing for selector 0x05 returns the I_STAT address as $v1. That might mean: selector 0x05 = "get the address of the I_STAT register I should poll for completion."

The dispatcher's body alone doesn't show that conditional; my guess is the helper (0xBFC4D370) reads a selector table and stores the result in $v1 before returning. Worth re-running the Ch265 autopsy with widened CF tracking to see if the helper has selector-keyed reads we missed.

Verdict-label caveat (third time)

The literal verdict dispatcher_no_nonstack_reads (69 reads observed ...) is doubly misleading:

  1. Calls writes "reads" in the message. The verdict condition is correct (no non-stack reads), but the message text says "69 reads observed" — those are writes. Cosmetic message bug.
  2. Misses the structural truth. The function is a writes-only leaf. None of my 5 labels (*_static_*_gate_found, _selector_table_found, _is_thunk, _no_nonstack_reads, _reads_vary_but_flow_static) describe "writes-only leaf that allocates and returns a pointer." Suggest adding dispatcher_writes_only_leaf as a 6th label in Ch267+.
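One way the proposed sixth label could slot into the existing precedence is sketched below. This is an assumption-laden illustration, not the testbench logic: the predicate inputs are hypothetical, the static-gate label string is a placeholder (its exact spelling is not reproduced here), and placing the new check just ahead of no_nonstack_reads is a design guess.

```python
def verdict(selector_table, static_gate, is_thunk,
            nonstack_reads, writes, jal_outs):
    """Pick a dispatcher_* label using the documented precedence.

    Inputs are hypothetical summaries of the capture tables.
    """
    if selector_table:
        return "dispatcher_selector_table_found"
    if static_gate:
        return "dispatcher_static_gate_found"  # placeholder spelling
    if is_thunk:
        return "dispatcher_is_thunk"
    if nonstack_reads == 0 and writes > 0 and jal_outs == 0:
        return "dispatcher_writes_only_leaf"  # proposed sixth label
    if nonstack_reads == 0:
        return "dispatcher_no_nonstack_reads"
    return "dispatcher_reads_vary_but_flow_static"

# Ch266 shape: no reads, 69 writes, no JAL out.
ch266 = verdict(False, False, False, nonstack_reads=0, writes=69, jal_outs=0)
```

Checking the writes-only case before the generic no-reads case keeps the older label as the fallback for functions that neither read nor write.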

The stream + CF + dedup tables make the structural finding unmistakable, which is exactly why the autopsy pattern is worth keeping despite the under-labeled verdict.

The gate's STORAGE is 0xA000A8C8.

0xA000A8C8 decodes as:

  • kseg1 (uncached) view of physical RAM
  • Physical address 0x0000A8C8 (low 64 KiB of EE RAM)
  • NOT in the 0x80030000-0x80033FF0 scrub range that Ch263 ruled out
  • Word-aligned ✓
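The decode in the list above follows from the standard MIPS 32-bit segment layout, where kseg0 and kseg1 are unmapped windows onto the low 512 MiB of physical space. A minimal sketch, with function and constant names chosen for illustration and the scrub range interpreted as its physical equivalent:

```python
def decode(addr):
    """Return (segment, physical address or None) for a 32-bit MIPS address."""
    top = (addr >> 29) & 0x7
    if top == 0b100:
        return "kseg0", addr & 0x1FFFFFFF  # cached, unmapped
    if top == 0b101:
        return "kseg1", addr & 0x1FFFFFFF  # uncached, unmapped
    # Mapped segments need the TLB, which this sketch does not model.
    return ("kuseg" if addr < 0x80000000 else "kseg2/3"), None

segment, phys = decode(0xA000A8C8)

# Ch263 scrub range 0x80030000-0x80033FF0, expressed as physical addresses.
SCRUB_LO, SCRUB_HI = 0x00030000, 0x00033FF0
in_scrub = SCRUB_LO <= phys <= SCRUB_HI
word_aligned = phys % 4 == 0
```

The same function maps the dispatcher PC 0xBFC4F320 to kseg1 with physical address 0x1FC4F320, i.e. the BIOS ROM window.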

The dispatcher (Ch266) is the cleaner. The longjmp-return chain calls it and gets a pointer to a freshly-zeroed buffer. Then the chain returns that pointer up. Whoever writes the "ready value" into 0xA000A8C8 between the cleaner-call and the longjmp-return's next poll is what we're missing.

The most likely culprits, in order:

  1. An interrupt handler. Selector 0x05's $v1 = 0x1F801070 is a giant arrow pointing at IOP INTC. A handler that fires on an IOP-side completion event would write to 0xA000A8C8. Our Ch262 INTC pulse delivered the interrupt, but BIOS just cleared it with a write-1-to-clear (W1C) and moved on, possibly because the handler didn't write to 0xA000A8C8.
  2. A device-completion path. If $a0=0x07 (a selector used in the treadmill) corresponds to a CD-init or SIF wait, the device's "done" signal would normally write the buffer.
  3. A BIOS-internal init step we're skipping. If our boot path bypasses some early initialization that primes 0xA000A8C8, the treadmill is just waiting for a state that was never set.

Recommendation for Ch267

Phase 1 (passive observation, no stubs): Re-run a focused observer for all accesses to 0xA000A8C8, reads and writes, anywhere in the EE map, outside the Ch266 dispatcher window. This tells us:

  • Does BIOS actually read 0xA000A8C8? (Expected: yes, this is the polled gate.)
  • From what PC(s)? (Identifies the polling loop.)
  • What value does it expect? (Probably non-zero; the body decides via bnez $v0 or similar.)

Cheap to implement — copy the Ch264 capture pattern but key on ee_map_ev_arg0 == 32'hA000A8C8 instead of a PC window. No JAL/CF tracking needed. Just emit every R + W at that address.
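As an offline model of that address-keyed filter, the sketch below keeps every R and W that lands on the gate word and drops events whose PC is inside the Ch266 window. One deliberate difference from the plain `== 32'hA000A8C8` compare: it matches on physical address, so an access through the cached kseg0 alias 0x8000A8C8 would also be caught. Whether BIOS ever uses that alias is not known; the event tuple layout and the example PCs are hypothetical.

```python
GATE_PHYS = 0x0000A8C8
DISP_LO, DISP_HI = 0xBFC4F320, 0xBFC4F520  # Ch266 window, treated as half-open

def hits_gate(ea):
    """True if ea is a kseg0/kseg1 alias of the gate's physical word."""
    return (ea >> 29) in (0b100, 0b101) and (ea & 0x1FFFFFFF) == GATE_PHYS

def observe(events):
    """Filter (pc, is_write, ea, data) events down to gate accesses."""
    out = []
    for pc, is_write, ea, data in events:
        if hits_gate(ea) and not (DISP_LO <= pc < DISP_HI):
            out.append(("W" if is_write else "R", pc, ea, data))
    return out

# Synthetic events; every PC other than the dispatcher's is made up.
events = [
    (0xBFC4F32C, True,  0xA000A8C8, 0),  # dispatcher clear: filtered out
    (0xBFC52990, False, 0xA000A8C8, 0),  # poll via kseg1: kept
    (0xBFC52994, False, 0x8000A8C8, 0),  # poll via kseg0 alias: kept
    (0xBFC52998, False, 0xA000A8CC, 0),  # neighbouring word: ignored
]
hits = observe(events)
```

Accesses through TLB-mapped segments are not modeled here and would need the mapping state to resolve.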

Phase 2 (active modeling, only if Phase 1 confirms the gate is read elsewhere): Write a non-zero pattern into 0xA000A8C8 from the TB at a known time during reset/init, and see if BIOS escapes the treadmill. This is the "model the gate-setter" step Codex referenced. Concrete TB hook: extend the Ch263 bridge mux pattern but target 0xA000A8C8 instead of the scrubbed kernel-data range, and re-emit the write every ~10 ms so it's not lost.

Phase 3 (only if Phase 2 changes flow): Identify what SHOULD write 0xA000A8C8 in a real PS2 — likely an interrupt handler or device-completion. Replace the TB poke with the real model.

Files changed

  • sim/tb/integration/tb_ee_core_bios_smoke.sv: added an `ifdef CH266_DISPATCHER_AUTOPSY block. Six parallel captures: data accesses (R+W), per-invocation register snapshots (with $sp added), control-flow retires, region-name task, CF-mnemonic function, plus the new stack-shape heuristic functions (ch266_ea_is_stack, ch266_value_is_ra_reload). 5-way verdict logic with precedence: selector_table > static gate > thunk > no_nonstack_reads > reads_vary. Two call sites (ch266_print_autopsy()) in halt and timeout exits.
  • sim/Makefile — new tb_ee_core_bios_long_dispatcher_autopsy target (only -DCH266_DISPATCHER_AUTOPSY).
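The two stack-shape heuristics named above can be modeled in a few lines. This Python sketch mirrors only what the acceptance table states (EA in $sp..$sp+frame, value equal to the $ra seen at entry); the frame size, the half-open upper bound, and the example register values are assumptions, and the real functions live in the SystemVerilog testbench.

```python
def ea_is_stack(ea, sp, frame_bytes):
    """True if ea falls inside the callee's assumed frame [sp, sp + frame_bytes)."""
    return sp <= ea < sp + frame_bytes

def value_is_ra_reload(value, ra_in):
    """True if the loaded value equals the return address seen at entry."""
    return value == ra_in

def stack_flags(ea, value, sp, ra_in, frame_bytes):
    """Return the (stack, ra_reload) pair, like the two dedup columns."""
    return ea_is_stack(ea, sp, frame_bytes), value_is_ra_reload(value, ra_in)

# Hypothetical entry snapshot.
SP, RA_IN, FRAME = 0x801FFE00, 0xBFC52360, 0x20

reload_flags = stack_flags(0x801FFE10, RA_IN, SP, RA_IN, FRAME)
gate_flags = stack_flags(0xA000A8C8, 0, SP, RA_IN, FRAME)
```

Keeping the two flags separate, rather than folding them into one boolean, leaves the discount policy to whoever reads the dedup table.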

iverilog 12 quirks — none new

This block hit zero new iverilog quirks. The patterns from Ch264/Ch265 (no return from task, no bit-select on parenthesized expression, trace_pkg:: namespace) were all followed pre-emptively. Clean first-try compile.

Regression

Full regression: 157 / 157 with the new target off by default (CH266_DISPATCHER_AUTOPSY undefined for routine builds).

Standing by for Codex's Ch267 call. Recommendation: Phase 1 (0xA000A8C8-keyed read observer) is the immediate next step — passive, cheap, no stubs. If it confirms BIOS polls 0xA000A8C8 from the longjmp-return chain, Phase 2 (TB poke to model the gate-setter) is the high-probability path to breaking the treadmill.