retroDE_ps2/docs/ch265_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch265 closeout — helper is ALSO a one-call thunk (to 0xBFC4F320); recurse once more

Status: Closed. The new opt-in target tb_ee_core_bios_long_helper_autopsy runs the BIOS-long flow with the Ch264 observer pattern re-aimed at the helper body 0xBFC4D370..0xBFC4D470, plus two new tracks: (1) per-invocation $a0_in/$ra_in/$v0_post/$v1_post snapshots taken at entry and return, and (2) a JAL/J/JR/JALR retire log inside the helper with statically decoded targets and "LEAVES helper" annotations.

Literal verdict the task emits: helper_static_ram_gate_found (EA=0x801FFDE4 returns identical 0xBFC52998 across 8 hits — region=EE_RAM).

Structural verdict (visible in the stream + CF table): helper_is_thunk — the helper is another one-call thunk, this time to 0xBFC4F320. The literal label is a known false-positive (see "Verdict-label nuance" below); the real polled gate is still one frame deeper.

Codex Ch265 acceptance — line-by-line

| Codex requirement | Status | Where |
|---|---|---|
| Reuse Ch264 observer one frame deeper at 0xBFC4D370..0xBFC4D470 | | CH265_HELPER_LO/HI = 0xBFC4D370/D470 |
| Same region tagging and compact tables | | ch265_region_name task; same shape as Ch264 |
| Capture non-fetch data reads only | | Same !ch265_is_fetch predicate as Ch264 |
| Include calls/jumps out of the helper | | HELPER_CONTROL_FLOW table: J/JAL/JR/JALR retires inside helper, with statically decoded J/JAL target and "LEAVES helper" notes |
| Track $a0=0x0F at entry and returned $v0 | | HELPER_PASSES table with $a0_in/$ra_in/$v0_post/$v1_post |
| Compare pass 0 versus steady-state passes 1-8 | | pass=N column in every table; trivial visual diff |
| Verdicts mirror Ch264 + helper_is_thunk | | 5-way verdict logic |
| No new side-effect stubs | | TB-only addition; no RTL touched |
| Regression unaffected | | 157 / 157 with target off-by-default |

What the autopsy showed

HELPER_PASSES (per-invocation entry/exit register snapshots)

The helper is called from many places, not just from the Ch264 callee. The first 7 invocations are pre-treadmill BIOS init with varying $a0_in (0xF, 0xE, 0x1, 0x4, 0x5, 0x6, 0x7). The treadmill itself (cycles 10.2M onward) shows a deterministic pair every Ch217 pass:

[7]  cyc=10194426  $a0_in=0x0F  $ra_in=0xBFC52998  $v0_post=0xA000A8C8  $v1_post=0x00000008
[8]  cyc=10194505  $a0_in=0x07  $ra_in=0xBFC52368  $v0_post=???        $v1_post=???
[9]  cyc=20095076  $a0_in=0x0F  $ra_in=0xBFC52998  $v0_post=0xA000A8C8  $v1_post=0x00000008
[10] cyc=20095155  $a0_in=0x07  $ra_in=0xBFC52368  $v0_post=???        $v1_post=???
...repeats every Ch217 pass...

Two callers, interleaved:

| Caller location | $a0 | Return target |
|---|---|---|
| Ch264 callee at 0xBFC52990 | 0x0F | 0xBFC52998 |
| Ch217 trampoline at 0xBFC52360 | 0x07 | 0xBFC52368 |

The $a0=0x07 path's $v0_post and $v1_post are unknown (x, shown as ??? above) because the exit predicate was scoped only to the return into the Ch264 callee (PC=0xBFC52998). A future autopsy refinement should also exit on PC=0xBFC52368 to capture the other arm's $v0. This does not change the structural conclusion.

The $a0=0x0F path returns $v0=0xA000A8C8 identically every treadmill pass — that matches the Ch217 outer-caller's $v0_post=0xa000a8c8 exactly. Consistency check ✓.

HELPER_CONTROL_FLOW (every JAL/J/JR retired inside helper)

pc=0xBFC4D380  instr=0x0FF13CC8  jal  target=0xBFC4F320   <-- LEAVES helper
pc=0xBFC4D390  instr=0x03E00008  jr   target=0x00000000

Repeated 47 times (every helper invocation hits this exact pair). The helper has exactly one JAL out, every time, to 0xBFC4F320. No conditional branches, no other JALs, no JR that isn't the function epilog. This is a one-call thunk by structure.
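The static target in the first row can be re-derived by hand. The following is a minimal Python sketch of the standard MIPS J/JAL decode, not the testbench's SystemVerilog; the function names `jtarget`, `mnemonic`, and `leaves_helper` are illustrative, and treating the helper window as inclusive on both ends is an assumption.

```python
HELPER_LO, HELPER_HI = 0xBFC4D370, 0xBFC4D470  # window from this note; inclusive bounds assumed

def jtarget(pc, instr):
    """Static J/JAL target: top 4 bits of the delay-slot PC, then instr[25:0] << 2."""
    dslot = (pc + 4) & 0xFFFFFFFF              # address of the delay slot
    return (dslot & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)

def mnemonic(instr):
    """Classify the four control-flow instructions the table logs."""
    opcode = (instr >> 26) & 0x3F
    funct = instr & 0x3F
    if opcode == 0x02:
        return "j"
    if opcode == 0x03:
        return "jal"
    if opcode == 0x00 and funct == 0x08:
        return "jr"
    if opcode == 0x00 and funct == 0x09:
        return "jalr"
    return "other"

def leaves_helper(target):
    """True when a decoded target falls outside the helper window."""
    return not (HELPER_LO <= target <= HELPER_HI)

target = jtarget(0xBFC4D380, 0x0FF13CC8)
print(mnemonic(0x0FF13CC8), hex(target), leaves_helper(target))
```

JR and JALR take their target from a register, so only J and JAL have a statically decodable target; that is consistent with the jr row printing target=0x00000000.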

HELPER_BODY_DATA_READS (every non-fetch read inside helper)

23 reads captured. All from the single PC 0xBFC4D388 — which is the instruction immediately after the JAL's delay slot, i.e. the saved-$ra reload (lw $ra,N($sp) in the standard MIPS epilog).

Three distinct EAs, all in EE_RAM:

| EA | Hits | Pass mask | First data | data_varies | What it is |
|---|---|---|---|---|---|
| 0x801FFEE4 | 2 | 0x0001 | 0xBFC528AC | yes | Pre-treadmill $sp's $ra slot (only during BIOS init) |
| 0x801FFDFC | 13 | 0x01FF | 0xBFC521C4..0xBFC52368 | yes | Ch217 trampoline's $sp+$ra-slot ($a0=0x07 caller) |
| 0x801FFDE4 | 8 | 0x01FE | 0xBFC52998 | no | Ch264 callee's $sp+$ra-slot ($a0=0x0F caller); stable because that caller never changes |

Each helper invocation reads exactly one EA — the saved $ra at its caller-determined stack frame. There is no MMIO read. No kernel-global read. No timer read. No non-stack read of any kind. The helper body is structurally the same shape as the Ch264 callee: prologue → JAL → restore $ra from stack → JR.
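To read the pass-mask column, the small Python sketch below expands each mask into pass numbers and cross-checks the hit counts against the 23 captured reads. It assumes bit N of the mask corresponds to pass N, which matches the pass 0 versus steady-state split described in this note but is not confirmed by the testbench source; `passes_in_mask` is an illustrative name.

```python
def passes_in_mask(mask):
    """Expand a pass mask into pass numbers, assuming bit N == pass N."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# (hits, pass mask) per EA, copied from the table above.
reads = {
    0x801FFEE4: (2, 0x0001),
    0x801FFDFC: (13, 0x01FF),
    0x801FFDE4: (8, 0x01FE),
}

for ea, (hits, mask) in reads.items():
    print(hex(ea), hits, passes_in_mask(mask))

total_hits = sum(hits for hits, _ in reads.values())
print("total reads:", total_hits)
```

Under that assumption, 0x801FFDE4 is absent from pass 0 and present in passes 1 through 8, with exactly one hit per pass.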

Verdict-label nuance — false-positive

The literal verdict helper_static_ram_gate_found (EA=0x801FFDE4 ... data=0xBFC52998) is a known false-positive of the stable-EA heuristic. The condition "appears in ≥2 passes AND data doesn't vary" is satisfied because the Ch264-callee-side caller path is itself stable (every pass the helper is entered with the same $ra=0xBFC52998, so the saved-$ra slot reload returns the same value).

But 0xBFC52998 is exactly $ra_in + 0 for the Ch264-callee caller — i.e. it's the return address that the helper itself stashed on entry, not a polled state. Reading it back yields a stable value because the caller doesn't change, not because external state is settled.

The stack-only check (abs(ea - first_ea) ≤ 0x40 && region=EE_RAM) did not filter this out either. The two treadmill caller paths reload $ra from 0x801FFDE4 and 0x801FFDFC, which are only 0x18 apart, but the pre-treadmill slot 0x801FFEE4 widens the spread across all three EAs to 0x100 (0x801FFEE4 - 0x801FFDE4 = 0x100), exceeding the 0x40 sibling threshold.
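The arithmetic behind that miss can be checked with a short Python sketch. It models the sibling test as "every EA lies within 0x40 of the first EA seen"; the function name and the use of table order as capture order are assumptions for illustration.

```python
SIBLING_SPAN = 0x40  # threshold quoted in this note

def within_sibling_span(eas, span=SIBLING_SPAN):
    """True when every EA lies within `span` bytes of the first EA in the list."""
    first = eas[0]
    return all(abs(ea - first) <= span for ea in eas)

all_eas = [0x801FFEE4, 0x801FFDFC, 0x801FFDE4]   # the three saved-$ra slots
treadmill_eas = [0x801FFDFC, 0x801FFDE4]         # the two steady-state callers only

print(hex(max(all_eas) - min(all_eas)))          # overall spread
print(hex(treadmill_eas[0] - treadmill_eas[1]))  # spread between the treadmill pair
print(within_sibling_span(all_eas), within_sibling_span(treadmill_eas))
```

The treadmill pair alone would pass the sibling test; it is the pre-treadmill slot that pushes the set past the threshold.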

A more robust heuristic would discount any stable read whose returned value equals the caller's $ra_in (i.e. detect saved-$ra reloads explicitly). This is not blocking: the control-flow table makes the structural truth obvious without the heuristic. Future Ch266+ autopsies can incorporate this filter.
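One way the saved-$ra filter could look, as a post-hoc Python sketch over captured reads rather than the testbench's own verdict code. The `Read` record, its field names, and the function name are hypothetical, and the second EA in the demo (0x80012340) is a made-up non-stack read included only to show a candidate that survives the filter.

```python
from dataclasses import dataclass

@dataclass
class Read:
    ea: int       # effective address of the data read
    data: int     # value returned
    pass_no: int  # pass in which the read occurred
    ra_in: int    # $ra captured at entry of the invocation that issued the read

def stable_gate_candidates(reads, filter_saved_ra=True):
    """EAs seen in >= 2 passes with non-varying data, optionally minus saved-$ra reloads."""
    by_ea = {}
    for r in reads:
        by_ea.setdefault(r.ea, []).append(r)
    out = []
    for ea, rs in by_ea.items():
        if len({r.pass_no for r in rs}) < 2 or len({r.data for r in rs}) != 1:
            continue                      # not stable across passes
        if filter_saved_ra and all(r.data == r.ra_in for r in rs):
            continue                      # value is just the caller's return address
        out.append(ea)
    return out

# Ch265 false positive: the same saved $ra reloaded in passes 1..8.
demo = [Read(0x801FFDE4, 0xBFC52998, p, 0xBFC52998) for p in range(1, 9)]
# Hypothetical non-stack read (made-up address and data) that should survive.
demo += [Read(0x80012340, 0x00000001, p, 0xBFC52998) for p in range(1, 9)]

print([hex(ea) for ea in stable_gate_candidates(demo, filter_saved_ra=False)])
print([hex(ea) for ea in stable_gate_candidates(demo)])
```

Comparing against the per-invocation $ra_in is the simplest form of the filter; a real implementation might instead compare against the most common $ra_in per caller.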

After Ch263+Ch264+Ch265, the structural picture:

Ch217 trampoline 0xBFC52340..60
  -> JAL 0xBFC52984  (Ch264 callee, $a0=2)
      -> sw $ra,0x14($sp)
      -> JAL 0xBFC4D370 (Ch265 helper, $a0=0x0F)   ← thunk
         -> sw $ra,N($sp)
         -> JAL 0xBFC4F320  (Ch266 target)   ← thunk to ???
            -> ???
         -> lw $ra,N($sp)
         -> jr $ra
      -> lw $ra,0x14($sp)
      -> jr $ra
  -> JAL 0xBFC4D370 again with $a0=0x07  (Ch217 post-call path)
     same thunk to 0xBFC4F320

Every layer so far has been a wrapper. The actual work — the polled-state lookup — has not yet appeared. It almost certainly lives at or below 0xBFC4F320.

The constant $a0=0x0F selector passing all the way through 0xBFC52984 -> 0xBFC4D370 -> 0xBFC4F320 strongly suggests this is a selector-dispatched BIOS API: something like GetXY(selector=0x0F). The Ch217 outer-caller also calls this chain with $a0=2, and the Ch217 trampoline's second JAL goes through with $a0=0x07. Different selectors, same dispatcher. This is a classic PS2 BIOS pattern: a single entry point with a selector argument.

$v0=0xA000A8C8 is a kernel-segment pointer: addresses 0xA0000000..0xBFFFFFFF are kseg1, the unmapped, uncached window onto physical memory, so this value aliases physical 0x0000A8C8 (the same RAM that kseg0 exposes cached at 0x8000A8C8). That return value being constant every pass is consistent with the dispatch returning a pointer to a stable kernel structure, which the longjmp-return caller then uses as a jump table base or as a data source.
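For reference, the conventional 32-bit MIPS segment split can be sketched in a few lines of Python. This follows the generic MIPS layout and is not taken from the testbench's region-name task; `mips_segment` is an illustrative name, and mapped segments return no physical address because their translation depends on the TLB.

```python
def mips_segment(vaddr):
    """Return (segment name, physical address or None) for a 32-bit MIPS virtual address."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return ("kuseg", None)                 # TLB-mapped user segment
    if vaddr < 0xA0000000:
        return ("kseg0", vaddr - 0x80000000)   # unmapped, cached
    if vaddr < 0xC0000000:
        return ("kseg1", vaddr - 0xA0000000)   # unmapped, uncached
    return ("kseg2/kseg3", None)               # TLB-mapped kernel segments

for addr in (0xA000A8C8, 0x801FFDE4, 0xBFC4D370):
    seg, phys = mips_segment(addr)
    print(hex(addr), seg, hex(phys) if phys is not None else "mapped")
```

Applied to the addresses in this note, the returned pointer and the helper's code are both kseg1 addresses, while the saved-$ra slots are kseg0 addresses into RAM.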

Recommendation for Ch266

Recurse one more frame, to 0xBFC4F320. Same observer pattern, bump the PC window. Expected outcomes (in order of likelihood, based on the chain so far):

  1. helper_is_thunk again: 0xBFC4F320 is also a wrapper to something deeper. Then Ch267 follows its JAL out.
  2. helper_static_mmio_gate_found: 0xBFC4F320 reads from some PS2 MMIO region (EE INTC, EE BIU, EE_MISC_MMIO, or 0x1FA00000 which was the Ch263 deferred Pivot 2). That's the gate. Ch267 models the device.
  3. helper_static_ram_gate_found with a non-stack EA: a kernel global in EE_RAM. Ch267 models what writes there.

Implementation notes for the autopsy itself:

  • The verdict heuristic should add a saved-$ra filter: discount any stable EA whose returned value equals the most-common $ra_in for the same caller. Could be done in the autopsy itself, or post-hoc by reading the stream. Note this in the block.
  • The HELPER_PASSES exit predicate (PC=0xBFC52998) was scoped to the Ch264-callee return; the Ch217-trampoline caller's return was missed. For Ch266 (assuming again a single primary caller from the deeper helper), pick the most-frequent caller's post-JAL PC and gate exit on that. Alternatively widen exit: trigger on ANY retire whose PC is outside the helper window and was reached from inside in the immediately preceding cycle. Not critical.
  • The CH265_PASSES cap of 16 is fine for 8 Ch217 passes × 2 caller paths per pass = 16 invocations. For the next layer bump to 32 to leave headroom.
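A possible shape for the exit-predicate fix, sketched in Python over a retire trace rather than in the testbench: instead of hard-coding one return PC, close each invocation when the retired PC equals the $ra captured at that invocation's entry. This is an alternative to the two options listed above, not something the current TB does. The trace tuples, field names, and the 0x1234 value for the $a0=0x07 arm are made up for illustration (that arm's real $v0 was not captured), and nested re-entry into the helper is not handled.

```python
HELPER_LO = 0xBFC4D370  # helper entry PC from this note

def track_invocations(retires):
    """retires: (pc, ra, v0) tuples in retire order, registers as seen at that retire."""
    done, open_call = [], None
    for pc, ra, v0 in retires:
        if open_call is None:
            if pc == HELPER_LO:
                open_call = {"ra_in": ra}      # snapshot $ra at entry
        elif pc == open_call["ra_in"]:         # first retire at the caller's return PC
            open_call["v0_post"] = v0
            done.append(open_call)
            open_call = None
    return done

trace = [
    (0xBFC4D370, 0xBFC52998, 0x0),         # entry from the Ch264 callee
    (0xBFC4F320, 0xBFC4D388, 0x0),         # inside the deeper call; $ra now points into helper
    (0xBFC4D390, 0xBFC52998, 0xA000A8C8),  # jr $ra after the reload
    (0xBFC52998, 0xBFC52998, 0xA000A8C8),  # back in the Ch264 callee
    (0xBFC4D370, 0xBFC52368, 0x0),         # entry from the Ch217 trampoline
    (0xBFC4D390, 0xBFC52368, 0x1234),      # placeholder $v0, not a measured value
    (0xBFC52368, 0xBFC52368, 0x1234),      # back in the trampoline
]
for inv in track_invocations(trace):
    print(hex(inv["ra_in"]), hex(inv["v0_post"]))
```

Keying on $ra_in avoids one pitfall of the "any retire outside the window" variant: as literally stated, that variant would presumably also fire when the helper calls out to 0xBFC4F320, since that PC is outside the window and reached from inside.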

Files changed

  • sim/tb/integration/tb_ee_core_bios_smoke.sv: added an `ifdef CH265_HELPER_AUTOPSY block. New structure: data-read capture (mirror of Ch264), $a0/$ra/$v0/$v1 per-invocation snapshots, control-flow capture with peek_instr-driven opcode decode and J/JAL-target computation, a region-name task, a ch265_cf_mnemonic function for prettier prints, and full 5-way verdict logic. Two ch265_print_autopsy() call sites (halt + timeout exits), both gated by the ifdef.
  • sim/Makefile — new tb_ee_core_bios_long_helper_autopsy target (only -DCH265_HELPER_AUTOPSY).

iverilog 12 gotchas hit (and avoided)

  1. Bit-select on parenthesized function-result expression. First version had { (pc + 32'd4)[31:28], instr[25:0], 2'b00 } inside ch265_jtarget. Elaborated as "Malformed statement." Fix: compute dslot = pc + 32'd4 into a temp, then bit-select dslot[31:28]. (Already documented in project-self-driven-milestone — bit-select on function return; same shape.)
  2. Wrong identifier names for trace_pkg constants. First version used bare EV_READ / SUBSYS_MEM / ee_map_ev_kind. The right names are trace_pkg::EV_READ / trace_pkg::SUBSYS_MEM / ee_map_ev_event. Easy to confirm by grepping existing Ch218 and Ch264 capture code.

Regression

Full regression: 157 / 157 with the new target off by default (CH265_HELPER_AUTOPSY undefined for routine builds).

Standing by for Codex's Ch266 call. Recommendation: recurse to 0xBFC4F320. Same observer infrastructure; bump the parameter.