retroDE_ps2/docs/ch264_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch264 closeout — callee body is a one-call thunk; the real polled state lives one frame deeper

Status: Closed. New opt-in target tb_ee_core_bios_long_callee_autopsy runs the BIOS-long flow with a narrow observer scoped to the longjmp-return callee body at 0xBFC52984..0xBFC52A04, capturing every non-fetch data read in that PC range with the EE map's actual returned data (not the hardcoded-zero ev_arg1) and the region classifier (ev_arg3).

Verdict literal: callee_reads_vary_but_flow_static. Structural verdict (deeper read of the trace): callee_body_is_pure_thunk_to_0xBFC4D370 — the callee's only non-fetch memory read is its own saved $ra on the stack; all "real work" lives in the JAL at 0xBFC52990 → 0xBFC4D370 with constant $a0=0x0F.

Codex Ch264 acceptance — line-by-line

| Codex requirement | Status | Where |
|---|---|---|
| Pick candidate (C): scope observer to callee body | | CH264_CALLEE_LO/HI = 0xBFC52984/A04 |
| Sample EE-map RETURNED data (not ev_arg1=0) | | ch264_data[i] <= ee_rd_data (Ch258 gotcha avoided) |
| Tag each read with region classifier | | ch264_region[i] <= ev_arg3[7:0] + ch264_region_name task |
| Capture >= 2 passes | | 9 captures across passes 0..8 (covers all 8 Ch217 passes plus pass-0 priming) |
| Report ordered transaction stream | | [ch264] [i] pass=N pc=0x... ea=0x... data=0x... region=... |
| Build dedup table (hits / pass-mask / data-varies / region) | | TOP_DISTINCT_EAs block |
| Emit 4-way verdict | | callee_no_data_reads / _static_ram_gate_found / _static_mmio_gate_found / _reads_vary_but_flow_static |
| Routine regression unchanged with target off-by-default | | Whole block is under the `ifdef CH264_CALLEE_AUTOPSY guard |
| Full regression green | | 157 / 157 |
| No RTL touched | | TB-only addition; one ifdef block + 2 print sites + new make target |

What the autopsy actually showed

Stream

[ch264]   [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264]   [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM

Dedup

TOP_DISTINCT_EAs (count=1)
  ea=0x801ffdfc  hits=9  passes=0x000001ff  data=0xbfc521f4  data_varies=1  region=EE_RAM

Exactly one EA is read from the callee body across all 9 passes (0..8): 0x801FFDFC, in EE_RAM. That's it. No MMIO. No kernel global. No timer. No INTC. The callee body has zero data-loads outside of one stack reload.
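The dedup step can be reproduced offline from the printed stream. The following is a minimal sketch in Python, not the testbench's own SystemVerilog; the names `LINE_RE` and `dedup` are hypothetical, and the row layout simply mirrors the fields printed in the TOP_DISTINCT_EAs block.

```python
import re
from collections import OrderedDict

# The nine [ch264] lines exactly as printed in the stream above.
STREAM = """\
[ch264]   [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264]   [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264]   [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
"""

LINE_RE = re.compile(
    r"\[ch264\]\s+\[(\d+)\] pass=(\d+) pc=(0x[0-9a-f]+) "
    r"ea=(0x[0-9a-f]+) data=(0x[0-9a-f]+) region=(\w+)"
)

def dedup(stream):
    """Group reads by EA: hit count, pass bitmask, first data, data_varies."""
    table = OrderedDict()
    for line in stream.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        _idx, pass_idx, _pc, ea, data, region = m.groups()
        ea, data = int(ea, 16), int(data, 16)
        row = table.setdefault(ea, {
            "hits": 0, "passes": 0, "data": data,
            "data_varies": False, "region": region,
        })
        row["hits"] += 1
        row["passes"] |= 1 << int(pass_idx)   # one bit per pass index
        if data != row["data"]:               # compare against first-seen data
            row["data_varies"] = True
    return table

for ea, row in dedup(STREAM).items():
    print(f"ea=0x{ea:08x}  hits={row['hits']}  passes=0x{row['passes']:08x}  "
          f"data=0x{row['data']:08x}  data_varies={int(row['data_varies'])}  "
          f"region={row['region']}")
```

Run against the stream above, this yields a single row matching the TOP_DISTINCT_EAs block: nine hits, pass mask 0x1ff, first-seen data 0xbfc521f4, and data_varies set.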

What 0x801FFDFC actually is

Cross-referencing the Ch217 dump:

0xbfc52984: 0x27bdffe8  addiu $sp,$sp,-24      <- prologue
0xbfc52988: 0xafbf0014  sw    $ra,0x14($sp)    <- save $ra at $sp+0x14
0xbfc5298c: 0xafa40018  sw    $a0,0x18($sp)
0xbfc52990: 0x0ff134dc  jal   0xbfc4d370       <- call helper
0xbfc52994: 0x2404000f  addiu $a0,$zero,0x0f   <- delay slot: $a0=0x0F
0xbfc52998: 0x8fbf0014  lw    $ra,0x14($sp)    <- restore $ra  *** THE READ ***
0xbfc5299c: 0x27bd0018  addiu $sp,$sp,0x18
0xbfc529a0: 0x03e00008  jr    $ra
0xbfc529a4: 0x00000000  nop

0x801FFDFC = $sp + 0x14 at the moment of the lw, which puts $sp at 0x801FFDE8. The callee body's one and only non-fetch read is its own saved return address on the stack. pass=0 returned the priming value 0xBFC521F4 (the caller chain from the first arrival into this function; note that the stream shows this read issued from pc 0xBFC529F0 rather than 0xBFC52998). pass=1..8 returned 0xBFC52360, which is exactly $ra_pre in the Ch217 caller table, i.e. the treadmill's stable saved $ra from the longjmp restore.
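The address arithmetic can be checked directly from the instruction words in the dump. This is a small Python sketch using standard MIPS I-type and J-type field layouts; the helper names (`sign16`, `lw_offset`, `jal_target`) are made up for illustration, and the entry-time $sp is derived only under the assumption that nothing besides the listed prologue adjusts $sp before the lw.

```python
def sign16(x):
    """Sign-extend a 16-bit immediate."""
    return x - 0x10000 if x & 0x8000 else x

def lw_offset(word):
    """Return the signed offset of an lw instruction (opcode 0x23)."""
    assert word >> 26 == 0x23, "not an lw"
    return sign16(word & 0xFFFF)

def jal_target(pc, word):
    """Return the absolute target of a jal at address pc (opcode 0x03)."""
    assert word >> 26 == 0x03, "not a jal"
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

EA = 0x801FFDFC                       # the one EA in the dedup table
sp_at_lw = EA - lw_offset(0x8FBF0014) # lw $ra,0x14($sp) at 0xbfc52998
# Assumption: only the prologue's addiu $sp,$sp,-24 has moved $sp so far.
sp_on_entry = sp_at_lw - sign16(0x27BDFFE8 & 0xFFFF)

print(hex(sp_at_lw))                            # 0x801ffde8
print(hex(sp_on_entry))                         # 0x801ffe00
print(hex(jal_target(0xBFC52990, 0x0FF134DC)))  # 0xbfc4d370
```

The decoded JAL target agrees with the 0xBFC4D370 helper address used throughout this closeout.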

The "data varies" flag is set, but it varies between exactly two values: the pre-treadmill $ra and the in-treadmill $ra. It isn't a polled-state oscillation — it's the trace catching the priming pass before the system settles into the steady-state loop.

Pass index zero-vs-one quirk

ch217_count starts at 0 and is incremented after the pass sample is recorded. The Ch264 capture uses ch217_count directly as ch264_pass_idx, so pass=0 in the Ch264 stream corresponds to "before the first Ch217 pass was recorded" — i.e. the callee was entered once during the initial reset/init flow, then re-entered 8 more times once the Ch217 treadmill latched. This explains why there are 9 captures even though Ch217 reports 8 caller passes.
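The off-by-one is easy to see in a toy model. The sketch below is Python, not the testbench code, and the event ordering (one callee entry during init, then one Ch217 sample followed by one callee entry per treadmill pass) is an assumption taken from the description above; `simulate` is a hypothetical name.

```python
def simulate(num_ch217_passes):
    """Tag each callee entry with the current ch217_count, as Ch264 does."""
    ch217_count = 0
    callee_tags = []
    # Initial reset/init flow: the callee is entered before any Ch217 sample.
    callee_tags.append(ch217_count)
    for _ in range(num_ch217_passes):
        # Ch217 records its sample, then increments the counter.
        ch217_count += 1
        # The callee is re-entered and tagged with the post-increment count.
        callee_tags.append(ch217_count)
    return ch217_count, callee_tags

count, tags = simulate(8)
print(count, tags)  # 8 [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Eight Ch217 passes therefore produce nine Ch264 captures tagged 0..8, which matches the passes=0x000001ff mask.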

The structural finding

The longjmp-return callee at 0xBFC52984 is a one-line thunk:
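    #include <stdint.h>
    extern uint32_t helper(int sel);  /* body at 0xBFC4D370, not yet examined */
    uint32_t callee(int x) {          /* $a0 = 2 from the outer caller; spilled, not reloaded in the listed body */
        (void)x;
        return helper(0x0F);          /* JAL 0xBFC4D370, $a0=0x0F; $v0 passes through untouched */
    }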
The callee returns whatever helper(0x0F) returns:
    $v0_post = 0xa000a8c8  (identical every pass — Ch217 caller table)

The polled gate is NOT in 0xBFC52984..0xBFC52A04. Every non-fetch memory read in that PC range is just the stack reload of $ra. The thing the Ch215 treadmill is actually waiting on must be one of:

  1. Inside 0xBFC4D370 — the helper called with $a0=0x0F. Returns 0xA000A8C8 every pass. If it polls anything, it's one frame deeper than the autopsy currently sees.
  2. A side-effect of 0xBFC4D370 that nothing in this scope observes — e.g. a write into kernel memory the longjmp restore later reads. (Unlikely: Ch263 ruled out the scrubbed range, and the outer caller's $v0/$v1 reads are identical.)
  3. Outside the callee chain entirely — the BIOS poll-and-jump pattern is reading something that the longjmp keeps re-restoring, so neither the callee nor its helper actually poll.

By inspection of the BIOS instruction at 0xBFC52990 → 0xBFC4D370 with $a0=0x0F, the function is very likely one of:

  • _GetCop0 / _SetCop0 (selector 0x0F) — these are well-known PS2 BIOS syscall helpers in the _SyscallHandler block;
  • A ConfigSet/GetGsHParam-style accessor;
  • A _CdInit / _SifCmdInit style init that consumes a kernel-global.

Confirming this requires looking at 0xBFC4D370's own body — which is Ch265's job.

The structural map after Ch264:

| Layer | What's there | Reads anything? |
|---|---|---|
| 0xBFC52340..60 (Ch217 trampoline) | beq + nops + JAL | No data reads |
| 0xBFC52984..A04 (Ch264 callee body) | save/restore $ra + one JAL to helper | Only $sp+0x14 (own $ra) |
| 0xBFC4D370..? (helper, Ch265 target) | unknown | TO BE DETERMINED |

The Ch263 finding (BIOS scrubs 0x80030000-3FF0 every pass) plus the Ch264 finding (callee body has no polled reads) together narrow the search dramatically: whatever the BIOS gate is reading to compute its identical $v0=0xa000a8c8 every pass, the read happens inside 0xBFC4D370 or below, and the gate state (if it lives in EE RAM) lives in a region NOT covered by the 0x80030000-3FF0 scrub.

Recommendation for Ch265

Re-aim the autopsy at the next frame.

The Ch264 observer infrastructure is reusable — bump the PC window. The helper 0xBFC4D370 itself starts with addiu $sp,$sp,-NN; sw $ra,...; ... (standard MIPS prologue), so its extent can be bounded by walking the BIOS dump to the next jr $ra; addiu $sp,$sp,NN or by reading the prologue/epilogue delta directly. A first cut: 0xBFC4D370..0xBFC4D470 (256 bytes = 64 instructions, generous upper bound).
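Walking the dump to the first return can be sketched as follows. This is an illustrative Python helper, not existing tooling; `function_extent` is a hypothetical name, and the input is assumed to be a list of 32-bit instruction words starting at the function entry. The demo uses the Ch264 callee words quoted earlier, since the helper's own words are not yet known.

```python
JR_RA = 0x03E00008  # encoding of "jr $ra"

def function_extent(words, base):
    """Return (start, end_exclusive) up to the first jr $ra plus its delay slot.

    Returns None if no jr $ra is found in the supplied words.
    """
    for i, w in enumerate(words):
        if w == JR_RA:
            # +1 for the jr itself, +1 for the delay-slot instruction.
            return base, base + 4 * (i + 2)
    return None

# Ch264 callee words from the Ch217 dump, starting at 0xBFC52984.
CALLEE_WORDS = [
    0x27BDFFE8, 0xAFBF0014, 0xAFA40018, 0x0FF134DC, 0x2404000F,
    0x8FBF0014, 0x27BD0018, 0x03E00008, 0x00000000,
]

start, end = function_extent(CALLEE_WORDS, 0xBFC52984)
print(hex(start), hex(end))  # 0xbfc52984 0xbfc529a8
```

The first jr $ra only gives a lower bound: the Ch264 window itself runs to 0xBFC52A04, past this first return, and the pass-0 read came from 0xBFC529F0. A generous fixed window such as 0xBFC4D370..0xBFC4D470 remains the safer first cut.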

The verdict logic can stay the same. The expected outcomes are identical to Ch264:

  • callee_no_data_reads → helper computes from registers only. In that case Ch266 has to look at what populates those registers ($a0=0x0F is set by the caller; what about other inputs?).
  • callee_static_mmio_gate_found → HIT. That's the polled device, and Ch266 models it.
  • callee_static_ram_gate_found → HIT. Some EE RAM location outside the scrubbed range is being read every pass; Ch266 models what writes there.
  • callee_reads_vary_but_flow_static → another thunk-layer. Recurse: Ch266 autopsies whatever JAL the helper makes.
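For reference, one plausible reading of the 4-way decision is sketched below in Python. The actual rule lives in ch264_print_autopsy and is not reproduced in this document, so the precedence (MMIO before RAM), the use of data_varies as the "static" test, and the `MMIO_REGIONS` set are all assumptions; only the region name EE_RAM and the four verdict strings come from the autopsy output.

```python
# Hypothetical region names for MMIO; only "EE_RAM" appears in the Ch264 trace.
MMIO_REGIONS = {"MMIO"}

def verdict(rows):
    """Classify a dedup table (list of dicts with 'region' and 'data_varies')."""
    if not rows:
        return "callee_no_data_reads"
    # A "static" read returns the same data on every pass it was seen in.
    static = [r for r in rows if not r["data_varies"]]
    if any(r["region"] in MMIO_REGIONS for r in static):
        return "callee_static_mmio_gate_found"
    if any(r["region"] == "EE_RAM" for r in static):
        return "callee_static_ram_gate_found"
    return "callee_reads_vary_but_flow_static"

ch264_rows = [{"region": "EE_RAM", "data_varies": True}]
print(verdict(ch264_rows))  # callee_reads_vary_but_flow_static
```

Under this assumed rule, a stack reload that happens to return the same value on every captured pass would be reported as a RAM gate, so any static-RAM hit in Ch265 is worth checking against the helper's prologue before treating it as polled state.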

Files changed

  • sim/tb/integration/tb_ee_core_bios_smoke.sv: added an `ifdef CH264_CALLEE_AUTOPSY block (capture arrays, combinational predicates, always_ff capture, region-name task, ch264_print_autopsy task with verdict logic). Added two ch264_print_autopsy() call sites (halt path + timeout path), each gated by the same ifdef.
  • sim/Makefile — new tb_ee_core_bios_long_callee_autopsy target (-DCH264_CALLEE_AUTOPSY only — no Ch262/Ch263 needed for this observer).

iverilog 12 gotcha avoided

The first compile attempt used return; to early-exit the n == 0 case in ch264_print_autopsy. iverilog 12 rejects return inside task. Rewrote as if (n==0) ... else begin ...full body... end. Same logic, no early return. Worth a note because future autopsy-style tasks will probably hit this again.

Regression

Full regression: 157 / 157 with the new target off by default (CH264_CALLEE_AUTOPSY undefined for routine builds).

Standing by for Codex's Ch265 call. Recommendation: aim the existing observer at 0xBFC4D370 and recompile. No new RTL, no new TB scaffolding — just a parameter bump.