# Ch264 closeout: callee body is a one-call thunk; the real polled state lives one frame deeper

**Status:** Closed. New opt-in target `tb_ee_core_bios_long_callee_autopsy` runs the BIOS-long flow with a narrow observer scoped to the longjmp-return callee body at `0xBFC52984..0xBFC52A04`, capturing every non-fetch data read in that PC range with the EE map's actual returned data (not the hardcoded-zero `ev_arg1`) and the region classifier (`ev_arg3`).

**Verdict literal:** `callee_reads_vary_but_flow_static`.

**Structural verdict (deeper read of the trace):** `callee_body_is_pure_thunk_to_0xBFC4D370`. The callee's only non-fetch memory read is its own saved `$ra` on the stack; all "real work" lives in the JAL at `0xBFC52990 → 0xBFC4D370` with constant `$a0=0x0F`.

## Codex Ch264 acceptance: line-by-line

| Codex requirement | Status | Where |
|-------------------|--------|-------|
| Pick candidate (C): scope observer to callee body | ✅ | `CH264_CALLEE_LO/HI` = `0xBFC52984/A04` |
| Sample EE-map RETURNED data (not `ev_arg1=0`) | ✅ | `ch264_data[i] <= ee_rd_data` (Ch258 gotcha avoided) |
| Tag each read with region classifier | ✅ | `ch264_region[i] <= ev_arg3[7:0]` + `ch264_region_name` task |
| Capture >= 2 passes | ✅ | 9 captures across passes 0..8 (covers all 8 Ch217 passes plus pass-0 priming) |
| Report ordered transaction stream | ✅ | `[ch264] [i] pass=N pc=0x... ea=0x... data=0x... region=...` |
| Build dedup table (hits / pass-mask / data-varies / region) | ✅ | `TOP_DISTINCT_EAs` block |
| Emit 4-way verdict | ✅ | `callee_no_data_reads` / `_static_ram_gate_found` / `_static_mmio_gate_found` / `_reads_vary_but_flow_static` |
| Routine regression unchanged with target off-by-default | ✅ | Whole block is under `` `ifdef CH264_CALLEE_AUTOPSY `` |
| Full regression green | ✅ | 157 / 157 |
| No RTL touched | ✅ | TB-only addition; one ifdef block + 2 print sites + new make target |

## What the autopsy actually showed

### Stream

```
[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264] [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
```

### Dedup

```
TOP_DISTINCT_EAs (count=1)
  ea=0x801ffdfc hits=9 passes=0x000001ff data=0xbfc521f4 data_varies=1 region=EE_RAM
```

**Exactly one EA is read from the callee body across all 9 passes (0..8): `0x801FFDFC`, in EE_RAM.** That's it. No MMIO. No kernel global. No timer. No INTC. The callee body has zero data-loads outside of one stack reload.
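The dedup columns (hits, pass-mask, data-varies, region) can be recomputed offline from the `[ch264]` stream as a sanity check. The following is a minimal Python sketch, not the testbench's own logic: the names `LINE_RE`, `STREAM`, and `dedup` are hypothetical, and it assumes the `data=` column of the dedup row reports the first value seen for that EA, which is what the printed row suggests.

```python
import re

# Rebuild the nine stream lines quoted above (pass 0 differs in pc and data).
STREAM = ["[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM"]
STREAM += [
    f"[ch264] [{i}] pass={i} pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM"
    for i in range(1, 9)
]

# Hypothetical parser for the stream format shown in this closeout.
LINE_RE = re.compile(
    r"\[ch264\] \[(\d+)\] pass=(\d+) pc=0x([0-9a-f]+) "
    r"ea=0x([0-9a-f]+) data=0x([0-9a-f]+) region=(\w+)"
)

def dedup(lines):
    """Group reads by effective address and accumulate per-EA statistics."""
    table = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # ignore anything that is not a [ch264] stream line
        pass_idx = int(m.group(2))
        ea = int(m.group(4), 16)
        data = int(m.group(5), 16)
        row = table.setdefault(
            ea,
            {"hits": 0, "passes": 0, "data": data,  # first value seen (assumption)
             "data_varies": False, "region": m.group(6)},
        )
        row["hits"] += 1
        row["passes"] |= 1 << pass_idx   # one bit per pass index
        if data != row["data"]:
            row["data_varies"] = True
    return table

if __name__ == "__main__":
    table = dedup(STREAM)
    print(f"TOP_DISTINCT_EAs (count={len(table)})")
    for ea, r in table.items():
        print(f"  ea=0x{ea:08x} hits={r['hits']} passes=0x{r['passes']:08x} "
              f"data=0x{r['data']:08x} data_varies={int(r['data_varies'])} "
              f"region={r['region']}")
```

Passes 0..8 set bits 0 through 8 of the mask, which is where `0x000001ff` comes from.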
### What `0x801FFDFC` actually is

Cross-referencing the Ch217 dump:

```
0xbfc52984: 0x27bdffe8  addiu $sp,$sp,-24      <- prologue
0xbfc52988: 0xafbf0014  sw    $ra,0x14($sp)    <- save $ra at $sp+0x14
0xbfc5298c: 0xafa40018  sw    $a0,0x18($sp)
0xbfc52990: 0x0ff134dc  jal   0xbfc4d370       <- call helper
0xbfc52994: 0x2404000f  addiu $a0,$zero,0x0f   <- delay slot: $a0=0x0F
0xbfc52998: 0x8fbf0014  lw    $ra,0x14($sp)    <- restore $ra *** THE READ ***
0xbfc5299c: 0x27bd0018  addiu $sp,$sp,0x18
0xbfc529a0: 0x03e00008  jr    $ra
0xbfc529a4: 0x00000000  nop
```

`0x801FFDFC = $sp + 0x14` at the moment of the `lw`. **The callee body's one and only non-fetch read is its own saved return address on the stack.** `pass=0` returned the priming value `0xBFC521F4` (the caller chain from the first arrival into this function); `pass=1..8` returned `0xBFC52360`, which is exactly `$ra_pre` in the Ch217 caller table, i.e. the treadmill's stable saved `$ra` from the longjmp restore. Note that the stream shows the pass-0 sample at `pc=0xbfc529f0` rather than at the `lw` at `0xbfc52998`, so the priming read hits the same stack slot from a different instruction inside the observed window.

The "data varies" flag is set, but the data varies between exactly two values: the pre-treadmill `$ra` and the in-treadmill `$ra`. It isn't a polled-state oscillation; it's the trace catching the priming pass before the system settles into the steady-state loop.

### Pass index zero-vs-one quirk

`ch217_count` starts at 0 and is incremented after the pass sample is recorded. The Ch264 capture uses `ch217_count` directly as `ch264_pass_idx`, so pass=0 in the Ch264 stream corresponds to "before the first Ch217 pass was recorded": the callee was entered once during the initial reset/init flow, then re-entered 8 more times once the Ch217 treadmill latched. This explains why there are 9 captures even though Ch217 reports 8 caller passes.
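Going back to the `0x801FFDFC` cross-reference, the JAL target and the stack-slot arithmetic can be checked mechanically from the instruction words quoted in the dump. This is a small Python sketch using the standard MIPS J-type and I-type field layouts; the helper names `jal_target`, `decode_itype`, and `sign16` are hypothetical and exist only for this illustration.

```python
def sign16(value):
    """Sign-extend a 16-bit immediate."""
    return value - 0x10000 if value & 0x8000 else value

def jal_target(pc, word):
    """Target of a JAL at `pc`: upper 4 bits of the delay-slot address
    combined with the 26-bit index shifted left by 2."""
    assert word >> 26 == 0x03, "not a JAL"
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

def decode_itype(word):
    """Split an I-type word into opcode, rs, rt and signed immediate."""
    return {
        "op": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": sign16(word & 0xFFFF),
    }

SP, RA = 29, 31  # MIPS register numbers for $sp and $ra

if __name__ == "__main__":
    # jal at 0xbfc52990, word 0x0ff134dc
    print(f"jal target = 0x{jal_target(0xBFC52990, 0x0FF134DC):08x}")

    # lw $ra,0x14($sp) at 0xbfc52998, observed EA 0x801ffdfc
    lw = decode_itype(0x8FBF0014)
    assert (lw["rs"], lw["rt"]) == (SP, RA)
    sp_at_lw = 0x801FFDFC - lw["imm"]
    print(f"$sp at the lw = 0x{sp_at_lw:08x}")

    # prologue addiu $sp,$sp,-24
    print(f"frame adjust = {decode_itype(0x27BDFFE8)['imm']}")
```

Running it gives a JAL target of `0xbfc4d370` and `$sp = 0x801ffde8` at the reload, consistent with the annotations in the dump.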
## The structural finding

```
The longjmp-return callee at 0xBFC52984 is a one-line thunk:

    void callee(int x) {        /* $a0 = 2 from the outer caller */
        helper(0x0F);           /* JAL 0xBFC4D370, $a0=0x0F */
        return;
    }

The callee returns whatever helper(0x0F) returns:
    $v0_post = 0xa000a8c8   (identical every pass; Ch217 caller table)
```

**The polled gate is NOT in `0xBFC52984..0xBFC52A04`.** Every non-fetch memory read in that PC range is just the stack reload of `$ra`. The thing the Ch215 treadmill is actually waiting on must be one of:

1. **Inside `0xBFC4D370`**: the helper called with `$a0=0x0F`. Returns `0xA000A8C8` every pass. If it polls anything, it's one frame deeper than the autopsy currently sees.
2. **A side-effect of `0xBFC4D370`** that nothing in this scope observes, e.g. a write into kernel memory the longjmp restore later reads. (Unlikely: Ch263 ruled out the scrubbed range, and the outer caller's `$v0/$v1` reads are identical.)
3. **Outside the callee chain entirely**: the BIOS poll-and-jump pattern is reading something that the longjmp keeps re-restoring, so neither the callee nor its helper actually poll.

By inspection of the BIOS call at `0xBFC52990` → `0xBFC4D370` with `$a0=0x0F`, the function is *very likely* one of:

- `_GetCop0` / `_SetCop0` (selector 0x0F): these are well-known PS2 BIOS syscall helpers in the `_SyscallHandler` block;
- A `ConfigSet`/`GetGsHParam`-style accessor;
- A `_CdInit` / `_SifCmdInit` style init that consumes a kernel-global.

Confirming this requires looking at `0xBFC4D370`'s own body, which is Ch265's job.

## Where this leaves the search

The structural map after Ch264:

| Layer | What's there | Reads anything? |
|-------|--------------|-----------------|
| `0xBFC52340..60` (Ch217 trampoline) | beq + nops + JAL | No data reads |
| `0xBFC52984..A04` (Ch264 callee body) | save/restore `$ra` + one JAL to helper | Only `$sp+0x14` (own `$ra`) |
| `0xBFC4D370..?` (helper, Ch265 target) | unknown | **TO BE DETERMINED** |

The Ch263 finding (BIOS scrubs `0x80030000-3FF0` every pass) plus the Ch264 finding (callee body has no polled reads) together narrow the search dramatically: whatever the BIOS gate is reading to compute its identical `$v0=0xa000a8c8` every pass, **the read happens inside `0xBFC4D370` or below**, and the gate state (if it lives in EE RAM) lives in a region NOT covered by the `0x80030000-3FF0` scrub.

## Recommendation for Ch265

**Re-aim the autopsy at the next frame.** The Ch264 observer infrastructure is reusable: bump the PC window. The helper at `0xBFC4D370` has not been disassembled yet, but it presumably starts with `addiu $sp,$sp,-NN; sw $ra,...; ...` (the standard MIPS prologue), so its extent can be bounded by walking the BIOS dump to the next `jr $ra; addiu $sp,$sp,NN` or by reading the prologue/epilogue delta directly. A first cut: `0xBFC4D370..0xBFC4D470` (256 bytes = 64 instructions, a generous upper bound).

The verdict logic can stay the same. The expected outcomes are identical to Ch264:

- `callee_no_data_reads` → helper computes from registers only. In that case Ch266 has to look at what populates those registers (`$a0=0x0F` is set by the caller; what about other inputs?).
- `callee_static_mmio_gate_found` → **HIT.** That's the polled device, and Ch266 models it.
- `callee_static_ram_gate_found` → **HIT.** Some EE RAM location outside the scrubbed range is being read every pass; Ch266 models what writes there.
- `callee_reads_vary_but_flow_static` → another thunk-layer. Recurse: Ch266 autopsies whatever JAL the helper makes.
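The extent-bounding step suggested for the helper (walk to the next `jr $ra` and compare the prologue and epilogue stack adjustments) can be sketched in Python. Since the helper's words are not quoted anywhere in this closeout, the sketch is exercised on the Ch264 callee words from the Ch217 dump instead. The function names `sp_adjust` and `bound_function` are hypothetical, and the approach assumes the first `jr $ra` marks the end of the function, which would be wrong for a function with an early return.

```python
JR_RA = 0x03E00008  # encoding of `jr $ra`

# Ch264 callee words as quoted in the Ch217 dump (0xbfc52984..0xbfc529a4).
CALLEE_BASE = 0xBFC52984
CALLEE_WORDS = [
    0x27BDFFE8, 0xAFBF0014, 0xAFA40018, 0x0FF134DC, 0x2404000F,
    0x8FBF0014, 0x27BD0018, 0x03E00008, 0x00000000,
]

def sp_adjust(word):
    """Return the signed immediate of `addiu $sp,$sp,imm`, else None."""
    if (word & 0xFFFF0000) != 0x27BD0000:
        return None
    imm = word & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def bound_function(base, words):
    """Bound a function by its first `jr $ra` (assumption: no early return).

    Returns the prologue frame adjust, the address of the jr's delay slot,
    and whether an epilogue `addiu $sp` next to the jr undoes the prologue.
    """
    frame = sp_adjust(words[0])
    for i, word in enumerate(words):
        if word != JR_RA:
            continue
        # The stack pop may sit just before the jr or in its delay slot.
        neighbours = [words[j] for j in (i - 1, i + 1) if 0 <= j < len(words)]
        pops = [sp_adjust(w) for w in neighbours if sp_adjust(w) is not None]
        return {
            "frame": frame,
            "end": base + 4 * (i + 1),  # delay slot is the last instruction
            "balanced": frame is not None and -frame in pops,
        }
    return None  # no jr $ra inside the window

if __name__ == "__main__":
    info = bound_function(CALLEE_BASE, CALLEE_WORDS)
    print(f"frame={info['frame']} end=0x{info['end']:08x} balanced={info['balanced']}")
    # Size of the proposed first-cut window for the helper, in instructions.
    print((0xBFC4D470 - 0xBFC4D370) // 4)
```

On the callee words this reports a 24-byte frame ending at `0xbfc529a4` with a matching epilogue, and the proposed helper window works out to 64 instructions.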
## Files changed

- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added `` `ifdef CH264_CALLEE_AUTOPSY `` block (capture arrays, combinational predicates, `always_ff` capture, region-name task, `ch264_print_autopsy` task with verdict logic). Added two `ch264_print_autopsy()` call sites (halt path + timeout path), each gated by the same ifdef.
- `sim/Makefile`: new `tb_ee_core_bios_long_callee_autopsy` target (`-DCH264_CALLEE_AUTOPSY` only; no Ch262/Ch263 needed for this observer).

## iverilog 12 gotcha avoided

The first compile attempt used `return;` to early-exit the `n == 0` case in `ch264_print_autopsy`. iverilog 12 rejects `return` inside `task`. Rewrote as `if (n==0) ... else begin ...full body... end`. Same logic, no early return. Worth a note because future autopsy-style tasks will probably hit this again.

## Regression

Full regression: 157 / 157 with the new target off by default (`CH264_CALLEE_AUTOPSY` undefined for routine builds).

Standing by for Codex's Ch265 call. Recommendation: aim the existing observer at `0xBFC4D370` and recompile. No new RTL, no new TB scaffolding, just a parameter bump.