Ch264 closeout — callee body is a one-call thunk; the real polled state lives one frame deeper
Status: Closed. New opt-in target
tb_ee_core_bios_long_callee_autopsy runs the BIOS-long flow with a
narrow observer scoped to the longjmp-return callee body at
0xBFC52984..0xBFC52A04, capturing every non-fetch data read in
that PC range with the EE map's actual returned data (not the
hardcoded-zero ev_arg1) and the region classifier (ev_arg3).
Verdict literal: callee_reads_vary_but_flow_static.
Structural verdict (deeper read of the trace):
callee_body_is_pure_thunk_to_0xBFC4D370 — the callee's only
non-fetch memory read is its own saved $ra on the stack; all
"real work" lives in the JAL at 0xBFC52990 → 0xBFC4D370 with
constant $a0=0x0F.
Codex Ch264 acceptance — line-by-line
| Codex requirement | Status | Where |
|---|---|---|
| Pick candidate (C): scope observer to callee body | ✅ | CH264_CALLEE_LO/HI = 0xBFC52984/A04 |
| Sample EE-map RETURNED data (not ev_arg1=0) | ✅ | ch264_data[i] <= ee_rd_data (Ch258 gotcha avoided) |
| Tag each read with region classifier | ✅ | ch264_region[i] <= ev_arg3[7:0] + ch264_region_name task |
| Capture >= 2 passes | ✅ | 9 captures across passes 0..8 (covers all 8 Ch217 passes plus pass-0 priming) |
| Report ordered transaction stream | ✅ | [ch264] [i] pass=N pc=0x... ea=0x... data=0x... region=... |
| Build dedup table (hits / pass-mask / data-varies / region) | ✅ | TOP_DISTINCT_EAs block |
| Emit 4-way verdict | ✅ | callee_no_data_reads / _static_ram_gate_found / _static_mmio_gate_found / _reads_vary_but_flow_static |
| Routine regression unchanged with target off-by-default | ✅ | Whole block is under `ifdef CH264_CALLEE_AUTOPSY |
| Full regression green | ✅ | 157 / 157 |
| No RTL touched | ✅ | TB-only addition; one ifdef block + 2 print sites + new make target |
What the autopsy actually showed
Stream
[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264] [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
Dedup
TOP_DISTINCT_EAs (count=1)
ea=0x801ffdfc hits=9 passes=0x000001ff data=0xbfc521f4 data_varies=1 region=EE_RAM
Exactly one EA is read from the callee body across all 9 passes
(0..8): 0x801FFDFC, in EE_RAM. That's it. No MMIO. No kernel
global. No timer. No INTC. The callee body has zero data-loads
outside of one stack reload.
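As a cross-check on the dedup table, the stream above can be re-grouped offline. The following is a minimal sketch, not the testbench's own logic: the regular expression assumes the log line shape shown in the stream, and the names `LINE_RE` and `dedup` are hypothetical.

```python
import re
from collections import OrderedDict

# Assumed line shape, copied from the [ch264] stream in this note.
LINE_RE = re.compile(
    r"\[ch264\] \[(\d+)\] pass=(\d+) pc=0x([0-9a-f]+) "
    r"ea=0x([0-9a-f]+) data=0x([0-9a-f]+) region=(\w+)"
)

def dedup(lines):
    """Group captures by EA: hit count, pass bitmask, first data seen,
    whether the data ever differed from the first value, and region."""
    table = OrderedDict()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # ignore anything that is not a capture line
        _, pass_idx, _, ea, data, region = m.groups()
        ea, data, pass_idx = int(ea, 16), int(data, 16), int(pass_idx)
        row = table.setdefault(ea, {"hits": 0, "passes": 0, "data": data,
                                    "data_varies": False, "region": region})
        row["hits"] += 1
        row["passes"] |= 1 << pass_idx
        if data != row["data"]:
            row["data_varies"] = True
    return table

# The nine captures reported above, rebuilt verbatim.
stream = [
    "[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM"
] + [
    f"[ch264] [{i}] pass={i} pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM"
    for i in range(1, 9)
]

table = dedup(stream)
print(f"TOP_DISTINCT_EAs (count={len(table)})")
for ea, row in table.items():
    print(f"ea=0x{ea:08x} hits={row['hits']} passes=0x{row['passes']:08x} "
          f"data=0x{row['data']:08x} data_varies={int(row['data_varies'])} "
          f"region={row['region']}")
```

Run against the nine captures, this reproduces the single-row table: one EA, nine hits, pass mask 0x1ff, and the varies flag set by the pass-0 value alone.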
What 0x801FFDFC actually is
Cross-referencing the Ch217 dump:
0xbfc52984: 0x27bdffe8 addiu $sp,$sp,-24 <- prologue
0xbfc52988: 0xafbf0014 sw $ra,0x14($sp) <- save $ra at $sp+0x14
0xbfc5298c: 0xafa40018 sw $a0,0x18($sp)
0xbfc52990: 0x0ff134dc jal 0xbfc4d370 <- call helper
0xbfc52994: 0x2404000f addiu $a0,$zero,0x0f <- delay slot: $a0=0x0F
0xbfc52998: 0x8fbf0014 lw $ra,0x14($sp) <- restore $ra *** THE READ ***
0xbfc5299c: 0x27bd0018 addiu $sp,$sp,0x18
0xbfc529a0: 0x03e00008 jr $ra
0xbfc529a4: 0x00000000 nop
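0x801FFDFC = $sp + 0x14 at the moment of the lw. The callee
body's one and only non-fetch read is its own saved return
address on the stack. pass=0 returned the priming value
0xBFC521F4 (the caller chain from the first arrival into this
function), then pass=1..8 returned 0xBFC52360, which is
exactly $ra_pre in the Ch217 caller table, i.e. the
treadmill's stable saved $ra from the longjmp restore.
Note that the pass=0 read was issued from pc=0xBFC529F0 rather
than from the lw at 0xBFC52998, so it comes from a different
instruction inside the observer window, past the jr $ra at
0xBFC529A0; only the eight in-treadmill reads come from the
thunk's own epilogue.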
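The address arithmetic above can be checked directly from the instruction words in the Ch217 dump. This is a small sketch using the standard MIPS J-type and I-type field layouts; the helper names `jal_target` and `imm16` are made up for illustration.

```python
def jal_target(pc, word):
    """MIPS J-type: target = upper 4 bits of (pc + 4), then instr_index << 2."""
    assert word >> 26 == 0x03, "not a JAL"
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

def imm16(word):
    """Sign-extended 16-bit immediate of an I-type instruction."""
    imm = word & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

# Words taken from the disassembly listing in this note.
target = jal_target(0xBFC52990, 0x0FF134DC)   # jal at 0xbfc52990
frame = -imm16(0x27BDFFE8)                    # addiu $sp,$sp,-24
ra_slot = imm16(0x8FBF0014)                   # lw $ra,0x14($sp)
sp_at_lw = 0x801FFDFC - ra_slot               # observed EA minus the offset

print(f"jal target = 0x{target:08X}")
print(f"frame size = {frame} bytes, $ra slot = $sp+0x{ra_slot:X}")
print(f"$sp at lw  = 0x{sp_at_lw:08X}")
```

The decoded JAL target is 0xBFC4D370 and the implied stack pointer at the lw is 0x801FFDE8, consistent with the observed EA being the thunk's own $ra slot.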
The "data varies" flag is set, but it varies between exactly two
values: the pre-treadmill $ra and the in-treadmill $ra. It
isn't a polled-state oscillation — it's the trace catching the
priming pass before the system settles into the steady-state
loop.
Pass index zero-vs-one quirk
ch217_count starts at 0 and is incremented after the pass
sample is recorded. The Ch264 capture uses ch217_count directly
as ch264_pass_idx, so pass=0 in the Ch264 stream corresponds to
"before the first Ch217 pass was recorded" — i.e. the callee was
entered once during the initial reset/init flow, then re-entered
8 more times once the Ch217 treadmill latched. This explains why
there are 9 captures even though Ch217 reports 8 caller passes.
The structural finding
The longjmp-return callee at 0xBFC52984 is a one-line thunk:
int callee(int x) {          /* $a0 = 2 from the outer caller */
    return helper(0x0F);     /* JAL 0xBFC4D370, $a0=0x0F; $v0 passes through untouched */
}
The callee returns whatever helper(0x0F) returns:
$v0_post = 0xa000a8c8 (identical every pass — Ch217 caller table)
The polled gate is NOT in 0xBFC52984..0xBFC52A04. Every
non-fetch memory read in that PC range is just the stack reload
of $ra. The thing the Ch215 treadmill is actually waiting on
must be one of:
- Inside 0xBFC4D370, the helper called with $a0=0x0F. Returns 0xA000A8C8 every pass. If it polls anything, it's one frame deeper than the autopsy currently sees.
- A side-effect of 0xBFC4D370 that nothing in this scope observes, e.g. a write into kernel memory the longjmp restore later reads. (Unlikely: Ch263 ruled out the scrubbed range, and the outer caller's $v0/$v1 reads are identical.)
- Outside the callee chain entirely: the BIOS poll-and-jump pattern is reading something that the longjmp keeps re-restoring, so neither the callee nor its helper actually poll.
Judging only from the call site (the JAL at 0xBFC52990 to
0xBFC4D370 with $a0=0x0F), the function is plausibly one of the
following; none of these is confirmed yet:
- _GetCop0/_SetCop0 (selector 0x0F): PS2 BIOS syscall helpers in the _SyscallHandler block;
- A ConfigSet/GetGsHParam-style accessor;
- A _CdInit/_SifCmdInit style init that consumes a kernel-global.
Confirming this requires looking at 0xBFC4D370's own body —
which is Ch265's job.
Where this leaves the search
The structural map after Ch264:
| Layer | What's there | Reads anything? |
|---|---|---|
| 0xBFC52340..60 (Ch217 trampoline) | beq + nops + JAL | No data reads |
| 0xBFC52984..A04 (Ch264 callee body) | save/restore $ra + one JAL to helper | Only $sp+0x14 (own $ra) |
| 0xBFC4D370..? (helper, Ch265 target) | unknown | TO BE DETERMINED |
The Ch263 finding (BIOS scrubs 0x80030000-3FF0 every pass) plus
the Ch264 finding (callee body has no polled reads) together
narrow the search dramatically: whatever the BIOS gate is reading
to compute its identical $v0=0xa000a8c8 every pass, the
read happens inside 0xBFC4D370 or below, and the gate state
(if it lives in EE RAM) lives in a region NOT covered by the
0x80030000-3FF0 scrub.
Recommendation for Ch265
Re-aim the autopsy at the next frame.
The Ch264 observer infrastructure is reusable — bump the PC
window. The helper 0xBFC4D370 itself starts with addiu $sp,$sp,-NN; sw $ra,...; ... (standard MIPS prologue), so its
extent can be bounded by walking the BIOS dump to the next jr $ra; addiu $sp,$sp,NN or by reading the prologue/epilogue
delta directly. A first cut: 0xBFC4D370..0xBFC4D470 (256 bytes
= 64 instructions, generous upper bound).
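The "walk to the next jr $ra" bound can be sketched as below. The helper's own words are not in this note, so the example runs on the Ch264 callee body from the Ch217 dump instead; `function_end` is a hypothetical name, and stopping at the first jr $ra is an assumption that would cut short a function with multiple returns.

```python
JR_RA = 0x03E00008  # encoding of `jr $ra`

def function_end(base, words, max_bytes=256):
    """Return the address just past the delay slot of the first `jr $ra`
    found within max_bytes of base, or None if the window has none."""
    limit = min(len(words), max_bytes // 4)
    for i in range(limit):
        if words[i] == JR_RA:
            return base + 4 * (i + 2)  # skip the jr itself and its delay slot
    return None

# Callee body words as listed in the Ch217 dump (0xBFC52984 onward).
callee_words = [
    0x27BDFFE8, 0xAFBF0014, 0xAFA40018, 0x0FF134DC, 0x2404000F,
    0x8FBF0014, 0x27BD0018, 0x03E00008, 0x00000000,
]
end = function_end(0xBFC52984, callee_words)
print(f"callee extent: 0xBFC52984..0x{end:08X}")

# First-cut helper window from the text: 64 instructions = 256 bytes.
helper_lo = 0xBFC4D370
helper_hi = helper_lo + 64 * 4
print(f"helper window: 0x{helper_lo:08X}..0x{helper_hi:08X}")
```

On the callee body this gives an extent ending at 0xBFC529A8, well inside the 0xBFC52A04 upper bound used for Ch264, which suggests the same generous-window approach is safe for the helper.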
The verdict logic can stay the same. The expected outcomes are identical to Ch264:
- callee_no_data_reads → helper computes from registers only. In that case Ch266 has to look at what populates those registers ($a0=0x0F is set by the caller; what about other inputs?).
- callee_static_mmio_gate_found → HIT. That's the polled device, and Ch266 models it.
- callee_static_ram_gate_found → HIT. Some EE RAM location outside the scrubbed range is being read every pass; Ch266 models what writes there.
- callee_reads_vary_but_flow_static → another thunk-layer. Recurse: Ch266 autopsies whatever JAL the helper makes.
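The four outcomes above can be sketched as a classifier over the dedup rows. This is an illustrative model only: the precedence (MMIO before RAM), the reading of "static" as "data never varied", and the region label "MMIO" are assumptions, not the logic in ch264_print_autopsy.

```python
def verdict(rows, mmio_regions=frozenset({"MMIO"})):
    """rows: one dict per distinct EA with 'region' and 'data_varies'.
    mmio_regions holds hypothetical classifier names for device space."""
    if not rows:
        return "callee_no_data_reads"
    static = [r for r in rows if not r["data_varies"]]
    if any(r["region"] in mmio_regions for r in static):
        return "callee_static_mmio_gate_found"
    if any(r["region"] == "EE_RAM" for r in static):
        return "callee_static_ram_gate_found"
    # Every read either varied or sat in an unclassified region.
    return "callee_reads_vary_but_flow_static"

# Follow-up per verdict, paraphrasing the list above.
NEXT_STEP = {
    "callee_no_data_reads": "trace what populates the helper's input registers",
    "callee_static_mmio_gate_found": "model the polled device",
    "callee_static_ram_gate_found": "model whatever writes that EE RAM location",
    "callee_reads_vary_but_flow_static": "autopsy the JAL target the helper calls",
}

# The single Ch264 row: EE_RAM, data varied between two values.
ch264 = verdict([{"region": "EE_RAM", "data_varies": True}])
print(ch264, "->", NEXT_STEP[ch264])
```

Fed the Ch264 row, this model lands on the same literal the run reported, callee_reads_vary_but_flow_static.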
Files changed
- sim/tb/integration/tb_ee_core_bios_smoke.sv: added `ifdef CH264_CALLEE_AUTOPSY block (capture arrays, combinational predicates, always_ff capture, region-name task, ch264_print_autopsy task with verdict logic). Added two ch264_print_autopsy() call sites (halt path + timeout path), each gated by the same ifdef.
- sim/Makefile: new tb_ee_core_bios_long_callee_autopsy target (-DCH264_CALLEE_AUTOPSY only; no Ch262/Ch263 needed for this observer).
iverilog 12 gotcha avoided
The first compile attempt used return; to early-exit the
n == 0 case in ch264_print_autopsy. iverilog 12 rejects
return inside task. Rewrote as if (n==0) ... else begin ...full body... end. Same logic, no early return. Worth a note
because future autopsy-style tasks will probably hit this
again.
Regression
Full regression: 157 / 157 with the new target off by default
(CH264_CALLEE_AUTOPSY undefined for routine builds).
Standing by for Codex's Ch265 call. Recommendation: aim the
existing observer at 0xBFC4D370 and recompile. No new RTL,
no new TB scaffolding — just a parameter bump.