retroDE_ps2/docs/ch264_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch264 closeout — callee body is a one-call thunk; the real polled state lives one frame deeper
**Status:** Closed. New opt-in target
`tb_ee_core_bios_long_callee_autopsy` runs the BIOS-long flow with a
narrow observer scoped to the longjmp-return callee body at
`0xBFC52984..0xBFC52A04`. It captures every non-fetch data read in
that PC range, recording the EE map's actual returned data (not the
hardcoded-zero `ev_arg1`) and the region classifier (`ev_arg3`).

**Verdict literal:** `callee_reads_vary_but_flow_static`.

**Structural verdict (deeper read of the trace):**
`callee_body_is_pure_thunk_to_0xBFC4D370`. The callee's only
non-fetch memory read is its own saved `$ra` on the stack; all
"real work" lives in the JAL at `0xBFC52990 → 0xBFC4D370` with
constant `$a0=0x0F`.
## Codex Ch264 acceptance — line-by-line
| Codex requirement | Status | Where |
|-------------------------------------------------------------------------|--------|--------------------------------------------------|
| Pick candidate (C): scope observer to callee body | ✅ | `CH264_CALLEE_LO/HI` = `0xBFC52984/A04` |
| Sample EE-map RETURNED data (not `ev_arg1=0`) | ✅ | `ch264_data[i] <= ee_rd_data` (Ch258 gotcha avoided) |
| Tag each read with region classifier | ✅ | `ch264_region[i] <= ev_arg3[7:0]` + `ch264_region_name` task |
| Capture >= 2 passes | ✅ | 9 captures across passes 0..8 (covers all 8 Ch217 passes plus pass-0 priming) |
| Report ordered transaction stream | ✅ | `[ch264] [i] pass=N pc=0x... ea=0x... data=0x... region=...` |
| Build dedup table (hits / pass-mask / data-varies / region) | ✅ | `TOP_DISTINCT_EAs` block |
| Emit 4-way verdict | ✅ | `callee_no_data_reads` / `_static_ram_gate_found` / `_static_mmio_gate_found` / `_reads_vary_but_flow_static` |
| Routine regression unchanged with target off-by-default | ✅ | Whole block is under `` `ifdef CH264_CALLEE_AUTOPSY `` |
| Full regression green | ✅ | 157 / 157 |
| No RTL touched | ✅ | TB-only addition; one ifdef block + 2 print sites + new make target |
## What the autopsy actually showed
### Stream
```
[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264] [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
```
### Dedup
```
TOP_DISTINCT_EAs (count=1)
ea=0x801ffdfc hits=9 passes=0x000001ff data=0xbfc521f4 data_varies=1 region=EE_RAM
```
**Exactly one EA is read from the callee body across all 9 passes
(0..8): `0x801FFDFC`, in EE_RAM.** That's it. No MMIO. No kernel
global. No timer. No INTC. The callee body has zero data-loads
outside of one stack reload.
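
The dedup table can be reproduced offline from the stream lines. The following is a minimal sketch, not the testbench's own logic: the function name `dedup` and the row field names are hypothetical, and the stream lines are rebuilt from the Stream block above.

```python
import re
from collections import OrderedDict

# Stream lines as printed in the Stream block above.
LINES = ["[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM"]
LINES += [
    f"[ch264] [{i}] pass={i} pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM"
    for i in range(1, 9)
]

LINE_RE = re.compile(
    r"\[ch264\] \[\d+\] pass=(\d+) pc=0x[0-9a-f]+ "
    r"ea=0x([0-9a-f]+) data=0x([0-9a-f]+) region=(\w+)"
)

def dedup(lines):
    """Group captures by effective address (hypothetical helper)."""
    table = OrderedDict()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # ignore anything that is not a [ch264] capture line
        pass_idx, ea, data = int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)
        row = table.setdefault(ea, {
            "hits": 0, "pass_mask": 0, "first_data": data,
            "data_varies": False, "region": m.group(4),
        })
        row["hits"] += 1
        row["pass_mask"] |= 1 << pass_idx      # one bit per pass index
        if data != row["first_data"]:
            row["data_varies"] = True          # any mismatch with the first sample
    return table

TABLE = dedup(LINES)
for ea, row in TABLE.items():
    print(f"ea=0x{ea:08x} hits={row['hits']} passes=0x{row['pass_mask']:08x} "
          f"data=0x{row['first_data']:08x} data_varies={int(row['data_varies'])} "
          f"region={row['region']}")
```

Note that `data_varies` only records whether any sample differs from the first one; it says nothing about how many distinct values were seen.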
### What `0x801FFDFC` actually is
Cross-referencing the Ch217 dump:
```
0xbfc52984: 0x27bdffe8 addiu $sp,$sp,-24 <- prologue
0xbfc52988: 0xafbf0014 sw $ra,0x14($sp) <- save $ra at $sp+0x14
0xbfc5298c: 0xafa40018 sw $a0,0x18($sp)
0xbfc52990: 0x0ff134dc jal 0xbfc4d370 <- call helper
0xbfc52994: 0x2404000f addiu $a0,$zero,0x0f <- delay slot: $a0=0x0F
0xbfc52998: 0x8fbf0014 lw $ra,0x14($sp) <- restore $ra *** THE READ ***
0xbfc5299c: 0x27bd0018 addiu $sp,$sp,0x18
0xbfc529a0: 0x03e00008 jr $ra
0xbfc529a4: 0x00000000 nop
```
`0x801FFDFC = $sp + 0x14` at the moment of the `lw`. **The callee
body's one and only non-fetch read is its own saved return
address on the stack.** `pass=0` returned the priming value
`0xBFC521F4` (the caller chain from the first arrival into this
function); `pass=1..8` returned `0xBFC52360`, which is exactly
`$ra_pre` in the Ch217 caller table, i.e. the treadmill's stable
saved `$ra` from the longjmp restore.

The `data_varies` flag is set, but the data takes exactly two
values: the pre-treadmill `$ra` and the in-treadmill `$ra`. This
is not a polled-state oscillation; the trace is catching the
priming pass before the system settles into the steady-state
loop.
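
The address arithmetic above can be checked directly from the instruction words in the dump. This is a small sketch using the standard MIPS J-type and I-type field layouts; the helper names (`sign16`, `jal_target`, `lw_fields`) are illustrative only.

```python
def sign16(x):
    """Sign-extend a 16-bit immediate."""
    return x - 0x10000 if x & 0x8000 else x

def jal_target(pc, word):
    """J-type: target = upper 4 bits of the delay-slot PC | (imm26 << 2)."""
    assert word >> 26 == 0x03, "not a JAL"
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

def lw_fields(word):
    """I-type LW: returns (base register, target register, signed offset)."""
    assert word >> 26 == 0x23, "not an LW"
    return (word >> 21) & 0x1F, (word >> 16) & 0x1F, sign16(word & 0xFFFF)

# Words taken from the dump above.
HELPER = jal_target(0xBFC52990, 0x0FF134DC)
BASE, RT, OFFSET = lw_fields(0x8FBF0014)
FRAME_SIZE = -sign16(0x27BDFFE8 & 0xFFFF)   # addiu $sp,$sp,-24

# The observed EA is $sp + offset, so $sp at the lw follows by subtraction.
SP_AT_LW = 0x801FFDFC - OFFSET
SP_ON_ENTRY = SP_AT_LW + FRAME_SIZE

print(f"jal target   = 0x{HELPER:08X}")
print(f"lw           = ${RT}, 0x{OFFSET:X}(${BASE})")
print(f"$sp at lw    = 0x{SP_AT_LW:08X}")
print(f"$sp on entry = 0x{SP_ON_ENTRY:08X}")
```

Register 29 is `$sp` and register 31 is `$ra`, so the decoded `lw` matches the disassembly in the dump.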
### Pass index zero-vs-one quirk
`ch217_count` starts at 0 and is incremented after the pass
sample is recorded. The Ch264 capture uses `ch217_count` directly
as `ch264_pass_idx`, so pass=0 in the Ch264 stream corresponds to
"before the first Ch217 pass was recorded" — i.e. the callee was
entered once during the initial reset/init flow, then re-entered
8 more times once the Ch217 treadmill latched. This explains why
there are 9 captures even though Ch217 reports 8 caller passes.
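
The counting can be illustrated with a toy model. This sketch assumes the ordering described above (one priming entry before any Ch217 pass is recorded, then one callee entry after each recorded pass); it is not derived from the testbench source, and `simulate` is a hypothetical name.

```python
def simulate(treadmill_passes=8):
    """Toy model: label each callee entry with the current ch217_count."""
    ch217_count = 0          # starts at 0, as in the testbench
    capture_labels = []

    # Priming entry during the initial reset/init flow: no pass recorded yet.
    capture_labels.append(ch217_count)

    for _ in range(treadmill_passes):
        ch217_count += 1                     # pass sample recorded, then counter bumped
        capture_labels.append(ch217_count)   # callee re-entered, labelled with new count

    return ch217_count, capture_labels

PASSES, LABELS = simulate()
print(f"Ch217 passes recorded: {PASSES}")
print(f"Ch264 capture labels:  {LABELS}")
```

Under these assumptions the model yields 8 recorded passes and 9 capture labels, `0..8`, matching the stream.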
## The structural finding
```
The longjmp-return callee at 0xBFC52984 is a one-line thunk:
void callee(int x) { /* $a0 = 2 from the outer caller */
helper(0x0F); /* JAL 0xBFC4D370, $a0=0x0F */
return;
}
The callee returns whatever helper(0x0F) returns:
$v0_post = 0xa000a8c8 (identical every pass — Ch217 caller table)
```
**The polled gate is NOT in `0xBFC52984..0xBFC52A04`.** Every
non-fetch memory read in that PC range is just the stack reload
of `$ra`. The thing the Ch215 treadmill is actually waiting on
must be one of:
1. **Inside `0xBFC4D370`** — the helper called with `$a0=0x0F`.
Returns `0xA000A8C8` every pass. If it polls anything, it's
one frame deeper than the autopsy currently sees.
2. **A side-effect of `0xBFC4D370`** that nothing in this scope
observes — e.g. a write into kernel memory the longjmp restore
later reads. (Unlikely: Ch263 ruled out the scrubbed range,
and the outer caller's `$v0/$v1` reads are identical.)
3. **Outside the callee chain entirely** — the BIOS poll-and-jump
pattern is reading something that the longjmp keeps re-restoring,
so neither the callee nor its helper actually poll.

Judging only from the call site (`0xBFC52990` → `0xBFC4D370` with
`$a0=0x0F`), the helper is *plausibly* one of the following; none
of these is confirmed yet:

- `_GetCop0` / `_SetCop0` (selector 0x0F); these are well-known
PS2 BIOS syscall helpers in the `_SyscallHandler` block;
- A `ConfigSet`/`GetGsHParam`-style accessor;
- A `_CdInit` / `_SifCmdInit` style init that consumes a kernel-global.

Confirming this requires looking at `0xBFC4D370`'s own body,
which is Ch265's job.
## Where this leaves the search
The structural map after Ch264:
| Layer | What's there | Reads anything? |
|------------------------------------|----------------------------------------------------|------------------|
| `0xBFC52340..60` (Ch217 trampoline) | beq + nops + JAL | No data reads |
| `0xBFC52984..A04` (Ch264 callee body) | save/restore $ra + one JAL to helper | Only `$sp+0x14` (own $ra) |
| `0xBFC4D370..?` (helper, Ch265 target) | unknown | **TO BE DETERMINED** |
The Ch263 finding (BIOS scrubs `0x80030000-3FF0` every pass) plus
the Ch264 finding (callee body has no polled reads) together
narrow the search dramatically: whatever the BIOS gate is reading
to compute its identical `$v0=0xa000a8c8` every pass, **the
read happens inside `0xBFC4D370` or below**, and the gate state
(if it lives in EE RAM) lives in a region NOT covered by the
`0x80030000-3FF0` scrub.
## Recommendation for Ch265
**Re-aim the autopsy at the next frame.**

The Ch264 observer infrastructure is reusable: only the PC
window needs to change. The helper at `0xBFC4D370` presumably
opens with a standard MIPS prologue (`addiu $sp,$sp,-NN; sw
$ra,...; ...`), although its body has not been inspected yet. Its
extent can be bounded by walking the BIOS dump to the next `jr
$ra; addiu $sp,$sp,NN` or by reading the prologue/epilogue
delta directly. A first cut: `0xBFC4D370..0xBFC4D470` (256 bytes
= 64 instructions, a generous upper bound).
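
Walking the dump to the first `jr $ra` can be sketched as below. The helper's own words are not in this document, so the example runs on the Ch264 callee words from the dump earlier in this note; `function_end` is a hypothetical name, and stopping at the first `jr $ra` is only a heuristic, since a function may have several exits.

```python
JR_RA = 0x03E00008  # encoding of `jr $ra`

def function_end(words, base):
    """Return the exclusive end address after the first `jr $ra` and its delay slot.

    Heuristic only: later exits of the same function are not detected.
    Returns None if no `jr $ra` is found in the given words.
    """
    for i, word in enumerate(words):
        if word == JR_RA:
            return base + 4 * (i + 2)   # +1 for the jr itself, +1 for the delay slot
    return None

# Ch264 callee words, copied from the dump earlier in this note.
CALLEE_WORDS = [
    0x27BDFFE8, 0xAFBF0014, 0xAFA40018, 0x0FF134DC, 0x2404000F,
    0x8FBF0014, 0x27BD0018, 0x03E00008, 0x00000000,
]
CALLEE_END = function_end(CALLEE_WORDS, 0xBFC52984)

# Size check for the proposed Ch265 first-cut window.
WINDOW_BYTES = 0xBFC4D470 - 0xBFC4D370
WINDOW_INSNS = WINDOW_BYTES // 4

print(f"callee ends at 0x{CALLEE_END:08X} (exclusive)")
print(f"Ch265 window: {WINDOW_BYTES} bytes = {WINDOW_INSNS} instructions")
```

For the Ch264 callee the first `jr $ra` sits well inside the `0xBFC52984..0xBFC52A04` observer window, so a window chosen as a generous upper bound can also cover code past the first return.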

The verdict logic can stay the same, and the four possible
verdicts are the same as in Ch264:
- `callee_no_data_reads` → helper computes from registers only.
In that case Ch266 has to look at what populates those registers
(`$a0=0x0F` is set by the caller; what about other inputs?).
- `callee_static_mmio_gate_found` → **HIT.** That's the polled
device, and Ch266 models it.
- `callee_static_ram_gate_found` → **HIT.** Some EE RAM location
outside the scrubbed range is being read every pass; Ch266
models what writes there.
- `callee_reads_vary_but_flow_static` → another thunk-layer.
Recurse: Ch266 autopsies whatever JAL the helper makes.
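
One plausible reading of the four-way verdict, inferred only from the verdict names and from the Ch264 result, is sketched here. The actual logic lives in the `ch264_print_autopsy` task and is not reproduced in this note, so the precedence order, the `classify` name, and the `"MMIO"` region label are all assumptions.

```python
def classify(rows, mmio_regions=frozenset({"MMIO"})):
    """Map dedup rows to a verdict literal (assumed logic, not the TB's).

    rows: one dict per distinct EA with keys 'region' and 'data_varies'.
    mmio_regions: hypothetical set of region names treated as MMIO.
    """
    if not rows:
        return "callee_no_data_reads"

    # A "static" read returns the same data on every pass.
    static = [r for r in rows if not r["data_varies"]]

    if any(r["region"] in mmio_regions for r in static):
        return "callee_static_mmio_gate_found"
    if any(r["region"] == "EE_RAM" for r in static):
        return "callee_static_ram_gate_found"
    return "callee_reads_vary_but_flow_static"

# The single Ch264 row from the dedup table.
CH264_ROWS = [{"ea": 0x801FFDFC, "region": "EE_RAM", "data_varies": True}]
print(classify(CH264_ROWS))
```

With the Ch264 row this reproduces the recorded verdict literal; for Ch265 only the input rows would change.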
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added
`` `ifdef CH264_CALLEE_AUTOPSY `` block (capture arrays,
combinational predicates, `always_ff` capture, region-name task,
`ch264_print_autopsy` task with verdict logic). Added two
`ch264_print_autopsy()` call sites (halt path + timeout path),
each gated by the same ifdef.
- `sim/Makefile`: new `tb_ee_core_bios_long_callee_autopsy`
target (`-DCH264_CALLEE_AUTOPSY` only; no Ch262/Ch263 needed
for this observer).
## iverilog 12 gotcha avoided
The first compile attempt used `return;` to early-exit the
`n == 0` case in `ch264_print_autopsy`. iverilog 12 rejects
`return` inside `task`. Rewrote as `if (n==0) ... else begin
...full body... end`. Same logic, no early return. Worth a note
because future autopsy-style tasks will probably hit this
again.
## Regression
Full regression: 157 / 157 with the new target off by default
(`CH264_CALLEE_AUTOPSY` undefined for routine builds).

Standing by for Codex's Ch265 call. Recommendation: aim the
existing observer at `0xBFC4D370` and recompile. No new RTL and
no new TB scaffolding are needed, just a parameter bump.