# Ch264 closeout: callee body is a one-call thunk; the real polled state lives one frame deeper

**Status:** Closed. New opt-in target
`tb_ee_core_bios_long_callee_autopsy` runs the BIOS-long flow with a
narrow observer scoped to the longjmp-return callee body at
`0xBFC52984..0xBFC52A04`, capturing every non-fetch data read in
that PC range with the EE map's actual returned data (not the
hardcoded-zero `ev_arg1`) and the region classifier (`ev_arg3`).

**Verdict literal:** `callee_reads_vary_but_flow_static`.

**Structural verdict (deeper read of the trace):**
`callee_body_is_pure_thunk_to_0xBFC4D370`. The callee's only
non-fetch memory read is its own saved `$ra` on the stack; all
"real work" lives in the JAL at `0xBFC52990 → 0xBFC4D370` with
constant `$a0=0x0F`.

## Codex Ch264 acceptance: line-by-line

| Codex requirement | Status | Where |
|---|---|---|
| Pick candidate (C): scope observer to callee body | ✅ | `CH264_CALLEE_LO/HI` = `0xBFC52984/A04` |
| Sample EE-map RETURNED data (not `ev_arg1=0`) | ✅ | `ch264_data[i] <= ee_rd_data` (Ch258 gotcha avoided) |
| Tag each read with region classifier | ✅ | `ch264_region[i] <= ev_arg3[7:0]` + `ch264_region_name` task |
| Capture >= 2 passes | ✅ | 9 captures across passes 0..8 (covers all 8 Ch217 passes plus pass-0 priming) |
| Report ordered transaction stream | ✅ | `[ch264] [i] pass=N pc=0x... ea=0x... data=0x... region=...` |
| Build dedup table (hits / pass-mask / data-varies / region) | ✅ | `TOP_DISTINCT_EAs` block |
| Emit 4-way verdict | ✅ | `callee_no_data_reads` / `_static_ram_gate_found` / `_static_mmio_gate_found` / `_reads_vary_but_flow_static` |
| Routine regression unchanged with target off-by-default | ✅ | Whole block is under `` `ifdef CH264_CALLEE_AUTOPSY `` |
| Full regression green | ✅ | 157 / 157 |
| No RTL touched | ✅ | TB-only addition; one ifdef block + 2 print sites + new make target |

## What the autopsy actually showed

### Stream

```
[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264] [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
```

### Dedup

```
TOP_DISTINCT_EAs (count=1)
ea=0x801ffdfc hits=9 passes=0x000001ff data=0xbfc521f4 data_varies=1 region=EE_RAM
```

**Exactly one EA is read from the callee body across all 9 passes
(0..8): `0x801FFDFC`, in EE_RAM.** That's it. No MMIO. No kernel
global. No timer. No INTC. The callee body has zero data loads
outside of one stack reload.
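
The dedup table can be re-derived offline from the stream text alone, which is a cheap cross-check on the testbench's own bookkeeping. The following is a minimal Python sketch, not the testbench code: the regex, the `dedup` function, and the row field names are hypothetical, and it assumes the `data=` column in `TOP_DISTINCT_EAs` reports the first value seen for each EA.

```python
import re

# The nine stream lines reported above, verbatim.
STREAM = """\
[ch264] [0] pass=0 pc=0xbfc529f0 ea=0x801ffdfc data=0xbfc521f4 region=EE_RAM
[ch264] [1] pass=1 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [2] pass=2 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [3] pass=3 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [4] pass=4 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [5] pass=5 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [6] pass=6 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [7] pass=7 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
[ch264] [8] pass=8 pc=0xbfc52998 ea=0x801ffdfc data=0xbfc52360 region=EE_RAM
"""

# Hypothetical parser for the "[ch264] [i] pass=N pc=... ea=... data=... region=..." format.
LINE_RE = re.compile(
    r"\[ch264\]\s+\[(\d+)\]\s+pass=(\d+)\s+pc=(0x[0-9a-fA-F]+)\s+"
    r"ea=(0x[0-9a-fA-F]+)\s+data=(0x[0-9a-fA-F]+)\s+region=(\w+)"
)


def dedup(stream):
    """Group reads by effective address: hit count, pass bitmask, data variation."""
    table = {}
    for line in stream.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip anything that is not a stream line
        _idx, pass_idx, _pc, ea, data, region = match.groups()
        ea, data, pass_idx = int(ea, 16), int(data, 16), int(pass_idx)
        row = table.setdefault(ea, {
            "hits": 0, "passes": 0, "first_data": data,
            "data_varies": False, "region": region,
        })
        row["hits"] += 1
        row["passes"] |= 1 << pass_idx          # one bit per pass index
        if data != row["first_data"]:
            row["data_varies"] = True
    return table


TABLE = dedup(STREAM)
print(f"TOP_DISTINCT_EAs (count={len(TABLE)})")
for ea, row in TABLE.items():
    print(f"ea=0x{ea:08x} hits={row['hits']} passes=0x{row['passes']:08x} "
          f"data=0x{row['first_data']:08x} data_varies={int(row['data_varies'])} "
          f"region={row['region']}")
```

Run against the stream above, this reproduces the single `TOP_DISTINCT_EAs` row: one EA, nine hits, pass mask `0x1ff`.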

### What `0x801FFDFC` actually is

Cross-referencing the Ch217 dump:

```
0xbfc52984: 0x27bdffe8  addiu $sp,$sp,-24      <- prologue
0xbfc52988: 0xafbf0014  sw    $ra,0x14($sp)    <- save $ra at $sp+0x14
0xbfc5298c: 0xafa40018  sw    $a0,0x18($sp)
0xbfc52990: 0x0ff134dc  jal   0xbfc4d370       <- call helper
0xbfc52994: 0x2404000f  addiu $a0,$zero,0x0f   <- delay slot: $a0=0x0F
0xbfc52998: 0x8fbf0014  lw    $ra,0x14($sp)    <- restore $ra *** THE READ ***
0xbfc5299c: 0x27bd0018  addiu $sp,$sp,0x18
0xbfc529a0: 0x03e00008  jr    $ra
0xbfc529a4: 0x00000000  nop
```

`0x801FFDFC = $sp + 0x14` at the moment of the `lw`. **The callee
body's one and only non-fetch read is its own saved return
address on the stack.** `pass=0` returned the priming value
`0xBFC521F4` (the caller chain from the first arrival into this
function), and `pass=1..8` returned `0xBFC52360`, which is
exactly `$ra_pre` in the Ch217 caller table, i.e. the
treadmill's stable saved `$ra` from the longjmp restore. The
pass-0 read was also issued from a different PC inside the
observer window (`0xBFC529F0`), whereas passes 1..8 all come
from the `lw` at `0xBFC52998`.

The "data varies" flag is set, but the data takes exactly two
values: the pre-treadmill `$ra` and the in-treadmill `$ra`. It
isn't a polled-state oscillation; it's the trace catching the
priming pass before the system settles into the steady-state
loop.
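
The disassembly annotations above can be cross-checked directly from the raw instruction words. This is a small Python sketch of the standard MIPS field decoding (J-type target, I-type base/offset); the helper names are hypothetical, and the only inputs are the words and addresses already listed in this note.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of an instruction word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jal_target(pc, word):
    """J-type target: upper 4 bits of the delay-slot PC, then index << 2."""
    if (word >> 26) != 0x03:
        raise ValueError("not a JAL")
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)


def load_store_fields(word):
    """I-type load/store: (base register, target register, signed offset)."""
    return (word >> 21) & 0x1F, (word >> 16) & 0x1F, sign_extend16(word)


# jal at 0xBFC52990
TARGET = jal_target(0xBFC52990, 0x0FF134DC)

# lw $ra,0x14($sp) at 0xBFC52998; register 29 is $sp, register 31 is $ra
BASE, RT, OFFSET = load_store_fields(0x8FBF0014)

# The observed EA pins down $sp at the lw, and the prologue gives the frame size.
SP_AT_LW = 0x801FFDFC - OFFSET
FRAME = -sign_extend16(0x27BDFFE8)      # addiu $sp,$sp,-24
SP_AT_ENTRY = SP_AT_LW + FRAME

print(f"jal target   = 0x{TARGET:08x}")
print(f"lw base/rt   = ${BASE} / ${RT}, offset = 0x{OFFSET:x}")
print(f"$sp at lw    = 0x{SP_AT_LW:08x}")
print(f"$sp at entry = 0x{SP_AT_ENTRY:08x}")
```

The decode confirms the JAL target `0xBFC4D370` and gives `$sp = 0x801FFDE8` inside the frame, which puts the caller's stack pointer at `0x801FFE00` on entry.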

### Pass index zero-vs-one quirk

`ch217_count` starts at 0 and is incremented after the pass
sample is recorded. The Ch264 capture uses `ch217_count` directly
as `ch264_pass_idx`, so pass=0 in the Ch264 stream corresponds to
"before the first Ch217 pass was recorded", i.e. the callee was
entered once during the initial reset/init flow, then re-entered
8 more times once the Ch217 treadmill latched. This explains why
there are 9 captures even though Ch217 reports 8 caller passes.
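
The off-by-one can be modeled in a few lines. This Python sketch is only a behavioral illustration, not the testbench logic: the `simulate` function is hypothetical, and it assumes each Ch217 pass sample is recorded (and the counter bumped) before the callee's `lw` for that pass executes.

```python
def simulate(num_ch217_passes):
    """Return the ch264_pass_idx tag attached to each callee capture."""
    ch217_count = 0
    ch264_pass_idx = []

    # Priming entry during reset/init: no Ch217 pass recorded yet, so tag 0.
    ch264_pass_idx.append(ch217_count)

    for _ in range(num_ch217_passes):
        # Ch217 records its pass sample, then increments the counter...
        ch217_count += 1
        # ...so the callee read that follows is tagged with the new value.
        ch264_pass_idx.append(ch217_count)

    return ch264_pass_idx


CAPTURES = simulate(8)
print(len(CAPTURES), CAPTURES)
```

With 8 Ch217 passes the model yields 9 captures tagged 0 through 8, matching the stream.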

## The structural finding

```
The longjmp-return callee at 0xBFC52984 is a one-line thunk:
void callee(int x) { /* $a0 = 2 from the outer caller */
    helper(0x0F); /* JAL 0xBFC4D370, $a0=0x0F */
    return;
}
The callee returns whatever helper(0x0F) returns:
$v0_post = 0xa000a8c8 (identical every pass; Ch217 caller table)
```

**The polled gate is NOT in `0xBFC52984..0xBFC52A04`.** Every
non-fetch memory read in that PC range is just the stack reload
of `$ra`. The thing the Ch215 treadmill is actually waiting on
must be one of:

1. **Inside `0xBFC4D370`**: the helper called with `$a0=0x0F`.
   Returns `0xA000A8C8` every pass. If it polls anything, it's
   one frame deeper than the autopsy currently sees.
2. **A side-effect of `0xBFC4D370`** that nothing in this scope
   observes, e.g. a write into kernel memory the longjmp restore
   later reads. (Unlikely: Ch263 ruled out the scrubbed range,
   and the outer caller's `$v0/$v1` reads are identical.)
3. **Outside the callee chain entirely**: the BIOS poll-and-jump
   pattern is reading something that the longjmp keeps re-restoring,
   so neither the callee nor its helper actually poll.

By inspection of the BIOS instruction at `0xBFC52990` →
`0xBFC4D370` with `$a0=0x0F`, the function is *very likely* one of:

- `_GetCop0` / `_SetCop0` (selector 0x0F), which are well-known
  PS2 BIOS syscall helpers in the `_SyscallHandler` block;
- A `ConfigSet`/`GetGsHParam`-style accessor;
- A `_CdInit` / `_SifCmdInit` style init that consumes a kernel-global.

Confirming this requires looking at `0xBFC4D370`'s own body,
which is Ch265's job.

## Where this leaves the search

The structural map after Ch264:

| Layer | What's there | Reads anything? |
|---|---|---|
| `0xBFC52340..60` (Ch217 trampoline) | beq + nops + JAL | No data reads |
| `0xBFC52984..A04` (Ch264 callee body) | save/restore `$ra` + one JAL to helper | Only `$sp+0x14` (own `$ra`) |
| `0xBFC4D370..?` (helper, Ch265 target) | unknown | **TO BE DETERMINED** |

The Ch263 finding (BIOS scrubs `0x80030000-3FF0` every pass) plus
the Ch264 finding (callee body has no polled reads) together
narrow the search dramatically: whatever the BIOS gate is reading
to compute its identical `$v0=0xa000a8c8` every pass, **the
read happens inside `0xBFC4D370` or below**, and the gate state
(if it lives in EE RAM) lives in a region NOT covered by the
`0x80030000-3FF0` scrub.

## Recommendation for Ch265

**Re-aim the autopsy at the next frame.**

The Ch264 observer infrastructure is reusable: bump the PC
window. The helper at `0xBFC4D370` presumably starts with a
standard MIPS prologue (`addiu $sp,$sp,-NN; sw $ra,...; ...`), so
its extent can be bounded by walking the BIOS dump to the next `jr
$ra; addiu $sp,$sp,NN` or by reading the prologue/epilogue
delta directly. A first cut: `0xBFC4D370..0xBFC4D470` (256 bytes
= 64 instructions, generous upper bound).
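
Walking the dump to the next `jr $ra` is easy to script. The sketch below is a hypothetical Python helper, not existing tooling. Since this note does not list the helper's instruction words, it is demonstrated on the Ch264 callee words from the disassembly earlier in this note. It assumes the first `jr $ra` ends the function, which breaks for functions with early returns, so treat the result as a lower bound on the window.

```python
JR_RA = 0x03E00008  # encoding of "jr $ra"


def function_extent(base, words):
    """Return (start, end_exclusive) up to the first jr $ra plus its delay slot."""
    for index, word in enumerate(words):
        if word == JR_RA:
            return base, base + 4 * (index + 2)
    return None  # no jr $ra found inside the supplied window


def frame_size(word):
    """Frame size from 'addiu $sp,$sp,-NN' (0x27BDxxxx), or None if not that form."""
    if (word >> 16) != 0x27BD:
        return None
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return -imm


# Ch264 callee body words, copied from the Ch217 dump excerpt in this note.
CALLEE_BASE = 0xBFC52984
CALLEE_WORDS = [
    0x27BDFFE8, 0xAFBF0014, 0xAFA40018, 0x0FF134DC, 0x2404000F,
    0x8FBF0014, 0x27BD0018, 0x03E00008, 0x00000000,
]

EXTENT = function_extent(CALLEE_BASE, CALLEE_WORDS)
print(f"extent = 0x{EXTENT[0]:08x}..0x{EXTENT[1]:08x}")
print(f"frame  = {frame_size(CALLEE_WORDS[0])} bytes")
```

On the Ch264 callee this bounds the function at `0xBFC52984..0xBFC529A8` with a 24-byte frame. Pointing the same scan at the dump words starting at `0xBFC4D370` would give a tighter window than the 256-byte first cut.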

The verdict logic can stay the same. The possible outcomes are
the same four as in Ch264:

- `callee_no_data_reads` → helper computes from registers only.
  In that case Ch266 has to look at what populates those registers
  (`$a0=0x0F` is set by the caller; what about other inputs?).
- `callee_static_mmio_gate_found` → **HIT.** That's the polled
  device, and Ch266 models it.
- `callee_static_ram_gate_found` → **HIT.** Some EE RAM location
  outside the scrubbed range is being read every pass; Ch266
  models what writes there.
- `callee_reads_vary_but_flow_static` → another thunk-layer.
  Recurse: Ch266 autopsies whatever JAL the helper makes.
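
For reference, one way the four-way decision could be expressed over the dedup rows is sketched below in Python. The exact rules inside `ch264_print_autopsy` are not spelled out in this note, so everything here is an assumption: a "static gate" is taken to mean an EA read in every pass with constant data, any region other than `EE_RAM` is treated as MMIO-like, and MMIO takes precedence over RAM. The row field names are hypothetical too.

```python
RAM_REGIONS = {"EE_RAM"}  # assumption: every other region name counts as MMIO-like


def classify(rows, num_passes):
    """Map dedup rows to one of the four verdict literals (assumed rules)."""
    if not rows:
        return "callee_no_data_reads"

    all_passes = (1 << num_passes) - 1
    # Assumed definition of a gate candidate: same data, seen in every pass.
    static = [r for r in rows
              if not r["data_varies"] and r["passes"] == all_passes]

    if any(r["region"] not in RAM_REGIONS for r in static):
        return "callee_static_mmio_gate_found"
    if static:
        return "callee_static_ram_gate_found"
    return "callee_reads_vary_but_flow_static"


# The single Ch264 row from the TOP_DISTINCT_EAs block.
CH264_ROWS = [{"ea": 0x801FFDFC, "hits": 9, "passes": 0x1FF,
               "data_varies": True, "region": "EE_RAM"}]

VERDICT = classify(CH264_ROWS, num_passes=9)
print(VERDICT)
```

Under these assumed rules the Ch264 row lands on `callee_reads_vary_but_flow_static`, the same literal the testbench emitted. The real task may weigh hits or regions differently.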

## Files changed

- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added
  `` `ifdef CH264_CALLEE_AUTOPSY `` block (capture arrays,
  combinational predicates, `always_ff` capture, region-name task,
  `ch264_print_autopsy` task with verdict logic). Added two
  `ch264_print_autopsy()` call sites (halt path + timeout path),
  each gated by the same ifdef.
- `sim/Makefile`: new `tb_ee_core_bios_long_callee_autopsy`
  target (`-DCH264_CALLEE_AUTOPSY` only; no Ch262/Ch263 needed
  for this observer).

## iverilog 12 gotcha avoided

The first compile attempt used `return;` to early-exit the
`n == 0` case in `ch264_print_autopsy`. iverilog 12 rejects
`return` inside `task`. Rewrote as `if (n==0) ... else begin
...full body... end`. Same logic, no early return. Worth a note
because future autopsy-style tasks will probably hit this
again.

## Regression

Full regression: 157 / 157 with the new target off by default
(`CH264_CALLEE_AUTOPSY` undefined for routine builds).

Standing by for Codex's Ch265 call. Recommendation: aim the
existing observer at `0xBFC4D370` and recompile. No new RTL,
no new TB scaffolding, just a parameter bump.