# Ch265 closeout: helper is ALSO a one-call thunk (to 0xBFC4F320); recurse once more

**Status:** Closed. New opt-in target
`tb_ee_core_bios_long_helper_autopsy` runs the BIOS-long flow with
the Ch264 observer pattern re-aimed at the helper body
`0xBFC4D370..0xBFC4D470`, plus two new tracks: (1) per-invocation
`$a0_in`/`$v0_post`/`$v1_post` snapshots on entry and return,
(2) a JAL/J/JR/JALR retire log inside the helper with statically
decoded targets and "LEAVES helper" annotations.

**Literal verdict the task emits:** `helper_static_ram_gate_found
(EA=0x801FFDE4 returns identical 0xBFC52998 across 8 hits —
region=EE_RAM)`.

**Structural verdict (visible in the stream + CF table):**
**`helper_is_thunk`: the helper is another one-call thunk, this
time to `0xBFC4F320`.** The literal label is a known false positive
(see "Verdict-label nuance" below); the real polled gate is still
at least one frame deeper.

## Codex Ch265 acceptance, line by line

| Codex requirement | Status | Where |
|-------------------|--------|-------|
| Reuse Ch264 observer one frame deeper at 0xBFC4D370..0xBFC4D470 | ✅ | `CH265_HELPER_LO/HI` = `0xBFC4D370/D470` |
| Same region tagging and compact tables | ✅ | `ch265_region_name` task; same shape as Ch264 |
| Capture non-fetch data reads only | ✅ | Same `!ch265_is_fetch` predicate as Ch264 |
| Include calls/jumps out of the helper | ✅ | `HELPER_CONTROL_FLOW` table: J/JAL/JR/JALR retires inside helper, with statically decoded J/JAL target and "LEAVES helper" notes |
| Track $a0=0x0F at entry and returned $v0 | ✅ | `HELPER_PASSES` table with `$a0_in`/`$ra_in`/`$v0_post`/`$v1_post` |
| Compare pass 0 versus steady-state passes 1–8 | ✅ | `pass=N` column in every table; trivial visual diff |
| Verdicts mirror Ch264 + helper_is_thunk | ✅ | 5-way verdict logic |
| No new side-effect stubs | ✅ | TB-only addition; no RTL touched |
| Regression unaffected | ✅ | 157 / 157 with target off-by-default |

## What the autopsy showed

### HELPER_PASSES (per-invocation entry/exit register snapshots)

The helper is called from many places, not just from the Ch264
callee. The first 7 invocations are pre-treadmill BIOS init with
varying `$a0_in` (0xF, 0xE, 0x1, 0x4, 0x5, 0x6, 0x7). The
treadmill itself (cycles 10.2M onward) shows a **deterministic
pair every Ch217 pass**:

```
[7] cyc=10194426 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[8] cyc=10194505 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
[9] cyc=20095076 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[10] cyc=20095155 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
...repeats every Ch217 pass...
```

Two callers, interleaved:

| Caller location                | `$a0` | Return target |
|--------------------------------|-------|---------------|
| Ch264 callee at 0xBFC52990     | 0x0F  | 0xBFC52998    |
| Ch217 trampoline at 0xBFC52360 | 0x07  | 0xBFC52368    |

The `$a0=0x07` path's `$v0_post` and `$v1_post` are `x` (shown as
`???` above) because the exit predicate was scoped only to
"return-to-Ch264-callee" (PC=0xBFC52998), so the snapshot never
fires for the trampoline caller. Future autopsy refinement: also
exit on PC=0xBFC52368 to capture the other arm's `$v0`. This does
not change the structural conclusion.

The `$a0=0x0F` path returns `$v0=0xA000A8C8` identically every
treadmill pass, which matches the Ch217 outer-caller's
`$v0_post=0xa000a8c8` exactly. Consistency check ✓.

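The return targets in the caller table follow from the MIPS JAL link rule: `$ra` receives the address of the instruction after the delay slot, i.e. the JAL's PC plus 8. The sketch below checks both rows, assuming the listed caller locations are the PCs of the JAL instructions themselves (the function name is illustrative, not taken from the testbench).

```python
def jal_return_address(jal_pc: int) -> int:
    """Link address a MIPS JAL writes to $ra: JAL PC + 8 (skips the delay slot)."""
    return (jal_pc + 8) & 0xFFFFFFFF


# Caller PCs as listed in the table above (assumed to be the JAL PCs).
callers = {
    "Ch264 callee": 0xBFC52990,
    "Ch217 trampoline": 0xBFC52360,
}

for name, pc in callers.items():
    print(f"{name}: JAL at 0x{pc:08X} links $ra=0x{jal_return_address(pc):08X}")
```

Both results agree with the `$ra_in` values in the `HELPER_PASSES` listing.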
### HELPER_CONTROL_FLOW (every JAL/J/JR retired inside helper)

```
pc=0xBFC4D380 instr=0x0FF13CC8 jal target=0xBFC4F320 <-- LEAVES helper
pc=0xBFC4D390 instr=0x03E00008 jr target=0x00000000
```

Repeated 47 times (every helper invocation hits this exact pair).
**The helper has exactly one JAL out, every time, to
`0xBFC4F320`.** No conditional branches, no other JALs, no JR
that isn't the function epilogue. This is a one-call thunk by
structure.

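The statically decoded target in the first row can be reproduced by hand. A minimal sketch, assuming the standard MIPS J-type encoding (target = upper 4 bits of the delay-slot PC, then `instr[25:0]` shifted left by 2); the function names and the window constants are illustrative and are not the testbench's `ch265_jtarget`.

```python
HELPER_LO = 0xBFC4D370  # helper window bounds quoted in this note
HELPER_HI = 0xBFC4D470


def decode_jump_target(pc: int, instr: int) -> int:
    """Statically decode a MIPS J/JAL target from its PC and encoding."""
    opcode = (instr >> 26) & 0x3F
    if opcode not in (0x02, 0x03):  # 0x02 = J, 0x03 = JAL
        raise ValueError(f"not a J/JAL encoding: 0x{instr:08X}")
    delay_slot = (pc + 4) & 0xFFFFFFFF
    return (delay_slot & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


def is_jr_ra(instr: int) -> bool:
    """True for `jr $ra` (SPECIAL opcode, funct 0x08, rs = 31)."""
    return (instr >> 26) == 0 and (instr & 0x3F) == 0x08 and ((instr >> 21) & 0x1F) == 31


def leaves_helper(target: int) -> bool:
    """True when a decoded target falls outside the helper window."""
    return not (HELPER_LO <= target <= HELPER_HI)


target = decode_jump_target(0xBFC4D380, 0x0FF13CC8)
print(f"jal target=0x{target:08X} leaves_helper={leaves_helper(target)}")
print(f"0x03E00008 is jr $ra: {is_jr_ra(0x03E00008)}")
```

The computed target matches the logged `0xBFC4F320`, and the second encoding decodes as the `jr $ra` epilogue.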
### HELPER_BODY_DATA_READS (every non-fetch read inside helper)

23 reads captured. **All from the single PC `0xBFC4D388`**,
which is the instruction immediately after the JAL's delay slot,
i.e. the saved-`$ra` reload (`lw $ra,N($sp)` in the standard
MIPS epilogue).

Three distinct EAs, all in EE_RAM:

| EA         | Hits | Pass mask | First data             | data_varies | What it is |
|------------|------|-----------|------------------------|-------------|------------|
| 0x801FFEE4 | 2    | 0x0001    | 0xBFC528AC             | yes         | Pre-treadmill $sp's $ra slot (only during BIOS init) |
| 0x801FFDFC | 13   | 0x01FF    | 0xBFC521C4..0xBFC52368 | yes         | Ch217 trampoline's $sp+$ra-slot ($a0=0x07 caller) |
| 0x801FFDE4 | 8    | 0x01FE    | 0xBFC52998             | **no**      | Ch264 callee's $sp+$ra-slot ($a0=0x0F caller); stable because that caller never changes |

Each helper invocation reads exactly one EA: the saved `$ra` at
its caller-determined stack frame. **There is no MMIO read. No
kernel-global read. No timer read. No non-stack read of any
kind.** The helper body is structurally the same shape as the
Ch264 callee: prologue → JAL → restore `$ra` from stack → JR.

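The "Pass mask" column is easier to compare once expanded into pass numbers. A small sketch, assuming bit N of the mask is set when the EA was read during Ch217 pass N (that bit convention is an assumption inferred from the `pass=N` column, not confirmed by the testbench source).

```python
def passes_in_mask(mask: int) -> list[int]:
    """Expand a pass bitmask into the list of pass indices whose bit is set."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# (EA, hits, pass mask) rows copied from the table above.
rows = [
    (0x801FFEE4, 2, 0x0001),
    (0x801FFDFC, 13, 0x01FF),
    (0x801FFDE4, 8, 0x01FE),
]

for ea, hits, mask in rows:
    print(f"EA=0x{ea:08X} hits={hits} passes={passes_in_mask(mask)}")

print("total reads:", sum(hits for _, hits, _ in rows))
```

Under that assumption the stable EA appears only in passes 1 through 8, the pre-treadmill slot only in pass 0, and the hit counts sum to the 23 captured reads.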
## Verdict-label nuance: false positive

The literal verdict `helper_static_ram_gate_found
(EA=0x801FFDE4 ... data=0xBFC52998)` is a **known
false positive of the stable-EA heuristic**. The condition
"appears in ≥2 passes AND data doesn't vary" is satisfied
because the Ch264-callee-side caller path is itself stable
(every pass the helper is entered with the same `$ra=0xBFC52998`,
so the saved-`$ra` slot reload returns the same value).

But `0xBFC52998` is **exactly `$ra_in + 0`** for the Ch264-callee
caller, i.e. it's the return address that the helper itself
stashed on entry, not a polled state. Reading it back yields a
stable value because the caller doesn't change, **not** because
external state is settled.

The stack-only check (`abs(ea - first_ea) ≤ 0x40 && region=EE_RAM`)
didn't filter this out either. The two treadmill caller paths
save `$ra` at stack slots 0x801FFDE4 and 0x801FFDFC, which are
only 0x18 apart, but the spread across all three EAs is 0x100
(0x801FFEE4 - 0x801FFDE4 = 0x100), which exceeds the 0x40
sibling threshold.

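The address arithmetic behind the failed stack-only check can be verified directly. A minimal sketch, assuming `first_ea` is the first EA captured, i.e. the pre-treadmill slot `0x801FFEE4` (the note does not say which EA the check anchors on, so that choice is an assumption, as is the function name).

```python
SIBLING_THRESHOLD = 0x40  # threshold quoted in the stack-only check


def is_stack_sibling(ea: int, first_ea: int, threshold: int = SIBLING_THRESHOLD) -> bool:
    """Mirror of `abs(ea - first_ea) <= 0x40` from the stack-only check."""
    return abs(ea - first_ea) <= threshold


eas = [0x801FFEE4, 0x801FFDFC, 0x801FFDE4]  # the three EAs from the data-read table
first_ea = eas[0]                            # assumed anchor: pre-treadmill slot

treadmill_gap = abs(0x801FFDFC - 0x801FFDE4)
full_spread = max(eas) - min(eas)

print(f"treadmill slots gap = 0x{treadmill_gap:X}")
print(f"spread over all EAs = 0x{full_spread:X}")
for ea in eas[1:]:
    print(f"0x{ea:08X} sibling of 0x{first_ea:08X}: {is_stack_sibling(ea, first_ea)}")
```

The two treadmill slots would pass the check against each other, but neither is within 0x40 of the pre-treadmill slot, which is why the filter did not fire.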
A more robust heuristic would discount any stable read whose
returned value equals the caller's `$ra_in` (i.e. detect
saved-`$ra` reloads explicitly). Not blocking: the control-flow
table makes the structural truth obvious without the heuristic.
Future Ch266+ autopsies can incorporate this filter.

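The proposed filter is small enough to sketch. The version below is a hypothetical post-hoc form, not the testbench's verdict logic: the `StableRead` record and `is_candidate_gate` function are invented for illustration, and the only data used is the stable row from the data-read table plus the two `$ra_in` values seen in `HELPER_PASSES`.

```python
from dataclasses import dataclass


@dataclass
class StableRead:
    """One row of the data-read table (hypothetical record layout)."""
    ea: int
    data: int
    passes: int        # number of passes in which the EA was read
    data_varies: bool


def is_candidate_gate(read: StableRead, ra_in_values: set[int]) -> bool:
    """Stable-EA heuristic with the saved-$ra reload discounted."""
    stable = read.passes >= 2 and not read.data_varies
    saved_ra_reload = read.data in ra_in_values
    return stable and not saved_ra_reload


flagged = StableRead(ea=0x801FFDE4, data=0xBFC52998, passes=8, data_varies=False)
ra_ins = {0xBFC52998, 0xBFC52368}  # $ra_in values from HELPER_PASSES

print("without filter:", is_candidate_gate(flagged, set()))
print("with filter:   ", is_candidate_gate(flagged, ra_ins))
```

With an empty `$ra_in` set the row is still reported as a gate; once the observed return addresses are supplied, it is discounted as a saved-`$ra` reload.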
## What this means for the search

After Ch263+Ch264+Ch265, the structural picture:

```
Ch217 trampoline 0xBFC52340..60
-> JAL 0xBFC52984 (Ch264 callee, $a0=2)
-> sw $ra,0x14($sp)
-> JAL 0xBFC4D370 (Ch265 helper, $a0=0x0F) ← thunk
-> sw $ra,N($sp)
-> JAL 0xBFC4F320 (Ch266 target) ← thunk to ???
-> ???
-> lw $ra,N($sp)
-> jr $ra
-> lw $ra,0x14($sp)
-> jr $ra
-> JAL 0xBFC4D370 again with $a0=0x07 (Ch217 post-call path)
same thunk to 0xBFC4F320
```

**Every layer so far has been a wrapper.** The actual work (the
polled-state lookup) has not yet appeared. It almost certainly
lives at or below `0xBFC4F320`.

The constant `$a0=0x0F` selector passing all the way through
`0xBFC52984` -> `0xBFC4D370` -> `0xBFC4F320` strongly suggests
this is a **selector-dispatched BIOS API**: something like
`GetXY(selector=0x0F)`. The Ch217 outer-caller also calls this
chain with `$a0=2`, and the Ch217 trampoline's second JAL goes
through with `$a0=0x07`. Different selectors, same dispatcher.
This is a classic PS2 BIOS pattern: a single entry point with a
selector argument.

`$v0=0xA000A8C8` is a kernel-space pointer. Addresses in
`0xA0000000..0xBFFFFFFF` are `kseg1`, the unmapped, uncached
window onto low physical memory, so this pointer aliases physical
`0x0000A8C8`, the same RAM that `kseg0` exposes cached at
`0x8000A8C8`. That return value being constant every pass is
consistent with the dispatch returning a **pointer to a stable
kernel structure**, which the longjmp-return caller then uses as
a jump table base or as a data source.

## Recommendation for Ch266

**Recurse one more frame, to `0xBFC4F320`.** Same observer
pattern, bump the PC window. Expected outcomes (in order of
likelihood, based on the chain so far):

1. **`helper_is_thunk` again**: `0xBFC4F320` is also a wrapper
   to something deeper. Then Ch267 follows its JAL out.
2. **`helper_static_mmio_gate_found`**: `0xBFC4F320` reads from
   some PS2 MMIO region (EE INTC, EE BIU, EE_MISC_MMIO, or
   `0x1FA00000` which was the Ch263 deferred Pivot 2). That's
   the gate. Ch267 models the device.
3. **`helper_static_ram_gate_found`** with a non-stack EA: a
   kernel global in EE_RAM. Ch267 models what writes there.

Implementation notes for the autopsy itself:

- The verdict heuristic should add a saved-`$ra` filter: discount
  any stable EA whose returned value equals the most-common
  `$ra_in` for the same caller. Could be done in the autopsy
  itself, or post-hoc by reading the stream. Note this in the
  autopsy block.
- The `HELPER_PASSES` exit predicate (PC=0xBFC52998) was
  scoped to the Ch264-callee return; the Ch217-trampoline
  caller's return was missed. For Ch266 (assuming again a
  single primary caller from the deeper helper), pick the
  most-frequent caller's post-JAL PC and gate exit on that.
  Alternatively widen exit: trigger on ANY retire whose PC is
  outside the helper window and was reached from inside in the
  immediately preceding cycle. Not critical.
- The `CH265_PASSES` cap of 16 is fine for 8 Ch217 passes ×
  2 caller paths per pass = 16 invocations. For the next layer
  bump to 32 to leave headroom.

## Files changed

- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: added
  `` `ifdef CH265_HELPER_AUTOPSY `` block. New structure: data-read
  capture (mirror of Ch264), `$a0/$ra/$v0/$v1` per-invocation
  snapshots, control-flow capture with `peek_instr`-driven
  opcode decode and J/JAL-target computation, region-name task,
  `ch265_cf_mnemonic` function for prettier prints, full
  5-way verdict logic. Two `ch265_print_autopsy()` call sites
  (halt + timeout exits), both gated by the ifdef.
- `sim/Makefile`: new `tb_ee_core_bios_long_helper_autopsy`
  target (only `-DCH265_HELPER_AUTOPSY`).

## iverilog 12 gotchas hit (and avoided)

1. **Bit-select on parenthesized function-result expression.**
   First version had `{ (pc + 32'd4)[31:28], instr[25:0], 2'b00 }`
   inside `ch265_jtarget`. iverilog rejected it with "Malformed
   statement." Fix: compute `dslot = pc + 32'd4` into a temp, then
   bit-select `dslot[31:28]`. (Already documented in
   [[project-self-driven-milestone]] as bit-select on function
   return; same shape.)
2. **Wrong identifier names for trace_pkg constants.** First
   version used bare `EV_READ` / `SUBSYS_MEM` / `ee_map_ev_kind`.
   The right names are `trace_pkg::EV_READ` / `trace_pkg::SUBSYS_MEM` /
   `ee_map_ev_event`. Easy to confirm by grepping existing Ch218
   and Ch264 capture code.

## Regression

Full regression: 157 / 157 with the new target off by default
(`CH265_HELPER_AUTOPSY` undefined for routine builds).

Standing by for Codex's Ch266 call. Recommendation: recurse to
`0xBFC4F320`. Same observer infrastructure; bump the parameter.