retroDE_ps2/docs/ch265_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch265 closeout — helper is ALSO a one-call thunk (to 0xBFC4F320); recurse once more
**Status:** Closed. New opt-in target
`tb_ee_core_bios_long_helper_autopsy` runs the BIOS-long flow with
the Ch264 observer pattern re-aimed at the helper body
`0xBFC4D370..0xBFC4D470`, plus two new tracks: (1) per-invocation
`$a0_in`/`$v0_post`/`$v1_post` snapshots on entry-and-return,
(2) JAL/J/JR/JALR retire log inside the helper with statically-
decoded targets and "LEAVES helper" annotations.
**Literal verdict the task emits:** `helper_static_ram_gate_found
(EA=0x801FFDE4 returns identical 0xBFC52998 across 8 hits —
region=EE_RAM)`.
**Structural verdict (visible in the stream + CF table):**
**`helper_is_thunk` — the helper is another one-call thunk, this
time to `0xBFC4F320`.** The literal label is a known false-positive
(see "Verdict-label nuance" below); the real polled gate is still
one frame deeper.
## Codex Ch265 acceptance — line-by-line
| Codex requirement | Status | Where |
|-------------------------------------------------------------------------|--------|--------------------------------------------------|
| Reuse Ch264 observer one frame deeper at 0xBFC4D370..0xBFC4D470 | ✅ | `CH265_HELPER_LO/HI` = `0xBFC4D370/D470` |
| Same region tagging and compact tables | ✅ | `ch265_region_name` task; same shape as Ch264 |
| Capture non-fetch data reads only | ✅ | Same `!ch265_is_fetch` predicate as Ch264 |
| Include calls/jumps out of the helper | ✅ | `HELPER_CONTROL_FLOW` table — J/JAL/JR/JALR retires inside helper, with statically-decoded J/JAL target and "LEAVES helper" notes |
| Track $a0=0x0F at entry and returned $v0 | ✅ | `HELPER_PASSES` table with `$a0_in`/`$ra_in`/`$v0_post`/`$v1_post` |
| Compare pass 0 versus steady-state passes 1..8 | ✅ | `pass=N` column in every table; trivial visual diff |
| Verdicts mirror Ch264 + helper_is_thunk | ✅ | 5-way verdict logic |
| No new side-effect stubs | ✅ | TB-only addition; no RTL touched |
| Regression unaffected | ✅ | 157 / 157 with target off-by-default |
## What the autopsy showed
### HELPER_PASSES (per-invocation entry/exit register snapshots)
The helper is called from many places, not just from the Ch264
callee. The first 7 invocations are pre-treadmill BIOS init with
varying `$a0_in` (0xF, 0xE, 0x1, 0x4, 0x5, 0x6, 0x7). The
treadmill itself (cycles 10.2M onward) shows a **deterministic
pair every Ch217 pass**:
```
[7] cyc=10194426 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[8] cyc=10194505 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
[9] cyc=20095076 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[10] cyc=20095155 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
...repeats every Ch217 pass...
```
Two callers, interleaved:
| Caller location | `$a0` | Return target |
|--------------------------|-------|----------------|
| Ch264 callee at 0xBFC52990 | 0x0F | 0xBFC52998 |
| Ch217 trampoline at 0xBFC52360 | 0x07 | 0xBFC52368 |
The `$a0=0x07` path's `$v0_post` and `$v1_post` are unknown (`x`,
printed as `???` above) because the exit predicate was scoped only
to "return-to-Ch264-callee" (PC=0xBFC52998), so the return to the
Ch217 trampoline was never sampled. Future autopsy refinement: also
exit on PC=0xBFC52368 to capture the other arm's `$v0`. This doesn't
change the structural conclusion.
The `$a0=0x0F` path returns `$v0=0xA000A8C8` identically every
treadmill pass — that matches the Ch217 outer-caller's
`$v0_post=0xa000a8c8` exactly. Consistency check ✓.
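
The per-caller consistency check above can also be done post-hoc on the printed table. The following is a minimal Python sketch, not part of the testbench: the regex and the `group_by_caller` helper are hypothetical, and the sample lines are copied from the `HELPER_PASSES` excerpt above. It groups invocations by `$ra_in` and collects the distinct `$v0_post` values for each caller.

```python
import re
from collections import defaultdict

# Sample lines copied from the HELPER_PASSES excerpt above.
LOG = """\
[7] cyc=10194426 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[8] cyc=10194505 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
[9] cyc=20095076 $a0_in=0x0F $ra_in=0xBFC52998 $v0_post=0xA000A8C8 $v1_post=0x00000008
[10] cyc=20095155 $a0_in=0x07 $ra_in=0xBFC52368 $v0_post=??? $v1_post=???
"""

# Hypothetical parser for the line shape shown above.
LINE_RE = re.compile(
    r"\[(?P<idx>\d+)\]\s+cyc=(?P<cyc>\d+)\s+\$a0_in=(?P<a0>\S+)\s+"
    r"\$ra_in=(?P<ra>\S+)\s+\$v0_post=(?P<v0>\S+)\s+\$v1_post=(?P<v1>\S+)"
)

def group_by_caller(text):
    """Map each $ra_in to the set of $v0_post values seen for it.

    An uncaptured value (printed as ???) is stored as None.
    """
    groups = defaultdict(set)
    for m in LINE_RE.finditer(text):
        ra = int(m["ra"], 16)
        v0 = m["v0"]
        groups[ra].add(None if v0 == "???" else int(v0, 16))
    return dict(groups)

groups = group_by_caller(LOG)
for ra, v0s in sorted(groups.items()):
    shown = sorted("uncaptured" if v is None else f"0x{v:08X}" for v in v0s)
    print(f"$ra_in=0x{ra:08X} -> $v0_post in {shown}")
```

A caller whose set holds exactly one captured value is stable across the sampled passes; a set holding only `None` marks an arm the exit predicate never sampled.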
### HELPER_CONTROL_FLOW (every JAL/J/JR/JALR retired inside helper)
```
pc=0xBFC4D380 instr=0x0FF13CC8 jal target=0xBFC4F320 <-- LEAVES helper
pc=0xBFC4D390 instr=0x03E00008 jr target=0x00000000
```
Repeated 47 times (every helper invocation hits this exact pair).
**The helper has exactly one JAL out, every time, to
`0xBFC4F320`.** No conditional branches, no other JALs, no JR
that isn't the function epilog. This is a one-call thunk by
structure.
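
The statically-decoded target in the table can be reproduced by hand. This is a small Python sketch of the standard MIPS J/JAL target computation, not the testbench's `ch265_jtarget`; the function names and the inclusive window bounds are assumptions made for illustration.

```python
HELPER_LO, HELPER_HI = 0xBFC4D370, 0xBFC4D470  # window from this chapter

def decode_jump_target(pc, instr):
    """Return the static target of a MIPS J/JAL at `pc`, else None.

    JR/JALR targets live in a register, so they cannot be decoded statically.
    """
    opcode = (instr >> 26) & 0x3F
    if opcode not in (0x02, 0x03):  # 0x02 = J, 0x03 = JAL
        return None
    delay_slot = (pc + 4) & 0xFFFFFFFF
    # Upper 4 bits come from the delay-slot PC, lower 28 from the instruction.
    return (delay_slot & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)

def leaves_window(target, lo=HELPER_LO, hi=HELPER_HI):
    """True when the target falls outside the helper window (assumed inclusive)."""
    return not (lo <= target <= hi)

target = decode_jump_target(0xBFC4D380, 0x0FF13CC8)
print(f"jal target=0x{target:08X} leaves={leaves_window(target)}")
print(decode_jump_target(0xBFC4D390, 0x03E00008))  # jr $ra: no static target
```

The `jr` row has no static target, which is why the log prints `target=0x00000000` for it.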
### HELPER_BODY_DATA_READS (every non-fetch read inside helper)
23 reads captured. **All from the single PC `0xBFC4D388`**
which is the instruction immediately after the JAL's delay slot,
i.e. the saved-`$ra` reload (`lw $ra,N($sp)` in the standard
MIPS epilog).
Three distinct EAs, all in EE_RAM:
| EA | Hits | Pass mask | First data | data_varies | What it is |
|-------------|------|-----------|------------------|-------------|------------|
| 0x801FFEE4 | 2 | 0x0001 | 0xBFC528AC | yes | Pre-treadmill $sp's $ra slot (only during BIOS init) |
| 0x801FFDFC | 13 | 0x01FF | 0xBFC521C4..0xBFC52368 | yes | Ch217 trampoline's $sp+$ra-slot ($a0=0x07 caller) |
| 0x801FFDE4 | 8 | 0x01FE | 0xBFC52998 | **no** | Ch264 callee's $sp+$ra-slot ($a0=0x0F caller) — stable because that caller never changes |
Each helper invocation reads exactly one EA — the saved `$ra` at
its caller-determined stack frame. **There is no MMIO read. No
kernel-global read. No timer read. No non-stack read of any
kind.** The helper body is structurally the same shape as the
Ch264 callee: prologue → JAL → restore `$ra` from stack → JR.
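
To read the `Pass mask` column, it helps to expand each mask into pass indices. The sketch below assumes bit N of the mask corresponds to pass N, which matches the "pass 0 only" reading of `0x0001` in the table but is not stated explicitly; `passes_in_mask` is a hypothetical helper.

```python
def passes_in_mask(mask):
    """Expand a pass bitmask into a sorted list of pass indices.

    Assumption: bit N set means the EA was read during pass N.
    """
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

for ea, mask in [(0x801FFEE4, 0x0001), (0x801FFDFC, 0x01FF), (0x801FFDE4, 0x01FE)]:
    print(f"EA=0x{ea:08X} mask=0x{mask:04X} passes={passes_in_mask(mask)}")
```

Under that assumption, `0x01FE` covers passes 1 through 8 and excludes pass 0, consistent with the `$a0=0x0F` caller appearing only once the treadmill starts.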
## Verdict-label nuance — false-positive
The literal verdict `helper_static_ram_gate_found
(EA=0x801FFDE4 ... data=0xBFC52998)` is a **known
false-positive of the stable-EA heuristic**. The condition
"appears in ≥2 passes AND data doesn't vary" is satisfied
because the Ch264-callee-side caller path is itself stable
(every pass the helper is entered with the same `$ra=0xBFC52998`,
so the saved-$ra slot reload returns the same value).
But `0xBFC52998` is **exactly `$ra_in + 0`** for the Ch264-callee
caller — i.e. it's the return address that the helper itself
stashed on entry, not a polled state. Reading it back yields a
stable value because the caller doesn't change, **not** because
external state is settled.
The stack-only check (`abs(ea - first_ea) ≤ 0x40 && region=EE_RAM`)
didn't filter this out either. The two treadmill caller paths save
`$ra` at 0x801FFDE4 and 0x801FFDFC, only 0x18 apart, but the
pre-treadmill slot at 0x801FFEE4 widens the spread across all three
EAs to 0x100 (0x801FFEE4 - 0x801FFDE4 = 0x100), which exceeds the
0x40 sibling threshold.
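
The arithmetic behind the failed sibling check can be worked through directly. This Python sketch assumes `first_ea` is the first EA in table order (0x801FFEE4); the testbench's actual choice of `first_ea` is not shown in this document, and `all_siblings` is a hypothetical name.

```python
SIBLING_THRESHOLD = 0x40

# EAs from HELPER_BODY_DATA_READS, in table order (assumed capture order).
EAS = [0x801FFEE4, 0x801FFDFC, 0x801FFDE4]

def all_siblings(eas, threshold=SIBLING_THRESHOLD):
    """True when every EA lies within `threshold` bytes of the first one."""
    first = eas[0]
    return all(abs(ea - first) <= threshold for ea in eas)

for ea in EAS:
    print(f"0x{ea:08X}: distance from first = 0x{abs(ea - EAS[0]):X}")
print("all three are siblings:", all_siblings(EAS))
print("treadmill pair alone:  ", all_siblings(EAS[1:]))
```

With the pre-treadmill slot included, the distances are 0xE8 and 0x100, so the check cannot classify the set as one stack neighbourhood. The two treadmill slots on their own are 0x18 apart and would pass.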
A more robust heuristic would discount any stable read whose
returned value equals the caller's `$ra_in` (i.e. detect saved-
$ra reloads explicitly). Not blocking — the control-flow table
makes the structural truth obvious without the heuristic. Future
Ch266+ autopsies can incorporate this filter.
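
One possible shape for that filter, as a post-hoc Python sketch rather than testbench code. The read list is an illustrative subset of the table above (the pass indices attached to each value are assigned for illustration only), and the filter is simplified to "value equals any observed `$ra_in`" instead of tracking the most-common `$ra_in` per caller.

```python
from collections import defaultdict

# Illustrative subset of HELPER_BODY_DATA_READS: (ea, pass_index, data).
READS = [
    (0x801FFEE4, 0, 0xBFC528AC),
    (0x801FFDFC, 0, 0xBFC521C4),
    (0x801FFDFC, 1, 0xBFC52368),
    (0x801FFDE4, 1, 0xBFC52998),
    (0x801FFDE4, 2, 0xBFC52998),
]
# $ra_in values observed in HELPER_PASSES during the treadmill.
RA_IN_VALUES = {0xBFC52998, 0xBFC52368}

def stable_gate_candidates(reads, ra_in_values=frozenset()):
    """Return (ea, value) pairs seen in >=2 passes with non-varying data.

    Any candidate whose value matches a known $ra_in is treated as a
    saved-$ra reload and dropped.
    """
    by_ea = defaultdict(list)
    for ea, pass_idx, data in reads:
        by_ea[ea].append((pass_idx, data))
    candidates = []
    for ea, hits in sorted(by_ea.items()):
        passes = {p for p, _ in hits}
        values = {d for _, d in hits}
        if len(passes) < 2 or len(values) != 1:
            continue  # not stable across passes
        (value,) = values
        if value in ra_in_values:
            continue  # saved-$ra reload, not polled state
        candidates.append((ea, value))
    return candidates

unfiltered = stable_gate_candidates(READS)
filtered = stable_gate_candidates(READS, RA_IN_VALUES)
print([(hex(ea), hex(v)) for ea, v in unfiltered])
print(filtered)
```

Without the filter the sketch reproduces the false positive at 0x801FFDE4; with it, no candidate survives, which agrees with the control-flow table.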
## What this means for the search
After Ch263+Ch264+Ch265, the structural picture:
```
Ch217 trampoline 0xBFC52340..60
  -> JAL 0xBFC52984 (Ch264 callee, $a0=2)
       -> sw $ra,0x14($sp)
       -> JAL 0xBFC4D370 (Ch265 helper, $a0=0x0F) ← thunk
            -> sw $ra,N($sp)
            -> JAL 0xBFC4F320 (Ch266 target) ← thunk to ???
                 -> ???
            -> lw $ra,N($sp)
            -> jr $ra
       -> lw $ra,0x14($sp)
       -> jr $ra
  -> JAL 0xBFC4D370 again with $a0=0x07 (Ch217 post-call path)
       same thunk to 0xBFC4F320
```
**Every layer so far has been a wrapper.** The actual work — the
polled-state lookup — has not yet appeared. It almost certainly
lives at or below `0xBFC4F320`.
The constant `$a0=0x0F` selector passing all the way through
`0xBFC52984` -> `0xBFC4D370` -> `0xBFC4F320` strongly suggests
this is a **selector-dispatched BIOS API**: something like
`GetXY(selector=0x0F)`. The Ch217 outer-caller also calls this
chain with `$a0=2`, and the Ch217 trampoline's second JAL goes
through with `$a0=0x07`. Different selectors, same dispatcher.
This is a classic PS2 BIOS pattern: a single entry point with a
selector argument.
`$v0=0xA000A8C8` is a kernel-space pointer: 0xA0000000..0xBFFFFFFF
is `kseg1`, the unmapped, uncached window onto physical memory, so
this is the uncached alias of physical RAM at 0x0000A8C8 (`kseg0`
at 0x80000000.. is the cached alias of the same memory). That
return value being constant every pass is consistent with the
dispatch returning a **pointer to a stable kernel structure**,
which the longjmp-return caller then uses as a jump table base
or as a data source.
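
For reference, the segment of any address in this note can be checked with a few comparisons. The sketch assumes the conventional 32-bit MIPS segment layout (kuseg, kseg0, kseg1, kseg2/kseg3); `classify` is a hypothetical helper, not the testbench's `ch265_region_name` task, and it reports only the segment and the unmapped physical address, not the PS2 region.

```python
def classify(vaddr):
    """Return (segment, physical_address_or_None) for a 32-bit virtual address.

    Only kseg0 and kseg1 are unmapped windows with a fixed physical alias;
    the other segments go through the TLB, so None is returned for them.
    """
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return ("kuseg", None)
    if vaddr < 0xA0000000:
        return ("kseg0", vaddr - 0x80000000)  # cached alias
    if vaddr < 0xC0000000:
        return ("kseg1", vaddr - 0xA0000000)  # uncached alias
    return ("kseg2/kseg3", None)

for addr in (0xA000A8C8, 0x801FFDE4, 0xBFC4F320):
    seg, phys = classify(addr)
    print(f"0x{addr:08X}: {seg}, phys=0x{phys:08X}")
```

Under that layout the returned pointer and the saved-`$ra` slots both resolve to low physical RAM, while the BIOS addresses resolve to physical 0x1FC4xxxx.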
## Recommendation for Ch266
**Recurse one more frame, to `0xBFC4F320`.** Same observer
pattern, bump the PC window. Expected outcomes (in order of
likelihood, based on the chain so far):
1. **`helper_is_thunk` again** — `0xBFC4F320` is also a wrapper
to something deeper. Then Ch267 follows its JAL out.
2. **`helper_static_mmio_gate_found`** — `0xBFC4F320` reads from
some PS2 MMIO region (EE INTC, EE BIU, EE_MISC_MMIO, or
`0x1FA00000` which was the Ch263 deferred Pivot 2). That's
the gate. Ch267 models the device.
3. **`helper_static_ram_gate_found`** with a non-stack EA — a
kernel global in EE_RAM. Ch267 models what writes there.
Implementation notes for the autopsy itself:
- The verdict heuristic should add a saved-`$ra` filter: discount
any stable EA whose returned value equals the most-common
`$ra_in` for the same caller. This could be done in the autopsy
itself, or post-hoc by reading the stream. Either way, note it in
the autopsy block.
- The `HELPER_PASSES` exit predicate (PC=0xBFC52998) was
scoped to the Ch264-callee return; the Ch217-trampoline
caller's return was missed. For Ch266 (assuming again a
single primary caller from the deeper helper), pick the
most-frequent caller's post-JAL PC and gate exit on that.
Alternatively widen exit: trigger on ANY retire whose PC is
outside the helper window and was reached from inside in the
immediately preceding cycle. Not critical.
- The `CH265_PASSES` cap of 16 is fine for 8 Ch217 passes ×
2 caller paths per pass = 16 invocations. For the next layer
bump to 32 to leave headroom.
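
The widened exit predicate from the second note can be sketched over a list of retired PCs. This is a hypothetical Python model, not testbench code, and the trace is an illustrative sequence built from the addresses in this note, not captured output. One caveat the sketch makes visible: "outside the window right after inside" also fires on the JAL out to `0xBFC4F320`, so the exit still has to be compared against the recorded `$ra_in` to tell a call-out from a return.

```python
HELPER_LO, HELPER_HI = 0xBFC4D370, 0xBFC4D470

# Illustrative retire order for one $a0=0x0F invocation (not captured data).
TRACE = [
    0xBFC52990,  # 0: JAL in the Ch264 callee
    0xBFC52994,  # 1: its delay slot
    0xBFC4D370,  # 2: helper entry
    0xBFC4D380,  # 3: JAL out of the helper
    0xBFC4D384,  # 4: its delay slot
    0xBFC4F320,  # 5: first retire in the deeper target
    0xBFC4D388,  # 6: saved-$ra reload
    0xBFC4D390,  # 7: jr $ra
    0xBFC4D394,  # 8: its delay slot
    0xBFC52998,  # 9: back in the caller
]

def window_exits(retired_pcs, ra_in, lo=HELPER_LO, hi=HELPER_HI):
    """List (index, pc, kind) for each retire outside the window that
    immediately follows a retire inside it."""
    exits = []
    prev_inside = False
    for i, pc in enumerate(retired_pcs):
        inside = lo <= pc <= hi
        if prev_inside and not inside:
            kind = "return" if pc == ra_in else "call-out"
            exits.append((i, pc, kind))
        prev_inside = inside
    return exits

result = window_exits(TRACE, ra_in=0xBFC52998)
for i, pc, kind in result:
    print(f"retire {i}: pc=0x{pc:08X} {kind}")
```

Snapshotting `$v0`/`$v1` only on the `return` events would cover both callers without hard-coding either post-JAL PC.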
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv` — added
`` `ifdef CH265_HELPER_AUTOPSY `` block. New structure: data-read
capture (mirror of Ch264), `$a0/$ra/$v0/$v1` per-invocation
snapshots, control-flow capture with `peek_instr`-driven
opcode decode and J/JAL-target computation, region-name task,
`ch265_cf_mnemonic` function for prettier prints, full
5-way verdict logic. Two `ch265_print_autopsy()` call sites
(halt + timeout exits), both gated by the ifdef.
- `sim/Makefile` — new `tb_ee_core_bios_long_helper_autopsy`
target (only `-DCH265_HELPER_AUTOPSY`).
## iverilog 12 gotchas hit (and avoided)
1. **Bit-select on parenthesized function-result expression.**
First version had `{ (pc + 32'd4)[31:28], instr[25:0], 2'b00 }`
inside `ch265_jtarget`. iverilog rejected it with "Malformed statement."
Fix: compute `dslot = pc + 32'd4` into a temp, then bit-select
`dslot[31:28]`. (Already documented in
[[project-self-driven-milestone]] — bit-select on function
return; same shape.)
2. **Wrong identifier names for trace_pkg constants.** First
version used bare `EV_READ` / `SUBSYS_MEM` / `ee_map_ev_kind`.
The right names are `trace_pkg::EV_READ` / `trace_pkg::SUBSYS_MEM` /
`ee_map_ev_event`. Easy to confirm by grepping existing Ch218
and Ch264 capture code.
## Regression
Full regression: 157 / 157 with the new target off by default
(`CH265_HELPER_AUTOPSY` undefined for routine builds).
Standing by for Codex's Ch266 call. Recommendation: recurse to
`0xBFC4F320`. Same observer infrastructure; bump the parameter.