retroDE_ps2/docs/ch268_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch268 closeout — outer caller body emits ZERO non-fetch reads
**Status:** Closed. The widened read autopsy across the
longjmp-return OUTER CALLER body (PC `0xBFC52340..0xBFC52400`)
captured **zero** non-fetch reads in the entire BIOS-long run.
**Verdict:** `outer_no_reads`.
By inspection of the Ch217 outer-caller dump, this is not an
autopsy bug; the body really doesn't issue any loads:
```
0xBFC52350: beq $v0, $0, +0xC ; conditional branch ← THE DECISION
0xBFC52354: nop
0xBFC52358: jal <Ch264 callee>
0xBFC5235C: addiu $a0, $0, 0x385
0xBFC52360: jal <helper directly>
0xBFC52364: addiu $a0, $0, 0x07
0xBFC52368: jal <handler3>
0xBFC5236C: nop
0xBFC52370: jal <handler4>
0xBFC52374: addiu $a0, $0, 0x08
0xBFC52378: lui $v0, 0x1F80
0xBFC5237C: ori $v0, $v0, 0x1070
0xBFC52380: sw $0, 4($v0) ; W I_MASK
0xBFC52384: jal <handler5>
0xBFC52388: sw $0, 0($v0) ; W I_STAT
0xBFC5238C: lui $a0, 0xBFC6
```
The listing contains no load instructions (`lw`/`lb`/`lh` or any
other variant). The only mnemonics are `beq`, `nop`, `jal`, `addiu`,
`lui`, `ori`, and `sw`, so the outer caller body consists **entirely
of control flow, immediate compute, JALs, and writes**, with
no memory reads to gate on.
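
The "no loads" claim can be checked mechanically. The following is a minimal sketch, assuming the mnemonics are transcribed by hand from the listing above and that the set of load mnemonics shown here is sufficient (it is an assumption, not something taken from the testbench). Note that `lui` loads an immediate into a register and does not touch memory.

```python
# Mnemonics transcribed by hand from the listing above (0xBFC52350..0xBFC5238C).
LISTING = [
    "beq", "nop", "jal", "addiu", "jal", "addiu", "jal", "nop",
    "jal", "addiu", "lui", "ori", "sw", "jal", "sw", "lui",
]

# Assumed set of MIPS memory-load mnemonics; extend it if the core decodes others.
LOAD_MNEMONICS = {
    "lb", "lbu", "lh", "lhu", "lw", "lwu", "lwl", "lwr", "ld", "ldl", "ldr", "lq",
}


def find_loads(mnemonics):
    """Return (index, mnemonic) for every memory load in the sequence."""
    return [(i, m) for i, m in enumerate(mnemonics) if m in LOAD_MNEMONICS]


loads = find_loads(LISTING)
distinct = sorted(set(LISTING))
print("loads:", loads)
print("distinct mnemonics:", distinct)
```

An empty `loads` list is the static counterpart of the `outer_no_reads` verdict for the listed range.
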
## What this means
The BEQ at `0xBFC52350` is testing `$v0 == 0`. Per Ch217:
**`$v0_pre = 0x00000001` every Ch217 pass** — i.e. the
condition `$v0 != 0` always holds, the branch is never taken,
and the JAL chain always runs.
**The actual gate is whatever sets `$v0` BEFORE PC=`0xBFC52350`.**
Crucially, this means:
- The gate is **outside the autopsy window we just scanned**.
- The gate is the instruction (or sequence) that computes
`$v0` before the BEQ — almost certainly a load from
somewhere, or a function return that propagates a memory
read upward.
- If something could set `$v0 = 0` between Ch217 passes, the
BEQ would be taken, the BIOS would skip the entire JAL chain (and
the post-chain INTC clears), and execution would diverge;
i.e. the treadmill would break.
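
The decision described in the bullets above reduces to a single comparison. This is a toy model only, assuming nothing beyond the standard `beq` semantics (branch when the two registers are equal); the function names are hypothetical and do not exist in the RTL or testbench.

```python
def beq_taken(rs: int, rt: int) -> bool:
    """MIPS beq: the branch is taken when the two 32-bit register values are equal."""
    return (rs & 0xFFFFFFFF) == (rt & 0xFFFFFFFF)


ZERO = 0  # $0 always reads as zero


def jal_chain_runs(v0_pre: int) -> bool:
    """The JAL chain sits on the fall-through path of `beq $v0, $0, ...`."""
    return not beq_taken(v0_pre, ZERO)


observed = jal_chain_runs(0x00000001)   # $v0_pre reported by Ch217 on every pass
perturbed = jal_chain_runs(0x00000000)  # hypothetical perturbation of the producer
print(observed, perturbed)
```
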
## Codex Ch268 acceptance — line-by-line
| Codex requirement | Status | Where |
|----------------------------------------------------------------------------|--------|-------|
| Observe 0xBFC52340..0xBFC52400 | ✅ | `CH268_OUTER_LO/HI` |
| Capture non-fetch data reads only | ✅ | EV_READ + `!is_fetch` predicate |
| Bucket by EA AND alias-normalized phys | ✅ | `ch268_phys[i] = ee_map_ev_arg0[28:0]`; dedup keyed on phys |
| Per-bucket: hits, PCs, per-pass values, data-varies, region | ✅ | DISTINCT_PHYS_EAs report (would have fired with non-zero captures) |
| Pass index isolated (pass 0 vs 1..8) | ✅ | `pass=` column + gate logic excludes pass 0 |
| Ignore stack reads + saved-register reloads | ✅ | `ch268_ea_is_stack()` using $sp captured at JAL site |
| 5-way verdict | ✅ | outer_static_{ram,mmio}_gate_found / only_stack / no_reads / vary |
| Regression unaffected | ✅ | 157 / 157 with target off-by-default |
| Don't jump to INTC semantics yet | ✅ | Did not touch INTC stub or jump to assumptions |
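
The bucketing rows in the table can be illustrated outside the simulator. This is a minimal sketch, not the testbench code: the mask mirrors the `[28:0]` slice and the PC-tracker depth of 4 comes from this closeout, while the event layout, the sample addresses, and the choice to compute `data_varies` over passes 1 and later are assumptions made for illustration.

```python
PHYS_MASK = 0x1FFFFFFF  # keep bits [28:0], mirroring ee_map_ev_arg0[28:0]
MAX_PCS = 4             # per-bucket PC tracker depth mentioned in this closeout


def bucket_reads(events):
    """Group read events by alias-normalized physical address.

    events: iterable of dicts with keys pass, pc, ea, data (hypothetical layout).
    """
    buckets = {}
    for ev in events:
        phys = ev["ea"] & PHYS_MASK
        b = buckets.setdefault(phys, {"hits": 0, "pcs": [], "values": {}})
        b["hits"] += 1
        if ev["pc"] not in b["pcs"] and len(b["pcs"]) < MAX_PCS:
            b["pcs"].append(ev["pc"])
        b["values"][ev["pass"]] = ev["data"]  # keeps the last value seen in each pass
    for b in buckets.values():
        # Assumption: pass 0 is excluded when deciding whether the data varies.
        later = {v for p, v in b["values"].items() if p >= 1}
        b["data_varies"] = len(later) > 1
    return buckets


# Made-up events: two aliases of one RAM word, plus one MMIO-style address.
sample = [
    {"pass": 0, "pc": 0x100, "ea": 0x80001000, "data": 9},
    {"pass": 1, "pc": 0x100, "ea": 0x80001000, "data": 5},
    {"pass": 2, "pc": 0x100, "ea": 0xA0001000, "data": 5},
    {"pass": 1, "pc": 0x104, "ea": 0x1F801070, "data": 0},
    {"pass": 2, "pc": 0x104, "ea": 0x1F801070, "data": 4},
]
buckets = bucket_reads(sample)
for phys, b in buckets.items():
    print(hex(phys), b["hits"], b["data_varies"])
```

With no captured events the result is an empty dict, which is the situation this chapter reports as `outer_no_reads`.
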
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv` — added
`` `ifdef CH268_OUTER_READ_AUTOPSY `` block. Captures: per-event
($pass/PC/EA/phys/data/region); per-pass $sp (so the stack
filter can be per-pass-accurate). Print task with: stream,
alias-normalized bucketing, per-bucket PC tracker (up to 4),
per-bucket per-pass value table, alias-mask, 5-way verdict.
Two `ch268_print_autopsy()` call sites (halt + timeout exits).
- `sim/Makefile` — new `tb_ee_core_bios_long_outer_read_autopsy`
target (only `-DCH268_OUTER_READ_AUTOPSY`).
## iverilog 12 quirks hit
None new. Used flat 1D arrays (with `bucket*SLOTS+k` indexing)
to avoid 2D-unpacked-array surprises. Same pattern that
Ch264/265/266/267 used. Clean first-try compile.
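
The `bucket*SLOTS+k` pattern is ordinary row-major flattening. The sketch below shows the index arithmetic in Python rather than SystemVerilog; the sizes and the helper name are hypothetical and are not taken from the testbench.

```python
N_BUCKETS = 8  # hypothetical bucket count
SLOTS = 4      # hypothetical slots per bucket

# One flat array stands in for a 2D [N_BUCKETS][SLOTS] table.
flat = [0] * (N_BUCKETS * SLOTS)


def flat_index(bucket: int, k: int) -> int:
    """Map (bucket, slot) to a position in the flat array, with bounds checks."""
    if not (0 <= bucket < N_BUCKETS and 0 <= k < SLOTS):
        raise IndexError((bucket, k))
    return bucket * SLOTS + k


flat[flat_index(2, 3)] = 0xBFC52350
print(flat_index(2, 3), hex(flat[flat_index(2, 3)]))
```

The mapping is collision-free as long as `k` stays below `SLOTS`, which is why the bounds check matters more here than it would with a true 2D array.
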
## Recommendation for Ch269
**Trace back to where `$v0` gets set BEFORE the BEQ.**
The autopsy framework worked exactly as designed — it
correctly reported zero reads, because there genuinely are
zero reads in the scanned window. The structural lesson is
that the gate is upstream of `0xBFC52350`.
**Three concrete next steps, cheapest first:**
**(A) Widen the PC window backwards.** Re-run Ch268 with
`CH268_OUTER_LO = 0xBFC52300` (or `0xBFC52280`) to cover the
predecessor block of the BEQ. The instruction sequence
leading INTO `0xBFC52350` almost certainly includes the load
or compute that produces the `$v0=1` value. Same observer,
zero changes other than the PC window. Cheap.
**(B) Track all writes to `$v0` (regfile[2]) inside the
treadmill.** Add a tap on `u_core.regfile[2]` and log every
cycle it changes, with the retiring PC and `core_ev_valid`.
Filter to the treadmill window (post-Ch217-pass-0). The
last write to `$v0` BEFORE PC=`0xBFC52350` is the producer
we want to identify. Slightly more surgical than (A) but
needs more wiring.
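
Option (B) amounts to a small post-processing pass over the tap's log. Here is a minimal sketch, assuming a log of `(retiring PC, $v0 value after retirement)` pairs in retirement order; that format, the function name, and the toy addresses are all hypothetical.

```python
def last_v0_writer(retire_log, beq_pc):
    """For each arrival at beq_pc, report the most recent (pc, value) that changed $v0.

    retire_log: list of (pc, v0_after_retire) tuples in retirement order
    (hypothetical log format).
    """
    producers = []
    last_change = None
    prev = None
    for pc, v0 in retire_log:
        if pc == beq_pc:
            producers.append(last_change)
        if prev is not None and v0 != prev:
            last_change = (pc, v0)
        prev = v0
    return producers


# Toy log with made-up addresses: $v0 goes 0 -> 1 at 0x104, and the branch sits at 0x110.
log = [(0x100, 0), (0x104, 1), (0x108, 1), (0x110, 1)]
producers = last_v0_writer(log, beq_pc=0x110)
print(producers)
```

A change-based tap like this cannot see a write that stores the value `$v0` already holds, so if the producer rewrites `1` over `1` on every pass, the tap would need to key on a register write-enable instead.
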
**(C) Trace back from the function entry.** The function
containing `0xBFC52350` has an entry point somewhere
earlier — usually preceded by a JR/JALR/J that crossed into
it. Reading the BIOS dump near `0xBFC52340` and walking
backward to find the prologue (`addiu $sp,$sp,-N; sw $ra,...`)
identifies the function bounds; then Ch269 can autopsy the
whole function.
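
The backward walk in (C) can be sketched as a scan for the stack-frame allocation. This assumes the dump is available as a list of 32-bit instruction words starting at a known base address; the helper names and the toy words are illustrative and not part of the repo's tooling. The match relies on the standard encoding of `addiu $sp, $sp, imm` (upper halfword `0x27BD`), with a negative immediate marking a frame allocation.

```python
def is_sp_decrement(word: int) -> bool:
    """True for `addiu $sp, $sp, -N`: upper halfword 0x27BD and a negative imm16."""
    return (word & 0xFFFF0000) == 0x27BD0000 and (word & 0x8000) != 0


def find_prologue(words, base, addr, max_back=256):
    """Walk backward from addr; return the address of the nearest frame allocation.

    words: 32-bit instruction words, words[0] located at address `base`.
    Returns None if no match is found within max_back words.
    """
    idx = (addr - base) // 4
    for i in range(idx, max(idx - max_back, -1), -1):
        if is_sp_decrement(words[i]):
            return base + 4 * i
    return None


# Toy example at a made-up base address:
words = [
    0x00000000,  # nop
    0x27BDFFE0,  # addiu $sp, $sp, -32
    0xAFBF001C,  # sw    $ra, 0x1C($sp)
    0x00000000,  # nop
    0x10400003,  # beq   $v0, $0, +3
]
entry = find_prologue(words, base=0x1000, addr=0x1010)
print(hex(entry))
```

The nearest preceding frame allocation is only a candidate entry point: a frameless leaf function between it and the BEQ would make the result point at the wrong function, so the bounds still need a manual check against the dump.
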
(A) has the highest expected value. If the predecessor block contains a
load, that's the gate. If it contains only register-to-register
moves, we need (B) or (C) to trace back further. Either way,
the search has narrowed dramatically: the gate is now a
well-bounded "find what set `$v0` before `0xBFC52350`" question.
**Standing by for Codex's Ch269 call.**
One subtle note: the BEQ is testing `$v0 == 0`. If we ever
find the producer and want to perturb it, setting `$v0 = 0`
between passes (e.g. by writing 0 to whatever memory the
producer reads) should break the treadmill. That's a clean
hypothesis test.
## Regression
Full regression: 157 / 157 with the new target off by default
(`CH268_OUTER_READ_AUTOPSY` undefined for routine builds).