thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch267 closeout — `0xA000A8C8` is NOT the polled gate. The chain just clears it; nothing reads it.
**Status:** Closed. Phase 1 passive observation **rules out**
`0xA000A8C8` as a polled gate.
**Verdict:** `gate_only_cleared_never_polled`.
**Headline counts** across the entire BIOS-long run (93 accesses
to phys `0x0000A8C8`, all through the kseg1 alias):
| Role                | Count | Breakdown                            |
|---------------------|-------|--------------------------------------|
| clearer(dispatcher) | 69    | 3 SWs × 23 dispatcher invocations    |
| clearer(other)      | 24    | 1 init-time + 23 helper-frame writes |
| writer(non-zero)    | **0** |                                      |
| poller(read)        | **0** |                                      |
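The five verdict labels can be read as a short decision chain over these counts. The sketch below restates that chain in Python. It is hypothetical and is not the SystemVerilog in `tb_ee_core_bios_smoke.sv`. The `GateCounts` type, the `ch267_verdict` name and, above all, the order of the checks are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GateCounts:
    clearer_disp: int             # zero-writes from PCs inside the dispatcher window
    clearer_other: int            # zero-writes from any other PC
    writer_nonzero: int           # writes of a non-zero value
    poller_read: int              # non-fetch reads
    alias_mismatch: bool = False  # traffic seen on an unexpected alias (assumed flag)


def ch267_verdict(c: GateCounts) -> str:
    """Hypothetical mapping from counts to the observer's 5-way labels."""
    total = c.clearer_disp + c.clearer_other + c.writer_nonzero + c.poller_read
    if c.alias_mismatch:
        return "gate_alias_mismatch"
    if total == 0:
        return "gate_no_traffic_at_all"
    if c.writer_nonzero > 0:
        return "gate_nonzero_writer_found"
    if c.poller_read > 0:
        return "gate_polled_zero_no_writer"   # read, but only ever holds zero
    return "gate_only_cleared_never_polled"   # written zero, never read back
```

With this run's counts (69 + 24 clears, no writers, no pollers), `ch267_verdict(GateCounts(69, 24, 0, 0))` returns `"gate_only_cleared_never_polled"`.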
**Action per Codex's gate:** Do **NOT** proceed to Phase 2
(`0xA000A8C8` poke). The address is a *write target*, not a
polled value. The treadmill must be gating on something else.
## Codex Ch267 Phase 1 acceptance — line-by-line
| Codex requirement | Status | Where |
|-------------------------------------------------------------------------------------|--------|-------|
| Key on phys 0x0000A8C8, accept all three kseg/kuseg aliases | ✅ | `CH267_PHYS_TARGET = 29'h000_A8C8` (matches low 29 bits of EA) |
| Capture every EE map access to that word | ✅ | `ch267_*` arrays, cap=1024 |
| Classify each as clearer / writer / poller | ✅ | `ch267_role_name` task |
| Distinguish dispatcher clearer (PC in 0xBFC4F320..F520) vs other | ✅ | `ch267_in_disp` field |
| Log PC, access type, value, pass index, pre/post-clear | ✅ | full stream output |
| Suppress dispatcher clears beyond first-per-pass | ✅ | `dc_per_pass[]` filter (kept the first, counted and suppressed the rest) |
| 5-way verdict labels | ✅ | `gate_alias_mismatch` / `gate_nonzero_writer_found` / `gate_polled_zero_no_writer` / `gate_only_cleared_never_polled` / `gate_no_traffic_at_all` |
| Regression unaffected | ✅ | 157 / 157 with the observer off by default |
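The capture and classification rules in this table come down to a few lines of logic. Below is a minimal Python sketch that assumes the dispatcher window is inclusive at both ends and that a zero write from outside it counts as `clearer(other)`. The function and class names are illustrative. The real observer is the SystemVerilog `ch267_*` block.

```python
# Hypothetical model of the Ch267 capture rules (not the TB's own code).
PHYS_MASK = (1 << 29) - 1                  # kuseg/kseg0/kseg1 share the low 29 bits
CH267_PHYS_TARGET = 0x0000A8C8
DISP_LO, DISP_HI = 0xBFC4F320, 0xBFC4F520  # dispatcher PC window, assumed inclusive


def matches_target(ea: int) -> bool:
    """True for any EA whose low 29 bits equal the target physical word."""
    return (ea & PHYS_MASK) == CH267_PHYS_TARGET


def alias_of(ea: int) -> str:
    """Name the MIPS segment an EA falls in."""
    if ea < 0x80000000:
        return "kuseg"
    if ea < 0xA0000000:
        return "kseg0"
    if ea < 0xC0000000:
        return "kseg1"
    return "kseg2/3"


def role(is_write: bool, data: int, pc: int) -> str:
    """Classify one access as poller / writer / clearer(disp|other)."""
    if not is_write:
        return "poller"
    if data != 0:
        return "writer"
    return "clearer(disp)" if DISP_LO <= pc <= DISP_HI else "clearer(other)"


class DispClearSuppressor:
    """Print only the first dispatcher clear per pass; count the rest."""

    def __init__(self) -> None:
        self.dc_per_pass = {}  # pass index -> dispatcher clears seen so far

    def should_print(self, pass_idx: int, role_name: str) -> bool:
        if role_name != "clearer(disp)":
            return True
        seen = self.dc_per_pass.get(pass_idx, 0)
        self.dc_per_pass[pass_idx] = seen + 1
        return seen == 0
```

Under these rules, `role(True, 0, 0xBFC4D388)` evaluates to `"clearer(other)"`, because the PC lies outside the dispatcher window.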
## What the stream actually showed
### One previously-unknown init-time clearer
The very first access to `0xA000A8C8` happens at **cyc=54566**
(deep BIOS init, pre-treadmill) from **PC=0xBFC4B83C**:
```
[0] cyc=54566 pass=0 CLEARER(other) pc=0xbfc4b83c ea=0xa000a8c8(kseg1) data=0x00000000 post_clear=0
```
This is the *first* zeroing of `0xA000A8C8`, before the Ch266
dispatcher ever runs. The PC is far from the dispatcher chain,
somewhere in early kernel init. It is not a smoking gun, since it
writes zero just as the dispatcher does, but it is worth naming so
future autopsies don't mistake it for something mysterious.
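To post-process the stream offline, you can split each event line back into its fields. The parser below is a hypothetical, regex-based sketch keyed to the field layout of the line above: `cyc`, `pass`, role, `pc`, `ea` with its alias suffix, `data`, and `post_clear`. It is not part of the TB or its tooling.

```python
import re
from typing import Optional

# Hypothetical offline parser for Ch267 stream lines.
EVENT_RE = re.compile(
    r"\[(?P<idx>\d+)\]\s+cyc=(?P<cyc>\d+)\s+pass=(?P<pass_idx>\d+)\s+"
    r"(?P<role>[A-Za-z_-]+(?:\([A-Za-z_-]+\))?)\s+"
    r"pc=(?P<pc>0x[0-9a-fA-F]+)\s+"
    r"ea=(?P<ea>0x[0-9a-fA-F]+)\((?P<alias>\w+)\)\s+"
    r"data=(?P<data>0x[0-9a-fA-F]+)\s+post_clear=(?P<post_clear>\d+)"
)


def parse_event(line: str) -> Optional[dict]:
    """Return the event fields as ints/strings, or None for non-event lines."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "idx": int(g["idx"]),
        "cyc": int(g["cyc"]),
        "pass": int(g["pass_idx"]),
        "role": g["role"],
        "pc": int(g["pc"], 16),      # int() accepts the 0x prefix with base 16
        "ea": int(g["ea"], 16),
        "alias": g["alias"],
        "data": int(g["data"], 16),
        "post_clear": int(g["post_clear"]),
    }
```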
### The "other" clearer pattern in the helper
23 captures at **PC=0xBFC4D388** (inside the Ch265 helper, the
instruction right after the helper's JAL out to the dispatcher)
also write zero to `0xA000A8C8`.
This is a **trace-timing artefact**, not a separate writer.
The Ch266 dispatcher returns with `jr $ra` at `0xBFC4F334`,
whose delay slot is `0xBFC4F338`. If that delay slot holds
`sw $0, OFF($base)`, the store retires while `core_pc` is
already *one cycle ahead*, showing `0xBFC4D388` (the helper's
post-JAL instruction).
So Ch266 attributed three writes to PCs F32C/F330/F334 inside
the dispatcher, but the third write was actually F338 (the
JR delay slot), reported with PC=0xBFC4D388 because `core_pc`
sampling is one cycle late on memory events.
Confirmation: every "other" clearer at 0xBFC4D388 fires
*immediately after* a `CLEARER(disp)` from `0xBFC4F32C`
(see cyc=67019→67034, 67131, 68243 — 15-cycle gap between
the dispatcher write and the "helper" write, matching the
JR + delay-slot + pipeline-bubble timing). Three writes per
dispatcher call, distributed across what looks like two PCs
because of the same one-cycle skew the Ch266 closeout noted.
(Same skew explanation applies to PC=0xBFC4F334 in Ch266's
output — it was actually the JR delay slot's write at F338,
not a write from the JR itself.)
**Net:** there's still one writer (the dispatcher), three SWs
per call. The autopsy just gives a clearer picture of which
PCs the writes really come from.
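The skew argument can be checked mechanically against the stream. Pair each `CLEARER(other)` at `0xBFC4D388` with the most recent dispatcher clear from `0xBFC4F32C`, then look at the cycle gaps. This sketch assumes the events are already parsed into `(cyc, role, pc)` tuples in cycle order. `skew_gaps` is a hypothetical helper.

```python
# Hypothetical check of the one-cycle PC skew described above.
HELPER_POST_JAL_PC = 0xBFC4D388
DISP_FIRST_SW_PC = 0xBFC4F32C


def skew_gaps(events):
    """events: (cyc, role, pc) tuples in cycle order.

    Returns one entry per helper-PC clear: cycles since the latest unpaired
    dispatcher clear at DISP_FIRST_SW_PC, or None if there is none.
    """
    last_anchor = None
    gaps = []
    for cyc, role, pc in events:
        if role == "clearer(disp)" and pc == DISP_FIRST_SW_PC:
            last_anchor = cyc
        elif role == "clearer(other)" and pc == HELPER_POST_JAL_PC:
            gaps.append(None if last_anchor is None else cyc - last_anchor)
            last_anchor = None  # each dispatcher call should explain one helper write
    return gaps
```

Fed the cyc=67019→67034 pair quoted above, it returns `[15]`. A `None` entry would flag a helper-PC write with no preceding dispatcher clear, which the skew explanation cannot account for.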
### Zero pollers, zero non-zero writers — the gate is elsewhere
The crucial counts:
```
writer(non-zero) = 0
poller(read) = 0
```
**No read of `0xA000A8C8` happens anywhere in the model during
the BIOS-long run.** Combined with the disassembly of the
Ch217 outer-caller post-chain:
```
0xbfc52378: lui $v0, 0x1f80 ; <- clobbers $v0=0xA000A8C8
0xbfc5237c: ori $v0, $v0, 0x1070 ; $v0 now = 0x1F801070
0xbfc52380: sw $0, 4($v0) ; write 0 to I_MASK
0xbfc52384: jal <next-handler>
0xbfc52388: sw $0, 0($v0) ; write 0 to I_STAT (ack)
```
…the outer caller **discards** `$v0=0xA000A8C8` immediately
after the chain returns and rebuilds it as `0x1F801070`
(IOP INTC I_STAT). The `0xA000A8C8` pointer is never used as
a polled value, never used as a data pointer, never used at
all by the outer caller.
The chain's job appears to be **pure side-effect** — clearing
the kernel struct at `0xA000A8C8` and updating internal
selector-keyed state via the helper (`$v1` return values were
selector-dependent). The chain's `$v0` is computed but
discarded.
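`$v0`'s old value is dead because of MIPS immediate semantics. `lui` overwrites the whole register with `imm << 16`, and `ori` then ORs in a zero-extended 16-bit immediate. The small Python model below covers those two instructions and reproduces the INTC addresses. It takes a 32-bit view and ignores the sign extension a 64-bit core would apply to `lui`, which makes no difference here because `0x1F80` has bit 15 clear. The function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """lui rt, imm: rt = imm << 16 (previous contents of rt are discarded)."""
    return (imm16 & 0xFFFF) << 16


def ori(rs: int, imm16: int) -> int:
    """ori rt, rs, imm: rt = rs | zero_extend(imm)."""
    return (rs | (imm16 & 0xFFFF)) & MASK32


v0 = 0xA000A8C8                 # value the chain hands back in $v0
v0 = lui(0x1F80)                # 0x1F800000: nothing of 0xA000A8C8 survives
v0 = ori(v0, 0x1070)            # 0x1F801070 (I_STAT)
i_mask_ea = (v0 + 4) & MASK32   # target of sw $0, 4($v0)
i_stat_ea = (v0 + 0) & MASK32   # target of sw $0, 0($v0)
```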
## What this means for the search
**The polled gate is not at `0xA000A8C8`.** Ch263 through Ch266 narrowed
the search to "the longjmp-return chain's effect," and Ch267
shows that effect is *not* a polled value at 0xA000A8C8 itself.
Possible relocations for "where the gate actually lives":
1. **One of the INTC writes the outer caller does immediately
after the chain.** `0xBFC52380: sw $0, 4($v0)` writes 0 to
I_MASK; `0xBFC52388: sw $0, 0($v0)` writes 0 to I_STAT to ack it.
Both happen *every* Ch217 pass. Could the treadmill be
gated on the I_STAT value AFTER that ack? If a "ready bit"
needed to remain set across the ack, our INTC model might
be eating it.
2. **Elsewhere in the loop body the autopsies haven't covered.**
The Ch217 caller dump only shows PCs 0xBFC52340..0xBFC5238C
— the area *immediately* around the JAL. The treadmill
itself is longer; the polled state might be read further
along (post-W1C, post-RFE) before the exception loops back.
3. **A COP0 register, not memory.** The treadmill involves an
RFE; COP0 Status/Cause/EPC reads aren't in EE_MAP and
wouldn't show up in our existing autopsies. A re-poll of
Status.IE or Cause.IP between passes could be the gate.
## Recommendation for Ch268
**Pivot away from `0xA000A8C8` entirely.** Three concrete
follow-ups, in order of cheapest-first:
**(A) Widen Ch267 to scan ALL read EAs in the treadmill
window.** Instead of keying on one EA, capture every
non-fetch READ across a wider PC window — say the Ch217
caller body `0xBFC52340..0xBFC52400`. Bucket reads by EA and
diff pass 1 vs pass 8. Any EA that BIOS reads every pass and
whose value never changes is a candidate for the polled gate.
Cheap to implement: copy the Ch266 capture, widen the PC window,
drop the write capture, add per-pass diff bookkeeping.
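One way to do the per-pass bookkeeping for (A) is to bucket every non-fetch read by EA and pass, then keep only the EAs that are read in every pass and always return the same value. This minimal sketch assumes reads arrive as `(pass_idx, pc, ea, value)` tuples and treats the `0xBFC52340..0xBFC52400` window as inclusive. `polled_gate_candidates` is a hypothetical name.

```python
from collections import defaultdict

WIN_LO, WIN_HI = 0xBFC52340, 0xBFC52400  # Ch217 caller body, assumed inclusive


def polled_gate_candidates(reads, passes):
    """reads: iterable of (pass_idx, pc, ea, value) for non-fetch reads.

    Returns {ea: value} for EAs read in every pass listed in `passes`
    and returning one constant value across those passes.
    """
    seen = defaultdict(lambda: defaultdict(set))  # ea -> pass -> {values}
    for pass_idx, pc, ea, value in reads:
        if WIN_LO <= pc <= WIN_HI:
            seen[ea][pass_idx].add(value)
    wanted = set(passes)
    out = {}
    for ea, per_pass in seen.items():
        if not wanted or not wanted.issubset(per_pass):
            continue  # skipped at least one pass of interest
        values = set().union(*(per_pass[p] for p in wanted))
        if len(values) == 1:
            out[ea] = next(iter(values))
    return out
```

For the pass 1 vs pass 8 comparison, call it with `passes=range(1, 9)`. An EA whose value differs between passes drops out, and so does one that some pass never reads.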
**(B) Capture the immediate post-chain INTC writes.** Profile
the write cadence at I_STAT (0x1F801070) and I_MASK
(0x1F801074) across passes. If our INTC stub's behavior on
those writes differs from what BIOS expects, the treadmill
could be gating on I_STAT's residual after the ack write.
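One concrete way the stub and BIOS could disagree on the `sw $0, 0($v0)` at `0xBFC52388` is the I_STAT write semantics. The PS1 hardware documentation (psx-spx) describes an I_STAT write as "0 clears the bit, 1 leaves it unchanged", which is effectively `I_STAT &= value`. Assuming the PS2 IOP INTC keeps that behavior, the two models below diverge exactly on a store of 0. The function names are hypothetical, and the sketch does not assume which model the current stub uses.

```python
MASK32 = 0xFFFFFFFF


def ack_and(i_stat: int, value: int) -> int:
    """Write-0-to-clear (PS1-style): bits written as 0 are cleared."""
    return i_stat & value & MASK32


def ack_w1c(i_stat: int, value: int) -> int:
    """Write-1-to-clear: bits written as 1 are cleared; a store of 0 is a no-op."""
    return i_stat & ~value & MASK32
```

With two IRQs pending (`0x9`), the AND model leaves `0x0` after the store of 0. The W1C model leaves `0x9` untouched.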
**(C) Observe COP0 reads.** Add a minimal COP0 access logger
to ee_core_stub. Look for any read of Status/Cause/EPC that
returns the same value every pass — that's a candidate for a
"this would have changed on a real PS2" gate.
(A) is the highest-EV next step — it directly searches for
the gate without committing to a guess. (B) is the
second-highest-EV because we have a smoking gun pointing at
INTC (selector 0x05 → `$v1=0x1F801070`). (C) is the
fallback if (A) and (B) both come up empty.
**Do NOT proceed to Phase 2** (TB-poke of 0xA000A8C8). The
Ch267 result rules out 0xA000A8C8 as the gate, so poking it
would just confirm that — and possibly confuse the
dispatcher's internal selector-state tracking.
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv` — added
`` `ifdef CH267_GATE_OBSERVER `` block. Single capture (R+W
for any EA matching phys 0x0000A8C8 across aliases), with
per-event PC/value/role/post-clear tags. Stream suppression
for dispatcher clears beyond first-per-pass. SUMMARY block
with alias breakdown + role counts. 5-way verdict logic
with alias-mismatch detection. Two call sites
(`ch267_print_observer()`) in the halt and timeout exits.
- `sim/Makefile` — new `tb_ee_core_bios_long_gate_observer`
target (only `-DCH267_GATE_OBSERVER`).
## iverilog 12 quirks hit
None new. Wrote with the Ch264/265/266 patterns in mind
(no `return` from task; no bit-select on parenthesized expr;
`trace_pkg::` namespace). Clean first-try compile.
## Regression
Full regression: 157 / 157 with the new target off by default
(`CH267_GATE_OBSERVER` undefined for routine builds).
Standing by for Codex's Ch268 call. Recommendation: (A) —
wider PC-window read autopsy across the Ch217 caller body,
to find what EA the treadmill actually polls. The Ch266
infrastructure is reusable; just widen the PC window and
drop the write capture.