thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch263 closeout — kernel-data mutation reaches BIOS but treadmill unchanged
**Status:** Closed exactly per Codex's Ch263 framing. Routine
BIOS-long target unchanged. New opt-in target lands the Ch261/Ch262
responder DMA payload into the BIOS-polled kernel-data scan range,
verifies the write reaches the EE RAM, and confirms BIOS observes
the mutation (then scrubs it). **Verdict:
`kernel_mutation_observed_no_flow_change`.**
## Codex Ch263 acceptance — line-by-line
| Codex requirement | Status | Where |
|---------------------------------------------------------------------------------|--------|--------------------------------------------------|
| No new RTL if avoidable | ✅ | TB-only change; no RTL touched |
| Keep Ch261 responder and Ch262 interrupt pulse | ✅ | All Ch262 wiring intact; Ch263 only retargets DMA destination |
| Change only responder DMA destination/payload | ✅ | `DEST_BASE_ADDR` 0x00080000 → 0x00030200; no payload change |
| Choose one BIOS-polled kernel-data address | ✅ | `0x80030200` (virt) / `0x00030200` (phys): slot at offset `0x200` in the 16 KiB BIOS scan range |
| Log baseline value at address before DMA | ✅ | `Ch263 baseline = 0x000…000` (all-zero, as expected) |
| Log responder write value | ✅ | `Ch263 responder wrote 0xcafef00d12345678c0ffee00deadbeef to EE-phys 0x00030200 at t=50001285000` |
| Log later BIOS reads of same address | ✅ | Trace shows 17 BIOS reads at `0x80030200` across the test |
| Report whether BIOS observes the mutation | ✅ | **YES** — BIOS reads + actively clears the slot post-write |
| Report whether treadmill state changes | ✅ | **NO** — retire count, Ch217 passes, Ch218 INTC summary all byte-identical to Ch260 baseline |
| Avoid Pivot 2 unless this returns clean negative | ✅ | Following the rule; deferring 0x1fa00000 question to Ch264 |
| Full regression green | ✅ | 157 / 157 with Ch263 off by default |
## Verdict logic — three-way classification
Codex framed three possible outcomes:
- `kernel_mutation_unobserved` — BIOS never reads the slot
- `kernel_mutation_observed_no_flow_change` — BIOS reads the slot and clears it by writing zero, no progress (← **THIS RUN**)
- `kernel_mutation_perturbed_flow` — BIOS reads + path changes (= we found a gate)
The trace evidence + treadmill metrics put this run squarely in the
middle bucket.
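
The three-way rule can be written down as a small decision function. This is only a sketch of the classification as described above; the `RunEvidence` type and its field names are hypothetical and do not come from the testbench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunEvidence:
    """Hypothetical summary of one run; field names are illustrative only."""
    bios_read_after_write: bool  # BIOS read the slot after the responder write
    treadmill_changed: bool      # any treadmill metric differs from the baseline


def classify(evidence: RunEvidence) -> str:
    """Map the evidence onto the three verdict strings used in this chapter."""
    if not evidence.bios_read_after_write:
        return "kernel_mutation_unobserved"
    if evidence.treadmill_changed:
        return "kernel_mutation_perturbed_flow"
    return "kernel_mutation_observed_no_flow_change"


# This run: BIOS read the slot after the write, treadmill metrics unchanged.
print(classify(RunEvidence(bios_read_after_write=True, treadmill_changed=False)))
```

Checking "observed" before "perturbed" keeps the buckets mutually exclusive: a flow change with no read of the slot would not be attributed to the mutation.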
## What the trace actually showed
### Step 1 — BIOS scans the 0x80030000-0x80033FF0 range every pass
From `ee_bios_smoke_map.trace`:
```
Total MEM READ in 0x80030xxx range: 1,217,848
Total MEM WRITE in 0x80030xxx range: 32,768
```
That is **4,096 writes per pass × 8 passes** — BIOS clears the
entire 16 KiB kernel-data table once per pass. Every slot gets
zeroed every pass. This pattern was visible in the Ch218 v5
capture but not characterized as a scrub until Ch263.
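
Totals like these can be reproduced with a short script over the trace text. The sketch below assumes every line follows the shape of the one `MEM WRITE` line quoted in Step 2; the real trace schema may differ, and the sample lines other than the first are synthetic placeholders, not captured data.

```python
import re
from collections import Counter

# Assumed line shape: "cycle <n> MEM <READ|WRITE> <hex addr> ..."
LINE_RE = re.compile(r"cycle\s+([\d,]+)\s+MEM\s+(READ|WRITE)\s+(0x[0-9a-fA-F]+)")


def count_accesses(lines, lo, hi):
    """Count MEM READ / MEM WRITE events whose address is in [lo, hi]."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a memory event, or an unexpected format
        addr = int(match.group(3), 16)
        if lo <= addr <= hi:
            counts[match.group(2)] += 1
    return counts


sample = [
    # Quoted from the trace (responder write, physical address).
    "cycle 5,000,125 MEM WRITE 0x00030200 data=0xc0ffee00deadbeef region=1 flags=0x01",
    # Synthetic lines in the same shape, for illustration only.
    "cycle 11,186,947 MEM WRITE 0x80030200 data=0x0 region=1 flags=0x01",
    "cycle 11,188,437 MEM READ 0x80030200 data=0x0 region=1 flags=0x01",
]

counts = count_accesses(sample, 0x80030000, 0x80033FFF)
print(counts["READ"], counts["WRITE"])

# Sanity check on the scrub arithmetic: 4,096 writes per pass over 8 passes.
assert 4096 * 8 == 32768
```

Note that the responder write appears under its physical address, so a filter on the virtual range alone counts only BIOS-side accesses.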
### Step 2 — the responder's write lands at our target slot
```
cycle 5,000,125 MEM WRITE 0x00030200 data=0xc0ffee00deadbeef region=1 flags=0x01
```
(arg1 only carries the low 64 bits of the bridge's 128-bit qword
write — schema artifact. The qword is `0xcafef00d12345678c0ffee00deadbeef`
per the Ch263 `responder wrote` diagnostic line.)
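
The truncation is easy to confirm by masking the logged qword down to 64 bits. A minimal check, using only the two values quoted above:

```python
QWORD = 0xCAFEF00D12345678C0FFEE00DEADBEEF  # from the "responder wrote" line
MASK64 = (1 << 64) - 1

low64 = QWORD & MASK64   # what the trace's data field can carry
high64 = QWORD >> 64     # the half the trace schema drops

print(hex(low64))   # 0xc0ffee00deadbeef
print(hex(high64))  # 0xcafef00d12345678
```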
### Step 3 — BIOS observes the value and clears it
Reads at virt `0x80030200` across the run:
```
cycle 770,570 — BIOS init read, slot zero
cycle 1,287,787 — BIOS init verify
cycle 5,000,125 — RESPONDER WRITES (between BIOS reads)
cycle 10,671,220 — BIOS read after responder write (likely sees 0xcafef00d…)
cycle 11,186,947 — BIOS writes 0 (clears our value)
cycle 11,188,437 — BIOS reads (sees zero now)
cycle 20,571,870 — next pass read
```
The `arg1=0` in the trace for EV_READ events is hardcoded
(documented in Ch258), so we can't directly READ the returned
value from the trace. But the WRITE-ZERO at cycle 11,186,947
immediately followed by a verify read at 11,188,437 is consistent
with BIOS reading non-zero data at cycle 10,671,220, deciding to
scrub, and verifying the clear.
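
The inference can be made mechanical: find the BIOS reads that fall between the responder write and the first zero-write after it. The sketch below uses the cycle numbers from the timeline above; the event-kind labels are my own, and treating the "init verify" entry as a read is an assumption.

```python
RESPONDER_WRITE = "responder_write"
BIOS_READ = "bios_read"
BIOS_WRITE_ZERO = "bios_write_zero"

# Cycle numbers copied from the timeline; kinds are illustrative labels.
TIMELINE = [
    (770_570, BIOS_READ),
    (1_287_787, BIOS_READ),        # "init verify", assumed to be a read
    (5_000_125, RESPONDER_WRITE),
    (10_671_220, BIOS_READ),
    (11_186_947, BIOS_WRITE_ZERO),
    (11_188_437, BIOS_READ),
    (20_571_870, BIOS_READ),
]


def exposure_window(timeline):
    """Return (write cycle, clear cycle, BIOS reads strictly in between)."""
    write_cycle = next(c for c, kind in timeline if kind == RESPONDER_WRITE)
    clear_cycle = next(
        c for c, kind in timeline if kind == BIOS_WRITE_ZERO and c > write_cycle
    )
    reads = [
        c for c, kind in timeline
        if kind == BIOS_READ and write_cycle < c < clear_cycle
    ]
    return write_cycle, clear_cycle, reads


write_cycle, clear_cycle, reads = exposure_window(TIMELINE)
print(reads)                      # [10671220]
print(clear_cycle - write_cycle)  # 6186822 cycles with the value live
```

Exactly one BIOS read lands inside the window, which is why the observation is inferred rather than measured: that single read is the only chance BIOS had to see the non-zero value.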
### Step 4 — treadmill state did not change
| Metric | Ch260 baseline | Ch262 (responder pulse) | **Ch263 (mutation + pulse)** |
|-------------------------|------------------|-------------------------|------------------------------|
| Ch217 caller passes | 8 | 8 | **8 (same)** |
| Ch217 verdict | static_state | static_state | **static_state (same)** |
| Ch218 INTC summary | (filtered set) | (same) | **(same)** |
| Ch218 INTC verdict | intc_quiet | intc_pending_observed | **intc_pending_observed (same as Ch262)** |
| Retire count | 24,029,051 | 24,029,051 | **24,029,051 (byte-identical)** |
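
A field-by-field diff of the table makes the comparison explicit. This is a sketch over the values in the table only; the dictionary keys are hypothetical names, and the Ch218 INTC summary row is left out because the table does not spell out its contents.

```python
CH260 = {
    "ch217_caller_passes": 8,
    "ch217_verdict": "static_state",
    "ch218_intc_verdict": "intc_quiet",
    "retire_count": 24_029_051,
}
CH262 = {**CH260, "ch218_intc_verdict": "intc_pending_observed"}
CH263 = dict(CH262)  # the table reports no change from Ch262


def diff_metrics(before, after):
    """Return {metric: (before, after)} for every metric that differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


print(diff_metrics(CH262, CH263))  # {}
print(diff_metrics(CH260, CH263))
```

Against Ch262 nothing moves. Against Ch260 the only difference is the INTC verdict, which the table already shows changing in Ch262, when the responder pulse was introduced.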
## Interpretation
**BIOS sees mutations in the kernel-data table but is structurally
defended against them via a periodic-scrub kernel routine.** The
scrub clears the entire 16 KiB region every Ch217 pass; any value
we write into a slot lives only until BIOS's next scrub pass, at
which point it's zeroed. Whatever the longjmp callee is gated on,
either:
1. **It isn't in this scanned region** — the scrub means BIOS
itself doesn't rely on accumulated state in slots `0x80030000-3FF0`.
The region might be a fresh-init scratchpad that BIOS expects to
recompute each pass, not a kernel state table.
2. **It is in this region but BIOS reads the slot's value DURING
the pass**, not as latched state across passes — and the pass
timing is such that our write doesn't land in the right window.
Either way, **single-shot writes into this region are not the gate.**
## What's next (for Codex's Ch264 call)
Three distinct candidates given the new "BIOS scrubs every pass"
finding:
**(A) Sustained / re-emitted mutation.** If BIOS scrubs every
pass, a one-shot write loses to the scrub. The Ch263 responder
could be retriggered EVERY PASS (e.g. driven by a Ch217-pass-edge
signal) so the slot is re-set after each scrub. This tests
whether BIOS reads the value MID-PASS before scrubbing — and if
so, whether sustained value-presence eventually perturbs flow.
The downside: now we're polluting the very table BIOS is
managing, which could mask other behavior.
**(B) Pivot to 0x1fa00000** (the deferred Pivot 2 from the
Ch263 pre-brief). BIOS writes here 46 times with a sequence of
values 0x0..0xF. That's a "progress code" or "handshake state
output" port pattern. Maybe BIOS expects to read back what it
just wrote — or expects an external observer to see those
writes and respond. Lower risk than (A) and qualitatively
different (output, not polled input).
**(C) Look elsewhere entirely.** The Ch218 v7 capture showed
the longjmp callee at `0xBFC52984` makes the same JAL with
identical `$a0/$a1/$v0` every pass. The callee's body reads
from somewhere — but not from the 0x80030000+ region (per
Ch263). What does it read? Re-running Ch218 in the Ch263 build
with the scoping filter widened (or scoped to the callee's PC
window) could surface the actual polled location.
## My recommendation
**(C) first, then (B), then (A) if both negative.**
Reasoning: Ch263's null result narrows the search significantly.
A single-shot write into the scrubbed kernel-data table doesn't
move BIOS, and BIOS isn't gated on INTC pending alone (Ch262),
on PCR (Ch258), or on SMFLG (Ch263 pre-brief). What HASN'T been ruled
out is **whatever the callee's body actually reads to compute
its return value**. That's an empirical question Ch264 can
answer with another scoped Ch218-style observer — narrow the
capture to PCs inside the callee's body (`0xBFC52984..` + ~16
instructions) and see what addresses it touches.
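
As a rough offline sketch of that scoping (the real observer would live in the testbench), the filter below keeps only memory events whose PC falls inside the callee window and tallies the addresses touched. The `(pc, kind, addr)` event shape is an assumption, the 16-instruction window is the approximate figure from above, and the sample events use placeholder addresses rather than captured data.

```python
from collections import Counter

CALLEE_PC = 0xBFC52984
WINDOW_INSNS = 16   # approximate body length, per the text above
INSN_BYTES = 4      # fixed-width MIPS instructions


def addresses_touched(events, pc_lo=CALLEE_PC, n_insns=WINDOW_INSNS):
    """Tally (kind, addr) pairs for events issued from inside the PC window."""
    pc_hi = pc_lo + n_insns * INSN_BYTES  # exclusive upper bound
    hits = Counter()
    for pc, kind, addr in events:
        if pc_lo <= pc < pc_hi and kind in ("READ", "WRITE"):
            hits[(kind, addr)] += 1
    return hits


# Placeholder events: hypothetical (pc, kind, addr) tuples, not trace data.
events = [
    (0xBFC52984, "READ", 0x1000),   # first instruction of the window
    (0xBFC52990, "READ", 0x1000),   # inside the window, same address
    (0xBFC529C4, "READ", 0x2000),   # one past the window, excluded
    (0xBFC00000, "WRITE", 0x3000),  # unrelated PC, excluded
]

hits = addresses_touched(events)
print({(kind, hex(addr)): n for (kind, addr), n in hits.items()})
```

Keeping the window bound exclusive means widening the scope later is a one-parameter change (`n_insns`), which fits the "widen the filter if inconclusive" fallback.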
If (C) returns "callee reads from address X" and X is unmapped
or zero, then THAT becomes the next Ch265 target.
If (C) is inconclusive (callee uses only register state), then
(B) — `0x1fa00000` — is the next-best surface to investigate.
(A) is last-resort: throwing the SAME thing at BIOS but harder
is unlikely to produce different qualitative behavior.
## Files changed
- `sim/tb/integration/tb_ee_core_bios_smoke.sv` — Ch263
sub-`` `ifdef `` inside the Ch262 block: gate the local
`u_ch262_ee_ram`, override `CH262_EE_LANDING` to phys
`0x00030200`, add the `ee_map_br_*` priority mux that routes
responder bridge writes into the BIOS-long shared `u_ee_ram`,
add Ch263 observer (baseline + responder-write event + BIOS
reads counter + three-way verdict in `final` block).
- `sim/Makefile` — new `tb_ee_core_bios_long_kernel_mutate`
target.
- `docs/ch263_pre_impl_brief.md` — the recon-first brief that
surfaced the SIF-mailbox-unobserved finding and proposed
Pivot 3.
- `docs/ch263_closeout.md` — this file.
Caveat: the `final` block summary print didn't fire on this
run (iverilog 12 quirk with `final` + `$finish` on
`$error`-triggered timeout). The data was reconstructed from
the inline `$display` events + trace-file analysis. A future
chapter could either move the summary into an `always_ff` on
end-of-test or pre-emptively print at every Ch217 pass.
Standing by for Codex's Ch264 call.