# Ch263 closeout: kernel-data mutation reaches BIOS but treadmill unchanged

**Status:** Closed exactly per Codex's Ch263 framing. Routine
BIOS-long target unchanged. New opt-in target lands the Ch261/Ch262
responder DMA payload into the BIOS-polled kernel-data scan range,
verifies the write reaches the EE RAM, and confirms BIOS observes
the mutation (then scrubs it). **Verdict:
`kernel_mutation_observed_no_flow_change`.**

## Codex Ch263 acceptance: line-by-line

| Codex requirement | Status | Where |
|-------------------|--------|-------|
| No new RTL if avoidable | ✅ | TB-only change; no RTL touched |
| Keep Ch261 responder and Ch262 interrupt pulse | ✅ | All Ch262 wiring intact; Ch263 only retargets DMA destination |
| Change only responder DMA destination/payload | ✅ | `DEST_BASE_ADDR` 0x00080000 → 0x00030200; no payload change |
| Choose one BIOS-polled kernel-data address | ✅ | `0x80030200` (virt) / `0x00030200` (phys), a mid-range slot in the 16 KiB BIOS scan |
| Log baseline value at address before DMA | ✅ | `Ch263 baseline = 0x000…000` (all-zero, as expected) |
| Log responder write value | ✅ | `Ch263 responder wrote 0xcafef00d12345678c0ffee00deadbeef to EE-phys 0x00030200 at t=50001285000` |
| Log later BIOS reads of same address | ✅ | Trace shows 17 BIOS reads at `0x80030200` across the test |
| Report whether BIOS observes the mutation | ✅ | **YES**: BIOS reads and actively clears the slot post-write |
| Report whether treadmill state changes | ✅ | **NO**: retire count, Ch217 passes, Ch218 INTC summary all byte-identical to Ch260 baseline |
| Avoid Pivot 2 unless this returns clean negative | ✅ | Following the rule; deferring 0x1fa00000 question to Ch264 |
| Full regression green | ✅ | 157 / 157 with Ch263 off by default |

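The virt/phys pair in the table follows the usual MIPS convention for
the unmapped KSEG0/KSEG1 segments, where the physical address is the
virtual address with its top three bits cleared. A minimal sketch of
that mapping, assuming the target slot sits in KSEG0 (the helper name
`kseg_to_phys` is illustrative and does not come from the testbench):

```python
KSEG_PHYS_MASK = 0x1FFF_FFFF  # KSEG0/KSEG1 -> physical: clear the top 3 address bits


def kseg_to_phys(vaddr: int) -> int:
    """Translate an unmapped KSEG0/KSEG1 virtual address to a physical address."""
    if not 0x8000_0000 <= vaddr <= 0xBFFF_FFFF:
        raise ValueError(f"0x{vaddr:08x} is not in KSEG0/KSEG1")
    return vaddr & KSEG_PHYS_MASK


# The Ch263 target slot as BIOS addresses it, and as the responder bridge does.
print(f"0x{kseg_to_phys(0x8003_0200):08x}")
```
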
## Verdict logic: three-way classification

Codex framed three possible outcomes:

- `kernel_mutation_unobserved`: BIOS never reads the slot
- `kernel_mutation_observed_no_flow_change`: BIOS reads and clears the slot, no progress (← **THIS RUN**)
- `kernel_mutation_perturbed_flow`: BIOS reads and its path changes (= we found a gate)

The trace evidence and the treadmill metrics put this run squarely in
the middle bucket.

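The three buckets reduce to two observations: whether BIOS read the
slot after the responder write, and whether any treadmill metric moved.
A minimal sketch of that decision, assuming those two inputs have
already been extracted (the function and parameter names are
hypothetical; the testbench observer may structure this differently):

```python
def classify(bios_reads_after_write: int, flow_changed: bool) -> str:
    """Map the two Ch263 observations onto Codex's three verdict strings."""
    if bios_reads_after_write == 0:
        return "kernel_mutation_unobserved"
    if flow_changed:
        return "kernel_mutation_perturbed_flow"
    return "kernel_mutation_observed_no_flow_change"


# This run: BIOS read the slot after the write, and no metric changed.
print(classify(bios_reads_after_write=1, flow_changed=False))
```
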
## What the trace actually showed

### Step 1: BIOS scans the 0x80030000–0x80033FF0 range every pass

From `ee_bios_smoke_map.trace`:

```
Total MEM READ in 0x80030xxx range: 1,217,848
Total MEM WRITE in 0x80030xxx range: 32,768
```

That is **4,096 writes per pass × 8 passes**: BIOS clears the
entire 16 KiB kernel-data table once per pass. Every slot gets
zeroed every pass. This pattern was visible in the Ch218 v5
capture but not characterized as a scrub until Ch263.

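The per-pass figure is plain arithmetic on the totals above. As a
quick check, the sketch below also derives the implied store width,
under the assumption that each write covers a distinct part of the
16 KiB region (the trace totals alone do not prove that):

```python
total_writes = 32_768        # MEM WRITE total from the trace summary
passes = 8                   # Ch217 caller passes
region_bytes = 16 * 1024     # size of the scanned kernel-data table

writes_per_pass = total_writes // passes
# Assumption: writes within a pass do not overlap, so they tile the region.
bytes_per_write = region_bytes // writes_per_pass

print(writes_per_pass, bytes_per_write)
```

If the assumption holds, the scrub is a word-by-word clear rather than
a qword-wide one.
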
### Step 2: the responder's write lands at our target slot

```
cycle 5,000,125 MEM WRITE 0x00030200 data=0xc0ffee00deadbeef region=1 flags=0x01
```

(`arg1` carries only the low 64 bits of the bridge's 128-bit qword
write, which is a schema artifact. The full qword is
`0xcafef00d12345678c0ffee00deadbeef` per the Ch263 `responder wrote`
diagnostic line.)

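The truncation is easy to confirm by splitting the 128-bit value from
the diagnostic line into two 64-bit halves. A small sketch (the
variable names are illustrative):

```python
QWORD = 0xCAFEF00D12345678C0FFEE00DEADBEEF  # from the `responder wrote` line
MASK64 = (1 << 64) - 1

low64 = QWORD & MASK64   # the half that fits in the trace's data field
high64 = QWORD >> 64     # the half the trace schema drops

print(f"low64  = 0x{low64:016x}")
print(f"high64 = 0x{high64:016x}")
```

The low half matches the `data=` field in the trace line above.
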
### Step 3: BIOS observes the value and clears it

Reads at virt `0x80030200` across the run:

```
cycle 770,570 - BIOS init read, slot zero
cycle 1,287,787 - BIOS init verify
cycle 5,000,125 - RESPONDER WRITES (between BIOS reads)
cycle 10,671,220 - BIOS read after responder write (likely sees 0xcafef00d…)
cycle 11,186,947 - BIOS writes 0 (clears our value)
cycle 11,188,437 - BIOS reads (sees zero now)
cycle 20,571,870 - next pass read
…
```

The trace hardcodes `arg1=0` for EV_READ events (documented in
Ch258), so the value BIOS actually read cannot be taken directly
from the trace. However, the zero write at cycle 11,186,947,
immediately followed by a verify read at 11,188,437, is consistent
with BIOS reading non-zero data at cycle 10,671,220, deciding to
scrub, and verifying the clear.

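The ordering argument can be checked mechanically. The sketch below
transcribes the cycle list above into tuples and finds the BIOS reads
that fall between the responder write and the first zero write after
it. The event-kind labels are illustrative names, not trace-schema
fields, and the "init verify" entry is assumed to be a read:

```python
# Events at virt 0x80030200, transcribed from the cycle list above.
EVENTS = [
    (770_570, "bios_read"),
    (1_287_787, "bios_read"),        # "BIOS init verify", assumed to be a read
    (5_000_125, "responder_write"),
    (10_671_220, "bios_read"),
    (11_186_947, "bios_write_zero"),
    (11_188_437, "bios_read"),
    (20_571_870, "bios_read"),
]


def observe_then_scrub(events):
    """Return (exposed_reads, clear_cycle, verify_cycle), or None.

    exposed_reads are BIOS reads between the responder write and the
    first BIOS zero write after it, i.e. reads that could see the payload.
    """
    write = next((c for c, k in events if k == "responder_write"), None)
    if write is None:
        return None
    clear = next((c for c, k in events if k == "bios_write_zero" and c > write), None)
    if clear is None:
        return None
    exposed = [c for c, k in events if k == "bios_read" and write < c < clear]
    verify = next((c for c, k in events if k == "bios_read" and c > clear), None)
    return exposed, clear, verify


exposed, clear, verify = observe_then_scrub(EVENTS)
print(f"reads that could see the payload: {exposed}")
print(f"cleared at {clear}, verified at {verify}")
```

This only establishes that a read had the opportunity to see the
payload; it cannot recover the value that was returned.
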
### Step 4: treadmill state did not change

| Metric | Ch260 baseline | Ch262 (responder pulse) | **Ch263 (mutation + pulse)** |
|--------|----------------|-------------------------|------------------------------|
| Ch217 caller passes | 8 | 8 | **8 (same)** |
| Ch217 verdict | static_state | static_state | **static_state (same)** |
| Ch218 INTC summary | (filtered set) | (same) | **(same)** |
| Ch218 INTC verdict | intc_quiet | intc_pending_observed | **intc_pending_observed (same as Ch262)** |
| Retire count | 24,029,051 | 24,029,051 | **24,029,051 (byte-identical)** |

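The comparison amounts to a field-by-field diff of the metric rows. A
minimal sketch using the values from the table (the dictionary keys are
illustrative, and the INTC summary row is left out because the table
does not spell out its contents):

```python
RUNS = {
    "Ch260": {"ch217_passes": 8, "ch217_verdict": "static_state",
              "ch218_intc_verdict": "intc_quiet", "retire_count": 24_029_051},
    "Ch262": {"ch217_passes": 8, "ch217_verdict": "static_state",
              "ch218_intc_verdict": "intc_pending_observed", "retire_count": 24_029_051},
    "Ch263": {"ch217_passes": 8, "ch217_verdict": "static_state",
              "ch218_intc_verdict": "intc_pending_observed", "retire_count": 24_029_051},
}


def diff_metrics(a: dict, b: dict) -> dict:
    """Return {metric: (a_value, b_value)} for every metric that differs."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


print("Ch262 -> Ch263:", diff_metrics(RUNS["Ch262"], RUNS["Ch263"]))
print("Ch260 -> Ch263:", diff_metrics(RUNS["Ch260"], RUNS["Ch263"]))
```

The only field that differs from the Ch260 baseline is the INTC
verdict, and that change was already present in Ch262.
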
## Interpretation

**BIOS sees mutations in the kernel-data table but is structurally
defended against them by a periodic-scrub kernel routine.** The
scrub clears the entire 16 KiB region every Ch217 pass, so any value
we write into a slot survives only until BIOS's next scrub, at which
point it is zeroed. Whatever the longjmp callee is gated on, one of
the following holds:

1. **It isn't in this scanned region.** The scrub means BIOS
   itself doesn't rely on accumulated state in slots
   `0x80030000`–`0x80033FF0`. The region might be a fresh-init
   scratchpad that BIOS expects to recompute each pass, not a
   kernel state table.
2. **It is in this region, but BIOS reads the slot's value during
   the pass**, not as latched state across passes, and the pass
   timing is such that our write doesn't land in the right window.

Either way, **single-shot writes into this region are not the gate.**

## What's next (for Codex's Ch264 call)

Three distinct candidates, given the new "BIOS scrubs every pass"
finding:

**(A) Sustained / re-emitted mutation.** If BIOS scrubs every
pass, a one-shot write loses to the scrub. The Ch263 responder
could be retriggered every pass (e.g. driven by a Ch217-pass-edge
signal) so the slot is re-set after each scrub. This tests
whether BIOS reads the value mid-pass before scrubbing and, if
so, whether sustained value-presence eventually perturbs flow.
The downside: we would be polluting the very table BIOS is
managing, which could mask other behavior.

**(B) Pivot to 0x1fa00000** (the deferred Pivot 2 from the
Ch263 pre-brief). BIOS writes here 46 times with a sequence of
values 0x0..0xF. That looks like a "progress code" or "handshake
state output" port pattern. Maybe BIOS expects to read back what
it just wrote, or expects an external observer to see those
writes and respond. Lower risk than (A) and qualitatively
different (output, not polled input).

**(C) Look elsewhere entirely.** The Ch218 v7 capture showed
that the longjmp callee at `0xBFC52984` makes the same JAL with
identical `$a0/$a1/$v0` every pass. The callee's body reads
from somewhere, but not from the 0x80030000+ region (per
Ch263). What does it read? Re-running Ch218 in the Ch263 build
with the scoping filter widened (or scoped to the callee's PC
window) could surface the actual polled location.

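Option (A) can be pictured with a toy model of one slot under a
per-pass scrub. The sketch below is purely illustrative: it assumes
each pass runs scrub, then an optional responder write, then a BIOS
read, which is exactly the intra-pass ordering that (A) would have to
establish. Nothing here is measured data.

```python
def passes_seeing_value(num_passes: int, write_passes: set) -> int:
    """Count passes in which the modeled BIOS read sees a non-zero slot.

    Toy ordering per pass (an assumption, not a trace result):
    scrub -> optional responder write -> BIOS read.
    """
    seen = 0
    for p in range(num_passes):
        slot = 0                  # BIOS scrub zeroes the slot
        if p in write_passes:
            slot = 0xDEADBEEF     # responder write lands after the scrub
        if slot != 0:             # BIOS mid-pass read
            seen += 1
    return seen


print("one-shot :", passes_seeing_value(8, {0}))
print("re-emit  :", passes_seeing_value(8, set(range(8))))
```

Under this ordering a one-shot write is visible in a single pass, while
a per-pass re-emit is visible in every pass. If the real ordering puts
the read before the write, neither variant would be seen mid-pass.
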
## My recommendation

**(C) first, then (B), then (A) if both negative.**

Reasoning: Ch263's null result narrows the search significantly.
BIOS isn't gated on the scrubbed kernel-data table, isn't gated
on INTC pending alone (Ch262), isn't gated on PCR (Ch258), and
isn't gated on SMFLG (Ch263 pre-brief). What hasn't been ruled
out is **whatever the callee's body actually reads to compute
its return value**. That's an empirical question Ch264 can
answer with another scoped Ch218-style observer: narrow the
capture to PCs inside the callee's body (`0xBFC52984..` + ~16
instructions) and see what addresses it touches.

If (C) returns "callee reads from address X" and X is unmapped
or zero, then X becomes the Ch265 target.

If (C) is inconclusive (callee uses only register state), then
(B), `0x1fa00000`, is the next-best surface to investigate.

(A) is the last resort: throwing the same thing at BIOS but harder
is unlikely to produce qualitatively different behavior.

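The scoped observer proposed for (C) is essentially a PC-window filter
over memory events. A minimal offline sketch, assuming events are
available as `(pc, kind, addr)` tuples (a hypothetical record shape, not
the existing trace schema) and treating "~16 instructions" as exactly
16 fixed-width 4-byte MIPS instructions:

```python
CALLEE_START = 0xBFC5_2984   # longjmp callee entry, per the Ch218 v7 capture
CALLEE_INSNS = 16            # "~16 instructions"; an approximation
INSN_BYTES = 4               # fixed-width MIPS instructions


def in_callee(pc: int) -> bool:
    """True if pc falls inside the assumed callee body window."""
    return CALLEE_START <= pc < CALLEE_START + CALLEE_INSNS * INSN_BYTES


def addresses_read(events):
    """Sorted unique addresses read by instructions inside the callee window."""
    return sorted({addr for pc, kind, addr in events
                   if kind == "read" and in_callee(pc)})


# Synthetic placeholder events, only to show the filter's behavior.
sample = [
    (CALLEE_START + 8, "read", 0x1000),
    (CALLEE_START + 8, "read", 0x1000),    # duplicate collapses
    (CALLEE_START + 12, "write", 0x2000),  # writes are ignored
    (CALLEE_START + 64, "read", 0x3000),   # just past the window
]
print([hex(a) for a in addresses_read(sample)])
```

The window size is the main knob: too narrow and a load late in the
body is missed, too wide and the caller's accesses leak in.
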
## Files changed

- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: Ch263
  sub-`` `ifdef `` inside the Ch262 block: gate the local
  `u_ch262_ee_ram`, override `CH262_EE_LANDING` to phys
  `0x00030200`, add the `ee_map_br_*` priority mux that routes
  responder bridge writes into the BIOS-long shared `u_ee_ram`,
  add the Ch263 observer (baseline + responder-write event + BIOS
  reads counter + three-way verdict in the `final` block).
- `sim/Makefile`: new `tb_ee_core_bios_long_kernel_mutate`
  target.
- `docs/ch263_pre_impl_brief.md`: the recon-first brief that
  surfaced the SIF-mailbox-unobserved finding and proposed
  Pivot 3.
- `docs/ch263_closeout.md`: this file.

Caveat: the `final` block summary print didn't fire on this
run (iverilog 12 quirk with `final` + `$finish` on an
`$error`-triggered timeout). The data was reconstructed from
the inline `$display` events plus trace-file analysis. A future
chapter could either move the summary into an `always_ff` on
end-of-test or pre-emptively print at every Ch217 pass.

Standing by for Codex's Ch264 call.