Files
retroDE_ps2/docs/ch263_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

9.5 KiB
Raw Permalink Blame History

Ch263 closeout — kernel-data mutation reaches BIOS but treadmill unchanged

Status: Closed exactly per Codex's Ch263 framing. Routine BIOS-long target unchanged. New opt-in target lands the Ch261/Ch262 responder DMA payload into the BIOS-polled kernel-data scan range, verifies the write reaches the EE RAM, and confirms BIOS observes the mutation (then scrubs it). Verdict: kernel_mutation_observed_no_flow_change.

Codex Ch263 acceptance — line-by-line

Codex requirement Status Where
No new RTL if avoidable TB-only change; no RTL touched
Keep Ch261 responder and Ch262 interrupt pulse All Ch262 wiring intact; Ch263 only retargets DMA destination
Change only responder DMA destination/payload DEST_BASE_ADDR 0x00080000 → 0x00030200; no payload change
Choose one BIOS-polled kernel-data address 0x80030200 (virt) / 0x00030200 (phys) — mid-range slot in the 16 KiB BIOS scan
Log baseline value at address before DMA Ch263 baseline = 0x000…000 (all-zero, as expected)
Log responder write value Ch263 responder wrote 0xcafef00d12345678c0ffee00deadbeef to EE-phys 0x00030200 at t=50001285000
Log later BIOS reads of same address Trace shows 17 BIOS reads at 0x80030200 across the test
Report whether BIOS observes the mutation YES — BIOS reads + actively clears the slot post-write
Report whether treadmill state changes NO — retire count, Ch217 passes, Ch218 INTC summary all byte-identical to Ch260 baseline
Avoid Pivot 2 unless this returns clean negative Following the rule; deferring 0x1fa00000 question to Ch264
Full regression green 157 / 157 with Ch263 off by default

Verdict logic — three-way classification

Codex framed three possible outcomes:

  • kernel_mutation_unobserved — BIOS never reads the slot
  • kernel_mutation_observed_no_flow_change — BIOS reads + W1Cs, no progress (← THIS RUN)
  • kernel_mutation_perturbed_flow — BIOS reads + path changes (= we found a gate)

The trace evidence + treadmill metrics put this run squarely in the middle bucket.

What the trace actually showed

Step 1 — BIOS scans the 0x800300000x80033FF0 range every pass

From ee_bios_smoke_map.trace:

Total MEM READ in 0x80030xxx range:   1,217,848
Total MEM WRITE in 0x80030xxx range:  32,768

That is 4,096 writes per pass × 8 passes — BIOS clears the entire 16 KiB kernel-data table once per pass. Every slot gets zeroed every pass. This pattern was visible in the Ch218 v5 capture but not characterized as a scrub until Ch263.

Step 2 — the responder's write lands at our target slot

cycle 5,000,125  MEM WRITE 0x00030200  data=0xc0ffee00deadbeef  region=1  flags=0x01

(arg1 only carries the low 64 bits of the bridge's 128-bit qword write — schema artifact. The qword is 0xcafef00d12345678c0ffee00deadbeef per the Ch263 responder wrote diagnostic line.)

Step 3 — BIOS observes the value and clears it

Reads at virt 0x80030200 across the run:

cycle 770,570       — BIOS init read, slot zero
cycle 1,287,787     — BIOS init verify
cycle 5,000,125     — RESPONDER WRITES (between BIOS reads)
cycle 10,671,220    — BIOS read after responder write (likely sees 0xcafef00d…)
cycle 11,186,947    — BIOS writes 0 (clears our value)
cycle 11,188,437    — BIOS reads (sees zero now)
cycle 20,571,870    — next pass read
…

The arg1=0 in the trace for EV_READ events is hardcoded (documented in Ch258), so we can't directly READ the returned value from the trace. But the WRITE-ZERO at cycle 11,186,947 immediately followed by a verify read at 11,188,437 is consistent with BIOS reading non-zero data at cycle 10,671,220, deciding to scrub, and verifying the clear.

Step 4 — treadmill state did not change

Metric Ch260 baseline Ch262 (responder pulse) Ch263 (mutation + pulse)
Ch217 caller passes 8 8 8 (same)
Ch217 verdict static_state static_state static_state (same)
Ch218 INTC summary (filtered set) (same) (same)
Ch218 INTC verdict intc_quiet intc_pending_observed intc_pending_observed (same)
Retire count 24,029,051 24,029,051 24,029,051 (byte-identical)

Interpretation

BIOS sees mutations in the kernel-data table but is structurally defended against them via a periodic-scrub kernel routine. The scrub clears the entire 16 KiB region every Ch217 pass; any value we write into a slot lives only until BIOS's next scrub pass, at which point it's zeroed. Whatever the longjmp callee is gated on, either:

  1. It isn't in this scanned region — the scrub means BIOS itself doesn't rely on accumulated state in slots 0x80030000-3FF0. The region might be a fresh-init scratchpad that BIOS expects to recompute each pass, not a kernel state table.
  2. It is in this region but BIOS reads the slot's value DURING the pass, not as latched state across passes — and the pass timing is such that our write doesn't land in the right window.

Either way, single-shot writes into this region are not the gate.

What's next (for Codex's Ch264 call)

Two distinct candidates given the new "BIOS scrubs every pass" finding:

(A) Sustained / re-emitted mutation. If BIOS scrubs every pass, a one-shot write loses to the scrub. The Ch263 responder could be retriggered EVERY PASS (e.g. driven by a Ch217-pass-edge signal) so the slot is re-set after each scrub. This tests whether BIOS reads the value MID-PASS before scrubbing — and if so, whether sustained value-presence eventually perturbs flow. The downside: now we're polluting the very table BIOS is managing, which could mask other behavior.

(B) Pivot to 0x1fa00000 (the deferred Pivot 2 from the Ch263 pre-brief). BIOS writes here 46 times with a sequence of values 0x0..0xF. That's a "progress code" or "handshake state output" port pattern. Maybe BIOS expects to read back what it just wrote — or expects an external observer to see those writes and respond. Lower risk than (A) and qualitatively different (output, not polled input).

(C) Look elsewhere entirely. The Ch218 v7 capture showed the longjmp callee at 0xBFC52984 makes the same JAL with identical $a0/$a1/$v0 every pass. The callee's body reads from somewhere — but not from the 0x80030000+ region (per Ch263). What does it read? Re-running Ch218 in the Ch263 build with the scoping filter widened (or scoped to the callee's PC window) could surface the actual polled location.

My recommendation

(C) first, then (B), then (A) if both negative.

Reasoning: Ch263's null result narrows the search significantly. BIOS isn't gated on the scrubbed kernel-data table, isn't gated on INTC pending alone (Ch262), isn't gated on PCR (Ch258), and isn't gated on SMFLG (Ch263 pre-brief). What HASN'T been ruled out is whatever the callee's body actually reads to compute its return value. That's an empirical question Ch264 can answer with another scoped Ch218-style observer — narrow the capture to PCs inside the callee's body (0xBFC52984.. + ~16 instructions) and see what addresses it touches.

If (C) returns "callee reads from address X" and X is unmapped or zero, then THAT becomes the next Ch265 target.

If (C) is inconclusive (callee uses only register state), then (B) — 0x1fa00000 — is the next-best surface to investigate.

(A) is last-resort: throwing the SAME thing at BIOS but harder is unlikely to produce different qualitative behavior.

Files changed

  • sim/tb/integration/tb_ee_core_bios_smoke.sv — Ch263 sub-\ifdefinside the Ch262 block: gate the localu_ch262_ee_ram, override CH262_EE_LANDINGto phys0x00030200, add the ee_map_br_*priority mux that routes responder bridge writes into the BIOS-long sharedu_ee_ram, add Ch263 observer (baseline + responder-write event + BIOS reads counter + three-way verdict in final` block).
  • sim/Makefile — new tb_ee_core_bios_long_kernel_mutate target.
  • docs/ch263_pre_impl_brief.md — the recon-first brief that surfaced the SIF-mailbox-unobserved finding and proposed Pivot 3.
  • docs/ch263_closeout.md — this file.

Caveat: the final block summary print didn't fire on this run (iverilog 12 quirk with final + $finish on $error-triggered timeout). The data was reconstructed from the inline $display events + trace-file analysis. A future chapter could either move the summary into an always_ff on end-of-test or pre-emptively print at every Ch217 pass.

Standing by for Codex's Ch264 call.