# Wave 3: BIOS / IOP / SBUS reconnaissance + probe arc (Ch256–Ch260)

**Status: CLOSED MILESTONE (Ch260).** The Ch256-Ch259 probe arc exhausted single-register approaches to breaking the Ch215 longjmp treadmill. The next BIOS attempt does NOT start with another hardcode. It starts with **IOP-side state-machine modeling**. See the "Milestone closeout" section below for the full conclusion and the Ch261+ pivot.

The sections after the closeout predate it and are preserved for posterity. The chapter-by-chapter closeout notes reflect what was actually learned during the arc, not what we initially hypothesised.

---

## Milestone closeout (Ch260)

Four chapters of probe-arc work produced four concrete facts:

1. **Ch256 recon → "primary hypothesis: EE timers."** Wrong. The `tb_ee_core_bios_long` observer eventually surfaced the real pattern (see Ch257).
2. **Ch257: Ch218 observer landed.** After seven iterations (six more than necessary; the lesson is captured in [[feedback-pause-for-codex-on-iteration-loops]]), the observer could attribute reads to the JAL callee in the longjmp-return path. It surfaced the IOP DMAC PCR and the IOP INTC pair as the recurring polled MMIO targets. The timer hypothesis did not survive contact.
3. **Ch258 IOP DMAC PCR realism stub (0xBF8010F0 → 0x07654321): PCR was not the gate.** The hardcode landed cleanly. This was verified via `sim/traces/rtl/ee_bios_smoke_core.trace`, which shows `lw $14, 0x10f0($14)` retiring with `$14 = 0x07654321`. BIOS reads the value and writes it back as part of a save/modify/restore pattern, but the treadmill behaviour is byte-identical before and after Ch258.
4. **Ch259 named IOP INTC behavior at `0x1F801070`/`0x1F801074` + sticky source injection: INTC alone is not enough.**
   - Phase 1 (no synthetic source): proper W1C/mask semantics landed. BIOS runs the full 14-instruction INTC dance every pass. It probes I_MASK, clears I_STAT via W1C, polls I_STAT three times, sets up mask bits 0 and 3, and repeats. Every I_STAT read returns 0. Verdict: `intc_quiet`. The treadmill persists.
   - Phase 2 (`+IOP_INTC_BOOT_SRC=0001`): the sticky source IS reaching the EE. The verdict flipped to `intc_pending_observed`, meaning at least one I_STAT read returned non-zero. BIOS sees the pending bit but still loops 8 passes with an identical retire count (24,029,051). A pending bit may be necessary, but it is not sufficient: dispatch through the handler never reaches a state where the longjmp restoration sees changed inputs.

**Conclusion:** breaking the Ch215 treadmill requires **multi-state IOP/SBUS/kernel activity**. That means a real IOP responder that produces firmware-visible side effects: kernel globals written, SIF mailbox flags toggled, and INTC sources asserted in response to actual events. Single-register hardcodes and single-bit synthetic sources do not suffice. Ch260 closes the BIOS-MMIO probe arc.

### What landed in the tree (kept)

- `rtl/ee/ee_bootstrap_mmio_stub.sv`: three named MMIO behaviors promoted out of the anonymous regfile. All are in production-shaped form with default-safe semantics:
  - **0x1814** (Ch202): hardcoded `0xFFFFFFFF` (RDRAM-init ready polling bit).
  - **0x10F0** (Ch258): hardcoded `0x07654321` (IOP DMAC PCR realism stub; real PS1/IOP reset value).
  - **0x1070 / 0x1074** (Ch259): named IOP INTC view with W1C on I_STAT, plain write on I_MASK, and a sticky `iop_intc_inject_src_i` diagnostic injection port (default 0).
- `sim/tb/integration/tb_ee_core_bios_smoke.sv`: the Ch218 observer is preserved as a compact INTC transaction log. It is gated behind `` `ifdef CH259_INTC_DIAG `` so routine builds stay silent. Re-enable it via `make tb_ee_core_bios_long_intc_diag`.
- `sim/Makefile`: new `tb_ee_core_bios_long_intc_diag` target.

### Where the next attempt starts (Ch261+)

**Do not** open Ch261 as another `ee_bootstrap_mmio_stub` hardcode. The probe arc has demonstrated empirically that no single MMIO ready bit is the gate.
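
For reference, the hardcode pattern being retired is small. The Python model below is a hypothetical behavioral sketch of the three special cases listed under "What landed in the tree". It follows the semantics as this document describes them, including W1C on I_STAT, and is not a statement about real IOP hardware. It assumes that all other offsets behave as latched RW registers resetting to 0, and that the sticky injection source re-latches after every clear. With `inject_src=0x1`, clearing I_STAT never sticks. This mirrors the Ch259 Phase 2 setup, in which BIOS always sees a pending bit.

```python
# Hypothetical behavioral model of the ee_bootstrap_mmio_stub special cases.
# Semantics follow this document (W1C I_STAT, plain-write I_MASK, sticky
# injection); they are NOT verified against real IOP hardware.

RDRAM_READY = 0x1814   # Ch202: hardcoded ready bit
IOP_DMAC_PCR = 0x10F0  # Ch258: hardcoded PCR reset value
IOP_I_STAT = 0x1070    # Ch259: write-1-to-clear
IOP_I_MASK = 0x1074    # Ch259: plain write
WORD = 0xFFFFFFFF


class BootstrapMmioModel:
    def __init__(self, inject_src: int = 0) -> None:
        self.regs = {}  # anonymous latched-RW regfile (assumed reset value 0)
        self.i_stat = 0
        self.i_mask = 0
        self.inject_src = inject_src & WORD  # sticky diagnostic source, default 0

    def pulse(self, bits: int) -> None:
        """A real event source latching pending bits into I_STAT."""
        self.i_stat |= bits & WORD

    def _reassert_sticky(self) -> None:
        # The sticky source behaves like a held level: it re-latches after clears.
        self.i_stat |= self.inject_src

    def read(self, offset: int) -> int:
        if offset == RDRAM_READY:
            return 0xFFFFFFFF
        if offset == IOP_DMAC_PCR:
            return 0x07654321
        if offset == IOP_I_STAT:
            self._reassert_sticky()
            return self.i_stat
        if offset == IOP_I_MASK:
            return self.i_mask
        return self.regs.get(offset, 0)

    def write(self, offset: int, value: int) -> None:
        value &= WORD
        if offset == IOP_I_STAT:
            self.i_stat &= ~value  # W1C: each 1 bit clears the matching pending bit
            self._reassert_sticky()
        elif offset == IOP_I_MASK:
            self.i_mask = value
        else:
            # Latched; reads of the hardcoded offsets above ignore this value.
            self.regs[offset] = value
```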

The Ch261+ arc opens **IOP-side modeling**:

- **Phase 1 (Ch261 candidate):** a tiny synthetic IOP/SIF responder in the TB that produces ONE meaningful kernel-visible side effect per Ch215 pass. The simplest plausible "real side effect" is a monotonic counter at a fixed kernel-data address. One candidate is `0x80030000`, the kdata region BIOS scans every pass per the Ch218 v5 capture. If BIOS's longjmp callee polls that counter and it advances, the treadmill should break. If it does not, that side-effect shape is ruled out, which narrows down the shape BIOS actually needs.
- **Phase 2 (later):** flesh out the IOP-side stubs into a responder that can take SIF mailbox commands and emit INTC pulses. These stubs already exist in `rtl/iop/`, but none are currently in production. This is the multi-chapter "real IOP" arc.

Three structural rules captured from the arc apply during Ch261+ (recorded as memory entries):

- **Pause for Codex on the second unexpected result**: [[feedback-pause-for-codex-on-iteration-loops]].
- **Agents for breadth, source-read for runtime semantics**: [[feedback-agents-for-breadth-not-runtime]].
- **Don't model IOP/SIF activity by hardcoding a single bit; model the producer.** A pending-bit hardcode kicks BIOS into a fake handler path without progressing it.

---

**Purpose:** before re-opening the BIOS treadmill, lock down what the EE / IOP / SIF / INTC / DMAC stack actually models today versus what the real BIOS expects. The next chapter can then target a specific dependency rather than chasing the BIOS opcode-of-the-week.

---

## TL;DR

The real BIOS, run under `tb_ee_core_bios_smoke` with `+BIOS=`, exhibits **two distinct failure modes**. Both blocked the Ch215-Ch218 arc:

1. **The Ch215 longjmp-return treadmill.** The Ch215 jmp_buf restore FSM rehydrates 12 GPRs from `0xA000B1E0`. BIOS then resumes at the restored `$ra` and loops 5 times through `0xBFC52340..0xBFC52360`. The Ch217 verdict captures the smoking gun: across passes, `$a0`, `$a1`, and the JAL callee's `$v0` are **bit-identical**.
   The kernel is asking the same question and getting the same answer on every pass. In the TB's words, "BIOS has no escape signal from this callee." External state that should flip between passes is not flipping, because our stack does not model it.
2. **The DEADBEEF wedge in EE-RAM code.** Independent of the longjmp loop, BIOS code executing from EE RAM at `pc=0x000014C4` issues an `LW` whose base pointer derives from an earlier UNMAPPED read. That first UNMAPPED read's return value (the EE map's sentinel for unmapped regions) ends up in `$6`. The wedge then executes `lw $2, 0($6)` on the poisoned base. That read is also UNMAPPED, and the resulting value keeps the loop alive forever. The TB has a `lineage_poison_addr` / `retire_ring` diagnostic that captures the first unmapped read's PC + EA on every run.

Both failures share a root cause: **the EE memory map has decode holes in regions BIOS reaches.** The treadmill is one symptom; the DEADBEEF wedge is another. Ch257 picks the most likely missing region and lands a minimal model.

---

## Layer inventory: what exists today

### SIF (Sub-system Interface): `rtl/sif/`

All 8 SIF modules are functional in simulation, but **none are instantiated in the production hierarchy** (`top_psmct32_raster_demo`, `de25_nano_psmct32_raster_demo_top`). They live in TBs only.

| Module | Models | Faked / TODO |
|---|---|---|
| `sif_mailbox_stub` | 4-reg MSCOM / SMCOM / MSFLG / SMFLG @ EE 0x1000F200 / IOP 0x1D000000 | No directional / W1C / set-clear semantics; no IRQ; plain RW |
| `sif_mailbox_peer_stub` | TB-side IOP responder: command-echo FSM watching MSFLG | TB only; no real IOP execution |
| `sif_dma_stub` | Qword receive endpoint (128-bit ready/valid) | No consume path; buffer fills and stays full |
| `sif_dma_ack_peer_stub` | EE→IOP combined ctrl+data terminator | One-shot S_DONE; no re-arm |
| `sif_dma_iop_ram_bridge_stub` | 128→32 width adapter, qword → 4×32-bit writes | DEST_BASE_ADDR hardcoded; no ack upstream |
| `sif_dma_ee_ram_bridge_stub` | 32→128 width adapter, accumulates 4 beats | DEST_BASE_ADDR hardcoded; Ch239 rewind for single-slot pads |
| `sif_dma_ee_ack_peer_stub` | IOP→EE combined ctrl+data terminator | One-shot |
| `boot_install_agent_stub` | Synthetic exception-handler streamer → EE RAM | NOT BIOS code; canned `MFC0 / ADDIU / JR / RFE` payload only |

**Critical gap:** the EE memory map has **no decode region for `0x1000F200-0x1000F2FF`**, so `sceSifInit()`'s mailbox accesses go to UNMAPPED. Even if the stubs were instantiated, the EE core could not reach them. The IOP-side map *does* have the SIF decode (the `0x1D000000` block), but the IOP CPU is not running any real firmware.
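
A toy lookup table makes the decode-hole argument concrete. This Python sketch is hypothetical. Its region list is assembled only from this inventory, so the real map probably also decodes regions not listed here, such as the instantiated DMAC channel banks. It further assumes:

- Every address is treated as KSEG0/KSEG1, so the top three bits are dropped.
- The BIU window takes priority over the 4 MiB ROM window it overlaps.
- `0xDEADBEEF` stands in for the EE-side unmapped sentinel.

```python
# Hypothetical EE physical decode table, reconstructed from the Ch256 layer
# inventory only. Entries are checked in order (first match wins).
MAPPED_REGIONS = [
    ("ee_ram", 0x00000000, 0x01FFFFFF),
    ("ee_intc", 0x1000F000, 0x1000F1FF),
    ("bootstrap_mmio", 0x1F800000, 0x1F80FFFF),
    ("biu_mmio", 0x1FFE0000, 0x1FFE0FFF),  # assumed to win over the ROM window
    ("bios_rom", 0x1FC00000, 0x1FFFFFFF),  # 4 MiB BIOS dump
]

UNMAPPED_SENTINEL = 0xDEADBEEF  # placeholder: the EE-side sentinel is an assumption


def to_physical(vaddr: int) -> int:
    """KSEG0/KSEG1-style translation: drop the top three address bits."""
    return vaddr & 0x1FFFFFFF


def decode(vaddr: int):
    """Return the region name for vaddr, or None if it falls in a decode hole."""
    paddr = to_physical(vaddr)
    for name, first, last in MAPPED_REGIONS:
        if first <= paddr <= last:
            return name
    return None


def read_word(vaddr: int, backing: dict) -> int:
    """Unmapped reads return the sentinel, which is how a poisoned base is born."""
    if decode(vaddr) is None:
        return UNMAPPED_SENTINEL
    return backing.get(to_physical(vaddr), 0)
```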

### IOP: `rtl/iop/`

| Module | Models | Faked / TODO |
|---|---|---|
| `iop_core_stub` | Minimal R3000 (11 opcodes), COP0 Status/Cause/EPC triple-stack, async exception entry, `STRICT_UNSUPPORTED` trap+latch, reset vector `0xBFC00000` (BIOS) | No TLB, cache, HI/LO, R-type ALU, shifts, mul, div, BD bit, kernel/user enforcement |
| `iop_memory_map_stub` | IOP RAM 0-2 MiB, BIOS ROM 0x1FC00000+, IOP DMAC ch9 0x1F801520, IOP INTC 0x1F801070, SIF 0x1D000000, retroDE pad I/O 0x1F808500 | All other IOP I/O unmapped (SPU2, timers, other DMAC channels, real SIO2); UNMAPPED reads return `0xDEADBEEF` |
| `iop_ram_stub` | 2 MiB SRAM (default 16 KiB in TBs) | None |
| `iop_fetch_stub` | Sequential fetcher (trace-only; superseded by `iop_core_stub`) | n/a |
| `iop_exec_stub` | 5-opcode micro-op script engine (HALT/WRITE/READ/WAIT_IRQ/BNE; superseded by `iop_core_stub`) | n/a |
| `iop_dmac_reg_stub` | IOP DMAC ch9 (SIF0 IOP→EE): MADR/BCR/CHCR/DONE_COUNT, 32-bit beats from IOP RAM via memory master | Only ch9 wired; no error reporting |
| `sio2_input_stub` | Sony pad word @ retroDE 0x1F808500; 2-FF CDC from bridge domain | No real SIO2 FIFO at 0x1F808200; no DMAC ch11 |

**Critical gap:** **the IOP has no real BIOS image and never executes beyond synthetic test programs.** Real BIOS boot expects the IOP to:

- fetch from `0xBFC00000` (shared BIOS ROM) and execute its bootstrap;
- parse `IOPBTCONF` and load IRX modules;
- signal readiness back to the EE via SIF.

None of that is happening, so the IOP-side mailbox writes that would unstick the EE never come.

### INTC: `rtl/intc/intc_stub.sv`

Generic 16-source controller, reused on both the EE (`0x1000F000`) and IOP (`0x1F801070`) sides. W1C status, plain-write mask.
Aggregate `cpu_irq = |(STAT & MASK)`.

**Faked:** write-to-toggle (XOR) mask semantics are not implemented. The real PS2 INTC mask register toggles each bit written as 1, which gives atomic bit-flips. BIOS code that relies on toggle-on-write will instead overwrite the whole mask and see unexpected behavior.

**Wired sources:** only DMAC completion pulses (`dmac_reg_stub.irq_completion_o`, `iop_dmac_reg_stub.irq_completion_o`).

**Unwired:** VBLANK_START, VBLANK_END, GS_FINISH, TIMER0..3, SBUS, VIF/VU, SIF0/SIF1 done, real SIO2, CDVD, USB, FW, IPU. The PCRTC's `frame_seen` toggle exists in the demo wrapper but is not routed to INTC.

### DMAC: `rtl/dmac/dmac_reg_stub.sv`

EE-side per-channel register shell + state machine. MADR / QWC / CHCR / TADR / DONE_COUNT sit at offsets 0/0x10/0x20/0x30/0x40 within a channel bank. Real 128-bit qword transfers from `ee_ram_stub` go through a memory master.

**Faked:**

- Chain mode: TADR is recorded but never consulted.
- Errors: all errors report code 0 (OK).
- Throttling: there is no memory-contention throttling.

Only ch2 (GIF) and ch5 (SIF0) are instantiated in the production demo. The remaining EE DMAC channels (0, 1, 3, 4, and 6-9) are register-only or absent.

### EE-side MMIO stubs: `rtl/ee/ee_*_mmio_stub.sv`

| Stub | Range | Behavior |
|---|---|---|
| `ee_biu_mmio_stub` | `0x1FFE0000-0x1FFE0FFF` (4 KiB) | Latched RW. Real BIOS writes 0x1FFE0130 for cache/BIU config (Ch9). |
| `ee_bootstrap_mmio_stub` | `0x1F800000-0x1F80FFFF` (64 KiB) | Latched RW + **one special case** (as of the Ch256 recon; Ch258/Ch259 later added `0x10F0` and `0x1070`/`0x1074`): offset `0x1814` returns hardcoded `0xFFFFFFFF` (Ch202 RDRAM-init ready bit). |

`ee_bootstrap_mmio_stub`'s Ch202 special case is **the canonical pattern** for "BIOS polls a hardware register expecting a ready bit; real RDRAM init never happens; hardcode the return."

---

## The Ch215 longjmp-return treadmill: what the TB knows

`tb_ee_core_bios_smoke` runs the real BIOS dump (4 MiB at `/home/ubuntu/Downloads/bios.hex`, passed via the `+BIOS=` plusarg).

### What BIOS gets through

1. EE reset @ `0xBFC00000`, ROM bootstrap.
2. BIU/cache config writes to `0x1FFE0130` (the BIU stub absorbs them).
3. `0x1F80xxxx` MCH/SBUS reads (the bootstrap MMIO stub absorbs them; special case `0x1814` returns `0xFFFFFFFF`).
4. Memory copies from BIOS ROM into EE RAM.
5. Kernel exception-handler install (handled by either `boot_install_agent_stub` or direct memory writes).
6. **First SYSCALL #8 (`_ReturnFromException` with `$a0=2`)**, which triggers the Ch215 jmp_buf restore FSM:
   - 12 GPRs (`$ra`, `$sp`, `$fp`, `$s0..$s7`, `$gp`) are loaded from `0xA000B1E0`.
   - `$v0` is set to 1 (longjmp-return marker).
   - PC is restored from `$ra`.

### Where BIOS stalls

After the Ch215 restore, BIOS resumes at the restored `$ra` and loops through `0xBFC52340..0xBFC52360`. The TB's Ch217 task captures every pass through the JAL at `0xBFC52358` (the longjmp-return-handler call). **Across 5 observed passes**:

- `$a0` and `$a1` to the callee are **bit-identical**.
- The callee's returned `$v0` is **bit-identical**.

Ch217 verdict (from the TB itself):

> `longjmp_return_repeats_due_to_static_state`: BIOS has no escape
> signal from this callee. The callee must be modifying something we
> are missing, OR the BIOS expects some external state (MMIO ready,
> timer, or kernel global) to flip between passes, but our model is
> not providing it.

The TB computes the JAL target dynamically as `{4'hB, instr[25:0], 2'b00}`. Hardcoding the top nibble to `0xB` is valid here: a MIPS J-type target takes its upper 4 bits from the delay-slot PC, and that PC lies in `0xBxxxxxxx`. The callee's first 16 instructions are dumped to the log via Ch217's `CALLED_FUNCTION dump`. **The existing TB diagnostic has not captured the specific addresses the callee READS inside its body.**

### The DEADBEEF wedge (separate, related)

Independently of the longjmp loop, BIOS code running in EE RAM at `pc=0x000014C4` issues an `LW` with base `$6 = 0xDEADBEEF` (or a derivative). The effective address `0x3084_FFFF` lands in an unmapped region, and the EE map returns the unmapped sentinel. That value re-poisons `$2`, and the loop self-perpetuates millions of times.
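
A retire ring paired with a first-unmapped-read latch is enough to recover where a poisoned base came from. This Python sketch models the TB's `lineage_*` / `retire_ring_*` diagnostic. The class and field names are hypothetical stand-ins, and the model assumes the ring is snapshotted at the moment the first UNMAPPED read is latched. Later unmapped reads, such as the wedge's own `lw $2, 0($6)`, do not overwrite the lineage.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Retired:
    pc: int
    r2: int
    r6: int


@dataclass
class Lineage:
    pc: int
    instr: int
    poison_addr: int
    poison_data: int
    history: List[Retired]  # retired instructions before the read, oldest first


class PoisonLineageTracker:
    """Hypothetical model: keep the last N retires, latch the first UNMAPPED read."""

    def __init__(self, depth: int = 32) -> None:
        self.ring = deque(maxlen=depth)  # oldest entries fall off automatically
        self.lineage: Optional[Lineage] = None

    def retire(self, pc: int, r2: int, r6: int) -> None:
        self.ring.append(Retired(pc, r2, r6))

    def mem_read(self, pc: int, instr: int, ea: int, data: int, unmapped: bool) -> None:
        # Only the FIRST unmapped read is recorded; the ring is snapshotted then.
        if unmapped and self.lineage is None:
            self.lineage = Lineage(pc, instr, ea, data, list(self.ring))
```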

The TB has full diagnostics for this wedge:

- `lineage_poison_addr`, `lineage_poison_data`, `lineage_pc`, and `lineage_instr`;
- a 32-deep retire ring (`retire_ring_pc[*]`, `retire_ring_r2[*]`, `retire_ring_r6[*]`) that captures the 32 instructions preceding the first UNMAPPED read.

The lineage capture identifies where `$6`'s poison originated, but the corresponding fix has not landed. Presumably this is because no stub has yet claimed the address class of the ORIGINATING unmapped read.

---

## What's documented vs. what the kernel actually needs

Existing contracts under `docs/contracts/` are **all Draft**:

- `sif.md`: mailbox/flag exchange + SIF DMA shape; no detailed boot sequence.
- `iop.md`: IOP as separate peer subsystem; notes that IOPBOOT / module-load are NOT modeled.
- `memory.md`: emphasizes BIOS ROM mapping; explicitly does not own BIOS boot sequencing.
- `intc.md`, `dmac.md`, `ee.md`, `gif_gs.md`, `peripherals.md`, `platform.md`, `vif_vu.md`, `sio2_pad.md`, `spu2.md`: system contracts, no BIOS/kernel detail.

The `wave2*_plan.md` documents are all DMA/GIF-focused and explicitly defer SIF/IOP/BIOS.

**No existing doc covers the kernel's setjmp / longjmp / `_ReturnFromException` path.** Ch214-Ch217 reverse-engineered the layout (12-GPR frame at `0xA000B1E0`, setjmp at `0xBFC4DB50`, post-setjmp checkpoint at `0xBFC52358`) but only embedded the findings in the TB itself.

---

## Where the missing signal most likely lives

Cross-referencing the Ch217 verdict's three candidate categories ("MMIO ready, timer, or kernel global") against the layer inventory:

**Kernel global (RAM at `0xA000xxxx`).** Unlikely. The callee invoked from `0xBFC52358` is in BIOS ROM, and its `$a0`/`$a1` arguments are computed from the restored state. If the callee polls a kernel global, that global would be in EE RAM. EE RAM is bidirectionally accessible, so if the callee both READS and WRITES the global, the value would change pass-to-pass.
The Ch217 verdict says it does not. So either the callee writes nothing, or the global lives in an unmapped region.

**Timer (`0x10000000-0x10001FFF`, T0/T1/T2/T3 COUNT/MODE/COMP/HOLD).** Plausible. The PS2 kernel uses one of T0-T3 for the scheduler tick. The counter is read at fixed addresses, and the count advances with hardware clocks regardless of CPU activity. **The EE memory map has NO decode for `0x10000000-0x10001FFF`**, so all four timers are unmapped. A read returns the unmapped sentinel. The kernel therefore reads the same sentinel every pass, sees no time elapsed, and loops.

**MMIO ready bit (EE INTC, SIF mailbox, or bootstrap MMIO).** Also plausible.

- *EE INTC `0x1000F000`* is mapped (status/mask), but no sources fire periodically. If the callee polls `INTC_STAT[VBLANK_START]` waiting for a frame tick, the bit never sets.
- *SIF mailbox `0x1000F200`* is **not mapped at all** on the EE side. A read returns the unmapped sentinel. If the callee polls `SMFLG[IOP_READY]` waiting for IOP boot completion, that read is pure UNMAPPED. It would also trigger the DEADBEEF wedge if the sentinel is `0xDEADBEEF`.
- *Bootstrap MMIO `0x1F80xxxx`* is mapped (latched + Ch202 special case at `0x1814`). If the callee polls a different offset expecting a ready bit, a hardcoded special case is the proven fix pattern.

---

## Ch257: scoped callee-body memory-read observer (landed)

**Implementation:** [`sim/tb/integration/tb_ee_core_bios_smoke.sv`](../sim/tb/integration/tb_ee_core_bios_smoke.sv) gains a Ch218 observer + verdict emitter:

- **`ch218_jal_target`**: dynamically decoded from `peek_instr(0xBFC52358)`, the JAL whose callee Ch217 already characterized as static-state. The pattern matches Ch217's own decode.
- **`ch218_pc_in_body`**: combinational gate that fires when the live `core_pc` is in `[jal_target, jal_target + 0x80)`, i.e. the callee's first 32 instructions.
- **Capture array** of depth 64.
  Each entry records `{pass_idx (from ch217_count), pc, instr, ea, data, rt}` per qualifying EE memory READ event (`ee_map_ev_valid && EV_READ && SUBSYS_MEM && ch218_pc_in_body`). Capture is sampled on every clock, so all reads within a single pass through the callee are captured.
- **Print task `ch218_print_callee_reads`**: dumps every captured read with its instruction mnemonic. It then sweeps the array for an EA that appears across **multiple distinct passes** and returns **identical data**. That EA is the static-poll candidate.
- **Verdict classifier** with eight labels. Five name a Ch258 target region directly, `named_region_static` reports any other region verbatim, and two report that no candidate was found:

  | Verdict label | EA range | Ch258 action |
  |---|---|---|
  | `timer_poll_static` | `0x10000000 – 0x10001FFF` | Land `ee_timer_stub` (PRIMARY) |
  | `sif_mailbox_static` | `0x1000F200 – 0x1000F2FF` | Route `sif_mailbox_stub` ee-side into EE map decode |
  | `ee_intc_static` | `0x1000F000 – 0x1000F1FF` | Wire VBLANK / TIMER / SIF source(s) into `intc_stub.irq_src` |
  | `bootstrap_mmio_static` | `0x1F800000 – 0x1F80FFFF` | Add Ch202-style hardcoded ready at the polled offset |
  | `ee_ram_static` | `0x00000000 – 0x01FFFFFF` | Preload kernel global via `boot_install_agent_stub` |
  | `named_region_static` | (any other range) | Report verbatim; pick the next Ch258 hypothesis |
  | `no_repeated_read_across_passes` | n/a | Not enough passes captured, or callee uses scratch only |
  | `no_callee_reads` | n/a | Synthetic-CI mode (callee never reached); not applicable |

- **Wiring:** the print task is called from two places. One is the long-run halt path, after `ch217_print_longjmp_path`. The other is the timeout path, which is the path real-BIOS mode exits through. Synthetic CI mode skips both call sites because they sit inside `ch213_sc8_seen`-gated blocks. There is therefore no regression noise and no behavior change to existing TBs.

**No RTL change** (production or otherwise), and no new TB.
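
The decode, gate, sweep, and classify steps above fit in a small Python model. It is a hypothetical sketch, not the SystemVerilog in `tb_ee_core_bios_smoke.sv`, and it rests on these assumptions:

- EAs are KSEG0/KSEG1 addresses, masked to physical before range matching.
- The 64-entry capture keeps the first 64 reads.
- "Identical data" means every captured read of the EA returned the same value.

`jal_target` uses the general MIPS J-type rule. The TB's `{4'hB, instr[25:0], 2'b00}` form is equivalent when the JAL sits in `0xBxxxxxxx`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

BODY_WINDOW = 0x80    # [jal_target, jal_target + 0x80): first 32 instructions
CAPTURE_DEPTH = 64    # assumed to saturate: reads past 64 are dropped

# (label, first, last) in physical addresses, taken from the verdict table
REGION_VERDICTS = [
    ("timer_poll_static", 0x10000000, 0x10001FFF),
    ("ee_intc_static", 0x1000F000, 0x1000F1FF),
    ("sif_mailbox_static", 0x1000F200, 0x1000F2FF),
    ("bootstrap_mmio_static", 0x1F800000, 0x1F80FFFF),
    ("ee_ram_static", 0x00000000, 0x01FFFFFF),
]


def jal_target(jal_pc: int, instr: int) -> int:
    """MIPS J-type: upper 4 bits of the delay-slot PC, then instr[25:0] << 2."""
    return ((jal_pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


def pc_in_body(pc: int, target: int) -> bool:
    return target <= pc < target + BODY_WINDOW


@dataclass(frozen=True)
class CalleeRead:
    pass_idx: int
    pc: int
    ea: int
    data: int


def find_static_poll(reads: List[CalleeRead]) -> Optional[int]:
    """First EA (capture order) seen in >= 2 passes that always returned the same data."""
    passes = defaultdict(set)
    values = defaultdict(set)
    order = []
    for r in reads[:CAPTURE_DEPTH]:
        if r.ea not in passes:
            order.append(r.ea)
        passes[r.ea].add(r.pass_idx)
        values[r.ea].add(r.data)
    for ea in order:
        if len(passes[ea]) >= 2 and len(values[ea]) == 1:
            return ea
    return None


def classify(reads: List[CalleeRead]) -> str:
    if not reads:
        return "no_callee_reads"
    ea = find_static_poll(reads)
    if ea is None:
        return "no_repeated_read_across_passes"
    paddr = ea & 0x1FFFFFFF  # assumption: EAs are KSEG0/KSEG1 virtual addresses
    for label, first, last in REGION_VERDICTS:
        if first <= paddr <= last:
            return label
    return "named_region_static"
```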

This is a pure TB-side diagnostic added to an existing TB. The full sim regression stays at **155 PASS / 0 FAIL**. The observer is dormant in every TB except the one real-BIOS run that the operator triggers manually.

## How to use Ch218: operator command

```
cd sim
make tb_ee_core_bios_smoke BIOS=/home/ubuntu/Downloads/bios.hex
```

This runs `tb_ee_core_bios_smoke` with `+BIOS=...` so that the real 4 MiB BIOS dump is loaded. The TB will eventually hit the Ch215 longjmp treadmill, loop through the Ch217 callee multiple times, and time out. The timeout path prints:

```
[ch217] LONGJMP_PATH_DECODE 0xBFC52350..0xBFC52390:
[ch217] ...
[ch217] verdict=longjmp_return_repeats_due_to_static_state
...
[ch218] CALLEE_BODY_READS jal_target=0x?????.??..0x?????.?? captured=N (cap=64)
[ch218] [0] pass=0 pc=0x... instr=0x... ea=0x... data=0x... rt=$.
[ch218] [1] pass=0 pc=0x... ...
[ch218] ...
[ch218] verdict=