# Ch261: IOP responder skeleton + arbitration-bug discovery (brief for Codex)

**Status:** TB landed and composed exactly per your Ch261 framing (iop_exec_stub + iop_memory_map_stub + iop_ram_stub + iop_dmac_reg_stub + sif_dma_ee_ram_bridge_stub + ee_ram_stub). Two unexpected results in a row → pausing per the [[feedback-pause-for-codex-on-iteration-loops]] rule.

**Finding: a real CPU-vs-DMA arbitration bug in `rtl/iop/iop_memory_map_stub.sv:318`** that silently corrupts DMA beats whenever a CPU read collides with a DMA read on the shared IOP RAM port. Likely latent for a while: the existing IOP-side TBs verify counts but not data values, so this had no visible failure mode.

## What Ch261 attempted

New TB: `sim/tb/integration/tb_iop_responder_ee_ram_landing.sv`

Chain (all from existing primitives, no new RTL):

```
iop_exec_stub ─► iop_memory_map_stub ─► iop_ram_stub
                          │             (script + payload)
                          ├─► iop_dmac_reg_stub (ch9) ─► sif_dma_ee_ram_bridge_stub ─► ee_ram_stub
                          └─► intc_stub (cpu_irq → exec WAIT_IRQ exit)
```

Initial script: WRITE INTC_MASK / MADR / BCR / CHCR=start → WAIT_IRQ → W1C INTC_STAT → READ DONE_COUNT → HALT.

Payload (4 × 32-bit at IOP RAM 0x200..0x20C): `{DEADBEEF, C0FFEE00, 12345678, CAFEF00D}`.

Expected EE-RAM landing at `0x80000`: `{CAFEF00D, 12345678, C0FFEE00, DEADBEEF}` (little-endian qword).

## What actually landed

```
[diag-beat] beat=0 ep_data=0x00000003 dma_rd_addr=0x00000200
[diag-beat] beat=1 ep_data=0xc0ffee00 dma_rd_addr=0x00000204
[diag-beat] beat=2 ep_data=0x12345678 dma_rd_addr=0x00000208
[diag-beat] beat=3 ep_data=0xcafef00d dma_rd_addr=0x0000020c

landed_qword = 0xcafef00d 12345678 c0ffee00 00000003
                                            ^^^^^^^^ wrong, should be 0xdeadbeef
```

Beats 1–3 correct. Beat 0 returns `0x00000003`, which is the value of `OP_WAIT_IRQ` at script slot 4 (byte 0x440 = word 0x110).
**The DMA is reading from address 0x200 but receiving the data from address 0x440 instead.** Pre-test IOP RAM dump confirmed `iop_ram[0x80] = 0xdeadbeef` at the correct payload location.

## Root cause

`rtl/iop/iop_memory_map_stub.sv` lines 315–318:

```sv
assign cpu_rd_hit = iop_rd_en && rd_is_ram;
assign dma_rd_hit = dma_rd_en && dma_rd_is_ram;
assign ram_rd_en = cpu_rd_hit || dma_rd_hit;
assign ram_rd_addr = cpu_rd_hit ? rd_ram_offset : dma_rd_ram_offset;
```

When CPU and DMA both want to read RAM on the same cycle:

- `ram_rd_addr` always picks the **CPU's** address.
- `ram_rd_en` is asserted (so the read actually fires for the CPU address).
- `iop_ram_stub` returns data for the CPU address.

Line 462: `assign dma_rd_data = dma_rd_was_ram ? ram_rd_data : ...;`

The DMA path samples `ram_rd_data` blindly. On collision, the DMA gets the CPU's data. **No stall, no error, no detection.**

## Why this only hits beat 0

The DMAC enters S_FETCH_WAIT one cycle after `CHCR=1` is written. That's the same cycle the exec stub is fetching the NEXT script op (originally WAIT_IRQ at slot 4 = 0x440). CPU+DMA collide. CPU's addr (0x440) wins, `iop_ram[0x110] = 0x00000003 = OP_WAIT_IRQ` flows back as DMA beat 0.

By beat 1, exec_stub has either entered S_WAIT_IRQ (silent: no `map_rd_en` pulses, verified in `iop_exec_stub.sv:140-163`) or is in HALT (also silent). DMA reads cleanly from then on.

## Workaround attempt that did NOT fix it

Restructured the script to drop `WAIT_IRQ` and have the exec stub HALT immediately after CHCR=1:

```
0 WRITE DMAC_MADR = payload_base
1 WRITE DMAC_BCR  = 4
2 WRITE DMAC_CHCR = 1
3 HALT
```

Result: beat 0 still wrong, now reads `0x00000000` instead of `0x00000003`. The exec stub is fetching the HALT op (all-zero contents) at the same cycle as DMA beat 0; CPU still wins; DMA gets the zeros from script slot 3.
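
The collision can also be reproduced outside the full chain. The block below is a minimal standalone sketch, not the project RTL: it copies only the shape of the two `ram_rd_en` / `ram_rd_addr` assigns quoted under Root cause and puts them in front of a toy single-read-port RAM. The module name `tb_rd_collision_sketch`, the 1024-word RAM, and the one-cycle synchronous read latency are all assumptions made for illustration; the real `iop_ram_stub` timing may differ.

```sv
// Standalone sketch (hypothetical module, NOT the project RTL).
// Models one shared RAM read port behind a CPU-priority address mux.
module tb_rd_collision_sketch;
  logic        clk = 0;
  logic        cpu_rd_hit = 0, dma_rd_hit = 0;
  logic [9:0]  rd_ram_offset = '0, dma_rd_ram_offset = '0;  // word offsets
  logic [31:0] ram [0:1023];   // assumed size, illustration only
  logic [31:0] ram_rd_data;

  // Same shape as the assigns quoted from iop_memory_map_stub.sv.
  wire       ram_rd_en   = cpu_rd_hit || dma_rd_hit;
  wire [9:0] ram_rd_addr = cpu_rd_hit ? rd_ram_offset : dma_rd_ram_offset;

  always #5 clk = ~clk;

  // Assumption: synchronous read, data visible the cycle after ram_rd_en.
  always_ff @(posedge clk)
    if (ram_rd_en) ram_rd_data <= ram[ram_rd_addr];

  initial begin
    ram[10'h080] = 32'hDEADBEEF;  // payload word 0 (byte 0x200)
    ram[10'h110] = 32'h00000003;  // OP_WAIT_IRQ at script slot 4 (byte 0x440)

    // Case 1: DMA reads alone, so the mux passes the DMA offset through.
    @(negedge clk);
    dma_rd_hit = 1; dma_rd_ram_offset = 10'h080;
    @(negedge clk);
    dma_rd_hit = 0;
    $display("dma alone: ram_rd_data=0x%08h", ram_rd_data);

    // Case 2: CPU and DMA request in the same cycle; the CPU offset wins.
    cpu_rd_hit = 1; rd_ram_offset     = 10'h110;
    dma_rd_hit = 1; dma_rd_ram_offset = 10'h080;
    @(negedge clk);
    cpu_rd_hit = 0; dma_rd_hit = 0;
    $display("collision: ram_rd_data=0x%08h", ram_rd_data);

    $finish;
  end
endmodule
```

Under these assumptions the first read should return the payload word and the second should return the script word, which is the same substitution seen on beat 0. Nothing in the mux tells the DMA side that its request was not the one served.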
**The race is structural:** any CPU activity in the same cycle window as DMA's first beat corrupts the data, regardless of what script op the CPU is fetching.

## Why the existing TBs never caught this

`tb_iop_self_driven` and `tb_iop_autonomous_two_xfers` exercise the same chain (exec + map + RAM + DMAC) but verify only:

- `dma_done_events == 1` (or 2)
- INTC assert/ack counts
- `halt_events == 1`
- exec PC at certain checkpoints

They DROP DMA payload data on the floor via the `ep_ready` handshake without ever checking what bytes came out. The bug was invisible to the existing regression because nothing cross-checked DMA payload against IOP RAM source contents.

`tb_pad_state_via_sif_to_ee` DOES verify the EE-RAM landing matches expected, but the IOP side is TB-impersonated (no exec stub fetching script ops), so there's no CPU read pressure on the shared port.

## Two candidate fixes for Codex to pick from

**A. Tweak the arbitration in `iop_memory_map_stub.sv:317-318`.** Small, targeted RTL change. Options:

1. *DMA wins on collision.* One-line flip: change priority so `ram_rd_addr = dma_rd_hit ? dma_rd_ram_offset : rd_ram_offset`. CPU's read silently gets stale/wrong data when colliding with DMA, but the existing TBs only verify counts so they shouldn't regress (this can be confirmed by rerunning the regression). Risk: undetectable CPU silent failure if future code paths care about CPU read data.
2. *Stall CPU on collision.* Drop `cpu_rd_valid` to 0 when DMA wins, forcing the exec stub to re-issue the read. Cleaner semantically but more code. Need to verify exec_stub's handling of `!map_rd_valid` on its read request.
3. *True dual-port RAM.* Bigger change: split `iop_ram_stub` so CPU and DMA see independent read ports. Most correct but furthest from "compose existing primitives."

**B. Document the limitation, leave the bug, change Ch261's scope.** Strip the CPU-driven trigger entirely: TB writes CHCR=1 directly via some new path, exec_stub doesn't participate, no CPU read pressure during DMA.
This is closer to the `tb_pad_state_via_sif_to_ee` shape and largely defeats Codex's "synthetic IOP responder" framing.

## My recommendation

A.2 (stall CPU on collision) is the most correct fix that preserves Ch261's intent. It is a small RTL change in one file, it does not break existing TBs (their CPU reads don't collide with DMA the way Ch261's new TB does, because they don't have the same race window), and it turns a silent data-corruption bug into a backpressure event that is transparent to the CPU.

If you want to keep Ch261 tightly bounded, A.1 (DMA priority) is even smaller and produces the same Ch261 PASS, at the cost of leaving the CPU-side silent-corruption risk in place.

A.3 (true dual-port) is a candidate for the following chapter if we want to remove the limitation entirely.

## Files in the tree from this attempt

- `sim/tb/integration/tb_iop_responder_ee_ram_landing.sv`: new TB, currently fails. Diagnostic prints (`[diag] iop_ram words`, `[diag] script slot 1`, `[diag] DMAC regs`, `[diag-beat]`) are left in for triage.
- `sim/Makefile`: new `tb_iop_responder_ee_ram_landing:` target + `.PHONY` list entry + `run:` master-list entry.

Full regression has NOT been re-run because the TB itself fails. The other 155 TBs are unchanged. Will rerun after Codex picks the fix.

## Decision needed from Codex

1. Which fix path? (A.1 / A.2 / A.3 / B / something else)
2. If A.\*: do you want me to make the RTL change as Ch261 closing work, or split it into Ch262 as a separate audit chapter?
3. Should I strip the per-beat diagnostic prints from the TB once it passes, or leave them as a permanent low-noise debug aid?

Pausing all code changes until your call. The bug itself is real regardless of how Ch261 closes: it's a silent DMA data-corruption risk in any future scenario where CPU + DMA contend for IOP RAM.
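
To make the A.1 / A.2 trade-off concrete, here is a hedged sketch of what DMA-priority arbitration with a CPU valid flag that drops on collision might look like. It is not a patch against `iop_memory_map_stub.sv`. The module name `rd_arb_sketch`, the `dma_rd_valid` output, the reset, the `AW` parameter, and the one-cycle read latency are assumptions for illustration; `cpu_rd_valid` is named in option A.2, but its real timing is not shown in this brief.

```sv
// Hypothetical sketch of option A.2 (NOT the project RTL).
// DMA wins the shared read port; the CPU sees valid=0 for that read.
module rd_arb_sketch #(
  parameter int AW = 10                   // assumed word-offset width
) (
  input  logic          clk,
  input  logic          rst_n,
  input  logic          cpu_rd_hit,
  input  logic          dma_rd_hit,
  input  logic [AW-1:0] rd_ram_offset,
  input  logic [AW-1:0] dma_rd_ram_offset,
  output logic          ram_rd_en,
  output logic [AW-1:0] ram_rd_addr,
  output logic          cpu_rd_valid,     // low when the CPU lost arbitration
  output logic          dma_rd_valid      // hypothetical, for symmetry
);
  // The read fires if either side asks; on collision the DMA offset is used.
  // Taken alone, these two lines correspond to option A.1.
  assign ram_rd_en   = cpu_rd_hit || dma_rd_hit;
  assign ram_rd_addr = dma_rd_hit ? dma_rd_ram_offset : rd_ram_offset;

  // Assumption: RAM data returns one cycle after the request, so the
  // valid flags are registered to line up with ram_rd_data.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      cpu_rd_valid <= 1'b0;
      dma_rd_valid <= 1'b0;
    end else begin
      cpu_rd_valid <= cpu_rd_hit && !dma_rd_hit;  // A.2: CPU must retry
      dma_rd_valid <= dma_rd_hit;
    end
  end
endmodule
```

The sketch only helps if the exec stub holds or re-issues its read when valid is low, which is the open verification item already noted under A.2. Without that behavior it degrades to A.1, with the CPU consuming the DMA's data instead.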