Ch261 — IOP responder skeleton + arbitration-bug discovery (brief for Codex)
Status: TB landed and composed exactly per your Ch261 framing (iop_exec_stub + iop_memory_map_stub + iop_ram_stub + iop_dmac_reg_stub
+ sif_dma_ee_ram_bridge_stub + ee_ram_stub). Two unexpected results in a row → pausing per the feedback-pause-for-codex-on-iteration-loops rule.
Finding: a real CPU-vs-DMA arbitration bug in
rtl/iop/iop_memory_map_stub.sv:318 that silently corrupts DMA
beats whenever a CPU read collides with a DMA read on the shared
IOP RAM port. Likely latent for a while — the existing IOP-side TBs
verify counts but not data values, so this had no visible failure
mode.
What Ch261 attempted
New TB: sim/tb/integration/tb_iop_responder_ee_ram_landing.sv
Chain (all from existing primitives, no new RTL):
iop_exec_stub ─► iop_memory_map_stub ─► iop_ram_stub
                 │                      (script + payload)
                 ├─► iop_dmac_reg_stub (ch9) ─► sif_dma_ee_ram_bridge_stub ─► ee_ram_stub
                 └─► intc_stub (cpu_irq → exec WAIT_IRQ exit)
Initial script: WRITE INTC_MASK / MADR / BCR / CHCR=start → WAIT_IRQ → W1C INTC_STAT → READ DONE_COUNT → HALT.
Payload (4 × 32-bit at IOP RAM 0x200..0x20C):
{DEADBEEF, C0FFEE00, 12345678, CAFEF00D}.
Expected EE-RAM landing at 0x80000:
{CAFEF00D, 12345678, C0FFEE00, DEADBEEF} (little-endian qword).
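The expected landing follows from packing the four beats into one 128-bit qword with beat 0 in the least-significant word. A minimal sketch of that packing, assuming the bridge orders beats this way (which matches the expected value above); the package and function names are hypothetical and not taken from the tree:

```systemverilog
// Hypothetical helper, not part of the existing RTL or TB.
package qword_pack_sketch_pkg;

  // Pack four 32-bit DMA beats into a 128-bit qword.
  // w[0] is beat 0 and lands in bits [31:0]; w[3] lands in bits [127:96].
  function automatic logic [127:0] pack_qword_le(input logic [31:0] w [4]);
    return {w[3], w[2], w[1], w[0]};
  endfunction

endpackage
```

With the payload `'{32'hDEADBEEF, 32'hC0FFEE00, 32'h12345678, 32'hCAFEF00D}` as the argument, the concatenation places DEADBEEF in the low word and CAFEF00D in the high word.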
What actually landed
[diag-beat] beat=0 ep_data=0x00000003 dma_rd_addr=0x00000200
[diag-beat] beat=1 ep_data=0xc0ffee00 dma_rd_addr=0x00000204
[diag-beat] beat=2 ep_data=0x12345678 dma_rd_addr=0x00000208
[diag-beat] beat=3 ep_data=0xcafef00d dma_rd_addr=0x0000020c
landed_qword = 0xcafef00d 12345678 c0ffee00 00000003
                                            ^^^^^^^^
                                            wrong, should be 0xdeadbeef
Beats 1–3 correct. Beat 0 returns 0x00000003 — which is the
value of OP_WAIT_IRQ at script slot 4 (byte 0x440 = word 0x110).
The DMA is reading from address 0x200 but receiving the data from
address 0x440 instead. Pre-test IOP RAM dump confirmed
iop_ram[0x80] = 0xdeadbeef at the correct payload location.
Root cause
rtl/iop/iop_memory_map_stub.sv lines 315–318:
assign cpu_rd_hit = iop_rd_en && rd_is_ram;
assign dma_rd_hit = dma_rd_en && dma_rd_is_ram;
assign ram_rd_en = cpu_rd_hit || dma_rd_hit;
assign ram_rd_addr = cpu_rd_hit ? rd_ram_offset : dma_rd_ram_offset;
When CPU and DMA both want to read RAM on the same cycle:
- ram_rd_addr always picks the CPU's address.
- ram_rd_en is asserted (so the read actually fires for the CPU address).
- iop_ram_stub returns data for the CPU address.
Line 462: assign dma_rd_data = dma_rd_was_ram ? ram_rd_data : ...;
The DMA path samples ram_rd_data blindly. On collision, the
DMA gets the CPU's data. No stall, no error, no detection.
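Because the collision currently produces no visible event, a small simulation-only monitor could make it observable before any arbitration change lands. The following is a sketch under assumptions: the module name, the clk/rst_n ports, and the active-low reset polarity are hypothetical; only cpu_rd_hit and dma_rd_hit come from the lines quoted above.

```systemverilog
// Hypothetical simulation-only monitor; not part of the existing tree.
module iop_ram_rd_collision_mon (
  input logic clk,         // assumed: same clock as iop_memory_map_stub
  input logic rst_n,       // assumed: active-low synchronous reset
  input logic cpu_rd_hit,  // iop_rd_en && rd_is_ram
  input logic dma_rd_hit   // dma_rd_en && dma_rd_is_ram
);

  // Number of cycles in which both requesters hit RAM at once.
  int unsigned collisions;

  always_ff @(posedge clk) begin
    if (!rst_n) begin
      collisions <= 0;
    end else if (cpu_rd_hit && dma_rd_hit) begin
      collisions <= collisions + 1;
      $error("[iop-map] CPU/DMA RAM read collision at time %0t", $time);
    end
  end

endmodule
```

Connected to the two hit signals (directly or through a bind), this would be expected to fire on the beat 0 cycle described in this brief.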
Why this only hits beat 0
The DMAC enters S_FETCH_WAIT one cycle after CHCR=1 is written.
That's the same cycle the exec stub is fetching the NEXT script op
(originally WAIT_IRQ at slot 4 = 0x440). CPU+DMA collide. CPU's
addr (0x440) wins, iop_ram[0x110] = 0x00000003 = OP_WAIT_IRQ
flows back as DMA beat 0.
By beat 1, exec_stub has either entered S_WAIT_IRQ (silent — no
map_rd_en pulses, verified in iop_exec_stub.sv:140-163) or is
in HALT (also silent). DMA reads cleanly from then on.
Workaround attempt that did NOT fix it
Restructured the script to drop WAIT_IRQ and have the exec stub
HALT immediately after CHCR=1:
0 WRITE DMAC_MADR = payload_base
1 WRITE DMAC_BCR = 4
2 WRITE DMAC_CHCR = 1
3 HALT
Result: beat 0 still wrong, now reads 0x00000000 instead of
0x00000003. The exec stub is fetching the HALT op (all-zero
contents) at the same cycle as DMA beat 0; CPU still wins; DMA
gets the zeros from script slot 3.
The race is structural — any CPU activity in the same cycle window as DMA's first beat corrupts the data, regardless of what script op the CPU is fetching.
Why the existing TBs never caught this
tb_iop_self_driven and tb_iop_autonomous_two_xfers exercise the
same chain (exec + map + RAM + DMAC) but verify only:
- dma_done_events == 1 (or 2)
- INTC assert/ack counts
- halt_events == 1
- exec PC at certain checkpoints
They DROP DMA payload data on the floor via the ep_ready handshake
without ever checking what bytes came out. The bug was invisible to
the existing regression because nothing crosschecked DMA payload
against IOP RAM source contents.
tb_pad_state_via_sif_to_ee DOES verify the EE-RAM landing matches
expected, but the IOP side is TB-impersonated (no exec stub fetching
script ops), so there's no CPU read pressure on the shared port.
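The missing cross-check can be sketched as a per-beat scoreboard that compares each accepted DMA beat against the expected source words. This is an illustration under assumptions: the module name, the ep_valid signal, the clk/rst_n ports, and the expected array are hypothetical; ep_ready and ep_data are the names used in this brief.

```systemverilog
// Hypothetical TB-side scoreboard; not part of the existing tree.
module dma_payload_scoreboard #(
  parameter int unsigned BEATS = 4
) (
  input logic        clk,
  input logic        rst_n,            // assumed: active-low synchronous reset
  input logic        ep_valid,         // assumed name for the beat-valid strobe
  input logic        ep_ready,
  input logic [31:0] ep_data,
  input logic [31:0] expected [BEATS]  // copy of the IOP RAM payload words
);

  int unsigned beat;        // index of the next beat to check
  int unsigned mismatches;  // beats whose data did not match the source

  always_ff @(posedge clk) begin
    if (!rst_n) begin
      beat       <= 0;
      mismatches <= 0;
    end else if (ep_valid && ep_ready && beat < BEATS) begin
      // 4-state compare so that X/Z data also counts as a mismatch.
      if (ep_data !== expected[beat]) begin
        mismatches <= mismatches + 1;
        $error("[scoreboard] beat %0d: got 0x%08x, expected 0x%08x",
               beat, ep_data, expected[beat]);
      end
      beat <= beat + 1;
    end
  end

endmodule
```

A check of this shape in tb_iop_self_driven or tb_iop_autonomous_two_xfers would turn a corrupted beat into a reported mismatch rather than a silently dropped word.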
Two candidate fixes for Codex to pick from
A. Tweak the arbitration in iop_memory_map_stub.sv:317-318 —
small, targeted RTL change. Options:
- A.1 DMA wins on collision. One-line flip: change priority so ram_rd_addr = dma_rd_hit ? dma_rd_ram_offset : rd_ram_offset. The CPU's read then silently gets stale/wrong data when it collides with DMA, but the existing TBs only verify counts, so they should not regress (this can be verified by rerunning them). Risk: undetectable silent CPU read failure if future code paths care about CPU read data.
- A.2 Stall CPU on collision. Drop cpu_rd_valid to 0 when DMA wins, forcing the exec stub to re-issue the read. Cleaner semantically but more code. Need to verify exec_stub's handling of !map_rd_valid on its read request.
- A.3 True dual-port RAM. Bigger change: split iop_ram_stub so CPU and DMA see independent read ports. Most correct but furthest from "compose existing primitives."
B. Document the limitation, leave the bug, change Ch261's scope.
Strip the CPU-driven trigger entirely — TB writes CHCR=1 directly
via some new path, exec_stub doesn't participate, no CPU read
pressure during DMA. This is closer to tb_pad_state_via_sif_to_ee
shape and largely defeats Codex's "synthetic IOP responder"
framing.
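For comparison, the DMA-priority and stall-on-collision variants of option A can be sketched as one small combinational block. This is not the actual iop_memory_map_stub code: the module name, the AW parameter and its value, and the cpu_rd_grant output are hypothetical, and the real change would have to line the grant up with the RAM's read latency before gating cpu_rd_valid.

```systemverilog
// Hypothetical arbitration sketch; not the actual iop_memory_map_stub logic.
module iop_ram_rd_arb_sketch #(
  parameter int unsigned AW = 19  // assumed RAM offset width
) (
  input  logic          cpu_rd_hit,
  input  logic          dma_rd_hit,
  input  logic [AW-1:0] rd_ram_offset,
  input  logic [AW-1:0] dma_rd_ram_offset,
  output logic          ram_rd_en,
  output logic [AW-1:0] ram_rd_addr,
  output logic          cpu_rd_grant  // hypothetical: 1 when the CPU owns the port
);

  // The read fires whenever either requester hits RAM (unchanged).
  assign ram_rd_en = cpu_rd_hit || dma_rd_hit;

  // DMA priority: on a collision the DMA address is presented to the RAM.
  assign ram_rd_addr = dma_rd_hit ? dma_rd_ram_offset : rd_ram_offset;

  // Stall variant: the CPU is granted only when the DMA is not reading,
  // so a downstream valid can be suppressed and the read re-issued.
  assign cpu_rd_grant = cpu_rd_hit && !dma_rd_hit;

endmodule
```

Leaving cpu_rd_grant unused gives the DMA-priority behavior alone; using it to qualify the CPU's read-valid gives the stall behavior, provided the exec stub re-issues a read that was not acknowledged.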
My recommendation
A.2 (stall CPU on collision) is the most correct fix that preserves Ch261's intent. It is a small RTL change in one file and should not break the existing TBs (their CPU reads don't actually collide with DMA the way Ch261's new TB does, because they don't have the same race window). It turns a silent data-corruption bug into a backpressure event that is transparent to the CPU.
If you want to keep Ch261 tightly bounded, A.1 (DMA priority) is even smaller and produces the same Ch261 PASS — at the cost of leaving the CPU-side silent-corruption risk in place.
A.3 (true dual-port) is the chapter-after if we want to remove the limitation entirely.
Files in the tree from this attempt
- sim/tb/integration/tb_iop_responder_ee_ram_landing.sv: new TB, currently fails. Diagnostic prints ([diag] iop_ram words, [diag] script slot 1, [diag] DMAC regs, [diag-beat]) are left in for triage.
- sim/Makefile: adds the tb_iop_responder_ee_ram_landing: target, a .PHONY list entry, and a run: master-list entry.
Full regression has NOT been re-run because the TB itself fails. The other 155 TBs are unchanged. Will rerun after Codex picks the fix.
Decision needed from Codex
- Which fix path? (A.1 / A.2 / A.3 / B / something else)
- If A.*: do you want me to make the RTL change as Ch261 closing work, or split it into Ch262 as a separate audit chapter?
- Should I strip the per-beat diagnostic prints from the TB once it passes, or leave them as a permanent low-noise debug aid?
Pausing all code changes until your call. The bug itself is real regardless of how Ch261 closes — it's a silent DMA data-corruption risk in any future scenario where CPU + DMA contend for IOP RAM.