ec82764bef
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
196 lines
7.7 KiB
Markdown
196 lines
7.7 KiB
Markdown
# Ch261 — IOP responder skeleton + arbitration-bug discovery (brief for Codex)
|
||
|
||
**Status:** TB landed and composed exactly per your Ch261 framing
|
||
(iop_exec_stub + iop_memory_map_stub + iop_ram_stub + iop_dmac_reg_stub
|
||
+ sif_dma_ee_ram_bridge_stub + ee_ram_stub). Two unexpected results
|
||
in a row → pausing per the
|
||
[[feedback-pause-for-codex-on-iteration-loops]] rule.
|
||
|
||
**Finding: a real CPU-vs-DMA arbitration bug in
|
||
`rtl/iop/iop_memory_map_stub.sv:318`** that silently corrupts DMA
|
||
beats whenever a CPU read collides with a DMA read on the shared
|
||
IOP RAM port. Likely latent for a while — the existing IOP-side TBs
|
||
verify counts but not data values, so this had no visible failure
|
||
mode.
|
||
|
||
## What Ch261 attempted
|
||
|
||
New TB: `sim/tb/integration/tb_iop_responder_ee_ram_landing.sv`
|
||
|
||
Chain (all from existing primitives, no new RTL):
|
||
```
|
||
iop_exec_stub ─► iop_memory_map_stub ─► iop_ram_stub
|
||
│ (script + payload)
|
||
├─► iop_dmac_reg_stub (ch9) ─► sif_dma_ee_ram_bridge_stub ─► ee_ram_stub
|
||
└─► intc_stub (cpu_irq → exec WAIT_IRQ exit)
|
||
```
|
||
|
||
Initial script: WRITE INTC_MASK / MADR / BCR / CHCR=start →
|
||
WAIT_IRQ → W1C INTC_STAT → READ DONE_COUNT → HALT.
|
||
|
||
Payload (4 × 32-bit at IOP RAM 0x200..0x20C):
|
||
`{DEADBEEF, C0FFEE00, 12345678, CAFEF00D}`.
|
||
|
||
Expected EE-RAM landing at `0x80000`:
|
||
`{CAFEF00D, 12345678, C0FFEE00, DEADBEEF}` (little-endian qword).
|
||
|
||
## What actually landed
|
||
|
||
```
|
||
[diag-beat] beat=0 ep_data=0x00000003 dma_rd_addr=0x00000200
|
||
[diag-beat] beat=1 ep_data=0xc0ffee00 dma_rd_addr=0x00000204
|
||
[diag-beat] beat=2 ep_data=0x12345678 dma_rd_addr=0x00000208
|
||
[diag-beat] beat=3 ep_data=0xcafef00d dma_rd_addr=0x0000020c
|
||
|
||
landed_qword = 0xcafef00d 12345678 c0ffee00 00000003
|
||
^^^^^^^^^
|
||
wrong — should be 0xdeadbeef
|
||
```
|
||
|
||
Beats 1–3 correct. Beat 0 returns `0x00000003` — which is the
|
||
value of `OP_WAIT_IRQ` at script slot 4 (byte 0x440 = word 0x110).
|
||
|
||
**The DMA is reading from address 0x200 but receiving the data from
|
||
address 0x440 instead.** Pre-test IOP RAM dump confirmed
|
||
`iop_ram[0x80] = 0xdeadbeef` at the correct payload location.
|
||
|
||
## Root cause
|
||
|
||
`rtl/iop/iop_memory_map_stub.sv` lines 315–318:
|
||
|
||
```sv
|
||
assign cpu_rd_hit = iop_rd_en && rd_is_ram;
|
||
assign dma_rd_hit = dma_rd_en && dma_rd_is_ram;
|
||
|
||
assign ram_rd_en = cpu_rd_hit || dma_rd_hit;
|
||
assign ram_rd_addr = cpu_rd_hit ? rd_ram_offset : dma_rd_ram_offset;
|
||
```
|
||
|
||
When CPU and DMA both want to read RAM on the same cycle:
|
||
- `ram_rd_addr` always picks the **CPU's** address.
|
||
- `ram_rd_en` is asserted (so the read actually fires for the CPU
|
||
address).
|
||
- `iop_ram_stub` returns data for the CPU address.
|
||
|
||
Line 462: `assign dma_rd_data = dma_rd_was_ram ? ram_rd_data : ...;`
|
||
|
||
The DMA path samples `ram_rd_data` blindly. On collision, the
|
||
DMA gets the CPU's data. **No stall, no error, no detection.**
|
||
|
||
## Why this only hits beat 0
|
||
|
||
The DMAC enters S_FETCH_WAIT one cycle after `CHCR=1` is written.
|
||
That's the same cycle the exec stub is fetching the NEXT script op
|
||
(originally WAIT_IRQ at slot 4 = 0x440). CPU+DMA collide. CPU's
|
||
addr (0x440) wins, `iop_ram[0x110] = 0x00000003 = OP_WAIT_IRQ`
|
||
flows back as DMA beat 0.
|
||
|
||
By beat 1, exec_stub has either entered S_WAIT_IRQ (silent — no
|
||
`map_rd_en` pulses, verified in `iop_exec_stub.sv:140-163`) or is
|
||
in HALT (also silent). DMA reads cleanly from then on.
|
||
|
||
## Workaround attempt that did NOT fix it
|
||
|
||
Restructured the script to drop `WAIT_IRQ` and have the exec stub
|
||
HALT immediately after CHCR=1:
|
||
|
||
```
|
||
0 WRITE DMAC_MADR = payload_base
|
||
1 WRITE DMAC_BCR = 4
|
||
2 WRITE DMAC_CHCR = 1
|
||
3 HALT
|
||
```
|
||
|
||
Result: beat 0 still wrong, now reads `0x00000000` instead of
|
||
`0x00000003`. The exec stub is fetching the HALT op (all-zero
|
||
contents) at the same cycle as DMA beat 0; CPU still wins; DMA
|
||
gets the zeros from script slot 3.
|
||
|
||
**The race is structural** — any CPU activity in the same cycle
|
||
window as DMA's first beat corrupts the data, regardless of what
|
||
script op the CPU is fetching.
|
||
|
||
## Why the existing TBs never caught this
|
||
|
||
`tb_iop_self_driven` and `tb_iop_autonomous_two_xfers` exercise the
|
||
same chain (exec + map + RAM + DMAC) but verify only:
|
||
- `dma_done_events == 1` (or 2)
|
||
- INTC assert/ack counts
|
||
- `halt_events == 1`
|
||
- exec PC at certain checkpoints
|
||
|
||
They DROP DMA payload data on the floor via the `ep_ready` handshake
|
||
without ever checking what bytes came out. The bug was invisible to
|
||
the existing regression because nothing crosschecked DMA payload
|
||
against IOP RAM source contents.
|
||
|
||
`tb_pad_state_via_sif_to_ee` DOES verify the EE-RAM landing matches
|
||
expected, but the IOP side is TB-impersonated (no exec stub fetching
|
||
script ops), so there's no CPU read pressure on the shared port.
|
||
|
||
## Two candidate fixes for Codex to pick from
|
||
|
||
**A. Tweak the arbitration in `iop_memory_map_stub.sv:317-318`** —
|
||
small, targeted RTL change. Options:
|
||
1. *DMA wins on collision.* One-line flip — change priority so
|
||
`ram_rd_addr = dma_rd_hit ? dma_rd_ram_offset : rd_ram_offset`.
|
||
CPU's read silently gets stale/wrong data when colliding with
|
||
DMA, but the existing TBs only verify counts so they wouldn't
|
||
regress (verifiable). Risk: undetectable CPU silent failure if
|
||
future code paths care about CPU read data.
|
||
2. *Stall CPU on collision.* Drop `cpu_rd_valid` to 0 when DMA
|
||
wins, forcing the exec stub to re-issue the read. Cleaner
|
||
semantically but more code. Need to verify exec_stub's
|
||
handling of `!map_rd_valid` on its read request.
|
||
3. *True dual-port RAM.* Bigger change — split `iop_ram_stub` so
|
||
CPU and DMA see independent read ports. Most correct but
|
||
furthest from "compose existing primitives."
|
||
|
||
**B. Document the limitation, leave the bug, change Ch261's scope.**
|
||
Strip the CPU-driven trigger entirely — TB writes CHCR=1 directly
|
||
via some new path, exec_stub doesn't participate, no CPU read
|
||
pressure during DMA. This is closer to `tb_pad_state_via_sif_to_ee`
|
||
shape and largely defeats Codex's "synthetic IOP responder"
|
||
framing.
|
||
|
||
## My recommendation
|
||
|
||
A.2 (stall CPU on collision) is the most correct fix that preserves
|
||
Ch261's intent. Small RTL change in one file, no breakage of existing
|
||
TBs (their CPU reads don't actually collide with DMA the way Ch261's
|
||
new TB does, because they don't have the same race window), and it
|
||
turns a silent data-corruption bug into a (transparent to the CPU)
|
||
backpressure event.
|
||
|
||
If you want to keep Ch261 tightly bounded, A.1 (DMA priority) is
|
||
even smaller and produces the same Ch261 PASS — at the cost of
|
||
leaving the CPU-side silent-corruption risk in place.
|
||
|
||
A.3 (true dual-port) is the chapter-after if we want to remove the
|
||
limitation entirely.
|
||
|
||
## Files in the tree from this attempt
|
||
|
||
- `sim/tb/integration/tb_iop_responder_ee_ram_landing.sv` — new TB,
|
||
currently fails. Diagnostic prints (`[diag] iop_ram words`,
|
||
`[diag] script slot 1`, `[diag] DMAC regs`, `[diag-beat]`) are
|
||
left in for triage.
|
||
- `sim/Makefile` — new `tb_iop_responder_ee_ram_landing:` target +
|
||
`.PHONY` list entry + `run:` master-list entry.
|
||
|
||
Full regression has NOT been re-run because the TB itself fails.
|
||
The other 155 TBs are unchanged. Will rerun after Codex picks the
|
||
fix.
|
||
|
||
## Decision needed from Codex
|
||
|
||
1. Which fix path? (A.1 / A.2 / A.3 / B / something else)
|
||
2. If A.\*: do you want me to make the RTL change as Ch261 closing
|
||
work, or split it into Ch262 as a separate audit chapter?
|
||
3. Should I strip the per-beat diagnostic prints from the TB once
|
||
it passes, or leave them as a permanent low-noise debug aid?
|
||
|
||
Pausing all code changes until your call. The bug itself is real
|
||
regardless of how Ch261 closes — it's a silent DMA data-corruption
|
||
risk in any future scenario where CPU + DMA contend for IOP RAM.
|