retroDE_ps2/docs/wave26_multi_beat_dma_plan.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Wave 2.6 Mini-Plan: Multi-Beat DMAC to GIF to GS

This document defines the next consolidation step after the completed Wave 2.5 memory-backed BGCOLOR via DMA/GIF milestone.

Goal:

  • prove the DMAC/GIF path across more than one qword,
  • validate address stepping from MADR,
  • validate end-of-packet signaling on the final beat only,
  • preserve the visible Milestone A+ outcome through the existing GS/platform chain,
  • do all of that without expanding packet-format scope prematurely.

Working milestone name:

multi-beat BGCOLOR via DMA/GIF

That means:

  • DMAC channel 2 is programmed with QWC = 2,
  • DMAC performs two memory-backed qword fetches,
  • GIF accepts two qwords as two standalone register-write packets,
  • GS observes two consecutive writes,
  • platform video reflects the final written color,
  • traces prove beat ordering, source-address stepping, and gif_tag_last behavior.

Why this goes before memory-map routing

Wave 2.5 established the right ownership seam:

  • DMAC is now a memory client,
  • MADR is real,
  • RAM is owned by the memory subsystem,
  • the graphics-visible path is intact.

The next most valuable risk to retire is not topology; it is transfer shape.

Multi-beat transfer support is the first place where subtle bugs are likely to hide:

  • off-by-one address stepping,
  • incorrect remaining-count bookkeeping,
  • gif_tag_last asserted too early or too late,
  • state-machine stalls between fetch and send,
  • final-color behavior when multiple writes land back-to-back.

Once that is stable, routing the same traffic through ee_memory_map_stub becomes a narrower structural refactor instead of a behavioral and structural change combined.

Deliverables

The first Wave 2.6 pass should land:

  1. an updated rtl/dmac/dmac_reg_stub.sv
  2. a small comment/README update in rtl/gif_gs/gif_path_stub.sv if needed to make multi-beat behavior explicit
  3. an updated sim/tb/integration/tb_bgcolor_via_dma.sv
  4. sim/Makefile updates only if the run flow changes
  5. optional README note if the integration-TB expectations materially change

No new subsystem contracts or decision records are required for this step.

Scope boundary

This plan is intentionally narrow.

It does not attempt to implement:

  • real GIFtag decode,
  • packed register lists,
  • chain mode,
  • linked-list DMA,
  • path arbitration,
  • full EE memory-map routing,
  • any IOP or SIF behavior,
  • GS drawing primitives or VRAM.

The purpose is to prove:

QWC > 1 works correctly on the already-established temporary topology.

Transfer length scope

Recommendation:

  • sign off this phase on QWC = 2

Do not require QWC = 4 for first signoff.

Why two qwords is enough

Two beats cover every new behavior category this phase is meant to validate:

  • beat 0 source address = MADR
  • beat 1 source address = MADR + 16
  • beat 0 has gif_tag_last = 0
  • beat 1 has gif_tag_last = 1
  • the transfer must remain active across an intermediate beat
  • the final beat must retire cleanly into DMA_DONE
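
The per-beat address and end-of-packet rules listed above can be sketched as a small behavioral model. This is a Python illustration only, not the RTL; the function name beat_plan and the example MADR value 0x1000 are assumptions introduced here.

```python
QWORD_BYTES = 16  # one qword is 128 bits, so the source address steps by 16


def beat_plan(madr, qwc):
    """Return (beat_index, source_address, gif_tag_last) for each beat."""
    if qwc <= 0:
        raise ValueError("QWC must be positive")
    # gif_tag_last is true on the final beat only
    return [(i, madr + i * QWORD_BYTES, i == qwc - 1) for i in range(qwc)]


plan = beat_plan(0x1000, 2)
for index, addr, last in plan:
    print(f"beat {index}: addr=0x{addr:X} gif_tag_last={int(last)}")
```

With QWC = 2 the plan already contains one intermediate beat and one final beat, which is the distinction this phase needs to exercise.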

Four beats would add more distance but not a new category of behavior. If the implementation naturally supports QWC > 2, that is welcome, but the required directed proof for this phase should stay at 2.

Payload semantics

Recommendation:

  • keep the current project-local packet model:
    • one qword = one standalone GS register write
  • use the same destination register twice:
    • BGCOLOR on beat 0
    • BGCOLOR again on beat 1 with a different RGB value

This is the simplest high-signal option.

Why this is preferred over the alternatives

Using one register twice keeps the focus on transport, not packet-model growth.

It gives us:

  • two visible GS-side events,
  • two visible GIF-side accepts,
  • a deterministic final platform-visible color,
  • no need to invent a tag-qword/data-qword pairing before we are ready to tackle real GIFtag structure.

Using two different GS registers would blur the phase boundary by making the test partly about GS register-surface breadth. That coverage already exists in the single-register path and can expand later when GIFtag work begins.

Preload two qwords at consecutive addresses:

  • beat 0 packet:
    • register = BGCOLOR
    • value = a clearly non-final color, for example blue
  • beat 1 packet:
    • register = BGCOLOR
    • value = the intended final visible color, for example red

That gives a clean final assertion:

  • two EV_BGCOLOR events occurred,
  • final bg_* equals the second payload.

dmac_reg_stub exact scope

Status target:

  • the current state machine becomes genuinely multi-beat for the signoff case

Owns in this phase

  • repeated qword fetches based on QWC
  • source-address stepping by 16 bytes per beat
  • repeated send/accept handshakes into gif_path_stub
  • correct gif_tag_last behavior on the final beat only
  • transfer completion only after the final beat is accepted

Explicit non-goals

  • channel arbitration
  • chain-mode TADR behavior
  • interrupt-driven completion semantics
  • routing through ee_memory_map_stub

Required behavioral rules

For QWC = 2:

  1. DMA_START.arg2 reports the latched MADR
  2. beat 0 fetch/read uses MADR
  3. beat 1 fetch/read uses MADR + 16
  4. beat 0 emits gif_tag_last = 0
  5. beat 1 emits gif_tag_last = 1
  6. DMA_DONE occurs only after beat 1 is accepted downstream

Implementation note:

  • if the RTL naturally supports arbitrary QWC > 0, that is fine
  • the TB and acceptance criteria only need to lock QWC = 2
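
As a rough illustration of the rules above, the fetch/send loop can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the dmac_reg_stub implementation: run_transfer, the event tuples, the dictionary-backed RAM, and accept_pattern (a per-cycle downstream-ready sequence) are all hypothetical names introduced for this example.

```python
QWORD_BYTES = 16


def run_transfer(madr, qwc, ram, accept_pattern=()):
    """Model one transfer and return an ordered list of event tuples.

    accept_pattern is a hypothetical per-cycle downstream-ready sequence;
    once it is exhausted, the downstream is treated as always ready.
    """
    if qwc <= 0:
        raise ValueError("QWC must be positive")
    ready = iter(accept_pattern)
    events = [("DMA_START", madr)]           # reports the latched MADR
    addr, remaining = madr, qwc
    while remaining > 0:
        data = ram[addr]                     # memory-backed qword fetch
        events.append(("DMA_BEAT", qwc - remaining, addr))
        last = remaining == 1                # gif_tag_last on the final beat only
        while not next(ready, True):         # hold the beat until it is accepted
            events.append(("STALL",))
        events.append(("GIF_ACCEPT", data, last))
        addr += QWORD_BYTES                  # step the source address by one qword
        remaining -= 1
    events.append(("DMA_DONE",))             # emitted only after the final accept
    return events


ram = {0x2000: "blue", 0x2010: "red"}
events = run_transfer(0x2000, 2, ram, [False, True, False, True])
```

The stall entries in accept_pattern are there to show that completion depends on downstream acceptance of the final beat, not on the final fetch.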

gif_path_stub behavior in Wave 2.6

Recommendation:

  • keep gif_path_stub stateless with respect to packet framing
  • keep the current project-local interpretation:
    • every accepted qword is one standalone register write
  • preserve in_last as trace-visible metadata only

This means no new internal "tag phase vs. data phase" state is added yet.
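
A minimal Python sketch of that stateless behavior follows. It is illustrative only: gif_accept, the (register, value) tuple standing in for a qword, and the trace tuple layout are assumptions, and the color names are placeholders rather than real BGCOLOR encodings.

```python
def gif_accept(qword, in_last, gs_regs, trace):
    """Treat one accepted qword as one standalone GS register write."""
    reg, value = qword                         # hypothetical (register, value) pair
    trace.append(("EV_GIFTAG", int(in_last)))  # in_last is recorded, never acted on
    gs_regs[reg] = value                       # a later write overwrites an earlier one
    trace.append(("EV_" + reg, value))


gs_regs, trace = {}, []
gif_accept(("BGCOLOR", "blue"), False, gs_regs, trace)
gif_accept(("BGCOLOR", "red"), True, gs_regs, trace)
print(gs_regs["BGCOLOR"])
```

No state is carried between the two calls, so the second write determines the final color regardless of the in_last value.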

Why this is the right boundary

Wave 2.6 is about transfer length, not packet-model realism.

If we introduce tag/data pairing now, we turn one scoped extension into two:

  • multi-beat DMA sequencing
  • a new packet parser contract

That is exactly the kind of scope expansion we have successfully avoided so far.

Trace policy

No new event names are required for Wave 2.6.

Reuse the current vocabulary:

  • EV_DMA_CFG
  • EV_DMA_START
  • EV_DMA_BEAT
  • EV_DMA_DONE
  • EV_GIFTAG
  • EV_BGCOLOR

Required trace-visible facts

The Wave 2.6 proof must make the following visible in traces:

  • two DMAC-backed RAM reads from consecutive qword addresses
  • two EV_DMA_BEAT events with:
    • beat index 0, source address MADR
    • beat index 1, source address MADR + 16
  • two EV_GIFTAG events with:
    • flags[0] = 0 on beat 0
    • flags[0] = 1 on beat 1
  • two GS-side EV_BGCOLOR events
  • one EV_DMA_DONE after the second beat completes

Cycle-perfect alignment between subsystems is not the primary comparison rule. Order and causality are the primary rules:

  • read for beat 0 precedes send for beat 0
  • read for beat 1 precedes send for beat 1
  • done follows final acceptance
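
The order-and-causality rules above could be checked offline against a merged trace with something like the following Python sketch. The tuple-based trace layout and the check_order helper are assumptions made for illustration; it also assumes EV_DMA_BEAT marks the read for a beat and EV_GIFTAG marks its send.

```python
def check_order(trace, beats=2):
    """Return True if each read precedes its send and done follows the final send."""
    first = {}
    for position, event in enumerate(trace):
        first.setdefault(event, position)      # position of the first occurrence
    try:
        for b in range(beats):
            if first[("EV_DMA_BEAT", b)] >= first[("EV_GIFTAG", b)]:
                return False
        return first[("EV_DMA_DONE",)] > first[("EV_GIFTAG", beats - 1)]
    except KeyError:                           # a required event never appeared
        return False


good = [("EV_DMA_START",), ("EV_DMA_BEAT", 0), ("EV_GIFTAG", 0),
        ("EV_DMA_BEAT", 1), ("EV_GIFTAG", 1), ("EV_DMA_DONE",)]
# Same events, but done is reported before the final beat is sent.
bad = good[:4] + [("EV_DMA_DONE",), ("EV_GIFTAG", 1)]
```

The check compares positions only, so it stays valid if the cycle distance between subsystems changes.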

Integration testbench shape

Recommendation:

  • update the existing sim/tb/integration/tb_bgcolor_via_dma.sv

Do not create a second near-duplicate integration TB for first signoff.

Why update instead of fork

Wave 2.6 is a strict extension of the current Milestone A+ proof:

  • same subsystem chain
  • same top-level outcome
  • same trace sinks

Keeping one canonical integration TB reduces maintenance churn and makes the current path's expectations clearer.

If later we want a minimal single-beat smoke test, that can be added as a focused DMAC unit TB rather than cloning the whole platform chain.

TB preload shape

The TB should preload two qwords into ee_ram_stub:

  • first qword at MADR
  • second qword at MADR + 16

Suggested values:

  • beat 0: BGCOLOR = blue
  • beat 1: BGCOLOR = red

TB pass criteria

At minimum, the updated integration TB should require:

  1. mem_reads_dmac >= 2
  2. dma_cfg_count >= 3
  3. dma_start_count == 1
  4. dma_beat_count == 2
  5. dma_done_count == 1
  6. giftag_count == 2
  7. gs_bgcolor_count == 2
  8. first observed beat source address = MADR
  9. second observed beat source address = MADR + 16
  10. first GIF end-of-packet flag = 0
  11. second GIF end-of-packet flag = 1
  12. final bg_* equals the second payload color
  13. platform still renders active pixels correctly after transfer completion
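
For reference, the criteria above can be expressed as a single table-driven check. The Python sketch below is not the testbench itself: the obs dictionary and its keys beat_addrs, gif_last_flags, final_bg, and active_pixels_ok are hypothetical names for values the TB would collect, and the example numbers are made up.

```python
def failed_criteria(obs, madr, final_payload):
    """Return the names of the pass criteria that the observations violate."""
    checks = {
        "mem_reads_dmac >= 2": obs["mem_reads_dmac"] >= 2,
        "dma_cfg_count >= 3": obs["dma_cfg_count"] >= 3,
        "dma_start_count == 1": obs["dma_start_count"] == 1,
        "dma_beat_count == 2": obs["dma_beat_count"] == 2,
        "dma_done_count == 1": obs["dma_done_count"] == 1,
        "giftag_count == 2": obs["giftag_count"] == 2,
        "gs_bgcolor_count == 2": obs["gs_bgcolor_count"] == 2,
        "beat addresses": obs["beat_addrs"] == [madr, madr + 16],
        "end-of-packet flags": obs["gif_last_flags"] == [0, 1],
        "final color": obs["final_bg"] == final_payload,
        "active pixels": bool(obs["active_pixels_ok"]),
    }
    return [name for name, ok in checks.items() if not ok]


obs = {
    "mem_reads_dmac": 2, "dma_cfg_count": 3, "dma_start_count": 1,
    "dma_beat_count": 2, "dma_done_count": 1, "giftag_count": 2,
    "gs_bgcolor_count": 2, "beat_addrs": [0x1000, 0x1010],
    "gif_last_flags": [0, 1], "final_bg": "red", "active_pixels_ok": True,
}
print(failed_criteria(obs, 0x1000, "red"))
```

Reporting the failing criteria by name, instead of a single pass/fail bit, makes a regression easier to localize.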

Optional but useful:

  • count post-transfer frames as before
  • assert that DMA_DONE is observed only after the second EV_GIFTAG

Exit criteria

Wave 2.6 is complete when all of the following are true:

  • dmac_reg_stub performs a two-beat memory-backed transfer
  • MADR stepping is trace-visible and correct
  • gif_tag_last is false on beat 0 and true on beat 1
  • GIF accepts two qwords without new packet-state machinery
  • GS observes two BGCOLOR writes
  • platform video settles to the second color
  • make full_checks remains green

Next step after Wave 2.6

If this passes cleanly, the next recommended step is:

  • route DMAC through ee_memory_map_stub

That is the right moment for topology cleanup, because the transfer behavior itself will already be stable and trace-proven.