# Decision 0006: VRAM Roadmap

Status: `In progress`. Ch251.4 near-term rescue applied, longer-term work queued.

## Context

The Ch251 hardware demo build (`de25_nano_psmct32_raster_demo_top`) failed the Quartus Fitter on Agilex 5 with **516 / 358 M20K** (144%). The Fitter resource report attributed ~410 M20Ks to two replicated `vram_bram_stub` banks:

```
u_demo|u_vram|mem_rtl_0 Logical Size: 4194304 bits M20K blocks: 204.800
u_demo|u_vram|mem_rtl_1 Logical Size: 4194304 bits M20K blocks: 204.800
```

Root cause: `vram_bram_stub` exposes **1 write + 2 independent read ports**. An M20K block has at most two physical ports total (and at most one write port). To honour 1W + 2R, Quartus replicates the entire storage so each read port gets its own simple-dual-port BRAM, with the write fanned to both copies. True dual-port would not have rescued this: TDP still gives only 2 physical ports, not 3.

The two read ports serve distinct clients:

- **read**: PCRTC scanout (every pixel)
- **read2**: PSMT4 RMW old-byte read on the rasterizer write path

The Ch251 build draws PSMCT32 sprites only. The PSMT4 RMW pipe is wired but never fires (`is_t4_emit` stays low), so read2 is dead weight on hardware.

## Decision (Near-Term, Ch251.4)

Add a parameter `ENABLE_READ2` to `vram_bram_stub`:

- Default `1` keeps every simulation TB and every PSMT4-exercising path byte-identical.
- Hardware top (`de25_nano_psmct32_raster_demo_top`) overrides to `0`.

When disabled, the read2 `always_ff` branch contains **no reference** to `mem`, so Quartus infers a single 1W+1R simple-dual-port BRAM (~205 M20Ks at 512 KiB) instead of two replicas (~410 M20Ks).

This is a **scoped hardware-demo build profile**, not a general fix. It is correct only as long as the hardware build is PSMCT32 (or any non-PSMT4 format). Any future hardware build that exercises PSMT4 RMW must either re-enable read2 (and accept the M20K cost) or first land the long-term architecture below.
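
The replication cost can be reproduced with a few lines of arithmetic. The sketch below is a Python model, not part of the RTL or the build flow. It assumes one full simple-dual-port replica per read port and ideal packing into 20,480-bit M20K blocks, which matches the `204.800` figure in the Fitter report (4194304 / 20480). The helper name `m20k_blocks` is hypothetical.

```python
import math

M20K_BITS = 20_480  # capacity of one M20K block in bits (20 Kb)


def m20k_blocks(vram_bytes: int, read_ports: int) -> int:
    """Estimate M20Ks for a 1W + nR memory built from simple-dual-port replicas.

    Assumption: one full replica per read port, ideal packing,
    rounded up to whole blocks per replica.
    """
    bits = vram_bytes * 8
    per_replica = math.ceil(bits / M20K_BITS)
    return read_ports * per_replica


VRAM_BYTES = 512 * 1024  # current VRAM profile
DEVICE_M20K = 358        # available M20Ks from the failing Fitter report

for enable_read2 in (1, 0):
    read_ports = 1 + enable_read2  # PCRTC scanout, plus read2 when enabled
    used = m20k_blocks(VRAM_BYTES, read_ports)
    print(f"ENABLE_READ2={enable_read2}: ~{used} M20Ks for VRAM "
          f"({used / DEVICE_M20K:.0%} of {DEVICE_M20K})")
```

Under these assumptions the replicated shape alone exceeds the device's M20K count, while the single-replica shape leaves room for the rest of the design. Actual Fitter results depend on how Quartus packs width and depth, so treat the numbers as estimates.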
## Decision (Long-Term)

Before the GS path expands beyond PSMCT32 on hardware (PSMT4 RMW, broader format coverage, or a larger framebuffer), replace the replicated-multi-read VRAM with one of:

1. **Arbitrated TDP VRAM scheduler**: one TDP backing memory. Port A serves PCRTC reads with priority; port B serves the writer / RMW path. PSMT4 RMW becomes multi-cycle and may stall raster writes. This is the most correct long-term FPGA shape.
2. **Line-buffer scanout**: PCRTC reads short bursts into a small line FIFO/line-buffer once per scanline, freeing the VRAM ports for writes for the rest of the line. More complex but closer to a scalable video architecture.
3. **Bank/tile partitioning**: split VRAM by banks so different clients typically hit different banks. Still needs conflict handling. Useful as a later optimization, not as the first replacement.

Eventually larger memory surfaces (≥ a few MiB of true PS2 VRAM, or the 32 MiB main RAM) will need SDRAM/HPS/DDR-backed storage with tiled BRAM caches; the all-M20K convenience model does not scale.

## Triggers: when to revisit (Ch252)

Re-open this decision and land one of the long-term options above when **any** of the following becomes true on a hardware build:

1. **PSMT4 RMW returns to the rasterizer write path on hardware.** Any GS draw flow that consults `is_t4_emit` needs the second VRAM read port live, which re-introduces the replication cost.
2. **More than one VRAM read client during scanout.** The current profile is one read client (PCRTC). A second simultaneous read consumer (texture cache fetch, CLUT sampler from VRAM, secondary display window, anything that races PCRTC for read bandwidth) recreates the 1W+nR shape that forced Quartus replication in the first place.
3. **VRAM_BYTES grows beyond the current 512 KiB profile.** 512 KiB already costs ~205 M20Ks per replica at Agilex 5 packing.
   Any expansion (larger framebuffer, multi-format scratch space, texture storage) at the current replicated shape exceeds the device budget.

A simulation/elaboration tripwire in `vram_bram_stub.sv` fires (`$display` + `$fatal`) when `ENABLE_READ2 = 1` **and** `BYTES >= 262_144` (256 KiB). 256 KiB is not magical; it is the threshold above which replicated VRAM becomes a board-level architectural decision rather than a casual parameter flip. The tripwire is a loud canary in lint / sim; the **real protection is the board-top parameter profile**.

## Consequences

- Ch251 ships on hardware with the read2-strip build profile. The bring-up runbook documents the override so anyone reading it later sees the explicit trade-off.
- Simulation regressions stay byte-identical (default `ENABLE_READ2 = 1`).
- Any chapter that re-enables PSMT4 on hardware **must** land an arbitrated / line-buffered VRAM design first. Surfacing this as a decision record keeps it from quietly slipping when scope expands.
- The Ch251 failure was a warning shot about VRAM strategy, not a fundamental blocker on the PS2 core. Actual 512 KiB framebuffer storage is ~205 M20Ks; the over-budget portion was the second full copy.
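
For reference, the tripwire condition from the Triggers section can be restated as a small predicate. This is a Python model of the stated condition only, not the SystemVerilog in `vram_bram_stub.sv`. The function name `tripwire_fires` and the example profiles are illustrative assumptions; only the last profile corresponds to a configuration named in this record.

```python
TRIPWIRE_BYTES = 262_144  # 256 KiB threshold quoted in this record


def tripwire_fires(enable_read2: int, vram_bytes: int) -> bool:
    """True when read2 is enabled and VRAM size meets the threshold."""
    return enable_read2 == 1 and vram_bytes >= TRIPWIRE_BYTES


# Illustrative profiles: (label, ENABLE_READ2, BYTES)
profiles = [
    ("read2 on, 128 KiB", 1, 128 * 1024),
    ("read2 on, 256 KiB", 1, 256 * 1024),
    ("read2 on, 512 KiB", 1, 512 * 1024),
    ("read2 off, 512 KiB (Ch251.4 hardware profile)", 0, 512 * 1024),
]

for label, enable_read2, size in profiles:
    verdict = "FATAL" if tripwire_fires(enable_read2, size) else "ok"
    print(f"{label}: {verdict}")
```

The predicate makes the scope of the check explicit: it guards only the combination of a live second read port and a large memory, so a board top that sets `ENABLE_READ2 = 0` passes regardless of size.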