Decision 0006: VRAM Roadmap
Status: In progress — Ch251.4 near-term rescue applied, longer-term work
queued.
Context
The Ch251 hardware demo build (de25_nano_psmct32_raster_demo_top) failed the
Quartus Fitter on Agilex 5 with 516 / 358 M20K (144%). The Fitter resource
report attributed ~410 M20Ks to two replicated vram_bram_stub banks:
u_demo|u_vram|mem_rtl_0 Logical Size: 4194304 bits M20K blocks: 204.800
u_demo|u_vram|mem_rtl_1 Logical Size: 4194304 bits M20K blocks: 204.800
Root cause: vram_bram_stub exposes 1 write + 2 independent read ports.
An M20K block has at most two physical ports total (and at most one write
port). To honour 1W + 2R, Quartus replicates the entire storage so each read
port gets its own simple-dual-port BRAM, with the write fanned to both copies.
True dual-port would not have rescued this — TDP still gives only 2 physical
ports, not 3.
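The replication arithmetic above can be checked with a short calculation. This is a sketch in Python, assuming one M20K holds 20,480 bits (20 Kibit) and that Quartus makes one full copy of the storage per read port. The function and constant names are illustrative and do not come from the repository.

```python
M20K_BITS = 20 * 1024  # assumed capacity of one M20K block, in bits


def m20k_blocks(size_bytes: int, read_ports: int = 1) -> float:
    """Estimated M20K cost of a 1W + nR memory built by replication.

    Each read port gets its own simple-dual-port copy of the storage,
    so the cost scales linearly with the number of read ports.
    """
    bits = size_bytes * 8
    return read_ports * bits / M20K_BITS


VRAM_BYTES = 512 * 1024  # the 512 KiB profile from the Fitter report

one_copy = m20k_blocks(VRAM_BYTES, read_ports=1)    # 204.8, matches mem_rtl_0
two_copies = m20k_blocks(VRAM_BYTES, read_ports=2)  # 409.6, the ~410 attributed to VRAM

# Device utilization reported by the Fitter: 516 used of 358 available.
utilization_pct = round(100 * 516 / 358)  # 144

print(one_copy, two_copies, utilization_pct)
```

Under these assumptions a single copy is 204.8 blocks, which agrees with the per-bank figure in the Fitter report, and the second copy alone accounts for more than the 158-block overshoot (516 - 358).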
The two read ports serve distinct clients:
- read — PCRTC scanout (every pixel)
- read2 — PSMT4 RMW old-byte read on the rasterizer write path
The Ch251 build draws PSMCT32 sprites only. The PSMT4 RMW pipe is wired but
never fires (is_t4_emit stays low), so read2 is dead weight on hardware.
Decision (Near-Term — Ch251.4)
Add a parameter ENABLE_READ2 to vram_bram_stub:
- Default 1 keeps every simulation TB and every PSMT4-exercising path byte-identical.
- Hardware top (de25_nano_psmct32_raster_demo_top) overrides to 0.
When disabled, the read2 always_ff branch contains no reference to mem, so Quartus infers a single 1W+1R simple-dual-port BRAM (~205 M20Ks at 512 KiB) instead of two replicas (~410 M20Ks).
This is a scoped hardware-demo build profile, not a general fix. It is correct only as long as the hardware build is PSMCT32 (or any non-PSMT4 format). Any future hardware build that exercises PSMT4 RMW must either re-enable read2 (and accept the M20K cost) or first land the long-term architecture below.
Decision (Long-Term)
Before the GS path expands beyond PSMCT32 on hardware (PSMT4 RMW, broader format coverage, or a larger framebuffer), replace the replicated-multi-read VRAM with one of:
- Arbitrated TDP VRAM scheduler: one TDP backing memory. Port A serves PCRTC reads with priority; port B serves the writer / RMW path. PSMT4 RMW becomes multi-cycle and may stall raster writes. This is the most correct long-term FPGA shape.
- Line-buffer scanout: PCRTC reads short bursts into a small line FIFO/line-buffer once per scanline, freeing the VRAM ports for writes for the rest of the line. More complex but closer to a scalable video architecture.
- Bank/tile partitioning: split VRAM by banks so different clients typically hit different banks. Still needs conflict handling. Useful as a later optimization, not as the first replacement.
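To make the cost of the first option concrete, the sketch below models port B of an arbitrated TDP memory in Python. It is a toy behavioural model, not RTL from this repository. It assumes a plain write occupies port B for one cycle and a PSMT4 RMW occupies it for two (read the old byte, then write the merged byte). Real latencies depend on the BRAM read pipeline, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PortBSchedule:
    cycles: int        # total port-B cycles consumed by the op stream
    stall_cycles: int  # extra cycles spent on RMW reads instead of writes


def schedule_port_b(ops):
    """Count port-B cycles for a stream of raster operations.

    ops is an iterable of "write" (plain write, e.g. PSMCT32) or
    "rmw" (PSMT4 read-modify-write). Port A is assumed to be reserved
    for PCRTC scanout and is not modelled here.
    """
    cycles = 0
    stalls = 0
    for op in ops:
        if op == "write":
            cycles += 1
        elif op == "rmw":
            cycles += 2  # assumed: one read cycle plus one write cycle
            stalls += 1  # the read cycle delays the next raster write
        else:
            raise ValueError(f"unknown op: {op!r}")
    return PortBSchedule(cycles=cycles, stall_cycles=stalls)


# A PSMCT32-only stream never stalls; mixing in RMW stretches the schedule.
psmct32_only = schedule_port_b(["write"] * 4)
mixed = schedule_port_b(["write", "write", "rmw", "write"])
print(psmct32_only, mixed)
```

In this model the PSMCT32-only stream finishes in 4 cycles with no stalls, while the mixed stream takes 5 cycles with 1 stall. That is the throughput trade-off the arbitrated design accepts in exchange for a single copy of the storage.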
Eventually larger memory surfaces (≥ a few MiB of true PS2 VRAM, or the 32 MiB main RAM) will need SDRAM/HPS/DDR-backed storage with tiled BRAM caches; the all-M20K convenience model does not scale.
Triggers — when to revisit (Ch252)
Re-open this decision and land one of the long-term options above when any of the following becomes true on a hardware build:
- PSMT4 RMW returns to the rasterizer write path on hardware. Any GS draw flow that consults is_t4_emit needs the second VRAM read port live, which re-introduces the replication cost.
- More than one VRAM read client during scanout. The current profile is one read client (PCRTC). A second simultaneous read consumer (texture cache fetch, CLUT sampler from VRAM, secondary display window, anything that races PCRTC for read bandwidth) recreates the 1W+nR shape that forced Quartus replication in the first place.
- VRAM_BYTES grows beyond the current 512 KiB profile. 512 KiB already costs ~205 M20Ks per replica at Agilex 5 packing. Any expansion (larger framebuffer, multi-format scratch space, texture storage) at the current replicated shape exceeds the device budget.
A simulation/elaboration tripwire in vram_bram_stub.sv fires
($display + $fatal) when ENABLE_READ2 = 1 and
BYTES >= 262_144 (256 KiB). 256 KiB is not magical — it is the
threshold above which replicated VRAM becomes a board-level
architectural decision rather than a casual parameter flip. The
tripwire is a loud canary in lint / sim; the real protection is the
board-top parameter profile.
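The tripwire condition can be written as a small predicate. The Python below is only a model of the rule as described here, not the SystemVerilog in vram_bram_stub.sv; the function and constant names are hypothetical.

```python
TRIPWIRE_BYTES = 262_144  # 256 KiB, the threshold named in this decision


def tripwire_fires(enable_read2: int, vram_bytes: int) -> bool:
    """Return True when the replicated-VRAM canary should trip.

    Mirrors the stated rule: ENABLE_READ2 = 1 and BYTES >= 256 KiB.
    """
    return enable_read2 == 1 and vram_bytes >= TRIPWIRE_BYTES


# Hardware demo profile: read2 stripped, 512 KiB of VRAM.
print(tripwire_fires(0, 512 * 1024))  # False
# Same size with read2 enabled: the replicated shape trips the canary.
print(tripwire_fires(1, 512 * 1024))  # True
# A small memory with read2 enabled stays below the threshold.
print(tripwire_fires(1, 128 * 1024))  # False
```

The comparison is inclusive, so a memory of exactly 256 KiB with read2 enabled also trips the check.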
Consequences
- Ch251 ships on hardware with the read2-strip build profile. The bring-up runbook documents the override so anyone reading it later sees the explicit trade-off.
- Simulation regressions stay byte-identical (default ENABLE_READ2 = 1).
- Any chapter that re-enables PSMT4 on hardware must land an arbitrated / line-buffered VRAM design first. Surfacing this as a decision record keeps it from quietly slipping when scope expands.
- The Ch251 failure was a warning shot about VRAM strategy, not a fundamental blocker on the PS2 core. Actual 512 KiB framebuffer storage is ~205 M20Ks; the over-budget portion was the second full copy.