retroDE_ps2/docs/decisions/0009-combined-tex-alpha-depth-schedule.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# 0009 — Combined textured + alpha + depth: per-pixel memory-op schedule
**Status:** proven in sim (Ch302), board-pending. Local BRAM probe; NOT yet tiled VRAM.
## Why this exists
Before designing tiled/LPDDR-backed VRAM we need the exact per-pixel read/write
schedule a primitive that is simultaneously **textured + alpha-blended +
depth-tested** demands. Until Ch302 those three GS features were *mutually
exclusive* (each the sole `read2` consumer for its primitive). Ch302 lifts that —
behind the default-off `COMBINED_TAZ` param — with an explicit walker-stalling
multi-beat FSM in `gs_stub`, so the schedule is observable and asserted.
Speed was explicitly NOT a goal; the correct, observable schedule is.
## The per-pixel schedule (single read2 port, single write port)
Z-test is issued FIRST so a hidden pixel costs one read and nothing else:
| Beat | read2 (1-cyc registered) | compute | write port |
|------|--------------------------|---------|------------|
| 0 `CB_Z` | issue **stored-Z** read (`z_rd_en`) | — | — |
| 1 `CB_ZW` | (issue **texel** read iff Z passes) | Z-test (GEQUAL): frag_z vs stored_z. **FAIL → stop** (no texel/dest read, no write; advance) | — |
| 2 `CB_T` | issue **dest-color** read (`fb_rd_en`) | latch texel as Cs + As (=texel α) | — |
| 3 `CB_FB` | — | blend `Cv = (((Cs - Cd)·As) >> 7) + Cd` | **write color** (blended) → FB |
| 4 `CB_ZWR` | — | — | **write Z** → Z-buffer (skip if ZMSK); then advance walker |
The three reads land on the single read2 port in **separate cycles**, so the
existing read2 priority mux + its mutual-exclusion `$error` asserts are untouched
(one consumer per cycle). The two writes serialize on the single write port
(color beat 3, Z beat 4). The walker does not advance to the next candidate
pixel until BOTH writes complete.
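
The schedule above can be sketched as a small behavioral model. This is a minimal Python illustration, not the `gs_stub` RTL: the names `MemOp`, `pixel_schedule`, and `blend` are hypothetical, and the 8-bit clamp in `blend` is an assumption this document does not state.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    beat: str  # FSM beat name from the table (CB_Z, CB_ZW, ...)
    port: str  # "read2" or "write"
    kind: str  # "z", "texel", "dest", or "color"


def blend(cs: int, cd: int, a_s: int) -> int:
    """Per-channel blend from beat 3: (((Cs - Cd) * As) >> 7) + Cd."""
    # Python's >> on a negative int is an arithmetic shift.
    # Assumption: the result is clamped to 8 bits.
    return max(0, min(255, (((cs - cd) * a_s) >> 7) + cd))


def pixel_schedule(frag_z: int, stored_z: int, zmsk: bool = False) -> list[MemOp]:
    """Memory ops issued for one candidate pixel, in beat order."""
    ops = [MemOp("CB_Z", "read2", "z")]            # beat 0: stored-Z read
    if frag_z < stored_z:                          # beat 1: GEQUAL test fails
        return ops                                 # hidden pixel stops here
    ops.append(MemOp("CB_ZW", "read2", "texel"))   # beat 1: texel read on pass
    ops.append(MemOp("CB_T", "read2", "dest"))     # beat 2: dest-color read
    ops.append(MemOp("CB_FB", "write", "color"))   # beat 3: blended color write
    if not zmsk:
        ops.append(MemOp("CB_ZWR", "write", "z"))  # beat 4: Z write unless masked
    return ops
```

Each beat carries at most one op per port, which is why the single read2 port and the single write port suffice in this model. A hidden pixel returns only the stored-Z read; a visible pixel with ZMSK clear returns three reads followed by two writes.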
## The concrete requirement for tiled VRAM
- **hidden pixel: 1 read, 0 writes** (stored-Z only).
- **visible pixel: 3 reads + 2 writes** — stored-Z, texel, dest-color reads;
color + Z writes.
So tile-local memory must serve **up to 3 reads + 2 writes per pixel**. The
options this makes concrete (no longer hand-wavy):
- a **2-read-port** tile RAM (e.g. texel + Z in parallel, dest folded in) + a
write path, OR
- a **3-phase read schedule** on fewer ports (what this probe does, serialized),
trading throughput for ports, OR
- tile-local banking that absorbs the dest read-modify-write locally.
Z-first ordering means the texel/dest bandwidth is only spent on visible pixels —
a real saving the tiled design should preserve.
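
To put a number on the Z-first saving, the per-pixel costs above can be totalled over a primitive. This is a rough Python sketch; `memory_ops` is a hypothetical helper, and the non-Z-first baseline (all three reads issued for every pixel) is an assumption made for comparison, not a design described here.

```python
def memory_ops(n_pass: int, n_fail: int, z_first: bool = True) -> dict[str, int]:
    """Total reads/writes for n_pass visible and n_fail hidden pixels."""
    # Z-first: a hidden pixel costs only the stored-Z read.
    # Assumed baseline: without Z-first, hidden pixels still issue all 3 reads.
    reads_per_hidden = 1 if z_first else 3
    return {
        "reads": 3 * n_pass + reads_per_hidden * n_fail,
        "writes": 2 * n_pass,  # color + Z, visible pixels only
    }


def reads_saved(n_pass: int, n_fail: int) -> int:
    """Reads avoided by Z-first ordering relative to the assumed baseline."""
    baseline = memory_ops(n_pass, n_fail, z_first=False)["reads"]
    return baseline - memory_ops(n_pass, n_fail, z_first=True)["reads"]
```

Under this model the saving is two reads per hidden pixel, so it grows with the fraction of the primitive that is occluded.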
## Verification (`tb_top_psmct32_combined_demo`)
The scene is a green, Z-writing background plus one TME+ABE+ZTE triangle whose
interpolated Z crosses the background Z (top half passes, bottom half fails). A
**memory-op tracer** records the read enables and write addresses for each pixel
and asserts the SEQUENCE of operations (not just the final pixels):
- depth-FAIL: z-read=1, texel-read=0, dest-read=0, color-write=0, Z-write=0 → pixel stays background green.
- depth-PASS: z-read=1, texel-read=1, dest-read=1, color-write=1, Z-write=1 → blend(texel, green); texel RGB and green dest both present.
Result: 35 depth-PASS / 7 depth-FAIL / 160 outside pixels, errors=0. With `COMBINED_TAZ=0`, all prior demos remain byte-identical.
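
The tracer's sequence check can be illustrated in a few lines. This is a hedged Python sketch of the idea, not the testbench code: the event names and the `check_trace` and `summarize` helpers are hypothetical, and ZMSK is assumed clear so that every depth-PASS pixel writes Z.

```python
# Expected per-pixel event order, taken from the two cases listed above.
PASS_SEQ = ("z_read", "texel_read", "dest_read", "color_write", "z_write")
FAIL_SEQ = ("z_read",)


def check_trace(events, depth_pass: bool) -> bool:
    """True if the recorded events match the expected order exactly."""
    expected = PASS_SEQ if depth_pass else FAIL_SEQ
    return tuple(events) == expected


def summarize(pixels) -> dict[str, int]:
    """Count pass/fail pixels and sequence errors.

    pixels: iterable of (events, depth_pass) pairs, one per traced pixel.
    """
    counts = {"pass": 0, "fail": 0, "errors": 0}
    for events, depth_pass in pixels:
        counts["pass" if depth_pass else "fail"] += 1
        if not check_trace(events, depth_pass):
            counts["errors"] += 1
    return counts
```

Comparing the ordered tuple rather than per-signal counts catches reordered or duplicated operations, not just missing ones.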
## Out of scope (deliberately)
Perspective (affine only — perspective proven separately, Ch301), alpha-test /
texture-alpha discard, non-PSMCT32 dest, and throughput (multi-beat is fine here).