# 0009 — Combined textured + alpha + depth: per-pixel memory-op schedule

**Status:** proven in sim (Ch302), board-pending. Local BRAM probe; NOT yet tiled VRAM.

## Why this exists

Before designing tiled/LPDDR-backed VRAM, we need the exact per-pixel read/write schedule demanded by a primitive that is simultaneously **textured + alpha-blended + depth-tested**. Until Ch302, those three GS features were *mutually exclusive*: each was the sole `read2` consumer for its primitive. Ch302 lifts that restriction, behind the default-off `COMBINED_TAZ` parameter, by adding an explicit multi-beat FSM in `gs_stub` that stalls the walker, so the schedule is observable and can be asserted on. Speed was explicitly NOT a goal; a correct, observable schedule is.

## The per-pixel schedule (single read2 port, single write port)

The Z-test is issued FIRST, so a hidden pixel costs one read and nothing else:

| Beat | read2 port (1-cycle registered read) | Compute | Write port |
|------|--------------------------------------|---------|------------|
| 0 `CB_Z` | issue **stored-Z** read (`z_rd_en`) | — | — |
| 1 `CB_ZW` | issue **texel** read iff Z passes | Z-test (GEQUAL): frag_z vs stored_z. **FAIL → stop** (no texel/dest read, no write; advance walker) | — |
| 2 `CB_T` | issue **dest-color** read (`fb_rd_en`) | latch texel as Cs, with As = texel α | — |
| 3 `CB_FB` | — | blend `Cv = (((Cs − Cd)·As) >> 7) + Cd` | **write color** (blended) → FB |
| 4 `CB_ZWR` | — | — | **write Z** → Z-buffer (skipped if ZMSK); then advance walker |

The three reads land on the single read2 port in **separate cycles**, so the existing read2 priority mux and its mutual-exclusion `$error` asserts are untouched (there is still only one consumer per cycle). The two writes serialize on the single write port (color in beat 3, Z in beat 4). The walker does not advance to the next candidate pixel until BOTH writes complete.

## The concrete requirement for tiled VRAM

- **hidden pixel: 1 read, 0 writes** (stored-Z only).
- **visible pixel: 3 reads + 2 writes** (stored-Z, texel, and dest-color reads; color and Z writes, with the Z write skipped when ZMSK is set).
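
The beat table and the per-pixel requirement can be cross-checked with a small behavioral model. This is a minimal Python sketch, not the `gs_stub` RTL. The names `MemOp`, `run_pixel`, `blend_channel` and `count_ops` are hypothetical. It assumes that GEQUAL passes when `frag_z >= stored_z`, that channels are 8-bit, and that As = 0x80 means 1.0 (consistent with the `>> 7`). It also clamps results to 0..255, which the note does not specify. For each pixel it records the same ordered read/write trace that the memory-op tracer asserts on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    beat: str  # FSM beat that issued the op, e.g. "CB_Z"
    kind: str  # "z_read", "texel_read", "dest_read", "color_write" or "z_write"


def blend_channel(cs: int, cd: int, a_s: int) -> int:
    """Cv = (((Cs - Cd) * As) >> 7) + Cd; the 0..255 clamp is an assumption."""
    cv = (((cs - cd) * a_s) >> 7) + cd
    return max(0, min(255, cv))


def run_pixel(frag_z, z_buf, texel, frame_buf, zmsk=False):
    """Step one pixel through CB_Z..CB_ZWR and return its ordered memory-op trace.

    Hypothetical model: z_buf / frame_buf are one-entry dicts standing in for
    this pixel's Z and color words; texel is (r, g, b, a).
    """
    ops = []
    # Beat 0, CB_Z: issue the stored-Z read on read2 (in hardware the data
    # arrives in beat 1; the model simply reads it here).
    ops.append(MemOp("CB_Z", "z_read"))
    stored_z = z_buf["z"]
    # Beat 1, CB_ZW: Z-test (GEQUAL assumed to mean frag_z >= stored_z).
    if frag_z < stored_z:
        return ops  # FAIL: no texel/dest read, no write; walker advances.
    ops.append(MemOp("CB_ZW", "texel_read"))
    # Beat 2, CB_T: latch texel as Cs (RGB) and As (alpha); issue dest read.
    *cs_rgb, a_s = texel
    ops.append(MemOp("CB_T", "dest_read"))
    cd_rgb = frame_buf["rgb"]
    # Beat 3, CB_FB: blend per channel and write the color word.
    frame_buf["rgb"] = tuple(
        blend_channel(cs, cd, a_s) for cs, cd in zip(cs_rgb, cd_rgb)
    )
    ops.append(MemOp("CB_FB", "color_write"))
    # Beat 4, CB_ZWR: write Z unless ZMSK is set; then the walker advances.
    if not zmsk:
        z_buf["z"] = frag_z
        ops.append(MemOp("CB_ZWR", "z_write"))
    return ops


def count_ops(ops):
    """Return (reads, writes) for one pixel's trace."""
    reads = sum(op.kind.endswith("_read") for op in ops)
    writes = sum(op.kind.endswith("_write") for op in ops)
    return reads, writes


if __name__ == "__main__":
    GREEN = (0, 255, 0)
    # Hidden pixel: fragment Z below the stored Z.
    zb, fb = {"z": 100}, {"rgb": GREEN}
    hidden = run_pixel(50, zb, (255, 0, 0, 0x80), fb)
    print(count_ops(hidden), fb["rgb"])  # (1, 0) (0, 255, 0)
    # Visible pixel: red texel at half alpha (0x40) over green.
    zb, fb = {"z": 100}, {"rgb": GREEN}
    visible = run_pixel(150, zb, (255, 0, 0, 0x40), fb)
    print(count_ops(visible), fb["rgb"], zb["z"])  # (3, 2) (127, 127, 0) 150
```

Because the model returns early on a depth FAIL, no texel or dest-color op can appear in a hidden pixel's trace. This is the Z-first property the tiled design is meant to keep.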

Tile-local memory must therefore serve **up to 3 reads + 2 writes per pixel**. This makes the options concrete (no longer hand-wavy):

- a **2-read-port** tile RAM (e.g. texel + Z in parallel, dest folded in) plus a write path, OR
- a **3-phase read schedule** on fewer ports (what this probe does, serialized), trading throughput for ports, OR
- tile-local banking that absorbs the dest read-modify-write locally.

Z-first ordering means texel and dest-color bandwidth is spent only on visible pixels. This is a real saving that the tiled design should preserve.

## Verification (`tb_top_psmct32_combined_demo`)

The scene is a green, Z-writing background plus one TME+ABE+ZTE triangle whose interpolated Z crosses the background Z (top half passes, bottom fails). A **memory-op tracer** records each pixel's read enables and write addresses and asserts the SEQUENCE, not just the final pixels:

- depth-FAIL: z-read=1, texel-read=0, dest-read=0, color-write=0, Z-write=0 → pixel stays background green.
- depth-PASS: z-read=1, texel-read=1, dest-read=1, color-write=1, Z-write=1 → pixel is blend(texel, green), with both the texel RGB and the green dest contribution present.

Result: 35 PASS / 7 FAIL / 160 outside, errors=0. With `COMBINED_TAZ=0` (the default), all prior demos stay byte-identical.

## Out of scope (deliberately)

Perspective (this probe is affine only; perspective is proven separately in Ch301), alpha-test / texture-alpha discard, non-PSMCT32 destinations, and throughput (multi-beat is fine here).