# 0009 — Combined textured + alpha + depth: per-pixel memory-op schedule

**Status:** proven in sim (Ch302), board-pending. Local BRAM probe; NOT yet tiled VRAM.

## Why this exists

Before designing tiled/LPDDR-backed VRAM we need the exact per-pixel read/write
schedule demanded by a primitive that is simultaneously **textured + alpha-blended +
depth-tested**. Until Ch302 those three GS features were *mutually
exclusive* (each was the sole `read2` consumer for its primitive). Ch302 lifts that
restriction, behind the default-off `COMBINED_TAZ` param, with an explicit walker-stalling
multi-beat FSM in `gs_stub`, so the schedule is observable and asserted.

Speed was explicitly NOT a goal; the correct, observable schedule is.

## The per-pixel schedule (single read2 port, single write port)

Z-test is issued FIRST so a hidden pixel costs one read and nothing else:

| Beat | read2 (1-cyc registered) | compute | write port |
|------|--------------------------|---------|------------|
| 0 `CB_Z`   | issue **stored-Z** read (`z_rd_en`) | — | — |
| 1 `CB_ZW`  | (issue **texel** read iff Z passes) | Z-test (GEQUAL): frag_z vs stored_z. **FAIL → stop** (no texel/dest read, no write; advance) | — |
| 2 `CB_T`   | issue **dest-color** read (`fb_rd_en`) | latch texel RGB as Cs and texel α as As | — |
| 3 `CB_FB`  | — | blend `Cv = (((Cs − Cd)·As) >> 7) + Cd` | **write color** (blended) → FB |
| 4 `CB_ZWR` | — | — | **write Z** → Z-buffer (skip if ZMSK); then advance walker |
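
As a cross-check of the table, here is a minimal Python model of one pixel's trip through the beats. It is a software sketch, not the `gs_stub` FSM: `PixelTrace`, `run_pixel` and `blend` are hypothetical names, As is assumed to use the 0..128 scale (0x80 = 1.0), and clamping the blend result to 0..255 is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PixelTrace:
    """Ordered record of what one pixel did (hypothetical model, not RTL)."""
    beats: list = field(default_factory=list)   # beat labels visited
    reads: list = field(default_factory=list)   # "z", "texel", "dest"
    writes: list = field(default_factory=list)  # "color", "z"


def blend(cs: int, cd: int, a_s: int) -> int:
    """One 8-bit channel of Cv = (((Cs - Cd) * As) >> 7) + Cd.

    Assumes As is on the 0..128 scale (0x80 == 1.0). Python's >> floors, like
    an arithmetic shift; the clamp to 0..255 is an assumption of this sketch.
    """
    cv = (((cs - cd) * a_s) >> 7) + cd
    return max(0, min(255, cv))


def run_pixel(frag_z, stored_z, texel_rgba, dest_rgb, zmsk=False):
    """Walk one pixel through CB_Z..CB_ZWR; returns (trace, color, z)."""
    t = PixelTrace()

    # Beat 0, CB_Z: issue the stored-Z read on read2.
    t.beats.append("CB_Z")
    t.reads.append("z")

    # Beat 1, CB_ZW: stored Z is back (1-cycle registered read). GEQUAL test.
    t.beats.append("CB_ZW")
    if frag_z < stored_z:
        return t, dest_rgb, stored_z  # FAIL: no texel/dest read, no write
    t.reads.append("texel")           # issued this beat only because Z passed

    # Beat 2, CB_T: texel is back; latch RGB as Cs, alpha as As; issue dest read.
    t.beats.append("CB_T")
    *cs, a_s = texel_rgba
    t.reads.append("dest")

    # Beat 3, CB_FB: dest is back; blend and write color.
    t.beats.append("CB_FB")
    color = tuple(blend(c, d, a_s) for c, d in zip(cs, dest_rgb))
    t.writes.append("color")

    # Beat 4, CB_ZWR: write Z unless ZMSK, then the walker may advance.
    t.beats.append("CB_ZWR")
    if zmsk:
        return t, color, stored_z
    t.writes.append("z")
    return t, color, frag_z


if __name__ == "__main__":
    hidden, _, _ = run_pixel(10, 20, (255, 0, 0, 64), (0, 255, 0))
    visible, color, z = run_pixel(30, 20, (255, 0, 0, 64), (0, 255, 0))
    print(len(hidden.reads), len(hidden.writes))    # 1 0
    print(len(visible.reads), len(visible.writes))  # 3 2
    print(color, z)                                 # (127, 127, 0) 30
```

In this model a hidden pixel leaves after two beats with a single read, while a visible one takes all five beats with three reads and two writes.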

The three reads land on the single read2 port in **separate cycles**, so the
existing read2 priority mux + its mutual-exclusion `$error` asserts are untouched
(one consumer per cycle). The two writes serialize on the single write port
(color beat 3, Z beat 4). The walker does not advance to the next candidate
pixel until BOTH writes complete.
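
The port argument can be phrased as a per-cycle check. This is a hypothetical Python analogue of the read2 mutual-exclusion `$error` asserts, extended to the single write port; `tex_rd_en` is an assumed name for the texel-read enable, and `MERGED` is an invented counterexample, not a mode the RTL has.

```python
def check_ports(cycles):
    """Raise AssertionError if any cycle has >1 read2 consumer or >1 write-port user.

    Each cycle is (read2_enables, write_users), both sets of names.
    """
    for i, (reads, writes) in enumerate(cycles):
        if len(reads) > 1:
            raise AssertionError(f"cycle {i}: read2 has consumers {sorted(reads)}")
        if len(writes) > 1:
            raise AssertionError(f"cycle {i}: write port has users {sorted(writes)}")
    return True


# Visible pixel under the serialized schedule, one entry per beat.
VISIBLE_PIXEL = [
    ({"z_rd_en"}, set()),    # CB_Z
    ({"tex_rd_en"}, set()),  # CB_ZW (Z passed); tex_rd_en is an assumed name
    ({"fb_rd_en"}, set()),   # CB_T
    (set(), {"color"}),      # CB_FB
    (set(), {"z"}),          # CB_ZWR; walker advances after this beat
]

# Invented counterexample: all reads and writes squeezed into one beat.
MERGED = [({"z_rd_en", "tex_rd_en", "fb_rd_en"}, {"color", "z"})]

if __name__ == "__main__":
    print(check_ports(VISIBLE_PIXEL))  # True
    try:
        check_ports(MERGED)
    except AssertionError as err:
        print(err)
```

Spreading the reads over separate beats is what lets the existing one-consumer-per-cycle invariant hold unchanged.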

## The concrete requirement for tiled VRAM

- **hidden pixel: 1 read, 0 writes** (stored-Z only).
- **visible pixel: 3 reads + 2 writes** — stored-Z, texel, dest-color reads;
  color + Z writes.

So tile-local memory must serve **up to 3 reads + 2 writes per pixel**. The
options this makes concrete (no longer hand-wavy):
- a **2-read-port** tile RAM (e.g. texel + Z in parallel, dest folded in) + a
  write path, OR
- a **3-phase read schedule** on fewer ports (what this probe does, serialized),
  trading throughput for ports, OR
- tile-local banking that absorbs the dest read-modify-write locally.

Z-first ordering means the texel/dest bandwidth is only spent on visible pixels —
a real saving the tiled design should preserve.
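
The Z-first saving is simple arithmetic. A minimal sketch, assuming one memory op per read or write as listed above; `ops_texel_first` models a hypothetical ordering that fetches the texel before the Z-test and is included only for comparison.

```python
def ops_z_first(n_pass, n_fail, zmsk=False):
    """(reads, writes) for the Z-first schedule."""
    reads = (n_pass + n_fail) + 2 * n_pass  # stored-Z always; texel + dest only if visible
    writes = n_pass * (1 if zmsk else 2)    # color on pass; Z too unless ZMSK
    return reads, writes


def ops_texel_first(n_pass, n_fail, zmsk=False):
    """Hypothetical alternative: texel fetched for every pixel before the Z-test."""
    reads = 2 * (n_pass + n_fail) + n_pass  # stored-Z + texel always; dest only if visible
    writes = n_pass * (1 if zmsk else 2)
    return reads, writes


if __name__ == "__main__":
    print(ops_z_first(35, 7))      # (112, 70)
    print(ops_texel_first(35, 7))  # (119, 70)
```

With the verification demo's 35 passing and 7 failing pixels, Z-first needs 112 reads against 119 for the texel-first ordering (70 writes either way); the gap is exactly one texel read per hidden pixel, so it grows with overdraw.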

## Verification (tb_top_psmct32_combined_demo)

A green Z-writing background + one TME+ABE+ZTE triangle whose interpolated Z
crosses the background Z (top half passes, bottom fails). A **memory-op tracer**
records, per pixel, the read enables + write addresses and asserts the SEQUENCE
(not just final pixels):
- depth-FAIL: z-read=1, texel-read=0, dest-read=0, color-write=0, Z-write=0 → pixel stays background green.
- depth-PASS: z-read=1, texel-read=1, dest-read=1, color-write=1, Z-write=1 → pixel = blend(texel, green), with both the texel RGB and the green dest contributing to the result.

Result: 35 depth-PASS, 7 depth-FAIL and 160 outside-triangle pixels; errors=0. With `COMBINED_TAZ=0`, all prior demos stay byte-identical.
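
The tracer's sequence check can be sketched as follows. This is a hypothetical Python model, not the testbench: the op names (`z_rd`, `tex_rd`, ...) and the `(covered, depth_pass, ops)` record layout are assumptions, and uncovered pixels are assumed to produce no traced ops. Comparing against the expected depth result, rather than only classifying the observed pattern, is what catches a pixel that should have failed but ran the full pass sequence.

```python
from collections import Counter

# Assumed op names; the real tracer records read enables and write addresses.
PASS_SEQ = ("z_rd", "tex_rd", "fb_rd", "color_wr", "z_wr")
FAIL_SEQ = ("z_rd",)


def expected_ops(covered, depth_pass):
    """Sequence a pixel must produce, given coverage and its depth result."""
    if not covered:
        return ()  # assumption: uncovered pixels produce no traced ops
    return PASS_SEQ if depth_pass else FAIL_SEQ


def summarize(records):
    """records: iterable of (covered, depth_pass, ops) per pixel."""
    tally = Counter(PASS=0, FAIL=0, outside=0, errors=0)
    for covered, depth_pass, ops in records:
        if tuple(ops) != expected_ops(covered, depth_pass):
            tally["errors"] += 1
        elif not covered:
            tally["outside"] += 1
        else:
            tally["PASS" if depth_pass else "FAIL"] += 1
    return dict(tally)


if __name__ == "__main__":
    sample = [  # synthetic records, not the demo's trace
        (True, True, PASS_SEQ),
        (True, False, FAIL_SEQ),
        (False, False, ()),
        (True, False, PASS_SEQ),  # should have stopped after the Z read
    ]
    print(summarize(sample))  # {'PASS': 1, 'FAIL': 1, 'outside': 1, 'errors': 1}
```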

## Out of scope (deliberately)
Perspective (affine only — perspective proven separately, Ch301), alpha-test /
texture-alpha discard, non-PSMCT32 dest, and throughput (multi-beat is fine here).