# 0009 — Combined textured + alpha + depth: per-pixel memory-op schedule

**Status:** proven in sim (Ch302), board-pending. Local BRAM probe; NOT yet tiled VRAM.

## Why this exists

Before designing tiled/LPDDR-backed VRAM we need the exact per-pixel read/write
schedule demanded by a primitive that is simultaneously **textured + alpha-blended +
depth-tested**. Until Ch302 those three GS features were *mutually exclusive*
(each was the sole `read2` consumer for its primitive). Ch302 lifts that
restriction, behind the default-off `COMBINED_TAZ` param, with an explicit
walker-stalling multi-beat FSM in `gs_stub`, so the schedule is observable and asserted.

Speed was explicitly NOT a goal; the correct, observable schedule is.

## The per-pixel schedule (single read2 port, single write port)

The Z-test is issued FIRST, so a hidden pixel costs one read and nothing else:

| Beat | read2 (1-cycle registered) | compute | write port |
|------|----------------------------|---------|------------|
| 0 `CB_Z` | issue **stored-Z** read (`z_rd_en`) | — | — |
| 1 `CB_ZW` | (issue **texel** read iff Z passes) | Z-test (GEQUAL): `frag_z` vs `stored_z`. **FAIL → stop** (no texel/dest read, no write; advance) | — |
| 2 `CB_T` | issue **dest-color** read (`fb_rd_en`) | latch texel RGB as Cs and texel α as As | — |
| 3 `CB_FB` | — | blend `Cv = (((Cs − Cd)·As) >> 7) + Cd` | **write color** (blended) → FB |
| 4 `CB_ZWR` | — | — | **write Z** → Z-buffer (skip if ZMSK); then advance walker |

The three reads land on the single read2 port in **separate cycles**, so the
existing read2 priority mux and its mutual-exclusion `$error` asserts are untouched
(one consumer per cycle). The two writes serialize on the single write port
(color in beat 3, Z in beat 4). The walker does not advance to the next candidate
pixel until BOTH writes complete.
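
The beat sequence above can be cross-checked with a small behavioral model of read2 and write-port occupancy. This is a hypothetical sketch, not the `gs_stub` RTL: the names `Beat`, `combined_pixel_ops`, `op_counts` and the op labels (`"z"`, `"texel"`, `"dest"`, `"color"`, `"zbuf"`) are illustrative. It assumes one clock per beat, that the walker advances straight out of `CB_ZW` on a depth FAIL, and that with ZMSK set the `CB_ZWR` beat is still spent but issues no write.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Beat:
    """One FSM beat: what is issued on read2 and what is written (None = idle)."""
    name: str
    read2: Optional[str]
    write: Optional[str]


def combined_pixel_ops(frag_z: int, stored_z: int, zmsk: bool = False) -> List[Beat]:
    """Hypothetical model of the CB_Z..CB_ZWR schedule for a single pixel.

    GEQUAL depth test: the fragment passes when frag_z >= stored_z.
    """
    beats = [Beat("CB_Z", read2="z", write=None)]  # beat 0: stored-Z read
    if frag_z < stored_z:
        # beat 1: Z-test FAIL -> no texel/dest read, no write; walker advances
        beats.append(Beat("CB_ZW", read2=None, write=None))
        return beats
    beats.append(Beat("CB_ZW", read2="texel", write=None))  # beat 1: PASS, issue texel read
    beats.append(Beat("CB_T", read2="dest", write=None))     # beat 2: latch texel, issue dest read
    beats.append(Beat("CB_FB", read2=None, write="color"))   # beat 3: blend, write color
    # beat 4: write Z unless ZMSK masks it (assumed: the beat is still spent)
    beats.append(Beat("CB_ZWR", read2=None, write=None if zmsk else "zbuf"))
    return beats


def op_counts(beats: List[Beat]) -> Tuple[int, int]:
    """(reads, writes) issued over a pixel's beats."""
    reads = sum(b.read2 is not None for b in beats)
    writes = sum(b.write is not None for b in beats)
    return reads, writes


hidden = combined_pixel_ops(frag_z=10, stored_z=20)
visible = combined_pixel_ops(frag_z=30, stored_z=20)
print(op_counts(hidden), len(hidden))    # (1, 0) 2
print(op_counts(visible), len(visible))  # (3, 2) 5
```

Because each `Beat` has exactly one `read2` slot, the model cannot express two read2 consumers in the same cycle, which is the property the mutual-exclusion asserts guard in hardware.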

## The concrete requirement for tiled VRAM

- **hidden pixel: 1 read, 0 writes** (stored-Z only).
- **visible pixel: 3 reads + 2 writes** (stored-Z, texel, and dest-color reads;
  color + Z writes).

So tile-local memory must serve **up to 3 reads + 2 writes per pixel**. That makes
the design options concrete (no longer hand-wavy):

- a **2-read-port** tile RAM (e.g. texel + Z in parallel, dest folded in) plus a
  write path, OR
- a **3-phase read schedule** on fewer ports (what this probe does, serialized),
  trading throughput for ports, OR
- tile-local banking that absorbs the dest read-modify-write locally.
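
To put a rough number on "trading throughput for ports", the serialized option's cost can be tallied per primitive. A sketch under stated assumptions: one clock per beat (2 beats for a hidden pixel, 5 for a visible one, per the schedule table) and no walker overhead. `Budget` and `serialized_budget` are hypothetical helpers, and the 7/35 hidden/visible split is just the depth-FAIL/PASS counts from the Verification section reused as input, not a measured cycle count.

```python
from typing import NamedTuple


class Budget(NamedTuple):
    reads: int
    writes: int
    beats: int  # clocks on the serialized single-read2/single-write path


# Per-pixel costs from the schedule table (assumed: one clock per beat).
HIDDEN = Budget(reads=1, writes=0, beats=2)   # CB_Z, CB_ZW
VISIBLE = Budget(reads=3, writes=2, beats=5)  # CB_Z .. CB_ZWR


def serialized_budget(hidden: int, visible: int) -> Budget:
    """Total memory ops and beats for depth-failing and depth-passing pixels."""
    return Budget(
        reads=hidden * HIDDEN.reads + visible * VISIBLE.reads,
        writes=hidden * HIDDEN.writes + visible * VISIBLE.writes,
        beats=hidden * HIDDEN.beats + visible * VISIBLE.beats,
    )


b = serialized_budget(hidden=7, visible=35)
print(b)                                         # Budget(reads=112, writes=70, beats=189)
print(f"read2 busy {b.reads}/{b.beats} beats")   # read2 busy 112/189 beats
```

The idle read2 beats (`CB_FB`, `CB_ZWR`, and a failing `CB_ZW`) are the slack a multi-port or banked tile RAM would reclaim.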

Z-first ordering means texel and dest-color bandwidth is spent only on visible
pixels, a real saving that the tiled design should preserve.
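
The size of that saving follows directly from the per-pixel read counts. The sketch below compares Z-first against a hypothetical ordering that reads texel and dest-color for every candidate pixel before the depth result is known (not something this probe implements); the function names are illustrative.

```python
def reads_z_first(hidden: int, visible: int) -> int:
    # Hidden pixels stop after the stored-Z read; visible ones add texel + dest.
    return hidden * 1 + visible * 3


def reads_fetch_all(hidden: int, visible: int) -> int:
    # Hypothetical: stored-Z, texel and dest-color read for every candidate pixel.
    return (hidden + visible) * 3


def reads_saved(hidden: int, visible: int) -> int:
    return reads_fetch_all(hidden, visible) - reads_z_first(hidden, visible)


print(reads_saved(hidden=7, visible=35))  # 14
```

The saving is exactly two reads per hidden pixel, so it grows with depth complexity (overdraw).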

## Verification (tb_top_psmct32_combined_demo)

The scene is a green, Z-writing background plus one TME+ABE+ZTE triangle whose
interpolated Z crosses the background Z (top half passes, bottom half fails). A
**memory-op tracer** records, per pixel, the read enables and write addresses and
asserts the SEQUENCE (not just the final pixels):

- depth-FAIL: z-read=1, texel-read=0, dest-read=0, color-write=0, Z-write=0 → pixel stays background green.
- depth-PASS: z-read=1, texel-read=1, dest-read=1, color-write=1, Z-write=1 → blend(texel, green); texel RGB and green dest both present.

Result: 35 depth-PASS / 7 depth-FAIL / 160 outside pixels, errors=0. With
`COMBINED_TAZ=0`, all prior demos stay byte-identical.
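
The depth-PASS expectation ("texel RGB and green dest both present") follows from the beat-3 blend formula. A minimal per-channel sketch, assuming 8-bit channels, a flooring (arithmetic) right shift as in two's-complement hardware, and As ≤ 0x80 (0x80 is 1.0 in the GS alpha convention) so no clamping is needed. The texel color and alpha are hypothetical, not the values used in `tb_top_psmct32_combined_demo`.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def blend_channel(cs: int, cd: int, a_s: int) -> int:
    """Cv = (((Cs - Cd) * As) >> 7) + Cd for one 8-bit channel.

    Python's >> floors negative values, like an arithmetic shift in hardware.
    With 0 <= As <= 0x80 the result stays between Cd and Cs, so no clamp is needed.
    """
    assert 0 <= a_s <= 0x80, "sketch assumes As <= 0x80 (1.0)"
    return (((cs - cd) * a_s) >> 7) + cd


def blend(src: RGB, dst: RGB, a_s: int) -> RGB:
    """Apply blend_channel to each of R, G, B."""
    return tuple(blend_channel(s, d, a_s) for s, d in zip(src, dst))


texel = (255, 0, 0)  # hypothetical red texel
green = (0, 255, 0)  # background dest color
print(blend(texel, green, a_s=0x40))  # (127, 127, 0)
```

At As = 0x80 the result is exactly Cs, and at As = 0 it is Cd, so any intermediate alpha leaves both the texel's red and the background's green visible in the written pixel.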

## Out of scope (deliberately)

Perspective-correct texturing (this probe is affine only; perspective was proven
separately in Ch301), alpha-test / texture-alpha discard, non-PSMCT32 destination
formats, and throughput (multi-beat is fine here).