retroDE_ps2/docs/decisions/0012-ch347-clut-psmt8-sprite.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

0012 — Ch347: CLUT (PSMT8) textured-alpha sprites

Status: planned (synthetic brick buildable now; authentic acceptance gated on a real capture)
Date: 2026-06-23

Goal

Extend the Ch344/Ch345a textured-alpha SPRITE path from PSMCT32-only to PSMT8 indexed (CLUT) textures: TEX0.PSM=0x13 → fetch 8-bit index from VRAM → CLUT → ABGR texel → MODULATE → source-over alpha. This is the first "real game" GS feature beyond the homebrew corpus (which is anomalously all-PSMCT32); PS2 titles lean on palettized textures to fit VRAM, so a richer free corpus (Ch347 target: a ScummVM-freeware capture, Beneath a Steel Sky) forces CLUT. Scope is PSMT8 only — PSMT4 (nibble/RMW) deferred unless census forces it.
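The pixel path named above (index → CLUT → ABGR → MODULATE → source-over) can be sketched as a small software model. This is an illustration only, not the RTL or the project's software reference: the function names are hypothetical, and the arithmetic assumes the commonly documented GS fixed-point convention where 0x80 is unity for both MODULATE and blend alpha, with results clamped to 8 bits.

```python
# Sketch of the per-pixel path: index -> CLUT -> ABGR -> MODULATE -> source-over.
# Assumptions (not taken from the source): 0x80 == 1.0, 8-bit clamping,
# ABGR packing with R in the low byte. All names here are hypothetical.

def unpack_abgr(word):
    """Split a 32-bit ABGR word into (r, g, b, a); R is the low byte."""
    return (word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF, (word >> 24) & 0xFF)

def modulate(texel, frag):
    """Per-channel texel * fragment with 0x80 as unity, clamped to 0xFF."""
    return tuple(min((t * f) >> 7, 0xFF) for t, f in zip(texel, frag))

def source_over(src, dst):
    """Blend src RGB over dst RGB using the src alpha (0x80 == opaque)."""
    r, g, b, a = src

    def blend(cs, cd):
        return max(0, min(cd + (((cs - cd) * a) >> 7), 0xFF))

    return (blend(r, dst[0]), blend(g, dst[1]), blend(b, dst[2]))

def shade_pixel(index, clut, frag, dst):
    """Look up the CLUT entry, MODULATE with the fragment colour, blend over dst."""
    texel = unpack_abgr(clut[index])
    return source_over(modulate(texel, frag), dst)
```

With a neutral fragment colour of (0x80, 0x80, 0x80, 0x80), a CLUT entry with alpha 0x80 fully replaces the destination and an entry with alpha 0 leaves it untouched, which is the behaviour the alpha-from-CLUT check in the synthetic TB is meant to exercise.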

Key finding: the CLUT machinery is ~95% already built (search-before-reimplement)

The platform already has, and PROVES for textured TRI/SPRITE DECAL (Ch296/297/299/314):

  • clut_stub.sv — 256×32 CLUT RAM, two combinational read ports; one is already dedicated to the texture sampler (tex_read_idx → tex_read_data).
  • clut_loader_stub.sv — VRAM→CLUT load FSM, CLD-mode policy, PSMCT32/PSMCT16 unpack, load_busy guards read2.
  • gs_texel_addr.sv PSMT8 path — 1 byte/texel linear byte address; gs_swizzle_psmt8_stub.sv for swizzle.
  • gs_texture_unit.sv (Ch296) — byte-lane extract from the 32-bit word + CLUT lookup; output is .tex_color.
  • gs_stub already decodes TEX0 CLUT fields (CBP/CPSM/CSM/CSA/CLD) and the textured-DECAL gate already admits PSM 0x13/0x14.
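As a reading aid for the address and byte-lane steps listed above, here is a minimal software sketch of a linear (unswizzled) PSMT8 fetch followed by a CLUT lookup. It is not a transcription of gs_texel_addr.sv or gs_texture_unit.sv: the function names are hypothetical, the texture base is assumed to be given in bytes, and lane 0 is assumed to be bits 7:0 of the 32-bit word.

```python
# Sketch of a linear PSMT8 fetch: byte address -> 32-bit word -> byte lane -> CLUT.
# Assumptions: base address in bytes, little-endian lanes, no swizzle.

def psmt8_linear_byte_addr(base_bytes, tex_width, u, v):
    """One byte per texel, row-major."""
    return base_bytes + v * tex_width + u

def fetch_index(vram_words, byte_addr):
    """Pick the 8-bit index out of the 32-bit word that contains byte_addr."""
    word = vram_words[byte_addr >> 2]
    lane = byte_addr & 0x3  # lane 0 is bits 7:0
    return (word >> (8 * lane)) & 0xFF

def sample_psmt8(vram_words, clut, base_bytes, tex_width, u, v):
    """Return the 32-bit CLUT entry selected by the texel at (u, v)."""
    addr = psmt8_linear_byte_addr(base_bytes, tex_width, u, v)
    return clut[fetch_index(vram_words, addr)]
```

For example, with a 4-texel-wide texture at base 0, texel (2, 1) sits at byte address 6, which is lane 2 of word 1.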

Critically: the Ch344 half-rate sprite datapath captures s1_tex_color, and s1_tex_color IS the gs_texture_unit output (gs_stub.sv:4352) — i.e. already CLUT-decoded for PSMT8. So the CLUT decode happens upstream of the half-rate capture.

What actually needs doing

  1. Relax the textured-alpha SPRITE eligibility gate (new_tex_abe_active, gs_stub.sv ~:5114): (tex0_psm==6'h00) → (tex0_psm==6'h00 || tex0_psm==6'h13), which admits PSMT8. PSMT4 (0x14) is left out for v1.
  2. Validate the timing — the one real risk. PSMT8 adds a byte-lane SELECT; under TEX_RD_REGISTERED=1 (the board config) the selector is realigned (SEL_DELAY). The Ch344 half-rate capture (ta_tex_q/ta_tex_q1, the 1-deep texel delay) was tuned to PSMCT32's registered-read latency. We must prove the CLUT-decoded texel is still valid at the frozen-beat capture for PSMT8 — a COMBINATIONAL-read TB would be a FALSE GREEN (this exact trap bit Ch344). Use a registered-read TB.
  3. CLUT precondition: a TEX0_1 write with CLD≠0 must fire (loading clut_stub) before the sprite draws — same precondition as the proven indexed-DECAL path; declared, asserted in the TB.
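The false-green risk in step 2 can be shown with a toy cycle model. The sketch below is a deliberately simplified assumption, not the Ch344 datapath: it models only a word RAM with a configurable read latency and a byte-lane selector with a configurable delay, and ignores the half-rate capture entirely. run_fetch and its parameters are hypothetical names.

```python
# Toy model: why a combinational-read TB can pass while registered-read hardware fails.
# read_latency = 0 models a combinational RAM, 1 models a registered read.
# sel_delay is how many cycles the byte-lane select is delayed before it is applied.

def run_fetch(words, byte_addrs, read_latency, sel_delay):
    """Return the byte produced for each issued address, in issue order."""

    def addr_at(cycle):
        # Outside the issued stream the address bus is assumed to idle at 0.
        return byte_addrs[cycle] if 0 <= cycle < len(byte_addrs) else 0

    out = []
    for cycle in range(read_latency, len(byte_addrs) + read_latency):
        word = words[addr_at(cycle - read_latency) >> 2]  # data lags the address
        lane = addr_at(cycle - sel_delay) & 0x3           # select as seen this cycle
        out.append((word >> (8 * lane)) & 0xFF)
    return out

words = [0x44332211, 0x88776655]
addrs = [1, 6, 3]
expected = [0x22, 0x77, 0x44]

comb_ok = run_fetch(words, addrs, read_latency=0, sel_delay=0) == expected
reg_aligned_ok = run_fetch(words, addrs, read_latency=1, sel_delay=1) == expected
reg_misaligned_ok = run_fetch(words, addrs, read_latency=1, sel_delay=0) == expected
```

In this model the unaligned selector passes against the combinational RAM and fails against the registered one, so only a registered-read TB can tell an aligned selector from a misaligned one.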

Pre-fit synthetic TB (buildable NOW — no capture needed), proving Codex's 5 points

tb_gs_psmt8_alpha_sprite (registered-read model, SPRITE_TEX_ALPHA=1, TEX_RD_REGISTERED=1):

  1. index fetch hits the right byte (PSMT8 linear address → correct VRAM byte lane);
  2. CLUT maps index → ABGR (program clut_stub via a CLD≠0 TEX0 / loader);
  3. the texel's alpha (from the CLUT entry) drives source-over against the dest;
  4. no read2 collision regression (texel read on primary beat, dest on frozen beat, CLUT lookup is combinational — assert no overlap, incl. vs load_busy);
  5. the PSMCT32 sprite path stays green (cross-check the existing tb_gs_textured_alpha_sprite + regression).

Acceptance for the synthetic brick: TB passes + full regression + quartus_syn 0-err. This banks the hardware without claiming authentic content.

Synthetic ≠ authentic — two separate labels (Codex)

The datapath proof (tb_gs_psmt8_alpha_sprite) proves index→CLUT→ABGR→source-over works. It is NOT authentic CLUT ingestion. Authentic PSMT8 additionally requires the emitted TEX0's CLUT-side fields to select a CLUT that is actually loaded and resident:

  • Screening (DONE, Ch346): gs_texture_residency.py now decodes CBP/CPSM/CSM/CSA/CLD and, for indexed-PSM (0x13/0x14) candidates, REQUIRES a resident CLUT upload at CBP before the draw (epoch-tracked, same as the texture); otherwise it REJECTs the candidate. It also flags CLD=0 (no load trigger → possibly stale palette). So residency_ok() won't green-light a PSMT8 candidate whose palette isn't resident.
  • Emission (capture-step TODO): the feeder/translator must carry the CLUT-side TEX0 fields. Today ps2_feeder.c's tex0 TBP TBW TW TH TFX grammar packs ONLY texture-side fields; it needs CBP/CPSM/CSM/CSA/CLD added (and the fixture must upload the palette to CBP and issue a CLD≠0 TEX0 so clut_loader_stub fires). Build this around the exact Ch346-selected candidate, not speculatively.
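To make the CLUT-side fields concrete, here is a hedged sketch of a TEX0 decode plus a simplified screening rule. It is not the code in gs_texture_residency.py: the bit positions follow the commonly published GS TEX0 layout and should be checked against the project's own decoder, the epoch tracking is omitted, and decode_tex0, clut_screen, and the flag strings are hypothetical.

```python
# Sketch: decode TEX0 (64-bit) and apply a simplified CLUT residency screen.
# Bit layout is an assumption taken from commonly published GS documentation.

def decode_tex0(value):
    def bits(lo, width):
        return (value >> lo) & ((1 << width) - 1)

    return {
        "TBP0": bits(0, 14), "TBW": bits(14, 6), "PSM": bits(20, 6),
        "TW": bits(26, 4), "TH": bits(30, 4), "TCC": bits(34, 1), "TFX": bits(35, 2),
        "CBP": bits(37, 14), "CPSM": bits(51, 4), "CSM": bits(55, 1),
        "CSA": bits(56, 5), "CLD": bits(61, 3),
    }

INDEXED_PSMS = {0x13, 0x14}  # PSMT8, PSMT4

def clut_screen(tex0, resident_clut_bases):
    """Return (ok, flags). Indexed textures need a CLUT already resident at CBP."""
    fields = decode_tex0(tex0)
    if fields["PSM"] not in INDEXED_PSMS:
        return True, []
    if fields["CBP"] not in resident_clut_bases:
        return False, ["clut-not-resident"]
    flags = ["cld-zero"] if fields["CLD"] == 0 else []
    return True, flags
```

The same field list is what the feeder grammar would need to pack on the emission side, so a decode like this can double as a round-trip check once the feeder is extended.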

Board-fit guardrail (Codex guardrail 1) — RESOLVED

The "missing HDMI IO_STANDARD" the synth smoke reported was a FALSE alarm: the assignments are present + correct in the QSF (with an -entity qualifier); the scaffold check's regexes were EOL-anchored and didn't tolerate the qualifier. Fixed 3 checks in sim/Makefile (VIRTUAL_PIN + HDMI/ADV7513 IO_STANDARD). The QSF carries the full 77-source list (incl. osd/qsys platform modules under USE_QSYS_TOP) so the owner's board fit is unaffected. NOTE: quartus_syn_only itself is a reduced smoke (files.f, 115 entries) that OMITS the platform modules, so it can't fully elaborate the de25 top — a pre-existing smoke-scope limitation, not a board-fit blocker. Quartus analyzed the Ch347 gs_stub change clean (the 7 elaboration errors are all unrelated platform entities).
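The false alarm described above is easy to reproduce in isolation. The sketch below uses Python's re module purely as an illustration; the actual checks live in sim/Makefile and their exact patterns are not shown in this document. The QSF line, pin name, I/O standard string, and entity name are all hypothetical.

```python
import re

# Hypothetical QSF assignments, with and without the -entity qualifier.
with_entity = 'set_instance_assignment -name IO_STANDARD "3.3-V LVCMOS" -to HDMI_TX_CLK -entity top'
without_entity = 'set_instance_assignment -name IO_STANDARD "3.3-V LVCMOS" -to HDMI_TX_CLK'

# EOL-anchored check: nothing may follow the pin name.
strict = re.compile(r'-name\s+IO_STANDARD\s+"[^"]+"\s+-to\s+HDMI_\w+\s*$')

# Tolerant check: an optional -entity qualifier may follow the pin name.
tolerant = re.compile(r'-name\s+IO_STANDARD\s+"[^"]+"\s+-to\s+HDMI_\w+(\s+-entity\s+\S+)?\s*$')

strict_hits = [bool(strict.search(s)) for s in (with_entity, without_entity)]
tolerant_hits = [bool(tolerant.search(s)) for s in (with_entity, without_entity)]
```

Here the strict pattern misses the qualified line even though the assignment is present, while the tolerant pattern accepts both forms.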

Authentic acceptance (gated on the capture — do NOT commit the target until it exists)

  1. Capture a Beneath a Steel Sky (ScummVM-freeware) GS dump.
  2. gs_texture_residency.py (Ch346) picks a RESIDENT, plausible PSMT8 candidate WITH a resident CLUT — prefer a no-wrap footprint so we don't repeat the Ch345b wrap-mode ambiguity.
  3. Extend ps2_feeder.c/translator with CLUT-side TEX0 fields + palette upload; emit the scene; software reference pixel-diffs; then board fit (after confirming the board profile's clut_load_busy wiring).
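The pixel-diff in step 3 can be as simple as the sketch below. It assumes both frames are flat, row-major lists of 32-bit pixel words of the same size; pixel_diff is a hypothetical helper, not the project's existing comparison tool.

```python
# Sketch: exact pixel diff between a software reference frame and a captured frame.
# Frames are assumed to be flat row-major lists of 32-bit pixel words.

def pixel_diff(ref, got, width):
    """Return a list of (x, y, ref_pixel, got_pixel) for every mismatching pixel."""
    if len(ref) != len(got):
        raise ValueError("frame sizes differ")
    return [
        (i % width, i // width, r, g)
        for i, (r, g) in enumerate(zip(ref, got))
        if r != g
    ]
```

An empty result means the emitted scene matches the reference exactly; a non-empty one gives coordinates to inspect before attempting a board fit.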

Provenance: all dump-derived content stays LOCAL/gitignored, same discipline as the cube/sprite fixtures.