thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


SIO2 / pad input contract

Status: Draft / partial impl (Ch233 recon + Ch234 Option-A implementation landed). RTL: rtl/iop/sio2_input_stub.sv. Successor chapters (Ch235+) extend to analog / SIF mailbox / faithful SIO2.


Ch234 implementation (landed)

sio2_input_stub.sv is the Option-A surface from the recon below. It sits inside iop_memory_map_stub and translates the bridge-domain INPUT_P1 / INPUT_P2 bitmaps into a Sony-format 16-bit digital pad word readable from the IOP-side MMIO bus.

IOP MMIO surface (retroDE-local, not Sony-compatible):

Offset Reg Layout
0x1F80_8500 PAD_P1_STATE [7:0]=byte3 (D-pad/start/select/sticks), [15:8]=byte4 (face/shoulder), [31:16]=0
0x1F80_8504 PAD_P2_STATE Same shape, sourced from INPUT_P2
0x1F80_8508 PAD_STATUS [0]=present/valid=1, [31:1]=0
other reserved Read 0; write accepted-and-ignored
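The read side of this register map can be written as a small golden model for TB checks. This is a minimal Python sketch. It assumes word-aligned reads inside the 0x1F80_8500..0x1F80_85FF window that iop_memory_map_stub routes to the stub, and it takes the two 16-bit Sony pad words as already-translated inputs. `pad_mmio_read` and `in_pad_window` are hypothetical names, not part of the RTL, and writes are not modelled because they are accepted and ignored.

```python
PAD_WINDOW_BASE = 0x1F80_8500   # retroDE-local window, not Sony-compatible
PAD_WINDOW_LAST = 0x1F80_85FF

PAD_P1_STATE = 0x1F80_8500
PAD_P2_STATE = 0x1F80_8504
PAD_STATUS = 0x1F80_8508


def in_pad_window(addr: int) -> bool:
    """True if an IOP address is routed to the pad stub by the map decode."""
    return PAD_WINDOW_BASE <= addr <= PAD_WINDOW_LAST


def pad_mmio_read(addr: int, p1_word: int, p2_word: int) -> int:
    """Hypothetical golden model of a 32-bit read from the pad window.

    p1_word / p2_word are 16-bit Sony pad words: [7:0] = byte3, [15:8] = byte4.
    """
    if not in_pad_window(addr):
        raise ValueError(f"0x{addr:08X} is outside the pad window")
    if addr == PAD_P1_STATE:
        return p1_word & 0xFFFF      # [31:16] read as 0
    if addr == PAD_P2_STATE:
        return p2_word & 0xFFFF
    if addr == PAD_STATUS:
        return 0x1                   # [0] = present/valid, [31:1] = 0
    return 0                         # reserved offsets read 0
```

For example, with P1 idle (0xFFFF) and P2 holding JOY_RIGHT (0xFFDF), a read of 0x1F80_8504 returns 0xFFDF.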

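CDC: 2-FF synchronizer per bit on each of the 32-bit INPUT_P1 and INPUT_P2 inputs. Bridge writes arrive at retrodesd's ≤ 1 kHz rate, so consecutive updates are at least 1 ms apart (about 33,000 cycles at the 33 MHz iclk used in the integration TBs). Because each bit is synchronized independently, multi-bit tearing (some bits of an update landing one cycle before the others) is theoretically possible during the synchronizer settle window, but a read has to fall inside that one- or two-cycle window to observe it, so it is vanishingly rare in practice. A future chapter can promote to "snapshot CDC" (latch + 2-sample coherency) if tearing ever becomes observable.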
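One reading of the "2-sample coherency" idea can be sketched as a cycle-level Python model. The assumption, labelled here and in the code, is that some bits of an update may resolve through their synchronizers one cycle later than the rest (`skew_mask`), so the synchronized word shows a torn old/new mix for exactly one cycle. A consumer that only accepts a value once it has seen it on two consecutive cycles never passes that torn value on. Both helpers are hypothetical and are not taken from sio2_input_stub.sv.

```python
from typing import List


def synced_samples(old: int, new: int, skew_mask: int,
                   pre: int = 2, post: int = 3) -> List[int]:
    """Per-cycle values on a per-bit-synchronized bus after one update.

    Assumption: bits in skew_mask resolve exactly one cycle late, so for one
    cycle the bus shows a torn mix of old and new bits.
    """
    torn = (new & ~skew_mask) | (old & skew_mask)
    return [old] * pre + [torn] + [new] * post


def two_sample_coherent(samples: List[int], initial: int) -> List[int]:
    """Update the held output only when two consecutive samples agree."""
    held, prev, out = initial, None, []
    for s in samples:
        if s == prev:
            held = s
        out.append(held)
        prev = s
    return out
```

With an update from 0x0000 to 0x0021 (JOY_RIGHT + JOY_SELECT) and bit 5 arriving late, the raw bus briefly reads 0x0001; the filtered output goes straight from 0x0000 to 0x0021, one cycle later than the raw bus. That extra cycle is negligible against the ≥ 1 ms update spacing. If the skew could exceed one cycle, more matching samples would be needed.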

Active-high → active-low: each INPUT_P1 bit equal to 1 (pressed) maps to the corresponding Sony bit equal to 0 (pressed). Two combinational ~{...} assigns do the per-bit permutation and inversion with no added clock latency.

Coverage: sim/tb/iop/tb_sio2_input_stub.sv exercises the new module directly (without going through the IOP map): reset state (all reads 0 except PAD_STATUS); no-buttons → Sony word 0xFFFF; single-bit pressed across all 16 retroDE bits; JOY_OSD (bit 16) deliberately not forwarded; combos (START+SELECT, face+D-pad); P1/P2 independence with distinct patterns; writes accepted-and-ignored; out-of-range word offsets read 0; clearing returns to 0xFFFF. Sim regression intact at 152 PASS (151 baseline + the new TB).

The iop_memory_map_stub now also routes the new region in its read-response mux and trace; CPU reads to addresses in 0x1F80_8500..0x1F80_85FF route to the stub, others fall through unchanged. Sixteen existing IOP-map-consuming TBs gained a .input_p1(32'd0), .input_p2(32'd0) tie-off since the map signature gained two new input ports.

Bridge-side output ports landed in Ch235. ps2_hps_bridge now exposes input_p1_o / input_p2_o as bridge-clock-domain broadcasts of the Ch222 latches; iop_memory_map_stub.input_p1 / input_p2 consume them directly. The board top wires the bridge's new outputs to a pair of local bridge_input_p1 / bridge_input_p2 nets (unconnected for now — the synth top doesn't yet instantiate the IOP core, but the wires are placed for future hookup).

The full HPS → bridge → IOP path is sim-validated end-to-end by sim/tb/integration/tb_bridge_iop_pad_input.sv. The TB runs two distinct clocks (100 MHz bclk for the bridge, 33 MHz iclk for the IOP map), so the bridge-clk → IOP-clk CDC inside sio2_input_stub is genuinely exercised. It drives AXI writes into INPUT_P1/P2 at the standard 0x040/0x044 offsets and reads PAD_P1_STATE/PAD_P2_STATE at 0x1F80_8500/0x1F80_8504, which is exactly the operator-visible end-to-end flow.


Ch237 — EE-visible pad-state buffer (recon)

Status: Recon (no RTL). Defines how the IOP-local Sony pad word (Ch234) becomes an EE-readable 16-byte buffer that libpad-shaped code can consume.

Why this recon exists

Ch234 gave PS2-side IOP code access to a Sony-format pad word. Ch235 wired the HPS→IOP half in RTL and validated it in sim. But the EE half, meaning how EE-side software (eventually libpad, or hand-rolled homebrew) sees pad state, is still undefined. Ch237 picks a shape before Ch238 starts implementation.

Survey: SIF infrastructure that already exists

The SIF seam is feature-complete for staged bring-up per rtl/sif/README.md. Relevant already-landed pieces for the pad-state path:

Module What it does
sif_mailbox_stub 4-register mailbox: MSCOM / SMCOM / MSFLG / SMFLG. Both EE-side and IOP-side ports.
sif_dma_iop_ram_bridge_stub EE→IOP DMA: 128-bit qword → 4×32-bit IOP RAM writes (with DEST_BASE_ADDR).
sif_dma_ee_ram_bridge_stub IOP→EE DMA: 4×32-bit IOP beats → 1×128-bit EE-RAM write at DEST_BASE_ADDR. Has last_seen_o.
sif_dma_ack_peer_stub Mailbox doorbell + payload-complete combiner (EE side waits).
sif_dma_ee_ack_peer_stub IOP-driven equivalent (mirror polarity).
boot_install_agent_stub EE-driven boot-image landing through SIF (different traffic shape but same primitives).

The IOP→EE data path already exists in RTL form. A 16-byte pad-state buffer arriving at a fixed EE-RAM address is one sif_dma_ee_ram_bridge transaction — exactly four 32-bit beats. The protocol-combiner peers handle the "payload landed, notify the other side via mailbox flag" sequence both ways.

What does NOT exist today

  • EE-side SIF register decode in ee_memory_map_stub. Real PS2 has SIF MSCOM/SMCOM/MSFLG/SMFLG visible to the EE at 0x1000_F200..0x1000_F2FF; the EE map doesn't yet decode that range. sif_mailbox_stub has an EE-side port, but no EE map routes CPU reads/writes there yet. (The IOP-side map decodes its own SIF window at 0x1D00_0000+.)
  • No EE-side execution primitive in the synth top. Same silicon-truth caveat as the IOP side from Ch236 — tb_* TBs exercise EE↔IOP coordination in sim with real EE/IOP CPU stubs, but the synth top doesn't instantiate either. The path can land in sim and stay sim-only until a future top-integration chapter wires both CPUs in.
  • No libpad / padman RPC layer. Real PS2: padman.irx on IOP receives RPC calls from EE-side libpad, services them with SIF DMAs back to EE buffers. The RPC layer is software on both sides, not RTL. Ch237 scope is the RTL-level buffer-delivery path — the RPC protocol on top can come later.

Three options for the EE-visible surface

Option A: SIF DMA into a fixed EE-RAM buffer (recommended)

Shape: IOP code reads PAD_P1_STATE / PAD_P2_STATE (Ch234), constructs a 16-byte Sony pad-state struct in IOP RAM, DMAs it via sif_dma_ee_ram_bridge_stub to a fixed address in EE RAM (e.g., EE_PAD_BUFFER_BASE = 0x0008_0000). EE-side code reads from that address.

Pros:

  • Uses the existing sif_dma_ee_ram_bridge_stub as-is.
  • Matches the shape libpad expects — pad state lands in EE-allocated memory, EE reads bytes directly.
  • The fixed address is a stub convention; a future libpad layer can carry the real per-port allocation address.
  • 16 bytes = exactly four 32-bit SIF DMA beats = exactly one qword write at the EE-RAM bridge. No partial-qword edge cases.

Cons:

  • Requires an IOP-side execution context that reads PAD_P1_STATE and drives the DMA — but Ch235's tb_bridge_iop_pad_input is the template; we already have small synthetic-IOP-code patterns in tb_iop_* TBs.
  • The DMA path has ack/handshake latency (mailbox doorbell + 4-beat DMA + completion flag). For Ch238's first stub this is fine; for real-time pad polling at 60 Hz it's also more than fine (each transaction is microseconds at typical clock rates).

Option B — Mailbox register packing (smallest possible)

Shape: IOP packs the 16-byte pad state into the 4×32-bit mailbox registers (MSCOM / SMCOM / MSFLG / SMFLG). EE reads them via the (not-yet-decoded) EE-side SIF window.

Pros:

  • No DMA, no payload completion. Just register writes.
  • Even smaller scope than Option A — could be one TB chapter.
  • Mailbox storage already exists.

Cons:

  • Overloads mailbox semantics: real PS2 uses MSFLG/SMFLG as flag/doorbell registers, not data carriers. A naive stub here breaks any future mailbox-based RPC protocol.
  • Not libpad-compatible at all. Real libpad never reads pad state from SIF mailbox registers; it reads from a DMA-populated EE-RAM buffer. Option B would require all EE-side code to use a retroDE-local convention.
  • Still requires EE-side SIF window decode, so the "small" advantage shrinks once the EE map work is needed anyway.

Option C — retroDE-local EE MMIO (mirror IOP-side stub)

Shape: Add a pad_input_ee_stub in the EE map at a retroDE-local address (e.g., 0x1B00_8500, deliberately outside any real PS2 region). Combinationally surface the same Sony pad words the IOP-side stub exposes.

Pros:

  • Zero protocol overhead — combinational mirror, single register read.
  • No SIF involvement, no DMA, no handshake.
  • Symmetric with Ch234's IOP-side pattern.

Cons:

  • Doubles the platform-local surface — two non-Sony register windows (IOP + EE) doing the same thing.
  • Bypasses SIF entirely, so it doesn't exercise the EE↔IOP path that libpad / real games actually use.
  • Doesn't help with eventual SIF/RPC compatibility — when Option A lands, Option C becomes dead code.

Recommendation

Option A for the substantive next chapter. Reasoning:

  1. The existing sif_dma_ee_ram_bridge_stub already implements "IOP-side 4 beats → 1 qword EE-RAM write at a known address". Reusing it costs zero new RTL on the data path.
  2. The shape matches libpad's expected dataflow, so future RPC work composes cleanly without semantic refactoring.
  3. The fixed-address convention is a single parameter; a real libpad layer can override it per port without changing the RTL surface.

Option B is tempting for "fastest visible EE-side proof" but breaks libpad-shape; Option C is tempting for symmetry but creates dead code once Option A lands.

Where the path lights up in existing stubs

For a sim-only Ch238 (Option A), the data flow is:

sio2_input_stub.PAD_P1_STATE         // Ch234 — IOP reads here
   │
   ▼  (IOP-side test code: read, copy to IOP RAM)
iop_ram (16 bytes at iop_pad_buffer_addr)
   │
   ▼  IOP DMAC → sif_dma_iop_ram_bridge_stub egress    // EXISTS
sif_dma_stub (EE-side ingress buffer)                  // EXISTS
   │
   ▼  sif_dma_ee_ram_bridge_stub → ee_memory_map.bridge_wr // EXISTS
ee_ram (16 bytes at EE_PAD_BUFFER_BASE)                    // EXISTS
   │
   ▼  EE-side test code: cpu_rd from EE_PAD_BUFFER_BASE
EE-readable pad state                                       ← target

The only new pieces needed are:

  • A small IOP-side test harness that drives the read → DMA sequence (TB-level glue or a tiny synthetic-IOP-code fragment loaded into IOP RAM).
  • A new integration TB that wires all the existing stubs end-to-end and asserts an EE-side read of EE_PAD_BUFFER_BASE matches the Sony pad word from PAD_P1_STATE within some bounded latency.

No new RTL module is strictly required for Ch238 — the path composes from existing primitives. If the integration TB turns up a missing piece (e.g., a more convenient pad-state packing helper), that's a candidate for new RTL; otherwise Ch238 lands as one new TB plus possibly one tiny helper.

Proposed chapter sequence

Ch Scope
Ch238 Integration TB. Wires the existing IOP map (with Ch234 sio2_input_stub) + IOP DMAC + SIF mailbox + SIF DMA primitives + EE map → IOP-side test sequencer reads PAD_P1_STATE, packs a 16-byte Sony struct into IOP RAM, kicks an IOP→EE SIF DMA, signals via mailbox flag, then EE-side TB code reads the buffer at EE_PAD_BUFFER_BASE and verifies the bytes. End-to-end latency expected: ≤ a few microseconds at the existing clock rates.
Ch239 EE-side read surface polish: decode the SIF MSCOM/SMCOM window in ee_memory_map_stub (it currently doesn't decode SIF — fixing that lets the EE CPU stub poll the mailbox pad-ready flag without TB intervention). Optionally a tiny EE-side test program loaded into EE RAM that does lw $v0, EE_PAD_BUFFER_BASE and traces the result.
Ch240+ Real padman/libpad RPC compatibility: define the RPC frame format, build the EE-side request/IOP-side response pair, support multi-port + connected/disconnected state changes. Largest single chapter in the input arc — defer until Ch238+Ch239 are green and there's a real game/BIOS workflow demanding it.

Out of scope for Ch237 / Ch238 / Ch239

  • Analog stick fidelity (still digital-mode-only at all three Ch222 / Ch234 / Ch238 levels).
  • DualShock 2 pressure-sensitive buttons.
  • Multitap support.
  • Vibration / actuator feedback (output direction).
  • Faithful SIO2 protocol emulation at 0x1F80_8200..0x1F80_82FF (deferred per Ch233 / Ch234 reasoning).
  • Top-level synth integration of the IOP and EE cores. Until that lands, Ch238+ are sim-only chapters; the silicon-side story stays the Ch236 disclaimer ("non-zero INPUT_P1 values mean the bridge latch landed, NOT that PS2 code saw it").

Boundary call

The existing SIF DMA + mailbox infrastructure already implements the IOP→EE data delivery path; Ch238 only needs to compose those primitives with a small IOP-side test sequencer and define EE_PAD_BUFFER_BASE. Real libpad/padman compatibility is a software layer on top of that path, not a separate RTL surface; that is Ch240+ work, post-MVP for the input arc.


Ch238 implementation (landed)

Option A is proven end-to-end in sim with no new production RTL — the path composes entirely from existing primitives.

New integration TB sim/tb/integration/tb_pad_state_via_sif_to_ee.sv:

Stage Module
HPS AXI write TB drives bridge's AXI4 slave
Bridge latch ps2_hps_bridge (Ch222 INPUT_P1)
Bridge→IOP CDC sio2_input_stub (Ch234 inside IOP map)
IOP read of pad word TB-side IOP read at 0x1F80_8500
16-byte pad packet TB packs Sony struct (status/type/token/byte3/byte4 + analog centers 0x80)
4-beat SIF DMA TB drives sif_dma_ee_ram_bridge_stub.in_*
EE-RAM landing ee_memory_map_stub.bridge_wr_* → ee_ram_stub
EE-side verification TB issues DMAC qword read at landing addr

Two clocks (100 MHz bridge, 33 MHz IOP/EE/SIF) so the bridge-clk → IOP-clk CDC inside sio2_input_stub is genuinely exercised end-to-end.

Pad packet layout (16 bytes, packed into 4 little-endian 32-bit beats):

byte 0  : 0x00       success status
byte 1  : 0x41       response type (digital mode)
byte 2  : 0x5A       success token
byte 3  : Sony byte3 D-pad/start/select/sticks  (active-low)
byte 4  : Sony byte4 face/shoulder              (active-low)
bytes 5–8  : 0x80     RX/RY/LX/LY analog centers (digital mode)
bytes 9–15 : 0x00     reserved (DualShock 2 pressure)
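The layout above can be sketched as a Python packer that builds the 16-byte packet from the two Sony pad bytes and splits it into the four little-endian 32-bit beats the TB drives into sif_dma_ee_ram_bridge_stub. It then reassembles the 128-bit qword that the EE-side check compares against. The function names are hypothetical. The beat order (beat 0 carries bytes 0–3 and lands in qword bits [31:0]) is an assumption consistent with "4 little-endian 32-bit beats".

```python
import struct
from typing import List


def pad_packet(byte3: int, byte4: int) -> bytes:
    """16-byte digital-mode pad packet in the Ch238 layout."""
    head = bytes([
        0x00,          # byte 0: success status
        0x41,          # byte 1: response type (digital mode)
        0x5A,          # byte 2: success token
        byte3 & 0xFF,  # byte 3: D-pad/START/SELECT/L3/R3 (active-low)
        byte4 & 0xFF,  # byte 4: face/shoulder (active-low)
    ])
    # bytes 5-8: analog centers 0x80; bytes 9-15: reserved 0x00
    return head + bytes([0x80] * 4) + bytes(7)


def packet_to_beats(pkt: bytes) -> List[int]:
    """Split the packet into four little-endian 32-bit SIF DMA beats."""
    if len(pkt) != 16:
        raise ValueError("pad packet must be 16 bytes")
    return list(struct.unpack("<4I", pkt))


def beats_to_qword(beats: List[int]) -> int:
    """Reassemble the beats into the 128-bit EE-RAM qword (beat 0 = bits [31:0])."""
    qword = 0
    for i, beat in enumerate(beats):
        qword |= (beat & 0xFFFF_FFFF) << (32 * i)
    return qword
```

For the no-buttons case the beats come out as 0xFF5A4100, 0x808080FF, 0x00000080 and 0x00000000, and the reassembled qword equals the packet read as one little-endian 128-bit integer.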

Verified scenarios:

§ INPUT_P1 (AXI write to 0x040) Expected Sony bytes 3/4
§1 0x00000000 (no buttons) byte3=0xFF, byte4=0xFF
§2 0x00000001 (JOY_RIGHT only) byte3=0xDF (bit 5 cleared), byte4=0xFF
§3 0x00000071 = 0x31 | (1<<6) (RIGHT+START+SEL+△) byte3=0xD6 (bits 5, 3, 0 cleared), byte4=0xEF (bit 4 cleared)
§4 0x00000000 (re-clear) byte3=0xFF, byte4=0xFF

The TB also confirms last_seen_o is asserted after each 4-beat burst (showing that the in_last beat propagates cleanly through the egress bridge's state machine).

Streaming-bridge note (timing artifact, not a bug): the existing sif_dma_ee_ram_bridge_stub advances wr_offset by 16 after every emit (streaming semantics, designed for multi-qword DMAs). Successive scenarios in this TB therefore land at successive 16-byte slots; the TB tracks the per-scenario landing address (EE_PAD_BUFFER_BASE + scenario_idx * 16) and verifies the byte layout at each. A real libpad/padman implementation will need either (a) a bridge reset between transfers so every padRead() overwrites the same buffer, or (b) a PS2-side counter so the EE knows which slot holds the latest sample. That decision belongs to Ch239+, not Ch238.

P2 is deliberately left out of the first slice per Codex Ch238 framing. The next chapter can either reuse the same 16-byte slot (overwriting P1 each emit) or move to a multi-port layout (P1 at +0, P2 at +16, etc.).

Sim regression bumps from 153 → 154 PASS (new TB only, zero RTL change).


Ch239 — single-slot buffer contract (landed)

Ch238 exposed the streaming offset of sif_dma_ee_ram_bridge_stub (each emit advances wr_offset by 16). For a libpad-style consumer that wants padRead(port, &buf) to return a stable snapshot at a single buffer address, that's the wrong default. Ch239 adds a narrow rewind input that lets a producer reset the streaming offset between transfers — no other SIF semantics change.

RTL change

One new input on rtl/sif/sif_dma_ee_ram_bridge_stub.sv:

input logic rewind_i = 1'b0   // default — keeps existing consumers untouched

Behavior:

  • When rewind_i pulses HIGH (typically one iclk), wr_offset returns to 32'd0 on the next clock edge. The very next emit lands at DEST_BASE_ADDR + 0.
  • The accumulator (acc_data, acc_be, pos) is already zeroed at every emit's tail, so rewind doesn't need to touch them.
  • Rewind is intended to fire between transfers — when the bridge is idle (state == S_ACCUM && pos == 0). Misuse is caught by a sim-only $error assertion; the RTL still applies the rewind so the bug is loud, not silent.
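The offset bookkeeping described above can be sketched as a small Python behavioral model. Four 32-bit beats accumulate into one qword, each emit lands at DEST_BASE_ADDR + wr_offset and then advances wr_offset by 16, and a rewind resets wr_offset to 0 without touching the accumulator. This is a sketch, not the RTL. The class and method names are hypothetical, the `errors` counter stands in for the sim-only $error, and `last_seen` is modelled as the level-held latch described in "What last_seen_o means with rewind".

```python
from typing import Dict, List


class EeRamBridgeModel:
    """Hypothetical model of the IOP->EE bridge's offset handling plus rewind."""

    def __init__(self, dest_base_addr: int) -> None:
        self.dest_base_addr = dest_base_addr
        self.wr_offset = 0
        self.pos = 0                    # 32-bit beats accumulated so far
        self.acc = 0                    # qword under construction
        self.last_seen = False          # level-held; rewind never clears it
        self.errors = 0                 # stands in for the sim-only $error
        self.writes: List[int] = []     # landing address of every emitted qword
        self.ram: Dict[int, int] = {}   # address -> 128-bit qword

    def rewind(self) -> None:
        """Pulse rewind_i: the next emit lands at DEST_BASE_ADDR + 0."""
        if self.pos != 0:               # mid-transfer: illegal, but still applied
            self.errors += 1
        self.wr_offset = 0

    def beat(self, data: int, last: bool = False) -> None:
        """Accept one 32-bit beat; every fourth beat emits a full qword."""
        self.acc |= (data & 0xFFFF_FFFF) << (32 * self.pos)
        self.pos += 1
        if last:
            self.last_seen = True
        if self.pos == 4:
            addr = self.dest_base_addr + self.wr_offset
            self.ram[addr] = self.acc   # all 16 byte enables set
            self.writes.append(addr)
            self.wr_offset += 16        # streaming: next emit lands one qword later
            self.acc, self.pos = 0, 0   # accumulator cleared at emit tail
```

Without rewind, two transfers land at 0x0008_0000 and 0x0008_0010 (the Ch238 behavior). With a rewind before each transfer, both land at 0x0008_0000.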

The port has a 1'b0 default so existing instantiations (5 TBs, zero RTL parents) keep their streaming behaviour without code changes. Compile-checked against tb_sif_ee_landing_via_dmac — passes with no modification.

Single-slot buffer contract (new convention)

A producer using rewind gets these guarantees:

Property Value / meaning
Buffer base DEST_BASE_ADDR (parameter; pad-state path uses 0x0008_0000)
Buffer length One 16-byte qword
Rewind cadence One rewind_i pulse BEFORE each 4-beat transfer (between scenarios)
Stale-byte safety Each transfer's bridge_wr_be = 16'hFFFF (all 16 bytes enabled), so a fresh full-length transfer overwrites every byte — no leftover content from a prior transfer can survive
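Mid-transfer rewind Illegal. Sim $error. Producer must wait until the previous transfer has emitted (a few clocks after its in_last beat) before pulsing rewind again. last_seen_o is level-held and is not cleared by rewind, so it only marks completion of the first transfer after reset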
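The stale-byte guarantee comes down to how a byte-enabled qword write merges into RAM. Below is a minimal Python sketch using a hypothetical helper, not the ee_ram_stub source. With bridge_wr_be = 16'hFFFF every byte is replaced, while any cleared enable bit would let a byte from the previous packet survive.

```python
def be_write(old: bytes, new: bytes, be: int) -> bytes:
    """Merge a 16-byte write into a 16-byte slot under a 16-bit byte-enable mask.

    Bit i of be enables byte lane i (little-endian lanes, lane 0 = byte 0).
    """
    if len(old) != 16 or len(new) != 16:
        raise ValueError("qword slots are 16 bytes")
    return bytes(new[i] if (be >> i) & 1 else old[i] for i in range(16))
```

For example, writing the §4 packet over the §3 packet with be = 0xFFEF (lane 4 disabled) would leave §3's byte4 value 0xEF in place, which is the kind of leak the per-scenario 128-bit compare is designed to catch.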

For libpad-style single-slot semantics (padRead(port, &buf) returning the same &buf every call), a producer pulses rewind between each pad packet. The consumer reads from the fixed address; the producer overwrites the slot in place.

Coverage

tb_pad_state_via_sif_to_ee updated to exercise the contract:

  • Every scenario pulses rewind_i BEFORE driving its 4 beats.
  • All four scenarios read from the same EE_PAD_BUFFER_BASE address (no per-scenario indexing — different from the Ch238 streaming-offset workaround).
  • Per-scenario check_eq128 against the expected qword implicitly proves no stale bytes from prior scenarios survived: if any byte leaked through, the 128-bit equality would fire.
  • §3's combo pattern (0xD6 / 0xEF) differs from §1/§2/§4 in multiple bit positions across both pad bytes — a partial-write bug would surface here even if simpler patterns happened to alias.

Existing tb_sif_ee_landing_via_dmac (which tests the bridge's streaming behavior) passes unchanged with the rewind port at its default 1'b0.

What last_seen_o means with rewind

last_seen_o is a level-held latch that rises on the in_last beat's accept. The Ch239 rewind does NOT clear this latch — it only touches wr_offset. A consumer can still gate on last_seen_o to detect "any payload has landed since reset."

A future chapter that wants a per-transfer "fresh data" signal (for libpad's padRead to know there's a new sample) will likely add an emit_done_pulse_o strobe; that's distinct from the rewind path and belongs with Ch240+ work.

Boundary call

Ch239 makes the single-slot buffer contract explicit and tested. A libpad-style consumer can now read a stable 16-byte pad packet at EE_PAD_BUFFER_BASE regardless of how many pad packets the producer has emitted. The next chapter (Ch240) can either decode the EE-side SIF register window in ee_memory_map_stub so EE CPU code can poll a "new sample" flag, or move on to a tiny EE-side test program that just reads from the fixed address.


Ch240 — EE-side consumer reads + branches (landed)

Ch239 stabilised the producer; Ch240 closes the consumer half with an actual EE-core program reading the buffer and branching on its contents. Per Codex framing, no EE-side SIF register decode yet — the EE program polls the fixed RAM-resident buffer directly.

EE test program

                  ; Initialization
slot 0   LUI $1, 0x8008      ; $1 = EE_PAD_BUFFER_KSEG0 (0x80080000)
slot 1   LUI $5, 0x8000      ; $5 = EE_MARKER_KSEG0 base
slot 2   ORI $5, $5, 0x1000  ; $5 = 0x80001000

                  ; Polling loop
LOOP:    LBU $2, 3($1)       ; $2 = pad byte3 (D-pad/start/select/sticks)
         ORI $3, $0, 0xFF
         BEQ $2, $3, MARK_A  ; byte3 = 0xFF → no buttons
         NOP
         ORI $3, $0, 0xDF
         BEQ $2, $3, MARK_B  ; byte3 = 0xDF → JOY_RIGHT only
         NOP
                              ; fall-through → COMBO
COMBO:   ORI $6, $0, 0xCC
         SW  $6, 0($5)        ; marker C
         J LOOP
         NOP
MARK_A:  ORI $6, $0, 0xAA
         SW  $6, 0($5)        ; marker A
         J LOOP
         NOP
MARK_B:  ORI $6, $0, 0xBB
         SW  $6, 0($5)        ; marker B
         J LOOP
         NOP

22 instructions including delay slots; each loop iteration is roughly 10 instructions (8 on the MARK_A path, 11 on the MARK_B and COMBO paths). The program runs continuously: for every scenario the TB drives, the loop sees the new buffer value and writes a fresh marker within an iteration or two (roughly 50 design-clock cycles each), well inside the 500-iclk per-scenario wait.
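The branch structure of the polling loop reduces to a three-way decision on byte3. A Python sketch of that decision can generate the expected marker per scenario. The helper is hypothetical and mirrors the assembly above rather than replacing it. Any byte3 other than 0xFF and 0xDF falls through to COMBO, so the program distinguishes only three classes of input.

```python
MARK_A, MARK_B, MARK_C = 0xAA, 0xBB, 0xCC


def expected_marker(byte3: int) -> int:
    """Marker the Ch240 polling loop writes for a given pad byte3."""
    if byte3 == 0xFF:   # BEQ $2, $3 (=0xFF) -> MARK_A: no buttons
        return MARK_A
    if byte3 == 0xDF:   # BEQ $2, $3 (=0xDF) -> MARK_B: JOY_RIGHT only
        return MARK_B
    return MARK_C       # fall-through -> COMBO
```

Feeding it the §1–§4 byte3 values (0xFF, 0xDF, 0xDE, 0xFF) reproduces the marker sequence 0xAA, 0xBB, 0xCC, 0xAA.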

Kseg0 vs useg routing (important detail)

ee_memory_map_stub routes EE-CPU writes to useg addresses (addr[31] == 0) into an internal useg_shadow_mem array, NOT the external ee_ram_stub. The TB's DMAC-side reader goes through ee_ram_stub — different backing store. To make EE writes round-trip through the same RAM the TB samples, the EE program targets kseg0 addresses (0x80000000+):

  • EE_PAD_BUFFER_KSEG0 = 0x8008_0000 (EE reads via LBU at this address; phys = 0x0008_0000 after kseg0 strip; routes to ee_ram_stub)
  • EE_MARKER_KSEG0 = 0x8000_1000 (EE writes via SW at this address; same kseg0-strip routing)

The TB's DMAC-side reads use the matching physical addresses (0x0008_0000 and 0x0000_1000) — same backing RAM, different access port.
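The routing rule can be sketched in Python as follows. Assumptions, also noted in the code: only the two cases described here are modelled. An access with addr[31] == 0 goes to useg_shadow_mem. A kseg0 access (0x8000_0000..0x9FFF_FFFF) goes to ee_ram_stub after the segment bits are stripped, i.e. phys = addr & 0x1FFF_FFFF. kseg1, kseg2/3 and the MMIO decode in the real ee_memory_map_stub are not modelled, and `ee_route` is a hypothetical name.

```python
from typing import Tuple

KSEG0_BASE = 0x8000_0000
KSEG0_END = 0x9FFF_FFFF


def ee_route(addr: int) -> Tuple[str, int]:
    """Return (backing_store, address) for an EE-CPU data access (sketch only)."""
    if (addr >> 31) & 1 == 0:
        # useg: shadow array, invisible to the TB's DMAC-side reader
        return ("useg_shadow_mem", addr)
    if KSEG0_BASE <= addr <= KSEG0_END:
        # kseg0 strip -> physical address in the shared ee_ram_stub
        return ("ee_ram_stub", addr & 0x1FFF_FFFF)
    raise NotImplementedError(f"segment of 0x{addr:08X} not modelled")
```

The LBU of byte3 at 0x8008_0003 therefore reads physical 0x0008_0003 in ee_ram_stub, the same byte the SIF DMA wrote.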

Verified scenarios

§ AXI write to INPUT_P1 Pad byte3 the EE sees Marker written
§1 0x0000_0000 (no buttons) 0xFF 0xAA
§2 0x0000_0001 (RIGHT only) 0xDF (bit 5 cleared) 0xBB
§3 0x0000_0021 (RIGHT + SELECT) 0xDE (bits 0 and 5 cleared) 0xCC
§4 0x0000_0000 (re-clear) 0xFF 0xAA

Each scenario: AXI write → 20-iclk CDC settle → IOP-side read of PAD_P1_STATE to confirm bridge latch arrived → pulse rewind_i → drive 4 SIF beats → wait 500 iclk for the EE program to consume the buffer and write the marker → TB DMAC read of marker byte → assert.

Sim regression

154 → 155 PASS (one new TB only; no production-RTL changes).

What Ch240 explicitly does NOT do

  • No EE-side SIF register decode. The ee_memory_map_stub still doesn't decode the SIF mailbox/flag window at 0x1000_F200..0x1000_F2FF. The EE program polls the RAM buffer directly instead of waiting on a doorbell.
  • No libpad RPC. The marker convention is TB-internal; real libpad would marshal pad state through padman.irx via SIF RPC and into a libpad-allocated buffer with a known per-port address.
  • No buffer-fresh signal. The EE loop doesn't know if it's reading the latest snapshot or the same one twice — it just reads every iteration. Adding an "emit counter" the consumer can compare against is a Ch241+ option.

Audit responses (per Codex)

Loop freshness: does each scenario's marker come from the NEW packet, not stale state? Yes. Three pieces of evidence:

  • Each scenario has a distinct expected marker (0xAA / 0xBB / 0xCC / 0xAA). If the EE loop missed a buffer update and read the prior packet, the wrong marker would land and the per-scenario check_eq32 would fire.
  • §4 is the "clear and observe marker returns" case: after §3's combo write left the marker at 0xCC, §4 re-clears INPUT_P1 → byte3 returns to 0xFF → the loop branches to MARK_A → marker overwritten back to 0xAA. That specifically proves the EE loop is consuming live buffer state, not caching the first read.
  • Per-scenario wait is 500 design-clock cycles. Each EE loop iteration is ~10 instructions × ~5 cycles each ≈ 50 cycles, so the wait covers ~10 loop iterations — plenty of slack.

Branch semantics — markers keyed to cleared bits (active-low), not set bits? Yes:

  • 0xFF (all bits SET) = no buttons pressed → MARK_A. Set bits = released. ✓
  • 0xDF (bit 5 CLEARED) = JOY_RIGHT pressed → MARK_B. The cleared bit is what indicates "pressed." ✓
  • 0xDE (bits 5 AND 0 CLEARED) = JOY_RIGHT + JOY_SELECT pressed → falls through to COMBO (marker C). ✓

A polarity inversion would be visible: e.g. if the program treated 0xFF as "all pressed" and branched to COMBO, §1 would land 0xCC instead of 0xAA and the test would fire. The fact that §1 and §4 both match MARK_A on the "no buttons" stimulus proves the active-low semantics are honored end-to-end (sio2_input_stub's per-bit inversion + the EE program's branch direction).

Boundary call

The full input arc is sim-validated end-to-end: HPS writes INPUT_P1 → bridge latches → IOP-side sio2_input_stub translates to Sony pad bytes → producer packs a 16-byte Sony struct → SIF DMA drops it into EE RAM at a fixed slot (Ch239 rewind keeping the slot stable) → EE-side MIPS code branches on a button bit → writes a per-scenario marker the consumer-side TB samples. Active-low, freshness, and clear-and-restore behaviors are all covered by the existing tb_ee_pad_buffer_branch §1–§4 scenarios. Next options: EE-side SIF mailbox/flag decode (Ch242+), per-emit "new sample" gating, or pivot back to a different arc; as far as platform RTL is concerned, input is done.


Original recon (Ch233)

Why this doc exists

Ch222–Ch232 made the retroDE platform shell live on PS2: HPS writes controller bitmaps into ps2_hps_bridge.INPUT_P1/P2/P1_RAW (offsets 0x040/0x044/0x048), the OSD compositor renders text over PS2 video, and the supervisor menu round-trip is silicon-validated. The next bridge to build is between HPS-visible input latches and PS2-side software that wants to read controller state (eventually a real BIOS / game).

This doc maps that gap so the next code chapter has a small, named target instead of an open question.

Scope (Codex Ch233 framing)

  1. Survey existing PS2-side stubs touching SIO2 / pad / controller paths.
  2. Document what the real PS2 BIOS/game touches first for controller input.
  3. Map Ch222 INPUT_P1/INPUT_P1_RAW bits into a proposed internal pad state format.
  4. Identify the minimal MMIO surface to expose pad status to EE/IOP-side code.
  5. No RTL — the implementation chapter follows.

What exists today

HPS side (Ch222 — landed, silicon-validated by Ch226 DS2 stub)

  • ps2_hps_bridge.INPUT_P1 @ 0x040 (32-bit RW latch, retroDE SNES-style bitmap from input_common.h).
  • ps2_hps_bridge.INPUT_P2 @ 0x044 (player 2 latch).
  • ps2_hps_bridge.INPUT_P1_RAW @ 0x048 (un-remapped mirror used by retrodesd's OSD nav FSM in other cores).
  • ps2_hps_bridge.DS2_BUTTONS @ 0x0F4 (Ch226 read-only mirror of INPUT_P1; sibling-ABI DS2 path for retrodesd).
  • retrodesd/software/input_thread.c is the producer — evdev → remap → 32-bit AXI write into these offsets.

PS2 side

  • No SIO2 stub. docs/stub_module_plan.md:317 reserves rtl/peripherals/sio2_input_stub.sv as "Wave 2 #12", explicitly the last stub before "Wave 3 promotions" — never written.
  • No pad MMIO decode in iop_memory_map_stub.sv for the SIO2 region (0x1F80_8200..0x1F80_82FF on real hardware).
  • No EE-side libpad path: ee_memory_map_stub.sv has no RPC/SIF awareness of controller state.
  • The IOP map's "Future regions" comment block (in rtl/iop/README.md:149) lists "Other IOP DMAC channels (CDVD / SPU2 / DEV9 / SIF1-2 / SIO2)" as deferred.

The platform shell talks to itself — HPS writes a latch, HPS reads it back (via Ch226 DS2_BUTTONS mirror). Nothing on the PS2 fabric side consumes the bits, which is the gap Ch233+ will close.

Real PS2 controller path (for orientation)

A real game running on a stock PS2 sees controller input through this chain (top → bottom in time):

Physical DualShock 2
    │  (custom serial protocol, ~250 kHz)
    ▼
SIO2 controller block @ IOP 0x1F80_8200..0x1F80_82FF
    │  (FIFO + command/response + DMA channel 11)
    ▼
IOP RAM (padman.irx — Sony's pad daemon)
    │  - issues SIO2 transactions every vsync
    │  - parses the response into a 16-byte pad state struct
    │  - publishes the struct to a known IOP RAM address
    ▼
SIF (RPC channel)
    │  - EE-side libpad opens an RPC channel
    │  - calls padRead(port, &state) → marshals 16 bytes
    │    of pad state over SIF DMA to EE-side buffer
    ▼
EE RAM (libpad-allocated buffer)
    │  - game / BIOS reads the 16 bytes directly
    ▼
Game logic

Where the bytes live in the 16-byte pad state (the format libpad/padman use, Sony's "digital mode" / type 0x4 response):

Byte Bit Function Active-low?
0 - success status usually 0x00 / 0xFF
1 - report type / pad-state-machine 0x41 = digital, 0x73 = analog
2 - success token
3 7 LEFT 0 = pressed
3 6 DOWN 0 = pressed
3 5 RIGHT 0 = pressed
3 4 UP 0 = pressed
3 3 START 0 = pressed
3 2 R3 0 = pressed
3 1 L3 0 = pressed
3 0 SELECT 0 = pressed
4 7 □ (square) 0 = pressed
4 6 × (cross) 0 = pressed
4 5 ○ (circle) 0 = pressed
4 4 △ (triangle) 0 = pressed
4 3 R1 0 = pressed
4 2 L1 0 = pressed
4 1 R2 0 = pressed
4 0 L2 0 = pressed
5–8 - RX, RY, LX, LY analog (0x80 centered, digital mode reports 0x80)
9-15 - pressure / reserved (DualShock 2 only)

Active-low semantics: every bit is 0 when the button is pressed. retroDE's INPUT_P1 from input_common.h is active-high. The translation layer must invert per-bit.
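Consumer-side code (libpad callers, or the TB checks) has to undo that active-low encoding. Below is a minimal Python sketch of decoding bytes 3 and 4 against the table above. The button-name strings and `pressed_buttons` are hypothetical, and the only validity check is the 0x5A token in byte 2.

```python
BYTE3_BITS = {7: "LEFT", 6: "DOWN", 5: "RIGHT", 4: "UP",
              3: "START", 2: "R3", 1: "L3", 0: "SELECT"}
BYTE4_BITS = {7: "SQUARE", 6: "CROSS", 5: "CIRCLE", 4: "TRIANGLE",
              3: "R1", 2: "L1", 1: "R2", 0: "L2"}


def pressed_buttons(pkt: bytes) -> set:
    """Names of pressed buttons in a pad-state packet (bytes 3/4 are active-low)."""
    if len(pkt) < 5 or pkt[2] != 0x5A:
        raise ValueError("not a valid pad-state packet")
    pressed = set()
    for value, names in ((pkt[3], BYTE3_BITS), (pkt[4], BYTE4_BITS)):
        for bit, name in names.items():
            if not (value >> bit) & 1:   # 0 = pressed
                pressed.add(name)
    return pressed
```

A packet with byte3 = 0xD6 and byte4 = 0xEF decodes to RIGHT, START, SELECT and TRIANGLE; 0xFF/0xFF decodes to no buttons.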

What software reads first. The Sony BIOS doesn't poll controllers during its own boot — the first pad transactions come from OSDSYS (the in-BIOS browser) and game executables linking libpad. So:

  • For a BIOS-bring-up smoke test, no pad surface is required.
  • For an OSDSYS-driven boot path, OSDSYS expects the SIF RPC server RPCID 0x80000100 (padman) to answer with a 16-byte pad state on every padRead call.
  • For homebrew or game code, libpad's standard API is the observable surface; the implementation strategy (faithful SIO2 vs simplified RPC vs simplified MMIO) is opaque to the caller.

Proposed mapping (Ch222 → Sony pad state)

Following the peripherals.md:30 open question ("simplified abstraction vs SIO2-faithful transactions?") the recon answer is: start with a simplified abstraction. SIO2-faithful transactions require IOP code that runs the protocol — fine for late-Wave-2 work but not the smallest useful first step.

INPUT_P1 bit assignments (from input_common.h) map to Sony pad state per the following table. SNES-style face buttons fold onto DualShock face buttons by spatial layout (Y top, B bottom, X left, A right — same as the standard SNES → PSX mapping retroDE already uses on coco2 / a2600):

INPUT_P1 bit retroDE name PS2 button (Sony name) Pad-state byte.bit
0 JOY_RIGHT RIGHT (D-pad) 3.5
1 JOY_LEFT LEFT (D-pad) 3.7
2 JOY_DOWN DOWN (D-pad) 3.6
3 JOY_UP UP (D-pad) 3.4
4 JOY_START START 3.3
5 JOY_SELECT SELECT 3.0
6 JOY_Y △ (triangle, top) 4.4
7 JOY_B × (cross, bottom) 4.6
8 JOY_X □ (square, left) 4.7
9 JOY_A ○ (circle, right) 4.5
10 JOY_L L1 4.2
11 JOY_R R1 4.3
12 JOY_L2 L2 4.0
13 JOY_R2 R2 4.1
14 JOY_L3 L3 3.1
15 JOY_R3 R3 3.2
16 JOY_OSD — (consumed by retrodesd, not forwarded)

Inversion rule: each PS2 byte starts at 0xFF (all released); each INPUT_P1 bit that's 1 clears the corresponding pad-state bit to 0. Two assigns of 8-bit pad bytes do the whole thing combinationally:

// byte3, MSB→LSB: LEFT DOWN RIGHT UP START R3 L3 SELECT
assign pad_state[3] = ~{INPUT_P1[1],  INPUT_P1[2],  INPUT_P1[0],  INPUT_P1[3],
                        INPUT_P1[4],  INPUT_P1[15], INPUT_P1[14], INPUT_P1[5]};
// byte4, MSB→LSB: □ × ○ △ R1 L1 R2 L2
assign pad_state[4] = ~{INPUT_P1[8],  INPUT_P1[7],  INPUT_P1[9],  INPUT_P1[6],
                        INPUT_P1[11], INPUT_P1[10], INPUT_P1[13], INPUT_P1[12]};

(Order inside {} is MSB→LSB to match the Sony bit numbering.)
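A bit-exact Python reference model of the two assigns, of the kind a TB scoreboard might use, is sketched below. The mapping dictionaries are transcribed from the INPUT_P1 table in this section, and `sony_pad_bytes` is a hypothetical name. Bit 16 (JOY_OSD) and all higher bits are ignored, matching the "not forwarded" row.

```python
# INPUT_P1 bit -> Sony bit position, transcribed from the mapping table.
BYTE3_MAP = {0: 5, 1: 7, 2: 6, 3: 4, 4: 3, 5: 0, 14: 1, 15: 2}
BYTE4_MAP = {6: 4, 7: 6, 8: 7, 9: 5, 10: 2, 11: 3, 12: 0, 13: 1}


def sony_pad_bytes(input_p1: int) -> tuple:
    """Translate an active-high INPUT_P1 bitmap into active-low (byte3, byte4)."""
    byte3, byte4 = 0xFF, 0xFF               # every button released
    for src, dst in BYTE3_MAP.items():
        if (input_p1 >> src) & 1:           # pressed -> clear the Sony bit
            byte3 &= ~(1 << dst) & 0xFF
    for src, dst in BYTE4_MAP.items():
        if (input_p1 >> src) & 1:
            byte4 &= ~(1 << dst) & 0xFF
    return byte3, byte4
```

It reproduces the vectors used by the later integration TBs: 0x0000_0001 gives (0xDF, 0xFF), 0x0000_0021 gives (0xDE, 0xFF), and 0x0000_0071 gives (0xD6, 0xEF).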

Proposed minimum MMIO surface

For the smallest possible useful "PS2 code can read controller state" path:

Option A — IOP-readable PS2-local register (recommended).

Add one 32-bit read-only register per port on the IOP MMIO bus, each packing the two pad-state bytes plus presence/status bits:

IOP phys offset Name Layout (32-bit)
0x1F80_8500 PAD_P1_STATE [7:0]=byte3 (D-pad/SEL/START), [15:8]=byte4 (face/shoulder), [16]=connected=1, [17]=error=0, [31:18]=0
0x1F80_8504 PAD_P2_STATE Same layout, sourced from INPUT_P2

0x1F80_8500..0x1F80_85FF is a retroDE-local I/O range, not Sony-compatible. It deliberately sits outside the real SIO2 range (0x1F80_8200..0x1F80_82FF) so that landing real SIO2 emulation later doesn't collide. Bit fields are little-endian to match the IOP's native byte ordering.

IOP-side code (a small "fake padman" routine loaded at known address, or a future BIOS-replacement RPC server) reads PAD_P1_STATE, writes the 16-byte Sony pad state into the agreed EE-visible memory location, and signals via SIF.

Option B — SIF mailbox pad state.

Skip IOP code entirely. Add a mailbox in sif_mailbox_stub that the EE can read directly without any IOP cooperation. Faster to demo but breaks libpad's RPC contract — homebrew built against libpad won't work without a shim.

Option C — faithful SIO2 emulation.

Real 0x1F80_8200..0x1F80_82FF register surface, real FIFO, real DMA channel 11, real command/response protocol. padman.irx runs unchanged. Largest scope by far — defers to a later chapter once Option A is proven.

Recommendation: A → B → C as separate chapters. Most game/BIOS code talks to libpad, which talks to padman over SIF — Option A gives the smallest fabric surface that lets a stub padman work.

Proposed Ch234+ implementation chapters

Chapter Scope
Ch234 rtl/peripherals/sio2_input_stub.sv (Option A): single module, two read-only 32-bit registers; combinationally maps Ch222 INPUT_P1/P2 latches into PS2 pad-state bytes with the inversion rule above; IOP map decode added at 0x1F80_8500..0x1F80_85FF. Bridge gets a new output port carrying INPUT_P1/P2 into the IOP domain (single-bit register-stable signals, no CDC needed beyond the existing reset-sync because they update at retrodesd's 1 kHz rate). New focused TB: write INPUT_P1, read PAD_P1_STATE through the IOP map, verify the inversion + bit order.
Ch235 Either ramp Ch234 into Option B (SIF mailbox), or extend Ch234 to expose pad analog stick values (currently libpad reports 0x80 centered in digital mode — match that). Decision deferred per the BIOS-bringup observations.
Ch236+ Real SIO2 emulation (Option C) once a known BIOS or homebrew demands it.

Out of scope for this contract

  • Analog stick fidelity beyond "report 0x80 centered" (the INPUT_P1 bitmap is digital-only; full DualShock 2 analog requires a separate retrodesd-side path).
  • Pressure-sensitive buttons (DualShock 2 only).
  • Multitap support (most PS2 software doesn't require it for bringup).
  • Real SIO2 timing fidelity (the simplified register is combinational; real SIO2 has a multi-cycle command/response protocol).
  • Vibration / actuator feedback (output direction; needs EE → HPS path, not relevant for input recon).

Boundary call

The HPS-to-bridge half of the input path landed in Ch222 and is silicon-validated; the bridge-to-PS2-fabric half is open. Ch234 adds a small IOP-readable sio2_input_stub at the retroDE-local I/O range 0x1F80_8500..0x1F80_85FF that combinationally translates INPUT_P1/INPUT_P2 into Sony pad bytes; IOP code (eventually a stub padman) reads the registers and publishes the 16-byte pad state via SIF for EE-side libpad. Faithful SIO2 emulation is deferred until a real BIOS or homebrew needs it.