# SIO2 / pad input contract
Status: `Draft / partial impl` (Ch233 recon + Ch234 Option-A implementation
landed). RTL: [`rtl/iop/sio2_input_stub.sv`](../../rtl/iop/sio2_input_stub.sv).
Successor chapters (Ch235+) extend to analog / SIF mailbox / faithful SIO2.
---
## Ch234 implementation (landed)
`sio2_input_stub.sv` is the Option-A surface from the recon below. It
sits inside `iop_memory_map_stub` and translates the bridge-domain
`INPUT_P1` / `INPUT_P2` bitmaps into a Sony-format 16-bit digital pad
word readable from the IOP-side MMIO bus.
**IOP MMIO surface (retroDE-local, not Sony-compatible):**
| Address | Reg | Layout |
|-------------|----------------|---------------------------------------------------------------------|
| `0x1F80_8500` | `PAD_P1_STATE` | `[7:0]=byte3 (D-pad/start/select/sticks), [15:8]=byte4 (face/shoulder), [31:16]=0` |
| `0x1F80_8504` | `PAD_P2_STATE` | Same shape, sourced from `INPUT_P2` |
| `0x1F80_8508` | `PAD_STATUS` | `[0]=present/valid=1, [31:1]=0` |
| other | reserved | Read 0; write accepted-and-ignored |
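
The register table above can be restated as a small software reference model. This is a sketch only, not the RTL: `pad_mmio_read` and its arguments are hypothetical names, the two pad words are assumed to be already translated to Sony format, and reads are assumed to be word-aligned.

```python
PAD_BASE = 0x1F80_8500     # window base, from the table above
PAD_WINDOW_SIZE = 0x100    # 0x1F80_8500..0x1F80_85FF routes to the stub


def pad_mmio_read(addr, p1_word, p2_word):
    """Model one 32-bit IOP read inside the pad window.

    p1_word / p2_word are the 16-bit Sony-format pad words
    (byte3 in [7:0], byte4 in [15:8]).
    """
    if not (PAD_BASE <= addr < PAD_BASE + PAD_WINDOW_SIZE):
        raise ValueError("address is outside the pad window")
    offset = addr - PAD_BASE
    if offset == 0x0:
        return p1_word & 0xFFFF   # PAD_P1_STATE, [31:16] read as 0
    if offset == 0x4:
        return p2_word & 0xFFFF   # PAD_P2_STATE, same shape
    if offset == 0x8:
        return 0x1                # PAD_STATUS: present/valid
    return 0x0                    # reserved offsets read 0
```

Writes are not modelled because the stub accepts and ignores them.
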
**CDC: 2-FF synchronizer per bit** on each of the 32-bit `INPUT_P1`
and `INPUT_P2` inputs. Bridge writes at retrodesd's ≤ 1 kHz rate are
at least tens of thousands of design-clock cycles apart (1 ms is
about 33,000 cycles at the 33 MHz iclk used in the integration TBs),
so partial-bit tearing during the sync settling window is
theoretically possible but vanishingly rare in practice.
A future chapter can promote to "snapshot CDC"
(latch + 2-sample coherency) if tearing ever becomes observable.
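
One possible reading of "latch + 2-sample coherency" is sketched below as a behavioural model: a synchronized sample is only accepted once two consecutive samples agree, so a value torn across one clock never reaches the consumer. This is an assumption about a future design, not something the current RTL does, and `SnapshotFilter` is a hypothetical name.

```python
class SnapshotFilter:
    """Accept a sample only when it matches the previous sample."""

    def __init__(self, reset_value=0):
        self.prev = reset_value     # last raw sample seen
        self.stable = reset_value   # last value that passed the check

    def clock(self, sample):
        """Advance one destination clock; return the coherent value."""
        if sample == self.prev:
            self.stable = sample
        self.prev = sample
        return self.stable


# A write of 0x21 that tears for one clock (only bit 0 visible first).
snapshot = SnapshotFilter()
outputs = [snapshot.clock(s) for s in (0x00, 0x01, 0x21, 0x21)]
```

The cost is one extra clock of latency per change, which is negligible at pad-polling rates.
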
**Active-high → active-low**: each `INPUT_P1` bit equal to 1 (pressed)
maps to the corresponding Sony bit equal to 0 (pressed). Two
combinational `~{...}` assigns, one per pad byte, do the per-bit
permutation + inversion with no added clock latency.
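
As a cross-check on the permutation, here is a minimal Python reference model of the translation. The bit pairs are copied from the Ch233 mapping table later in this doc; `BUTTON_MAP` and `sony_pad_word` are illustrative names, not identifiers from the RTL.

```python
# (INPUT_P1 bit, Sony byte, Sony bit), per the Ch233 mapping table.
BUTTON_MAP = [
    (0, 3, 5),   # JOY_RIGHT  -> RIGHT
    (1, 3, 7),   # JOY_LEFT   -> LEFT
    (2, 3, 6),   # JOY_DOWN   -> DOWN
    (3, 3, 4),   # JOY_UP     -> UP
    (4, 3, 3),   # JOY_START  -> START
    (5, 3, 0),   # JOY_SELECT -> SELECT
    (6, 4, 4),   # JOY_Y      -> triangle
    (7, 4, 6),   # JOY_B      -> cross
    (8, 4, 7),   # JOY_X      -> square
    (9, 4, 5),   # JOY_A      -> circle
    (10, 4, 2),  # JOY_L      -> L1
    (11, 4, 3),  # JOY_R      -> R1
    (12, 4, 0),  # JOY_L2     -> L2
    (13, 4, 1),  # JOY_R2     -> R2
    (14, 3, 1),  # JOY_L3     -> L3
    (15, 3, 2),  # JOY_R3     -> R3
]


def sony_pad_word(input_bits):
    """Return the 16-bit pad word: byte3 in [7:0], byte4 in [15:8]."""
    byte3 = 0xFF  # all released (active-low)
    byte4 = 0xFF
    for src_bit, sony_byte, sony_bit in BUTTON_MAP:
        if (input_bits >> src_bit) & 1:
            if sony_byte == 3:
                byte3 &= ~(1 << sony_bit) & 0xFF
            else:
                byte4 &= ~(1 << sony_bit) & 0xFF
    return (byte4 << 8) | byte3
```

Bit 16 (JOY_OSD) has no entry in the map, so it never reaches the Sony word, matching the TB's "deliberately not forwarded" check.
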
**Coverage:**
[`sim/tb/iop/tb_sio2_input_stub.sv`](../../sim/tb/iop/tb_sio2_input_stub.sv)
exercises the new module directly (without going through the IOP
map): reset state (all reads 0 except PAD_STATUS); no-buttons →
Sony word `0xFFFF`; single-bit pressed across all 16 retroDE bits;
JOY_OSD (bit 16) deliberately *not* forwarded; combos (START+SELECT,
face+D-pad); P1/P2 independence with distinct patterns; writes
accepted-and-ignored; out-of-range word offsets read 0; clearing
returns to `0xFFFF`. 152 PASS sim regression intact (151 baseline
+ new TB).
The `iop_memory_map_stub` now also routes the new region in its
read-response mux and trace; CPU reads to addresses in
`0x1F80_8500..0x1F80_85FF` route to the stub, others fall through
unchanged. Sixteen existing IOP-map-consuming TBs gained a
`.input_p1(32'd0), .input_p2(32'd0)` tie-off since the map signature
gained two new input ports.
**Bridge-side output ports landed in Ch235.** `ps2_hps_bridge` now
exposes `input_p1_o` / `input_p2_o` as bridge-clock-domain
broadcasts of the Ch222 latches; `iop_memory_map_stub.input_p1` /
`input_p2` consume them directly. The board top wires the bridge's
new outputs to a pair of local `bridge_input_p1` / `bridge_input_p2`
nets (unconnected for now — the synth top doesn't yet instantiate
the IOP core, but the wires are placed for future hookup).
The full HPS → bridge → IOP path is sim-validated end-to-end by
[`sim/tb/integration/tb_bridge_iop_pad_input.sv`](../../sim/tb/integration/tb_bridge_iop_pad_input.sv):
two distinct clocks (100 MHz bclk for the bridge, 33 MHz iclk for
the IOP map) so the bridge-clk → IOP-clk CDC inside the
sio2_input_stub is genuinely exercised. The TB drives AXI writes
into INPUT_P1/P2 at the standard 0x040/0x044 offsets and reads
PAD_P1_STATE/PAD_P2_STATE at 0x1F80_8500/0x1F80_8504 — exactly the
operator-visible end-to-end flow.
---
## Ch237 — EE-visible pad-state buffer (recon)
Status: `Recon` (no RTL). Defines how the IOP-local Sony pad word
(Ch234) becomes an EE-readable 16-byte buffer that libpad-shaped
code can consume.
### Why this recon exists
Ch234 gave PS2-side IOP code access to a Sony-format pad word.
Ch235 wired the HPS→IOP half on real (sim) silicon. But the EE
half — how EE-side software (eventually libpad, or hand-rolled
homebrew) sees pad state — is still undefined. Ch237 picks a
shape before Ch238 starts soldering RTL.
### Survey: SIF infrastructure that already exists
The SIF seam is **feature-complete for staged bring-up** per
[`rtl/sif/README.md`](../../rtl/sif/README.md). Relevant
already-landed pieces for the pad-state path:
| Module | What it does |
|-------------------------------------|---------------------------------------------------------------------------------------------------|
| `sif_mailbox_stub` | 4-register mailbox: `MSCOM` / `SMCOM` / `MSFLG` / `SMFLG`. Both EE-side and IOP-side ports. |
| `sif_dma_iop_ram_bridge_stub` | EE→IOP DMA: 128-bit qword → 4×32-bit IOP RAM writes (with `DEST_BASE_ADDR`). |
| `sif_dma_ee_ram_bridge_stub` | **IOP→EE DMA: 4×32-bit IOP beats → 1×128-bit EE-RAM write at `DEST_BASE_ADDR`.** Has `last_seen_o`. |
| `sif_dma_ack_peer_stub` | Mailbox doorbell + payload-complete combiner (EE side waits). |
| `sif_dma_ee_ack_peer_stub` | IOP-driven equivalent (mirror polarity). |
| `boot_install_agent_stub` | EE-driven boot-image landing through SIF (different traffic shape but same primitives). |
**The IOP→EE data path already exists in RTL form.** A 16-byte
pad-state buffer arriving at a fixed EE-RAM address is one
sif_dma_ee_ram_bridge transaction — exactly four 32-bit beats.
The protocol-combiner peers handle the "payload landed,
notify the other side via mailbox flag" sequence both ways.
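
The "four beats, one qword" relationship can be sketched in a few lines. The lane order is an assumption (beat 0 in bits [31:0], consistent with the little-endian packing used for the pad packet), and `beats_to_qword` / `qword_to_bytes` are hypothetical helper names.

```python
def beats_to_qword(beats):
    """Combine four 32-bit beats into one 128-bit integer.

    Assumes beat 0 occupies the least significant 32 bits.
    """
    if len(beats) != 4:
        raise ValueError("a 16-byte transfer is exactly four beats")
    qword = 0
    for index, beat in enumerate(beats):
        qword |= (beat & 0xFFFF_FFFF) << (32 * index)
    return qword


def qword_to_bytes(qword):
    """View the 128-bit value as 16 bytes in EE-RAM (little-endian) order."""
    return qword.to_bytes(16, "little")
```

Under that assumption, byte 0 of the buffer is the low byte of beat 0 and byte 15 is the high byte of beat 3.
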
### What does NOT exist today
- **EE-side SIF register decode in `ee_memory_map_stub`.** Real
PS2 has SIF MSCOM/SMCOM/MSFLG/SMFLG visible to the EE at
`0x1000_F200..0x1000_F2FF`; the EE map doesn't yet decode
that range. `sif_mailbox_stub` has an EE-side port, but no
EE map routes CPU reads/writes there yet. (The IOP-side map
decodes its own SIF window at `0x1D00_0000+`.)
- **No EE-side execution primitive in the synth top.** Same
silicon-truth caveat as the IOP side from Ch236 — `tb_*`
TBs exercise EE↔IOP coordination in sim with real
EE/IOP CPU stubs, but the synth top doesn't instantiate
either. The path can land in sim and stay sim-only until
a future top-integration chapter wires both CPUs in.
- **No libpad / padman RPC layer.** Real PS2: padman.irx on
IOP receives RPC calls from EE-side libpad, services them
with SIF DMAs back to EE buffers. The RPC layer is software
on both sides, not RTL. Ch237 scope is the RTL-level
buffer-delivery path — the RPC protocol on top can come
later.
### Three options for the EE-visible surface
#### Option A — IOP→EE DMA into a fixed EE-RAM buffer (recommended)
**Shape**: IOP code reads `PAD_P1_STATE` / `PAD_P2_STATE`
(Ch234), constructs a 16-byte Sony pad-state struct in IOP RAM,
DMAs it via `sif_dma_ee_ram_bridge_stub` to a fixed address in
EE RAM (e.g., `EE_PAD_BUFFER_BASE = 0x0008_0000`). EE-side code
reads from that address.
**Pros**:
- Uses the existing `sif_dma_ee_ram_bridge_stub` as-is.
- Matches the *shape* libpad expects — pad state lands in
EE-allocated memory, EE reads bytes directly.
- The fixed address is a stub convention; a future libpad
layer can carry the real per-port allocation address.
- 16 bytes = exactly four 32-bit SIF DMA beats = exactly one
qword write at the EE-RAM bridge. No partial-quad edge cases.
**Cons**:
- Requires an IOP-side execution context that reads
PAD_P1_STATE and drives the DMA — but Ch235's
`tb_bridge_iop_pad_input` is the template; we already have
small synthetic-IOP-code patterns in `tb_iop_*` TBs.
- The DMA path has ack/handshake latency (mailbox doorbell +
4-beat DMA + completion flag). For Ch238's first stub
this is fine; for real-time pad polling at 60 Hz it's also
more than fine (each transaction is microseconds at typical
clock rates).
#### Option B — Mailbox register packing (smallest possible)
**Shape**: IOP packs the 16-byte pad state into the 4×32-bit
mailbox registers (`MSCOM` / `SMCOM` / `MSFLG` / `SMFLG`).
EE reads them via the (not-yet-decoded) EE-side SIF window.
**Pros**:
- No DMA, no payload completion. Just register writes.
- Even smaller scope than Option A — could be one TB chapter.
- Mailbox storage already exists.
**Cons**:
- **Overloads mailbox semantics**: real PS2 uses MSFLG/SMFLG
as flag/doorbell registers, not data carriers. A naive stub
here breaks any future mailbox-based RPC protocol.
- **Not libpad-compatible at all.** Real libpad never reads
pad state from SIF mailbox registers — it reads from a
DMA-populated EE-RAM buffer. Option B would require all
EE-side code to use a retroDE-local convention.
- **Still requires EE-side SIF window decode**, so the
"small" advantage shrinks once the EE map work is needed
anyway.
#### Option C — retroDE-local EE MMIO (mirror IOP-side stub)
**Shape**: Add a `pad_input_ee_stub` in the EE map at a
retroDE-local address (e.g., `0x1B00_8500`, deliberately
outside any real PS2 region). Combinationally surface the
same Sony pad words the IOP-side stub exposes.
**Pros**:
- Zero protocol overhead — combinational mirror, single
register read.
- No SIF involvement, no DMA, no handshake.
- Symmetric with Ch234's IOP-side pattern.
**Cons**:
- **Doubles the platform-local surface** — two non-Sony
register windows (IOP + EE) doing the same thing.
- **Bypasses SIF entirely**, so it doesn't exercise the
EE↔IOP path that libpad / real games actually use.
- Doesn't help with eventual SIF/RPC compatibility — when
Option A lands, Option C becomes dead code.
### Recommendation
**Option A** for the substantive next chapter. Reasoning:
1. The existing `sif_dma_ee_ram_bridge_stub` already implements
"IOP-side 4 beats → 1 qword EE-RAM write at a known
address". Reusing it costs zero new RTL on the data path.
2. The shape matches libpad's expected dataflow, so future
RPC work composes cleanly without semantic refactoring.
3. The fixed-address convention is a single parameter; a
real libpad layer can override it per port without changing
the RTL surface.
Option B is tempting for "fastest visible EE-side proof" but
breaks libpad-shape; Option C is tempting for symmetry but
creates dead code once Option A lands.
### Where the path lights up in existing stubs
For a sim-only Ch238 (Option A), the data flow is:
```
sio2_input_stub.PAD_P1_STATE // Ch234 — IOP reads here
▼ (IOP-side test code: read, copy to IOP RAM)
iop_ram (16 bytes at iop_pad_buffer_addr)
▼ IOP DMAC → sif_dma_iop_ram_bridge_stub egress // EXISTS
sif_dma_stub (EE-side ingress buffer) // EXISTS
▼ sif_dma_ee_ram_bridge_stub → ee_memory_map.bridge_wr // EXISTS
ee_ram (16 bytes at EE_PAD_BUFFER_BASE) // EXISTS
▼ EE-side test code: cpu_rd from EE_PAD_BUFFER_BASE
EE-readable pad state ← target
```
The only **new** pieces needed are:
- A small IOP-side test harness that drives the read → DMA
sequence (TB-level glue or a tiny synthetic-IOP-code
fragment loaded into IOP RAM).
- A new integration TB that wires all the existing stubs
end-to-end and asserts an EE-side read of
`EE_PAD_BUFFER_BASE` matches the Sony pad word from
PAD_P1_STATE within some bounded latency.
No new RTL module is strictly required for Ch238 — the path
composes from existing primitives. If the integration TB
turns up a missing piece (e.g., a more convenient pad-state
packing helper), that's a candidate for new RTL; otherwise
Ch238 lands as one new TB plus possibly one tiny helper.
### Proposed chapter sequence
| Ch | Scope |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Ch238** | Integration TB. Wires the existing IOP map (with Ch234 sio2_input_stub) + IOP DMAC + SIF mailbox + SIF DMA primitives + EE map → IOP-side test sequencer reads PAD_P1_STATE, packs a 16-byte Sony struct into IOP RAM, kicks an IOP→EE SIF DMA, signals via mailbox flag, then EE-side TB code reads the buffer at `EE_PAD_BUFFER_BASE` and verifies the bytes. End-to-end latency expected: ≤ a few microseconds at the existing clock rates. |
| Ch239 | EE-side read surface polish: decode the SIF MSCOM/SMCOM window in `ee_memory_map_stub` (it currently doesn't decode SIF — fixing that lets the EE CPU stub poll the mailbox `pad-ready` flag without TB intervention). Optionally a tiny EE-side test program loaded into EE RAM that does `lw $v0, EE_PAD_BUFFER_BASE` and traces the result. |
| Ch240+ | Real padman/libpad RPC compatibility: define the RPC frame format, build the EE-side request/IOP-side response pair, support multi-port + connected/disconnected state changes. Largest single chapter in the input arc — defer until Ch238+Ch239 are green and there's a real game/BIOS workflow demanding it. |
### Out of scope for Ch237 / Ch238 / Ch239
- Analog stick fidelity (still digital-mode-only at all three
Ch222 / Ch234 / Ch238 levels).
- DualShock 2 pressure-sensitive buttons.
- Multitap support.
- Vibration / actuator feedback (output direction).
- Faithful SIO2 protocol emulation at `0x1F80_8200..0x1F80_82FF`
(deferred per Ch233 / Ch234 reasoning).
- Top-level synth integration of the IOP and EE cores. Until
that lands, Ch238+ are sim-only chapters; the silicon-side
story stays the Ch236 disclaimer ("non-zero INPUT_P1 values
mean the bridge latch landed, NOT that PS2 code saw it").
### Boundary call
> **The existing SIF DMA + mailbox infrastructure already
> implements the IOP→EE data delivery path; Ch238 only needs
> to compose those primitives with a small IOP-side test
> sequencer and define `EE_PAD_BUFFER_BASE`. Real libpad/
> padman compatibility is a software layer on top of that
> path, not a separate RTL surface; Ch240+ work, post-MVP
> for the input arc.**
---
## Ch238 implementation (landed)
Option A is proven end-to-end in sim with **no new production
RTL** — the path composes entirely from existing primitives.
**New integration TB**
[`sim/tb/integration/tb_pad_state_via_sif_to_ee.sv`](../../sim/tb/integration/tb_pad_state_via_sif_to_ee.sv):
| Stage | Module |
|------------------------|-------------------------------------|
| HPS AXI write | TB drives bridge's AXI4 slave |
| Bridge latch | `ps2_hps_bridge` (Ch222 INPUT_P1) |
| Bridge→IOP CDC | `sio2_input_stub` (Ch234 inside IOP map) |
| IOP read of pad word | TB-side IOP read at `0x1F80_8500` |
| 16-byte pad packet | TB packs Sony struct (status/type/token/byte3/byte4 + analog centers 0x80) |
| 4-beat SIF DMA | TB drives `sif_dma_ee_ram_bridge_stub.in_*` |
| EE-RAM landing         | `ee_memory_map_stub.bridge_wr_*` → `ee_ram_stub` |
| EE-side verification | TB issues DMAC qword read at landing addr |
**Two clocks** (100 MHz bridge, 33 MHz IOP/EE/SIF) so the
bridge-clk → IOP-clk CDC inside `sio2_input_stub` is genuinely
exercised end-to-end.
**Pad packet layout** (16 bytes, packed into 4 little-endian
32-bit beats):
```
byte 0 : 0x00 success status
byte 1 : 0x41 response type (digital mode)
byte 2 : 0x5A success token
byte 3 : Sony byte3 D-pad/start/select/sticks (active-low)
byte 4 : Sony byte4 face/shoulder (active-low)
bytes 5–8 : 0x80 RX/RY/LX/LY analog centers (digital mode)
bytes 9–15: 0x00 reserved (DualShock 2 pressure)
```
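
A minimal sketch of how a producer could build this packet and split it into beats. The field values come from the layout above; `build_pad_packet` and `packet_to_beats` are illustrative names rather than TB tasks.

```python
import struct


def build_pad_packet(byte3, byte4):
    """Build the 16-byte digital-mode pad packet described above."""
    packet = bytes([0x00, 0x41, 0x5A, byte3 & 0xFF, byte4 & 0xFF])
    packet += bytes([0x80] * 4)   # bytes 5 to 8: analog centers
    packet += bytes(7)            # bytes 9 to 15: reserved, zero
    return packet


def packet_to_beats(packet):
    """Split the packet into four little-endian 32-bit beats."""
    return list(struct.unpack("<4I", packet))


# Scenario with JOY_RIGHT only: byte3 = 0xDF, byte4 = 0xFF.
beats = packet_to_beats(build_pad_packet(0xDF, 0xFF))
```

For this scenario the first beat is `0xDF5A4100`: status, type and token fill the low three bytes and Sony byte3 sits in the top byte.
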
Verified scenarios:
| § | INPUT_P1 (AXI write to 0x040) | Expected Sony bytes 3/4 |
|----|-------------------------------------------|--------------------------|
| §1 | `0x00000000` (no buttons) | byte3=`0xFF`, byte4=`0xFF` |
| §2 | `0x00000001` (JOY_RIGHT only) | byte3=`0xDF` (bit 5 cleared), byte4=`0xFF` |
| §3 | `0x00000031 \| (1<<6)` = `0x00000071` (RIGHT+START+SEL+△) | byte3=`0xD6`, byte4=`0xEF` |
| §4 | `0x00000000` (re-clear) | byte3=`0xFF`, byte4=`0xFF` |
The TB also confirms `last_seen_o` rises after each 4-beat
burst (proves the in_last semantics propagate cleanly through
the egress bridge's state machine).
**Streaming-bridge note (timing artifact, not a bug):** the
existing `sif_dma_ee_ram_bridge_stub` advances `wr_offset` by
16 after every emit (streaming semantics — designed for
multi-qword DMAs). Successive scenarios in this TB therefore
land at successive 16-byte slots; the TB tracks the per-scenario
landing address (`EE_PAD_BUFFER_BASE + scenario_idx * 16`) and
verifies the byte layout at each. A real libpad/padman
implementation will need either (a) a bridge-reset between
transfers so every `padRead()` overwrites the same buffer, or
(b) an SPS2-side counter so EE knows which slot holds the
latest sample. That decision belongs to Ch239+, not Ch238.
**P2 is deliberately left out of the first slice** per Codex
Ch238 framing. The next chapter can either reuse the same
16-byte slot (overwriting P1 each emit) or move to a multi-port
layout (P1 at +0, P2 at +16, etc.).
**Sim regression** bumps from 153 → 154 PASS (new TB only,
zero RTL change).
---
## Ch239 — single-slot buffer contract (landed)
Ch238 exposed the streaming offset of
`sif_dma_ee_ram_bridge_stub` (each emit advances `wr_offset` by
16). For a libpad-style consumer that wants `padRead(port, &buf)`
to return a stable snapshot at a single buffer address, that's
the wrong default. Ch239 adds a narrow rewind input that lets a
producer reset the streaming offset between transfers — no other
SIF semantics change.
### RTL change
**One new input** on
[`rtl/sif/sif_dma_ee_ram_bridge_stub.sv`](../../rtl/sif/sif_dma_ee_ram_bridge_stub.sv):
```sv
input logic rewind_i = 1'b0 // default — keeps existing consumers untouched
```
Behavior:
- When `rewind_i` pulses HIGH (typically one iclk), `wr_offset`
returns to `32'd0` on the next clock edge. The very next emit
lands at `DEST_BASE_ADDR + 0`.
- The accumulator (`acc_data`, `acc_be`, `pos`) is already zeroed
at every emit's tail, so rewind doesn't need to touch them.
- Rewind is intended to fire **between transfers** — when the
bridge is idle (`state == S_ACCUM && pos == 0`). Misuse is
caught by a sim-only `$error` assertion; the RTL still applies
the rewind so the bug is loud, not silent.
The port has a `1'b0` default so existing instantiations (5 TBs,
zero RTL parents) keep their streaming behaviour without code
changes. A compile check against `tb_sif_ee_landing_via_dmac`
passes with no modification to that TB.
### Single-slot buffer contract (new convention)
A producer using rewind gets these guarantees:
| Property | Value / meaning |
|-----------------------------------------|----------------------------------------------------------------------------|
| Buffer base | `DEST_BASE_ADDR` (parameter; pad-state path uses `0x0008_0000`) |
| Buffer length | One 16-byte qword |
| Rewind cadence | One `rewind_i` pulse BEFORE each 4-beat transfer (between scenarios) |
| Stale-byte safety | Each transfer's `bridge_wr_be = 16'hFFFF` (all 16 bytes enabled), so a fresh full-length transfer overwrites every byte — no leftover content from a prior transfer can survive |
| Mid-transfer rewind | **Illegal.** Sim `$error`. Producer must wait for `last_seen_o` (or just a few clocks after the in_last beat) before pulsing rewind again |
For libpad-style single-slot semantics (`padRead(port, &buf)`
returning the same `&buf` every call), a producer pulses rewind
between each pad packet. The consumer reads from the fixed
address; the producer overwrites the slot in place.
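
The difference between streaming and single-slot behaviour is easy to see in a small behavioural model. This is a sketch under stated assumptions, not the RTL: `EeRamBridgeModel` is a hypothetical name, EE RAM is modelled as a dict keyed by landing address, and a mid-transfer rewind raises an exception here, whereas the RTL applies it and reports a sim-only `$error`.

```python
class EeRamBridgeModel:
    """Behavioural model of wr_offset and rewind for 16-byte transfers."""

    def __init__(self, dest_base_addr):
        self.dest_base_addr = dest_base_addr
        self.wr_offset = 0
        self.pending = []   # beats accumulated for the current qword
        self.ee_ram = {}    # landing address -> list of four beats

    def rewind(self):
        """Return wr_offset to 0; only legal between transfers."""
        if self.pending:
            raise RuntimeError("rewind during a transfer is illegal")
        self.wr_offset = 0

    def push_beat(self, beat):
        """Accept one beat; return the landing address on the fourth."""
        self.pending.append(beat & 0xFFFF_FFFF)
        if len(self.pending) < 4:
            return None
        address = self.dest_base_addr + self.wr_offset
        self.ee_ram[address] = self.pending   # full overwrite, all bytes
        self.pending = []
        self.wr_offset += 16                  # streaming advance
        return address
```

Without a rewind, consecutive packets land at `base`, `base + 16`, and so on; pulsing `rewind()` before each packet keeps every packet at `base`.
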
### Coverage
`tb_pad_state_via_sif_to_ee` updated to exercise the contract:
- Every scenario pulses `rewind_i` BEFORE driving its 4 beats.
- All four scenarios read from the **same** `EE_PAD_BUFFER_BASE`
address (no per-scenario indexing — different from the Ch238
streaming-offset workaround).
- Per-scenario `check_eq128` against the expected qword
implicitly proves no stale bytes from prior scenarios survived:
if any byte leaked through, the 128-bit equality would fire.
- §3's combo pattern (`0xD6` / `0xEF`) differs from §1/§2/§4 in
multiple bit positions across both pad bytes — a partial-write
bug would surface here even if simpler patterns happened to
alias.
Existing `tb_sif_ee_landing_via_dmac` (which tests the bridge's
*streaming* behavior) passes unchanged with the rewind port at
its default `1'b0`.
### What `last_seen_o` means with rewind
`last_seen_o` is a level-held latch that rises on the in_last
beat's accept. The Ch239 rewind does NOT clear this latch — it
only touches `wr_offset`. A consumer can still gate on
`last_seen_o` to detect "any payload has landed since reset."
A future chapter that wants a per-transfer "fresh data" signal
(for libpad's `padRead` to know there's a new sample) will
likely add an `emit_done_pulse_o` strobe; that's distinct from
the rewind path and belongs with Ch240+ work.
### Boundary call
> **Ch239 makes the single-slot buffer contract explicit and
> tested. A libpad-style consumer can now read a stable
> 16-byte pad packet at `EE_PAD_BUFFER_BASE` regardless of how
> many pad packets the producer has emitted. The next chapter
> (Ch240) can either decode the EE-side SIF register window
> in `ee_memory_map_stub` so EE CPU code can poll a "new
> sample" flag, or move on to a tiny EE-side test program
> that just reads from the fixed address.**
---
## Ch240 — EE-side consumer reads + branches (landed)
Ch239 stabilised the producer; Ch240 closes the consumer half
with an actual EE-core program reading the buffer and
branching on its contents. Per Codex framing, **no EE-side
SIF register decode yet** — the EE program polls the fixed
RAM-resident buffer directly.
### EE test program
```mips
; Initialization
slot 0 LUI $1, 0x8008 ; $1 = EE_PAD_BUFFER_KSEG0 (0x80080000)
slot 1 LUI $5, 0x8000 ; $5 = EE_MARKER_KSEG0 base
slot 2 ORI $5, $5, 0x1000 ; $5 = 0x80001000
; Polling loop
LOOP: LBU $2, 3($1) ; $2 = pad byte3 (D-pad/start/select/sticks)
ORI $3, $0, 0xFF
BEQ $2, $3, MARK_A ; byte3 = 0xFF → no buttons
NOP
ORI $3, $0, 0xDF
BEQ $2, $3, MARK_B ; byte3 = 0xDF → JOY_RIGHT only
NOP
; fall-through → COMBO
COMBO: ORI $6, $0, 0xCC
SW $6, 0($5) ; marker C
J LOOP
NOP
MARK_A: ORI $6, $0, 0xAA
SW $6, 0($5) ; marker A
J LOOP
NOP
MARK_B: ORI $6, $0, 0xBB
SW $6, 0($5) ; marker B
J LOOP
NOP
```
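
The branch structure of the loop can be restated as a Python reference function, which is handy for computing expected markers in a scoreboard. It models one loop iteration only; `marker_for_packet` is a hypothetical name.

```python
MARKER_A = 0xAA   # byte3 == 0xFF: no buttons
MARKER_B = 0xBB   # byte3 == 0xDF: JOY_RIGHT only
MARKER_C = 0xCC   # any other byte3 value (COMBO fall-through)


def marker_for_packet(packet):
    """Return the marker one loop iteration would store for this packet."""
    byte3 = packet[3]   # same byte the LBU at offset 3 loads
    if byte3 == 0xFF:
        return MARKER_A
    if byte3 == 0xDF:
        return MARKER_B
    return MARKER_C
```

Only byte 3 is inspected, so face and shoulder buttons in byte 4 do not affect the marker.
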
22 instructions including delay slots; each loop iteration is
roughly 10 instructions. The program runs continuously — every
scenario the TB drives, the loop sees a new buffer value and
writes a fresh marker within ~500 design-clock cycles (well
inside the per-scenario wait).
### Kseg0 vs useg routing (important detail)
`ee_memory_map_stub` routes EE-CPU writes to **useg** addresses
(`addr[31] == 0`) into an internal `useg_shadow_mem` array,
NOT the external `ee_ram_stub`. The TB's DMAC-side reader goes
through `ee_ram_stub` — different backing store. To make EE
writes round-trip through the same RAM the TB samples, the EE
program targets **kseg0** addresses (0x80000000+):
- `EE_PAD_BUFFER_KSEG0 = 0x8008_0000` (EE reads via LBU at this
address; phys = `0x0008_0000` after kseg0 strip; routes to
`ee_ram_stub`)
- `EE_MARKER_KSEG0 = 0x8000_1000` (EE writes via SW at this
address; same kseg0-strip routing)
The TB's DMAC-side reads use the matching **physical**
addresses (`0x0008_0000` and `0x0000_1000`) — same backing
RAM, different access port.
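
The routing rule can be summarised as a small model. It assumes the standard MIPS kseg0 translation (clear the top three address bits) and treats every address with bit 31 set as kseg0-style; how the stub handles kseg1 and above is not covered here, and `route_ee_cpu_access` is a hypothetical name.

```python
KSEG0_PHYS_MASK = 0x1FFF_FFFF   # standard MIPS kseg0 -> physical strip


def route_ee_cpu_access(vaddr):
    """Return (backing store, address) for an EE-CPU access."""
    if (vaddr >> 31) & 1 == 0:
        # useg: lands in the map's internal shadow array.
        return ("useg_shadow_mem", vaddr)
    # kseg0: strip to a physical address, routed to external RAM.
    return ("ee_ram_stub", vaddr & KSEG0_PHYS_MASK)
```

With this rule, `0x8008_0000` and the TB's physical `0x0008_0000` name the same `ee_ram_stub` location, while a CPU access to useg `0x0008_0000` does not.
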
### Verified scenarios
| § | AXI write to INPUT_P1 | Pad byte3 the EE sees | Marker written |
|----|--------------------------|------------------------|----------------|
| §1 | `0x0000_0000` (no buttons) | `0xFF` | `0xAA` |
| §2 | `0x0000_0001` (RIGHT only) | `0xDF` (bit 5 cleared) | `0xBB` |
| §3 | `0x0000_0021` (RIGHT + SELECT) | `0xDE` (bits 0 and 5 cleared) | `0xCC` |
| §4 | `0x0000_0000` (re-clear) | `0xFF` | `0xAA` |
Each scenario: AXI write → 20-iclk CDC settle → IOP-side read
of `PAD_P1_STATE` to confirm bridge latch arrived → pulse
`rewind_i` → drive 4 SIF beats → wait 500 iclk for the EE
program to consume the buffer and write the marker → TB DMAC
read of marker byte → assert.
### Sim regression
154 → 155 PASS (one new TB only; no production-RTL changes).
### What Ch240 explicitly does NOT do
- **No EE-side SIF register decode.** The `ee_memory_map_stub`
still doesn't decode the SIF mailbox/flag window at
`0x1000_F200..0x1000_F2FF`. The EE program polls the RAM
buffer directly instead of waiting on a doorbell.
- **No libpad RPC.** The marker convention is TB-internal;
real libpad would marshal pad state through padman.irx via
SIF RPC and into a libpad-allocated buffer with a known
per-port address.
- **No buffer-fresh signal.** The EE loop doesn't know if it's
reading the latest snapshot or the same one twice — it just
reads every iteration. Adding an "emit counter" the consumer
can compare against is a Ch241+ option.
### Audit responses (per Codex)
**Loop freshness — does each scenario's marker come from the
NEW packet, not stale state?** Yes. Three layers of evidence:
- Each scenario has a **distinct expected marker** (`0xAA` /
`0xBB` / `0xCC` / `0xAA`). If the EE loop missed a buffer
update and read the prior packet, the wrong marker would
land and the per-scenario `check_eq32` would fire.
- **§4 is the "clear and observe marker returns" case**: after
§3's combo write left the marker at `0xCC`, §4 re-clears
INPUT_P1 → byte3 returns to `0xFF` → the loop branches to
MARK_A → marker overwritten back to `0xAA`. That specifically
proves the EE loop is consuming live buffer state, not
caching the first read.
- Per-scenario wait is 500 design-clock cycles. Each EE loop
iteration is ~10 instructions × ~5 cycles each ≈ 50 cycles,
so the wait covers ~10 loop iterations — plenty of slack.
**Branch semantics — markers keyed to *cleared* bits
(active-low), not *set* bits?** Yes:
- `0xFF` (all bits SET) = no buttons pressed → MARK_A. Set
bits = released. ✓
- `0xDF` (bit 5 CLEARED) = JOY_RIGHT pressed → MARK_B. The
cleared bit is what indicates "pressed." ✓
- `0xDE` (bits 5 AND 0 CLEARED) = JOY_RIGHT + JOY_SELECT
pressed → falls through to COMBO (marker C). ✓
A polarity inversion would be visible: e.g. if the program
treated `0xFF` as "all pressed" and fell through to COMBO, §1
would land `0xCC` instead of `0xAA` and the test would fire.
The fact that §1 + §4 both successfully match MARK_A on the
"no buttons" stimulus proves the active-low semantics are
honored end-to-end (sio2_input_stub's per-bit inversion +
the EE program's branch direction).
### Boundary call
> **The full input arc is sim-validated end-to-end: HPS writes
> INPUT_P1 → bridge latches → IOP-side sio2_input_stub
> translates to Sony pad bytes → producer packs a 16-byte
> Sony struct → SIF DMA drops it into EE RAM at a fixed slot
> (Ch239 rewind keeping the slot stable) → EE-side MIPS code
> branches on a button bit → writes a per-scenario marker the
> consumer-side TB samples. Active-low + freshness + clear-
> and-restore behaviors are all covered by the existing
> tb_ee_pad_buffer_branch §1–§4 scenarios. Next options:
> EE-side SIF mailbox/flag decode (Ch242+), per-emit "new
> sample" gating, or pivot back to a different arc — input is
> done as far as platform RTL is concerned.**
---
## Original recon (Ch233)
## Why this doc exists
Ch222–Ch232 made the retroDE platform shell live on PS2: HPS writes
controller bitmaps into `ps2_hps_bridge.INPUT_P1/P2/P1_RAW` (offsets
0x040/0x044/0x048), the OSD compositor renders text over PS2 video, and
the supervisor menu round-trip is silicon-validated. The next bridge to
build is between **HPS-visible input latches** and **PS2-side software
that wants to read controller state** (eventually a real BIOS / game).
This doc maps that gap so the next code chapter has a small, named
target instead of an open question.
## Scope (Codex Ch233 framing)
1. Survey existing PS2-side stubs touching SIO2 / pad / controller paths.
2. Document what the real PS2 BIOS/game touches first for controller
input.
3. Map Ch222 `INPUT_P1`/`INPUT_P1_RAW` bits into a proposed internal
pad state format.
4. Identify the minimal MMIO surface to expose pad status to EE/IOP-side
code.
5. No RTL — the implementation chapter follows.
## What exists today
### HPS side (Ch222 — landed, silicon-validated by Ch226 DS2 stub)
- `ps2_hps_bridge.INPUT_P1` @ 0x040 (32-bit RW latch, retroDE
SNES-style bitmap from `input_common.h`).
- `ps2_hps_bridge.INPUT_P2` @ 0x044 (player 2 latch).
- `ps2_hps_bridge.INPUT_P1_RAW` @ 0x048 (un-remapped mirror used by
retrodesd's OSD nav FSM in other cores).
- `ps2_hps_bridge.DS2_BUTTONS` @ 0x0F4 (Ch226 read-only mirror of
INPUT_P1; sibling-ABI DS2 path for retrodesd).
- `retrodesd/software/input_thread.c` is the producer — evdev →
remap → 32-bit AXI write into these offsets.
### PS2 side
- **No SIO2 stub.** `docs/stub_module_plan.md:317` reserves
`rtl/peripherals/sio2_input_stub.sv` as "Wave 2 #12", explicitly
the last stub before "Wave 3 promotions" — never written.
- **No pad MMIO decode** in `iop_memory_map_stub.sv` for the SIO2
region (`0x1F80_8200..0x1F80_82FF` on real hardware).
- **No EE-side libpad path** — `ee_memory_map_stub.sv` has no
RPC/SIF awareness of controller state.
- The IOP map's "Future regions" comment block (in
`rtl/iop/README.md:149`) lists "Other IOP DMAC channels (CDVD /
SPU2 / DEV9 / SIF1-2 / SIO2)" as deferred.
The platform shell talks to itself — HPS writes a latch, HPS reads
it back (via Ch226 DS2_BUTTONS mirror). **Nothing on the PS2 fabric
side consumes the bits**, which is the gap Ch233+ will close.
## Real PS2 controller path (for orientation)
A real game running on a stock PS2 sees controller input through this
chain (top → bottom in time):
```
Physical DualShock 2
│ (custom serial protocol, ~250 kHz)
SIO2 controller block @ IOP 0x1F80_8200..0x1F80_82FF
│ (FIFO + command/response + DMA channel 11)
IOP RAM (padman.irx — Sony's pad daemon)
│ - issues SIO2 transactions every vsync
│ - parses the response into a 16-byte pad state struct
│ - publishes the struct to a known IOP RAM address
SIF (RPC channel)
│ - EE-side libpad opens an RPC channel
│ - calls padRead(port, &state) → marshals 16 bytes
│ of pad state over SIF DMA to EE-side buffer
EE RAM (libpad-allocated buffer)
│ - game / BIOS reads the 16 bytes directly
Game logic
```
**Where the bytes live in the 16-byte pad state** (the format
libpad/padman use, Sony's "digital mode" / type `0x4` response):
| Byte | Bit | Function | Active-low? |
|------|-----|-------------------|-------------|
| 0 | - | success status | usually 0x00 / 0xFF |
| 1 | - | report type / pad-state-machine | 0x41 = digital, 0x73 = analog |
| 2 | - | success token | |
| 3 | 7 | LEFT | 0 = pressed |
| 3 | 6 | DOWN | 0 = pressed |
| 3 | 5 | RIGHT | 0 = pressed |
| 3 | 4 | UP | 0 = pressed |
| 3 | 3 | START | 0 = pressed |
| 3 | 2 | R3 | 0 = pressed |
| 3 | 1 | L3 | 0 = pressed |
| 3 | 0 | SELECT | 0 = pressed |
| 4 | 7 | □ (square) | 0 = pressed |
| 4 | 6 | × (cross) | 0 = pressed |
| 4 | 5 | ○ (circle) | 0 = pressed |
| 4 | 4 | △ (triangle) | 0 = pressed |
| 4 | 3 | R1 | 0 = pressed |
| 4 | 2 | L1 | 0 = pressed |
| 4 | 1 | R2 | 0 = pressed |
| 4 | 0 | L2 | 0 = pressed |
| 5–8 | - | RX, RY, LX, LY | analog (0x80 centered, digital mode reports 0x80) |
| 9–15 | - | pressure / reserved (DualShock 2 only) | |
**Active-low semantics:** every bit is 0 when the button is pressed.
retroDE's `INPUT_P1` from `input_common.h` is **active-high**.
The translation layer must invert per-bit.
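
From the consumer's side, active-low decoding looks like the sketch below. The bit-to-button assignments are copied from the table above; the name lists and `pressed_buttons` are illustrative and not part of libpad.

```python
# Index = bit position within the byte, per the pad-state table.
BYTE3_NAMES = ["SELECT", "L3", "R3", "START", "UP", "RIGHT", "DOWN", "LEFT"]
BYTE4_NAMES = ["L2", "R2", "L1", "R1", "TRIANGLE", "CIRCLE", "CROSS", "SQUARE"]


def pressed_buttons(byte3, byte4):
    """Return the set of pressed buttons; a 0 bit means pressed."""
    pressed = set()
    for bit in range(8):
        if not (byte3 >> bit) & 1:
            pressed.add(BYTE3_NAMES[bit])
        if not (byte4 >> bit) & 1:
            pressed.add(BYTE4_NAMES[bit])
    return pressed
```

For example, byte3 = `0xDF` with byte4 = `0xFF` decodes to RIGHT only, and `0xFF` / `0xFF` decodes to nothing pressed.
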
**What software reads first.** The Sony BIOS doesn't poll controllers
during its own boot — the first pad transactions come from
`OSDSYS` (the in-BIOS browser) and game executables linking
libpad. So:
- For a **BIOS-bring-up smoke test**, no pad surface is required.
- For an **OSDSYS-driven boot path**, OSDSYS expects the SIF
RPC server `RPCID 0x80000100` (padman) to answer with a 16-byte
pad state on every `padRead` call.
- For **homebrew or game code**, libpad's standard API is the
observable surface; the implementation strategy (faithful
SIO2 vs simplified RPC vs simplified MMIO) is opaque to the
caller.
## Proposed mapping (Ch222 → Sony pad state)
Following the `peripherals.md:30` open question ("simplified
abstraction vs SIO2-faithful transactions?") the recon answer is:
**start with a simplified abstraction.** SIO2-faithful transactions
require IOP code that runs the protocol — fine for late-Wave-2 work
but not the smallest useful first step.
`INPUT_P1` bit assignments (from `input_common.h`) map to Sony pad
state per the following table. SNES-style face buttons fold onto
DualShock face buttons by *spatial layout* (Y top, B bottom,
X left, A right — same as the standard SNES → PSX mapping retroDE
already uses on coco2 / a2600):
| INPUT_P1 bit | retroDE name | PS2 button (Sony name) | Pad-state byte.bit |
|--------------|--------------|------------------------|--------------------|
| 0 | JOY_RIGHT | RIGHT (D-pad) | 3.5 |
| 1 | JOY_LEFT | LEFT (D-pad) | 3.7 |
| 2 | JOY_DOWN | DOWN (D-pad) | 3.6 |
| 3 | JOY_UP | UP (D-pad) | 3.4 |
| 4 | JOY_START | START | 3.3 |
| 5 | JOY_SELECT | SELECT | 3.0 |
| 6 | JOY_Y | △ (triangle, top) | 4.4 |
| 7 | JOY_B | × (cross, bottom) | 4.6 |
| 8 | JOY_X | □ (square, left) | 4.7 |
| 9 | JOY_A | ○ (circle, right) | 4.5 |
| 10 | JOY_L | L1 | 4.2 |
| 11 | JOY_R | R1 | 4.3 |
| 12 | JOY_L2 | L2 | 4.0 |
| 13 | JOY_R2 | R2 | 4.1 |
| 14 | JOY_L3 | L3 | 3.1 |
| 15 | JOY_R3 | R3 | 3.2 |
| 16 | JOY_OSD | — (consumed by retrodesd, not forwarded) | — |
Inversion rule: each PS2 byte starts at `0xFF` (all released);
each `INPUT_P1` bit that's `1` clears the corresponding pad-state
bit to `0`. Two `assign`s of 8-bit pad bytes do the whole thing
combinationally:
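```
// byte 3: LEFT, DOWN, RIGHT, UP, START, R3, L3, SELECT
assign pad_state[3] = ~{INPUT_P1[1],  INPUT_P1[2],  INPUT_P1[0],  INPUT_P1[3],
                        INPUT_P1[4],  INPUT_P1[15], INPUT_P1[14], INPUT_P1[5]};
// byte 4: square, cross, circle, triangle, R1, L1, R2, L2
assign pad_state[4] = ~{INPUT_P1[8],  INPUT_P1[7],  INPUT_P1[9],  INPUT_P1[6],
                        INPUT_P1[11], INPUT_P1[10], INPUT_P1[13], INPUT_P1[12]};
```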
(Order inside `{}` is MSB→LSB to match the Sony bit numbering.)
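For a testbench golden model or host-side check, the same inversion can be written in C. This is a sketch, not the RTL: `SRC_BYTE3`, `SRC_BYTE4`, `pad_byte_from_input` and `input_to_pad_bytes` are hypothetical names, and the source-bit tables are transcribed from the mapping table above (indexed LSB first, the reverse of the `{}` order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* INPUT_P1 source bit for each pad-state bit, indexed by Sony bit
 * number (LSB first). Values copied from the mapping table. */
static const uint8_t SRC_BYTE3[8] = { 5, 14, 15, 4, 3, 0, 2, 1 };
static const uint8_t SRC_BYTE4[8] = { 12, 13, 10, 11, 6, 9, 7, 8 };

/* Builds one active-low pad byte from the active-high input bitmap. */
static uint8_t pad_byte_from_input(uint32_t input, const uint8_t src[8])
{
    uint8_t byte = 0xFF;                        /* all released */
    for (unsigned bit = 0; bit < 8; bit++) {
        if ((input >> src[bit]) & 1u)
            byte &= (uint8_t)~(1u << bit);      /* pressed clears the bit */
    }
    return byte;
}

/* Translates an INPUT_P1/INPUT_P2 bitmap into pad-state bytes 3 and 4.
 * Bits 16 and above (e.g. JOY_OSD) are not forwarded. */
static void input_to_pad_bytes(uint32_t input, uint8_t *byte3, uint8_t *byte4)
{
    assert(byte3 != NULL && byte4 != NULL);
    *byte3 = pad_byte_from_input(input, SRC_BYTE3);
    *byte4 = pad_byte_from_input(input, SRC_BYTE4);
}
```

A lookup table rather than a hand-written concatenation keeps the model independent of the RTL expression, so a transposed index in either one shows up as a mismatch.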
## Proposed minimum MMIO surface
For the smallest possible useful "PS2 code can read controller
state" path:
**Option A — IOP-readable PS2-local register (recommended).**
Add one 32-bit read-only register per pad port on the IOP MMIO bus,
each packing the two pad-state bytes plus connected/error flag bits:
| IOP phys offset | Name | Layout (32-bit) |
|--------------------|-----------------|----------------------------------------------------------------|
| `0x1F80_8500` | `PAD_P1_STATE` | `[7:0]=byte3 (D-pad/SEL/START)`, `[15:8]=byte4 (face/shoulder)`, `[16]=connected=1`, `[17]=error=0`, `[31:18]=0` |
| `0x1F80_8504` | `PAD_P2_STATE` | Same layout, sourced from `INPUT_P2` |
`0x1F80_8500..0x1F80_85FF` is a **retroDE-local** I/O range, not
Sony-compatible. It deliberately sits *outside* the real SIO2 range
(`0x1F80_8200..0x1F80_82FF`) so that landing real SIO2 emulation later
doesn't collide. The registers are little-endian, matching the IOP's
native byte ordering, so byte 3 occupies the lowest-addressed byte.
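The proposed field layout can be pinned down with a small pack/unpack model in C. The struct and function names are hypothetical, and the bit positions follow the Option A table in this section only; the Ch234 implementation section at the top of this document describes the layout that actually landed, which differs in where the presence bit lives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one proposed PAD_Px_STATE register (hypothetical type). */
struct pad_reg_fields {
    uint8_t byte3;      /* [7:0]  D-pad / SELECT / START / L3 / R3 */
    uint8_t byte4;      /* [15:8] face / shoulder buttons */
    bool    connected;  /* [16] */
    bool    error;      /* [17] */
};

static uint32_t pad_reg_pack(uint8_t byte3, uint8_t byte4,
                             bool connected, bool error)
{
    return (uint32_t)byte3
         | ((uint32_t)byte4 << 8)
         | ((uint32_t)connected << 16)
         | ((uint32_t)error << 17);
}

static struct pad_reg_fields pad_reg_unpack(uint32_t reg)
{
    assert((reg >> 18) == 0u);          /* [31:18] are defined as zero */
    struct pad_reg_fields f;
    f.byte3     = (uint8_t)(reg & 0xFFu);
    f.byte4     = (uint8_t)((reg >> 8) & 0xFFu);
    f.connected = ((reg >> 16) & 1u) != 0u;
    f.error     = ((reg >> 17) & 1u) != 0u;
    return f;
}
```

An idle, connected pad packs to `0x0001FFFF` under this layout: both button bytes all-ones (released), `connected` set, `error` clear.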
IOP-side code (a small "fake padman" routine loaded at known address,
or a future BIOS-replacement RPC server) reads `PAD_P1_STATE`, writes
the 16-byte Sony pad state into the agreed EE-visible memory location,
and signals via SIF.
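A rough sketch of the "fake padman" fill step, in C. Several values here are assumptions rather than anything this contract fixes: byte 0 = `0x00`, byte 2 = `0x5A`, and zeros for bytes 9-15. The function name is hypothetical, the register value is passed in rather than read from MMIO, and the SIF publication step is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_STATE_LEN 16u

/* Fills a 16-byte digital-mode pad state from a PAD_Px_STATE value.
 * Only bits [15:0] of reg are used; connected/error are ignored here. */
static void pad_state_from_reg(uint32_t reg, uint8_t *out)
{
    assert(out != NULL);
    memset(out, 0, PAD_STATE_LEN);          /* bytes 9..15 = 0 (assumption) */
    out[0] = 0x00;                          /* success status (assumed value) */
    out[1] = 0x41;                          /* report type: digital */
    out[2] = 0x5A;                          /* success token (assumed value) */
    out[3] = (uint8_t)(reg & 0xFFu);        /* byte 3, already active-low */
    out[4] = (uint8_t)((reg >> 8) & 0xFFu); /* byte 4, already active-low */
    for (unsigned i = 5; i <= 8; i++)
        out[i] = 0x80;                      /* RX, RY, LX, LY centered */
}
```

No inversion happens at this stage because the register already holds Sony-polarity bytes; the routine only adds the header and the centered stick values.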
**Option B — SIF mailbox pad state.**
Skip IOP code entirely. Add a mailbox in `sif_mailbox_stub` that
the EE can read directly without any IOP cooperation. Faster to
demo but breaks libpad's RPC contract — homebrew built against
libpad won't work without a shim.
**Option C — faithful SIO2 emulation.**
Real `0x1F80_8200..0x1F80_82FF` register surface, real FIFO,
real DMA channel 11, real command/response protocol. padman.irx
runs unchanged. **Largest scope by far** — defers to a later
chapter once Option A is proven.
**Recommendation:** A → B → C as separate chapters. Most game/BIOS
code talks to libpad, which talks to padman over SIF — Option A
gives the smallest fabric surface that lets a stub padman work.
## Proposed Ch234+ implementation chapters
| Chapter | Scope |
|-----------|-------------------------------------------------------------------------------------------------------------------------|
| **Ch234** | `rtl/peripherals/sio2_input_stub.sv` (Option A): single module, two read-only 32-bit registers; combinationally maps Ch222 INPUT_P1/P2 latches into PS2 pad-state bytes with the inversion rule above; IOP map decode added at `0x1F80_8500..0x1F80_85FF`. **Bridge gets a new output port** carrying INPUT_P1/P2 into the IOP domain (single-bit register-stable signals, no CDC needed beyond the existing reset-sync because they update at retrodesd's 1 kHz rate). New focused TB: write INPUT_P1, read PAD_P1_STATE through the IOP map, verify the inversion + bit order. |
| **Ch235** | Either ramp Ch234 into Option B (SIF mailbox), or extend Ch234 to expose pad analog stick values (currently libpad reports 0x80 centered in digital mode — match that). Decision deferred per the BIOS-bringup observations. |
| Ch236+ | Real SIO2 emulation (Option C) once a known BIOS or homebrew demands it. |
## Out of scope for this contract
- Analog stick fidelity beyond "report 0x80 centered" (the
`INPUT_P1` bitmap is digital-only; full DualShock 2 analog
requires a separate retrodesd-side path).
- Pressure-sensitive buttons (DualShock 2 only).
- Multitap support (most PS2 software doesn't require it for
bringup).
- Real SIO2 timing fidelity (the simplified register is
combinational; real SIO2 has a multi-cycle command/response
protocol).
- Vibration / actuator feedback (output direction; needs
EE → HPS path, not relevant for input recon).
## Boundary call
> **The HPS-to-bridge half of the input path landed in Ch222 and
> is silicon-validated; the bridge-to-PS2-fabric half is open.
> Ch234 adds a small IOP-readable `sio2_input_stub` at the
> retroDE-local I/O range `0x1F80_8500..0x1F80_85FF` that
> combinationally translates `INPUT_P1`/`INPUT_P2` into Sony pad
> bytes; IOP code (eventually a stub padman) reads the registers
> and publishes the 16-byte pad state via SIF for EE-side libpad.
> Faithful SIO2 emulation is deferred until a real BIOS or
> homebrew needs it.**