retroDE_ps2/docs/ch299_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch299 closeout — Strategy B-lite: narrow library-ready gate poke; wait loop collapses
**Status:** Closed. Codex's "Strategy B-lite" (a TB-side poke
triggered by a narrow syscall 0x77 match) worked on the first try.

**Verdict from re-running qbert.elf:**
`elf_first_unsupported_opcode (pc=0x00110BB4 instr=0x70081EE9)`

qbert exited the Ch298 wait loop on iteration 1 and advanced into
new code, where it hit an unimplemented MMI3 sub-op.
## What landed
The TB-side gate-poke pattern: `tb_ee_core_elf_runner` now observes
each syscall 0x77 retirement and, when the args match the
qbert-specific narrow guard, writes 1 to the polled memory location.
### Implementation — `sim/tb/integration/tb_ee_core_elf_runner.sv`
Per Codex's framing ("if direct memory write from syscall FSM is
awkward, then a TB-side poke is acceptable, but trigger it on
observing syscall 0x77, not on an arbitrary cycle"):
```sv
localparam logic [31:0] LIBRARY_READY_GATE_ADDR = 32'h0013_29C0;
localparam logic [19:0] LIBRARY_READY_SHADOW_IDX = 20'h4_CA70;
localparam logic [31:0] LIBRARY_READY_GATE_VALUE = 32'h0000_0001;
```
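
The index constant is consistent with the gate's byte address divided by four. A quick host-side check in Python, assuming `useg_shadow_mem` is an array of 32-bit words whose index 0 corresponds to byte address 0 (the layout is not stated here, so treat that mapping and the `shadow_index` helper as assumptions):

```python
# Assumption: useg_shadow_mem holds 32-bit words and index 0 maps to byte
# address 0, so the word index is the byte address shifted right by 2.
LIBRARY_READY_GATE_ADDR = 0x0013_29C0
LIBRARY_READY_SHADOW_IDX = 0x4_CA70


def shadow_index(byte_addr: int) -> int:
    """Hypothetical helper: byte address -> 32-bit word index."""
    if byte_addr % 4 != 0:
        raise ValueError("gate address must be word-aligned")
    return byte_addr >> 2


assert shadow_index(LIBRARY_READY_GATE_ADDR) == LIBRARY_READY_SHADOW_IDX
assert LIBRARY_READY_SHADOW_IDX < (1 << 20)  # fits the 20-bit localparam
```

If the shadow memory is based at a non-zero address, the base would need to be subtracted before the shift.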
Narrow guard:
```sv
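// $a0 must fall in the inclusive range; $a3 must be one of two values.
if ((a0 >= 32'h001D_FD50) && (a0 <= 32'h001D_FDB0)
    && ((a3 == 32'h0000_0010) || (a3 == 32'h0000_0014))) begin
  // Hierarchical write into the shadow memory opens the gate.
  u_ee_map.useg_shadow_mem[LIBRARY_READY_SHADOW_IDX] <= LIBRARY_READY_GATE_VALUE;
  library_ready_poke_count <= library_ready_poke_count + 1;
  ...
end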
```
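The guard covers both arg tuples Ch297 observed
($a0 ∈ {0x001DFD50, 0x001DFDB0}, $a3 ∈ {0x14, 0x10}). It is narrow
but not an exact match: $a0 is checked as an inclusive range and
$a3 as a two-value set, so any $a0 in [0x001DFD50, 0x001DFDB0]
paired with either $a3 value fires the poke. An RTL-side direct
write from the syscall FSM was rejected as too invasive (it would
require a new state and combinational map-driver changes). The
TB-side poke is Codex's authorized fallback.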
### SUMMARY line — `library_gate`
```
library_gate = addr=0x001329c0 initial=0x00000000 final=0x00000001
poked=1 poke_count=2 first_poke_cycle=100093
source=syscall_0x77_narrow_match
```
- **initial** (sampled at cycle 100): 0 (matches Ch298's
"starts zero" observation).
- **final** (sampled continuously, latches last value): 1
(gate is now non-zero, wait condition satisfied).
- **poke_count = 2**: both qbert-observed 0x77 calls (with
$a3=0x14 and $a3=0x10) fired the poke.
- **first_poke_cycle = 100,093**: just after qbert's second init
zero-write at cycle 98,576 — the order is correct (zero-write
first, then poke, so the poked-1 doesn't get clobbered).
- **source = "syscall_0x77_narrow_match"**: the poke fired from
the narrow-matched syscall observer, NOT a blind cycle-fixed
poke.
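
The checks described in the bullets above can be automated against the SUMMARY text. The following is a minimal sketch, not the runner's own tooling: `parse_library_gate` is a hypothetical helper, and because it is not stated whether the runner emits the fields on one line or several, whitespace is normalized before parsing.

```python
import re

# Text copied from the library_gate SUMMARY above.
SUMMARY = """
library_gate = addr=0x001329c0 initial=0x00000000 final=0x00000001
poked=1 poke_count=2 first_poke_cycle=100093
source=syscall_0x77_narrow_match
"""


def parse_library_gate(text: str) -> dict:
    """Hypothetical parser: return the key=value fields of library_gate."""
    flat = " ".join(text.split())
    _, _, fields = flat.partition("library_gate = ")
    out = {}
    for key, value in re.findall(r"(\w+)=(\S+)", fields):
        if value.lower().startswith("0x"):
            out[key] = int(value, 16)   # hex field, e.g. addr / initial / final
        elif value.isdigit():
            out[key] = int(value)       # decimal counter or cycle number
        else:
            out[key] = value            # free-form tag, e.g. source
    return out


gate = parse_library_gate(SUMMARY)
SECOND_ZERO_WRITE_CYCLE = 98_576  # from the first_poke_cycle bullet above

assert gate["initial"] == 0 and gate["final"] == 1
assert gate["poke_count"] == 2
# The poke must land after qbert's second zero-write, or it would be clobbered.
assert gate["first_poke_cycle"] > SECOND_ZERO_WRITE_CYCLE
assert gate["source"] == "syscall_0x77_narrow_match"
```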
## The narrow guard's third-tuple falsifier
The qbert run after Ch299 shows a **THIRD** distinct 0x77 tuple:
```
syscall_0x77 = count=3 distinct_tuples=3
tuple[0] = ($a0=0x001dfd50, $a3=0x14) count=1 ← matches guard, fires poke
tuple[1] = ($a0=0x001dfdb0, $a3=0x10) count=1 ← matches guard, fires poke
tuple[2] = ($a0=0x001dfd70, $a3=0x40) count=1 ← $a3 outside guard, NO poke
```
The new third call wasn't visible in Ch297's qbert run because
the wait loop blocked qbert from making it. With the Ch299 gate
opening, qbert advanced past the wait loop and made this third
0x77 call before hitting the opcode trap.
**The narrow guard correctly excluded the third tuple** ($a3=0x40
is not in {0x10, 0x14}). poke_count=2 (not 3) confirms it. Note
that $a0=0x001DFD70 falls inside the guard's $a0 range, so the $a3
check alone did the excluding. This is exactly the falsifiability
surface Codex asked for: if the guard were too broad, poke_count
would equal the syscall_0x77 count even when new arg shapes surface.
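
To make the falsifier concrete, here is a Python mirror of the SystemVerilog guard applied to the three observed tuples. It is a sketch for reasoning about the predicate, not part of the testbench. `guard_matches` is a hypothetical name, and the final assertion uses a call that was never observed in any run.

```python
def guard_matches(a0: int, a3: int) -> bool:
    """Python mirror of the SystemVerilog narrow guard shown earlier."""
    return 0x001D_FD50 <= a0 <= 0x001D_FDB0 and a3 in (0x10, 0x14)


# The three syscall 0x77 tuples from the post-Ch299 qbert run.
observed = [
    (0x001D_FD50, 0x14),
    (0x001D_FDB0, 0x10),
    (0x001D_FD70, 0x40),
]

poke_count = sum(guard_matches(a0, a3) for a0, a3 in observed)
assert poke_count == 2  # agrees with poke_count in the SUMMARY

# tuple[2] passes the $a0 range check; only $a3 keeps it out.
assert 0x001D_FD50 <= 0x001D_FD70 <= 0x001D_FDB0
assert not guard_matches(0x001D_FD70, 0x40)

# Hypothetical call, never observed: the same $a0 with an in-set $a3 would poke.
assert guard_matches(0x001D_FD70, 0x10)
```

The last assertion shows the residual breadth of the guard: tightening `$a0` to the two observed addresses would close it, at the cost of a longer condition.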
## qbert progression
| Chapter | Blocker | retire_count | Notes |
|---|---|---|---|
| Post-Ch297 (0x77) | wait loop spinning | 1,469,235 (watchdog) | gate never set |
| **Post-Ch299 (gate poke)** | **MMI3 opcode trap at 0x70081EE9** | **28,655** | gate→1 at cycle 100,093; loop exits iter 1 |
The retire count *appears* smaller (28,655 < 1,469,235) but
that's misleading — Ch297's number included the 1.44M spin. The
MEANINGFUL signal is the **verdict-shape change** from
`elf_timeout_with_hot_pc` (stuck) → `elf_first_unsupported_opcode`
(concrete next demand). Same shape transition as Ch295.
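
The 1.44M figure is consistent with subtracting the two retire counts. This is a rough estimate only: it assumes the Ch297 run retired nothing but wait-loop iterations once it reached the loop, and it ignores the instructions the Ch299 run retired after leaving the loop.

```python
CH297_RETIRES = 1_469_235  # watchdog-terminated run, gate never set
CH299_RETIRES = 28_655     # run ending at the MMI3 opcode trap

# Difference between the two runs, read as an estimate of the spin length.
spin_estimate = CH297_RETIRES - CH299_RETIRES
assert spin_estimate == 1_440_580
print(f"{spin_estimate / 1e6:.2f}M")  # prints 1.44M
```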
## Ch300 framing — new MMI3 sub-op at sa=0x1B
The new trap is opcode `0x70081EE9` at PC 0x00110BB4. Decode:
- opcode = `011100` = 0x1C (MMI)
- rs = `00000` = $0
- rt = `01000` = 8 = $t0
- rd = `00011` = 3 = $v1
- sa = `11011` = 0x1B (= 27)
- funct = `101001` = 0x29 = MMI3
So this is **MMI3 / sa = 0x1B**, an unimplemented MMI3 sub-op.
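
The field split above can be reproduced mechanically. A small Python sketch using the standard MIPS R-type layout (`decode_rtype` is a hypothetical helper name):

```python
def decode_rtype(instr: int) -> dict:
    """Split a 32-bit MIPS R-type word into its six fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs":     (instr >> 21) & 0x1F,  # bits 25:21
        "rt":     (instr >> 16) & 0x1F,  # bits 20:16
        "rd":     (instr >> 11) & 0x1F,  # bits 15:11
        "sa":     (instr >> 6) & 0x1F,   # bits 10:6
        "funct":  instr & 0x3F,          # bits 5:0
    }


fields = decode_rtype(0x70081EE9)
assert fields == {
    "opcode": 0x1C,  # MMI
    "rs": 0,
    "rt": 8,         # $t0
    "rd": 3,         # $v1
    "sa": 0x1B,
    "funct": 0x29,   # MMI3
}
```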
Our current MMI3 coverage:
- sa 0x0E = PCPYUD (Ch283)
- sa 0x13 = PNOR (Ch281)
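sa 0x1B is **new**. Per R5900 references, the nearby halfword
ops split across two function groups:
- **PEXCH** (Parallel Exchange Center Halfword): MMI3, sa 0x1A
- **PCPYH** (Parallel Copy Halfword): MMI3, sa 0x1B
- **PEXEH** (Parallel Exchange Even Halfword): MMI2 (funct 0x09),
sa 0x1A
- **PREVH** (Parallel Reverse Halfword): MMI2 (funct 0x09), sa 0x1B
Because funct here is 0x29 (MMI3), sa 0x1B should be PCPYH rather
than PREVH; confirm against the opcode table before landing RTL.
PCPYH copies the lowest 16-bit halfword of each 64-bit doubleword
of rt into all four halfwords of the corresponding doubleword of
rd. (PREVH, by contrast, reverses the order of the four halfwords
within each doubleword.)
Mechanical Ch300 chapter: extend MMI3 narrow-decode (Ch278
pattern) with `MMI3_PCPYH = 5'h1B`, add `is_pcpyh`, add the
writeback arm that implements the halfword broadcast across the
128-bit shadow (similar to PCPYUD's full-128 writeback). ~4-5
RTL edits + focused TB.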
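
Before writing the RTL arm, a host-side reference for the expected 128-bit result may help the focused TB. The sketch below models PREVH (reverse the four halfwords within each doubleword) and, for comparison, PCPYH, which commonly circulated R5900 opcode tables place at MMI3 sa 0x1B; which one applies should be confirmed against the reference. Helper names are hypothetical, and registers are modeled as Python integers with halfword 0 in bits 15:0.

```python
MASK16 = 0xFFFF


def halfwords(x128: int) -> list:
    """Split a 128-bit value into eight halfwords, least significant first."""
    return [(x128 >> (16 * i)) & MASK16 for i in range(8)]


def pack(hws: list) -> int:
    """Inverse of halfwords(): element i lands in bits 16*i+15 : 16*i."""
    out = 0
    for i, h in enumerate(hws):
        out |= (h & MASK16) << (16 * i)
    return out


def prevh(rt: int) -> int:
    """PREVH model: reverse halfword order within each 64-bit doubleword."""
    h = halfwords(rt)
    return pack(h[3::-1] + h[7:3:-1])


def pcpyh(rt: int) -> int:
    """PCPYH model: broadcast the low halfword of each doubleword."""
    h = halfwords(rt)
    return pack([h[0]] * 4 + [h[4]] * 4)


RT = 0x7777_6666_5555_4444_3333_2222_1111_0000
assert prevh(RT) == 0x4444_5555_6666_7777_0000_1111_2222_3333
assert pcpyh(RT) == 0x4444_4444_4444_4444_0000_0000_0000_0000
assert prevh(prevh(RT)) == RT  # reversal is its own inverse
```

The two candidates give different results for this input, so a single directed vector in the TB is enough to tell them apart.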
This is **back to opcode-era for one chapter** — fitting since
Ch299 cleared the wait loop and qbert progressed to executable
code with new MMI demands.
## Pattern milestone
The third clean "inflection → autopsy → unblock" cycle is **not**
needed yet. Ch299 successfully unblocked the second wait loop,
and qbert is back in opcode-trap mode. The pattern can be
sequenced more flexibly than I expected:
| Cycle | Inflection | Autopsy | Unblock |
|-------|------------|---------|---------|
| 1 | Ch293 (1.66M, 0x0011242C) | Ch294 (syscall 0x7A bit-17) | Ch295 ($a0-aware HLE) |
| 2 | Ch297 (1.47M, 0x00112554) | Ch298 (memory poll 0x001329C0) | **Ch299 (narrow 0x77 gate poke)** |
## Documentation status: qbert-specific HLE
Per Codex's instruction: "document this as a qbert-specific
library-ready HLE, not architectural truth."
This is explicitly **NOT** a faithful model of PS2 kernel
behavior. The real PS2 kernel's RegisterLibraryEntries writes a
"library ready" word based on the registration record layout +
the registered library's status. Our TB-side poke writes 1 to a
hardcoded address that happens to match qbert's specific poll
target.
Risks if another ELF uses syscall 0x77:
- A different ELF with $a0 in the same range AND $a3 in {0x10,
0x14} would also get its 0x001329C0 word poked to 1 —
potentially wrong if the ELF expects 0 or a different value.
- An ELF with different registration buffer addresses won't get
the poke at all (correct, since the guard is narrow).
The risk is **low for qbert** but should be revisited if Ch300+
surfaces another ELF or another syscall pattern in the same area.
## Files changed
- `sim/tb/integration/tb_ee_core_elf_runner.sv` — 6 new state
signals + observer arm with narrow guard + SUMMARY display.
No RTL changes. No new TB target. Regression count unchanged at
**176/176**.
## Regression
**176/176 PASS** (unchanged from Ch298; runner-only changes).