retroDE_ps2/docs/ch299_closeout.md

Ch299 closeout — Strategy B-lite: narrow library-ready gate poke; wait loop collapses

Status: Closed. Codex's "Strategy B-lite" (TB-side poke triggered by narrow syscall 0x77 match) worked first try. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x00110BB4 instr=0x70081EE9) — qbert exited the Ch298 wait loop on iteration 1 and advanced into new code, hitting an unimplemented MMI3 sub-op.

What landed

The TB-side gate-poke pattern: tb_ee_core_elf_runner now observes syscall 0x77 retires and, when the args match the qbert-specific narrow guard, writes 1 to the polled memory location.

Implementation — sim/tb/integration/tb_ee_core_elf_runner.sv

Per Codex's framing ("if direct memory write from syscall FSM is awkward, then a TB-side poke is acceptable, but trigger it on observing syscall 0x77, not on an arbitrary cycle"):

localparam logic [31:0] LIBRARY_READY_GATE_ADDR  = 32'h0013_29C0;
localparam logic [19:0] LIBRARY_READY_SHADOW_IDX = 20'h4_CA70;
localparam logic [31:0] LIBRARY_READY_GATE_VALUE = 32'h0000_0001;
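
The relationship between the gate address and the shadow index can be cross-checked on the host. The sketch below is not part of the testbench; it assumes `useg_shadow_mem` is an array of 32-bit words indexed by byte address divided by 4 (an assumption inferred from the two constants, not stated in the source), and `shadow_index` is a hypothetical helper name.

```python
# Host-side sanity check of the gate constants (not TB code).
# Assumption: the shadow memory holds 32-bit words and is indexed by
# byte_address >> 2, with index 0 at byte address 0.

LIBRARY_READY_GATE_ADDR = 0x0013_29C0
LIBRARY_READY_SHADOW_IDX = 0x4_CA70


def shadow_index(byte_addr: int) -> int:
    """Map a word-aligned byte address to a 32-bit-word index."""
    if byte_addr % 4 != 0:
        raise ValueError(f"address {byte_addr:#010x} is not word-aligned")
    return byte_addr >> 2


idx = shadow_index(LIBRARY_READY_GATE_ADDR)
assert idx == LIBRARY_READY_SHADOW_IDX  # the two localparams agree
assert idx < (1 << 20)                  # fits the 20-bit index width
print(f"{LIBRARY_READY_GATE_ADDR:#010x} -> word index {idx:#x}")
```

If the shadow memory were based at a non-zero address, the base would have to be subtracted before the shift; the constants above are only consistent with a zero base.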

Narrow guard:

if ((a0 >= 32'h001D_FD50) && (a0 <= 32'h001D_FDB0)
        && ((a3 == 32'h0000_0010) || (a3 == 32'h0000_0014))) begin
    u_ee_map.useg_shadow_mem[LIBRARY_READY_SHADOW_IDX] <= LIBRARY_READY_GATE_VALUE;
    library_ready_poke_count <= library_ready_poke_count + 1;
    ...
end

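The guard covers the two arg tuples Ch297 observed ($a0 ∈ {0x001DFD50, 0x001DFDB0}, $a3 ∈ {0x14, 0x10}). Note that $a0 is checked as an inclusive range [0x001DFD50, 0x001DFDB0] rather than as two exact values, so any $a0 in between also passes the $a0 term; only the $a3 term is an exact two-value match. An RTL-side direct write from the syscall FSM was rejected as too invasive (it would require a new state and combinational map-driver changes). The TB-side poke is Codex's authorized fallback.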

SUMMARY line — library_gate

library_gate = addr=0x001329c0 initial=0x00000000 final=0x00000001
               poked=1 poke_count=2 first_poke_cycle=100093
               source=syscall_0x77_narrow_match
  • initial (sampled at cycle 100): 0 (matches Ch298's "starts zero" observation).
  • final (sampled continuously, latches last value): 1 (gate is now non-zero, wait condition satisfied).
  • poke_count = 2: both qbert-observed 0x77 calls (with $a3=0x14 and $a3=0x10) fired the poke.
  • first_poke_cycle = 100,093: just after qbert's second init zero-write at cycle 98,576. The order is correct (zero-write first, then poke), so the poked 1 is not clobbered.
  • source = "syscall_0x77_narrow_match": the poke fired from the narrow-matched syscall observer, NOT a blind cycle-fixed poke.

The narrow guard's third-tuple falsifier

The qbert run after Ch299 shows a THIRD distinct 0x77 tuple:

syscall_0x77 = count=3 distinct_tuples=3
  tuple[0] = ($a0=0x001dfd50, $a3=0x14) count=1   ← matches guard, fires poke
  tuple[1] = ($a0=0x001dfdb0, $a3=0x10) count=1   ← matches guard, fires poke
  tuple[2] = ($a0=0x001dfd70, $a3=0x40) count=1   ← $a3 outside guard, NO poke

The new third call wasn't visible in Ch297's qbert run because the wait loop blocked qbert from making it. With the Ch299 gate opening, qbert advanced past the wait loop and made this third 0x77 call before hitting the opcode trap.

The narrow guard correctly excluded the third tuple ($a3=0x40 is not in {0x10, 0x14}). poke_count=2 (not 3) confirms it. This is exactly the falsifiability surface Codex asked for — if the guard were too broad, poke_count would equal count_0x77 even when new arg shapes surface.
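
The falsifier argument can be replayed on the host with the three observed tuples. The sketch below is a Python model of the guard predicate, not the testbench code; `Syscall77`, `narrow_guard`, and `a0_only_guard` are hypothetical names, and the broader guard is included only to show what a too-broad match would have reported.

```python
# Host-side model of the narrow guard and a deliberately broader variant.
from typing import NamedTuple


class Syscall77(NamedTuple):
    a0: int
    a3: int


A0_LO, A0_HI = 0x001D_FD50, 0x001D_FDB0
A3_ALLOWED = {0x10, 0x14}


def narrow_guard(call: Syscall77) -> bool:
    """Same predicate as the TB guard: $a0 range AND $a3 in {0x10, 0x14}."""
    return A0_LO <= call.a0 <= A0_HI and call.a3 in A3_ALLOWED


def a0_only_guard(call: Syscall77) -> bool:
    """Hypothetical too-broad guard that ignores $a3."""
    return A0_LO <= call.a0 <= A0_HI


# The three tuples from the post-Ch299 syscall_0x77 summary.
observed = [
    Syscall77(0x001DFD50, 0x14),
    Syscall77(0x001DFDB0, 0x10),
    Syscall77(0x001DFD70, 0x40),
]

narrow_pokes = sum(narrow_guard(c) for c in observed)
broad_pokes = sum(a0_only_guard(c) for c in observed)
print(narrow_pokes, broad_pokes)  # 2 3
```

The third tuple's $a0 lies inside the guarded range, so the $a3 term alone is what keeps poke_count at 2 rather than 3.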

qbert progression

| Chapter | Blocker | retire_count | Notes |
|---|---|---|---|
| Post-Ch297 (0x77) | wait loop spinning | 1,469,235 (watchdog) | gate never set |
| Post-Ch299 (gate poke) | MMI3 opcode trap at 0x70081EE9 | 28,655 | gate→1 at cycle 100,093; loop exits iter 1 |

The retire count is smaller (28,655 < 1,469,235), but the comparison is misleading: Ch297's number included roughly 1.44M retires spent spinning in the wait loop. The meaningful signal is the verdict-shape change from elf_timeout_with_hot_pc (stuck) to elf_first_unsupported_opcode (concrete next demand). This is the same shape transition as Ch295.

Ch300 framing — new MMI3 sub-op at sa=0x1B

The new trap is opcode 0x70081EE9 at PC 0x00110BB4. Decode:

  • opcode = 011100 = 0x1C (MMI)
  • rs = 00000 = $0
  • rt = 01000 = 8 = $t0
  • rd = 00011 = 3 = $v1
  • sa = 11011 = 0x1B (= 27)
  • funct = 101001 = 0x29 = MMI3
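
The field breakdown above can be reproduced with a few shifts and masks. The sketch below is a host-side Python helper (the name `decode_rtype` is hypothetical) that uses the standard MIPS R-type field positions, which is the layout the bullet list assumes.

```python
# Split a 32-bit MIPS R-type word into its six fields.
def decode_rtype(instr: int) -> dict:
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "rs":     (instr >> 21) & 0x1F,  # bits 25..21
        "rt":     (instr >> 16) & 0x1F,  # bits 20..16
        "rd":     (instr >> 11) & 0x1F,  # bits 15..11
        "sa":     (instr >> 6) & 0x1F,   # bits 10..6
        "funct":  instr & 0x3F,          # bits 5..0
    }


fields = decode_rtype(0x70081EE9)
for name, value in fields.items():
    print(f"{name:6s} = {value:#04x}")

assert fields["opcode"] == 0x1C  # MMI
assert fields["funct"] == 0x29   # MMI3
assert fields["sa"] == 0x1B
```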

So this is MMI3 / sa = 0x1B, an unimplemented MMI3 sub-op. Our current MMI3 coverage:

  • sa 0x0E = PCPYUD (Ch283)
  • sa 0x13 = PNOR (Ch281)

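sa 0x1B is new. Per R5900 references, the nearby halfword-shuffle encodings are:

  • PEXEH (Parallel Exchange Even Halfword): MMI2 (funct 0x09), sa 0x1A
  • PREVH (Parallel Reverse Halfword): MMI2 (funct 0x09), sa 0x1B
  • PEXCH (Parallel Exchange Center Halfword): MMI3 (funct 0x29), sa 0x1A
  • PCPYH (Parallel Copy Halfword): MMI3 (funct 0x29), sa 0x1B

Because funct = 0x29 selects the MMI3 table, sa 0x1B decodes as PCPYH, not PREVH (PREVH shares sa 0x1B but lives under MMI2). PCPYH copies the lowest 16-bit halfword of each 64-bit doubleword of rt into all four halfwords of the corresponding doubleword of rd. This mapping should be confirmed against the R5900 opcode table before the Ch300 RTL lands.

Mechanical Ch300 chapter: extend MMI3 narrow-decode (Ch278 pattern) with MMI3_PCPYH = 5'h1B, add is_pcpyh, add the writeback arm that implements the halfword broadcast across the 128-bit shadow (similar to PCPYUD's full-128 writeback). ~4-5 RTL edits + focused TB.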

This is back to opcode-era for one chapter — fitting since Ch299 cleared the wait loop and qbert progressed to executable code with new MMI demands.

Pattern milestone

The third clean "inflection → autopsy → unblock" cycle is not needed yet. Ch299 successfully unblocked the second wait loop, and qbert is back in opcode-trap mode. The pattern can be sequenced more flexibly than I expected:

| Cycle | Inflection | Autopsy | Unblock |
|---|---|---|---|
| 1 | Ch293 (1.66M, 0x0011242C) | Ch294 (syscall 0x7A bit-17) | Ch295 ($a0-aware HLE) |
| 2 | Ch297 (1.47M, 0x00112554) | Ch298 (memory poll 0x001329C0) | Ch299 (narrow 0x77 gate poke) |

Documentation status: qbert-specific HLE

Per Codex's instruction: "document this as a qbert-specific library-ready HLE, not architectural truth."

This is explicitly NOT a faithful model of PS2 kernel behavior. The real PS2 kernel's RegisterLibraryEntries writes a "library ready" word based on the registration record layout + the registered library's status. Our TB-side poke writes 1 to a hardcoded address that happens to match qbert's specific poll target.

Risks if another ELF uses syscall 0x77:

  • A different ELF with $a0 in the same range AND $a3 in {0x10, 0x14} would also get its 0x001329C0 word poked to 1 — potentially wrong if the ELF expects 0 or a different value.
  • An ELF with different registration buffer addresses won't get the poke at all (correct, since the guard is narrow).

The risk is low for qbert but should be revisited if Ch300+ surfaces another ELF or another syscall pattern in the same area.

Files changed

  • sim/tb/integration/tb_ee_core_elf_runner.sv — 6 new state signals + observer arm with narrow guard + SUMMARY display.

No RTL changes. No new TB target. Regression count unchanged at 176/176.

Regression

176/176 PASS (unchanged from Ch298; runner-only changes).