Files
retroDE_ps2/docs/ch270_closeout.md
T
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

7.8 KiB

Ch270 closeout — BIOS-bypass EE ELF runner; synthetic test passes

Status: Closed. Ch270 is the framework chapter — the first time this core executes "real code at a real entry point" through a generic loader rather than a hardcoded BIOS path. The synthetic test passes; the verdict shape is exactly what Codex framed; the infrastructure is reusable for real PS2 ELFs.

Synthetic verdict: elf_timeout_with_hot_pc with hot_pc = 0x80100010 (count=128 / ring=256). The hot PC matches the J-self instruction in the synthetic 5-instruction loop, and the 128/256 ratio matches the J + delay-slot NOP pair retiring 1:1.

What landed

Tools

  • tools/generate_synthetic_image.py — emits a tiny EE-RAM image (4 MIPS instructions + NOPs) and a manifest (entry, stack-top) in iverilog $readmemh format. No external dependencies. The generated image places code at PHYS 0x00100000 with entry at kseg0 VA 0x80100008 (real PS2 ELFs use kseg0 too, because the ee_memory_map_stub routes useg to a separate shadow region).
  • tools/elf_to_eeram.py — minimal ELF32-LE-MIPS converter: parses PT_LOAD segments, strips kseg/kuseg alias bits (low 29 bits of p_vaddr → phys offset), emits the same image.hex + manifest.hex pair. Pure stdlib (struct module), no pyelftools.

Testbench

  • sim/tb/integration/tb_ee_core_elf_runner.sv — instantiates ee_core_stub with STRICT_UNSUPPORTED=1 + ee_memory_map_stub
    • 2 MiB ee_ram_stub + bios_rom_stub. Bootstrap: TB pokes a 4-instruction trampoline at 0xBFC00000 (LUI/ORI/JR/NOP) that loads the ELF entry into $at and jumps. Then a 50 ms watchdog
    • live-latch trackers for: entry_reached, first strict trap (PC + instr), first unmapped MMIO (EA + PC), halt, and a hot-PC histogram over the last 256 retires (chosen per feedback-observer-design-for-lineage — bounded ring with trigger-time read, not a fill-from-boot array).

5-way verdict:

Verdict Meaning
elf_first_unsupported_opcode strict trap on a missing decode → Ch271+ adds the opcode
elf_first_unmapped_mmio ev_arg3 == REGION_UNMAPPED → Ch271+ adds the device stub
elf_halted core asserted halt_o; ELF ran a HALT pattern
elf_timeout_with_hot_pc watchdog fired; reports the most-retired PC of the last 256
elf_entry_unreached / elf_no_retires bootstrap failure; fail fast

Verdict precedence enforces "first decisive event wins": strict trap > unmapped MMIO > halt > timeout > bootstrap diagnostics.

Makefile

  • tb_ee_core_elf_runner (default, synthetic) — regenerates the synthetic image via Python on each build (cheap; Python emits in < 1s).
  • tb_ee_core_elf_runner_real ELF=/path/to/game.elf — converts the user-supplied ELF and runs it. The exact same TB, just different input.
  • Added to both PHONY list (line 407) and the run: master list (line 2337) per the dual-list rule in feedback-makefile-two-lists.

Synthetic test result

[tb_ee_core_elf_runner] elf_entry=0x80100008 elf_stack_top=0x801ffff0
[tb_ee_core_elf_runner] BIOS trampoline @0xBFC00000:
  lui $1, 0x8010
  ori $1, $1, 0x0008
  jr  $1
  nop
[tb_ee_core_elf_runner] SUMMARY:
  elf_entry           = 0x80100008
  entry_reached       = 1
  retire_count        = 1666665
  saw_trap            = 0
  saw_unmapped_mmio   = 0
  saw_halt            = 0
  hot_pc              = 0x80100010 (count=128 / ring=256)
[tb_ee_core_elf_runner] verdict=elf_timeout_with_hot_pc (...)
  • 1.67M instructions retired in 50 ms sim time. The synthetic loop is a 2-instruction body (J self + delay-slot NOP), so retires_per_loop_cycle ≈ 1.67M / 50 ms / 2 = ~16.7 cycles per loop iteration. Per the existing reference-ee-core-stub-timing memory (18 cyc/iter for a similar tight loop), this is right in band.
  • saw_unmapped_mmio = 0 means the EE never accessed anything outside the EE RAM region — the J self loop confines execution to two known instructions.
  • hot_pc = 0x80100010 (the J), count=128 / ring=256 — exactly half the ring is the J PC, the other half is the delay-slot PC at 0x80100014. Confirms the loop is the dominant flow.

What this enables

The runner is now ready for real PS2 ELFs. Run:

make tb_ee_core_elf_runner_real ELF=/path/to/game.elf

…and the first verdict will be one of:

  • elf_first_unsupported_opcode (pc=... instr=...) — Ch271 implements the missing opcode. This is the incremental-growth path that built BIOS support; same pattern now applies to game code.
  • elf_first_unmapped_mmio (ea=... pc=...) — Ch271 adds a region stub. Most likely candidates for first hit on a real game ELF: EE timers, EE GS_PRIV, VIF0/VIF1, DMAC channels we haven't mapped, scratch/SPRAM.
  • elf_timeout_with_hot_pc with a non-loop hot PC — the game is in a wait-for-service loop (libpad/libcdvd polling), which guides what subsystem to model next.

Codex's framing was right: the first real-ELF blocker is more informative than another BIOS-flow autopsy, because it tells us which subsystem to model in priority order driven by what real software actually exercises.

Bumps hit during implementation (and notes for future TBs)

  1. iverilog 12: @(posedge clk) inside always_ff is illegal. The first compile attempt used always_ff for the "watch for decisive event then $finish" block, with an extra @(posedge clk) inside for trace-sink flush. iverilog errored. Fix: use plain always @(posedge clk) (not always_ff) when the block needs multiple event controls. Saved as a one-line note here because the broader pattern was already covered by feedback-observer-design-for-lineage.

  2. EE memory map routes useg (top bit 0) to a separate shadow. Initial synthetic test used entry = 0x00100008 (kuseg). The TB loaded code into ee_ram at PHYS 0x100000, but the EE core fetching VA 0x00100008 saw zeros from the useg_shadow region (a Ch33 de-aliasing decision documented in ee_memory_map_stub.sv). Switched the synthetic entry to 0x80100008 (kseg0) so the fetch is routed to ee_ram via phys-strip. Real PS2 ELFs use kseg0 for their text segment anyway — this matches reality. The tools/elf_to_eeram.py converter already strips alias bits to compute phys placement, so it works for either kseg0 or kuseg entries — only the synthetic generator's default needed updating.

  3. Trampoline at 0xBFC00000 instead of PC_RESET override. ee_core_stub does have a PC_RESET parameter, but it's elaboration-time only. To keep the runtime ELF entry selectable via plusarg, the TB pokes a LUI/ORI/JR trampoline into bios_rom's writeable mem array (sim-only hierarchical access). EE boots at 0xBFC00000, runs the 3-instruction trampoline, and jumps to the ELF entry. Same technique the existing addi/slti TBs use to install instruction images.

Regression

Adding tb_ee_core_elf_runner to the run: list bumps the expected PASS count from 157 to 158. Regression in flight.

Recommendation for Codex's Ch271 call

The synthetic test is the framework smoke. The real signal is what happens when a user-supplied game ELF lands:

make tb_ee_core_elf_runner_real ELF=<game.elf>

Whatever verdict that emits is Ch271's framing. If elf_first_unsupported_opcode, implement that opcode. If elf_first_unmapped_mmio, add that region stub. The chapter is one question — "what's the first blocker?" — and the verdict answers it.

Standing by for the first real ELF run. The user can supply any PS2 ELF — a homebrew demo, an extracted SLUS/SCUS executable from a disc image, or a small libtoolchain test binary. The framework treats them all identically; the verdict tells us where to spend Ch271.