thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch270 closeout — BIOS-bypass EE ELF runner; synthetic test passes
**Status:** Closed. Ch270 is the framework chapter: the first time
this core executes "real code at a real entry point" through a
generic loader rather than a hardcoded BIOS path. The synthetic
test passes, the verdict shape is exactly what Codex framed, and the
infrastructure is reusable for real PS2 ELFs.

**Synthetic verdict:** `elf_timeout_with_hot_pc` with
`hot_pc = 0x80100010 (count=128 / ring=256)`. The hot PC matches
the J-self instruction in the synthetic image's two-instruction loop
(J self plus delay-slot NOP), and the 128/256 ratio matches that
pair retiring 1:1.
## What landed
### Tools
- `tools/generate_synthetic_image.py` — emits a tiny EE-RAM image
(4 MIPS instructions + NOPs) and a manifest (entry, stack-top)
in iverilog `$readmemh` format. No external dependencies. The
generated image places code at PHYS `0x00100000` with entry at
kseg0 VA `0x80100008` (kseg0 rather than kuseg, because
`ee_memory_map_stub` routes useg to a separate shadow region).
- `tools/elf_to_eeram.py` — minimal ELF32-LE-MIPS converter:
parses PT_LOAD segments, strips kseg/kuseg alias bits (low 29
bits of p_vaddr → phys offset), emits the same `image.hex` +
`manifest.hex` pair. Pure stdlib (struct module), no pyelftools.
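
The segment walk and alias strip described for the converter can be sketched as follows. This is a minimal illustration, not the contents of `tools/elf_to_eeram.py`: the function name `load_segments` and the constant `PHYS_MASK` are hypothetical, and the sketch assumes a well-formed ELF32 little-endian file.

```python
import struct

PT_LOAD = 1
PHYS_MASK = 0x1FFFFFFF  # keep the low 29 bits: drops kseg0/kseg1 alias bits


def load_segments(elf_bytes):
    """Return (e_entry, [(phys_addr, data), ...]) for an ELF32-LE image."""
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf_bytes[4] != 1 or elf_bytes[5] != 1:
        raise ValueError("expected ELF32, little-endian")
    # ELF32 header fields that follow the 16-byte e_ident block.
    (_type, _machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum) = struct.unpack_from(
        "<HHIIIIIHHH", elf_bytes, 16)
    segments = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type, p_offset, p_vaddr, _paddr, p_filesz, p_memsz,
         _pflags, _align) = struct.unpack_from("<IIIIIIII", elf_bytes, off)
        if p_type != PT_LOAD:
            continue
        data = elf_bytes[p_offset:p_offset + p_filesz]
        data += bytes(max(0, p_memsz - p_filesz))  # zero-fill the .bss tail
        segments.append((p_vaddr & PHYS_MASK, data))
    return e_entry, segments
```

For a kuseg address such as `0x00100000` the mask is a no-op, while `0x80100000` maps to the same physical offset, which is why placement is independent of the alias the ELF was linked at.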
### Testbench
- `sim/tb/integration/tb_ee_core_elf_runner.sv` — instantiates
`ee_core_stub` with `STRICT_UNSUPPORTED=1` + `ee_memory_map_stub`
+ 2 MiB `ee_ram_stub` + `bios_rom_stub`. Bootstrap: TB pokes a
4-instruction trampoline at `0xBFC00000` (LUI/ORI/JR/NOP) that
loads the ELF entry into `$at` and jumps. The TB then arms a 50 ms
watchdog and live-latch trackers for `entry_reached`, first strict
trap (PC + instr), first unmapped MMIO (EA + PC), halt, and a hot-PC
histogram over the last 256 retires (chosen per
[[feedback-observer-design-for-lineage]]: a bounded ring with
trigger-time read, not a fill-from-boot array).
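
The bounded-ring histogram can be modelled in a few lines of Python. This is a behavioural sketch only, not the testbench's SystemVerilog: the class name `HotPcRing` is hypothetical, and the tie-break (first PC seen in the ring wins) is an assumption.

```python
from collections import Counter, deque


class HotPcRing:
    """Keeps only the last `depth` retired PCs; counted at trigger time."""

    def __init__(self, depth=256):
        self.ring = deque(maxlen=depth)  # oldest entries fall off the front

    def retire(self, pc):
        self.ring.append(pc)

    def hottest(self):
        """Return (pc, count) for the most-retired PC currently in the ring."""
        if not self.ring:
            return None, 0
        return Counter(self.ring).most_common(1)[0]


ring = HotPcRing()
# Trampoline and setup retires, then the J-self loop with its delay slot.
for pc in (0xBFC00000, 0xBFC00004, 0xBFC00008, 0xBFC0000C,
           0x80100008, 0x8010000C):
    ring.retire(pc)
for _ in range(1000):
    ring.retire(0x80100010)  # J self
    ring.retire(0x80100014)  # delay-slot NOP
pc, count = ring.hottest()
print(hex(pc), count, len(ring.ring))  # 0x80100010 128 256
```

Because the ring is bounded, the boot-time retires have aged out by the time the watchdog reads it, so the histogram reflects only the steady-state loop.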

5-way verdict:

| Verdict | Meaning |
|---------------------------------|------------------------------------------------|
| `elf_first_unsupported_opcode` | strict trap on a missing decode → Ch271+ adds the opcode |
| `elf_first_unmapped_mmio` | ev_arg3 == REGION_UNMAPPED → Ch271+ adds the device stub |
| `elf_halted` | core asserted halt_o; ELF ran a HALT pattern |
| `elf_timeout_with_hot_pc` | watchdog fired; reports the most-retired PC of the last 256 |
| `elf_entry_unreached` / `elf_no_retires` | bootstrap failure; fail fast |

Verdict precedence enforces "first decisive event wins": strict
trap > unmapped MMIO > halt > timeout > bootstrap diagnostics.
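
The precedence chain can be written as a small decision function. This is a sketch under stated assumptions, not the TB's logic: the function name `pick_verdict` is hypothetical, the flag names mirror the SUMMARY fields, and the ordering between the two bootstrap diagnostics is a guess.

```python
def pick_verdict(saw_trap, saw_unmapped_mmio, saw_halt,
                 entry_reached, retire_count):
    """Apply: strict trap > unmapped MMIO > halt > timeout > bootstrap."""
    if saw_trap:
        return "elf_first_unsupported_opcode"
    if saw_unmapped_mmio:
        return "elf_first_unmapped_mmio"
    if saw_halt:
        return "elf_halted"
    # From here on the watchdog has expired without a decisive event.
    if entry_reached and retire_count > 0:
        return "elf_timeout_with_hot_pc"
    # Bootstrap diagnostics; their relative order is an assumption.
    return "elf_no_retires" if retire_count == 0 else "elf_entry_unreached"
```

With the synthetic run's values (`saw_trap=0`, `saw_unmapped_mmio=0`, `saw_halt=0`, `entry_reached=1`, `retire_count=1666665`) the function falls through to `elf_timeout_with_hot_pc`.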
### Makefile
- `tb_ee_core_elf_runner` (default, synthetic) — regenerates the
synthetic image via Python on each build (cheap; Python emits in
< 1s).
- `tb_ee_core_elf_runner_real ELF=/path/to/game.elf` — converts the
user-supplied ELF and runs it. The exact same TB, just different
input.
- Added to both PHONY list (line 407) and the `run:` master list
(line 2337) per the dual-list rule in
[[feedback-makefile-two-lists]].
## Synthetic test result
```
[tb_ee_core_elf_runner] elf_entry=0x80100008 elf_stack_top=0x801ffff0
[tb_ee_core_elf_runner] BIOS trampoline @0xBFC00000:
lui $1, 0x8010
ori $1, $1, 0x0008
jr $1
nop
[tb_ee_core_elf_runner] SUMMARY:
elf_entry = 0x80100008
entry_reached = 1
retire_count = 1666665
saw_trap = 0
saw_unmapped_mmio = 0
saw_halt = 0
hot_pc = 0x80100010 (count=128 / ring=256)
[tb_ee_core_elf_runner] verdict=elf_timeout_with_hot_pc (...)
```
- **1.67M instructions retired in 50 ms sim time.** The synthetic
loop is a 2-instruction body (J self + delay-slot NOP), so the
loop rate is ≈ 1.67M / 2 / 50 ms ≈ 16.7M iterations per simulated
second, or about 60 ns per iteration. Against the existing
[[reference-ee-core-stub-timing]] memory (18 cyc/iter for a
similar tight loop), this is in band if the TB clock period is
roughly 3.3 ns; the clock period is not restated here.
- **`saw_unmapped_mmio = 0`** means the EE never accessed
anything outside the EE RAM region — the J self loop confines
execution to two known instructions.
- **hot_pc = 0x80100010 (the J), count=128 / ring=256** — exactly
half the ring is the J PC, the other half is the delay-slot PC
at 0x80100014. Confirms the loop is the dominant flow.
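
The throughput figures can be checked with a short calculation. The retire count and sim time come from the SUMMARY above. The clock period passed to `cycles_per_iter` is a caller-supplied assumption, since this note does not state the TB clock, and both function names are hypothetical.

```python
RETIRE_COUNT = 1_666_665   # retire_count from the SUMMARY block
SIM_TIME_S = 50e-3         # watchdog window
RETIRES_PER_ITER = 2       # J self + delay-slot NOP


def loop_rate(retire_count=RETIRE_COUNT, sim_time_s=SIM_TIME_S,
              per_iter=RETIRES_PER_ITER):
    """Return (iterations per simulated second, nanoseconds per iteration)."""
    iterations = retire_count / per_iter
    iters_per_s = iterations / sim_time_s
    return iters_per_s, 1e9 / iters_per_s


def cycles_per_iter(ns_per_iter, clk_period_ns):
    """Convert time per iteration to clock cycles for an assumed period."""
    return ns_per_iter / clk_period_ns


iters_per_s, ns_per_iter = loop_rate()
print(f"{iters_per_s / 1e6:.2f} M iter/s, {ns_per_iter:.1f} ns/iter")
# 16.67 M iter/s, 60.0 ns/iter
```

The quotient 1.67M / 50 ms / 2 is a rate in iterations per second; a cycles-per-iteration figure follows only once a clock period is supplied, for example `cycles_per_iter(ns_per_iter, clk_period_ns)` with the TB's actual period.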
## What this enables
The runner is now ready for **real PS2 ELFs**. Run:
```
make tb_ee_core_elf_runner_real ELF=/path/to/game.elf
```
…and the first verdict will be one of:
- `elf_first_unsupported_opcode (pc=... instr=...)` — Ch271 implements
the missing opcode. This is the **incremental-growth path** that
built BIOS support; same pattern now applies to game code.
- `elf_first_unmapped_mmio (ea=... pc=...)` — Ch271 adds a region
stub. Most likely candidates for first hit on a real game ELF:
EE timers, EE GS_PRIV, VIF0/VIF1, DMAC channels we haven't
mapped, scratch/SPRAM.
- `elf_timeout_with_hot_pc` with a hot PC that is not a trivial
self-jump: the game is likely in a wait-for-service loop
(libpad/libcdvd polling), which guides what subsystem to model next.

Codex's framing was right: the first real-ELF blocker is more
informative than another BIOS-flow autopsy, because it tells us
which subsystem to model in priority order driven by what real
software actually exercises.
## Bumps hit during implementation (and notes for future TBs)
1. **iverilog 12: `@(posedge clk)` inside `always_ff` is illegal.**
The first compile attempt used `always_ff` for the "watch for
decisive event then `$finish`" block, with an extra
`@(posedge clk)` inside for trace-sink flush. iverilog rejected it,
because SystemVerilog permits only one event control in an
`always_ff` procedure.
Fix: use plain `always @(posedge clk)` (not `always_ff`) when
the block needs multiple event controls. Saved as a one-line
note here because the broader pattern was already covered by
[[feedback-observer-design-for-lineage]].
2. **EE memory map routes useg (top bit 0) to a separate
shadow.** Initial synthetic test used `entry = 0x00100008`
(kuseg). The TB loaded code into `ee_ram` at PHYS 0x100000,
but the EE core fetching VA 0x00100008 saw zeros from the
useg_shadow region (a Ch33 de-aliasing decision documented
in `ee_memory_map_stub.sv`). Switched the synthetic entry to
`0x80100008` (kseg0) so the fetch is routed to ee_ram via
phys-strip. The `tools/elf_to_eeram.py` converter already
strips alias bits to compute phys placement, so placement
works for either kseg0 or kuseg segments; only the synthetic
generator's default needed updating. Real PS2 ELFs should be
checked for this as well: an ELF linked in kuseg (PS2
executables are commonly linked at `0x00100000`) would present
a kuseg entry and hit the same useg_shadow fetch path unless
the entry is remapped to its kseg0 alias.
3. **Trampoline at 0xBFC00000 instead of `PC_RESET` override.**
ee_core_stub does have a `PC_RESET` parameter, but it's
elaboration-time only. To keep the runtime ELF entry
selectable via plusarg, the TB pokes a LUI/ORI/JR trampoline
into bios_rom's writeable `mem` array (sim-only hierarchical
access). EE boots at `0xBFC00000`, runs the 4-instruction
trampoline (LUI/ORI/JR plus the delay-slot NOP), and jumps to
the ELF entry. Same technique the existing addi/slti TBs use to
install instruction images.
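
The trampoline words for a given entry can be derived from the standard MIPS encodings of LUI, ORI, and JR. The sketch below is illustrative only: the helper name `encode_trampoline` is hypothetical, and how the TB actually builds and pokes these words is not shown in this note.

```python
AT = 1  # register $1 ($at), as used in the trampoline listing


def encode_trampoline(entry):
    """Return [lui, ori, jr, nop] words that jump to a 32-bit `entry`."""
    hi = (entry >> 16) & 0xFFFF
    lo = entry & 0xFFFF
    lui = (0x0F << 26) | (AT << 16) | hi              # lui $at, hi
    ori = (0x0D << 26) | (AT << 21) | (AT << 16) | lo  # ori $at, $at, lo
    jr = (AT << 21) | 0x08                             # jr  $at
    nop = 0x00000000                                   # branch delay slot
    return [lui, ori, jr, nop]


for word in encode_trampoline(0x80100008):
    print(f"{word:08x}")
# 3c018010
# 34210008
# 00200008
# 00000000
```

ORI zero-extends its immediate, so the LUI/ORI pair reconstructs any 32-bit entry without the sign-correction that an ADDIU-based sequence would need.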
## Regression
Adding `tb_ee_core_elf_runner` to the run: list bumps the
expected PASS count from 157 to 158. Regression in flight.
## Recommendation for Codex's Ch271 call
The synthetic test is the framework smoke test. The real signal is
what happens when a user-supplied game ELF lands:

> `make tb_ee_core_elf_runner_real ELF=<game.elf>`

Whatever verdict that emits is Ch271's framing. If
`elf_first_unsupported_opcode`, implement that opcode. If
`elf_first_unmapped_mmio`, add that region stub. The chapter is
one question, "what's the first blocker?", and the verdict
answers it.

**Standing by for the first real ELF run.** The user can supply
any PS2 ELF — a homebrew demo, an extracted SLUS/SCUS executable
from a disc image, or a small libtoolchain test binary. The
framework treats them all identically; the verdict tells us
where to spend Ch271.