thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

# Ch295 closeout — Strategy A worked: wait loop exited in one iteration
**Status:** Closed. Codex's Strategy A ($a0-aware experimental HLE
patch) worked **first try**. **Verdict from re-running qbert.elf:**
`elf_first_unhandled_syscall (pc=0x00111D94 $v1=0x79 (=121))`
qbert exited the Ch294 wait loop after exactly one iteration and
advanced into new code, hitting the next syscall blocker.
## The Ch294 hypothesis confirmed
Ch294 diagnosed: qbert spins forever because syscall 0x7A($a0=4)
returns 0, so `(retval & 0x00020000) == 0` always holds and bit 17 is
never set. Ch295 patched the HLE to return `0x00020000` when `$a0 == 4`.
**Result:** the wait loop iterated exactly once and exited. The
runner observer's `syscall_0x7A_split` line tells the whole story:
```
syscall_0x7A_split = count_a0_4=1 count_a0_0x80000000=1 count_a0_other=1
last_a0=0x00000002
```
| $a0 class | Calls | Match Ch294 |
|-----------|-------|-------------|
| 0x80000000 (init) | 1 | yes — the call at PC 0x00112408 before the loop |
| 0x00000004 (poll) | **1** | yes — the loop iterated exactly once and exited |
| other (= 2) | 1 | the post-loop call at PC 0x00112434 with $a0=2 |
**Loop iterations dropped from 181,494 → 1.** That's a 181k× collapse.
Ch294's gate identification was exactly right.
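The gate can be modelled in a few lines of Python. This is a minimal sketch, not the RTL and not qbert's code: `run_wait_loop`, the two policy functions, and the watchdog limit are hypothetical names chosen for illustration. Only the mask `0x00020000`, the `$a0 == 4` poll argument, and the two return policies come from the text above.

```python
BIT17_MASK = 0x00020000  # gate tested by qbert's wait loop (from Ch294)
POLL_ARG = 0x00000004    # $a0 value used by the polling call


def hle_pre_ch295(a0: int) -> int:
    """Old behaviour: syscall 0x7A returns 0 for every $a0."""
    return 0


def hle_ch295(a0: int) -> int:
    """Ch295 behaviour: bit 17 set when $a0 == 4, otherwise 0."""
    return BIT17_MASK if a0 == POLL_ARG else 0


def run_wait_loop(hle, watchdog: int = 1000):
    """Poll until bit 17 is set or the (hypothetical) watchdog expires.

    Returns (iterations, exited_cleanly).
    """
    for iteration in range(1, watchdog + 1):
        retval = hle(POLL_ARG)
        if retval & BIT17_MASK:
            return iteration, True
    return watchdog, False


if __name__ == "__main__":
    print("pre-Ch295:", run_wait_loop(hle_pre_ch295))
    print("Ch295:    ", run_wait_loop(hle_ch295))
```

With the old policy the loop only ends when the watchdog fires; with the Ch295 policy it exits on the first poll, which is the same shape as the `count_a0_4=1` observation.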
## What landed
### `rtl/ee/ee_core_stub.sv` — $a0-aware HLE
```sv
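32'h0000_007A: begin
    if (regfile[4] == 32'h0000_0004) begin
        // $a0 == 4 (poll call): report bit 17 set
        regfile[2] <= 32'h0002_0000;
        gpr128[2]  <= {96'd0, 32'h0002_0000};
    end else begin
        // every other $a0 keeps the $v0 = 0 behaviour
        regfile[2] <= 32'd0;
        gpr128[2]  <= 128'd0;
    end
    // retire the syscall and fetch the next instruction
    pc           <= pc + 32'd4;
    retire_pulse <= 1'b1;
    state        <= S_IFETCH_REQ;
end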
```
The HLE branches on `regfile[4]` (= `$a0`). For `$a0 == 4`, return
bit-17-set; otherwise return 0. Documented in the RTL comment as an
**experimental** unblock — not architectural truth. If qbert
misbranches downstream, this gets rolled back in favor of SDK
semantics or interrupt-delivery modeling.
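For cross-checking simulation output, the arm's visible effects can be mirrored in a small software model. This is a sketch under stated assumptions: `syscall_7a_model` and its dictionary keys are hypothetical names, and the model covers only the three effects written in the RTL snippet (32-bit `$v0`, the zero-extended 128-bit `$v0`, and the PC advance).

```python
MASK32 = 0xFFFF_FFFF
MASK128 = (1 << 128) - 1


def syscall_7a_model(a0: int, pc: int) -> dict:
    """Software mirror of the Ch295 0x7A arm (illustrative, not the RTL)."""
    v0 = 0x0002_0000 if (a0 & MASK32) == 0x0000_0004 else 0
    return {
        "regfile_v0": v0,              # 32-bit view of $v0
        "gpr128_v0": v0 & MASK128,     # upper 96 bits are zero, like {96'd0, v0}
        "next_pc": (pc + 4) & MASK32,  # syscall retires, PC advances by 4
    }


if __name__ == "__main__":
    # The three $a0 classes the runner observer saw for qbert.
    for a0 in (0x8000_0000, 0x0000_0004, 0x0000_0002):
        result = syscall_7a_model(a0, pc=0x0011_2408)
        print(f"a0=0x{a0:08X} -> v0=0x{result['regfile_v0']:08X}")
```

Keeping the model this narrow is deliberate: it encodes the experimental patch as written, so it has to be rolled back together with the RTL if the patch is.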
### `tb_ee_core_syscall_hle.sv` — extended with the $a0=4 subcase
Six new BIOS slots (`S_ORI_A0_4`, `S_ORI_V1_7A_4`, `S_SYS_7A_4`,
`S_LUI_EXP_4`, `S_BNE_7A_4`, `S_DS_7A_4`) cover the $a0=4 case:
```
ori $a0, $0, 4 ; $a0 = 4
ori $v1, $0, 0x7A ; $v1 = 0x7A
syscall ; → $v0 = 0x00020000
lui $t1, 0x2 ; $t1 = 0x00020000 (expected)
bne $v0, $t1, FAIL ; verify
nop
```
Plus a new latch (`v0_after_7A_a0_4` / `seen_7A_a0_4_return`) +
assertion + display field. Existing 0x7A subcase ($a0=0, $v0=0)
unchanged. Result:
```
$v0_after_7A=0x00000000 $v0_after_7A_a0_4=0x00020000
```
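As an illustration of what the six slots hold, the sequence can be encoded with the standard MIPS I-type layout (opcode, rs, rt, 16-bit immediate). This is a sketch, not the testbench source: `i_type` and `encode_7a_a0_4_subcase` are hypothetical helpers, and the branch offset to `FAIL` is left as a parameter because the text does not give it.

```python
from typing import List

# Standard MIPS register numbers for the registers used in the subcase.
REG = {"$0": 0, "$v0": 2, "$v1": 3, "$a0": 4, "$t1": 9}


def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a MIPS I-type instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_7a_a0_4_subcase(fail_offset_words: int) -> List[int]:
    """Encode the six-instruction $a0=4 subcase as 32-bit words."""
    return [
        i_type(0x0D, REG["$0"], REG["$a0"], 4),     # ori $a0, $0, 4
        i_type(0x0D, REG["$0"], REG["$v1"], 0x7A),  # ori $v1, $0, 0x7A
        0x0000_000C,                                # syscall
        i_type(0x0F, 0, REG["$t1"], 0x2),           # lui $t1, 0x2
        i_type(0x05, REG["$v0"], REG["$t1"], fail_offset_words),  # bne $v0, $t1, FAIL
        0x0000_0000,                                # nop (branch delay slot)
    ]


if __name__ == "__main__":
    for word in encode_7a_a0_4_subcase(fail_offset_words=0):
        print(f"0x{word:08X}")
```

The offset is counted in instruction words relative to the delay slot, as usual for MIPS branches; the value 0 above is a placeholder, not the testbench's real offset.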
### `tb_ee_core_elf_runner.sv` — per-$a0-class counters
New `syscall_0x7A_split` SUMMARY line shows count_a0_4 /
count_a0_0x80000000 / count_a0_other separately, plus
`first_v0_after` and `last_v0_after` for the actual returned $v0
sampled one cycle after retire (after the NBA commits).
These counters are the key Ch295 instrumentation: at a glance you
can see whether qbert's $a0-class distribution matches expectations
and whether the wait loop is collapsing or still spinning.
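A small log-scraping helper makes that glance mechanical. This is a sketch that assumes the SUMMARY line is a run of `key=value` tokens, as in the qbert output quoted earlier; `parse_split`, `loop_collapsed`, and the `max_polls` threshold are hypothetical and not part of the runner.

```python
import re

# key=value tokens; values are decimal or 0x-prefixed hex.
_PAIR = re.compile(r"(\w+)=(0x[0-9A-Fa-f]+|\d+)")


def parse_split(summary: str) -> dict:
    """Extract the integer fields that follow 'syscall_0x7A_split'."""
    _, _, payload = summary.partition("syscall_0x7A_split")
    fields = {}
    for key, value in _PAIR.findall(payload):
        base = 16 if value.lower().startswith("0x") else 10
        fields[key] = int(value, base)
    return fields


def loop_collapsed(fields: dict, max_polls: int = 1) -> bool:
    """True when the $a0=4 poll count is at or below the chosen threshold."""
    return fields.get("count_a0_4", 0) <= max_polls


if __name__ == "__main__":
    line = (
        "syscall_0x7A_split = count_a0_4=1 count_a0_0x80000000=1 "
        "count_a0_other=1\nlast_a0=0x00000002"
    )
    fields = parse_split(line)
    print(fields, "collapsed:", loop_collapsed(fields))
```

The regex tolerates the line being wrapped across two lines, since it only looks at individual tokens after the marker.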
## qbert progression
| Chapter | Blocker | retire_count | Notes |
|---|---|---|---|
| Post-Ch293 (syscall 0x7A returns 0) | wait-loop spin | 1,661,413 (watchdog) | hot_pc=0x0011242C |
| **Post-Ch295 ($a0-aware 0x7A)** | **syscall $v1=0x79 at 0x00111D94** | **27,996** | hot_pc=0x00112354 |
The 1.66M → 27,996 retire-count drop is misleading on its own —
the Ch293 number was a watchdog total that included 181k spinning
loop iterations. The MEANINGFUL signal is:
- Wait loop iterations: 181,494 → **1**
- Next blocker shape: from `elf_timeout_with_hot_pc` (no progress) to
`elf_first_unhandled_syscall` (concrete next demand)
That's a clean phase change from "stuck" to "next problem."
## Ch296 framing — syscall 0x79
The new blocker:
- `$v1 = 0x79` (= 121)
- `$a0 = 0x80000000` (kseg0 base — same as the 0x7A init call!)
- `$a1 = 0x00000000`
- `$a2 = 0x00000000`
- `$a3 = 0x001328C0` (same global context pointer)
- PC = `0x00111D94`
The ps2sdk syscall table lists slot 121 (0x79) as `SifSetReg`, next to
`SifGetReg` at 122 (0x7A); `ResetEE` is syscall 1, not 121. The arg
shape ($a0 = kseg0 base, $a3 = ctx)
suggests **a cleanup/finalize call symmetric to one of the earlier
init calls**. Note PC `0x00111D94` is close to `0x00111D24` (the
Ch289 syscall 0x78 site) — adjacent in the same kernel-wrapper
neighborhood.
Per the Ch285/289/290/291/293 precedent: another narrow $v0=0
extension + runner observer for syscall 0x79. Probably one
chapter. If qbert misbranches downstream, examine $a0/$a3 shapes
for hints.
## Notes on the experimental nature of Ch295
This chapter explicitly violates one principle: **the HLE return
value for syscall 0x7A is now a *hardcoded answer to qbert's
specific question*, not a model of any real PS2 kernel state.**
If a different ELF calls syscall 0x7A($a0=4), it'll get bit 17 set
unconditionally — which may or may not be correct for that ELF.
Codex framed this as acceptable for the falsifiable experiment:
"if it advances meaningfully, Ch296 identifies what bit 17
represents." We did advance meaningfully. The semantic question
("what does bit 17 actually mean in real PS2 kernel state?") is
deferred to whenever a second consumer of syscall 0x7A surfaces.
Risks logged:
- A different ELF might call syscall 0x7A($a0=4) expecting bit 17
to be 0 (e.g., a "not ready yet" semantic). For qbert, "ready"
= bit-17-set works. For other ELFs, the answer might differ.
- If qbert's downstream code reads syscall 0x7A($a0=4) more than
once per "event," we might see the same "ready" response too
many times — possibly causing duplicate event handling.
The runner observer's `count_a0_4=1` for qbert mitigates the second
risk (repeated "ready" responses) for this specific run.
## Files changed
- `rtl/ee/ee_core_stub.sv` — 1 dispatcher case modified
($a0-aware branch, ~10 LOC delta).
- `sim/tb/integration/tb_ee_core_syscall_hle.sv` — 6 new slots +
1 latch + 1 assertion + 1 display field.
- `sim/tb/integration/tb_ee_core_elf_runner.sv` — 3 new counter
signals + observer arm + SUMMARY line.
No new TB, no new Makefile target; regression count unchanged at
**176/176**.
## Pattern review (25 chapters)
| Ch | Pattern | Effect on qbert |
|----|---------|-----------------|
| 286 EI / 292 SYNC | narrow opcode accept | -- |
| 287/288 DMAC MMIO | new stubs | unmapped_mmio → 0 |
| 285/289/290/291/293 syscall HLE | narrow $v0=0 cases | each unlocks +few retires to +1.6M |
| 294 wait autopsy | observation-only | named the gate |
| **295 experimental $a0-aware HLE** | falsifiable patch | **loop iterations: 181,494 → 1** |
Ch295 is the first chapter where the HLE return value is
**context-dependent** rather than constant. The runner observer's
per-arg-class split made this falsifiable: the count_a0_4=1 fact
proves the patch worked, and the verdict shape change (timeout →
unhandled_syscall) proves qbert progressed semantically.
## Regression
**176/176 PASS** (unchanged from Ch294; no new TB).