# Ch295 closeout: Strategy A worked, wait loop exited in one iteration

**Status:** Closed. Codex's Strategy A ($a0-aware experimental HLE
patch) worked **first try**. **Verdict from re-running qbert.elf:**
`elf_first_unhandled_syscall (pc=0x00111D94 $v1=0x79 (=121))`.
qbert exited the Ch294 wait loop after exactly one iteration and
advanced into new code, hitting the next syscall blocker.

## The Ch294 hypothesis confirmed

Ch294's diagnosis: qbert spins forever because syscall 0x7A($a0=4)
returns 0, so `(retval & 0x00020000) == 0` always holds and bit 17
is never set. Ch295 patched the HLE to return `0x00020000` when
`$a0 == 4`.
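
The gate described above can be sketched in a few lines of Python. This is a minimal model, not the qbert code: the names `GATE_MASK`, `keeps_spinning`, and `iterations_until_exit` are hypothetical, and the stream of return values stands in for successive syscall 0x7A($a0=4) results.

```python
from itertools import islice, repeat
from typing import Iterable, Optional

GATE_MASK = 0x0002_0000  # bit 17, the gate Ch294 identified


def keeps_spinning(retval: int) -> bool:
    """True while the wait loop's exit condition is not met."""
    return (retval & GATE_MASK) == 0


def iterations_until_exit(retvals: Iterable[int]) -> Optional[int]:
    """Return the 1-based poll count at which the loop exits.

    Returns None if the samples run out first, which models a watchdog timeout.
    """
    for n, retval in enumerate(retvals, start=1):
        if not keeps_spinning(retval):
            return n
    return None


# A constant-0 return never opens the gate (capped at 1000 samples here).
print(iterations_until_exit(islice(repeat(0), 1000)))
# A return with bit 17 set opens it on the first poll.
print(iterations_until_exit([0x0002_0000]))
```

Only bit 17 matters to the predicate, so any return value with that bit set would exit the loop in this model.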

**Result:** the wait loop iterated exactly once and exited. The
runner observer's `syscall_0x7A_split` line tells the whole story:

```
syscall_0x7A_split = count_a0_4=1 count_a0_0x80000000=1 count_a0_other=1
    last_a0=0x00000002
```

| $a0 class | Calls | Matches Ch294? |
|-----------|-------|----------------|
| 0x80000000 (init) | 1 | yes: the call at PC 0x00112408 before the loop |
| 0x00000004 (poll) | **1** | yes: the loop iterated exactly once and exited |
| other (= 2) | 1 | the post-loop call at PC 0x00112434 with $a0=2 |

**Loop iterations dropped from 181,494 → 1.** That's a 181k× collapse.
Ch294's gate identification was exactly right.

## What landed

### `rtl/ee/ee_core_stub.sv`: $a0-aware HLE

```sv
32'h0000_007A: begin
  // Experimental (Ch295): not architectural truth.
  if (regfile[4] == 32'h0000_0004) begin   // $a0 == 4 (poll)
    regfile[2] <= 32'h0002_0000;           // $v0 = bit 17 set
    gpr128[2]  <= {96'd0, 32'h0002_0000};  // zero-extended 128-bit copy
  end else begin
    regfile[2] <= 32'd0;                   // any other $a0: $v0 = 0
    gpr128[2]  <= 128'd0;
  end
  pc           <= pc + 32'd4;              // step past the syscall
  retire_pulse <= 1'b1;
  state        <= S_IFETCH_REQ;
end
```

The HLE branches on `regfile[4]` (= `$a0`). For `$a0 == 4` it returns
a value with bit 17 set; otherwise it returns 0. The RTL comment
documents this as an **experimental** unblock, not architectural
truth. If qbert misbranches downstream, this gets rolled back in
favor of SDK semantics or interrupt-delivery modeling.
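
As a cross-check on the dispatcher case above, here is a small Python reference model of the same state update. It is a sketch under assumptions: `EeState` and `syscall_7a` are hypothetical names, only the three fields touched by the case are modeled, and the starting PC in the usage lines is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List

MASK32 = 0xFFFF_FFFF


@dataclass
class EeState:
    """Hypothetical slice of the stub's state: PC plus both register views."""
    pc: int = 0
    regfile: List[int] = field(default_factory=lambda: [0] * 32)  # 32-bit view
    gpr128: List[int] = field(default_factory=lambda: [0] * 32)   # 128-bit view


def syscall_7a(state: EeState) -> None:
    """Mirror the $a0-aware case: $v0 = 0x00020000 iff $a0 == 4, else 0."""
    a0 = state.regfile[4] & MASK32
    v0 = 0x0002_0000 if a0 == 4 else 0
    state.regfile[2] = v0
    state.gpr128[2] = v0  # same as {96'd0, v0}: the upper 96 bits stay zero
    state.pc = (state.pc + 4) & MASK32


for a0 in (0x8000_0000, 0x0000_0004, 0x0000_0002):
    s = EeState(pc=0x1000)  # arbitrary PC, for illustration only
    s.regfile[4] = a0
    syscall_7a(s)
    print(f"a0=0x{a0:08X} v0=0x{s.regfile[2]:08X} pc=0x{s.pc:08X}")
```

A model like this could serve as a golden reference when more `$a0` classes are added, since the expected `$v0` for each class is stated in one place.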

### `tb_ee_core_syscall_hle.sv`: extended with the $a0=4 subcase

Six new BIOS slots (`S_ORI_A0_4`, `S_ORI_V1_7A_4`, `S_SYS_7A_4`,
`S_LUI_EXP_4`, `S_BNE_7A_4`, `S_DS_7A_4`) cover the $a0=4 case:

```
ori  $a0, $0, 4       ; $a0 = 4
ori  $v1, $0, 0x7A    ; $v1 = 0x7A
syscall               ; → $v0 = 0x00020000
lui  $t1, 0x2         ; $t1 = 0x00020000 (expected)
bne  $v0, $t1, FAIL   ; verify
nop                   ; branch delay slot
```

The subcase also adds a new latch pair (`v0_after_7A_a0_4` /
`seen_7A_a0_4_return`), an assertion, and a display field. The
existing 0x7A subcase ($a0=0, $v0=0) is unchanged. Result:

```
$v0_after_7A=0x00000000  $v0_after_7A_a0_4=0x00020000
```
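
For reference, the six-instruction sequence can be turned into instruction words with a few lines of Python. This is a sketch using the standard MIPS I-type layout; `i_type` and `encode_a0_4_subcase` are hypothetical helpers, and `fail_offset_words` is a placeholder because the distance to `FAIL` is not stated in this chapter.

```python
from typing import Dict

# Register numbers under the standard MIPS naming.
REG = {"$0": 0, "$v0": 2, "$v1": 3, "$a0": 4, "$t1": 9}


def i_type(opcode: int, rs: str, rt: str, imm: int) -> int:
    """Pack a MIPS I-type word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


def encode_a0_4_subcase(fail_offset_words: int) -> Dict[str, int]:
    """Map each slot name to its word; the branch offset is an assumption."""
    return {
        "S_ORI_A0_4":    i_type(0x0D, "$0", "$a0", 4),      # ori $a0, $0, 4
        "S_ORI_V1_7A_4": i_type(0x0D, "$0", "$v1", 0x7A),   # ori $v1, $0, 0x7A
        "S_SYS_7A_4":    0x0000_000C,                       # syscall
        "S_LUI_EXP_4":   i_type(0x0F, "$0", "$t1", 0x2),    # lui $t1, 0x2
        "S_BNE_7A_4":    i_type(0x05, "$v0", "$t1", fail_offset_words),  # bne
        "S_DS_7A_4":     0x0000_0000,                       # nop (delay slot)
    }


for slot, word in encode_a0_4_subcase(fail_offset_words=1).items():
    print(f"{slot:<14} 0x{word:08X}")
```

The words actually stored in the TB may differ in the branch offset; everything else follows from the opcode and register fields.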

### `tb_ee_core_elf_runner.sv`: per-$a0-class counters

The new `syscall_0x7A_split` SUMMARY line reports `count_a0_4`,
`count_a0_0x80000000`, and `count_a0_other` separately, plus
`first_v0_after` and `last_v0_after` for the actual returned $v0,
sampled one cycle after retire (after the NBA commits).

These counters are the key Ch295 instrumentation: at a glance you
can see whether qbert's $a0-class distribution matches expectations
and whether the wait loop is collapsing or still spinning.
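
The per-class bookkeeping can be illustrated with a short Python sketch. It is not the TB code: `a0_class` and `split_summary` are hypothetical names, the summary is emitted on a single line, and the `first_v0_after` / `last_v0_after` fields are left out.

```python
from collections import Counter
from typing import Sequence

FIELDS = ("count_a0_4", "count_a0_0x80000000", "count_a0_other")


def a0_class(a0: int) -> str:
    """Bucket a syscall 0x7A argument into one of the three counter classes."""
    a0 &= 0xFFFF_FFFF
    if a0 == 0x0000_0004:
        return "count_a0_4"
    if a0 == 0x8000_0000:
        return "count_a0_0x80000000"
    return "count_a0_other"


def split_summary(a0_trace: Sequence[int]) -> str:
    """Format the counters in the style of the syscall_0x7A_split line."""
    counts = Counter(a0_class(a0) for a0 in a0_trace)
    body = " ".join(f"{name}={counts[name]}" for name in FIELDS)
    last = f" last_a0=0x{a0_trace[-1] & 0xFFFF_FFFF:08X}" if a0_trace else ""
    return f"syscall_0x7A_split = {body}{last}"


# The three calls reported for qbert: init, one poll, post-loop.
print(split_summary([0x8000_0000, 0x0000_0004, 0x0000_0002]))
```

Feeding the same function a trace with many `$a0 == 4` entries would show the still-spinning shape: `count_a0_4` grows while the other two counters stay flat.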

## qbert progression

| Chapter | Blocker | retire_count | Notes |
|---|---|---|---|
| Post-Ch293 (syscall 0x7A returns 0) | wait-loop spin | 1,661,413 (watchdog) | hot_pc=0x0011242C |
| **Post-Ch295 ($a0-aware 0x7A)** | **syscall $v1=0x79 at 0x00111D94** | **27,996** | hot_pc=0x00112354 |

The 1.66M → 27,996 retire-count drop is misleading on its own:
the Ch293 number was a watchdog total that included 181k spinning
loop iterations. The meaningful signals are:

- Wait loop iterations: 181,494 → **1**
- Next blocker shape: from `elf_timeout_with_hot_pc` (no progress)
  → `elf_first_unhandled_syscall` (concrete next demand)

That's a clean phase change from "stuck" to "next problem."

## Ch296 framing: syscall 0x79

The new blocker:

- `$v1 = 0x79` (= 121)
- `$a0 = 0x80000000` (kseg0 base, same as the 0x7A init call)
- `$a1 = 0x00000000`
- `$a2 = 0x00000000`
- `$a3 = 0x001328C0` (same global context pointer)
- PC = `0x00111D94`

In the ps2sdk syscall numbering, 121 (0x79) is listed as `SifSetReg`,
adjacent to `SifSetDChain` (0x78) and `SifGetReg` (0x7A); `ResetEE`
is syscall 1, not 121. That name has not yet been checked against
qbert's behavior. The arg shape ($a0 = kseg0 base, $a3 = ctx)
suggests **a cleanup/finalize call symmetric to one of the earlier
init calls**. Note that PC `0x00111D94` is 0x70 bytes (28
instructions) past `0x00111D24` (the Ch289 syscall 0x78 site), in
the same kernel-wrapper neighborhood.

Following the Ch285/289/290/291/293 precedent, the plan is another
narrow $v0=0 extension plus a runner observer for syscall 0x79.
This is probably one chapter. If qbert misbranches downstream,
examine the $a0/$a3 shapes for hints.
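
The shape of that next step can be sketched as a table-driven dispatcher in Python. This is an illustrative model only: `dispatch`, `UnhandledSyscall`, and the two tables are hypothetical, and the tables list just the syscalls named in this chapter (0x78 and 0x7A) rather than everything the real stub handles.

```python
from typing import Callable, Dict


class UnhandledSyscall(Exception):
    """Raised where the runner would report elf_first_unhandled_syscall."""


def hle_zero(a0: int) -> int:
    """Narrow HLE case: always return $v0 = 0."""
    return 0


def hle_7a(a0: int) -> int:
    """Ch295 case: bit 17 set only for $a0 == 4."""
    return 0x0002_0000 if a0 == 4 else 0


def dispatch(table: Dict[int, Callable[[int], int]], pc: int, v1: int, a0: int) -> int:
    """Return $v0 for a handled syscall, or raise with a verdict-style message."""
    handler = table.get(v1)
    if handler is None:
        raise UnhandledSyscall(
            f"elf_first_unhandled_syscall (pc=0x{pc:08X} $v1=0x{v1:02X} (={v1}))"
        )
    return handler(a0)


CH295_TABLE = {0x78: hle_zero, 0x7A: hle_7a}
CH296_TABLE = {**CH295_TABLE, 0x79: hle_zero}  # the proposed narrow extension

try:
    dispatch(CH295_TABLE, 0x0011_1D94, 0x79, 0x8000_0000)
except UnhandledSyscall as exc:
    print(exc)
print(dispatch(CH296_TABLE, 0x0011_1D94, 0x79, 0x8000_0000))
```

Whether `$v0 = 0` is the right answer for 0x79 is exactly what the Ch296 run would test; the model only shows that one table entry changes the verdict.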

## Notes on the experimental nature of Ch295

This chapter explicitly violates one principle: **the HLE return
value for syscall 0x7A is now a *hardcoded answer to qbert's
specific question*, not a model of any real PS2 kernel state.**
If a different ELF calls syscall 0x7A($a0=4), it will get bit 17
set unconditionally, which may or may not be correct for that ELF.

Codex framed this as acceptable for the falsifiable experiment:
"if it advances meaningfully, Ch296 identifies what bit 17
represents." We did advance meaningfully. The semantic question
("what does bit 17 actually mean in real PS2 kernel state?") is
deferred until a second consumer of syscall 0x7A surfaces.

Risks logged:

- A different ELF might call syscall 0x7A($a0=4) expecting bit 17
  to be 0 (e.g., a "not ready yet" semantic). For qbert, "ready"
  = bit-17-set works. For other ELFs, the answer might differ.
- If qbert's downstream code reads syscall 0x7A($a0=4) more than
  once per "event," it might see the same "ready" response too
  many times, possibly causing duplicate event handling.

The runner observer's `count_a0_4=1` for qbert mitigates the second
risk for this specific run.

## Files changed

- `rtl/ee/ee_core_stub.sv`: 1 dispatcher case modified
  ($a0-aware branch, ~10 LOC delta).
- `sim/tb/integration/tb_ee_core_syscall_hle.sv`: 6 new slots +
  1 latch + 1 assertion + 1 display field.
- `sim/tb/integration/tb_ee_core_elf_runner.sv`: 3 new counter
  signals + observer arm + SUMMARY line.

No new TB, no new Makefile target; regression count unchanged at
**176/176**.

## Pattern review (25 chapters)

| Ch | Pattern | Effect on qbert |
|----|---------|-----------------|
| 286 EI / 292 SYNC | narrow opcode accept | -- |
| 287/288 DMAC MMIO | new stubs | unmapped_mmio → 0 |
| 285/289/290/291/293 syscall HLE | narrow $v0=0 cases | each unlocks between a few and ~1.6M additional retires |
| 294 wait autopsy | observation-only | named the gate |
| **295 experimental $a0-aware HLE** | falsifiable patch | **loop iterations: 181,494 → 1** |

Ch295 is the first chapter where the HLE return value is
**context-dependent** rather than constant. The runner observer's
per-arg-class split made this falsifiable: `count_a0_4=1` shows
the patch worked, and the change in verdict shape (timeout →
unhandled_syscall) shows that qbert progressed semantically.

## Regression

**176/176 PASS** (unchanged from Ch294; no new TB).