ec82764bef
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
196 lines
8.7 KiB
Markdown
196 lines
8.7 KiB
Markdown
# Ch273 closeout — minimal EE syscall HLE; qbert clears its kernel-call prolog, next blocker is BEQL
|
||
|
||
**Status:** Closed. Codex's spec implemented exactly: minimal
|
||
HLE dispatcher for three crt0 syscalls (`EndOfHeap`,
|
||
`InitMainThread`, `FlushCache`), gated behind a parameter so
|
||
existing TBs are unaffected. **Verdict from re-running
|
||
qbert.elf:** `elf_first_unsupported_opcode (pc=0x001000C0
|
||
instr=0x50600004)` — **BEQL** (branch on equal likely), MIPS-II.
|
||
That frames Ch274.
|
||
|
||
## Numbers across the opcode/syscall chapters
|
||
|
||
| Chapter | Blocker | qbert retire_count | Verdict |
|
||
|---------|---------|---------------------|---------|
|
||
| Ch270 (init) | SQ at 0x00100024 | 12 | first_unsupported_opcode |
|
||
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 | first_unsupported_opcode |
|
||
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 | `elf_halted` |
|
||
| **Post-Ch273 (SYSCALL HLE)** | **BEQL at 0x001000C0** | **26,980** | **`elf_first_unsupported_opcode`** |
|
||
|
||
20 more retires this chapter: all 3 syscalls dispatched, the
|
||
prolog used the returns to set up `$sp` and a small initializer-
|
||
table walker, and the trap fires at the FIRST instruction the
|
||
crt0 emits that we don't decode — `BEQL`.
|
||
|
||
## What landed
|
||
|
||
### RTL — 2 surgical additions in `ee_core_stub.sv`
|
||
|
||
1. **Parameter**: `EE_SYSCALL_HLE_ENABLE` (default `1'b0`) +
|
||
`SYSCALL_HEAP_END` (default `32'h001E_0000`). Default-off so
|
||
every existing TB whose `syscall` is a "halt-PASS-marker"
|
||
(addi/slti/etc.) keeps its semantics.
|
||
2. **Dispatcher**: new `else if (EE_SYSCALL_HLE_ENABLE)` branch
|
||
after the Ch199 special case. `case (regfile[3])` on `$v1`:
|
||
|
||
| `$v1` | name | `$v0` returned | resume |
|
||
|-------|----------------|-----------------------|-------------|
|
||
| 0x3C | EndOfHeap | `SYSCALL_HEAP_END` | PC + 4 |
|
||
| 0x3D | InitMainThread | 0 | PC + 4 |
|
||
| 0x64 | FlushCache | 0 | PC + 4 |
|
||
| other | (unhandled) | (none) | **halt** |
|
||
|
||
`pc <= pc + 4` (per Codex's correction — this is normal
|
||
user-code SYSCALL resume, NOT RFE; RFE is Ch199's path).
|
||
|
||
### Focused TB — `tb_ee_core_syscall_hle`
|
||
|
||
Four cases:
|
||
1. `syscall` with `$v1=0x3C` → verify `$v0 = 0x001E0000`
|
||
2. `syscall` with `$v1=0x3D` → verify `$v0 = 0`
|
||
3. `syscall` with `$v1=0x64` → verify `$v0 = 0`
|
||
4. `syscall` with `$v1=0x7777` → verify HALT (PASS marker)
|
||
|
||
Independent verification: captures `$v0` at the cycle AFTER each
|
||
known syscall retires AND runs a `BNE $v0, expected, FAIL` chain.
|
||
Both must agree. Final PC + `$v1=0x7777` post-halt confirms we
|
||
landed on the unhandled-syscall path correctly.
|
||
|
||
Result: `retired=17 halt=1 trap=0 errors=0 PASS`.
|
||
|
||
### Runner update — `tb_ee_core_elf_runner.sv`
|
||
|
||
- Wires `EE_SYSCALL_HLE_ENABLE=1` on the ee_core_stub.
|
||
- Halt-time SUMMARY now includes the live register snapshot:
|
||
```
|
||
saw_halt = 1 at_pc=0x... $v1=0x... $a0=0x... $a1=0x... $a2=0x... $a3=0x...
|
||
```
|
||
- New verdict shape `elf_first_unhandled_syscall` when the halt
|
||
is on a `0x0000000C` instruction with unknown `$v1`. (For this
|
||
qbert run, the dispatcher handled all 3 and the trap was a
|
||
separate opcode issue — but the verdict shape is ready for
|
||
whenever the next unknown SYSCALL surfaces.)
|
||
|
||
### Makefile
|
||
|
||
- `tb_ee_core_syscall_hle` target.
|
||
- Added to both regression lists.
|
||
- Regression: 160 → **161**.
|
||
|
||
## Codex Ch273 acceptance — line-by-line
|
||
|
||
| Requirement | Status |
|
||
|----------------------------------------------------------------------------|--------|
|
||
| Minimal HLE handler in ee_core_stub for normal user-mode SYSCALL | ✅ |
|
||
| $v1=0x3C EndOfHeap → conservative top-of-RAM, PC+=4 | ✅ |
|
||
| $v1=0x3D InitMainThread → success ($v0=0), no scheduler mutation, PC+=4 | ✅ |
|
||
| $v1=0x64 FlushCache → no-op success, PC+=4 | ✅ |
|
||
| **Not RFE — PC = syscall PC + 4** | ✅ |
|
||
| Unhandled $v1 still halts; TB can read $v1/$a0-$a3 for verdict | ✅ |
|
||
| Focused TB: 3 syscalls in sequence + 1 unknown-fallback | ✅ |
|
||
| Regression unchanged for default-off | ✅ |
|
||
| Re-run qbert, report next blocker | ✅ |
|
||
|
||
## qbert disassembly around the new blocker
|
||
|
||
```
|
||
0x001000A0: lui $v0, 0x0013 ; $v0 = 0x00130000
|
||
0x001000A4: addiu $v0, $v0, 0xC800 ; $v0 = 0x0012C800
|
||
0x001000A8: lw $v1, 0($v0) ; $v1 = mem[0x0012C800]
|
||
0x001000AC: bne $v1, $0, +7*4 ; skip ahead if non-zero
|
||
0x001000B0: nop ; delay
|
||
0x001000B4: lui $v0, 0x0013
|
||
0x001000B8: addiu $v0, $v0, 0xC944 ; $v0 = 0x0012C944
|
||
0x001000BC: lw $v1, 0($v0) ; $v1 = mem[0x0012C944] (= 0 per halt $v1=0)
|
||
0x001000C0: beql $v1, $0, +4*4 ; <-- TRAPS HERE
|
||
0x001000C4: addiu $a0, $0, 0 ; delay slot (squashed if BEQL not taken)
|
||
0x001000C8: addiu $v0, $v1, 4
|
||
0x001000CC: lw $a0, 0($v0)
|
||
0x001000D0: addiu $a1, $v0, 4
|
||
0x001000D4: jal <constructor table walker>
|
||
```
|
||
|
||
This is the C++ static-constructor walker (or a similar
|
||
initialization table). The BEQL checks whether the table head
|
||
pointer is null — and **branch-likely semantics are
|
||
load-bearing**: the delay slot at `0x001000C4` clobbers `$a0`
|
||
to 0 only if the branch is taken. If we naïvely decode BEQL as
|
||
plain BEQ, the delay slot would execute on the not-taken path
|
||
too, silently corrupting `$a0`.
|
||
|
||
## Recommendation for Codex's Ch274
|
||
|
||
**Implement BEQL with proper "squash on not-taken" semantics.**
|
||
|
||
MIPS-II "branch likely" family: BEQL (0x14), BNEL (0x15), BLEZL
|
||
(0x16), BGTZL (0x17), and REGIMM BLTZL/BGEZL/BLTZALL/BGEZALL.
|
||
Compilers (especially older PS2 SDK gcc with `-fmoveloop-invariants`
|
||
or default for-loops) emit these as the canonical loop branch.
|
||
|
||
Three Ch274 framings, in order of scope:
|
||
|
||
1. **BEQL only.** Smallest change. Decode `is_beql`, share
|
||
`branch_taken` logic with BEQ (rs==rt), but unlike BEQ, when
|
||
not taken: PC += 8 (skip both the branch and its delay slot),
|
||
no delay-slot execute. Adds `is_branch_likely` distinction
|
||
in the retire/PC-advance logic.
|
||
2. **BEQL + BNEL** (the two most common). BNEL is the inverse
|
||
condition (rs!=rt); same likely semantics. Both surface as
|
||
`0x14` (BEQL) and `0x15` (BNEL) opcodes.
|
||
3. **Full branch-likely family.** BEQL/BNEL/BLEZL/BGTZL + REGIMM
|
||
variants. Bigger surface; usually you only need 1–2 of these
|
||
per chapter until qbert/a later ELF surfaces another.
|
||
|
||
**My read: (1) — BEQL only.** Same one-question-one-chapter
|
||
pattern. The next blocker after BEQL might or might not be
|
||
BNEL; let the runner pick.
|
||
|
||
The implementation hook: existing ee_core_stub has
|
||
`branch_pending` + `instr_in_delay_slot` + a `branch_taken`
|
||
combinational signal. For BEQL we need to gate "set
|
||
branch_pending + queue delay-slot execution" on `branch_taken`,
|
||
and on not-taken just `pc <= pc + 8` directly (skip the delay
|
||
slot). Probably a 5–8 line change.
|
||
|
||
Focused TB: 3 cases mirroring Ch272 shape —
|
||
- BEQL taken: `$v1==$0`, target reached, delay slot executed
|
||
(writes $a0 to a sentinel value).
|
||
- BEQL not-taken: `$v1!=$0`, target NOT reached, delay slot
|
||
squashed (sentinel value NOT written; the original $a0
|
||
preserved).
|
||
- Cross-check vs BEQ: identical inputs through a BEQ should
|
||
produce different $a0 on the not-taken case (BEQ's delay
|
||
slot fires).
|
||
|
||
## Files changed
|
||
|
||
- `rtl/ee/ee_core_stub.sv` — 2 surgical additions (parameter +
|
||
dispatcher case statement, ~30 LOC).
|
||
- `sim/tb/integration/tb_ee_core_syscall_hle.sv` — new focused TB.
|
||
- `sim/tb/integration/tb_ee_core_elf_runner.sv` — enable
|
||
`EE_SYSCALL_HLE_ENABLE`; new halt-time register snapshot;
|
||
`elf_first_unhandled_syscall` verdict shape.
|
||
- `sim/Makefile` — target + both regression lists.
|
||
|
||
## Regression
|
||
|
||
In flight; expected **161/161** (was 160, +1 for
|
||
`tb_ee_core_syscall_hle`).
|
||
|
||
## Process notes
|
||
|
||
- **Codex's PC+4 correction was right.** My initial closeout
|
||
draft for Ch272 suggested "RFE-style return" — Codex caught
|
||
it. RFE is for the Ch199 `_ReturnFromException` path; normal
|
||
user-mode `syscall` resumes at PC+4, no Status stack pop.
|
||
Filed this in the memory entry so a future chapter doesn't
|
||
repeat the same wrong assumption.
|
||
- **Parameter gating is the right call.** Existing TBs that use
|
||
`syscall` as a halt-PASS-marker would have broken if their
|
||
`$v1` happened to be 0x3C/0x3D/0x64. Gating preserved 160
|
||
passing tests trivially; only the ELF runner opts in.
|
||
- **The verdict shape now distinguishes 4 halts**: trap (strict
|
||
opcode), unmapped MMIO, halt-on-syscall (with $v1/$a0..$a3),
|
||
halt-on-other (unexpected). The runner is becoming a real
|
||
triage tool.
|