thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch273 closeout — minimal EE syscall HLE; qbert clears its kernel-call prolog, next blocker is BEQL

Status: Closed. Codex's spec implemented exactly: minimal HLE dispatcher for three crt0 syscalls (EndOfHeap, InitMainThread, FlushCache), gated behind a parameter so existing TBs are unaffected. Verdict from re-running qbert.elf: elf_first_unsupported_opcode (pc=0x001000C0 instr=0x50600004), which decodes as BEQL (branch on equal likely), a MIPS-II instruction. That frames Ch274.

Numbers across the opcode/syscall chapters

| Chapter | Blocker | qbert retire_count | Verdict |
| --- | --- | --- | --- |
| Ch270 (init) | SQ at 0x00100024 | 12 | first_unsupported_opcode |
| Post-Ch271 (SQ) | DADDU at 0x00100068 | 26,958 | first_unsupported_opcode |
| Post-Ch272 (DADDU) | SYSCALL at 0x00100070 | 26,960 | elf_halted |
| Post-Ch273 (SYSCALL HLE) | BEQL at 0x001000C0 | 26,980 | elf_first_unsupported_opcode |

20 more retires this chapter: all 3 syscalls dispatched, the prolog used the returns to set up $sp and a small initializer-table walker, and the trap fires at the first instruction crt0 emits that we don't decode: BEQL.

What landed

RTL — 2 surgical additions in ee_core_stub.sv

  1. Parameter: EE_SYSCALL_HLE_ENABLE (default 1'b0) + SYSCALL_HEAP_END (default 32'h001E_0000). Default-off so every existing TB whose syscall is a "halt-PASS-marker" (addi/slti/etc.) keeps its semantics.

  2. Dispatcher: new else if (EE_SYSCALL_HLE_ENABLE) branch after the Ch199 special case. case (regfile[3]) on $v1:

    | $v1 | name | $v0 returned | resume |
    | --- | --- | --- | --- |
    | 0x3C | EndOfHeap | SYSCALL_HEAP_END | PC + 4 |
    | 0x3D | InitMainThread | 0 | PC + 4 |
    | 0x64 | FlushCache | 0 | PC + 4 |
    | other | (unhandled) | (none) | halt |

    pc <= pc + 4 (per Codex's correction — this is normal user-code SYSCALL resume, NOT RFE; RFE is Ch199's path).
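A behavioral sketch of the dispatch table above, written in Python so it can be run standalone. This is not the RTL in ee_core_stub.sv: the function name `syscall_hle`, the list-based register file, and the `hle_enable` flag are illustrative assumptions. Only the $v1 codes, the returned $v0 values, and the PC + 4 resume come from the table. The disabled path simply halts and ignores the Ch199 special case.

```python
# Behavioral model of the Ch273 syscall HLE table (illustrative, not the RTL).
SYSCALL_HEAP_END = 0x001E_0000  # default of the SYSCALL_HEAP_END parameter

V0, V1 = 2, 3  # MIPS register numbers for $v0 and $v1

# $v1 code -> (name, value returned in $v0)
HLE_TABLE = {
    0x3C: ("EndOfHeap", SYSCALL_HEAP_END),
    0x3D: ("InitMainThread", 0),
    0x64: ("FlushCache", 0),
}


def syscall_hle(regs, pc, hle_enable=True):
    """Return (next_pc, halted) for a SYSCALL at `pc`, updating `regs` in place."""
    entry = HLE_TABLE.get(regs[V1]) if hle_enable else None
    if entry is None:
        return pc, True                       # unhandled $v1: halt, regs untouched
    _name, ret = entry
    regs[V0] = ret
    return (pc + 4) & 0xFFFF_FFFF, False      # normal user-code resume, not RFE


regs = [0] * 32
regs[V1] = 0x3C
next_pc, halted = syscall_hle(regs, 0x0010_0070)
print(hex(next_pc), halted, hex(regs[V0]))    # 0x100074 False 0x1e0000
```

Passing `hle_enable=False` models the default-off gating: the same $v1 value then halts without writing $v0.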

Focused TB — tb_ee_core_syscall_hle

Four cases:

  1. syscall with $v1=0x3C → verify $v0 = 0x001E0000
  2. syscall with $v1=0x3D → verify $v0 = 0
  3. syscall with $v1=0x64 → verify $v0 = 0
  4. syscall with $v1=0x7777 → verify HALT (PASS marker)

Independent verification: captures $v0 at the cycle AFTER each known syscall retires AND runs a BNE $v0, expected, FAIL chain. Both must agree. Final PC + $v1=0x7777 post-halt confirms we landed on the unhandled-syscall path correctly.

Result: retired=17 halt=1 trap=0 errors=0 PASS.

Runner update — tb_ee_core_elf_runner.sv

  • Wires EE_SYSCALL_HLE_ENABLE=1 on the ee_core_stub.
  • Halt-time SUMMARY now includes the live register snapshot:
    saw_halt = 1  at_pc=0x... $v1=0x... $a0=0x... $a1=0x... $a2=0x... $a3=0x...
    
  • New verdict shape elf_first_unhandled_syscall when the halt is on a 0x0000000C instruction with unknown $v1. (For this qbert run, the dispatcher handled all 3 and the trap was a separate opcode issue — but the verdict shape is ready for whenever the next unknown SYSCALL surfaces.)

Makefile

  • tb_ee_core_syscall_hle target.
  • Added to both regression lists.
  • Regression: 160 → 161.

Codex Ch273 acceptance — line-by-line

| Requirement | Status |
| --- | --- |
| Minimal HLE handler in ee_core_stub for normal user-mode SYSCALL | |
| $v1=0x3C EndOfHeap → conservative top-of-RAM, PC+=4 | |
| $v1=0x3D InitMainThread → success ($v0=0), no scheduler mutation, PC+=4 | |
| $v1=0x64 FlushCache → no-op success, PC+=4 | |
| Not RFE: PC = syscall PC + 4 | |
| Unhandled $v1 still halts; TB can read $v1/$a0-$a3 for verdict | |
| Focused TB: 3 syscalls in sequence + 1 unknown-fallback | |
| Regression unchanged for default-off | |
| Re-run qbert, report next blocker | |

qbert disassembly around the new blocker

0x001000A0: lui   $v0, 0x0013          ; $v0 = 0x00130000
0x001000A4: addiu $v0, $v0, 0xC800     ; $v0 = 0x0012C800
0x001000A8: lw    $v1, 0($v0)          ; $v1 = mem[0x0012C800]
0x001000AC: bne   $v1, $0, +7*4        ; skip ahead if non-zero
0x001000B0: nop                         ; delay
0x001000B4: lui   $v0, 0x0013
0x001000B8: addiu $v0, $v0, 0xC944     ; $v0 = 0x0012C944
0x001000BC: lw    $v1, 0($v0)          ; $v1 = mem[0x0012C944]  (= 0 per halt $v1=0)
0x001000C0: beql  $v1, $0, +4*4        ; <-- TRAPS HERE
0x001000C4: addiu $a0, $0, 0           ; delay slot (squashed if BEQL not taken)
0x001000C8: addiu $v0, $v1, 4
0x001000CC: lw    $a0, 0($v0)
0x001000D0: addiu $a1, $v0, 4
0x001000D4: jal   <constructor table walker>

This is the C++ static-constructor walker (or a similar initialization table). The BEQL checks whether the table head pointer is null — and branch-likely semantics are load-bearing: the delay slot at 0x001000C4 clobbers $a0 to 0 only if the branch is taken. If we naïvely decode BEQL as plain BEQ, the delay slot would execute on the not-taken path too, silently corrupting $a0.
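To make the trap word concrete, the sketch below decodes 0x50600004 with the standard MIPS I-type field layout, computes the branch target, and reproduces the lui/addiu address arithmetic from the listing. The helper names (`decode_itype`, `branch_target`, `lui_addiu`) are hypothetical; the instruction word, PC, and immediates are the ones shown above.

```python
def sext16(value):
    """Sign-extend a 16-bit immediate to a Python int."""
    return value - 0x1_0000 if value & 0x8000 else value


def decode_itype(word):
    """Split a 32-bit MIPS I-type word into (opcode, rs, rt, signed imm16)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return opcode, rs, rt, sext16(word & 0xFFFF)


def branch_target(pc, imm):
    """Branch offsets count words from the delay-slot address (pc + 4)."""
    return (pc + 4 + (imm << 2)) & 0xFFFF_FFFF


def lui_addiu(upper, lower):
    """Address built by `lui r, upper` followed by `addiu r, r, lower`."""
    return ((upper << 16) + sext16(lower)) & 0xFFFF_FFFF


opcode, rs, rt, imm = decode_itype(0x5060_0004)
print(hex(opcode), rs, rt, imm)                 # 0x14 3 0 4
print(hex(branch_target(0x0010_00C0, imm)))     # 0x1000d4
print(hex(lui_addiu(0x0013, 0xC944)))           # 0x12c944
```

Opcode 0x14 with rs=3 and rt=0 is `beql $v1, $0`, and the computed target 0x001000D4 is the `jal` at the end of the listing. The last line shows why the pointer lands at 0x0012C944 even though the `lui` loads 0x0013: `addiu` sign-extends 0xC944 to a negative offset.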

Recommendation for Codex's Ch274

Implement BEQL with proper "squash on not-taken" semantics.

MIPS-II "branch likely" family: BEQL (0x14), BNEL (0x15), BLEZL (0x16), BGTZL (0x17), and REGIMM BLTZL/BGEZL/BLTZALL/BGEZALL. Compilers (especially older PS2 SDK gcc with -fmove-loop-invariants or default for-loops) emit these as the canonical loop branch.
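For reference, the whole family can be recognized from the primary opcode plus, for REGIMM, the rt field. The sketch below is a lookup model with a hypothetical function name, `branch_likely_mnemonic`; the encodings are the standard MIPS-II ones, and nothing here reflects how ee_core_stub.sv structures its decoder.

```python
# Primary opcodes (bits 31:26) of the branch-likely instructions.
LIKELY_OPCODES = {0x14: "BEQL", 0x15: "BNEL", 0x16: "BLEZL", 0x17: "BGTZL"}

# REGIMM (primary opcode 0x01) selects the operation with the rt field (bits 20:16).
REGIMM = 0x01
LIKELY_REGIMM_RT = {0x02: "BLTZL", 0x03: "BGEZL", 0x12: "BLTZALL", 0x13: "BGEZALL"}


def branch_likely_mnemonic(word):
    """Return the branch-likely mnemonic for a 32-bit word, or None."""
    opcode = (word >> 26) & 0x3F
    if opcode == REGIMM:
        return LIKELY_REGIMM_RT.get((word >> 16) & 0x1F)
    return LIKELY_OPCODES.get(opcode)


print(branch_likely_mnemonic(0x5060_0004))  # BEQL
print(branch_likely_mnemonic(0x1060_0004))  # None (plain BEQ, opcode 0x04)
```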

Three Ch274 framings, in order of scope:

  1. BEQL only. Smallest change. Decode is_beql, share branch_taken logic with BEQ (rs==rt), but unlike BEQ, when not taken: PC += 8 (skip both the branch and its delay slot), no delay-slot execute. Adds is_branch_likely distinction in the retire/PC-advance logic.
  2. BEQL + BNEL (the two most common). BNEL is the inverse condition (rs!=rt); same likely semantics. Both surface as 0x14 (BEQL) and 0x15 (BNEL) opcodes.
  3. Full branch-likely family. BEQL/BNEL/BLEZL/BGTZL + REGIMM variants. Bigger surface; usually you only need 1-2 of these per chapter until qbert or a later ELF surfaces another.

My read: (1) — BEQL only. Same one-question-one-chapter pattern. The next blocker after BEQL might or might not be BNEL; let the runner pick.

The implementation hook: the existing ee_core_stub has branch_pending + instr_in_delay_slot + a branch_taken combinational signal. For BEQL we need to gate "set branch_pending + queue delay-slot execution" on branch_taken, and on not-taken just pc <= pc + 8 directly (skip the delay slot). Probably a 5-8 line change.

Focused TB: 3 cases mirroring Ch272 shape —

  • BEQL taken: $v1==$0, target reached, delay slot executed (writes $a0 to a sentinel value).
  • BEQL not-taken: $v1!=$0, target NOT reached, delay slot squashed (sentinel value NOT written; the original $a0 preserved).
  • Cross-check vs BEQ: identical inputs through a BEQ should produce different $a0 on the not-taken case (BEQ's delay slot fires).
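The three cases can be prototyped against a small software model before the RTL change exists. The sketch below is a Python behavioral model, not the testbench: `run_branch`, the sentinel, the starting $a0, and the example PC are all hypothetical. It models only a `beq`/`beql $v1, $0` followed by a single `addiu $a0, $0, SENTINEL` delay slot.

```python
A0, V1 = 4, 3
SENTINEL = 0x5A5A        # value the delay slot writes to $a0 (hypothetical)
ORIGINAL_A0 = 0x1234     # $a0 before the branch (hypothetical)


def run_branch(v1_value, likely, pc=0x100, offset_words=4):
    """Execute `beq/beql $v1, $0, offset` plus its delay slot.

    Returns (next_pc, a0) after the branch and, if it runs, the delay slot.
    """
    regs = [0] * 32
    regs[A0] = ORIGINAL_A0
    regs[V1] = v1_value
    taken = regs[V1] == regs[0]
    if taken or not likely:
        regs[A0] = SENTINEL              # delay slot executes
    # Taken: branch target. Not taken: continue after the delay slot.
    # For BEQL the not-taken path is "pc += 8, no delay-slot execute".
    next_pc = pc + 4 + (offset_words << 2) if taken else pc + 8
    return next_pc, regs[A0]


# Case 1: BEQL taken -> target reached, delay slot executed.
assert run_branch(0, likely=True) == (0x114, SENTINEL)
# Case 2: BEQL not taken -> fall through, delay slot squashed.
assert run_branch(7, likely=True) == (0x108, ORIGINAL_A0)
# Case 3: BEQ with the same inputs -> same PC, but the delay slot fires.
assert run_branch(7, likely=False) == (0x108, SENTINEL)
print("all three cases behave as described")
```

Cases 2 and 3 end at the same PC and differ only in $a0, which is why the cross-check needs a sentinel that differs from the original register value.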

Files changed

  • rtl/ee/ee_core_stub.sv — 2 surgical additions (parameter + dispatcher case statement, ~30 LOC).
  • sim/tb/integration/tb_ee_core_syscall_hle.sv — new focused TB.
  • sim/tb/integration/tb_ee_core_elf_runner.sv — enable EE_SYSCALL_HLE_ENABLE; new halt-time register snapshot; elf_first_unhandled_syscall verdict shape.
  • sim/Makefile — target + both regression lists.

Regression

In flight; expected 161/161 (was 160, +1 for tb_ee_core_syscall_hle).

Process notes

  • Codex's PC+4 correction was right. My initial closeout draft for Ch272 suggested "RFE-style return" — Codex caught it. RFE is for the Ch199 _ReturnFromException path; normal user-mode syscall resumes at PC+4, no Status stack pop. Filed this in the memory entry so a future chapter doesn't repeat the same wrong assumption.
  • Parameter gating is the right call. Existing TBs that use syscall as a halt-PASS-marker would have broken if their $v1 happened to be 0x3C/0x3D/0x64. Gating preserved 160 passing tests trivially; only the ELF runner opts in.
  • The verdict shape now distinguishes 4 halts: trap (strict opcode), unmapped MMIO, halt-on-syscall (with $v1/$a0..$a3), halt-on-other (unexpected). The runner is becoming a real triage tool.