retroDE_ps2/docs/ch272_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


Ch272 closeout — DADDU implemented; qbert clears the prolog ALU work, hits SYSCALL #60

Status: Closed. Verdict from re-running qbert.elf: elf_halted. qbert ran past the DADDU cleanly and then executed SYSCALL at PC 0x00100070 with $v1 = 60 (0x3C, SetupThread, a.k.a. InitMainThread), the first kernel call in the standard PS2 crt0 prolog. That frames Ch273.

Numbers

| Metric | Ch270 (init) | Post-Ch271 (SQ) | Post-Ch272 (DADDU) |
|---|---|---|---|
| qbert retire_count | 12 | 26,958 | 26,960 |
| Verdict | first_unsupported_opcode | first_unsupported_opcode | elf_halted (new) |
| Blocker PC | 0x00100024 | 0x00100068 | 0x00100070 |
| Blocker instr / kind | 0x7C400000 (SQ) | 0x0080E02D (DADDU) | 0x0000000C (SYSCALL) |

The retire delta from Ch271 → Ch272 is small (+2) because the DADDU we implemented is at PC 0x00100068, immediately followed by addiu $v1, $0, 0x3C (the syscall number) and syscall. The core retires the DADDU + the ADDIU, then halts on the SYSCALL. The chain of next syscalls (61, 100) is queued up at 0x0010008C / 0x0010009C.

What landed

RTL — 4 surgical edits in ee_core_stub.sv

  1. localparam logic [5:0] FUNC_DADDU = 6'h2D alongside FUNC_ADDU.
  2. is_daddu logic decl + assign is_daddu = is_special && (func == FUNC_DADDU).
  3. Added is_daddu to the is_rtype_alu group.
  4. Added is_daddu to the (is_add || is_addu) arm of rtype_alu_wb — same low-32-bit add, no overflow trap.

Upper 32 bits of the 64-bit DADDU are silently dropped, exactly matching how ADDU already behaves in this stub. Documented in the RTL comment.
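A minimal Python reference model of that shortcut, assuming the stub keeps 32-bit registers (the note only says the upper half is dropped). The function names are hypothetical. daddu_arch and addu_arch follow the MIPS III definitions (64-bit wraparound for DADDU, sign-extended 32-bit result for ADDU), and daddu_stub mirrors the Ch272 low-32-bit add:

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of value to a 64-bit pattern."""
    value &= MASK32
    return value | 0xFFFF_FFFF_0000_0000 if value & 0x8000_0000 else value


def addu_arch(rs: int, rt: int) -> int:
    """Architectural ADDU: 32-bit add, result sign-extended to 64 bits, never traps."""
    return sext32(rs + rt)


def daddu_arch(rs: int, rt: int) -> int:
    """Architectural DADDU: full 64-bit add modulo 2**64, never traps."""
    return (rs + rt) & MASK64


def daddu_stub(rs: int, rt: int) -> int:
    """Hypothetical model of the Ch272 stub: same low-32-bit add as ADDU, upper half dropped."""
    return (rs + rt) & MASK32


# The three tb_ee_core_daddu cases as (rs, rt, expected low-32 result).
CASES = [(5, 3, 8), (5, 0, 5), (0x8000_0000, 0x8000_0000, 0)]

for rs, rt, want in CASES:
    assert daddu_stub(rs, rt) == want
    # Low halves always agree; the stub only loses the upper 32 bits.
    assert daddu_arch(sext32(rs), sext32(rt)) & MASK32 == want
```

The low 32 bits of a 64-bit sum depend only on the low 32 bits of the operands, so the stub and architectural DADDU can only disagree in the upper half, which this stub does not store.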

Focused TB — tb_ee_core_daddu

Three cases per Codex's spec:

  1. Normal add: daddu $t0, $a0, $a1 with $a0 = 5, $a1 = 3 → $t0 = 8.
  2. Move case (exact qbert encoding): builds the literal 0x0080E02D via enc_rtype() and asserts that the produced word equals 0x0080E02D before installing it, so a future regression in the encoder helper trips loudly here. Then daddu $gp, $a0, $zero with $a0 = 5 → $gp = 5.
  3. Wraparound: daddu $t3, $a2, $a2 with $a2 = 0x80000000 → $t3 = 0 (the low 32 bits wrap). No overflow trap; post-halt, trap_events == 0 confirms it.

Belt-and-braces hierarchical register peeks after halt for $t0/$gp/$t3 so a future BNE-chain regression can't silently pass with wrong values.

Result: retired=17 halt=1 trap=0 pc=0xbfc00138 errors=0 PASS. Final PC at the PASS syscall slot.
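The move-case encoding guard can also be reproduced outside the simulator. The Python sketch below assumes a helper with the argument order (rs, rt, rd, shamt, func). The real enc_rtype() lives in the SystemVerilog TB, and its signature may differ. Register numbers are the standard MIPS ABI assignments.

```python
FUNC_DADDU = 0x2D  # SPECIAL function code, same value as the RTL localparam
REG = {"zero": 0, "v0": 2, "a0": 4, "a1": 5, "a2": 6, "t0": 8, "t3": 11, "gp": 28, "sp": 29}


def enc_rtype(rs: int, rt: int, rd: int, shamt: int, func: int) -> int:
    """Encode a SPECIAL (opcode 0) R-type instruction word, rejecting oversized fields."""
    fields = (("rs", rs, 5), ("rt", rt, 5), ("rd", rd, 5), ("shamt", shamt, 5), ("func", func, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


def daddu(rd: str, rs: str, rt: str) -> int:
    """Hypothetical convenience wrapper: daddu rd, rs, rt."""
    return enc_rtype(REG[rs], REG[rt], REG[rd], 0, FUNC_DADDU)


# Same guard as the TB's move case: the exact qbert word must come out of the helper.
word = daddu("gp", "a0", "zero")
assert word == 0x0080E02D, hex(word)
```

The same helper reproduces the other DADDU words in the qbert prolog, such as 0x0040E82D (daddu $sp, $v0, $0), which makes it a cheap cross-check when the next encoding lands.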

Makefile + regression

  • tb_ee_core_daddu target.
  • Added to both PHONY list and run: master.
  • Regression bumps 159 → 160.

qbert disassembly around the new blocker (PC 0x00100070)

Decoded from the qbert.elf file (python3 -c "..." with struct.unpack):

0x00100060: 0x3C080010  lui   $t0, 0x0010
0x00100064: 0x25080188  addiu $t0, $t0, 0x0188      ; $t0 = 0x00100188 (5th SetupThread arg; likely crt0's root/exit routine)
0x00100068: 0x0080E02D  daddu $gp, $a0, $0          ; Ch272: $gp <- $a0
0x0010006C: 0x2403003C  addiu $v1, $0, 0x003C       ; $v1 = 60 = SetupThread (InitMainThread)
0x00100070: 0x0000000C  syscall                     ; <-- CURRENT BLOCKER
0x00100074: 0x0040E82D  daddu $sp, $v0, $0          ; $sp <- $v0 (stack pointer returned by SetupThread)
0x00100078: 0x2403003D  addiu $v1, $0, 0x003D       ; $v1 = 61 = SetupHeap (InitHeap)
0x0010007C: 0x3C040014  lui   $a0, 0x0014
0x00100080: 0x2484B6E8  addiu $a0, $a0, -0x4918     ; $a0 = 0x0013B6E8 (heap start)
0x00100084: 0x3C050000  lui   $a1, 0x0000
0x00100088: 0x24A5FFFF  addiu $a1, $a1, -1          ; $a1 = -1 (default heap size)
0x0010008C: 0x0000000C  syscall                     ; SYSCALL #61
0x00100090: 0x00000000  nop
0x00100094: 0x24030064  addiu $v1, $0, 0x0064       ; $v1 = 100 = FlushCache
0x00100098: 0x0000202D  daddu $a0, $0, $0           ; $a0 = 0 (mode 0: write back data cache)
0x0010009C: 0x0000000C  syscall                     ; SYSCALL #100
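The exact python3 one-liner behind this listing is elided above. Here is a minimal sketch of an equivalent decoder, which handles only the opcodes that appear in this window. It works on raw little-endian bytes (the EE is little-endian), so in practice you would first slice the right file offset out of qbert.elf via its program headers; that step is not shown. All function names are illustrative.

```python
import struct

REG_NAMES = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra",
]


def s16(imm: int) -> int:
    """Interpret a 16-bit immediate as signed."""
    return imm - 0x10000 if imm & 0x8000 else imm


def fmt_simm(imm: int) -> str:
    v = s16(imm)
    return f"-0x{-v:04X}" if v < 0 else f"0x{v:04X}"


def decode(word: int) -> str:
    """Decode only lui / addiu / daddu / syscall / nop; anything else becomes .word."""
    op, func = word >> 26, word & 0x3F
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    if word == 0:
        return "nop"
    if op == 0 and func == 0x0C:
        return "syscall"
    if op == 0 and func == 0x2D:
        return f"daddu ${REG_NAMES[rd]}, ${REG_NAMES[rs]}, ${REG_NAMES[rt]}"
    if op == 0x0F:
        return f"lui   ${REG_NAMES[rt]}, 0x{word & 0xFFFF:04X}"
    if op == 0x09:
        return f"addiu ${REG_NAMES[rt]}, ${REG_NAMES[rs]}, {fmt_simm(word & 0xFFFF)}"
    return f".word 0x{word:08X}"


def disassemble(blob: bytes, base: int) -> list:
    """Split little-endian bytes into 32-bit words and decode each one."""
    return [(base + 4 * i, w, decode(w)) for i, (w,) in enumerate(struct.iter_unpack("<I", blob))]


def lui_addiu(hi: int, lo: int) -> int:
    """Value built by lui r, hi followed by addiu r, r, lo (lo is sign-extended)."""
    return ((hi << 16) + s16(lo)) & 0xFFFF_FFFF


blob = struct.pack("<4I", 0x3C040014, 0x2484B6E8, 0x0000000C, 0x00000000)
for addr, word, text in disassemble(blob, 0x0010007C):
    print(f"0x{addr:08X}: 0x{word:08X}  {text}")
```

lui_addiu shows why the $a0 comment reads 0x0013B6E8 and not 0x0014B6E8: the addiu immediate 0xB6E8 is sign-extended to -0x4918 before the add.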

This is textbook PS2 crt0 init:

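  1. SetupThread (#60, a.k.a. InitMainThread) sets up the main thread from its arguments (in ps2sdk's crt0: gp, stack, stack size, args and a root function, passed in $a0-$a3/$t0) and returns the new stack pointer; the result becomes $sp.
  2. SetupHeap (#61, a.k.a. InitHeap) registers the heap with heap_start = 0x0013B6E8 (likely the end of the loaded image) and heap_size = -1 (default).
  3. FlushCache(0) writes back the data cache.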

If we don't model these, qbert can't even reach main().

Recommendation for Codex's Ch273

The next blocker is SYSCALL, not an opcode. Three Ch273 framings:

(A) Minimal "kernel-stub" SYSCALL dispatch. Replace the current "halt on any non-Ch199 syscall" with a small case statement keyed on $v1. For the three that qbert needs immediately:

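| $v1 | name | minimum needed |
|---|---|---|
| 0x3C (60) | SetupThread (InitMainThread) | return $v0 = new stack pointer, e.g. 0x001E0000 (any plausible top-of-stack); advance PC; ERET |
| 0x3D (61) | SetupHeap (InitHeap) | crt0 never reads $v0 after this call, so returning $v0 = $a0 (heap start) is harmless; advance PC; ERET |
| 0x64 (100) | FlushCache | return $v0 = 0 (no modeled cache); advance PC; ERET |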

Each case is "set $v0, then ERET back to EPC+4" (the EE returns from exceptions with ERET; RFE is the R3000-era equivalent). Unhandled syscalls fall through to the existing halt, so we still find the next real blocker.

(B) "Generic-return" SYSCALL. Make EVERY SYSCALL (other than the Ch199 special case) just set $v0 = 0 and ERET. Even faster to land, but a syscall that EXPECTS a non-zero return (like SetupThread returning the new stack pointer) would silently misbehave: $sp would become 0, and the next stack access would either fault (e.g. AdEL on a load, AdES on a store) or hit garbage. Probably the wrong choice.

(C) Full PS2 EE kernel-call dispatcher. Over a hundred syscalls (InitMainThread, CreateThread, WaitSema, SifSetReg, GsPutIMR, ...). Out of scope for one chapter.
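Option (B)'s failure mode can be made concrete by modeling $sp at the first stack store after crt0. The 16-byte frame and the sw $ra, 0($sp) are hypothetical and chosen only for illustration. The 2 MiB window is the TB RAM size quoted in this note, and option_a / option_b are illustrative names.

```python
RAM_SIZE = 0x0020_0000  # 2 MiB EE-RAM window allocated by the TB (per this note)
STACK_TOP = 0x001E0000  # option (A) placeholder for SetupThread's return value


def option_a(v1: int) -> int:
    """$v0 after a syscall under option (A); only the SetupThread case matters here."""
    return STACK_TOP if v1 == 0x3C else 0


def option_b(v1: int) -> int:
    """$v0 after any syscall under option (B): always zero."""
    return 0


def first_frame_address(policy, frame: int = 16) -> int:
    """Address hit by a hypothetical sw $ra, 0($sp) after addiu $sp, $sp, -frame."""
    sp = policy(0x3C)  # SYSCALL #60, then daddu $sp, $v0, $0
    return (sp - frame) & 0xFFFF_FFFF


for name, policy in (("A", option_a), ("B", option_b)):
    addr = first_frame_address(policy)
    print(name, hex(addr), "inside RAM" if addr < RAM_SIZE else "outside RAM")
```

Under (B) the store wraps to the top of the 32-bit address space, far outside anything the TB backs. Under (A) it lands just below the proposed stack top.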

My read: (A). Three syscalls, three case arms, three focused TB checks. Same incremental-growth pattern as Ch271/272 but at the system-call level instead of the opcode level.
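Here is a behavioral Python sketch of option (A). It assumes registers are held in a dict and that the EPC points at the SYSCALL itself. It is a model for picking return values and TB expectations, not the RTL. STACK_TOP is the placeholder value from this note, and the names follow ps2sdk's syscall naming, with Sony's names in parentheses.

```python
STACK_TOP = 0x001E0000  # placeholder top-of-stack proposed in this note

SETUP_THREAD = 0x3C  # 60: SetupThread (InitMainThread)
SETUP_HEAP = 0x3D    # 61: SetupHeap (InitHeap)
FLUSH_CACHE = 0x64   # 100: FlushCache


def syscall_stub(regs: dict, epc: int):
    """Return the PC to resume at, or None to keep the existing halt."""
    num = regs["v1"]
    if num == SETUP_THREAD:
        regs["v0"] = STACK_TOP    # crt0 copies $v0 into $sp right after this call
    elif num == SETUP_HEAP:
        regs["v0"] = regs["a0"]   # crt0 does not read $v0 after this call
    elif num == FLUSH_CACHE:
        regs["v0"] = 0            # no cache is modeled
    else:
        return None               # unhandled: halt so the next real blocker surfaces
    return epc + 4                # ERET-style resume after the SYSCALL


regs = {"v0": 0, "v1": SETUP_THREAD, "a0": 0, "a1": 0}
resume = syscall_stub(regs, 0x00100070)
```

An unhandled number returns None, which corresponds to keeping today's halt-on-unknown behavior. Each case arm maps naturally onto one focused TB check.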

The three return values (SetupThread, SetupHeap, FlushCache) need to be plausible for qbert's downstream code to work. SetupThread returning 0x001E0000 (1.875 MiB) as the new $sp keeps the stack below the 2 MiB EE-RAM ceiling our TB allocates. crt0 never reads $v0 after SetupHeap, so that return value only needs to be sensible; Codex can pick.
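The arithmetic behind "keeps the stack below the ceiling" can be checked in a few lines of Python. HEAP_START is the $a0 value loaded before SYSCALL #61 in the listing above. Treating the gap between it and the stack top as space shared by an upward-growing heap and a downward-growing stack is an assumption about how qbert will use it.

```python
MIB = 1 << 20
RAM_CEILING = 2 * MIB    # EE RAM the TB allocates (per this note)
STACK_TOP = 0x001E0000   # proposed SetupThread return value
HEAP_START = 0x0013B6E8  # $a0 passed to SYSCALL #61 in the listing


def layout_report(heap_start: int, stack_top: int, ceiling: int) -> dict:
    """Sanity-check that the stack top sits above the heap start and inside RAM."""
    if not heap_start < stack_top <= ceiling:
        raise ValueError("stack top must lie above the heap start and within RAM")
    return {
        "stack_top_mib": stack_top / MIB,
        "headroom_bytes": ceiling - stack_top,
        "heap_and_stack_bytes": stack_top - heap_start,  # shared: heap grows up, stack grows down
    }


print(layout_report(HEAP_START, STACK_TOP, RAM_CEILING))
```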

Files changed

  • rtl/ee/ee_core_stub.sv — 4 surgical edits (~6 LOC total).
  • sim/tb/integration/tb_ee_core_daddu.sv — new focused TB.
  • sim/Makefile: tb_ee_core_daddu target + both regression lists.

Regression

In flight; expected 160/160 (was 159, +1 for tb_ee_core_daddu).

Pattern-summary

Ch271 + Ch272 = the opcode-by-opcode growth track Codex originally framed. Two chapters, two opcodes, two focused TBs: qbert progresses from 12 to 26,960 retires and clears the entire ALU portion of the prolog. The runner is doing exactly what it's supposed to do: surface the next concrete blocker, chapter by chapter.

Ch273 is the first non-opcode blocker. It still fits the "one-question-one-chapter" pattern but now the surface is "what should the kernel return for this syscall?" instead of "what does this opcode do?".