Ch272 closeout — DADDU implemented; qbert clears the prolog ALU work, hits SYSCALL #60
Status: Closed. Verdict from re-running qbert.elf:
elf_halted — qbert ran past DADDU cleanly and executed
SYSCALL at PC 0x00100070 (= SYSCALL #60, EndOfHeap,
the first kernel call in the standard PS2 crt0 prolog).
That frames Ch273.
Numbers
| Metric | Ch270 (init) | Post-Ch271 (SQ) | Post-Ch272 (DADDU) |
|---|---|---|---|
| qbert retire_count | 12 | 26,958 | 26,960 |
| Verdict | first_unsupported_opcode | first_unsupported_opcode | elf_halted (new) |
| Blocker PC | 0x00100024 | 0x00100068 | 0x00100070 |
| Blocker instr / kind | 0x7C400000 (SQ) | 0x0080E02D (DADDU) | 0x0000000C (SYSCALL) |
The retire delta from Ch271 → Ch272 is small (+2) because the
DADDU we implemented is at PC 0x00100068, immediately followed by
addiu $v1, $0, 0x3C (the syscall number) and syscall. The
core retires the DADDU + the ADDIU, then halts on the SYSCALL.
The chain of next syscalls (61, 100) is queued up at
0x0010008C / 0x0010009C.
What landed
RTL — 4 surgical edits in ee_core_stub.sv
- localparam logic [5:0] FUNC_DADDU = 6'h2D alongside FUNC_ADDU.
- is_daddu logic decl + assign is_daddu = is_special && (func == FUNC_DADDU).
- Added is_daddu to the is_rtype_alu group.
- Added is_daddu to the (is_add || is_addu) arm of rtype_alu_wb: same low-32-bit add, no overflow trap.
Upper 32 bits of the 64-bit DADDU are silently dropped, exactly matching how ADDU already behaves in this stub. Documented in the RTL comment.
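As a rough behavioral model of the write-back described above (a Python sketch, not the RTL; the names daddu_stub and daddu_full are made up for illustration), the stub's DADDU reduces to a wrapping 32-bit add, while the architectural instruction keeps 64 bits:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def daddu_stub(rs: int, rt: int) -> int:
    """Stub behavior per this note: add, keep the low 32 bits, never trap."""
    return (rs + rt) & MASK32

def daddu_full(rs: int, rt: int) -> int:
    """Architectural 64-bit DADDU, shown only for comparison."""
    return (rs + rt) & MASK64

# The move case (rt = $zero) is unaffected by the truncation.
print(hex(daddu_stub(5, 0)))
# The two models diverge only when the sum carries out of bit 31.
print(hex(daddu_stub(0x80000000, 0x80000000)))
print(hex(daddu_full(0x80000000, 0x80000000)))
```

The qbert prolog only uses DADDU as a register move with small addresses, so the truncation is not expected to matter there; code that relies on the upper half would see the difference.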
Focused TB — tb_ee_core_daddu
Three cases per Codex's spec:
- Normal add: daddu $t0, $a0, $a1 with $a0=5, $a1=3 → $t0 = 8.
- Move case (exact qbert encoding): builds the literal 0x0080E02D via enc_rtype() and asserts the produced word equals 0x0080E02D before installing it, so a future regression in the encoder helper trips loudly here. Then daddu $gp, $a0, $zero with $a0=5 → $gp = 5.
- Wraparound: daddu $t3, $a2, $a2 with $a2 = 0x80000000 → $t3 = 0 (low 32 bits wrap). No overflow trap. Post-halt, trap_events == 0 confirms.
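The encoding self-check in the move case can be sketched in Python. This is an illustration only: the TB's real enc_rtype() is a SystemVerilog helper whose argument order is not shown in this note, so the signature below is an assumption. The field layout is the standard MIPS R-type one.

```python
FUNC_DADDU = 0x2D  # SPECIAL function field for DADDU

def enc_rtype(rs: int, rt: int, rd: int, shamt: int, func: int) -> int:
    """Pack a MIPS SPECIAL (opcode 0) R-type word. Argument order is assumed."""
    for field in (rs, rt, rd, shamt):
        assert 0 <= field < 32, "register/shamt fields are 5 bits"
    assert 0 <= func < 64, "function field is 6 bits"
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func

# Standard MIPS register numbers used by the three TB cases.
ZERO, A0, A1, A2, T0, T3, GP = 0, 4, 5, 6, 8, 11, 28

move_word = enc_rtype(rs=A0, rt=ZERO, rd=GP, shamt=0, func=FUNC_DADDU)
# Check the encoder against the literal taken from qbert before using it.
assert move_word == 0x0080E02D, hex(move_word)

add_word = enc_rtype(rs=A0, rt=A1, rd=T0, shamt=0, func=FUNC_DADDU)
wrap_word = enc_rtype(rs=A2, rt=A2, rd=T3, shamt=0, func=FUNC_DADDU)
print(f"{add_word:08X} {move_word:08X} {wrap_word:08X}")
```

Asserting against a literal taken from the binary, rather than only round-tripping through the helper, is what lets an encoder bug fail here instead of later in the run.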
Belt-and-braces hierarchical register peeks after halt for $t0/$gp/$t3 so a future BNE-chain regression can't silently pass with wrong values.
Result: retired=17 halt=1 trap=0 pc=0xbfc00138 errors=0 PASS.
Final PC at the PASS syscall slot.
Makefile + regression
- tb_ee_core_daddu target.
- Added to both the PHONY list and the run: master target.
- Regression bumps 159 → 160.
qbert disassembly around the new blocker (PC 0x00100070)
Decoded from the qbert.elf file (python3 -c "..." with struct.unpack):
0x00100060: 0x3C080010 lui $t0, 0x0010
0x00100064: 0x25080188 addiu $t0, $t0, 0x0188 ; $t0 = 0x00100188 ($gp seed?)
0x00100068: 0x0080E02D daddu $gp, $a0, $0 ; Ch272 — $gp <- $a0
0x0010006C: 0x2403003C addiu $v1, $0, 0x003C ; $v1 = 60 = EndOfHeap
0x00100070: 0x0000000C syscall ; <-- CURRENT BLOCKER
0x00100074: 0x0040E82D daddu $sp, $v0, $0 ; $sp <- $v0 (heap-end addr)
0x00100078: 0x2403003D addiu $v1, $0, 0x003D ; $v1 = 61 = InitMainThread
0x0010007C: 0x3C040014 lui $a0, 0x0014
0x00100080: 0x2484B6E8 addiu $a0, $a0, -0x4918 ; $a0 = 0x0013B6E8
0x00100084: 0x3C050000 lui $a1, 0x0000
0x00100088: 0x24A5FFFF addiu $a1, $a1, -1 ; $a1 = -1 (default stack size)
0x0010008C: 0x0000000C syscall ; SYSCALL #61
0x00100090: 0x00000000 nop
0x00100094: 0x24030064 addiu $v1, $0, 0x0064 ; $v1 = 100 = FlushCache
0x00100098: 0x0000202D daddu $a0, $0, $0 ; $a0 = 0
0x0010009C: 0x0000000C syscall ; SYSCALL #100
This is textbook PS2 crt0 init:
- EndOfHeap() returns the end of the heap; result becomes $sp.
- InitMainThread(stack_addr=0x0013B6E8, stack_size=-1, gp, priority) initializes the main thread; result presumably also touches $sp or returns success.
- FlushCache(0) writes back the data cache.
If we don't model these, qbert can't even reach main().
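The listing above was produced with an ad hoc python3 one-liner that is not reproduced in this note. As a hedged sketch of what such a decode might look like (the decode and la_value helpers are hypothetical, and only the four instruction kinds seen in this window are handled), little-endian words can be unpacked and split into fields like this:

```python
import struct

REG = ["$0", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
       "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
       "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
       "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]

def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm

def decode(word: int) -> str:
    """Decode the handful of encodings that appear in the qbert prolog window."""
    op = word >> 26
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    func, imm = word & 0x3F, word & 0xFFFF
    if word == 0:
        return "nop"
    if op == 0 and func == 0x0C:
        return "syscall"
    if op == 0 and func == 0x2D:
        return f"daddu {REG[rd]}, {REG[rs]}, {REG[rt]}"
    if op == 0x09:
        return f"addiu {REG[rt]}, {REG[rs]}, {sext16(imm)}"
    if op == 0x0F:
        return f"lui {REG[rt]}, 0x{imm:04X}"
    return f".word 0x{word:08X}"

def la_value(hi: int, lo: int) -> int:
    """Value built by a lui/addiu pair: (hi << 16) plus the sign-extended low half."""
    return ((hi << 16) + sext16(lo)) & 0xFFFFFFFF

# Stand-in for bytes read from the ELF: four words starting at 0x00100068.
raw = struct.pack("<4I", 0x0080E02D, 0x2403003C, 0x0000000C, 0x0040E82D)
for i, (word,) in enumerate(struct.iter_unpack("<I", raw)):
    print(f"0x{0x00100068 + 4 * i:08X}: 0x{word:08X}  {decode(word)}")
```

The la_value helper also reproduces the $a0 computation in the listing: lui 0x0014 followed by addiu with 0xB6E8 (that is, -0x4918) gives 0x0013B6E8.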
Recommendation for Codex's Ch273
The next blocker is SYSCALL, not an opcode. Three Ch273 framings:
(A) Minimal "kernel-stub" SYSCALL dispatch. Replace the
current "halt on any non-Ch199 syscall" with a small case
statement keyed on $v1. For the three qbert needs immediately:
| $v1 | name | minimum needed |
|---|---|---|
| 0x3C | EndOfHeap | return $v0 = 0x001E0000 (or any plausible end-of-RAM); advance PC; RFE |
| 0x3D | InitMainThread | return $v0 = $a0 (or $a0+$a1; "stack-base" pattern); advance PC; RFE |
| 0x64 | FlushCache | return $v0 = 0 (no model'd cache); advance PC; RFE |
Each case is "set $v0, RFE back to EPC+4." Unhandled syscalls fall through to the existing halt (so we still find the next real blocker).
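To make framing (A) concrete, here is a small Python model of the dispatch, not RTL. The function name syscall_stub, the dict-based register file, and the choice to echo $a0 for InitMainThread are all assumptions for illustration; the Ch199 special case is left out.

```python
MIB = 1 << 20
EE_RAM_BYTES = 2 * MIB     # EE-RAM size the TB allocates, per this note
END_OF_HEAP = 0x001E0000   # placeholder return value from the table above

def syscall_stub(regs: dict, pc: int):
    """Model one SYSCALL. Returns (handled, next_pc); sets regs['v0'] if handled."""
    num = regs["v1"] & 0xFFFFFFFF
    if num == 0x3C:            # EndOfHeap
        regs["v0"] = END_OF_HEAP
    elif num == 0x3D:          # InitMainThread: echo $a0 back (assumed policy)
        regs["v0"] = regs["a0"]
    elif num == 0x64:          # FlushCache: no cache is modelled
        regs["v0"] = 0
    else:
        return False, pc       # unhandled: PC stays put so the caller can halt here
    return True, pc + 4        # resume at the instruction after the SYSCALL

# The placeholder heap end must sit inside the RAM the TB provides.
assert 0 < END_OF_HEAP < EE_RAM_BYTES

regs = {"v0": 0, "v1": 0x3C, "a0": 0}
handled, next_pc = syscall_stub(regs, 0x00100070)
print(handled, hex(next_pc), hex(regs["v0"]))
```

Keeping the default arm as "not handled" preserves the runner's current behavior of stopping at the first syscall nobody has modelled yet.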
(B) "Generic-return" SYSCALL. Make EVERY SYSCALL (other
than the Ch199 special case) just set $v0 = 0 and RFE. Even
faster to land, but a syscall that EXPECTS a non-zero return
(like EndOfHeap returning the heap-end address) would
silently misbehave: $sp would become 0, and the next stack access
would either take an address-error trap (AdEL on a load, AdES on a
store) or read and write garbage. Probably the wrong choice.
(C) Full PS2 EE kernel-call dispatcher. Hundreds of
syscalls (InitMainThread, CreateThread, WaitSema,
SifSetReg, GsPutIMR, ...). Out of scope for one chapter.
My read: (A). Three syscalls, three case arms, three focused TB checks. Same incremental-growth pattern as Ch271/272 but at the system-call level instead of the opcode level.
The three values returned (EndOfHeap, InitMainThread,
FlushCache) need to be plausible for qbert's downstream code
to work. EndOfHeap returning 0x001E0000 (1.875 MiB) keeps the
stack below the 2 MiB EE-RAM ceiling our TB allocates. The
exact return value for InitMainThread only needs to be
something sensible; Codex can pick.
Files changed
- rtl/ee/ee_core_stub.sv: 4 surgical edits (~6 LOC total).
- sim/tb/integration/tb_ee_core_daddu.sv: new focused TB.
- sim/Makefile: tb_ee_core_daddu target + both regression lists.
Regression
In flight; expected 160/160 (was 159, +1 for tb_ee_core_daddu).
Pattern-summary
Ch271 + Ch272 = the opcode-by-opcode growth track Codex originally framed. Two chapters, two opcodes, two focused TBs, qbert progresses from 12 → 26,960 retires + clears the entire ALU portion of the prolog. The runner is doing exactly what it's supposed to do — surface the next concrete blocker, chapter by chapter.
Ch273 is the first non-opcode blocker. It still fits the "one-question-one-chapter" pattern but now the surface is "what should the kernel return for this syscall?" instead of "what does this opcode do?".