thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch272 closeout — DADDU implemented; qbert clears the prolog ALU work, hits SYSCALL #60
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_halted`. qbert ran past DADDU cleanly and **executed
`SYSCALL` at PC 0x00100070** (= `SYSCALL #60`, `SetupThread`, a.k.a.
`InitMainThread`, the first kernel call in the standard PS2 crt0
prolog). That frames Ch273.
## Numbers
| Metric | Ch270 (init) | Post-Ch271 (SQ) | **Post-Ch272 (DADDU)** |
|-----------------------|---------------|------------------|-------------------------|
| qbert retire_count | 12 | 26,958 | **26,960** |
| Verdict | first_unsupported_opcode | first_unsupported_opcode | **`elf_halted`** (new) |
| Blocker PC | 0x00100024 | 0x00100068 | 0x00100070 |
| Blocker instr / kind | 0x7C400000 (SQ) | 0x0080E02D (DADDU) | 0x0000000C (**SYSCALL**) |
The retire delta from Ch271 → Ch272 is small (+2) because the
DADDU we implemented is at PC 0x00100068, immediately followed by
`addiu $v1, $0, 0x3C` (the syscall number) and `syscall`. The
core retires the DADDU + the ADDIU, then halts on the SYSCALL.
The chain of next syscalls (61, 100) is queued up at
0x0010008C / 0x0010009C.
## What landed
### RTL — 4 surgical edits in `ee_core_stub.sv`
1. `localparam logic [5:0] FUNC_DADDU = 6'h2D` alongside FUNC_ADDU.
2. `is_daddu` logic decl + `assign is_daddu = is_special && (func == FUNC_DADDU)`.
3. Added `is_daddu` to the `is_rtype_alu` group.
4. Added `is_daddu` to the `(is_add || is_addu)` arm of
`rtype_alu_wb` — same low-32-bit add, no overflow trap.
Upper 32 bits of the 64-bit DADDU are silently dropped, exactly
matching how ADDU already behaves in this stub. Documented in
the RTL comment.
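
A small Python model of the writeback rule above can serve as a golden reference when reading TB traces. It assumes, as the RTL comment says, that the stub keeps 32-bit registers, so DADDU shares ADDU's low-32-bit add. The function and constant names are illustrative and are not taken from `ee_core_stub.sv`.

```python
# Behavioral model of the add-family arm of rtype_alu_wb in ee_core_stub.
# Assumption: GPRs in the stub are 32 bits wide, so DADDU's upper half is
# dropped and it behaves exactly like ADDU (no overflow trap for either).

FUNC_ADDU = 0x21
FUNC_DADDU = 0x2D
MASK32 = 0xFFFF_FFFF


def rtype_alu_wb(func: int, rs_val: int, rt_val: int) -> int:
    """Return the value written to rd for ADDU/DADDU in the stub."""
    if func not in (FUNC_ADDU, FUNC_DADDU):
        raise ValueError(f"funct 0x{func:02X} is not modeled in this sketch")
    # Same low-32-bit add for both functs; the carry out of bit 31 is lost.
    return (rs_val + rt_val) & MASK32


# The three tb_ee_core_daddu cases:
print(rtype_alu_wb(FUNC_DADDU, 5, 3))                      # 8
print(rtype_alu_wb(FUNC_DADDU, 5, 0))                      # 5
print(rtype_alu_wb(FUNC_DADDU, 0x8000_0000, 0x8000_0000))  # 0
```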
### Focused TB — `tb_ee_core_daddu`
Three cases per Codex's spec:
1. **Normal add**: `daddu $t0, $a0, $a1` with `$a0=5, $a1=3`
→ `$t0 = 8`.
2. **Move case (exact qbert encoding)**: builds the literal
`0x0080E02D` via `enc_rtype()` and **asserts the produced
word equals 0x0080E02D** before installing it — so a future
regression to the encoder helper trips loudly here. Then
`daddu $gp, $a0, $zero` with `$a0=5` → `$gp = 5`.
3. **Wraparound**: `daddu $t3, $a2, $a2` with `$a2 = 0x80000000`
→ `$t3 = 0` (low 32 bits wrap). No overflow trap; post-halt,
`trap_events == 0` confirms it.
As a belt-and-braces check, the TB also peeks $t0/$gp/$t3
hierarchically after halt, so a future BNE-chain regression can't
silently pass with wrong values.
Result: `retired=17 halt=1 trap=0 pc=0xbfc00138 errors=0 PASS`.
Final PC at the PASS syscall slot.
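
The encoder check in case 2 can be mirrored in Python. `enc_rtype` here is a hypothetical stand-in for the TB's SystemVerilog helper (its real argument order isn't shown in this note); it packs the standard SPECIAL R-type fields and reproduces all three DADDU words that appear in qbert's prolog.

```python
# Python mirror of an R-type encoder like the TB's enc_rtype() helper.
# Hypothetical signature: enc_rtype(rs, rt, rd, func, sa=0).

FUNC_DADDU = 0x2D
ZERO, V0, A0, GP, SP = 0, 2, 4, 28, 29  # MIPS GPR numbers


def enc_rtype(rs: int, rt: int, rd: int, func: int, sa: int = 0) -> int:
    """Pack a SPECIAL (opcode 0) instruction: rs | rt | rd | sa | funct."""
    fields = {"rs": (rs, 5), "rt": (rt, 5), "rd": (rd, 5), "sa": (sa, 5), "func": (func, 6)}
    for name, (value, width) in fields.items():
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | func


# Every DADDU in qbert's prolog round-trips through the encoder.
assert enc_rtype(A0, ZERO, GP, FUNC_DADDU) == 0x0080E02D    # daddu $gp, $a0, $0
assert enc_rtype(V0, ZERO, SP, FUNC_DADDU) == 0x0040E82D    # daddu $sp, $v0, $0
assert enc_rtype(ZERO, ZERO, A0, FUNC_DADDU) == 0x0000202D  # daddu $a0, $0, $0
```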
### Makefile + regression
- `tb_ee_core_daddu` target.
- Added to both PHONY list and `run:` master.
- Regression bumps 159 → 160.
## qbert disassembly around the new blocker (PC 0x00100070)
Decoded from qbert.elf with a `python3 -c "..."` one-liner using `struct.unpack`:
```
0x00100060: 0x3C080010 lui $t0, 0x0010
0x00100064: 0x25080188 addiu $t0, $t0, 0x0188 ; $t0 = 0x00100188 (5th arg to #60; root func ptr?)
0x00100068: 0x0080E02D daddu $gp, $a0, $0 ; Ch272: $gp <- $a0
0x0010006C: 0x2403003C addiu $v1, $0, 0x003C ; $v1 = 60 = SetupThread (InitMainThread)
0x00100070: 0x0000000C syscall ; <-- CURRENT BLOCKER
0x00100074: 0x0040E82D daddu $sp, $v0, $0 ; $sp <- $v0 (stack ptr returned by #60)
0x00100078: 0x2403003D addiu $v1, $0, 0x003D ; $v1 = 61 = SetupHeap (InitHeap)
0x0010007C: 0x3C040014 lui $a0, 0x0014
0x00100080: 0x2484B6E8 addiu $a0, $a0, -0x4918 ; $a0 = 0x0013B6E8 (heap start)
0x00100084: 0x3C050000 lui $a1, 0x0000
0x00100088: 0x24A5FFFF addiu $a1, $a1, -1 ; $a1 = -1 (heap size: no explicit limit)
0x0010008C: 0x0000000C syscall ; SYSCALL #61
0x00100090: 0x00000000 nop
0x00100094: 0x24030064 addiu $v1, $0, 0x0064 ; $v1 = 100 = FlushCache
0x00100098: 0x0000202D daddu $a0, $0, $0 ; $a0 = 0
0x0010009C: 0x0000000C syscall ; SYSCALL #100
```
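
The listing can be regenerated with a small decoder in the spirit of the `struct.unpack` one-liner. This sketch handles only the five forms that appear above (`lui`, `addiu`, `daddu`, `syscall`, `nop`), assumes little-endian words (the EE runs little-endian), and takes a raw byte blob rather than parsing ELF headers, so locating `.text` inside qbert.elf is left out. The helper names are illustrative.

```python
import struct

REG = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
       "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
       "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
       "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]


def simm16(imm: int) -> int:
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def fmt_signed(v: int) -> str:
    return f"-0x{-v:X}" if v < 0 else f"0x{v:X}"


def decode(word: int) -> str:
    """Decode the handful of forms used by qbert's crt0 prolog."""
    op, rs, rt = word >> 26, (word >> 21) & 31, (word >> 16) & 31
    rd, func, imm = (word >> 11) & 31, word & 0x3F, word & 0xFFFF
    if word == 0:
        return "nop"                      # canonical sll $0, $0, 0
    if op == 0 and func == 0x0C:
        return "syscall"
    if op == 0 and func == 0x2D:
        return f"daddu ${REG[rd]}, ${REG[rs]}, ${REG[rt]}"
    if op == 0x0F:
        return f"lui ${REG[rt]}, 0x{imm:04X}"
    if op == 0x09:
        return f"addiu ${REG[rt]}, ${REG[rs]}, {fmt_signed(simm16(imm))}"
    return f".word 0x{word:08X}"          # anything else is not decoded here


def disassemble(blob: bytes, base: int):
    """Yield (pc, word, text) for each little-endian 32-bit word in blob."""
    for off in range(0, len(blob) - len(blob) % 4, 4):
        (word,) = struct.unpack_from("<I", blob, off)
        yield base + off, word, decode(word)


# Usage: the three words around the new blocker.
blob = struct.pack("<3I", 0x0080E02D, 0x2403003C, 0x0000000C)
for pc, word, text in disassemble(blob, 0x00100068):
    print(f"0x{pc:08X}: 0x{word:08X} {text}")
```

This prints `daddu $gp, $a0, $zero`, `addiu $v1, $zero, 0x3C` and `syscall` for PCs 0x00100068 through 0x00100070.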
This is **textbook PS2 crt0 init** (syscall names below follow ps2sdk):
1. `SetupThread(gp, stack, stack_size, args, root)` (#60, a.k.a. `InitMainThread`) sets up the main thread and returns the initial stack pointer; the result becomes `$sp`.
2. `SetupHeap(heap_start=0x0013B6E8, heap_size=-1)` (#61, a.k.a. `InitHeap`) registers the heap; the prolog never reads its `$v0`.
3. `FlushCache(0)` (#100) writes back the data cache (mode 0 = `WRITEBACK_DCACHE`).
If we don't model these, qbert can't even reach `main()`.
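
The `$a0`/`$a1` arguments to #61 come from `lui`/`addiu` pairs. Because `addiu` sign-extends its immediate, the assembler bumps the upper half by one whenever bit 15 of the low half is set, which is why 0x0013B6E8 is built from `lui 0x0014` plus `-0x4918`. A minimal sketch of that split, using the usual `%hi`/`%lo` convention (the helper names are hypothetical):

```python
MASK32 = 0xFFFF_FFFF


def hi_lo(addr: int) -> tuple[int, int]:
    """Split addr into (lui immediate, signed addiu immediate)."""
    lo = addr & 0xFFFF
    lo_signed = lo - 0x10000 if lo & 0x8000 else lo
    # Compensate for addiu sign-extending a negative low half.
    hi = ((addr - lo_signed) >> 16) & 0xFFFF
    return hi, lo_signed


def lui_addiu(hi: int, lo_signed: int) -> int:
    """What the CPU computes for: lui rX, hi ; addiu rX, rX, lo (32-bit)."""
    return ((hi << 16) + lo_signed) & MASK32


hi, lo = hi_lo(0x0013B6E8)
print(hex(hi), lo)          # 0x14 -18712  (lui 0x0014 ; addiu -0x4918)
print(hi_lo(0xFFFFFFFF))    # (0, -1): consistent with lui $a1, 0x0000 ; addiu $a1, $a1, -1
```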
## Recommendation for Codex's Ch273
The next blocker is **SYSCALL**, not an opcode. Three Ch273 framings:
**(A) Minimal "kernel-stub" SYSCALL dispatch.** Replace the
current "halt on any non-Ch199 syscall" with a small case
statement keyed on `$v1`, covering the three syscalls qbert needs immediately:
| `$v1` | name                         | minimum needed                                                                                  |
|-------|------------------------------|-------------------------------------------------------------------------------------------------|
| 0x3C  | SetupThread (InitMainThread) | return `$v0 = 0x001E0000` (any plausible stack top; crt0 copies it to `$sp`); resume at EPC+4  |
| 0x3D  | SetupHeap (InitHeap)         | set `$v0 = 0` (the prolog never reads it); resume at EPC+4                                      |
| 0x64  | FlushCache                   | return `$v0 = 0` (no modeled cache); resume at EPC+4                                            |
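Each case is "set `$v0`, resume at EPC+4" (on the R5900 a real
handler bumps EPC and returns with `ERET`; `RFE` is the R3000/IOP
instruction). Unhandled syscalls fall through to the existing halt
(so we still find the next real blocker).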
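
A minimal Python sketch of option (A), written as a behavioral reference rather than RTL. Assumptions: `CpuState`, `KERNEL_STUB_RETURNS` and `handle_syscall` are hypothetical names, `pc` holds the SYSCALL's own address (i.e. EPC), the 0x001E0000 stack top is the value proposed for #60, and the existing Ch199 special case is left out.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFF_FFFF
V0, V1 = 2, 3                    # $v0 = result, $v1 = syscall number

SYS_SETUP_THREAD = 0x3C          # 60: SetupThread / InitMainThread
SYS_SETUP_HEAP = 0x3D            # 61: SetupHeap / InitHeap
SYS_FLUSH_CACHE = 0x64           # 100: FlushCache
STACK_TOP = 0x001E_0000          # proposed #60 return; crt0 copies it to $sp

# Hypothetical table: syscall number -> value placed in $v0.
KERNEL_STUB_RETURNS = {
    SYS_SETUP_THREAD: STACK_TOP,
    SYS_SETUP_HEAP: 0,           # $v0 not read by the prolog
    SYS_FLUSH_CACHE: 0,          # no cache is modeled
}


@dataclass
class CpuState:
    regs: list = field(default_factory=lambda: [0] * 32)
    pc: int = 0                  # address of the SYSCALL itself (EPC)
    halted: bool = False


def handle_syscall(cpu: CpuState) -> bool:
    """Option (A): service known syscalls, halt on anything else."""
    num = cpu.regs[V1]
    if num not in KERNEL_STUB_RETURNS:
        cpu.halted = True        # keep surfacing the next real blocker
        return False
    cpu.regs[V0] = KERNEL_STUB_RETURNS[num]
    cpu.pc = (cpu.pc + 4) & MASK32   # resume at EPC + 4
    return True


cpu = CpuState(pc=0x00100070)
cpu.regs[V1] = SYS_SETUP_THREAD
handle_syscall(cpu)
print(hex(cpu.regs[V0]), hex(cpu.pc))   # 0x1e0000 0x100074
```

Keeping the return values in a table means each new syscall qbert hits in later chapters is a one-line addition, mirroring the one-question-one-chapter pattern.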
**(B) "Generic-return" SYSCALL.** Make EVERY SYSCALL (other
than the Ch199 special case) just set `$v0 = 0` and return. Even
faster to land, but a syscall that EXPECTS a non-zero return
(like #60 returning the initial stack pointer) would silently
misbehave: `$sp` would become 0, and the first stack access would
likely take an address-error trap (AdEL/AdES) or hit garbage.
Probably the wrong choice.
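
The failure mode in (B) is easy to see numerically. This sketch follows crt0's `daddu $sp, $v0, $0` and a hypothetical 16-byte first stack frame (qbert's real frame size isn't known here), then checks whether the first push lands inside the TB's 2 MiB of EE RAM. The policy function names are illustrative.

```python
MASK32 = 0xFFFF_FFFF
TB_RAM_BYTES = 2 * 1024 * 1024   # EE RAM the TB allocates
FRAME = 16                       # hypothetical size of the first stack frame


def generic_return(num: int) -> int:
    """Option (B): every syscall returns 0."""
    return 0


def targeted_return(num: int) -> int:
    """Option (A): only #60 needs a real value (the stack top)."""
    return 0x001E_0000 if num == 0x3C else 0


def first_push_addr(policy) -> int:
    sp = policy(0x3C)               # daddu $sp, $v0, $0 after syscall #60
    return (sp - FRAME) & MASK32    # addiu $sp, $sp, -FRAME; then store at 0($sp)


for policy in (generic_return, targeted_return):
    addr = first_push_addr(policy)
    print(f"{policy.__name__}: 0x{addr:08X} in TB RAM: {addr < TB_RAM_BYTES}")
# generic_return: 0xFFFFFFF0 in TB RAM: False
# targeted_return: 0x001DFFF0 in TB RAM: True
```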
**(C) Full PS2 EE kernel-call dispatcher.** About 128 syscall
slots (`InitMainThread`, `CreateThread`, `WaitSema`,
`SifSetReg`, `GsPutIMR`, ...). Out of scope for one chapter.
**My read: (A).** Three syscalls, three case arms, three
focused TB checks. Same incremental-growth pattern as Ch271/272
but at the system-call level instead of the opcode level.
The returned values need to be plausible for qbert's downstream
code to work. #60's result becomes `$sp`, so returning 0x001E0000
(1.875 MiB) puts the initial stack top below the 2 MiB EE-RAM
ceiling our TB allocates, and above the 0x0013B6E8 heap start
passed to #61. #61 and #100 can simply return 0, since nothing in
the listing above reads `$v0` after them; Codex can pick anything
more faithful.
## Files changed
- `rtl/ee/ee_core_stub.sv`: 4 surgical edits (~6 LOC total).
- `sim/tb/integration/tb_ee_core_daddu.sv`: new focused TB.
- `sim/Makefile`: `tb_ee_core_daddu` target, added to both
regression lists.
## Regression
In flight; expected 160/160 (was 159, +1 for tb_ee_core_daddu).
## Pattern-summary
Ch271 + Ch272 = the opcode-by-opcode growth track Codex
originally framed. Two chapters, two opcodes, two focused TBs,
qbert progresses from 12 → 26,960 retires + clears the entire
ALU portion of the prolog. **The runner is doing exactly what
it's supposed to do** — surface the next concrete blocker,
chapter by chapter.
Ch273 is the first non-opcode blocker. It still fits the
"one-question-one-chapter" pattern but now the surface is
"what should the kernel return for this syscall?" instead of
"what does this opcode do?".