# Ch272 closeout: DADDU implemented; qbert clears the prolog ALU work, hits SYSCALL #60

**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_halted`. qbert ran past DADDU cleanly and **executed
`SYSCALL` at PC 0x00100070** (= `SYSCALL #60`, `InitMainThread`,
called `SetupThread` in ps2sdk; the first kernel call in the
standard PS2 crt0 prolog). That frames Ch273.

## Numbers

| Metric               | Ch270 (init)               | Post-Ch271 (SQ)            | **Post-Ch272 (DADDU)**   |
|----------------------|----------------------------|----------------------------|--------------------------|
| qbert retire_count   | 12                         | 26,958                     | **26,960**               |
| Verdict              | `first_unsupported_opcode` | `first_unsupported_opcode` | **`elf_halted`** (new)   |
| Blocker PC           | 0x00100024                 | 0x00100068                 | 0x00100070               |
| Blocker instr / kind | 0x7C400000 (SQ)            | 0x0080E02D (DADDU)         | 0x0000000C (**SYSCALL**) |

The retire delta from Ch271 → Ch272 is small (+2) because the
DADDU we implemented is at PC 0x00100068, immediately followed by
`addiu $v1, $0, 0x3C` (the syscall number) and `syscall`. The
core retires the DADDU + the ADDIU, then halts on the SYSCALL.
The chain of next syscalls (61, 100) is queued up at
0x0010008C / 0x0010009C.

## What landed

### RTL: 4 surgical edits in `ee_core_stub.sv`

1. `localparam logic [5:0] FUNC_DADDU = 6'h2D` alongside `FUNC_ADDU`.
2. `is_daddu` logic decl + `assign is_daddu = is_special && (func == FUNC_DADDU)`.
3. Added `is_daddu` to the `is_rtype_alu` group.
4. Added `is_daddu` to the `(is_add || is_addu)` arm of
   `rtype_alu_wb`: same low-32-bit add, no overflow trap.

Upper 32 bits of the 64-bit DADDU are silently dropped, exactly
matching how ADDU already behaves in this stub. Documented in
the RTL comment.

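
The low-32-bit behavior described above can be illustrated with a small behavioral model. This is a sketch in Python, not the RTL; it assumes the stub keeps 32-bit registers, and the names `stub_daddu` and `arch_daddu` are hypothetical, introduced only for this comparison.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def stub_daddu(rs_val: int, rt_val: int) -> int:
    """Assumed stub behavior: add, keep the low 32 bits, never trap."""
    return (rs_val + rt_val) & MASK32


def arch_daddu(rs_val: int, rt_val: int) -> int:
    """Architectural 64-bit DADDU, shown only for contrast."""
    return (rs_val + rt_val) & MASK64


# The three operand pairs used by the focused testbench.
print(hex(stub_daddu(5, 3)))                    # normal add
print(hex(stub_daddu(5, 0)))                    # move idiom: rt = $zero
print(hex(stub_daddu(0x80000000, 0x80000000)))  # carry out of bit 31 is dropped
print(hex(arch_daddu(0x80000000, 0x80000000)))  # the full 64-bit result keeps it
```

The last two lines differ only in the bit that the stub discards, which is the one case where the stub and a full 64-bit core would disagree.
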
### Focused TB: `tb_ee_core_daddu`

Three cases per Codex's spec:

1. **Normal add**: `daddu $t0, $a0, $a1` with `$a0=5, $a1=3` →
   `$t0 = 8`.
2. **Move case (exact qbert encoding)**: builds the literal
   `0x0080E02D` via `enc_rtype()` and **asserts the produced
   word equals 0x0080E02D** before installing it, so a future
   regression to the encoder helper trips loudly here. Then
   `daddu $gp, $a0, $zero` with `$a0=5` → `$gp = 5`.
3. **Wraparound**: `daddu $t3, $a2, $a2` with `$a2 = 0x80000000`
   → `$t3 = 0` (low 32 bits wrap). No overflow trap. Post-halt,
   `trap_events == 0` confirms.

Belt-and-braces hierarchical register peeks after halt for
`$t0`/`$gp`/`$t3` so a future BNE-chain regression can't silently
pass with wrong values.

Result: `retired=17 halt=1 trap=0 pc=0xbfc00138 errors=0 PASS`.
Final PC at the PASS syscall slot.

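
The encoding check in case 2 can be reproduced outside the simulator. The following is a minimal Python sketch of a standard MIPS R-type encoder; the testbench's actual `enc_rtype()` signature is not shown in this note, so the argument names and order here are assumptions.

```python
def enc_rtype(rs: int, rt: int, rd: int, shamt: int, func: int, opcode: int = 0) -> int:
    """Pack MIPS R-type fields: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] func[5:0]."""
    for name, value, width in (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
                               ("rd", rd, 5), ("shamt", shamt, 5), ("func", func, 6)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


# Register numbers from the standard MIPS ABI naming.
ZERO, A0, GP = 0, 4, 28
FUNC_DADDU = 0x2D

word = enc_rtype(rs=A0, rt=ZERO, rd=GP, shamt=0, func=FUNC_DADDU)
assert word == 0x0080E02D, hex(word)
print(f"daddu $gp, $a0, $zero -> {word:#010x}")
```

Checking the encoder against a literal taken from the real binary is what makes the test sensitive to field-order mistakes, which a round trip through the same helper would not catch.
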
### Makefile + regression

- `tb_ee_core_daddu` target.
- Added to both the PHONY list and the `run:` master target.
- Regression bumps 159 → 160.

## qbert disassembly around the new blocker (PC 0x00100070)

Decoded from the qbert.elf file (`python3 -c "..."` with `struct.unpack`):

```
0x00100060: 0x3C080010   lui   $t0, 0x0010
0x00100064: 0x25080188   addiu $t0, $t0, 0x0188    ; $t0 = 0x00100188 (likely the 5th arg to syscall #60; $gp comes from $a0)
0x00100068: 0x0080E02D   daddu $gp, $a0, $0        ; Ch272: $gp <- $a0
0x0010006C: 0x2403003C   addiu $v1, $0, 0x003C     ; $v1 = 60 = InitMainThread (ps2sdk: SetupThread)
0x00100070: 0x0000000C   syscall                   ; <-- CURRENT BLOCKER
0x00100074: 0x0040E82D   daddu $sp, $v0, $0        ; $sp <- $v0 (stack pointer returned by the kernel)
0x00100078: 0x2403003D   addiu $v1, $0, 0x003D     ; $v1 = 61 = InitHeap (ps2sdk: SetupHeap)
0x0010007C: 0x3C040014   lui   $a0, 0x0014
0x00100080: 0x2484B6E8   addiu $a0, $a0, -0x4918   ; $a0 = 0x0013B6E8 (heap start)
0x00100084: 0x3C050000   lui   $a1, 0x0000
0x00100088: 0x24A5FFFF   addiu $a1, $a1, -1        ; $a1 = -1 (default heap size)
0x0010008C: 0x0000000C   syscall                   ; SYSCALL #61
0x00100090: 0x00000000   nop
0x00100094: 0x24030064   addiu $v1, $0, 0x0064     ; $v1 = 100 = FlushCache
0x00100098: 0x0000202D   daddu $a0, $0, $0         ; $a0 = 0
0x0010009C: 0x0000000C   syscall                   ; SYSCALL #100
```

This is **textbook PS2 crt0 init**:

1. `InitMainThread(gp, stack, stack_size, args, root)` (syscall #60) sets up the main thread; its result becomes `$sp`.
2. `InitHeap(heap_start=0x0013B6E8, heap_size=-1)` (syscall #61) initializes the heap; the listing does not read `$v0` afterwards.
3. `FlushCache(0)` writes back the data cache (mode 0).

If we don't model these, qbert can't even reach `main()`.

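
The syscall numbers and their PCs can be recovered mechanically from the instruction words in the listing. The sketch below is an illustration, not the script actually used for this chapter: it packs the sixteen words into a little-endian byte image, then walks it with `struct.iter_unpack`, remembering the most recent `addiu $v1, $0, imm` before each `syscall`. The helper name `find_syscalls` is hypothetical.

```python
import struct

BASE = 0x00100060
WORDS = [
    0x3C080010, 0x25080188, 0x0080E02D, 0x2403003C,
    0x0000000C, 0x0040E82D, 0x2403003D, 0x3C040014,
    0x2484B6E8, 0x3C050000, 0x24A5FFFF, 0x0000000C,
    0x00000000, 0x24030064, 0x0000202D, 0x0000000C,
]
# Stand-in for the bytes read from the ELF segment (EE code is little-endian).
image = struct.pack(f"<{len(WORDS)}I", *WORDS)


def find_syscalls(image: bytes, base: int):
    """Return (pc, number) pairs, tracking the last `addiu $v1, $0, imm`."""
    v1 = None
    hits = []
    for index, (word,) in enumerate(struct.iter_unpack("<I", image)):
        opcode = word >> 26
        rs = (word >> 21) & 0x1F
        rt = (word >> 16) & 0x1F
        func = word & 0x3F
        if opcode == 0x09 and rs == 0 and rt == 3:
            # ADDIU into $v1 from $zero; fine for small positive immediates
            # (no sign extension is applied in this sketch).
            v1 = word & 0xFFFF
        elif opcode == 0 and func == 0x0C:
            hits.append((base + 4 * index, v1))
    return hits


for pc, number in find_syscalls(image, BASE):
    print(f"{pc:#010x}: syscall #{number}")
```

On these words the scan reports syscalls 60, 61, and 100 at 0x00100070, 0x0010008C, and 0x0010009C, matching the listing.
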
## Recommendation for Codex's Ch273

The next blocker is **SYSCALL**, not an opcode. Three Ch273 framings:

**(A) Minimal "kernel-stub" SYSCALL dispatch.** Replace the
current "halt on any non-Ch199 syscall" with a small case
statement keyed on `$v1`. For the three qbert needs immediately:

| `$v1` | name           | minimum needed                                                                                        |
|-------|----------------|-------------------------------------------------------------------------------------------------------|
| 0x3C  | InitMainThread | return `$v0 = 0x001E0000` (or any plausible stack top; crt0 copies it into `$sp`); advance PC; RFE    |
| 0x3D  | InitHeap       | return `$v0 = $a0` (the heap base; the crt0 listing does not read the result); advance PC; RFE        |
| 0x64  | FlushCache     | return `$v0 = 0` (no modeled cache); advance PC; RFE                                                  |

Each case is "set `$v0`, RFE back to EPC+4." Unhandled syscalls
fall through to the existing halt (so we still find the next
real blocker).

**(B) "Generic-return" SYSCALL.** Make EVERY SYSCALL (other
than the Ch199 special case) just set `$v0 = 0` and RFE. Even
faster to land, but a syscall that EXPECTS a non-zero return
(like syscall #60, whose result crt0 copies into `$sp`) would
silently misbehave: `$sp` would become 0, and the next stack
store would write to garbage or take an AdES trap. Probably the
wrong choice.

**(C) Full PS2 EE kernel-call dispatcher.** Hundreds of
syscalls (`InitMainThread`, `CreateThread`, `WaitSema`,
`SifSetReg`, `GsPutIMR`, ...). Out of scope for one chapter.

**My read: (A).** Three syscalls, three case arms, three
focused TB checks. Same incremental-growth pattern as Ch271/272
but at the system-call level instead of the opcode level.

The three values returned (InitMainThread, InitHeap,
FlushCache) need to be plausible for qbert's downstream code
to work. Syscall #60 returning 0x001E0000 (1.875 MiB) keeps the
stack below the 2 MiB EE-RAM ceiling our TB allocates. The
exact return value for `InitHeap` can probably be
"return what would be sensible"; Codex can pick.

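
Option (A) above can be sketched as a behavioral model before any RTL is written. This is a Python illustration under stated assumptions, not the planned implementation: the return values are the ones proposed in the table, registers are modeled as a plain dict, and `dispatch_syscall` and `UnhandledSyscall` are hypothetical names standing in for the case statement and the existing halt path.

```python
class UnhandledSyscall(Exception):
    """Stands in for the stub's existing halt on an unknown syscall."""


# Return-value rules keyed on the syscall number in $v1 (values from the table).
HANDLERS = {
    0x3C: lambda regs: 0x001E0000,  # plausible address below the 2 MiB TB RAM
    0x3D: lambda regs: regs["a0"],  # echo the first argument back
    0x64: lambda regs: 0,           # no cache is modeled
}


def dispatch_syscall(regs: dict, epc: int) -> int:
    """Set $v0 for a known syscall and return the resume PC (EPC + 4)."""
    handler = HANDLERS.get(regs["v1"])
    if handler is None:
        raise UnhandledSyscall(f"syscall {regs['v1']} at EPC {epc:#010x}")
    regs["v0"] = handler(regs)
    return epc + 4


# Replay the first blocker: syscall #60 at PC 0x00100070.
regs = {"v0": 0, "v1": 0x3C, "a0": 0, "a1": 0}
next_pc = dispatch_syscall(regs, 0x00100070)
print(f"resume at {next_pc:#010x}, $v0 = {regs['v0']:#010x}")
```

Raising on unknown numbers preserves the property the runner depends on: the next unmodeled syscall still surfaces as a concrete blocker instead of silently returning zero as in option (B).
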
## Files changed

- `rtl/ee/ee_core_stub.sv`: 4 surgical edits (~6 LOC total).
- `sim/tb/integration/tb_ee_core_daddu.sv`: new focused TB.
- `sim/Makefile`: `tb_ee_core_daddu` target + both regression
  lists.

## Regression

In flight; expected 160/160 (was 159, +1 for `tb_ee_core_daddu`).

## Pattern-summary

Ch271 + Ch272 = the opcode-by-opcode growth track Codex
originally framed. Two chapters, two opcodes, two focused TBs,
qbert progresses from 12 → 26,960 retires + clears the entire
ALU portion of the prolog. **The runner is doing exactly what
it's supposed to do**: surface the next concrete blocker,
chapter by chapter.

Ch273 is the first non-opcode blocker. It still fits the
"one-question-one-chapter" pattern but now the surface is
"what should the kernel return for this syscall?" instead of
"what does this opcode do?".