retroDE_ps2/docs/ch284_closeout.md
thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


# Ch284 closeout — LD (Load Doubleword); next blocker is syscall $v1=0x40
**Status:** Closed. **Verdict from re-running qbert.elf:**
`elf_first_unhandled_syscall (pc=0x00111D24 $v1=0x40 (=64))`. qbert
got past the function-epilogue `ld $ra, 0($sp)` at PC 0x00113378
plus 23 more instructions, then hit a SYSCALL whose `$v1=64` isn't
in Ch273's HLE dispatcher (which handles 0x3C / 0x3D / 0x64).
retire_count: 27,067 → **27,091** (+24).
## What landed
LD landed as the structural read-side counterpart of SD. The same
`sq_beat` counter that SD reuses (terminal beat = 1) now drives LD, and
the beat-addressed `map_rd_addr = ea + sq_beat*4` already in place for
LQ also serves LD. Beat 0 captures `mem[ea+0]` into `gpr128[rt][31:0]`
and mirrors it to `regfile[rt]`; beat 1 captures `mem[ea+4]` into
`gpr128[rt][63:32]`. `gpr128[rt][127:64]` is left untouched: LD loads
only a doubleword, and the upper 64 bits of $rt are architecturally
preserved on R5900.
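
The two-beat capture can be modelled behaviourally. The following is a minimal Python sketch, not the RTL: the function name `ld_two_beat`, the dict-backed `mem`, and the use of a plain integer for `gpr128[rt]` are illustrative assumptions. Only `sq_beat`, `map_rd_addr`, and the lane assignments come from the description above.

```python
MASK32 = 0xFFFFFFFF

def ld_two_beat(gpr128_rt, mem, ea):
    """Model LD: return the new 128-bit $rt value after two 32-bit beats.

    gpr128_rt : current 128-bit register value (int)
    mem       : hypothetical word-addressed memory, {byte_address: 32-bit word}
    ea        : effective address (assumed 8-byte aligned here)
    """
    value = gpr128_rt
    for sq_beat in range(2):                 # beats 0 and 1; terminal beat = 1
        map_rd_addr = ea + sq_beat * 4       # beat-addressed read
        word = mem[map_rd_addr] & MASK32
        shift = 32 * sq_beat                 # beat 0 -> [31:0], beat 1 -> [63:32]
        value = (value & ~(MASK32 << shift)) | (word << shift)
    return value                             # bits [127:64] are never written


# Same data the focused TB pokes at ea / ea+4.
mem = {0x400: 0xAABBCCDD, 0x404: 0x11223344}
old = (0x0123456789ABCDEF << 64) | 0xFFFFFFFFFFFFFFFF
new = ld_two_beat(old, mem, 0x400)
regfile_rt = new & MASK32                    # 32-bit mirror taken from beat 0
```

In this model the upper 64 bits of `old` survive the load unchanged, which is the property the paragraph above attributes to LD.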
## RTL — surgical edits in `ee_core_stub.sv`
1. `localparam OP_LD = 6'h37` alongside OP_SD.
2. `logic ... is_ld;` decl + `is_ld = (opcode == OP_LD)` decode.
3. `is_dword_access = is_sd || is_ld` — picks up the existing
8-byte alignment fault path. AdEL is emitted for misaligned LD
(since `is_align_store` stays SD-only).
4. `is_nop_class` exclusion adds `!is_ld`.
5. `map_rd_addr` beat-stepping condition broadened from `is_lq` to
`(is_lq || is_ld)`.
6. Dispatch arm: when `is_ld`, set `sq_beat <= 0` then go to
`S_MEM_REQ` (parallel to LQ).
7. `S_MEM_WAIT` multi-beat branch generalized from "LQ only" to
"LQ || LD" with a `terminal_beat` local: `is_lq ? 3 : 1`. Both
ops share the same lane-capture case statement.

Seven RTL touchpoints, all purely structural reuse of the Ch283 gpr128
+ Ch271 sq_beat machinery.
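
The alignment and terminal-beat decisions in items 3 and 7 can be sketched as plain functions. This is a hedged Python model, not the SystemVerilog: the function names are hypothetical, and the `AdES` result for a misaligned SD is an assumption based on the usual MIPS store address-error code (the text above only states that LD raises AdEL).

```python
def terminal_beat(is_lq):
    """Last beat index of a multi-beat load: 3 for LQ (4 beats), 1 for LD (2 beats)."""
    return 3 if is_lq else 1

def dword_alignment_fault(is_ld, is_sd, ea):
    """Return the address-error name for a misaligned doubleword access, else None."""
    is_dword_access = is_sd or is_ld         # item 3: LD joins the SD alignment path
    if not is_dword_access or ea % 8 == 0:
        return None
    is_align_store = is_sd                   # stays SD-only, so LD falls through to AdEL
    return "AdES" if is_align_store else "AdEL"   # AdES for SD is an assumption
```

For example, `dword_alignment_fault(True, False, 0x80000404)` reports `"AdEL"` because the address is only 4-byte aligned, while the TB's `0x80000400` passes the check.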
## Focused TB — `tb_ee_core_ld.sv`
- **Case 2 (round-trip, runs first):** SD $ra(=0xABCD1234), 0($v0).
LD $t2, 0($v0). Verify regfile[$t2]=0xABCD1234 and
gpr128[$t2][63:32]=0 (SD beat 1 wrote 0).
- **Case 1 (exact qbert encoding, runs LAST so $ra holds the LD
result):** $sp set to 0x80000400; RAM pre-poked with
`(0xAABBCCDD, 0x11223344)` at ea/ea+4. Encoder asserts
`enc_i(OP_LD, 29, 31, 0) === 0xDFBF0000` (matches qbert's exact
PC 0x00113378 instruction). LD executes; in-program BNE compares
$ra to 0xAABBCCDD; post-halt peeks confirm
`gpr128[$ra][31:0] = 0xAABBCCDD` and `gpr128[$ra][63:32] = 0x11223344`.
(Initial draft of the TB mis-decoded 0xDFBF0000 as `ld $ra, 0($ra)`;
the encoder-output assertion caught the mistake immediately — the
same pattern that caught Ch278 PCPYLD's mis-decode. The correct
encoding is `ld $ra, 0($sp)` — function epilogue restoring $ra from
the stack frame.)

Result: `retired=20 halt=1 trap=0 errors=0 PASS`.
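
The encoder check that caught the mis-decode is easy to reproduce outside the TB. Below is a minimal Python sketch of a MIPS I-type encoder and decoder; it mirrors the argument order of the TB's `enc_i(OP_LD, 29, 31, 0)` call (opcode, rs, rt, imm), but the Python functions themselves, including `dec_i`, are illustrative and are not the TB's code.

```python
OP_LD = 0x37  # 6'h37, as in the RTL localparam

def enc_i(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)

def dec_i(word):
    """Unpack an I-type word into (opcode, rs, rt, imm)."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF

word = enc_i(OP_LD, 29, 31, 0)      # base = $sp (29), target = $ra (31)
assert word == 0xDFBF0000           # qbert's instruction at PC 0x00113378
opcode, rs, rt, imm = dec_i(word)   # rs == 29 confirms the base is $sp, not $ra
```

Decoding the word rather than reading the hex by eye makes the base register unambiguous: the `rs` field is 29 (`$sp`), so the instruction is `ld $ra, 0($sp)`.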
## Makefile + regression
- `tb_ee_core_ld` target.
- Added to both PHONY list and `run:` master list.
- Regression: 171 → **172**.
## qbert progression
| Chapter | Blocker | qbert retire_count |
|---------|---------|---------------------|
| Post-Ch282 (PAND) | PCPYUD at 0x00112CA0 | 27,024 |
| Post-Ch283 (PCPYUD + gpr128) | LD at 0x00113378 | 27,067 |
| **Post-Ch284 (LD)** | **SYSCALL $v1=0x40 at 0x00111D24** | **27,091** |

qbert is now executing through function returns. The next blocker is
**syscall #64** with `$a0 = 0x001DFFC0` (looks like a heap-top
address — possibly a memory-management or thread-context call) and
`$a1 = 0x0011C326`. Ch285 framing: add the 0x40 case to Ch273's
syscall HLE dispatcher (mirror the existing EndOfHeap / InitMainThread
/ FlushCache pattern). Open question for Codex: what is syscall 64?
The standard PS2 kernel syscall table is well-documented; Codex can
identify the exact service and the right stub-return semantics.
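
The shape of the Ch285 change can be sketched as a gated dispatcher that reports the first syscall number it does not recognise. This Python model is an illustration only: the `HANDLED` set holds the three numbers listed for Ch273, the function name is hypothetical, and no semantics are assumed for 0x40 because its identity is still the open question.

```python
HANDLED = {0x3C, 0x3D, 0x64}   # syscall numbers the Ch273 dispatcher already covers

def dispatch_syscall(v1, pc, handled=HANDLED):
    """Return 'handled' for a known syscall, else the first-unhandled verdict string."""
    if v1 in handled:
        return "handled"
    return f"elf_first_unhandled_syscall (pc=0x{pc:08X} $v1=0x{v1:X} (={v1}))"

verdict = dispatch_syscall(0x40, 0x00111D24)
# Adding the new case is a one-entry change once the stub semantics are known.
after_ch285 = dispatch_syscall(0x40, 0x00111D24, handled=HANDLED | {0x40})
```

Here `verdict` reproduces the string quoted in the status line at the top of this note, and `after_ch285` shows the blocker clearing once 0x40 joins the handled set.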
## Files changed
- `rtl/ee/ee_core_stub.sv` — 7 surgical edits (decode, alignment,
dispatch, multi-beat S_MEM_WAIT generalization).
- `sim/tb/integration/tb_ee_core_ld.sv` — new focused TB.
- `sim/Makefile` — target + both regression lists.
## Pattern review (14 chapters)
| Ch | Blocker | Edits | Pattern |
|-----|--------------|-------|---------|
| 271 | SQ | 5 | NEW 4-beat write |
| 272 | DADDU | 4 | NEW ALU-low-32 |
| 273 | SYSCALL HLE | 2 | NEW gated dispatcher |
| 274 | BEQL | 6 | NEW branch+squash |
| 275 | SD | 7 | REUSE SQ counter |
| 276 | DSLL | 4 | REUSE DADDU |
| 277 | BNEL | 6 | REUSE BEQL squash |
| 278 | PCPYLD | 4 | NEW MMI narrow-decode |
| 279 | LQ | 5 | REUSE LW path |
| 280 | PSUBB | 5 | REUSE MMI narrow (byte-SIMD new) |
| 281 | PNOR | 5 | REUSE MMI narrow + NOR arm |
| 282 | PAND | 5 | REUSE MMI narrow + AND arm |
| 283 | PCPYUD + gpr128 | architectural | NEW 128-bit shadow |
| **284** | **LD** | **7** | **REUSE Ch283 multi-beat path** |

Ch283's "one-time architectural investment" is already paying off:
LD landed by extending the LQ/SQ/SD multi-beat machinery, not by
inventing new infrastructure. Future doubleword and multi-beat ops
should be able to follow the same pattern.
## Regression
**172/172 PASS** (was 171/171 in Ch283).