ec82764bef
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression (272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps, and all dump-derived textures/traces) is excluded via .gitignore and stays local. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
rtl/ee
Emotion Engine-side RTL. Matches docs/contracts/ee.md.
Current contents
- `ee_fetch_stub.sv`: minimal sequential fetcher from the early waves. On reset, PC = BIOS reset vector (0xBFC00000). Each cycle while `enable` is high, issues a read at PC and advances PC += 4. No decode, no branches, no exceptions. Emits `EV_RESET` once at reset exit and `EV_IFETCH` for each returned response. Retained for the Milestone-B golden-reference comparison.
- `ee_core_stub.sv`: first real EE instruction-decoding core. Structural mirror of `iop_core_stub`: same multi-cycle FSM, same R3000 subset (LUI/ORI/ADDIU/LW/SW/BEQ/BNE/J/JR/NOP/SYSCALL/MFC0/MTC0/RFE), same branch-delay-slot discipline, same minimal COP0 + exception entry, same `STRICT_UNSUPPORTED` trap gate. Separate file from the IOP core because the EE is fundamentally an R5900 and will eventually need 64-bit registers, COP1/COP2, and VU-side plumbing the IOP will never grow. Emits traces under `SUBSYS_EE` (vs. `SUBSYS_IOP` for the IOP core).
Current status
The EE side has a first real execution primitive (ee_core_stub) and
runs hand-assembled bootstraps from the shared BIOS ROM window. The
IOP side is ahead — it has DMAC ch9 data path, real interrupt
exception entry, BIOS reset, and strict-mode BIOS smoke bring-up. The
EE side's next natural growth (in roughly this order) is:
1. CPU-side LW/SW to EE RAM. Done (`tb_ee_core_memops`). EE memory map now routes CPU 32-bit reads and writes into the 128-bit `ee_ram_stub` with lane-select on reads and byte-enable masking on writes. CPU wins over DMAC on same-cycle RAM-read collisions and over the SIF egress bridge on RAM-write collisions.
2. EE DMAC register access from the core. Done (`tb_ee_core_dmac`, `tb_ee_core_dmac_poll`). Chapter 3 added the write-side: EE map decodes a CPU write at `phys[28:12] == 17'h1_000A` (0x1000_A000-0x1000_AFFF, ch2 GIF) and routes it through a new `ee_dmac_ch2_wr_*` port into `dmac_reg_stub`. The EE core programs MADR/QWC/CHCR via SW; the DMAC fetches from EE RAM through the map's `dmac_rd_*` port and completes with real DMA_START/BEAT/DONE events. Chapter 4 added the read-side: `dmac_reg_stub` grew a `reg_rd_*` surface (CHCR/MADR/QWC/TADR + DONE_COUNT monotonic counter at 0x40), and the EE map forwards CPU reads in the same DMAC window via a new `ee_dmac_ch2_rd_*` port. The core polls CHCR.start until the DMAC clears it, then reads DONE_COUNT and writes the witness to RAM, so no more fixed NOP padding.
3. EE INTC + exception entry. Done (`tb_ee_core_dmac_intc`). EE map now decodes the EE INTC register window at `phys[28:12] == 17'h1_000F` (0x1000_F000/0x1000_F010 for STAT/MASK) and carries both directions through new `ee_intc_{wr,rd}_*` ports. An `intc_stub` instance on the EE side latches `dmac_reg_stub.irq_completion_o` and drives `ee_core_stub.cpu_irq` (which feeds `cause_ip[2]`). Bootstrap enables interrupts (Status = IEc | IM[2]), programs INTC_MASK, kicks the DMAC, and waits on DONE_COUNT; a RAM-resident ISR at `EXC_VECTOR=0x80` acks INTC_STAT via W1C, MFC0 EPC, JR + RFE. Core takes exactly one exception + one RFE, strictly after DMA_DONE.
4. EE-side strict BIOS smoke. Done (`tb_ee_core_bios_smoke`). EE mirror of the IOP smoke harness: `ee_core_stub` instantiated with `STRICT_UNSUPPORTED=1'b1`; the synthetic CI bootstrap ends in an opcode the core doesn't decode (originally `AND`, SPECIAL func 0x24; now BREAK, SPECIAL func 0x0D, per the bench-drift note below), so `trap_o`/`trap_pc_o`/`trap_instr_o` fire and halt the core loudly.
   Swap in a real BIOS via `make tb_ee_core_bios_smoke BIOS=/path/to/bios.hex` (plusarg-driven `$readmemh` into `u_bios.mem`, same convention as the IOP target). Output line includes an inline mnemonic decoder so the iteration loop (drop in BIOS, read output, add the missing opcode) works without a separate disassembler.
5. Widen the core opcode set, driven by real-BIOS smoke. The iteration loop is live: drop a BIOS dump in via `make tb_ee_core_bios_smoke BIOS=...`, read `trap_instr` + `mnemonic` from the output, implement the op, re-run. Progress so far (each step landed a dedicated coverage TB and kept full_checks green):
   - SLTI / SLTIU (I-type compare, opcodes 0x0A / 0x0B). First real-BIOS trip at 0xBFC0_0008. TB: `tb_ee_core_slti`.
   - ADDI (opcode 0x08). Implemented as ADDIU (no overflow trap; real BIOS doesn't emit ADDI where overflow could actually happen). TB: `tb_ee_core_addi`.
   - ANDI (opcode 0x0C, zero-extended). TB: `tb_ee_core_andi`.
   - AND / OR / XOR / NOR (SPECIAL R-type logic family, func 0x24-0x27; destination = rd). Batched because they share the R-type ALU plumbing. TB: `tb_ee_core_rtype_logic`.
   - SB (opcode 0x28, byte store with lane broadcast + one-hot byte-enable on the map write bus). TB:
     `tb_ee_core_sb`. Unlocked a 1500-instruction stretch (retired=180 → 1704).
   - LB (opcode 0x20, sign-extended byte load via `map_rd_data` lane extraction + 24-bit sign-extend in `S_MEM_WAIT`). TB: `tb_ee_core_lb`.
   - JAL (opcode 0x03, jump-and-link; writes `$31 = pc+8`). TB: `tb_ee_core_jal`.
   - ADDU / SUBU (SPECIAL R-type arith, func 0x21 / 0x23). Batched, share R-type ALU. TB: `tb_ee_core_rtype_addu`. Codex pre-approved the grouping.
   - SLT / SLTU (SPECIAL R-type compare, func 0x2A / 0x2B). Batched with the R-type ALU; register-form pair of SLTI/SLTIU. TB: `tb_ee_core_slt`. Unlocked a 5700-instruction stretch (retired=1717 → 7385).
   - LH / LHU (opcodes 0x21 / 0x25, halfword load with sign- and zero-extension respectively). Batched: same lane-extraction plumbing, differ only in fill semantics. Halfword addressing uses `ea[1]` (`ea[0]` must be zero for aligned access). TBs: `tb_ee_core_lh`, `tb_ee_core_lhu` (each covers both halfword lanes + the fill discipline for negative high-lane values). Unlocked retired=7385 → 8207.
   - SLL / SRL / SRA (SPECIAL R-type shifts, func 0x00 / 0x02 / 0x03). Batched per Codex pre-approval. Destination = rd, operand = rt, shift amount = `shamt` (bits [10:6]). SRA uses `$signed(rt_val) >>> shamt` for arithmetic right shift (sign fill); SRL uses `rt_val >> shamt` (zero fill). SLL $0,$0,0 is the canonical NOP encoding and flows through this path harmlessly: the rd_idx=0 writeback guard blocks any phantom write. TB: `tb_ee_core_shift` (critical probes: SRL vs SRA on the same negative input to catch sign-vs-zero fill bugs). Unlocked a 12,000-instruction stretch (retired=8207 → 20327).
   - SH (opcode 0x29, halfword store). Store-side mate to LH/LHU; same lane-broadcast + byte-enable idiom as SB but at halfword granularity via
     `ea[1]`. 2-of-4 byte-enable (`4'b0011` for low lane, `4'b1100` for high lane) preserves the non-addressed halfword. TB: `tb_ee_core_sh`, two chained probes with register values that have distinctive upper halves (0xCAFE_FACE, 0x1234_5678). If the byte-enable is wrong or the full register leaks into the `map_wr_data` bus, the preservation check catches it (RAM word ends up 0x5678_FACE after both stores; wrong behavior would corrupt the non-addressed halfword). Unlocked a 56,000-instruction stretch (retired=20327 → 76406) once the RAM-size infra issue was also fixed in the same chapter (see next bullet).
   - Real-BIOS RAM size (chapter 7.9 infra fix). Before this chapter, `tb_ee_core_bios_smoke` used only 4 KiB of EE RAM: fine for the synthetic CI program (which never writes beyond the first qword), but destructive once the real BIOS copies a large chunk of itself into RAM and jumps there. Addresses beyond 4 KiB silently aliased into the same window, producing 156k "retires" that were actually the core executing a scrambled mix of overwritten bytes, with no trap ever firing because whatever happened to land at the aliased offset decoded to something supported. Bumped `EE_RAM_BYTES` in the bench to 4 MiB (real PS2 has 32 MiB; 4 MiB covers BIOS init comfortably without ballooning sim memory). After the fix, real-BIOS smoke ran honestly and trapped on JALR at 0xBFC5_29E8.
   - JALR (SPECIAL func 0x09, register-indirect call). Target is `rs_val` (same path as JR); link address pc+8 is written to `rd_idx`. Unlike JAL's hardcoded `$31`, JALR's link destination is explicit in the instruction, and `rd==0` is a valid encoding that suppresses the link write. TB: `tb_ee_core_jalr`, two probes: canonical `jalr $31, $rs` (what the BIOS used) plus `jalr $20, $rs` with the return via `jr $20` to prove the rd field is honored and not accidentally hardcoded to $31. Unlocked retired=76406 → 84112 and the BIOS fully jumped into RAM-resident code (next trap_pc is `0x0000_060C`, a RAM address, not BIOS).
   - ADD / SUB (SPECIAL R-type, func 0x20 / 0x22). Batched per Codex's guidance, with the same pragmatic policy as ADDI vs ADDIU: this core does not model the Arithmetic Overflow exception, so ADD behaves as ADDU and SUB behaves as SUBU. Merged into the existing `rs_val + rt_val` / `rs_val - rt_val` arms of `rtype_alu_wb`. TB: `tb_ee_core_add_sub`, four probes including INT_MAX+1 wrap, which documents the deferred-exception policy (the wrap is the expected outcome, so the TB will fail loudly if overflow trapping ever lands without the TB being updated).
   - COP0 Count (reg 9): first machine-state chapter after the iter-14 transition. Free-running 32-bit counter that increments every clock and resets to 0. Exposed read-only through MFC0 $9. MTC0 $9 silently dropped (no reset-to-value yet; revisit if BIOS depends on it). TB:
     `tb_ee_core_cop0_count`, two probes covering consecutive-MFC0 advance and a canonical `while (now < target)` poll that must exit.
   - Enhanced bios_smoke PC sampler with `peek_instr(addr)` helper (hierarchical read through `u_bios.mem`/`u_ee_ram.mem`) and a parallel `retired_history` array. Timeout now reports the instruction and retired count at each sample, not just pc. Timeout window bumped 5 ms → 20 ms for BIOS runway.
   - Sampler pointer snapshots + 80 ms timeout. After the instruction-aware sampler showed the loop was a linked-list walk (not a hardware wait), Codex directed "extend timeout first, then add pointer snapshots only if still stuck". Timeout bumped 20 ms → 80 ms: retired grew linearly to 2.46 M, still 100% in the same loop (≈350k iterations, way beyond any plausible BIOS list length). Added `u_core.regfile[5]` and `[6]` hierarchical snapshots at each sample. Finding:
     - `$5` (sentinel) = `0x00000974`: plausible low-RAM pointer
     - `$6` (current) = `0xDEADBEEF`: the EE map's unmapped-read poison value.

     The cycle is self-perpetuating: `lw $2, 0($6)` with `$6 = 0xDEADBEEF` reads address 0xDEADBEEF, which is unmapped, returning 0xDEADBEEF; the `bne $2, $0` stays taken forever. The real root cause is an earlier BIOS read from an unmapped address that poisoned a data structure: the traversal followed the poisoned pointer and locked in.
   - (next-move call is with Codex: add an unmapped-read tracer to find the first bad address, implement whatever peripheral the BIOS was reading, change the poison value to 0 so the loop exits and exposes further BIOS progress, or something else.)
   - Bench-drift note (chapter 7.5): the synthetic BIOS smoke sentinel was originally AND; once AND was added to the R-type ALU, the synthetic test silently stopped tripping and started timing out. Codex caught it; sentinel is now BREAK (SPECIAL func 0x0D). See project memory for the full post-mortem. Lesson: avoid using real opcodes as "unsupported sentinels" in test benches.
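
The self-perpetuating pointer walk described in the sampler finding can be reproduced with a small behavioral model. This is only a sketch in Python, not the RTL or the BIOS code: the loop shape (load a next-pointer, stop on zero), the dictionary-backed memory, and the names `read_word` and `walk_list` are all assumptions made for illustration. Only the poison value 0xDEADBEEF comes from the notes above.

```python
# Behavioral sketch only. The loop shape and every name below are assumptions
# for illustration; they are not taken from the BIOS or from the RTL.
POISON = 0xDEADBEEF  # unmapped-read value reported in the notes above


def read_word(ram, addr, poison=POISON):
    """Return the word stored at addr, or `poison` if addr is unmapped."""
    return ram.get(addr, poison)


def walk_list(ram, head, poison=POISON, max_steps=1000):
    """Follow next-pointers from head until a zero pointer is read.

    Returns (loads, outcome). outcome is "terminated" when a zero pointer
    ended the walk, or "stuck" when max_steps loads ran without seeing one.
    """
    cur = head
    for loads in range(1, max_steps + 1):
        nxt = read_word(ram, cur, poison)  # models: lw $2, 0($6)
        if nxt == 0:                       # models: bne $2, $0 not taken
            return loads, "terminated"
        cur = nxt                          # follow the pointer
    return max_steps, "stuck"


healthy = {0x100: 0x200, 0x200: 0}   # two nodes, zero-terminated
poisoned = {0x100: 0x300}            # 0x300 is unmapped in this model

print(walk_list(healthy, 0x100))
print(walk_list(poisoned, 0x100, max_steps=50))
print(walk_list(poisoned, 0x100, poison=0))
```

In this model the healthy list terminates after two loads, the poisoned list never sees a zero pointer because every read of 0xDEADBEEF returns 0xDEADBEEF again, and switching the poison value to 0 (one of the options listed above) lets the walk exit while hiding the original bad read.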
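
The sub-word store and load entries in the list above (SB, SH, LB, LH, LHU) share one idiom: broadcast the low bytes of the register across the lanes, let the byte-enable choose what is written, and on loads extract a lane and fill the upper bits. A minimal Python reference model of that idiom follows. It is a sketch under stated assumptions: a single 32-bit little-endian word stands in for the map bus, the function names are hypothetical, and the starting RAM word of zero and the offsets 0 and 2 for the two SH probes are assumed, not taken from `tb_ee_core_sh`.

```python
MASK32 = 0xFFFFFFFF


def store_subword(word, ea, reg, size):
    """Model SB (size=1) or SH (size=2) into one 32-bit little-endian word.

    Returns (new_word, byte_enable). The register's low bytes are broadcast
    to every lane; the byte-enable selects which lanes are actually written.
    """
    if size == 1:
        byte_enable = 1 << (ea & 3)               # one-hot, one lane
        data = (reg & 0xFF) * 0x01010101          # byte on all four lanes
    elif size == 2:
        if ea & 1:
            raise ValueError("halfword store needs ea[0] == 0")
        byte_enable = 0b1100 if ea & 2 else 0b0011  # selected by ea[1]
        data = (reg & 0xFFFF) * 0x00010001        # halfword on both lanes
    else:
        raise ValueError("size must be 1 or 2")
    lane_mask = 0
    for lane in range(4):
        if byte_enable & (1 << lane):
            lane_mask |= 0xFF << (8 * lane)
    new_word = (word & ~lane_mask & MASK32) | (data & lane_mask)
    return new_word, byte_enable


def load_subword(word, ea, size, signed):
    """Model LB (size=1) or LH/LHU (size=2): lane extract, then fill."""
    bits = 8 * size
    shift = 8 * (ea & 3) if size == 1 else 16 * ((ea >> 1) & 1)
    value = (word >> shift) & ((1 << bits) - 1)
    if signed and value >> (bits - 1):
        value |= MASK32 & ~((1 << bits) - 1)      # sign fill
    return value


# Two chained SH probes with the register values named in the SH entry.
word, be_lo = store_subword(0x00000000, 0, 0xCAFEFACE, 2)
word, be_hi = store_subword(word, 2, 0x12345678, 2)
print(hex(word), bin(be_lo), bin(be_hi))
print(hex(load_subword(word, 0, 2, True)), hex(load_subword(word, 0, 2, False)))
```

With these inputs the word ends up as 0x5678face, matching the value the SH entry expects, and the low halfword reads back as 0xfffffa ce-style sign fill for the signed load (0xfffffa... is written out in full as 0xfffface) versus 0xface for the unsigned one. A model like this can serve as an expected-value generator when adding further lane probes.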
Scope boundary
This directory owns EE CPU execution and its immediate coprocessors (COP0 minimum; eventually COP1 FPU and COP2 VU macro mode). It does not own:
- memory map / address decode: that's `rtl/memory/ee_memory_map_stub.sv`.
- interrupt controller: that's `rtl/intc/` (generic; the same `intc_stub` module already serves the IOP side).
- DMAC, VIF/VU, GIF/GS: separate directories.