thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00

rtl/ee

Emotion Engine-side RTL. Matches docs/contracts/ee.md.

Current contents

  • ee_fetch_stub.sv: minimal sequential fetcher from the early waves. On reset, PC = BIOS reset vector (0xBFC00000). Each cycle while enable is high, issues a read at PC and advances PC += 4. No decode, no branches, no exceptions. Emits EV_RESET once at reset exit and EV_IFETCH for each returned response. Retained for the Milestone-B golden-reference comparison.
  • ee_core_stub.sv: first real EE instruction-decoding core. Structural mirror of iop_core_stub: same multi-cycle FSM, same R3000 subset (LUI/ORI/ADDIU/LW/SW/BEQ/BNE/J/JR/NOP/SYSCALL/MFC0/MTC0/RFE), same branch-delay-slot discipline, same minimal COP0 + exception entry, same STRICT_UNSUPPORTED trap gate. Separate file from the IOP core because the EE is fundamentally an R5900 and will eventually need 64-bit registers, COP1/COP2, and VU-side plumbing that the IOP will never grow. Emits traces under SUBSYS_EE (vs. SUBSYS_IOP for the IOP core).
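
As a rough illustration of the ee_fetch_stub.sv behavior described above, here is a minimal Python behavioral sketch (not the RTL). It models only the PC sequence. The class and function names are hypothetical, and the read/response handshake and the EV_RESET / EV_IFETCH trace events are deliberately left out.

```python
RESET_VECTOR = 0xBFC00000  # BIOS reset vector, as stated in the description


class FetchStubModel:
    """Behavioral sketch of a sequential fetcher (hypothetical, not the RTL)."""

    def __init__(self):
        self.pc = RESET_VECTOR

    def reset(self):
        # On reset the PC returns to the BIOS reset vector.
        self.pc = RESET_VECTOR

    def tick(self, enable):
        """Advance one clock; return the fetch address issued, or None."""
        if not enable:
            return None
        addr = self.pc
        # No decode, no branches, no exceptions: the PC always steps by 4.
        self.pc = (self.pc + 4) & 0xFFFFFFFF
        return addr


def fetch_addresses(enables):
    """Run the model over a sequence of enable values, collecting reads."""
    model = FetchStubModel()
    issued = (model.tick(e) for e in enables)
    return [a for a in issued if a is not None]
```

Calling fetch_addresses([True, True, False, True]) yields three consecutive word addresses starting at the reset vector; the cycle with enable low issues nothing and does not advance the PC.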

Current status

The EE side has a first real execution primitive (ee_core_stub) and runs hand-assembled bootstraps from the shared BIOS ROM window. The IOP side is ahead: it has a DMAC ch9 data path, real interrupt exception entry, BIOS reset, and strict-mode BIOS smoke bring-up. The EE side's next natural growth (in roughly this order) is:

  1. CPU-side LW/SW to EE RAM. Done (tb_ee_core_memops). EE memory map now routes CPU 32-bit reads and writes into the 128-bit ee_ram_stub with lane-select on reads and byte-enable masking on writes. CPU wins over DMAC on same-cycle RAM-read collisions and over the SIF egress bridge on RAM-write collisions.
  2. EE DMAC register access from the core. Done (tb_ee_core_dmac, tb_ee_core_dmac_poll). Chapter 3 added the write-side: EE map decodes a CPU write at phys[28:12] == 17'h1_000A (0x1000_A000-0x1000_AFFF, ch2 GIF) and routes it through a new ee_dmac_ch2_wr_* port into dmac_reg_stub. The EE core programs MADR/QWC/CHCR via SW; the DMAC fetches from EE RAM through the map's dmac_rd_* port and completes with real DMA_START/BEAT/DONE events. Chapter 4 added the read-side: dmac_reg_stub grew a reg_rd_* surface (CHCR/MADR/QWC/TADR + DONE_COUNT monotonic counter at 0x40), and the EE map forwards CPU reads in the same DMAC window via a new ee_dmac_ch2_rd_* port. The core polls CHCR.start until the DMAC clears it, then reads DONE_COUNT and writes the witness to RAM — no more fixed NOP padding.
  3. EE INTC + exception entry. Done (tb_ee_core_dmac_intc). EE map now decodes the EE INTC register window at phys[28:12] == 17'h1_000F (0x1000_F000/0x1000_F010 for STAT/MASK) and carries both directions through new ee_intc_{wr,rd}_* ports. An intc_stub instance on the EE side latches dmac_reg_stub.irq_completion_o and drives ee_core_stub.cpu_irq (which feeds cause_ip[2]). Bootstrap enables interrupts (Status = IEc | IM[2]), programs INTC_MASK, kicks the DMAC, and waits on DONE_COUNT; a RAM-resident ISR at EXC_VECTOR=0x80 acks INTC_STAT via W1C, MFC0 EPC, JR + RFE. Core takes exactly one exception + one RFE, strictly after DMA_DONE.
  4. EE-side strict BIOS smoke. Done (tb_ee_core_bios_smoke). EE mirror of the IOP smoke harness: ee_core_stub instantiated with STRICT_UNSUPPORTED=1'b1; synthetic CI bootstrap ends in an AND (SPECIAL func 0x24) that the core doesn't decode, so trap_o/trap_pc_o/trap_instr_o fire and halt the core loudly. Swap in a real BIOS via make tb_ee_core_bios_smoke BIOS=/path/to/bios.hex (plusarg-driven $readmemh into u_bios.mem, same convention as the IOP target). Output line includes an inline mnemonic decoder so the iteration loop (drop in BIOS, read output, add the missing opcode) works without a separate disassembler.
  5. Widen the core opcode set, driven by real-BIOS smoke. The iteration loop is live: drop a BIOS dump in via make tb_ee_core_bios_smoke BIOS=..., read trap_instr + mnemonic from the output, implement the op, re-run. Progress so far (each step landed a dedicated coverage TB and kept full_checks green):
    • SLTI / SLTIU (I-type compare, opcodes 0x0A / 0x0B). First real-BIOS trip at 0xBFC0_0008. TB: tb_ee_core_slti.
    • ADDI (opcode 0x08). Implemented as ADDIU (no overflow trap — real BIOS doesn't emit ADDI where overflow could actually happen). TB: tb_ee_core_addi.
    • ANDI (opcode 0x0C, zero-extended). TB: tb_ee_core_andi.
    • AND / OR / XOR / NOR (SPECIAL R-type logic family, func 0x24-0x27; destination = rd). Batched because they share the R-type ALU plumbing. TB: tb_ee_core_rtype_logic.
    • SB (opcode 0x28, byte store with lane broadcast + one-hot byte-enable on the map write bus). TB: tb_ee_core_sb. Unlocked a 1500-instruction stretch (retired=180 → 1704).
    • LB (opcode 0x20, sign-extended byte load via map_rd_data lane extraction + 24-bit sign-extend in S_MEM_WAIT). TB: tb_ee_core_lb.
    • JAL (opcode 0x03, jump-and-link; writes $31 = pc+8). TB: tb_ee_core_jal.
    • ADDU / SUBU (SPECIAL R-type arith, func 0x21 / 0x23). Batched, share R-type ALU. TB: tb_ee_core_rtype_addu. Codex pre-approved the grouping.
    • SLT / SLTU (SPECIAL R-type compare, func 0x2A / 0x2B). Batched with the R-type ALU; register-form pair of SLTI/SLTIU. TB: tb_ee_core_slt. Unlocked a 5700-instruction stretch (retired=1717 → 7385).
    • LH / LHU (opcodes 0x21 / 0x25, halfword load with sign- and zero-extension respectively). Batched: same lane-extraction plumbing, differing only in fill semantics. Halfword addressing uses ea[1] (ea[0] must be zero for aligned access). TBs: tb_ee_core_lh, tb_ee_core_lhu (each covers both halfword lanes + the fill discipline for negative high-lane values). Unlocked retired=7385 → 8207.
    • SLL / SRL / SRA (SPECIAL R-type shifts, func 0x00 / 0x02 / 0x03). Batched per Codex pre-approval. Destination = rd, operand = rt, shift amount = shamt (bits [10:6]). SRA uses $signed(rt_val) >>> shamt for arithmetic right shift (sign fill); SRL uses rt_val >> shamt (zero fill). SLL $0,$0,0 is the canonical NOP encoding and flows through this path harmlessly — the rd_idx=0 writeback guard blocks any phantom write. TB: tb_ee_core_shift (critical probes: SRL vs SRA on the same negative input to catch sign-vs-zero fill bugs). Unlocked a 12,000-instruction stretch (retired=8207 → 20327).
    • SH (opcode 0x29, halfword store). Store-side mate to LH/LHU; same lane-broadcast + byte-enable idiom as SB but at halfword granularity via ea[1]. 2-of-4 byte-enable (4'b0011 for low lane, 4'b1100 for high lane) preserves the non-addressed halfword. TB: tb_ee_core_sh, with two chained probes whose register values have distinctive upper halves (0xCAFE_FACE, 0x1234_5678). If the byte-enable is wrong or the full register leaks into the map_wr_data bus, the preservation check catches it (RAM word ends up 0x5678_FACE after both stores; wrong behavior would corrupt the non-addressed halfword). Unlocked a 56,000-instruction stretch (retired=20327 → 76406) once the RAM-size infra issue was also fixed in the same chapter (see next bullet).
    • Real-BIOS RAM size (chapter 7.9 infra fix). Before this chapter, tb_ee_core_bios_smoke used only 4 KiB of EE RAM. That was fine for the synthetic CI program (which never writes beyond the first qword), but destructive once the real BIOS copies a large chunk of itself into RAM and jumps there. Addresses beyond 4 KiB silently aliased into the same window, producing 156k "retires" that were actually the core executing a scrambled mix of overwritten bytes, with no trap ever firing because whatever happened to land at the aliased offset decoded to something supported. Bumped EE_RAM_BYTES in the bench to 4 MiB (real PS2 has 32 MiB; 4 MiB covers BIOS init comfortably without ballooning sim memory). After the fix, real-BIOS smoke ran honestly and trapped on JALR at 0xBFC5_29E8.
    • JALR (SPECIAL func 0x09, register-indirect call). Target is rs_val (same path as JR); link address pc+8 is written to rd_idx. Unlike JAL's hardcoded $31, JALR's link destination is explicit in the instruction, and rd==0 is a valid encoding that suppresses the link write. TB: tb_ee_core_jalr — two probes: canonical jalr $31, $rs (what the BIOS used) plus jalr $20, $rs with the return via jr $20 to prove the rd field is honored and not accidentally hardcoded to $31. Unlocked retired=76406 → 84112 and the BIOS fully jumped into RAM-resident code (next trap_pc is 0x0000_060C, a RAM address, not BIOS).
    • ADD / SUB (SPECIAL R-type, func 0x20 / 0x22). Batched per Codex's guidance — same pragmatic policy as ADDI vs ADDIU: this core does not model the Arithmetic Overflow exception, so ADD behaves as ADDU and SUB behaves as SUBU. Merged into the existing rs_val + rt_val / rs_val - rt_val arms of rtype_alu_wb. TB: tb_ee_core_add_sub — four probes including INT_MAX+1 wrap, which documents the deferred-exception policy (the wrap is the expected outcome, so the TB will fail loudly if overflow trapping ever lands without the TB being updated).
    • COP0 Count (reg 9): first machine-state chapter after the iter-14 transition. Free-running 32-bit counter that increments every clock and resets to 0. Exposed read-only through MFC0 $9. MTC0 $9 is silently dropped (no reset-to-value yet; revisit if BIOS depends on it). TB: tb_ee_core_cop0_count, with two probes covering consecutive-MFC0 advance and a canonical while (now < target) poll that must exit.
    • Enhanced bios_smoke PC sampler with peek_instr(addr) helper (hierarchical read through u_bios.mem / u_ee_ram.mem) and a parallel retired_history array. Timeout now reports the instruction and retired count at each sample, not just pc. Timeout window bumped 5 ms → 20 ms for BIOS runway.
    • Sampler pointer snapshots + 80 ms timeout. After the instruction-aware sampler showed the loop was a linked-list walk (not a hardware wait), Codex directed "extend timeout first, then add pointer snapshots only if still stuck". Timeout bumped 20 ms → 80 ms: retired grew linearly to 2.46 M, still 100% in the same loop (≈350k iterations — way beyond any plausible BIOS list length). Added u_core.regfile[5] and [6] hierarchical snapshots at each sample. Finding:
      • $5 (sentinel) = 0x00000974 — plausible low-RAM pointer
      • $6 (current) = 0xDEADBEEF, the EE map's unmapped-read poison value. The cycle is self-perpetuating: lw $2, 0($6) with $6 = 0xDEADBEEF reads address 0xDEADBEEF, which is unmapped, returning 0xDEADBEEF; the bne $2, $0 stays taken forever. The real root cause is an earlier BIOS read from an unmapped address that poisoned a data structure; the traversal followed the poisoned pointer and locked in.
    • (next-move call is with Codex: add an unmapped-read tracer to find the first bad address, implement whatever peripheral the BIOS was reading, change the poison value to 0 so the loop exits and exposes further BIOS progress, or something else.)
    • Bench-drift note (chapter 7.5): the synthetic BIOS smoke sentinel was originally AND; once AND was added to the R-type ALU, the synthetic test silently stopped tripping and started timing out. Codex caught it; sentinel is now BREAK (SPECIAL func 0x0D). See project memory for the full post-mortem. Lesson: avoid using real opcodes as "unsupported sentinels" in test benches.
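
The SB / SH / LB / LH / LHU bullets above all rely on the same lane arithmetic, so a small Python reference model may help when reading TB output. This is a sketch under stated assumptions, not the RTL: it assumes a 32-bit little-endian word (byte lane 0 is bits [7:0]), and the function names are hypothetical.

```python
def store_lanes(ea, value, size):
    """Return (wr_data, byte_en) for a 1- or 2-byte store into a 32-bit word."""
    if size == 1:
        # SB: broadcast the byte to all four lanes, one-hot byte-enable.
        byte = value & 0xFF
        return (byte * 0x01010101, 1 << (ea & 3))
    if size == 2:
        # SH: broadcast the halfword to both lanes, select by ea[1].
        half = value & 0xFFFF
        return (half * 0x00010001, 0b1100 if (ea & 2) else 0b0011)
    raise ValueError("size must be 1 or 2")


def apply_store(word, wr_data, byte_en):
    """Merge wr_data into word, touching only the enabled byte lanes."""
    for lane in range(4):
        if byte_en & (1 << lane):
            mask = 0xFF << (8 * lane)
            word = (word & ~mask) | (wr_data & mask)
    return word & 0xFFFFFFFF


def load_lane(word, ea, size, signed):
    """Extract a byte or halfword from word, then sign- or zero-extend it."""
    bits = 8 * size
    shift = 8 * (ea & 3) if size == 1 else 16 * ((ea >> 1) & 1)
    raw = (word >> shift) & ((1 << bits) - 1)
    if signed and raw & (1 << (bits - 1)):
        raw |= 0xFFFFFFFF & ~((1 << bits) - 1)  # fill the upper bits with 1s
    return raw
```

Replaying the tb_ee_core_sh probe values through this model, with 0xCAFE_FACE stored to the low halfword and 0x1234_5678 to the high halfword of a zeroed word (the lane order is inferred from the documented result), gives 0x5678_FACE, matching the preservation check described in the SH bullet.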
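
The shift, compare, and ADD / SUB bullets above describe fill and overflow policies that are easy to get subtly wrong. The following Python sketch mirrors the documented semantics ($signed(rt_val) >>> shamt for SRA, rt_val >> shamt for SRL, and ADD / SUB wrapping like ADDU / SUBU). It is an illustrative model only; the helper names are hypothetical, and explicit 32-bit masks stand in for fixed-width registers because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as two's-complement signed."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sll(rt, shamt):
    return (rt << (shamt & 31)) & MASK32


def srl(rt, shamt):
    # Logical right shift: zero fill.
    return (rt & MASK32) >> (shamt & 31)


def sra(rt, shamt):
    # Arithmetic right shift: sign fill.
    return (to_signed(rt) >> (shamt & 31)) & MASK32


def slt(rs, rt):
    # Signed compare.
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs, rt):
    # Unsigned compare.
    return int((rs & MASK32) < (rt & MASK32))


def add_wrapping(rs, rt):
    # ADD modeled as ADDU: no Arithmetic Overflow exception, result wraps.
    return (rs + rt) & MASK32


def sub_wrapping(rs, rt):
    # SUB modeled as SUBU: same deferred-exception policy.
    return (rs - rt) & MASK32
```

Shifting the same negative input both ways, as tb_ee_core_shift is described as doing, separates the two fills: srl(0x80000000, 4) clears the top bits while sra(0x80000000, 4) replicates the sign bit. add_wrapping(0x7FFFFFFF, 1) shows the INT_MAX+1 wrap that tb_ee_core_add_sub treats as the expected outcome.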
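
Items 2 and 3 and the RAM-size bullet all come down to address arithmetic, sketched here in Python. The window tags are taken from the phys[28:12] compares quoted above. The aliasing helper is an assumption: it models a power-of-two RAM indexed by the low address bits, which is one common way such wrap-around arises, and may not match how the bench RAM stub actually indexes. All names are hypothetical.

```python
DMAC_CH2_WINDOW = 0x1000A  # phys[28:12] == 17'h1_000A (0x1000_A000-0x1000_AFFF)
INTC_WINDOW = 0x1000F      # phys[28:12] == 17'h1_000F (STAT at +0x000, MASK at +0x010)


def window_tag(phys):
    """Extract phys[28:12], the 17-bit tag used for the 4 KiB window compare."""
    return (phys >> 12) & 0x1FFFF


def decode(phys):
    """Classify an address the way the two documented compares would."""
    tag = window_tag(phys)
    if tag == DMAC_CH2_WINDOW:
        return "dmac_ch2"
    if tag == INTC_WINDOW:
        return "intc"
    return "other"


def ram_index(addr, ram_bytes):
    """Assumed aliasing model: a power-of-two RAM keeps only the low bits."""
    assert ram_bytes & (ram_bytes - 1) == 0, "ram_bytes must be a power of two"
    return addr & (ram_bytes - 1)
```

Because the slice stops at bit 28, bits [31:29] do not take part in the compare, so an input such as 0xB000_A000 yields the same tag as 0x1000_A000. Under the assumed aliasing model, ram_index(0x160C, 4096) and ram_index(0x060C, 4096) collide, while a 4 MiB RAM keeps them distinct, which is the kind of silent overwrite the RAM-size bullet describes.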

Scope boundary

This directory owns EE CPU execution and its immediate coprocessors (COP0 minimum; eventually COP1 FPU and COP2 VU macro mode). It does not own:

  • memory map / address decode — that's rtl/memory/ee_memory_map_stub.sv.
  • interrupt controller — that's rtl/intc/ (generic; the same intc_stub module already serves the IOP side).
  • DMAC, VIF/VU, GIF/GS — separate directories.