thejayman77 ec82764bef Initial commit: retroDE_ps2 — first-of-its-kind PS2 GS FPGA core (DE25-Nano / Agilex 5)
RTL (GS rasterizer, EE core stub, platform bridge, LPDDR4B path), sim regression
(272 TBs), docs, and tooling. Copyrighted PS2 content (BIOS, game code, GS dumps,
and all dump-derived textures/traces) is excluded via .gitignore and stays local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 20:10:50 -04:00


top_psmct32_raster_demo on DE25-Nano (current as of Ch165)

This directory is the DE25-Nano synthesis project for rtl/top/de25_nano_psmct32_raster_demo_top.sv (the Ch149 board wrapper, retargeted in Ch159 to instantiate the BRAM-backed top_psmct32_raster_demo_bram instead of the Ch146 legacy top_psmct32_raster_demo). The directory grew across these chapters:

| Chapter | Added |
| --- | --- |
| Ch148 | files.f (RTL filelist, Ch123 dep tree only) + this README. |
| Ch149 | (RTL only: de25_nano_psmct32_raster_demo_top.sv lives in rtl/top/.) |
| Ch150 | de25_nano_psmct32_raster_demo_top.qsf + .sdc: minimal Quartus scaffold. |
| Ch151 | (RTL only: de25_nano_pll_stub.sv + PLL/lock-gated-reset rework of board top.) |
| Ch152 | build_quartus.sh + parse_reports.py: first real Quartus compile (fit FAILED). |
| Ch159 | Board-top swap to top_psmct32_raster_demo_bram (Ch155-Ch158 BRAM/normalize/PCRTC stack); fit + STA succeed. baseline_ch152/ snapshots the prior fit-failed reports for diff. |
| Ch160 | SDC retarget from 50 MHz → 30 MHz (down-clock profile); build_quartus.sh runs quartus_asm on clean STA → first .sof bitstream produced. baseline_ch159/ snapshots the 50 MHz timing-miss reports. |
| Ch161 | Real Quartus IOPLL .ip commit (50 MHz refclk → 30 MHz outclk_0) + USE_PLL_IP=1 macro + ip/ symlink in build_quartus.sh; SDC restored to 20 ns CLOCK2_50 (the IP's auto-generated SDC handles the post-PLL clock). The .sof now genuinely runs at 30 MHz on hardware. baseline_ch160/ snapshots the SDC-profile-only state. |
| Ch162 | STRIP_HW_DIVIDER parameter on ee_core_stub removes the auto-inferred 32-bit DIVU divider from the synth path on hardware builds (default off in sim preserves every existing TB). Fmax 30.74 → 33.6 MHz (+9.4 %); fit ALMs -892, registers -734. New critical path: PCRTC magnification divider in gs_pcrtc_stub (hwin_rel / hmag_factor). baseline_ch161/ snapshots the pre-strip state. |
| Ch163 | STRIP_PCRTC_MAG_DIV parameter on gs_pcrtc_stub strips the PCRTC magnification dividers (constant divisor 1 when MAGH=MAGV=0). Fmax 33.6 → 81.83 MHz at 30 MHz target (+143 %), then PLL .ip retuned 30 MHz → 50 MHz outclk_0; STA closes at 50 MHz with +7.500 ns setup slack and Fmax 80.0 MHz. First .sof that genuinely runs at 50 MHz on the DE25-Nano. baseline_ch162/ + baseline_ch163_30mhz/ snapshot the milestones. |
| Ch164 | First video-PHY shim: pins HDMI_TX_CLK + HDMI_TX_D[23:0] + HS/VS/DE on the DE25-Nano ADV7513 (pinout sourced from retroDE_nes). Wrapper drives them combinationally from VIDEO_* (R in MSBs of HDMI_TX_D). HDMI_TX_CLK = design_clk = 50 MHz post-PLL. ADV7513 I²C wake-up FSM still deferred (Ch165), so a real monitor stays dark, but pixels are now off-chip on the HDMI connector. STA stays clean (+7.536 ns slack); pins 17 → 45. baseline_ch163_50mhz/ snapshots the pre-shim state. |
| Ch165 | ADV7513 I²C wake-up FSM (Terasic-derived; ported from retroDE_splash/rtl/platform/). Adds 4 control pins (HDMI_I2C_SCL/SDA open-drain bus + HDMI_TX_INT interrupt + HDMI_MCLK audio reference); LED[3] = ~hdmi_init_done. The 38-entry LUT walks ADV7513 register writes (power-up + HPD override + AVI InfoFrame + HDMI mode select), turning the chip from standby into "transmitting RGB on the HDMI port". Pins 45 → 49; STA setup slack +7.198 ns; .sof clean. First .sof that should drive a real HDMI monitor. baseline_ch164/ snapshots the pre-wake-up state. |
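
The pin counts and the "R in MSBs of HDMI_TX_D" note from the Ch164/Ch165 rows can be checked with a small sketch. The G-then-B order of the lower 16 bits is an assumption (the text only fixes R in the MSBs), and `pack_hdmi_tx_d` is a hypothetical helper name, not something from the RTL.

```python
def pack_hdmi_tx_d(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into a 24-bit word with R in the MSBs.

    Assumption: G sits above B in the lower 16 bits. The README only
    states that R occupies the MSBs of HDMI_TX_D.
    """
    for name, value in (("r", r), ("g", g), ("b", b)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value}")
    return (r << 16) | (g << 8) | b


# Pin budget taken from the chapter table.
PINS_BEFORE_SHIM = 17
HDMI_SHIM_PINS = 1 + 24 + 3   # HDMI_TX_CLK + HDMI_TX_D[23:0] + HS/VS/DE
HDMI_CONTROL_PINS = 4         # SCL, SDA, HDMI_TX_INT, HDMI_MCLK

pins_ch164 = PINS_BEFORE_SHIM + HDMI_SHIM_PINS
pins_ch165 = pins_ch164 + HDMI_CONTROL_PINS

if __name__ == "__main__":
    print(hex(pack_hdmi_tx_d(0xFF, 0x00, 0x00)))  # pure red
    print(pins_ch164, pins_ch165)
```

The totals reproduce the 17 → 45 → 49 progression quoted in the table.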

Ch163 strip-PCRTC-divider + 50 MHz close state

The full journey from Ch152's fit failure to a real 50 MHz bitstream:

| Metric | Ch152 (50 MHz) | Ch159 (50 MHz) | Ch161 (real PLL @ 30 MHz) | Ch162 (strip EE div, 30 MHz) | Ch163 (strip both, 50 MHz) |
| --- | --- | --- | --- | --- | --- |
| Fit status | FAILED | Successful (30,364) | Successful (30,898) | Successful (30,006) | Successful (27,543) |
| Fit RAM blocks | 6 | 14 | 14 | 14 | 14 |
| Fit PLLs | 0 | 0 | 1 | 1 | 1 (50 MHz outclk0) |
| Setup slack worst (design domain) | (did not run) | -6.950 ns | +0.565 ns @ 30 MHz | +3.567 ns @ 30 MHz | +7.500 ns @ 50 MHz |
| Fmax (design domain) | (did not run) | 37.11 MHz | 30.74 MHz | 33.6 MHz | 80.0 MHz |
| .sof produced | (skipped) | (skipped) | yes (30 MHz on hardware) | yes (Fmax 33.6 MHz / 30 MHz target) | yes (50 MHz on hardware) |

Ch163 lands in two stages:

  • Stage A: with STRIP_PCRTC_MAG_DIV=1 on the board top but the PLL still at 30 MHz output, Fmax jumps from 33.6 MHz (Ch162) to 81.83 MHz at the 30 MHz target — +143 %, well past 50 MHz.
  • Stage B: retune pll.ip from 30 MHz → 50 MHz output (gui_output_clock_frequency0 = 50.0; gui_output_clock_frequency_ps0 = 20000.0), quartus_ipgenerate regenerates the .qip / synth files. CLOCK2_50 stays at the physical 50 MHz period in the SDC; the IOPLL's auto-generated SDC declares the new outclk_0. Quartus rebuild → STA closes at 50 MHz with +7.500 ns of setup slack and Fmax 80.0 MHz.

The Stage B .sof is the first bitstream that genuinely runs at 50 MHz on the DE25-Nano. The Ch161 PLL hardware-real contract carries through; the IOPLL takes 50 MHz CLOCK2_50 in and emits 50 MHz outclk_0, so the chip-internal clock distribution still goes through the dedicated IOPLL clock network even at the 1:1 frequency relation.
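
The slack and Fmax figures quoted for Ch160, Ch162, and Ch163 appear to be related by Fmax = 1 / (period - worst setup slack). The sketch below assumes that relation (it is inferred from the numbers, not from how Quartus computes Fmax) and also reproduces the 20000 ps period used for the 50 MHz PLL retune. The function names are illustrative.

```python
def fmax_mhz(period_ns: float, setup_slack_ns: float) -> float:
    """Fmax implied by a constrained period and its worst setup slack."""
    critical_path_ns = period_ns - setup_slack_ns
    if critical_path_ns <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / critical_path_ns


def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz


if __name__ == "__main__":
    # (label, constrained period in ns, worst setup slack in ns)
    rows = [
        ("Ch160 @ 30 MHz", 33.333, 0.805),
        ("Ch162 @ 30 MHz", 33.333, 3.567),
        ("Ch163 @ 50 MHz", 20.000, 7.500),
    ]
    for label, period, slack in rows:
        print(f"{label}: Fmax ~ {fmax_mhz(period, slack):.2f} MHz")
    print(f"50 MHz outclk_0 period: {period_ps(50.0)} ps")
```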

Ch162 strip-divider state (vs Ch161 / Ch160 / Ch159 / Ch152 baselines)

| Metric | Ch152 (50 MHz) | Ch159 (50 MHz) | Ch160 (30 MHz profile) | Ch161 (real PLL @ 30 MHz) | Ch162 (real PLL + strip divider) |
| --- | --- | --- | --- | --- | --- |
| Fit status | FAILED | Successful (30,364) | Successful (31,056) | Successful (30,898) | Successful (30,006) |
| Fit registers | 121,176 | 39,085 | 37,381 | 37,352 | 36,618 |
| Fit RAM blocks | 6 | 14 | 14 | 14 | 14 |
| Fit PLLs | 0 | 0 | 0 | 1 | 1 |
| Setup slack worst (design domain) | (did not run) | -6.950 ns | +0.805 ns | +0.565 ns @ iopll_0_outclk0 | +3.567 ns @ iopll_0_outclk0 |
| Fmax (design domain) | (did not run) | 37.11 MHz | 30.74 MHz | 30.74 MHz | 33.6 MHz (+9.4 %) |
| .sof produced | (skipped) | (skipped) | yes (profile only) | yes (30 MHz on hardware) | yes (33.6 MHz Fmax / 30 MHz target) |

Stripping the EE-core 32-bit DIVU divider freed +3 ns of setup margin and 892 ALMs / 734 registers. The new STA worst path moves to the PCRTC magnification divider in gs_pcrtc_stub.sv (vram_x_unshift = hwin_rel / hmag_factor and the matching y form). Ch163 gates that divider via STRIP_PCRTC_MAG_DIV and retunes the PLL to 50 MHz — see the Ch163 section above for the milestone numbers.

Ch161 real-PLL state (vs Ch160 / Ch159 / Ch152 baselines)

| Metric | Ch152 (50 MHz) | Ch159 (50 MHz) | Ch160 (30 MHz profile) | Ch161 (real PLL @ 30 MHz) |
| --- | --- | --- | --- | --- |
| Fit status | FAILED | Successful (30,364) | Successful (31,056) | Successful (30,898) |
| Fit registers | 121,176 | 39,085 | 37,381 | 37,352 |
| Fit RAM blocks | 6 | 14 | 14 | 14 |
| Fit PLLs | 0 | 0 | 0 | 1 (real IOPLL) |
| Setup slack worst (design domain) | (did not run) | -6.950 ns | +0.805 ns | +0.565 ns @ iopll_0_outclk0 |
| Fmax (design domain) | (did not run) | 37.11 MHz | 30.74 MHz | 30.74 MHz |
| .sof produced | (skipped) | (skipped) | yes (profile only) | yes (30 MHz on hardware) |

The Ch161 .sof is the first bitstream that genuinely runs at the constrained frequency on the real DE25-Nano: the IOPLL takes the 50 MHz CLOCK2_50 input and divides to 30 MHz inside the chip, so the entire design downstream of u_pll.outclk_0 operates at 30 MHz. Critical path was the EE core's auto-generated 64-bit DIVU divider, closed in Ch162 via STRIP_HW_DIVIDER. New critical path (Ch162 onward): the PCRTC magnification divider in gs_pcrtc_stub.sv (hwin_rel / hmag_factor); see the Ch162 section above.

Ch160 down-clock + first .sof state (vs Ch159 / Ch152 baselines)

| Metric | Ch152 (50 MHz) | Ch159 (50 MHz) | Ch160 (30 MHz) |
| --- | --- | --- | --- |
| Fit status | FAILED (155k / 331 %) | Successful (30,364 / 65 %) | Successful (31,056 / 66 %) |
| STA setup slack worst | (did not run) | -6.950 ns | +0.805 ns |
| Fmax | (did not run) | 37.11 MHz | 30.74 MHz |
| quartus_asm (.sof produced) | (skipped: fit failed) | (skipped: STA missed) | Successful |

The Ch160 SDC profile retargets CLOCK2_50 from 20.000 ns (50 MHz) to 33.333 ns (30 MHz) so the fitter has positive slack on every report. quartus_asm now runs on every clean build, so a real .sof bitstream lands in output_files/de25_nano_psmct32_raster_demo_top.sof. Worst-case path is the EE core's auto-generated 64-bit divider (actually the Ch43 DIVU divider, dead code in the PSMCT32 demo since the bootlet doesn't execute DIVU); closed in Ch162 via STRIP_HW_DIVIDER. Programming the Ch160 .sof onto a real board where CLOCK2_50 is still wired straight through gives a 50 MHz chip clock that may setup-violate the divider path; Ch161+'s PLL-IP commit fixes that.

Ch159 fit-success state (vs Ch152 baseline)

make quartus_compile runs the full syn/fit/sta flow. With the Ch159 BRAM-backed top:

| Metric | Ch152 (vram_stub) | Ch159 (vram_bram_stub) |
| --- | --- | --- |
| Synthesis status | Successful | Successful |
| Synthesis ALMs estimate | 199,103 / 46,800 (425% of capacity) | 22,704 / 46,800 (49%) |
| Synthesis registers | 101,457 | 36,008 |
| Fit status | FAILED (155k / 331% of capacity) | Successful (30,364 / 65%) |
| Fit registers | 121,176 | 39,085 |
| Fit RAM blocks | 6 / 358 | 14 / 358 |
| STA status | DID NOT RUN | Successful (12 warnings) |
| Setup slack worst (CLOCK2_50) | n/a | -6.950 ns |
| Fmax | n/a | 37.11 MHz |

The headline numbers: synth ALMs -88.6 %, fit registers -67.7 %, +8 RAM blocks (the vram_bram_stub 8 KiB dual-port shape that Ch154 exp_c forecast). Fit and STA both run through to completion. Setup slack is negative at the 50 MHz CLOCK2_50 constraint (Fmax is currently 37.11 MHz), so timing closure (PLL down-clock or critical-path pipelining) is the Ch160+ surface; Ch159 deliberately stops at "fits and reaches STA."
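
The headline deltas follow directly from the table rows. A minimal sketch of the arithmetic, assuming the 46,800 ALM figure is the device capacity the percentages are measured against (the helper names are illustrative):

```python
def percent_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def utilization_percent(used: float, capacity: float) -> float:
    """Resource usage as a percentage of capacity."""
    return used / capacity * 100.0


if __name__ == "__main__":
    print(f"synth ALMs:    {percent_change(199_103, 22_704):+.1f} %")
    print(f"fit registers: {percent_change(121_176, 39_085):+.1f} %")
    print(f"RAM blocks:    {14 - 6:+d}")
    print(f"Ch159 fit ALM utilization: {utilization_percent(30_364, 46_800):.0f} %")
```

Both percentage changes are reductions, which is why they carry a minus sign while the RAM block count rises by 8.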

The full chapter narrative lives in docs/contracts/gif_gs.md under "Board-top swap to BRAM wrapper + Quartus fit recovery (Ch159)". The Ch152 baseline reports are preserved under baseline_ch152/ for diff and audit.

Ch150's QSF/SDC pin clock + reset + LED I/O for a real Quartus build, but VIDEO_R/G/B/HSYNC/VSYNC/DE are intentionally left as virtual pins (VIRTUAL_PIN ON in the QSF) — they will not toggle real package pins until the PHY shim chapter (Ch151+) maps them to a video output. Without virtualization, Quartus would auto-place an unassigned top-level output on an arbitrary package pin.

Files

| File | Purpose |
| --- | --- |
| files.f | Synthesis filelist: Ch123 dep tree + Ch149 board wrapper. |
| de25_nano_psmct32_raster_demo_top.qsf | Ch150: Quartus pin assignments + IO standards + macros. |
| de25_nano_psmct32_raster_demo_top.sdc | Ch150: clock + reset-sync + IO false-path constraints. |
| README.md | This file. |

The Ch150 .qsf + .sdc together are PHY-light: clock / reset / LED pins are pinned and constrained, but video pins (VIDEO_R/G/B/HSYNC/VSYNC/DE) are not pin-assigned. The PHY shim chapter (Ch151+) maps them to a real connector.

To validate the Quartus scaffold (.qsf + .sdc + filelist + fixtures) without launching Quartus:

make -C sim top_psmct32_raster_demo_quartus_scaffold_check

Top module

Set Quartus's top-level entity to de25_nano_psmct32_raster_demo_top (in rtl/top/de25_nano_psmct32_raster_demo_top.sv). This is the Ch149 board-shaped wrapper, retargeted in Ch159 to instantiate top_psmct32_raster_demo_bram (the BRAM-backed inner module from Ch155+, carrying every Ch155-Ch158 fix). It adds the DE25-Nano-specific plumbing — Terasic-canonical port names, reset-release sequencer, core_go pulse generator, and active-low LED status mapping.

The legacy Ch146 inner module top_psmct32_raster_demo is kept on the project file list for back-compat with sim TBs that still target it; on Quartus only the actually-instantiated top is elaborated. Do NOT set the top entity to either of the inner modules (top_psmct32_raster_demo or top_psmct32_raster_demo_bram) — that bypasses every board adapter and exposes the inner module's clk / rst_n / core_go / r/g/b/hsync/vsync/de / status outputs directly, which is useful for sim and lint but not for an FPGA build.

Ch149 board wrapper ports (de25_nano_psmct32_raster_demo_top)

| Port | Direction | Width | Role |
| --- | --- | --- | --- |
| CLOCK0_50 / CLOCK1_50 / CLOCK2_50 | input | 1 ea. | DE25-Nano 50 MHz oscillators. Only CLOCK2_50 is used. |
| KEY[1:0] | input | 2 | Active-LOW push buttons. KEY[0] = soft reset. |
| SW[3:0] | input | 4 | DIP switches; placeholder, unused. |
| LED[7:0] | output | 8 | Active-LOW. LED[2:0] = status, LED[7:3] = OFF. |
| VIDEO_R/G/B | output | 8 ea. | Raw 8-bit RGB; PHY shim deferred to next chapter. |
| VIDEO_HSYNC/VSYNC/DE | output | 1 ea. | Raw video timing; PHY shim deferred. |

LED mapping (active-low; signal asserted lights its LED):

| LED | Polarity-corrected source |
| --- | --- |
| LED[0] | ~core_halt |
| LED[1] | ~dma_done_seen |
| LED[2] | ~frame_seen |
| LED[7:3] | tied HIGH (OFF) |
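
As a quick sanity model of the mapping above, the sketch below computes the LED[7:0] pin value from the three status bits. It models the table exactly as written (LED[7:3] tied high); `led_pins` is a hypothetical helper, not a name from the RTL.

```python
def led_pins(core_halt: bool, dma_done_seen: bool, frame_seen: bool) -> int:
    """Return the LED[7:0] pin value for the given status bits.

    The LEDs are active-low, so an asserted status bit drives its pin
    to 0 (LED lit). LED[7:3] are tied high (off).
    """
    status = (int(core_halt) << 0) | (int(dma_done_seen) << 1) | (int(frame_seen) << 2)
    low3 = ~status & 0b111
    return 0b1111_1000 | low3


if __name__ == "__main__":
    print(f"after reset: {led_pins(False, False, False):#04x}")  # all LEDs off
    print(f"all latched: {led_pins(True, True, True):#04x}")     # LED[2:0] lit
```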

Inner module ports (top_psmct32_raster_demo_bram, Ch155+)

For sim / lint use only. The board wrapper above adapts these to the DE25-Nano signal names; a custom integration could re-adapt them for a different board. The legacy Ch146 inner module top_psmct32_raster_demo exposes the same external port shape (just with a vram_stub-backed implementation instead of vram_bram_stub).

| Port | Direction | Width | Role |
| --- | --- | --- | --- |
| clk | input | 1 | Single clock domain (see "Clock plan" below). |
| rst_n | input | 1 | Active-low synchronous reset (see "Reset plan"). |
| core_go | input | 1 | Pulsed high to start the EE bootlet (see "core_go strategy"). |
| r/g/b | output | 8 ea. | 8-bit RGB scanout (PCRTC). |
| hsync/vsync/de | output | 1 ea. | Standard video timing. |
| core_halt | output | 1 | High once the EE has SYSCALL'd. |
| dma_done_seen | output | 1 | Sticky: DMAC ch2 fired EV_DMA_DONE. |
| frame_seen | output | 1 | Sticky: PCRTC end-of-frame fired ≥1 frame. |

Required preprocessor macros

Both must be set on the synthesis tool (NOT as module generics; they're `define macros per the Ch146 iverilog-12 string-parameter forwarding workaround):

| Macro | Value |
| --- | --- |
| TOP_PSMCT32_RASTER_DEMO_BIOS_IMAGE_FILE | Absolute path to bios.mem |
| TOP_PSMCT32_RASTER_DEMO_PAYLOAD_IMAGE_FILE | Absolute path to payload.mem |

For Quartus: see the set_global_assignment -name VERILOG_MACRO example in docs/contracts/gif_gs.md under "Synthesis-facing macros".

Required .mem fixtures

Both files must exist before synthesis (Quartus's $readmemh runs at elaboration; if the file is missing you get a silent zero-init that produces no payload at all):

| File | Size | Produced by |
| --- | --- | --- |
| sim/data/top_psmct32_raster_demo/bios.mem | 1024 32-bit words | bake.py |
| sim/data/top_psmct32_raster_demo/payload.mem | 256 128-bit qwords | bake.py |

To produce them:

make -C sim top_psmct32_raster_demo_mem

To verify both files exist + match expected sizes (and the synth filelist resolves cleanly):

make -C sim top_psmct32_raster_demo_synth_check
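
For a rough idea of what such a size check involves, here is a standalone sketch. It is not the repo's synth_check target; it assumes the .mem files are plain $readmemh text (whitespace-separated hex words, optional // and /* */ comments, optional @address markers, no x/z digits), which the README does not confirm.

```python
import re
from pathlib import Path

# Expected shapes from the fixture table: (word count, word width in bits).
EXPECTED = {"bios.mem": (1024, 32), "payload.mem": (256, 128)}


def parse_readmemh(text: str) -> list[str]:
    """Return the hex words in $readmemh-style text, without comments or @addr."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    text = re.sub(r"//[^\n]*", "", text)
    return [tok.replace("_", "") for tok in text.split() if not tok.startswith("@")]


def check_image(text: str, words: int, width_bits: int) -> list[str]:
    """Return a list of problems; an empty list means the image looks right."""
    problems = []
    tokens = parse_readmemh(text)
    if len(tokens) != words:
        problems.append(f"expected {words} words, found {len(tokens)}")
    max_digits = width_bits // 4
    for index, tok in enumerate(tokens):
        if not re.fullmatch(r"[0-9a-fA-F]+", tok) or len(tok) > max_digits:
            problems.append(f"word {index} is not a {width_bits}-bit hex value: {tok!r}")
    return problems


if __name__ == "__main__":
    base = Path("sim/data/top_psmct32_raster_demo")
    for name, (words, width) in EXPECTED.items():
        path = base / name
        if not path.exists():
            print(f"{name}: missing")
            continue
        issues = check_image(path.read_text(), words, width)
        print(f"{name}: {'ok' if not issues else issues[:3]}")
```

A missing file is reported explicitly here because, as noted above, Quartus would otherwise zero-initialize the memory silently.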

DE25-Nano board topology (current as of Ch151)

Live wiring (no longer "assumptions" — the board top implements it):

Clock plan

  • DE25-Nano's CLOCK2_50 (50 MHz, PIN_BF23) is the only board oscillator the design uses. CLOCK0_50 / CLOCK1_50 are pin-assigned in the QSF for completeness but the wrapper ties them off internally.
  • Ch151: CLOCK2_50 → de25_nano_pll_stub (sim default, pass-through) or Quartus IOPLL pll (synth, when USE_PLL_IP is defined) → design_clk. Everything in the design (EE core, GIF, GS, PCRTC, core_go sequencer, all three status sticky-bits) is clocked on design_clk.
  • The PCRTC's H/V counters are parameter-driven (default 16×8) and do NOT model real CRTC timing — a first hardware build can run at any sane clock. The PLL chapter (Ch152+) will commit a real outclk_0 frequency.
  • No clock gating, no derived clocks, no CDCs in the design.

Reset plan

  • Ch151: the reset bridge async-asserts on (ninit_done | ~pll_locked) and synchronously deasserts on design_clk through a 2-stage shift register. Both conditions must clear before the design leaves reset:
    1. ninit_done falls when FPGA initialization completes (real Terasic reset_release IP under `ifdef USE_TERASIC_RESET_RELEASE_IP, else inline 16-cycle stub).
    2. pll_locked rises when the PLL acquires lock (32-cycle stub warm-up; real IP timing depends on the configured output frequency).
  • KEY[0] (active-low push button) is sampled synchronously through the same 2-stage register — async-assert is reserved for the FPGA-init / PLL-lock signals.
  • The Ch146 wrapper's rst_n input is active-low and sampled inside always_ff @(posedge clk) if (!rst_n) — i.e. synchronous reset despite active-low polarity.
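
The async-assert / sync-deassert behaviour described above can be sketched as a cycle-level model. This is an illustration only: the class name is hypothetical, KEY[0] handling is left out, and the asynchronous assert is approximated as taking effect within the same modeled clock step.

```python
class ResetBridge:
    """Cycle-level model of a 2-stage async-assert / sync-deassert reset bridge."""

    def __init__(self, stages: int = 2) -> None:
        self.chain = [0] * stages  # chain[-1] drives the active-low reset

    @property
    def rst_n(self) -> int:
        return self.chain[-1]

    def clock(self, ninit_done: int, pll_locked: int) -> int:
        """Advance one design_clk edge and return rst_n afterwards."""
        if ninit_done or not pll_locked:
            # Reset condition: every stage clears at once (async assert).
            self.chain = [0] * len(self.chain)
        else:
            # Both conditions clear: shift a 1 through the chain (sync deassert).
            self.chain = [1] + self.chain[:-1]
        return self.rst_n


if __name__ == "__main__":
    bridge = ResetBridge()
    # (ninit_done, pll_locked) per clock edge
    stimulus = [(1, 0), (0, 0), (0, 1), (0, 1), (0, 1), (0, 0), (0, 1), (0, 1)]
    print([bridge.clock(n, l) for n, l in stimulus])
```

In this model rst_n rises on the second clock edge after both ninit_done has fallen and pll_locked has risen, and it drops immediately if lock is lost.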

core_go strategy

ee_core_stub.go_i is sampled level-sensitively in the core's S_IDLE state (see rtl/ee/ee_core_stub.sv:812-813: if (go_i) state <= S_IFETCH_REQ). There is no edge synchronizer; the core enters S_IDLE after reset and transitions to S_IFETCH_REQ on the first cycle go_i is high. Two equivalent hardware paths:

  1. Tie core_go high at the top level. After rst_n deasserts, the core enters S_IDLE, sees go_i == 1, and immediately starts fetching from 0xBFC0_0000. Simplest wiring.
  2. Reset-release sequencer: a small board-level FSM that waits N cycles after rst_n deasserts (giving the PLL + BRAM init time to settle) then drives core_go high. Held-high or one-cycle pulse both work; all that matters is that go_i is high on some cycle while the core sits in S_IDLE.

Option 2 is the recommended hardware path because the deliberate post-reset wait avoids fetching during the BRAM $readmemh settle window. A board-level synchronizer also gives a clean place to debounce the user push-button if core_go is also user-controllable.
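
Option 2 can be sketched as a cycle-level model. The 16-cycle wait matches the board wrapper's sequencer described in this README, but the exact cycle alignment in the RTL is an assumption, and only the S_IDLE → S_IFETCH_REQ transition of the core is modeled.

```python
def core_go_pulse(rst_n_trace: list[int], wait_cycles: int = 16) -> list[int]:
    """Per-cycle core_go: a single one-cycle pulse wait_cycles after rst_n rises."""
    count = 0
    fired = False
    out = []
    for rst_n in rst_n_trace:
        if not rst_n:
            count, fired = 0, False   # reset re-arms the sequencer
            out.append(0)
        elif not fired and count == wait_cycles:
            fired = True
            out.append(1)
        else:
            if not fired:
                count += 1
            out.append(0)
    return out


def core_state_trace(rst_n_trace: list[int], go_trace: list[int]) -> list[str]:
    """Level-sensitive go_i sampling in S_IDLE (later core states are not modeled)."""
    state = "S_IDLE"
    states = []
    for rst_n, go in zip(rst_n_trace, go_trace):
        if not rst_n:
            state = "S_IDLE"
        elif state == "S_IDLE" and go:
            state = "S_IFETCH_REQ"
        states.append(state)
    return states


if __name__ == "__main__":
    rst_n = [0] * 4 + [1] * 24
    go = core_go_pulse(rst_n)
    states = core_state_trace(rst_n, go)
    print("pulse at cycle", go.index(1), "->", states[go.index(1)])
```

Because go_i is sampled by level, replacing the pulse with a constant 1 (option 1) moves the core out of S_IDLE on the first cycle after reset instead.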

Video output path

  • DE25-Nano video output is TBD in this chapter — the next chapter will wire the top's r/g/b/hsync/vsync/de to the board's chosen video PHY. Likely candidates:
    • HDMI via on-board ADV7513 (or equivalent). Needs an I²C config sequence, which adds a small bring-up state machine.
    • VGA via on-board ADV7123 / passive DAC. Simpler — just pin-wire the 8-bit RGB + sync signals.
    • Direct GPIO for an external PMOD video adapter (simplest path for a first bring-up).
  • The Ch146 wrapper's PCRTC outputs are 8-bit RGB at one cycle per pixel, which is compatible with all three options after a clock-rate adjustment.

LEDs / status

The Ch146 wrapper exposes three sticky status bits the DE25-Nano user LEDs can show:

| LED | Signal | Meaning |
| --- | --- | --- |
| User LED 0 | core_halt | EE has SYSCALL'd → bootlet finished. |
| User LED 1 | dma_done_seen | DMAC ch2 completed transfer. |
| User LED 2 | frame_seen | PCRTC has scanned out ≥1 frame. |

All three latch high within ~10 ms of rst_n deasserting on a typical 50 MHz clock. The actual order is frame_seen first (the PCRTC starts scanning empty frames as soon as reset deasserts), then core_halt (after the EE bootlet runs to SYSCALL), then dma_done_seen (after the DMAC channel-2 transfer completes). The Ch146 wrapper's frame_seen is a "PCRTC alive" indicator — it doesn't gate on whether the frame holds rendered content. If you see core_halt low, the bootlet hung; if dma_done_seen is low, the DMAC didn't deliver the GIF payload; if frame_seen is low, the PCRTC isn't scanning out (PLL not locked or reset stuck asserted).

What's still NOT in this chapter

Landed in Ch165-Ch167:

  • Accelerated I²C bring-up TB: sim/tb/top/tb_hdmi_i2c_wake_smoke.sv instantiates I2C_HDMI_Config with CLK_Freq=2 / I2C_Freq=1 so the LUT walks in microseconds, and asserts (1) the 38-entry walk reaches LUT_SIZE-1, (2) READY rises, (3) HDMI_TX_INT retriggers the walk, (4) SDA never resolves to 'x (no driver conflict), (5) Ch166 NACK watchdog stays low on the healthy bus and rises (sticky) when mI2C_ACK is forced HIGH, and (6, Ch167) every one of the 38 transactions on the wire matches the FSM-intent payload byte-for-byte. The byte-sequence lock uses a pullup + phase-aware slave-ACK bus model so released SDA bits resolve to 1; a decoder samples SDA on each SCL rising edge between START and STOP and assembles each transaction as a 24-bit {dev_addr, reg, data} tuple, which is then compared against u_dut.mI2C_DATA[23:0] snapshots taken on mI2C_GO rising edges. Every transaction's dev_addr is asserted to be 8'h72 (ADV7513 write address). At the production divider the LUT walk takes ~125 ms (controller-clock period ~100 µs × 33 phases/byte × 38 bytes), so observing it inside the 5 ms board TB is impractical.
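
The tuple assembly and the walk-time estimate from the bullet above can be sketched in a few lines. This is not the testbench decoder: it assumes each transaction yields 27 SDA samples (three bytes, MSB first, each followed by an ACK slot that is discarded), and the register/data values in the example are placeholders rather than entries from the real LUT.

```python
ADV7513_WRITE_ADDR = 0x72  # dev_addr asserted for every transaction


def pack_txn(dev_addr: int, reg: int, data: int) -> int:
    """Build the 24-bit {dev_addr, reg, data} word."""
    return (dev_addr << 16) | (reg << 8) | data


def unpack_txn(word: int) -> tuple[int, int, int]:
    """Split a 24-bit word back into (dev_addr, reg, data)."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


def to_wire_samples(word: int) -> list[int]:
    """24-bit word → 27 SDA samples with an ACK (0) after each byte."""
    samples = []
    for shift in (16, 8, 0):
        byte = (word >> shift) & 0xFF
        samples += [(byte >> bit) & 1 for bit in range(7, -1, -1)]
        samples.append(0)  # slave ACK slot
    return samples


def assemble(samples: list[int]) -> int:
    """Drop the ACK slots and rebuild the 24-bit word from the samples."""
    if len(samples) != 27:
        raise ValueError(f"expected 27 samples, got {len(samples)}")
    word = 0
    for index, bit in enumerate(samples):
        if index % 9 == 8:
            continue  # every ninth sample is the ACK slot
        word = (word << 1) | bit
    return word


def walk_time_ms(period_us: float = 100.0, phases: int = 33, entries: int = 38) -> float:
    """LUT walk time using the figures quoted in the text."""
    return period_us * phases * entries / 1000.0


if __name__ == "__main__":
    intended = pack_txn(ADV7513_WRITE_ADDR, 0x41, 0x10)  # placeholder payload
    observed = assemble(to_wire_samples(intended))
    assert unpack_txn(observed)[0] == ADV7513_WRITE_ADDR
    print(hex(observed), f"{walk_time_ms():.1f} ms")
```

The walk-time product comes to about 125 ms, matching the estimate quoted for the production divider.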

Deferred to Ch168+:

  • Proper set_output_delay constraints for HDMI_TX_* once the ADV7513 setup/hold window is locked from the bring-up datasheet pass — replaces Ch164's set_false_path -to.
  • Make the rendered pattern bigger than Ch123's 16×8 SPRITE so there's something visible to admire on a real screen.
  • xfer-side T4 coverage TB (open from Ch157+).
  • useg_shadow_mem BRAM-shape forensics.

(Ch161 made the PLL hardware-real at 30 MHz. Ch162 retired the EE-core hardware-divider critical path via STRIP_HW_DIVIDER. Ch163 retired the PCRTC magnification divider via STRIP_PCRTC_MAG_DIV and retuned the PLL to 50 MHz outclk_0 — the .sof now genuinely runs at 50 MHz on the DE25-Nano. Ch164 added the HDMI pin shim — pixels reach the DE25-Nano HDMI connector pins. Ch165 ported the ADV7513 wake-up FSM from Terasic's reference design — the chip is now configured to transmit RGB on the HDMI port, so the .sof should drive a real monitor for the first time.)

The point of Ch149 was that the design is now board-shaped — DE25-Nano signal names, Terasic-style reset-release sequencer, and active-low LED status mapping. Ch161 made the PLL hardware-real; Ch162+ makes the timing closure hardware-real at 50 MHz.

Ch149 additions

| Artifact | Purpose |
| --- | --- |
| rtl/top/de25_nano_psmct32_raster_demo_top.sv | DE25-Nano board wrapper: CLOCK0/1/2_50 + KEY[1:0] + SW[3:0] + LED[7:0] + raw video pins. |
| sim/tb/top/tb_de25_nano_psmct32_raster_demo_top.sv | Smoke TB: drives CLOCK2_50 + KEY[0] release, asserts core_go pulses exactly once, all 3 status LEDs latch, VIDEO_DE rises. |

The board top instantiates top_psmct32_raster_demo_bram (Ch155+ BRAM-backed inner module, retargeted in Ch159 from the Ch146 legacy top_psmct32_raster_demo) and adds:

  • ninit_done source. Default-off `ifdef USE_TERASIC_RESET_RELEASE_IP swaps in Terasic's reset_release IP (from DE25_Nano_ResourceCD/Demonstration/FPGA/Board_Info_RTL/reset_release/). When the macro is undefined (sim default), an inline 16-cycle counter mimics the IP's "high until BRAM init completes" shape.
  • Reset synchronizer. KEY[0] (active-low button) and ninit_done feed an async-assert/sync-deassert 2-stage shift register clocked by CLOCK2_50, mirroring the retroDE_nes pattern. Output is the design's core_rst_n.
  • core_go sequencer. After core_rst_n deasserts, waits 16 cycles then drives core_go high for one cycle. Matches the recommended hardware path documented above (the EE core's go_i is sampled level-sensitively in S_IDLE, so a single pulse is sufficient).
  • LED polarity. DE25-Nano LEDs are active-LOW (LED HIGH = OFF); the three status outputs are inverted before driving the pins. LED[7:3] are tied HIGH (OFF).