Reliability: slow ≠ failed — SW nav timeout, slow-boot telemetry, de-bot stats

Root cause of the intermittent white screen: the shell HTML is no-cache
(cf-cache-status: DYNAMIC), so every page-open does a synchronous round-trip
to the residential origin before any pixel renders — and the SW's network-first
navigation only fell back to the cached shell on REJECTION, never on slowness.
A stalled fetch meant staring at white with a perfectly good shell in cache.
The boot seatbelt couldn't see it either: it lives inside the HTML that hadn't
arrived yet, so slow boots left no telemetry.

- service-worker: race navigation fetch vs 2.5s grace timer. Network wins →
  fresh HTML as before; timer/5xx/failure → cached shell instantly, network
  response still refreshes the cache in the background. Safe due to the 14-day
  immutable-chunk grace window. Caps the white screen at ~2.5s for repeat
  visitors on any network.
- app.html: beacon `boot-slow: Nms (html Nms) on 4g` when mount takes >4s —
  the "white screen, then it loaded" glitches finally leave a trace, with
  HTML-arrival timing to separate slow-origin from slow-JS.
- admin: bot UAs (HeadlessChrome/bot/spider/crawl/…) excluded from the
  headline "Load errors today" count — throttled crawlers trip the 10s boot
  check routinely (the one recorded error was HeadlessChrome on X11, not a
  phone). Bots stay visible in the list, tagged + dimmed.
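
The navigation race in the first bullet can be modeled outside the browser. The real service worker is JavaScript and is not shown in this commit page, so the following is only a Python asyncio sketch of the same control flow under stated assumptions: `navigate`, `fetch_network`, the `cache` dict, and the `"shell"` key are hypothetical names, and a `(status, body)` tuple stands in for a response.

```python
import asyncio
from typing import Awaitable, Callable, Optional

GRACE_SECONDS = 2.5  # grace timer named in the commit message

# Strong references so in-flight background refreshes are not garbage collected.
_background: set = set()


async def navigate(
    fetch_network: Callable[[], Awaitable[tuple[int, str]]],  # hypothetical fetch
    cache: dict[str, str],                                    # hypothetical shell cache
    key: str = "shell",
    grace: float = GRACE_SECONDS,
) -> str:
    """Return fresh HTML if the network answers within `grace`, else the cached shell."""

    async def fetch_and_refresh() -> Optional[str]:
        # An exception or a 5xx both count as "no usable network response".
        try:
            status, body = await fetch_network()
        except Exception:
            return None
        if status >= 500:
            return None
        cache[key] = body  # runs even if the caller already fell back to the cache
        return body

    task = asyncio.create_task(fetch_and_refresh())
    _background.add(task)
    task.add_done_callback(_background.discard)

    # asyncio.wait does not cancel the task on timeout, so a slow response
    # still refreshes the cache in the background.
    done, _pending = await asyncio.wait({task}, timeout=grace)
    if done and task.result() is not None:
        return task.result()

    cached = cache.get(key)
    if cached is not None:
        return cached

    # No cached shell (first visit): nothing to fall back to, so keep waiting.
    fresh = await task
    if fresh is None:
        raise RuntimeError("network failed and no cached shell is available")
    return fresh
```

The point of the sketch is that the timer only bounds how long the caller waits. It never cancels the fetch, so a late response still updates the cached shell for the next navigation.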

Tests: telemetry test extended for bot flag + filtered counts. 223 pytest +
11 vitest green.
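
The bot flag and filtered count that the test exercises could look roughly like this. It is a minimal sketch, not the project's code: `BOT_UA`, `is_bot`, `headline_count`, and the row shape are hypothetical, and the pattern covers only the substrings the message names (the full list is elided there).

```python
import re

# Hypothetical pattern: only the substrings named in the commit message.
BOT_UA = re.compile(r"HeadlessChrome|bot|spider|crawl", re.IGNORECASE)


def is_bot(user_agent: str) -> bool:
    """True when the User-Agent matches one of the known bot substrings."""
    return bool(BOT_UA.search(user_agent or ""))


def headline_count(rows: list[dict]) -> int:
    """Count error rows for the headline figure, skipping bot traffic.

    Each row is assumed to be a dict with a "user_agent" key.
    """
    return sum(1 for row in rows if not is_bot(row["user_agent"]))


rows = [
    {"reason": "boot-timeout", "path": "/play",
     "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Safari/604.1"},
    {"reason": "boot-timeout", "path": "/",
     "user_agent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/138.0 Safari/537.36"},
]
tagged = [{**row, "bot": is_bot(row["user_agent"])} for row in rows]  # list keeps bots, tagged
```

A bare `bot` substring match is deliberately loose and may misclassify an unusual real device string, which is one reason to keep bots visible in the list rather than dropping them.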

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-11 19:23:33 -04:00
parent 90da4be083
commit 628cc5722c
6 changed files with 84 additions and 17 deletions
@@ -418,7 +418,16 @@ def test_client_error_telemetry(tmp_path, monkeypatch):
    rows = tc.get("/api/admin/client-errors").json()
    assert len(rows) == 1 and rows[0]["reason"] == "boot-timeout" and rows[0]["path"] == "/play"
    assert rows[0]["user_agent"]  # captured from the request header
    assert rows[0]["bot"] is False
    assert tc.get("/api/admin/stats").json()["client_errors"]["today"] == 1
    # A throttled crawler tripping the beacon must NOT inflate the headline count,
    # but stays visible (tagged) in the list.
    anon.post("/api/client-error", json={"reason": "boot-timeout", "path": "/"},
              headers={"user-agent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/138.0 Safari/537.36"})
    rows = tc.get("/api/admin/client-errors").json()
    assert len(rows) == 2 and rows[0]["bot"] is True
    stats = tc.get("/api/admin/stats").json()["client_errors"]
    assert stats["today"] == 1 and stats["window"] == 1  # bot excluded from both
def test_wordsearch_theme_admin(tmp_path, monkeypatch):