- newsimg.purge_source(): when a source leaves 'cache' (permission revoked / re-classified),
the admin image-policy endpoint now deletes that source's re-hosted copies immediately,
rather than leaving them inaccessible-but-on-disk. Endpoint returns {purged}.
- Admin "Engaged readers" carries a warm-up note: tracking began 2026-06-30, so low
rolling windows are partly warm-up, not all bots (compare d7 after a week, the window
after its full span). Guards against misreading "6 engaged vs 135 visits" as 129 bots.
Tests: purge_source removes only the target source's copies; endpoint reports purged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Admin now shows two numbers:
- Recorded visits: the existing raw count (one daily 'visit' beacon; still includes
UA-spoofing bots that slip past the UA filter).
- Engaged readers: distinct visitor-day with DELIBERATE activity — either the new
gesture-gated 'engaged' beacon (fires once/day only after ~8s visible AND a real
scroll/pointer/key/touch) or a deliberate action (source_click, full_story, share,
replace_used, paywall_replace, not_today/less_like_this/hide_topic, game start/
complete/share). Explicitly EXCLUDES auto-fired visit/summary_viewed/open, replace_none,
and game *_arrival (a share-loop landing, not engagement).
armEngaged() in analytics.js (wired in the global layout) + a mirrored vanilla-JS beacon
on the server-rendered /a/<id> share pages. 'engaged' added to the event allowlist and
fired with article_id=0 so the uniqueness constraint dedups it per day. queries.admin_stats
gains engaged_today/d7/d30. Bots are doubly excluded (UA filter at the beacon + the
gesture gate). Tests cover the metric (engaged + deliberate counted; visit/summary/arrival
not). 447 backend + 36 frontend tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fetcher (the two remaining bugs Codex found):
- Real redirects are now followed. _NoRedirect makes urllib RAISE HTTPError on 3xx, so
the old status-branch was dead code (mocked tests masked it). Handle 301/302/303/307/308
HTTPError as redirects (re-validate the destination); classify 4xx≠429 as PERMANENT
(negative-cached), 429/5xx/network as transient. Real-opener redirect + 404/5xx tests.
- The megapixel ceiling is now enforced: explicit `w*h > _MAX_PIXELS` check BEFORE load()
(Pillow only warns at MAX_IMAGE_PIXELS). Test with a lowered ceiling.
Image-rights policy (per Codex + owner decision — only cache what's cleared):
- sources.image_policy: 'cache' (re-host a downscaled copy — license/permission/PD only),
'remote' (hotlink the publisher's image — the conservative DEFAULT), 'none' (no image).
- newsimg.display_url resolves the display URL per policy; applied in Article.from_row so
feed/brief/history return the right URL, and in share.py (og/twitter still reference the
publisher's own image, never re-hosted). warm() + /api/img both gated on 'cache'.
- Frontend uses the server-resolved image_url (reverted the hardcoded /api/img); the
graceful retry covers remote hotlinks too. Admin: per-source image-policy selector +
POST /api/admin/sources/{id}/image-policy. Default 'remote' → nothing re-hosted until
a source is explicitly cleared.
445 backend + 36 frontend tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Blocker fixes for the image cache:
- /api/img/{id} now serves cache HITS ONLY and is restricted to ACCEPTED, CANONICAL
articles. It never fetches — the cycle (newsimg.warm) owns all fetching — so the
public endpoint has no SSRF/worker-exhaustion surface. Dropped 1-year immutable
caching (image_url can change) → public, max-age=86400.
- newsimg._safe_fetch: SSRF-safe (reuses enrich._host_is_public + _NoRedirect, http(s)
only, every redirect hop re-validated, body capped). _FetchError distinguishes
permanent refusals (negative-cached via a .fail marker) from transient errors (retry).
- _encode re-encodes only decoded RASTER images to WebP and REJECTS everything else
(SVG, undecodable, decompression bombs via MAX_IMAGE_PIXELS, pathological dimensions);
originals are never retained. prune() also sweeps stale .fail markers.
- Concurrency: fetching only runs inside the cycle lock; writes stay atomic.
Smaller fixes:
- share.py visible image has onerror→this.remove() (degrade to the text unfurl, no
broken icon when an image isn't cached yet).
- share-page Back follows history only on a SAME-ORIGIN referrer (never bounce to an
external site); menu now honors Escape + resets crossing back to desktop (HubBar parity).
Tests: private host, redirect-to-private, hostile SVG/non-image, transient-vs-permanent
failure, LRU prune, warm (accepted+canonical only, idempotent), cache-only endpoint
(404 on not-cached/unaccepted/duplicate, never fetches), share chrome parity. 441 pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Many modern crawlers (AI scrapers, headless Chrome, link-preview fetchers) run JS and
fire the visit/summary_viewed beacon, inflating "visitors" even though there's no
human discovery channel. Apply queries.is_bot_ua() at /api/events — the same filter
the load-error beacon uses — so honest bot UAs (GPTBot, AhrefsBot, headless Chrome,
python/curl, …) are dropped before recording. Response is identical so a bot can't
detect it. Counts read lower but truer going forward (past rows unchanged). Won't catch
UA-spoofing bots; that needs a heavier heuristic. Tests: bot UAs dropped, real browser
counted; existing event tests send a real UA (default client UA contains "python").
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pages tall enough to scroll showed a ~15px scrollbar; short pages didn't — so the
centered top bar jumped left/right as you navigated. scrollbar-gutter: stable on html
(SPA app.css + the server-rendered share pages) keeps the layout width constant. No-op
on overlay-scrollbar platforms (mobile), which never shifted.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The server-rendered /a/<id> and digest pages predated "HubBar everywhere" and showed
a stripped bar (logo + a bespoke Back pill). They can't run the Svelte component, so
add a hand-kept static replica of HubBar (logo + News/Play/Art nav + account glyph +
mobile burger/drop-panel) plus HubShell's borderless ← Back. A signed-in reader's
avatar paints from the same localStorage cache HubBar uses. /a/<id> now looks like any
detail page (/art, /word). Reusable _top_bar_html/_TOP_BAR_CSS/_TOP_BAR_JS/_back_link
helpers; applied to both share pages. Kept in sync with HubBar.svelte by hand (noted).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stop hotlinking news images from third-party CDNs (the source of the "blank until
you refresh a few times" graphic). New goodnews/newsimg.py caches a downscaled WebP
display copy (≤800px) beside the DB, like art_cache:
- GET/HEAD /api/img/{article_id} — resolves id→image_url (allowlisted to our corpus,
not an open proxy), fetch+cache on first miss, serve local after, immutable headers.
- cycle warms display copies for recent accepted-with-image articles (so the FIRST
view is already local) and prunes to a hard size cap (default 1 GB) by LRU eviction.
Frontend now points at /api/img/<id>: the hub lead, every ArticleCard (feed hero +
cards), and the /a/<id> share page's visible image. og:image/twitter:image stay the
source URL so social crawlers fetch the canonical image directly.
Storage is bounded by construction — over the cap, least-recently-used files are
evicted, so it can't grow without limit regardless of ingest rate. Tests cover
fetch/downscale, cache-hit (no refetch), bad-scheme/non-image rejection, fetch
failure, LRU prune, warm, and the endpoint allowlist.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The load-error log had no way to clear reviewed entries. Add a read_at column to
client_errors and a read/unread model mirroring the feedback inbox:
- GET /api/admin/client-errors?show=unread|read|all (default unread; returns id+read)
- POST /api/admin/client-errors/read-all (mark all unread read)
- POST /api/admin/client-errors/{id}/read {read: bool} (per-row toggle)
Headline stat is now "Unread load errors" (admin_stats.client_errors.unread), so the
red badge clears as you triage. Admin UI: Unread/Read/All tabs, a "Mark all read"
button, and a per-row ✓/↩ toggle; reading an entry drops it from the default view.
14-day auto-prune still bounds the table. Tests cover filter, toggle, mark-all,
404, and gating.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both selectors ordered candidates least-recently-shown, then daily.seeded_order()
ROTATED the whole list and took [0] — an arbitrary date-hashed item, undoing the
ordering. Result: repeats (quote id 2 on 6/28+6/29; word "harmony" on 6/25+6/28),
no guarantee a pool item is shown before it recurs.
Fix: daily.freshest(rows) returns the freshest cohort only — every NEVER-shown
item while any remain, else the oldest-shown group. quote/wotd _candidates use it;
seeded_order now picks deterministically WITHIN that cohort. So every pool item is
featured once before any repeat, then cycles oldest-first. Dropped the unused
_NO_REPEAT_POOL window. Tests: no-repeat-until-exhausted (quote + wotd) + a
freshest() unit test. 428 backend tests green.
(Separate follow-up: expand the QOTD pool from 16 → 90+ vetted public-domain
quotes for a longer no-repeat window.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the logo + brand: the name is upbeatBytes (camelCase). Swept all user-facing
strings — titles/og:site_name/og:title, logo alt text, share pages (share.py),
emails (email_send), classifier prompt (llm), digest/unsubscribe (api), PWA
manifest, game share text, sign-in, the SPA shell + patch-static-heads (play
title) — plus README/publish.sh and the email test fixture. (SMTP From env was
already upbeatBytes.) Domains (upbeatbytes.com) unchanged. 425 BE + 36 FE green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codex's two non-blocking hardening items, folded in before cutover:
- _candidate_articles() now excludes paywalled sources IN-QUERY (before LIMIT 50),
so flagged stories can't consume candidate slots and leave a full brief thin.
Dropped the now-redundant post-fetch filter in build_daily_brief.
- Regressions: history retains a viewed paywalled article; sitemap omits a
paywalled source AND restores it under override="free".
- Aligned test_brief_paywall to the source-level model (paywalled sources carry a
paywalled homepage, as in production) — it had relied on article-URL detection.
425 backend tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
queries.feed was the main chokepoint, but several discovery paths have their own
SQL. Apply the shared source exclusion to all of them so "no paywalls" is truly
site-wide:
- briefs.build_daily_brief: EXCLUDE paywalled candidates (was: demote) — never
stored in a new brief.
- queries.brief: stored-brief retrieval (covers /today + /api/brief) filters the
paywalled source.
- digest.digest_items + followed_digest_items: the morning email + "from what you
follow" omit paywalled sources.
- sitemap(): paywalled article pages excluded from the sitemap.
All reuse queries.paywalled_source_ids (admin override still wins).
Regression tests (test_paywall_exclusion.py): never stored in a new brief; /today
+ digest omit it; followed-source email omits it; Saved retains it; 'free'
override restores eligibility. 423 backend tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Jay: don't surface stories people can't read without paying — it's off-brand
("no paywalls") and pointless. Paywalled is source-level (domain rule, admin-
overridable): just 3 sources today (Nature, New Scientist, MIT Tech Review),
~5.4% of accepted articles.
- queries.paywalled_source_ids(conn): live source set (admin override wins).
- queries.feed gains include_paywalled=False (default) → adds `a.source_id NOT IN
(…)`. One chokepoint covers Latest/tags/sources/moods/topics/search/since AND
the brief top-up. Source-level + SQL → paging stays exact, no frontend change.
- brief(): filter the cached/home pool by the same rule; replacement already
avoids paywalled and now rides the feed exclusion too.
- Dropped the now-moot "paywalled below readable" demotion sort.
- Saved/history keep showing items you saved (their own queries, not excluded).
- test_source_paywall_override updated: paywalled source → excluded from the feed
(was: shown with a badge); 'free' override → returns, no badge. 418 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Wikimedia feed's thumbnail is 330px, which upscales blurry in our hero. Use
originalimage.source instead — it's reliably sharp. (Can't just request a bigger
thumbnail width: for very large source images Wikimedia only serves pre-generated
bucket sizes and 400s on arbitrary widths — e.g. 500px ok, 800/1024px fail.)
- onthisday._best_image() prefers originalimage, falls back to the thumbnail.
- scripts/otd_image_upsize_backfill.py re-fetches each stored MM-DD and upgrades
image_url in onthisday_pool + daily_onthisday in place (ran on host: pool + 6
daily rows now sharp; today's hero verified 200). Only the /onthisday hero
loads this image (home card is text-only), so larger files are a single-page,
one-time load.
- test_best_image locks the prefer-original/fallback behavior.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- daily_art gains blurb + palette columns (idempotent migration).
- art._palette: Pillow median-cut to ~5 hex colors from the cached image (best-
effort → [] on any failure). art._blurb: a warm 2-3 sentence "what you're
looking at" note grounded in the Met catalogue (title/artist/bio/date/medium/
classification/culture/tags). Prompt leans on context/significance and the
title+tags for subject — explicitly NOT asserting literal composition (figure
counts/poses) it can't see, since the model can't view the image. Markdown
stripped from the output.
- pick_daily generates both (client optional → blurb skipped when absent); cycle
+ art CLI pass an LLM client. /api/art/today exposes blurb + palette.
- Backfilled the last 3 days on host (Veteran / Magnolia Vase / Bierstadt).
- scripts/art_blurb_palette_backfill.py for in-place backfill (no re-pick).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the gist-based read-time with the SOURCE article's full read time — the
contrast that sells the gist ("calm 1-min version here; ~10 min for the deep dive").
- goodnews/readtime.py: word_count_from_html (strips script/style/nav/header/
footer/form/button/aside furniture before counting) + source_read_minutes
(~225 wpm, 200-word floor, None when extraction looks failed/too thin).
- articles.source_words + read_checked_at columns (count only, never the body;
fits the privacy posture). Idempotent migration.
- enrich.fetch_source_words + enrich_read_times: a bounded, retry-guarded cycle
step (mirrors the image enrichers) that counts words for recent accepted
articles. Only ever writes a real count; never overwrites good with zero. Wired
into the cycle after recent-image enrichment.
- queries: source_words flows through _ARTICLE_COLUMNS; api exposes
source_read_minutes on Article (null when unknown).
- home3: News card shows "Full story · ~N min", hidden entirely when null (no
misleading "1 min").
- Tests: furniture stripping, threshold/rounding, enrich idempotency + no
zero-overwrite, API null handling. 412 backend.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Codex audit: only accept a polish when there's a gloss AND at least one
example sentence that actually contains the word (case-insensitive). Examples
that don't use the word are dropped; if none remain, fall back to the raw
dictionary def/examples instead of shipping a gloss with empty/irrelevant usage.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Content quality ("LLM polishes, dictionary anchors"):
- New wotd._polish: rewrites the real dictionary gloss into ONE warm plain
sentence + two clear everyday example sentences, grounded in the real
definition (no invented meanings). Stored in new wotd_pool/daily_wotd columns
gloss + usage, alongside the raw definition/examples which stay the anchor.
- harvest() polishes each new word; pick_daily() lazily polishes + caches back
any older pooled word that lacks a gloss (client threaded through run_daily).
- Admin word-add polishes on insert; re-pick passes an LLM client so quote
meaning / word gloss fill on a forced fresh pick.
- /api/word/today now prefers gloss + usage, falling back to the raw dictionary
def/examples when polish is absent (so it's always safe).
- db._migrate adds gloss/usage to wotd_pool + daily_wotd (idempotent ALTER).
Frontend — /word redesigned to CD's "Editorial Asymmetric": faded oversized
initial bleeding off the right, vertical part-of-speech rail, big Newsreader
word, airy definition, left-ruled italic example sentences, outline Listen
button + date. (Uses our self-hosted Newsreader/Hanken stack rather than the
mockup's Google fonts; the made-up syllable respelling is omitted since we only
have real IPA.)
Tests: _polish parse/trim/cap, harvest stores gloss/usage, pick lazy-polishes
older words, admin gloss flows through to /api/word/today. 403 backend + 27 fe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Admin joy item route moved to /api/admin/joys/{kind}/items/{item_id} so the
/add and /repick verbs resolve to their own routes instead of 422-ing as a
non-int item id (the launch blocker). Frontend mutate URL updated to match.
- Re-pick now excludes the currently-shown item: the endpoint reads today's
daily pool_id and passes it as `avoid`, so "Re-pick today" yields a different
item. Added `avoid` to pick_daily/_candidates across wotd/quote/onthisday.
- WOTD sense selection: the LLM now proposes word + intended part of speech, and
_lookup prefers that sense (fixes "serene" returning the archaic noun).
- On This Day tone prompt tightened to favor genuinely uplifting events and
exclude merely procedural/political-administrative ones.
- Caddy @hidden now also noindexes /word /quote /onthisday /admin (+ .html).
- Regression tests: add/repick resolve (401 not 422), add/feature/block/delete,
re-pick excludes current; WOTD pos-preference + proposal parsing units.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- /home3: small-joys rail now reads live /api/word|quote|onthisday/today (placeholders only
as fallback); each cell links to its detail page.
- HubShell component (shared bar/footer/fonts/tokens) for the hub + detail pages.
- /word: big word, IPA, Listen (cached clip + browser-TTS fallback), definition, sentences.
- /quote: the quote, attribution, and the AI "what it means".
- /onthisday: the date, year + fact, image, summary, source.
- Admin "Small Joys" tab: per-pool list with feature/block/delete/add + re-pick, for all
three kinds. New admin API: GET/POST /api/admin/joys/{kind}[/{id}|/add|/repick].
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- /api/art/image/{id} now answers HEAD as well as GET (was 404 on HEAD) — mirrors the
/a/{id} fix. Added tests/test_art_api.py (GET+HEAD+size=full fallback + today payload).
- /textures/* served immutable (long cache) instead of no-cache; excluded from the
revalidate matcher. Live Caddyfile + repo snapshot both updated.
- Lightbox: Escape closes it, and focus moves to it on open (keyboard-friendly).
- Trimmed the gallery's top padding so "Daily Art" sits closer to the bar.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Virtual frames (Walnut/Gold/Silver/None), selectable + remembered in localStorage,
built as a beveled moulding around a cream museum mat.
- Header uses the real /logo.svg wordmark; the "No ads" pill is replaced by an
account icon (the pill doesn't need to follow every page).
- Lightbox now opens a full-resolution copy that fills the screen: art._download_image
caches a hi-res {id}-full copy alongside the web-large display copy, served via
/api/art/image/{id}?size=full (image_url_large in /api/art/today).
- Centered the placard bullet separators (explicit .sep spans, equal margins).
- Image no longer shifts on hover; a quiet "Click to expand" affordance sits on the art.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hardening before it runs further on the cycle:
- DB-lock/network: all HTTP (metadata + image) happens before any write; the write txn
opens only at the brief INSERT and commits immediately. Images download to a temp file
then atomic os.replace into cache (a reader never sees a half-written file).
- Site-timezone "daily" already used local_today() (same rhythm as the Brief) — confirmed.
- Attribution from day one: store + return title/artist/date/medium/department/credit/
source_url/object_id/source + museum name + is_public_domain license marker + the full-
res source URL (for a richer /art view later). UI can show: Title · Artist · The Met.
- "highlight != always beautiful": added a manual `blocked` flag on art_pool (excluded
from picks) as the cheap curation lever; a featured override can follow.
Schema migrated (existing art tables get the new columns). 373 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The engine for the /art room (design-independent; deploy held for Codex review).
- goodnews/art.py: harvest a curated pool of public-domain HIGHLIGHT artworks from the
Met (isHighlight+isPublicDomain+hasImages -> masterworks, never potsherds; CC0). Daily
deterministic pick from the least-recently-shown (no soon-repeats, same for everyone),
fetch metadata + download the image to OUR cache (data/art_cache) so the homepage never
waits on or hotlinks the museum. Bulletproof: bad object/image falls through candidates;
a failed day keeps the last piece (room never empty). Injectable HTTP for tests.
- Schema: art_pool + daily_art. /api/art/today (edge-cacheable) + /api/art/image/{id}
(served from cache, immutable). CLI `art [--harvest] [--force]` + a non-fatal cycle step.
- Tests (5, mocked HTTP) + verified live against the Met: harvested 1641 works,
picked/cached "Repose" by John White Alexander. 371 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the brand-name standard (camelCase, one word). Updated the SMTP From default and
the digest email body/subject strings. Live env From values (auth.env + goodnews.env)
updated to match. (Web/OG brand strings in share.py + app.html are the remaining sweep.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Hero constraint: _pick_lead now runs only within the CLOSEST non-empty section of a
personalized Brief, so a "gentler" wider-region/world story can never be floated into
the hero slot above a local one. Only widens if the closest section is empty.
- Dial gains a visible Clear (alongside Change) so a reader never feels locked into
personalization; "World" stays the keep-home-but-go-global option.
366 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codex-approved evolution: the reader controls the "emotional radius" of the landing.
- Census-region "Regional" grain (geo.region_of / region_states). Scope-aware tiering
(queries.home_tiers): closest->widest lead, confidence-gated on state + region, never
a hard filter — blends outward so the set is always full. 'world' = the global brief.
- queries.home_brief takes a scope; /api/brief gains a scope param (nearby|region|
country|world). Country-only / non-US homes collapse to country.
- Homepage dial replaces the 2-button toggle: adaptive stops (4 with a US state, else
Country/World), persisted scope, "Good news closest first" framing. Concrete, soft
section labels (Around New Jersey / Across the Northeast / Across the US / Around the
world) so the reader sees the dial worked.
Backend 366 + frontend tests green. (Latest feed still on v1 local-first; aligning it
to the dial is the immediate follow-up.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the owner's call (overrides the earlier "Brief sacred" stance): when a home is
set, the homepage opens with local good news first, not global. This is the hook —
you land and see awesome stories from YOUR corner first.
- queries.home_brief: local-first highlights (high/medium-confidence near, blended
out to country then world so it's always a full, strong set), preferring already-
summarized stories so the calm read stays rich. Recent window, ranked within tier.
- /api/brief gains a `home` param: private/no-store when set; over-fetches + caps so
dismissal/boundary filtering never thins it; falls back to global top-up if needed.
- Landing UI: a Local <-> Global toggle ("📍 Near you / 🌍 Everywhere") when a home
is set, the calm picker invite when not (dismissible), and Change. Default leads
local; one tap back to the global brief. No home set => exactly today's behavior.
Backend + frontend tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codex polish before deploy: anything elevated as Near you / Close to home must have
geo_confidence in (high, medium) — the feature's promise is relevance. Country-only
mode now gates "near" too; since it has no "country" tier, the "world" scope is
widened to absorb low-confidence home-country stories so they surface there instead
of vanishing between tiers (the same edge-case class, fixed). State mode unchanged.
364 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Completes the server side of "Closer to Home". /api/feed gains a `home` param
('US' or 'US-NY'); when set the response is private (like prefs) and sectioned:
- Near you (+ Elsewhere in your country when a state is set) is a ONE-TIME lead
block on page 0; the world is the paginated body. next_offset tells the client
where to continue, so the lead block never skews world paging.
- Thin tiers fold down (MIN_TIER=3) so a header is never shown empty (lead, don't trap).
- State match counts only on high/medium geo confidence; the "country" tier excludes
exactly what went to "near", so a low-confidence home-state story still surfaces
(it doesn't vanish between tiers — caught + tested).
- Items carry a `section` tag; paywalled sort is now within-section. No home => exact
prior behavior (section null, default/edge-cached feed unchanged), Brief untouched.
364 tests green. Frontend next: Home picker + sectioned feed rendering.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Foundation for "Closer to Home" (server-side, Codex-approved). No behavior change
yet — geo_scope defaults None, so the default/edge-cached feed is identical.
- queries.feed now returns each article's geo (breadth, confidence, and ISO-coded
places) via a LEFT JOIN + places subquery. Article.from_row parses geo_places
into [{country, state}]. Brief query doesn't select geo, so the Brief stays bare.
- queries.feed gains home-scope filters (home_country/home_state/geo_scope =
near|country|world): STATE match only counts on high/medium geo confidence;
untagged articles fall to 'world' so nothing is lost during backfill.
Next: API composition (home param + near/country/world sectioning with soft/blended
headers + a next_offset pagination model) and the Home picker UI. 360 tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
"Closer to Home" foundation (audit greenlit by Codex). Durable geography, kept
decoupled from volatile scoring.
- Schema: article_geo (breadth/confidence/rationale/geo_version) + article_places
(0..N ISO-coded places), separate from article_scores so re-runs/audits never
disturb scoring or acceptance. "local" is never stored — it's relative to the
reader; the UI computes "Near you" later.
- geo.py: LLM proposes place NAMES, code disposes to ISO codes (country alpha-2,
US state 2-letter); region words like "Europe" can never become a country.
'global'/placeless is first-class, not failure. Confidence calibrated so 'high'
needs an explicit location. Geo is its OWN LLM pass, not merged into the scoring
prompt (durable metadata, re-runnable, keeps the sensitive prompt untouched).
- store_geo replaces places (geo is re-derivable, unlike scores). tag_articles is
idempotent by geo_version, only touches accepted non-duplicate articles.
- CLI `geo` command (cycle-locked, --limit/--reclassify) for backfill, plus a
bounded geo step in the cycle (--geo-limit 60, --no-geo). scripts/geo_audit.py
is the prototype audit tool.
360 tests green; live smoke tagged real articles correctly (Gaza->PS, London->GB,
placeless science->global). No UI / SEO pages yet — ranking/personalization only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sharpen the existing daily-game share loop into something measurable (per Codex's
"instrument what you have, then feed people into it" plan), ahead of a Show HN launch.
Analytics:
- Per-game funnel events <game>_{arrival,started,completed,shared} (article_id=0).
arrival = landed via a shared link (utm_source=game_share); started = first move
(guess/find/flip); completed = solved/cleared/Full Bloom; shared = on share success.
- trackVisit() moved into the global layout so direct /play landings count; the
server-rendered /a/ share page now creates a visitor token + sends a daily visit
beacon (first-time /a/-only visitors were previously dropped).
- Admin "Games funnel" panel: arrivals / engaged / completed / shared, per game.
Sharing:
- Memory Match gains a Share button (it was the only game without one).
- All shares deep-link to the exact game+variant with a full https:// URL +
utm_source=game_share (gameShareUrl helper), instead of a bare /play.
- "shared" is counted only after navigator.share()/clipboard.writeText() succeeds.
/play social metadata:
- /play served homepage canonical/OG (static SPA, ssr=false). postbuild script
patches build/play.html's head to /play canonical/title/description/OG; fails the
build if the homepage tags drift. Caddy try_files now serves {path}.html so /play
is served from the patched file (snapshot in deploy/caddy/).
Tests: backend 352, frontend 27.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The deploy pipeline runs from the working tree, so a wave of shipped features
had never been committed. This snapshots git to what's actually running.
SEO impression recovery (live + verified):
- Duplicate /a/{id} now 301-redirect to their canonical twin instead of 404
(a hard 404 silently dropped already-indexed URLs and tanked impressions).
- Dedup representative selection reworked: accepted/serveable -> established
rep (URL stability) -> quality score, so an accepted page never retires to a
rejected rep and an indexed canonical doesn't churn when a newer twin arrives.
- HEAD /a/{id} returns the same status as GET (api_route GET+HEAD) instead of
falling through to the static mount and 404ing.
- `dedup --force-recluster`: cycle-locked, model-free re-cluster to re-apply the
policy to the existing corpus (shared cycle_lock context manager).
- CLI honors GOODNEWS_DB for its default --db (was silently ignored).
Publishing Desk (admin tool to post highlights to X via Web Intents):
- publishing.py queue/rank/handle-resolution; admin UI; full searchable emoji
picker (bundled data, no CDN) for the blurb editor.
Play games + site:
- Bloom (word-wheel), Memory Match, daily ritual set, Zen Den (dev-gated).
- English-only language gate; source prospecting; paywall + dedup hardening.
Tests: full suite green (349). Ignores tightened (node_modules, data/*.db).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Articles inspector revealed paywall is domain-coarse: nytimes.com is flagged,
so NY Times Learning's free Word-of-the-Day inherits 🔒 — and that flag isn't
cosmetic, it deprioritizes the content in feed sort + lead selection. Add a
per-source override so admins can correct it after inspecting.
- sources.paywall_override: NULL (domain rule) | 'free' | 'paywalled'.
- paywall.py: keep low-level is_paywalled(url) (domain); add is_paywalled_for_source
(url, override) for the EFFECTIVE decision — never patched the domain helper
globally (per Codex), so "domain says X" stays distinguishable from "overridden".
- Threaded everywhere ranking/UI touches paywall, via src.paywall_override on the
shared _ARTICLE_COLUMNS + the source-aware helper: feed sort, /api/since, replace,
lead selection, Article badge, brief composition (briefs.py), digest, source_health
(table 🔒), the Articles inspector, and the review/attention check — so ranking and
UI always agree.
- Endpoint POST /api/admin/sources/{id}/paywall {override}; admin UI: a select in the
inspector header (Use domain rule / Treat as free / Treat as paywalled) + the basis
("ON (domain)" / "OFF (override)"), optimistic so the panel stays open.
Test: domain rule → paywalled in table+inspector+feed badge; 'free' → off in all
three; validation 422 + 404. 242 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New per-row "Articles" button on the Sources table expands a read-only inline
panel of the source's ACTUAL ingested articles — so the automated metrics
(paywall/image/acceptance/duplicate) can be verified against evidence instead of
trusted blind. Distinct from "Check" (which re-samples the LIVE feed for
would-pass quality); this shows what's already in the DB, which is what the table
metrics are computed from.
- Backend: GET /api/admin/sources/{id}/articles?filter=&limit=&offset= (admin,
read-only). queries.source_articles + source_articles_summary — per article:
title, url, date, accepted, reason (the "why"), topic/flavor, paywalled
(domain rule), has_image, duplicate. Summary = counts + source-level paywall
rule.
- Frontend: expandable panel with a summary header ("27 ingested · 18 accepted
· … · paywall rule: ON (domain)"), filter chips (All/Accepted/Rejected/No
image/Duplicates), compact rows with title→link + badges + reason, Load more.
So "100% paywall" or "0% images" becomes clickable evidence: open two articles
to tell a real paywall from a mis-flagged domain, or a true image gap from an
enrichment failure. Test: test_source_articles_inspector. 241 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Play hub: word cards now surface IN-PROGRESS games too (not just won/lost) so
"continue on another device" shows at a glance — card reads "5:3…" and the
selection option says "Continue · 3/6".
- Word Search generator: replace "prefer any crossing" with a SCORED placement —
score = overlap*4 - local crowding (filled neighbours that aren't crossings) —
then pick among the best ~20%. Keeps the organic interlocking but spreads words
across the board instead of clumping around the first-placed (longest) words.
Every word still placed (tests green). NOTE: changes today's grid layouts, so
an in-progress word search resets once.
237 pytest + 11 vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Don't trust client JSON at the storage layer:
- sanitize_game_state() runs before merge AND on the merged result (heals legacy
rows). Word Search: keep only finds whose cells actually spell a real word in
that day's grid (validated when the puzzle exists, shape-only 4-12 alpha +
cell-length otherwise), dedupe, renumber ci. Word: validate status enum, guess
count/length/alpha, colour-row shape, terminal answer/why.
- Completion is now derived from the real puzzle word count (foundWords ==
expected), not a client-sent `ms` — so stats can't be inflated by junk.
- Date validated as YYYY-MM-DD at the API (400 otherwise) — no junk/future rows.
Tests: sanitizer-rejects-junk + bad-date 400; existing tests updated to use
real-shaped data (the sanitizer is a good forcing function). 237 pytest + 11
vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two game polish items:
- Word Search: overlapping cells now multiply-blend the crossing words' colours
(deepening to a darker shade with readable text) instead of the newest colour
stomping the rest — matches the new interlocking grids.
- Cross-device game-state sync (signed-in): per-puzzle progress + stats now
follow you between devices. New game_state table; server-side merge on every
save so two devices converge regardless of push order, tailored per game:
* Word Search → UNION of finds (monotonic; can't un-find), earliest start,
best completion time.
* Word → furthest-progress wins (terminal beats in-progress; more guesses
beats fewer) — picks one device's game whole, never splices guesses.
Stats (streak/distribution/best) derived server-side from the synced states,
so they're consistent instead of per-device counters. Endpoints GET/PUT
/api/games/state + GET /api/games/stats (signed-in; size-capped). Frontend is
local-first: games paint instantly from localStorage, then reconcile in the
background; both game components push debounced on each move and adopt the
merge. Conflict handling unit-tested + an API two-device convergence test.
235→ tests + 11 vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two things found while chasing the recurring ~15min slowness:
- dedup.py: cluster_duplicates re-ran an O(n²) cosine pass over ALL ~3.7k
articles and rewrote duplicate_of for every one of them EVERY cycle — even
when nothing new arrived (embedded=0) — ~53s CPU + a large WAL commit that
starved live API reads (/api/brief 2-7s). Now skip the re-cluster entirely
when nothing new was embedded (clusters can't have changed). Verified: cycle
drops from ~53s to ~1s and /api/brief stays at 20ms through a cycle, vs 2-7s
before. (A real new article still triggers a full re-cluster.)
- games.py _build_grid: word placement took the first random valid spot, so
words rarely crossed. Now gather valid placements and PREFER ones that cross
an already-placed word (shared matching letter), falling back to any valid
spot — so the grid interlocks like a real word search. Every word still
placed (tests green). NOTE: changes today's grid layouts, so an in-progress
word search resets once.
Also added a systemd drop-in (Nice=19/CPUWeight=20/IOWeight=10/ionice-idle) to
deprioritize the batch cycle — minor, the dedup skip is the real fix.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two concrete latency wins found by measuring (server compute is 2-17ms; the time
is in the path, not the box):
- Admin panel fired its 6 API calls SEQUENTIALLY (await chain) — so it paid the
uncached origin round-trip six times back-to-back. Now one Promise.all batch.
This is the admin lag.
- /api/brief (the home "Gathering the good news…" content) wasn't edge-cached, so
a distant anonymous visitor triggered a Cloudflare→residential-origin pull.
Same global/shareable boundary as /api/feed: public s-maxage=45 when no
prefs/exclude, else private,no-store. (Needs /api/brief added to the CF cache
rule path list to take effect at the edge.)
Tests: test_brief_cache_boundary. 228 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
"Gathering the good news…" waits on the home's startup API calls, which were all
DYNAMIC → a round-trip to the residential origin every load (the occasional 2-3s
linger). These responses depend only on the URL, never the session, so they're
safe to share at the edge:
- /api/moods, /api/categories (static config) → public, s-maxage=900
- /api/lanes, /api/families (global, data-derived counts) → public, s-maxage=120
- /api/feed → public, s-maxage=45 ONLY when shareable (no following / prefs /
exclude); the following feed (reads the session) and personal filters stay
private, no-store.
Hard personalization boundary, explicit per-endpoint (no blanket /api/* rule).
Pairs with a Cloudflare cache rule (added separately) making these paths
eligible. Tests assert the global endpoints are public+s-maxage and the feed
boundary (default/topic public; following/prefs/exclude private). 227 pytest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two small server-side tweaks so the endpoint matches the UI policy:
- Rename is refused (409) for promoted/rejected candidates — they're settled
history; the UI already hides Rename for them, now the server enforces it too.
- Name is capped at 160 chars before save, so an accidental pasted paragraph
can't wreck the queue layout.
Tests extended: 300-char name truncates to 160; renaming a promoted candidate
→ 409. 225 pytest + 11 vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A staged candidate could only be renamed by rejecting and re-adding it, which
churns the queue and discards the preview just to fix a typo. Add an inline
Rename on each candidate: a "Rename" pill swaps the name for an input
(Enter saves · Esc cancels), POST /api/admin/candidates/{id}/rename →
sources.rename_candidate(). Empty clears the name (promote then derives one
from the feed host). Preview is preserved; the fixed name carries into promotion.
Tests: test_candidate_rename (rename in place keeps preview, promotes with the
new name, gated + 404). 225 pytest + 11 vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three follow-ups from Codex's audit of the deep-preview/search/dedup work:
- Promote-time duplicate guard: promote_candidate() now re-checks
find_existing_feed() and raises DuplicateFeedError → 409, so an
old/CLI/direct-DB candidate or a race can't bypass the add-time check and
silently overwrite a live source's settings via upsert. (sources scanned
first, so a real source collision wins over the candidate matching itself.)
- postJSON/putJSON/delJSON gain opt-in {timeout} (AbortController, default
none so other calls are unchanged); deep preview uses 120s and surfaces a
calm "timed out" message instead of pinning the button on "Deep-checking…"
if the LAN model stalls.
- feed_key() now lowercases the host only, not the whole URL — paths/queries
can be case-significant; scheme/www/trailing-slash/host-case still collapse.
Tests: test_candidate_deep_preview_and_dedup extended — promote succeeds once,
then a re-promote of the same candidate is refused 409. 224 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three admin Sources upgrades:
- Deep preview: a per-candidate "🔬 Deep preview" button runs the REAL
classifier on an 8-item sample (the same model that judges live articles),
versus the fast keyword heuristic the add/Re-preview path uses. Preview now
carries `classified`, surfaced as a "model-checked" vs "quick estimate"
badge — so the acceptance % is no longer ambiguously heuristic. conn is
released during the ~30-60s model pass; postJSON has no client timeout.
- Search: free-text box over the sources table (name / category / feed URL /
homepage), folded into the existing status filter, with a live match count
and empty state. Makes "is this already added?" a glance.
- Duplicate-add guard: sources.find_existing_feed() + feed_key() normalize
scheme/www/trailing-slash/case, so re-adding a feed that's already a live
source or a queued candidate is refused with a 409 naming where it lives
(DB already enforced exact-URL uniqueness; this catches the near-miss
variants and overwrite-on-promote footgun).
Tests: test_candidate_deep_preview_and_dedup (deep flag wires the model +
uses the small sample; exact/www/slash/case variants all 409). 224 pytest +
11 vitest green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>