Codex's two non-blocking hardening items, folded in before cutover:
- _candidate_articles() now excludes paywalled sources IN-QUERY (before LIMIT 50),
so flagged stories can't consume candidate slots and leave a full brief thin.
Dropped the now-redundant post-fetch filter in build_daily_brief.
- Regressions: history retains a viewed paywalled article; sitemap omits a
paywalled source AND restores it under override="free".
- Aligned test_brief_paywall to the source-level model (paywalled sources carry a
paywalled homepage, as in production) — it had relied on article-URL detection.
425 backend tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
queries.feed was the main chokepoint, but several discovery paths have their own
SQL. Apply the shared source exclusion to all of them so "no paywalls" is truly
site-wide:
- briefs.build_daily_brief: EXCLUDE paywalled candidates (was: demote) — never
stored in a new brief.
- queries.brief: stored-brief retrieval (covers /today + /api/brief) filters the
paywalled source.
- digest.digest_items + followed_digest_items: the morning email + "from what you
follow" omit paywalled sources.
- sitemap(): paywalled article pages excluded from the sitemap.
All reuse queries.paywalled_source_ids (admin override still wins).
Regression tests (test_paywall_exclusion.py): never stored in a new brief; /today
+ digest omit it; followed-source email omits it; Saved retains it; 'free'
override restores eligibility. 423 backend tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Articles inspector revealed paywall is domain-coarse: nytimes.com is flagged,
so NY Times Learning's free Word-of-the-Day inherits 🔒 — and that flag isn't
cosmetic, it deprioritizes the content in feed sort + lead selection. Add a
per-source override so admins can correct it after inspecting.
- sources.paywall_override: NULL (domain rule) | 'free' | 'paywalled'.
- paywall.py: keep low-level is_paywalled(url) (domain); add is_paywalled_for_source
(url, override) for the EFFECTIVE decision — never patched the domain helper
globally (per Codex), so "domain says X" stays distinguishable from "overridden".
- Threaded everywhere ranking/UI touches paywall, via src.paywall_override on the
shared _ARTICLE_COLUMNS + the source-aware helper: feed sort, /api/since, replace,
lead selection, Article badge, brief composition (briefs.py), digest, source_health
(table 🔒), the Articles inspector, and the review/attention check — so ranking and
UI always agree.
- Endpoint POST /api/admin/sources/{id}/paywall {override}; admin UI: a select in the
inspector header (Use domain rule / Treat as free / Treat as paywalled) + the basis
("ON (domain)" / "OFF (override)"), optimistic so the panel stays open.
Test: domain rule → paywalled in table+inspector+feed badge; 'free' → off in all
three; validation 422 + 404. 242 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Codex's plan — introduce a lifecycle without a risky "change the source of
truth everywhere" moment.
* Schema: sources.status (active|paused|retired) + content_visible; migration
backfills status from active (active=1→active, else paused), content_visible=1.
* `active` is kept as a SYNCED MIRROR: status active→active=1, paused/retired→0,
so the scheduler/CLI/legacy code keep working unchanged.
* Retire stops polling but keeps articles visible (non-destructive). Hiding is a
separate, reversible lever: content_visible=0 drops a source's articles from
the public feed + brief (read AND build), behind a confirm. Personal saved/
history are untouched.
* API: /sources/{id}/status (validates, mirrors active) + /visibility, replacing
/active. source_health returns status + content_visible.
* Admin: status column (active/paused/retired + "hidden"), Retired filter,
Pause/Resume · Retire/Restore · Hide/Show actions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Date fix: introduce GOODNEWS_TZ (goodnews/localtime.py) so the brief's "today"
rolls over in a pinned zone (Eastern) instead of UTC — robust to host-clock
resets. The home page now formats the brief's date in each VISITOR's local
timezone (from its UTC freshness stamp), so nobody ever sees "tomorrow."
* Admin "Content served": articles live, fresh (7d), ingested (24h), summaries,
active sources, today's brief size — queries.content_stats().
* Admin "Source health": per active source, the failure streak, last error,
accepted contribution, and computed next-poll time (so backoff / "resting
until" is visible), via queries.source_health() reusing the feeds backoff
math. Failing sources sort to the top; times render in the viewer's zone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the agreed model: the brief is server-authoritative and a client Replace is
a soft override that yields when genuinely new data arrives.
- build_daily_brief is now idempotent: if the composed selection is unchanged it
leaves the brief (and its created_at) alone, so the timer's 15-min rebuilds are
no-ops when no new data landed.
- /api/brief exposes generated_at (the brief's created_at = a content-change
stamp). The client pins its view against generated_at and keeps it across plain
refreshes, but drops it and shows the fresh server brief when generated_at
advances. Missed stories remain in the mood feeds.
Tests: idempotent rebuild (no-op vs content change) — 93 total.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Hero blur fix: brief enrichment now prefers a page's og:image even when a
feed thumbnail exists (feed thumbs are often tiny; the hero is shown large).
Verified: BBC hero upgrades to the 1024px share image, ScienceDaily to 1920px.
- Today is now 'Highlights from Today' — hero + 6 (brief size 7), which also
makes the secondary grid a balanced 3+3 instead of an orphaned 3+1.
- Replace now excludes every article seen this session (a client-side seen-set),
so it never cycles back to something already shown.
- New session History panel (this tab only, no account): lists everything seen,
including swapped-away stories, so they stay recoverable. Persistent
history/favorites are tabled for sign-in later.
Tests: og:image upgrade of an existing feed image (86 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Composition (Codex's priority — content mix was the louder problem):
- _select_diverse now guards the daily five's emotional tone: at most 1 health,
at most 2 science+health combined, at most 2 of any topic, distinct sources —
so at least three of the five are community/culture/animals/environment when
available. Caps relax (mix, then source) only to fill on thin days.
- Verified live: today's five went to environment x2, health, animals, science.
UI:
- Source moved to its own line below the tags, left-justified, for uniform
rhythm across hero and tiles (was sometimes trailing the tags, right-aligned).
- Watermark kept as-is (intentionally subtle; liked).
Tests updated for the emotional-mix contract (80 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Frontend (the premium baseline):
- The hero is now the ONLY image slot. Soft feed images get an atmospheric
gradient overlay; no over-reliance on inconsistent RSS image quality.
- Every secondary/lane card is a uniform typographic editorial tile: no
thumbnails, equal visual weight, a faint topic wordmark watermark, a slim
sage top accent, consistent source, reason text as the trust signal, visible
Replace with quiet tuning actions. Fixes the jarring mixed-media row rhythm
and removes muddy thumbnails entirely.
Backend (composition):
- _select_diverse now balances topics: no more than 2 of one topic while other
topics have candidates (relaxing source then topic caps only to fill), so the
daily five stop clustering medical/science items. Candidates now carry s.topic.
Tests updated for the topic-balance contract (79 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ArticleCard: derive safeHref from article.url and reset image-failure state
when the article changes, so in-place replacements re-evaluate correctly
(clears the Svelte capture warning; build is warning-free again).
- Downweight paywalled stories below readable ones (stable sort) when composing
the daily five and in feed results — the brief now leads readable and rarely
hands over a locked door.
- review_sources gains a 'paywall-heavy' advisory flag (Nature, New Scientist
flag at 100%); never auto-deactivates.
- New Scientist/Nature kept active but no longer reach the daily five; they
remain browsable with the label + Replace.
- Tests: brief readability preference + paywall-heavy flag (79 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add pytest suite (34 tests) covering scoring thresholds, dedup clustering +
representative selection + time window, brief source/category diversity,
avoid-term phrase matching, and text canonicalization/truncation.
- Rewrite _select_diverse with an explicit, tested contract (best-first, one
per source, backfill, then inject a second category by evicting the
lowest-ranked pick).
- classify_articles now returns attempted/succeeded/skipped (ClassifyReport) so
silent model failures are visible in both the cycle and classify output.
- Fix clean_text truncation to stay within max_len (ellipsis no longer
overshoots).
- New filters.py: canonical FilterPrefs shape (include/mute topics+flavors,
avoid_terms, pauses) and pure word/phrase-boundary matching engine seeding
Calm Filters. Not yet wired into the API.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- LocalModelClient.embed() calls the OpenAI-compatible /embeddings endpoint
(local nomic model); base_url shared with chat, model via GOODNEWS_EMBED_MODEL.
- New article_embeddings table and articles.duplicate_of column (+ migration).
- dedup module: embeds missing articles, clusters near-identical stories within
a date window by cosine similarity (pure-stdlib, vectors normalised once), and
marks all but the highest-ranked member of each cluster as a duplicate.
- 'dedup' CLI command; cycle now runs poll -> classify -> dedup -> brief.
- Feed and brief queries hide duplicates, so a story carried by multiple
outlets shows once.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Briefs now fill from a rolling window (prefer today, backfill up to
window_days) and exclude anything featured in the last 7 days of briefs, so
slow days still produce five items without stories lingering day to day.
- New 'check-feeds' command fetches and parses every feed to catch dead ones.
- Added 16 validated sources (science, environment, animals, culture),
expanding coverage from 12 to 28 feeds to reduce staleness.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>