14 Commits

Author SHA1 Message Date
thejayman77 1c1ecefde8 news: harden paywall exclusion at the candidate query + add the missing regressions
Codex's two non-blocking hardening items, folded in before cutover:
- _candidate_articles() now excludes paywalled sources IN-QUERY (before LIMIT 50),
  so flagged stories can't consume candidate slots and leave a full brief thin.
  Dropped the now-redundant post-fetch filter in build_daily_brief.
- Regressions: history retains a viewed paywalled article; sitemap omits a
  paywalled source AND restores it under override="free".
- Aligned test_brief_paywall to the source-level model (paywalled sources carry a
  paywalled homepage, as in production) — it had relied on article-URL detection.

425 backend tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 18:54:53 -04:00
thejayman77 c600145ba5 news: close the remaining no-paywall bypass paths (Codex audit)
queries.feed was the main chokepoint, but several discovery paths have their own
SQL. Apply the shared source exclusion to all of them so "no paywalls" is truly
site-wide:
- briefs.build_daily_brief: EXCLUDE paywalled candidates (was: demote) — never
  stored in a new brief.
- queries.brief: stored-brief retrieval (covers /today + /api/brief) filters the
  paywalled source.
- digest.digest_items + followed_digest_items: the morning email + "from what you
  follow" omit paywalled sources.
- sitemap(): paywalled article pages excluded from the sitemap.
All reuse queries.paywalled_source_ids (admin override still wins).

Regression tests (test_paywall_exclusion.py): never stored in a new brief; /today
+ digest omit it; followed-source email omits it; Saved retains it; 'free'
override restores eligibility. 423 backend tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 17:22:52 -04:00
thejayman77 2dbe73430c Sources: per-source paywall override (3-state) — fix domain-rule mis-flags
The Articles inspector revealed paywall is domain-coarse: nytimes.com is flagged,
so NY Times Learning's free Word-of-the-Day inherits 🔒 — and that flag isn't
cosmetic, it deprioritizes the content in feed sort + lead selection. Add a
per-source override so admins can correct it after inspecting.

- sources.paywall_override: NULL (domain rule) | 'free' | 'paywalled'.
- paywall.py: keep low-level is_paywalled(url) (domain); add is_paywalled_for_source
  (url, override) for the EFFECTIVE decision — never patched the domain helper
  globally (per Codex), so "domain says X" stays distinguishable from "overridden".
- Threaded everywhere ranking/UI touches paywall, via src.paywall_override on the
  shared _ARTICLE_COLUMNS + the source-aware helper: feed sort, /api/since, replace,
  lead selection, Article badge, brief composition (briefs.py), digest, source_health
  (table 🔒), the Articles inspector, and the review/attention check — so ranking and
  UI always agree.
- Endpoint POST /api/admin/sources/{id}/paywall {override}; admin UI: a select in the
  inspector header (Use domain rule / Treat as free / Treat as paywalled) + the basis
  ("ON (domain)" / "OFF (override)"), optimistic so the panel stays open.

Test: domain rule → paywalled in table+inspector+feed badge; 'free' → off in all
three; validation 422 + 404. 242 pytest + 11 vitest.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 22:10:44 -04:00
thejayman77 9ed817c051 Source Retire lifecycle (Phase 1: status + content_visible, active mirrored)
Per Codex's plan — introduce a lifecycle without a risky "change the source of
truth everywhere" moment.

* Schema: sources.status (active|paused|retired) + content_visible; migration
  backfills status from active (active=1→active, else paused), content_visible=1.
* `active` is kept as a SYNCED MIRROR: status active→active=1, paused/retired→0,
  so the scheduler/CLI/legacy code keep working unchanged.
* Retire stops polling but keeps articles visible (non-destructive). Hiding is a
  separate, reversible lever: content_visible=0 drops a source's articles from
  the public feed + brief (read AND build), behind a confirm. Personal saved/
  history are untouched.
* API: /sources/{id}/status (validates, mirrors active) + /visibility, replacing
  /active. source_health returns status + content_visible.
* Admin: status column (active/paused/retired + "hidden"), Retired filter,
  Pause/Resume · Retire/Restore · Hide/Show actions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 09:58:15 -04:00
thejayman77 d87347b032 Dashboard: content + source-health; per-viewer local dates
* Date fix: introduce GOODNEWS_TZ (goodnews/localtime.py) so the brief's "today"
  rolls over in a pinned zone (Eastern) instead of UTC — robust to host-clock
  resets. The home page now formats the brief's date in each VISITOR's local
  timezone (from its UTC freshness stamp), so nobody ever sees "tomorrow."

* Admin "Content served": articles live, fresh (7d), ingested (24h), summaries,
  active sources, today's brief size — queries.content_stats().

* Admin "Source health": per active source, the failure streak, last error,
  accepted contribution, and computed next-poll time (so backoff / "resting
  until" is visible), via queries.source_health() reusing the feeds backoff
  math. Failing sources sort to the top; times render in the viewer's zone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:34:22 +00:00
thejayman77 68a401eed6 Fresh server data overrides a pinned brief; pin holds otherwise
Per the agreed model: the brief is server-authoritative and a client Replace is
a soft override that yields when genuinely new data arrives.
- build_daily_brief is now idempotent: if the composed selection is unchanged it
  leaves the brief (and its created_at) alone, so the timer's 15-min rebuilds are
  no-ops when no new data landed.
- /api/brief exposes generated_at (the brief's created_at = a content-change
  stamp). The client pins its view against generated_at and keeps it across plain
  refreshes, but drops it and shows the fresh server brief when generated_at
  advances. Missed stories remain in the mood feeds.

Tests: idempotent rebuild (no-op vs content change) — 93 total.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 14:00:08 +00:00
thejayman77 d8d665ee35 Crisp hero (prefer og:image), 7-card Highlights, no-recycle Replace + session History
- Hero blur fix: brief enrichment now prefers a page's og:image even when a
  feed thumbnail exists (feed thumbs are often tiny; the hero is shown large).
  Verified: BBC hero upgrades to the 1024px share image, ScienceDaily to 1920px.
- Today is now 'Highlights from Today' — hero + 6 (brief size 7), which also
  makes the secondary grid a balanced 3+3 instead of an orphaned 3+1.
- Replace now excludes every article seen this session (a client-side seen-set),
  so it never cycles back to something already shown.
- New session History panel (this tab only, no account): lists everything seen,
  including swapped-away stories, so they stay recoverable. Persistent
  history/favorites are tabled for sign-in later.

Tests: og:image upgrade of an existing feed image (86 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:56:57 +00:00
thejayman77 3858380ffe Brief emotional-mix guardrails + source on its own line
Composition (Codex's priority — content mix was the louder problem):
- _select_diverse now guards the daily five's emotional tone: at most 1 health,
  at most 2 science+health combined, at most 2 of any topic, distinct sources —
  so at least three of the five are community/culture/animals/environment when
  available. Caps relax (mix, then source) only to fill on thin days.
- Verified live: today's five went to environment x2, health, animals, science.

UI:
- Source moved to its own line below the tags, left-justified, for uniform
  rhythm across hero and tiles (was sometimes trailing the tags, right-aligned).
- Watermark kept as-is (intentionally subtle; liked).

Tests updated for the emotional-mix contract (80 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:29:02 +00:00
thejayman77 541f59ed6e Option A: typographic editorial tiles + single treated hero image; balance brief topics
Frontend (the premium baseline):
- The hero is now the ONLY image slot. Soft feed images get an atmospheric
  gradient overlay; no over-reliance on inconsistent RSS image quality.
- Every secondary/lane card is a uniform typographic editorial tile: no
  thumbnails, equal visual weight, a faint topic wordmark watermark, a slim
  sage top accent, consistent source, reason text as the trust signal, visible
  Replace with quiet tuning actions. Fixes the jarring mixed-media row rhythm
  and removes muddy thumbnails entirely.

Backend (composition):
- _select_diverse now balances topics: no more than 2 of one topic while other
  topics have candidates (relaxing source then topic caps only to fill), so the
  daily five stop clustering medical/science items. Candidates now carry s.topic.

Tests updated for the topic-balance contract (79 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:10:05 +00:00
thejayman77 ba801d90f6 Make paywalls systemic + fix ArticleCard reactivity
- ArticleCard: derive safeHref from article.url and reset image-failure state
  when the article changes, so in-place replacements re-evaluate correctly
  (clears the Svelte capture warning; build is warning-free again).
- Downweight paywalled stories below readable ones (stable sort) when composing
  the daily five and in feed results — the brief now leads readable and rarely
  hands over a locked door.
- review_sources gains a 'paywall-heavy' advisory flag (Nature, New Scientist
  flag at 100%); never auto-deactivates.
- New Scientist/Nature kept active but no longer reach the daily five; they
  remain browsable with the label + Replace.
- Tests: brief readability preference + paywall-heavy flag (79 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 01:36:53 +00:00
thejayman77 9cdcda5e02 Durability pass: tests, clearer diversity/classify behavior, Calm Filters foundation
- Add pytest suite (34 tests) covering scoring thresholds, dedup clustering +
  representative selection + time window, brief source/category diversity,
  avoid-term phrase matching, and text canonicalization/truncation.
- Rewrite _select_diverse with an explicit, tested contract (best-first, one
  per source, backfill, then inject a second category by evicting the
  lowest-ranked pick).
- classify_articles now returns attempted/succeeded/skipped (ClassifyReport) so
  silent model failures are visible in both the cycle and classify output.
- Fix clean_text truncation to stay within max_len (ellipsis no longer
  overshoots).
- New filters.py: canonical FilterPrefs shape (include/mute topics+flavors,
  avoid_terms, pauses) and pure word/phrase-boundary matching engine seeding
  Calm Filters. Not yet wired into the API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:07:31 +00:00
thejayman77 5d44072fca Add semantic cross-source dedup via local embeddings
- LocalModelClient.embed() calls the OpenAI-compatible /embeddings endpoint
  (local nomic model); base_url shared with chat, model via GOODNEWS_EMBED_MODEL.
- New article_embeddings table and articles.duplicate_of column (+ migration).
- dedup module: embeds missing articles, clusters near-identical stories within
  a date window by cosine similarity (pure-stdlib, vectors normalised once), and
  marks all but the highest-ranked member of each cluster as a duplicate.
- 'dedup' CLI command; cycle now runs poll -> classify -> dedup -> brief.
- Feed and brief queries hide duplicates, so a story carried by multiple
  outlets shows once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:40:55 +00:00
thejayman77 2a9c49e2a9 Sparse-day-proof briefs, feed health check, and 16 new sources
- Briefs now fill from a rolling window (prefer today, backfill up to
  window_days) and exclude anything featured in the last 7 days of briefs, so
  slow days still produce five items without stories lingering day to day.
- New 'check-feeds' command fetches and parses every feed to catch dead ones.
- Added 16 validated sources (science, environment, animals, culture),
  expanding coverage from 12 to 28 feeds to reduce staleness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:30:03 +00:00
thejayman77 068073423f Initial commit: goodNews constructive-news ingestion prototype
Local-first RSS/Atom ingestion pipeline with metadata-only storage,
heuristic + local-LLM scoring, and daily brief builder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 00:48:26 +00:00