"Closer to Home" foundation (audit greenlit by Codex). Durable geography, kept
decoupled from volatile scoring.
- Schema: article_geo (breadth/confidence/rationale/geo_version) + article_places
(0..N ISO-coded places), separate from article_scores so re-runs/audits never
disturb scoring or acceptance. "local" is never stored — it's relative to the
reader; the UI computes "Near you" later.
- geo.py: LLM proposes place NAMES, code disposes to ISO codes (country alpha-2,
US state 2-letter); region words like "Europe" can never become a country.
'global'/placeless is first-class, not failure. Confidence calibrated so 'high'
needs an explicit location. Geo is its OWN LLM pass, not merged into the scoring
prompt (durable metadata, re-runnable, keeps the sensitive prompt untouched).
- store_geo replaces places (geo is re-derivable, unlike scores). tag_articles is
idempotent by geo_version and only touches accepted, non-duplicate articles.
- CLI `geo` command (cycle-locked, --limit/--reclassify) for backfill, plus a
bounded geo step in the cycle (--geo-limit 60, --no-geo). scripts/geo_audit.py
is the prototype audit tool.
360 tests green; live smoke tagged real articles correctly (Gaza->PS, London->GB,
placeless science->global). No UI / SEO pages yet — ranking/personalization only.
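
The "LLM proposes place NAMES, code disposes to ISO codes" step could look
roughly like the sketch below. It is an illustration only, not geo.py itself:
resolve_place and the three lookup tables are hypothetical names, and the
tables are tiny samples rather than full ISO 3166 data.

```python
from typing import Optional, Tuple

# Hypothetical sample tables (assumption): a real table would cover every
# ISO 3166-1 alpha-2 country and every US state.
PLACE_TO_COUNTRY = {"london": "GB", "united kingdom": "GB", "gaza": "PS"}
US_STATES = {"california": "CA", "texas": "TX"}
REGION_WORDS = {"europe", "asia", "africa", "middle east"}


def resolve_place(name: str) -> Optional[Tuple[str, str]]:
    """Map a model-proposed place name to (kind, code), or None if unknown."""
    key = " ".join(name.lower().split())  # normalise case and whitespace
    if key in REGION_WORDS:
        return None  # a region word must never become a country
    if key in PLACE_TO_COUNTRY:
        return ("country", PLACE_TO_COUNTRY[key])
    if key in US_STATES:
        return ("us_state", US_STATES[key])
    return None  # unknown names are dropped, never guessed
```

Because the model only supplies names, a hallucinated or vague name resolves
to None instead of a wrong code.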
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The deploy pipeline runs from the working tree, so a wave of shipped features
had never been committed. This snapshots git to what's actually running.
SEO impression recovery (live + verified):
- Duplicate /a/{id} pages now 301-redirect to their canonical twin instead of
returning 404 (a hard 404 silently dropped already-indexed URLs and tanked
impressions).
- Dedup representative selection reworked: accepted/serveable -> established
rep (URL stability) -> quality score, so an accepted page never retires to a
rejected rep and an indexed canonical doesn't churn when a newer twin arrives.
- HEAD /a/{id} returns the same status as GET (api_route GET+HEAD) instead of
falling through to the static mount and 404ing.
- `dedup --force-recluster`: cycle-locked, model-free re-cluster to re-apply the
policy to the existing corpus (shared cycle_lock context manager).
- CLI honors GOODNEWS_DB for its default --db (was silently ignored).
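
The representative ordering above (accepted/serveable, then established rep,
then quality score) can be expressed as a single tuple sort key. This is a
minimal sketch under assumptions: Member and pick_representative are
hypothetical names, not the dedup module's actual types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    article_id: int
    accepted: bool        # serveable page
    is_current_rep: bool  # already the cluster's canonical URL
    quality: float


def pick_representative(cluster: List[Member]) -> Member:
    # Tuples compare left to right, so acceptance beats URL stability,
    # which in turn beats the quality score.
    return max(cluster, key=lambda m: (m.accepted, m.is_current_rep, m.quality))
```

With this ordering an accepted page never loses to a rejected one, and an
indexed canonical keeps its slot when a newer, higher-scoring twin arrives.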
Publishing Desk (admin tool to post highlights to X via Web Intents):
- publishing.py queue/rank/handle-resolution; admin UI; full searchable emoji
picker (bundled data, no CDN) for the blurb editor.
Play games + site:
- Bloom (word-wheel), Memory Match, daily ritual set, Zen Den (dev-gated).
- English-only language gate; source prospecting; paywall + dedup hardening.
Tests: full suite green (349). Ignores tightened (node_modules, data/*.db).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A calm /play space — "after the brief, a small thing to enjoy." Framework-ready
for more games (Word Search next; zen/coloring later).
* Daily Word (5 letters / 6 guesses) + Long Word (6 / 7) — same Wordle mechanic,
Upbeat Bytes flavor (no "Wordle" in the UI). Hopeful answers; after solving, a
one-line "why this word matters."
* LLM proposes, code disposes: answers are picked deterministically by date-seed
from a hand-curated hopeful pool that's pre-validated ⊆ the guess dictionary
(always typeable), avoiding recent repeats; the LLM only adds the optional
"why" (with fallback). daily_puzzles(date, game, variant, payload) stores them
so everyone gets the same daily; the cycle pre-generates with the "why".
* Bundled guess dictionaries (words-5/6.json, ~12.6k/22.4k) for client-side guess
validation — never the LLM. Answer lightly obfuscated (base64) in the payload.
* Private, gentle stats (played/solved/streak, guess distribution); spoiler-free
emoji-grid share. No leaderboard, no timer, no streak-loss drama.
* Play in the bottom nav (replacing Browse, still on the lane rail) + the header.
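
A date-seeded pick like the one described above might be sketched as follows.
The function name, the seed format, and the SHA-256 choice are assumptions made
for illustration; the log only states that answers are chosen
deterministically by date-seed while avoiding recent repeats.

```python
import hashlib
from datetime import date
from typing import Iterable, Set


def pick_daily_answer(day: date, game: str, pool: Iterable[str],
                      recent: Set[str]) -> str:
    """Pick the same answer for everyone on a given day, skipping recent ones."""
    ordered = sorted(pool)  # stable order, independent of how the pool is stored
    candidates = [w for w in ordered if w not in recent] or ordered
    # Hash the date and game name so the choice is reproducible across hosts.
    digest = hashlib.sha256(f"{day.isoformat()}:{game}".encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
```

Since no randomness or model call is involved, re-running the cycle for the
same date yields the same word.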
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make "no blurry images" sustainable, not a one-off cleanup. RSS feed thumbnails
(~44% were ~90px) were stored at ingest and upscaled to mush, so new articles
would reintroduce them. Now image_url is filled ONLY by the quality-gated
og:image enrichment:
* insert_article no longer stores the feed image (was canonicalize_url(item...)).
* enrich_recent_images(): the cycle fetches a quality og:image for the newest
accepted, imageless articles each run (bounded), keeping Latest photo-rich.
* Brief + on-open enrichment unchanged.
Net: every stored image is a validated, ≥450px og:image; the rest are clean
placeholders.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tie image enrichment to attention (per review): when an article earns a summary
(i.e. a reader reached it), best-effort fetch a real og:image if it lacks one —
never blanket-fetch every ingested article. Adds:
* enrich_article_image() — single-article fetch, leaves existing images alone,
retries an imageless article only after 7 days, stamps image_checked_at.
* generate_summary() calls it after caching (wrapped; never breaks summaries).
* enrich_summarized_images() + `goodnews enrich-images` CLI — slow background
backfill of already-summarized, accepted, imageless articles.
* Quality gate: extend the generic-image skip list with data:/tracking-pixel/
spacer markers (on top of the existing logo/placeholder + unbranded-BBC logic).
This is coverage only; display (editorial rhythm, tile treatment) comes next.
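
The retry rule in enrich_article_image() (leave existing images alone, retry an
imageless article only after 7 days) reduces to a small predicate. The sketch
below is an assumption about its shape; should_fetch_image is a hypothetical
helper, not a function named in this change.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETRY_AFTER = timedelta(days=7)


def should_fetch_image(image_url: Optional[str],
                       image_checked_at: Optional[datetime],
                       now: datetime) -> bool:
    if image_url:
        return False  # never overwrite an existing image
    if image_checked_at is None:
        return True   # never checked before
    return now - image_checked_at >= RETRY_AFTER  # earlier miss has aged out
```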
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Date fix: introduce GOODNEWS_TZ (goodnews/localtime.py) so the brief's "today"
rolls over in a pinned zone (Eastern) instead of UTC — robust to host-clock
resets. The home page now formats the brief's date in each VISITOR's local
timezone (from its UTC freshness stamp), so nobody ever sees "tomorrow."
* Admin "Content served": articles live, fresh (7d), ingested (24h), summaries,
active sources, today's brief size — queries.content_stats().
* Admin "Source health": per active source, the failure streak, last error,
accepted contribution, and computed next-poll time (so backoff / "resting
until" is visible), via queries.source_health() reusing the feeds backoff
math. Failing sources sort to the top; times render in the viewer's zone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make summaries the core reading experience (summary-first, source-forward):
- Cycle pre-warms summaries for Today's 7 (idempotent → only new ones hit the LLM).
- /api/brief items carry their cached summary; Today cards (hero + tiles) show it
inline, so Today reads as a calm briefing.
- Card title/image now open the /a summary page (the canonical artifact), with a
visible "Full story" link straight to the source on every card (the escape hatch).
- /a gains related-grouping chips + a Copy-link/share control.
- Tighten the summary prompt: original, factual, no quotations / no close paraphrase.
Long tail stays lazy+cached. No article bodies stored. 129 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Hero blur fix: brief enrichment now prefers a page's og:image even when a
feed thumbnail exists (feed thumbs are often tiny; the hero is shown large).
Verified: BBC hero upgrades to the 1024px share image, ScienceDaily to 1920px.
- Today is now 'Highlights from Today' — hero + 6 (brief size 7), which also
makes the secondary grid a balanced 3+3 instead of an orphaned 3+1.
- Replace now excludes every article seen this session (a client-side seen-set),
so it never cycles back to something already shown.
- New session History panel (this tab only, no account): lists everything seen,
including swapped-away stories, so they stay recoverable. Persistent
history/favorites are tabled for sign-in later.
Tests: og:image upgrade of an existing feed image (86 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The grid stays typographic; the hero is the one intentional visual slot. At
brief-build time we fetch a hero-quality image for the daily five that lack one:
- enrich.py reads ONLY a page's <head> og:image/twitter:image and stores just
the URL (never the body).
- SSRF-guarded: http(s) only, 6s timeout, 300KB cap, <=3 manual redirects (each
re-validated), and a host is rejected if any resolved address is private,
loopback, link-local, multicast, reserved, or unspecified.
- image_checked_at column caches success AND failure, so an article is never
retried forever.
- Wired into build-brief and cycle (brief items only, only if image missing and
unchecked). Everything else stays metadata-only.
- Verified live: today's five all carry images (feed + enriched).
Tests: og:image parser, head-only scope, IP guard across internal ranges, and
enrich success + failure-caching (85 total).
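
The address check behind the SSRF guard can be sketched with the standard
ipaddress module. This is an illustration, not enrich.py: is_blocked_ip and
host_is_safe are hypothetical names, and the timeout, size cap, and redirect
handling are left out.

```python
import ipaddress
import socket


def is_blocked_ip(text: str) -> bool:
    """True for any address class the guard refuses to fetch from."""
    ip = ipaddress.ip_address(text.split("%")[0])  # drop an IPv6 zone id if present
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def host_is_safe(host: str, port: int = 443) -> bool:
    """Reject the host if ANY resolved address is internal."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

Checking every resolved address, rather than only the first, closes the gap
where a hostname resolves to one public and one internal address.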
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add source health columns (last_success_at, last_error_at, last_error,
consecutive_failures, review_flag, review_reason) via SCHEMA + migration.
- poll_source maintains them: success resets the failure streak and records the
success time; failure increments it and stores the latest error.
- review_sources() flags active sources that are stale, repeatedly failing,
low-acceptance, duplicate-heavy, or doom-skewed (high cortisol/ragebait) over
a recent window. It is purely advisory: it sets review_flag/review_reason and
never changes the active column (human stays in the loop), clearing the flag
when a source recovers.
- CLI review-sources; cycle runs it as a final step (--no-review to skip);
source-report shows a review line for flagged feeds.
- Tests: healthy/failing/stale/low-acceptance/recovery and never-deactivates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New source_candidates staging table (status suggested/quarantined/rejected/
promoted, preview_json snapshot) so untrusted/suggested feeds stay out of the
real ingestion path until reviewed.
- sources.py: save_candidate (re-preview never revives a curator's rejection),
list_candidates, reject_candidate, promote_candidate (copies into sources,
inactive by default — active on approval; never automatic).
- CLI: suggest-source / list-candidates / promote-candidate / reject-candidate.
- API: read-only GET /api/candidates (writes stay CLI-only — no unauthenticated
public write surface yet).
- Fix deprecated ElementTree truth-value test in _parse_rss.
- Tests: candidate lifecycle (save/list/promote/reject, status preservation,
name derivation) — 51 total.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- feeds.preview_feed(): fetch + score a sample WITHOUT persisting; returns
freshness, acceptance rate, cortisol/ragebait/PR averages, and example
accepted/rejected items. With an LLM client it also returns topic/flavor mix
and the model's (accurate) acceptance view.
- CLI 'preview-source URL [--sample] [--classify]'.
- API 'GET /api/source-preview?url=&sample=&classify=' with an http(s)-only
guard (SSRF note left for go-public hardening).
- Site 'Suggest a source' panel with Quick check (heuristic, instant) and Deep
check (model, accurate), rendered DOM-safely.
- Tests: network-free preview_feed tests via monkeypatched fetch (45 total).
- README documents the command, endpoint, and updated roadmap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add pytest suite (34 tests) covering scoring thresholds, dedup clustering +
representative selection + time window, brief source/category diversity,
avoid-term phrase matching, and text canonicalization/truncation.
- Rewrite _select_diverse with an explicit, tested contract (best-first, one
per source, backfill, then inject a second category by evicting the
lowest-ranked pick).
- classify_articles now returns attempted/succeeded/skipped (ClassifyReport) so
silent model failures are visible in both the cycle and classify output.
- Fix clean_text truncation to stay within max_len (ellipsis no longer
overshoots).
- New filters.py: canonical FilterPrefs shape (include/mute topics+flavors,
avoid_terms, pauses) and pure word/phrase-boundary matching engine seeding
Calm Filters. Not yet wired into the API.
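
One way to read the _select_diverse contract (best-first, one per source,
backfill, then inject a second category by evicting the lowest-ranked pick) is
sketched below. The dict shape with id/source/category keys and the function
name select_diverse are assumptions; the real function may differ in detail.

```python
from typing import Dict, List


def select_diverse(ranked: List[Dict], size: int) -> List[Dict]:
    """ranked is already sorted best-first; each item has id/source/category."""
    rank = {item["id"]: i for i, item in enumerate(ranked)}
    picks: List[Dict] = []
    seen_sources = set()
    # Pass 1: best-first, at most one item per source.
    for item in ranked:
        if len(picks) < size and item["source"] not in seen_sources:
            picks.append(item)
            seen_sources.add(item["source"])
    # Pass 2: backfill with the best remaining items if still short.
    chosen = {p["id"] for p in picks}
    for item in ranked:
        if len(picks) < size and item["id"] not in chosen:
            picks.append(item)
            chosen.add(item["id"])
    picks.sort(key=lambda p: rank[p["id"]])
    # Pass 3: if only one category made it, evict the lowest-ranked pick
    # in favour of the best item from any other category.
    if picks and len({p["category"] for p in picks}) == 1:
        only = picks[0]["category"]
        for item in ranked:
            if item["category"] != only and item["id"] not in chosen:
                picks[-1] = item
                break
    return picks
```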
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- cycle now prints per-article classify progress (flushed) so the long step is
clearly alive rather than appearing hung.
- An exclusive flock guards the cycle so a manual run and the systemd timer (or
two timer ticks) cannot overlap and contend on the database and model.
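
An exclusive, non-blocking flock of this kind can be sketched as a context
manager. The sketch assumes a Unix host (fcntl is not available on Windows);
the signature and error message are illustrative, not the project's actual
code.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def cycle_lock(path: str):
    """Hold an exclusive lock on path for the duration of the with-block."""
    with open(path, "w") as handle:
        try:
            # LOCK_NB makes a second runner fail fast instead of queueing.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError("another cycle is already running") from None
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

The kernel releases the lock when the process exits, so a crashed cycle does
not leave a stale lock behind.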
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- LocalModelClient.embed() calls the OpenAI-compatible /embeddings endpoint
(local nomic model); base_url shared with chat, model via GOODNEWS_EMBED_MODEL.
- New article_embeddings table and articles.duplicate_of column (+ migration).
- dedup module: embeds missing articles, clusters near-identical stories within
a date window by cosine similarity (pure-stdlib, vectors normalised once), and
marks all but the highest-ranked member of each cluster as a duplicate.
- 'dedup' CLI command; cycle now runs poll -> classify -> dedup -> brief.
- Feed and brief queries hide duplicates, so a story carried by multiple
outlets shows once.
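
The pure-stdlib cosine clustering can be illustrated as below. Normalising each
vector once turns cosine similarity into a plain dot product. The union-find
grouping and the threshold value are assumptions for this sketch, and the date
window is omitted; the log does not say how the dedup module groups pairs.

```python
import math
from typing import Dict, List, Sequence


def normalise(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cluster_by_cosine(vectors: Dict[int, Sequence[float]],
                      threshold: float) -> List[List[int]]:
    unit = {key: normalise(v) for key, v in vectors.items()}  # normalise once
    parent = {key: key for key in unit}

    def find(key: int) -> int:
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    ids = list(unit)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            similarity = sum(x * y for x, y in zip(unit[a], unit[b]))
            if similarity >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for key in ids:
        clusters.setdefault(find(key), []).append(key)
    return list(clusters.values())
```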
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Briefs now fill from a rolling window (prefer today, backfill up to
window_days) and exclude anything featured in the last 7 days of briefs, so
slow days still produce five items without stories lingering day to day.
- New 'check-feeds' command fetches and parses every feed to catch dead ones.
- Added 16 validated sources (science, environment, animals, culture),
expanding coverage from 12 to 28 feeds to reduce staleness.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- poll_due_sources(): polls only sources whose last successful poll is older
than their poll_interval_minutes (or never polled), finally giving that
config field meaning.
- classify gains only_unclassified to spend the LLM solely on new (heuristic)
articles, so a frequent scheduled run stays cheap.
- 'cycle' command runs poll-due -> classify-new -> rebuild-today's-brief, with
each step non-fatal so a down model endpoint or empty day never aborts it.
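
The "each step non-fatal" behaviour can be sketched as a small runner that
records a failure and moves on. run_cycle and the result format are
hypothetical; the real command wires in the actual poll, classify, and brief
steps.

```python
from typing import Callable, Dict, List, Tuple


def run_cycle(steps: List[Tuple[str, Callable[[], object]]]) -> Dict[str, object]:
    """Run every step in order; a failing step is recorded, never re-raised."""
    results: Dict[str, object] = {}
    for name, step in steps:
        try:
            results[name] = step()
        except Exception as exc:  # e.g. the model endpoint is down
            results[name] = f"failed: {exc}"
    return results
```

A down model endpoint then shows up as a failed classify entry while the brief
is still rebuilt from whatever is already classified.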
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- queries.py: shared read-only query helpers (feed, brief, category counts)
returning plain dicts, used by the API and available to the CLI.
- api.py: FastAPI service with Pydantic response models (the companion-app
contract), CORS, and endpoints for categories, feed, brief, and health;
mounts a static site at /.
- static/index.html: minimal dependency-free site rendering the daily five
and topic/flavor category browsing.
- 'goodnews serve' command launches uvicorn (lazy import; core CLI stays
pure-stdlib). Web deps live behind the optional [web] extra.
- Dockerfile + .dockerignore + build-system metadata so the service installs
and deploys cleanly, with the DB mounted as a shared volume.
- README: web/API and deployment docs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New taxonomy module: single source of truth for 6 topics x 5 flavors,
shared by the LLM response schema (enum-constrained) and validation.
- Classifier now assigns one topic + one flavor per article; json_schema
enums force valid values, with coercion as a safety net.
- article_scores gains topic/flavor columns via an idempotent migration.
- New 'list-category' command to browse by topic and/or flavor, ranked by
composite score.
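
A single-source-of-truth taxonomy shared by the response schema and validation
might look like the sketch below. The topic and flavor values are placeholders
(the real 6 topics and 5 flavors are not listed in this log), and
response_schema and coerce are hypothetical names.

```python
from typing import Dict, Sequence

# Placeholder values (assumption): the real module defines 6 topics x 5 flavors.
TOPICS = ("topic_a", "topic_b")
FLAVORS = ("flavor_a", "flavor_b")


def response_schema() -> Dict:
    """JSON Schema whose enums are generated from the taxonomy tuples."""
    return {
        "type": "object",
        "properties": {
            "topic": {"type": "string", "enum": list(TOPICS)},
            "flavor": {"type": "string", "enum": list(FLAVORS)},
        },
        "required": ["topic", "flavor"],
    }


def coerce(value: object, allowed: Sequence[str], default: str) -> str:
    """Safety net: normalise the model's value, fall back if it is not allowed."""
    cleaned = str(value).strip().lower()
    return cleaned if cleaned in allowed else default
```

Generating the enums from the same tuples that validation uses means the
prompt schema and the stored columns cannot drift apart.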
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>