upbeatBytes

Author	SHA1	Message	Date
thejayman77	1c05554a28	Geo Stage 1-2: subject-geography model + classifier + pipeline wiring "Closer to Home" foundation (audit greenlit by Codex). Durable geography, kept decoupled from volatile scoring. - Schema: article_geo (breadth/confidence/rationale/geo_version) + article_places (0..N ISO-coded places), separate from article_scores so re-runs/audits never disturb scoring or acceptance. "local" is never stored — it's relative to the reader; the UI computes "Near you" later. - geo.py: LLM proposes place NAMES, code disposes to ISO codes (country alpha-2, US state 2-letter); region words like "Europe" can never become a country. 'global'/placeless is first-class, not failure. Confidence calibrated so 'high' needs an explicit location. Geo is its OWN LLM pass, not merged into the scoring prompt (durable metadata, re-runnable, keeps the sensitive prompt untouched). - store_geo replaces places (geo is re-derivable, unlike scores). tag_articles is idempotent by geo_version, only touches accepted non-duplicate articles. - CLI `geo` command (cycle-locked, --limit/--reclassify) for backfill, plus a bounded geo step in the cycle (--geo-limit 60, --no-geo). scripts/geo_audit.py is the prototype audit tool. 360 tests green; live smoke tagged real articles correctly (Gaza->PS, London->GB, placeless science->global). No UI / SEO pages yet — ranking/personalization only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-19 16:56:49 -04:00
thejayman77	89c0fbe1f6	Sync repo to deployed state: SEO recovery, Publishing Desk, Play games, emoji picker The deploy pipeline runs from the working tree, so a wave of shipped features had never been committed. This snapshots git to what's actually running. SEO impression recovery (live + verified): - Duplicate /a/{id} now 301-redirect to their canonical twin instead of 404 (a hard 404 silently dropped already-indexed URLs and tanked impressions). - Dedup representative selection reworked: accepted/serveable -> established rep (URL stability) -> quality score, so an accepted page never retires to a rejected rep and an indexed canonical doesn't churn when a newer twin arrives. - HEAD /a/{id} returns the same status as GET (api_route GET+HEAD) instead of falling through to the static mount and 404ing. - `dedup --force-recluster`: cycle-locked, model-free re-cluster to re-apply the policy to the existing corpus (shared cycle_lock context manager). - CLI honors GOODNEWS_DB for its default --db (was silently ignored). Publishing Desk (admin tool to post highlights to X via Web Intents): - publishing.py queue/rank/handle-resolution; admin UI; full searchable emoji picker (bundled data, no CDN) for the blurb editor. Play games + site: - Bloom (word-wheel), Memory Match, daily ritual set, Zen Den (dev-gated). - English-only language gate; source prospecting; paywall + dedup hardening. Tests: full suite green (349). Ignores tightened (node_modules, data/*.db). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-18 11:32:27 -04:00
thejayman77	215a5c4d64	Play hub + Daily Word game (Phase 1 of the games feature) A calm /play space — "after the brief, a small thing to enjoy." Framework-ready for more games (Word Search next; zen/coloring later). * Daily Word (5 letters / 6 guesses) + Long Word (6 / 7) — same Wordle mechanic, Upbeat Bytes flavor (no "Wordle" in the UI). Hopeful answers; after solving, a one-line "why this word matters." * LLM proposes, code disposes: answers are picked deterministically by date-seed from a hand-curated hopeful pool that's pre-validated ⊆ the guess dictionary (always typeable), avoiding recent repeats; the LLM only adds the optional "why" (with fallback). daily_puzzles(date, game, variant, payload) stores them so everyone gets the same daily; the cycle pre-generates with the "why". * Bundled guess dictionaries (words-5/6.json, ~12.6k/22.4k) for client-side guess validation — never the LLM. Answer lightly obfuscated (base64) in the payload. * Private, gentle stats (played/solved/streak, guess distribution); spoiler-free emoji-grid share. No leaderboard, no timer, no streak-loss drama. * Play in the bottom nav (replacing Browse, still on the lane rail) + the header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 16:06:20 -04:00
thejayman77	cf5cbb33c0	Daily digest (opt-in) + finite "you're caught up" ending Reader-retention as ritual, not capture (Codex's framing). Opt-in calm morning email of today's brief; the on-site twin is the finite end-of-feed nudge. * Schema: users.digest_enabled + digest_unsub_token; digest_sends (dedupe + visibility). auth.get_user now returns the digest fields. * goodnews/digest.py: build (dated calm subject, items w/ summary + "why it's here" + UB/source links + one-click unsubscribe, "you're caught up" sign-off) and send_due_digests (morning-window gated, >=4-item floor or skip quietly, deduped, reuses SMTP). No streaks/urgency/"you missed". * API: /auth/me exposes digest_enabled; POST /api/account/digest toggle; GET /api/digest/unsubscribe (token, no login, calm confirmation page). * CLI: cycle gains a morning-gated digest step (--no-digest) + a send-digests command (--force). * Frontend: digest toggle on the Account profile; the Highlights end-cap now says "you're caught up — see you tomorrow" with a one-tap "Get tomorrow's brief by email" (signed-in → enable; anon → sign in). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 16:17:46 -04:00
thejayman77	50dc2167cd	Durable image quality: stop trusting feed thumbnails; cycle enriches Latest Make "no blurry images" sustainable, not a one-off cleanup. RSS feed thumbnails (~44% were ~90px) were stored at ingest and upscaled to mush, so new articles would reintroduce them. Now image_url is filled ONLY by the quality-gated og:image enrichment: * insert_article no longer stores the feed image (was canonicalize_url(item...)). * enrich_recent_images(): the cycle fetches a quality og:image for the newest accepted, imageless articles each run (bounded), keeping Latest photo-rich. * Brief + on-open enrichment unchanged. Net: every stored image is a validated, ≥450px og:image; the rest are clean placeholders. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 15:55:57 -04:00
thejayman77	403749e26f	Images phase 1: attention-triggered og:image coverage Tie image enrichment to attention (per review): when an article earns a summary (i.e. a reader reached it), best-effort fetch a real og:image if it lacks one — never blanket-fetch every ingested article. Adds: * enrich_article_image() — single-article fetch, leaves existing images alone, retries an imageless article only after 7 days, stamps image_checked_at. * generate_summary() calls it after caching (wrapped; never breaks summaries). * enrich_summarized_images() + `goodnews enrich-images` CLI — slow background backfill of already-summarized, accepted, imageless articles. * Quality gate: extend the generic-image skip list with data:/tracking-pixel/ spacer markers (on top of the existing logo/placeholder + unbranded-BBC logic). This is coverage only; display (editorial rhythm, tile treatment) comes next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 12:30:11 -04:00
thejayman77	d87347b032	Dashboard: content + source-health; per-viewer local dates * Date fix: introduce GOODNEWS_TZ (goodnews/localtime.py) so the brief's "today" rolls over in a pinned zone (Eastern) instead of UTC — robust to host-clock resets. The home page now formats the brief's date in each VISITOR's local timezone (from its UTC freshness stamp), so nobody ever sees "tomorrow." * Admin "Content served": articles live, fresh (7d), ingested (24h), summaries, active sources, today's brief size — queries.content_stats(). * Admin "Source health": per active source, the failure streak, last error, accepted contribution, and computed next-poll time (so backoff / "resting until" is visible), via queries.source_health() reusing the feeds backoff math. Failing sources sort to the top; times render in the viewer's zone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 19:34:22 +00:00
thejayman77	cfde4e22db	Summary briefing layer: Today pre-summarized, /a is the canonical read Make summaries the core reading experience (summary-first, source-forward): - Cycle pre-warms summaries for Today's 7 (idempotent → only new ones hit the LLM). - /api/brief items carry their cached summary; Today cards (hero + tiles) show it inline, so Today reads as a calm briefing. - Card title/image now open the /a summary page (the canonical artifact), with a visible "Full story" link straight to the source on every card (the escape hatch). - /a gains related-grouping chips + a Copy-link/share control. - Tighten the summary prompt: original, factual, no quotations / no close paraphrase. Long tail stays lazy+cached. No article bodies stored. 129 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 19:48:32 +00:00
thejayman77	d8d665ee35	Crisp hero (prefer og:image), 7-card Highlights, no-recycle Replace + session History - Hero blur fix: brief enrichment now prefers a page's og:image even when a feed thumbnail exists (feed thumbs are often tiny; the hero is shown large). Verified: BBC hero upgrades to the 1024px share image, ScienceDaily to 1920px. - Today is now 'Highlights from Today' — hero + 6 (brief size 7), which also makes the secondary grid a balanced 3+3 instead of an orphaned 3+1. - Replace now excludes every article seen this session (a client-side seen-set), so it never cycles back to something already shown. - New session History panel (this tab only, no account): lists everything seen, including swapped-away stories, so they stay recoverable. Persistent history/favorites are tabled for sign-in later. Tests: og:image upgrade of an existing feed image (86 total). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:56:57 +00:00
thejayman77	9e8eddf46d	Bounded hero-image enrichment (og:image for brief items only) The grid stays typographic; the hero is the one intentional visual slot. At brief-build time we fetch a hero-quality image for the daily five that lack one: - enrich.py reads ONLY a page's <head> og:image/twitter:image and stores just the URL (never the body). - SSRF-guarded: http(s) only, 6s timeout, 300KB cap, <=3 manual redirects each re-validated, and hosts rejected if any resolved address is private, loopback, link-local, multicast, reserved, or unspecified. - image_checked_at column caches success AND failure, so an article is never retried forever. - Wired into build-brief and cycle (brief items only, only if image missing and unchecked). Everything else stays metadata-only. - Verified live: today's five all carry images (feed + enriched). Tests: og:image parser, head-only scope, IP guard across internal ranges, and enrich success + failure-caching (85 total). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 12:37:41 +00:00
thejayman77	1e190c5e88	Advisory source health: review flags, never auto-deactivate - Add source health columns (last_success_at, last_error_at, last_error, consecutive_failures, review_flag, review_reason) via SCHEMA + migration. - poll_source maintains them: success resets the failure streak and records the success time; failure increments it and stores the latest error. - review_sources() flags active sources that are stale, repeatedly failing, low-acceptance, duplicate-heavy, or doom-skewed (high cortisol/ragebait) over a recent window. It is purely advisory: it sets review_flag/review_reason and never changes the active column (human stays in the loop), clearing the flag when a source recovers. - CLI review-sources; cycle runs it as a final step (--no-review to skip); source-report shows a review line for flagged feeds. - Tests: healthy/failing/stale/low-acceptance/recovery and never-deactivates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 20:28:35 +00:00
thejayman77	aa4125ddec	Supervised source candidates: stage, list, promote, reject - New source_candidates staging table (status suggested/quarantined/rejected/ promoted, preview_json snapshot) so untrusted/suggested feeds stay out of the real ingestion path until reviewed. - sources.py: save_candidate (re-preview never revives a curator's rejection), list_candidates, reject_candidate, promote_candidate (copies into sources, inactive by default — active on approval; never automatic). - CLI: suggest-source / list-candidates / promote-candidate / reject-candidate. - API: read-only GET /api/candidates (writes stay CLI-only — no unauthenticated public write surface yet). - Fix deprecated ElementTree truth-value test in _parse_rss. - Tests: candidate lifecycle (save/list/promote/reject, status preservation, name derivation) — 51 total. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:52:40 +00:00
thejayman77	95195daff8	Track 3: read-only source preview (vet a feed before adding) - feeds.preview_feed(): fetch + score a sample WITHOUT persisting; returns freshness, acceptance rate, cortisol/ragebait/PR averages, and example accepted/rejected items. With an LLM client it also returns topic/flavor mix and the model's (accurate) acceptance view. - CLI 'preview-source URL [--sample] [--classify]'. - API 'GET /api/source-preview?url=&sample=&classify=' with an http(s)-only guard (SSRF note left for go-public hardening). - Site 'Suggest a source' panel with Quick check (heuristic, instant) and Deep check (model, accurate), rendered DOM-safely. - Tests: network-free preview_feed tests via monkeypatched fetch (45 total). - README documents the command, endpoint, and updated roadmap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:37:34 +00:00
thejayman77	9cdcda5e02	Durability pass: tests, clearer diversity/classify behavior, Calm Filters foundation - Add pytest suite (34 tests) covering scoring thresholds, dedup clustering + representative selection + time window, brief source/category diversity, avoid-term phrase matching, and text canonicalization/truncation. - Rewrite _select_diverse with an explicit, tested contract (best-first, one per source, backfill, then inject a second category by evicting the lowest-ranked pick). - classify_articles now returns attempted/succeeded/skipped (ClassifyReport) so silent model failures are visible in both the cycle and classify output. - Fix clean_text truncation to stay within max_len (ellipsis no longer overshoots). - New filters.py: canonical FilterPrefs shape (include/mute topics+flavors, avoid_terms, pauses) and pure word/phrase-boundary matching engine seeding Calm Filters. Not yet wired into the API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 19:07:31 +00:00
thejayman77	470e9ecbf8	Make cycle show classify progress and prevent overlapping runs - cycle now prints per-article classify progress (flushed) so the long step is clearly alive rather than appearing hung. - An exclusive flock guards the cycle so a manual run and the systemd timer (or two timer ticks) cannot overlap and contend on the database and model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 16:15:03 +00:00
thejayman77	5d44072fca	Add semantic cross-source dedup via local embeddings - LocalModelClient.embed() calls the OpenAI-compatible /embeddings endpoint (local nomic model); base_url shared with chat, model via GOODNEWS_EMBED_MODEL. - New article_embeddings table and articles.duplicate_of column (+ migration). - dedup module: embeds missing articles, clusters near-identical stories within a date window by cosine similarity (pure-stdlib, vectors normalised once), and marks all but the highest-ranked member of each cluster as a duplicate. - 'dedup' CLI command; cycle now runs poll -> classify -> dedup -> brief. - Feed and brief queries hide duplicates, so a story carried by multiple outlets shows once. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 15:40:55 +00:00
thejayman77	2a9c49e2a9	Sparse-day-proof briefs, feed health check, and 16 new sources - Briefs now fill from a rolling window (prefer today, backfill up to window_days) and exclude anything featured in the last 7 days of briefs, so slow days still produce five items without stories lingering day to day. - New 'check-feeds' command fetches and parses every feed to catch dead ones. - Added 16 validated sources (science, environment, animals, culture), expanding coverage from 12 to 28 feeds to reduce staleness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 15:30:03 +00:00
thejayman77	2414fd3ccb	Add interval-aware polling and a 'cycle' command for scheduling - poll_due_sources(): polls only sources whose last successful poll is older than their poll_interval_minutes (or never polled), finally giving that config field meaning. - classify gains only_unclassified to spend the LLM solely on new (heuristic) articles, so a frequent scheduled run stays cheap. - 'cycle' command runs poll-due -> classify-new -> rebuild-today's-brief, with each step non-fatal so a down model endpoint or empty day never aborts it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 14:13:00 +00:00
thejayman77	2f4bdf2d00	Add FastAPI web/API layer and static site - queries.py: shared read-only query helpers (feed, brief, category counts) returning plain dicts, used by the API and available to the CLI. - api.py: FastAPI service with Pydantic response models (the companion-app contract), CORS, and endpoints for categories, feed, brief, and health; mounts a static site at /. - static/index.html: minimal dependency-free site rendering the daily five and topic/flavor category browsing. - 'goodnews serve' command launches uvicorn (lazy import; core CLI stays pure-stdlib). Web deps live behind the optional [web] extra. - Dockerfile + .dockerignore + build-system metadata so the service installs and deploys cleanly, with the DB mounted as a shared volume. - README: web/API and deployment docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 13:51:07 +00:00
thejayman77	38057d0354	Add topic/flavor categorization and category browsing - New taxonomy module: single source of truth for 6 topics x 5 flavors, shared by the LLM response schema (enum-constrained) and validation. - Classifier now assigns one topic + one flavor per article; json_schema enums force valid values, with coercion as a safety net. - article_scores gains topic/flavor columns via an idempotent migration. - New 'list-category' command to browse by topic and/or flavor, ranked by composite score. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 11:21:53 +00:00
thejayman77	068073423f	Initial commit: goodNews constructive-news ingestion prototype Local-first RSS/Atom ingestion pipeline with metadata-only storage, heuristic + local-LLM scoring, and daily brief builder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 00:48:26 +00:00
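
Commit 5d44072fca describes clustering near-identical stories within a date window by cosine similarity, with vectors normalised once and every cluster member except the highest-ranked marked as a duplicate. The following is a minimal pure-stdlib sketch of that idea, not the repository's dedup module. The `Article` fields, the `mark_duplicates` name, the greedy best-first strategy, and the `threshold` and `window_days` defaults are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical shape; the real schema is not shown in the log.
    id: int
    day: int              # publication day as an ordinal number
    score: float          # ranking score, higher is better
    vector: list[float]   # embedding returned by the model


def normalise(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mark_duplicates(articles, threshold=0.9, window_days=2):
    """Return {duplicate_id: representative_id} using greedy best-first clustering."""
    # Visit the best-ranked articles first so each cluster keeps its top member.
    ranked = sorted(articles, key=lambda a: a.score, reverse=True)
    # Normalise every vector exactly once, up front.
    units = {a.id: normalise(a.vector) for a in ranked}
    representatives = []
    duplicate_of = {}
    for art in ranked:
        for rep in representatives:
            if abs(art.day - rep.day) > window_days:
                continue  # outside the date window: never compared
            cosine = sum(x * y for x, y in zip(units[art.id], units[rep.id]))
            if cosine >= threshold:
                duplicate_of[art.id] = rep.id
                break
        else:
            # No representative matched, so this article starts its own cluster.
            representatives.append(art)
    return duplicate_of
```

Processing articles in rank order means the representative of each cluster is always its highest-ranked member, so no second pass is needed to choose one.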

21 Commits
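
Commit 9e8eddf46d lists an SSRF guard that allows http(s) only and rejects a host if any resolved address is private, loopback, link-local, multicast, reserved, or unspecified. The sketch below shows one way such a check could look using the standard `ipaddress` and `socket` modules. It is not the code in enrich.py: the names `is_public_address`, `url_is_fetchable`, and the injectable `resolve` parameter are hypothetical, and the timeout, size cap, and per-redirect re-validation mentioned in the commit are not covered here.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(addr: str) -> bool:
    """True only if the address falls in none of the internal or special ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # unparseable addresses are treated as unsafe
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def _resolve(host: str) -> list[str]:
    """Resolve a hostname to every address the system resolver returns."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def url_is_fetchable(url: str, resolve=None) -> bool:
    """Allow http(s) URLs whose host resolves only to public addresses.

    `resolve` is an assumption added so the check can be tested without DNS.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = (resolve or _resolve)(parts.hostname)
    except OSError:
        return False  # resolution failure: refuse rather than guess
    # One internal address among many is enough to reject the host.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A caller following the commit's description would run the same check again on each redirect target before following it.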