Commit Graph

15 Commits

Author SHA1 Message Date
thejayman77 667b1a82c3 brand: standardize "Upbeat Bytes" → "upbeatBytes" everywhere
Per the logo + brand: the name is upbeatBytes (camelCase). Swept all user-facing
strings — titles/og:site_name/og:title, logo alt text, share pages (share.py),
emails (email_send), classifier prompt (llm), digest/unsubscribe (api), PWA
manifest, game share text, sign-in, the SPA shell + patch-static-heads (play
title) — plus README/publish.sh and the email test fixture. (SMTP From env was
already upbeatBytes.) Domains (upbeatbytes.com) unchanged. 425 BE + 36 FE green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 20:01:20 -04:00
thejayman77 89c0fbe1f6 Sync repo to deployed state: SEO recovery, Publishing Desk, Play games, emoji picker
The deploy pipeline runs from the working tree, so a wave of shipped features
had never been committed. This snapshots git to what's actually running.

SEO impression recovery (live + verified):
- Duplicate /a/{id} pages now 301-redirect to their canonical twin instead of
  returning 404 (a hard 404 silently dropped already-indexed URLs and tanked
  impressions).
- Dedup representative selection reworked: accepted/serveable -> established
  rep (URL stability) -> quality score, so an accepted page never retires to a
  rejected rep and an indexed canonical doesn't churn when a newer twin arrives.
- HEAD /a/{id} returns the same status as GET (api_route GET+HEAD) instead of
  falling through to the static mount and 404ing.
- `dedup --force-recluster`: cycle-locked, model-free re-cluster to re-apply the
  policy to the existing corpus (shared cycle_lock context manager).
- CLI honors GOODNEWS_DB for its default --db (was silently ignored).
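A minimal Python sketch of the representative priority described above: accepted/serveable first, then the established rep, then quality score. The `Candidate` shape and its field names are assumptions for illustration, not the dedup module's real types.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical shape for illustration; field names are assumptions.
    article_id: int
    accepted: bool        # passed the gate, so the page is serveable
    is_current_rep: bool  # already the cluster's established representative
    quality: float        # quality score


def pick_representative(cluster: list[Candidate]) -> Candidate:
    """Choose one article to represent a dedup cluster.

    Priority, highest first: accepted/serveable, then the established
    representative (URL stability), then quality score.
    """
    # Tuples compare field by field (False < True), so an earlier field
    # always outranks any later one.
    return max(cluster, key=lambda c: (c.accepted, c.is_current_rep, c.quality))
```

Under this ordering a newer, higher-quality twin cannot displace an accepted representative that is already indexed, while an accepted twin still beats a rejected representative.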

Publishing Desk (admin tool to post highlights to X via Web Intents):
- publishing.py queue/rank/handle-resolution; admin UI; full searchable emoji
  picker (bundled data, no CDN) for the blurb editor.
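Web Intents need no API credentials: the tool only builds a link that opens X's composer prefilled, and a human still reviews and posts it. A minimal sketch, assuming the long-documented `twitter.com/intent/tweet` endpoint with its `text` and `url` parameters; `build_intent_url` is a hypothetical name, not publishing.py's actual API.

```python
from urllib.parse import quote, urlencode

# Documented Web Intent endpoint (twitter.com form). The Publishing Desk's
# actual base URL and parameters are an assumption here.
INTENT_BASE = "https://twitter.com/intent/tweet"


def build_intent_url(blurb: str, article_url: str) -> str:
    """Return a link that opens X's composer prefilled with blurb + link.

    Nothing is posted automatically; the admin still clicks Post.
    """
    # quote (not the default quote_plus) encodes spaces as %20; emoji are
    # percent-encoded as UTF-8 bytes.
    query = urlencode({"text": blurb, "url": article_url}, quote_via=quote)
    return f"{INTENT_BASE}?{query}"
```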

Play games + site:
- Bloom (word-wheel), Memory Match, daily ritual set, Zen Den (dev-gated).
- English-only language gate; source prospecting; paywall + dedup hardening.

Tests: full suite green (349). Ignores tightened (node_modules, data/*.db).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:32:27 -04:00
thejayman77 9813af40ed Classifier: don't over-score cortisol for abstract/distant science
Codex review: the body-horror boundary was directionally right but a hair too
broad — black-hole/cosmology, lunar-regolith engineering hazards, and a
microplastics measurement-methodology piece were rejected on dramatic vocabulary
alone (cortisol 4–6). Add scoring guidance: score cortisol by the reader's
personal/visceral/public-health threat, not by dramatic words or subject
grandeur. Distant astronomy, equipment hazards, geological forces, scientific
self-correction, natural-history mechanisms, predator–prey biology, and
historical discoveries are LOW cortisol (0–3) even when worded "deadly"/"lethal".
Reserve high cortisol for disease, contamination, outbreak, parasites, violence,
or immediate suffering.

Verified: black hole / moon / microplastics now accept (cortisol 1–2);
parasite (8), Ebola (6), hantavirus outbreak (6) still reject.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:06:18 -04:00
thejayman77 e7610d2889 Classifier: reject body-horror / disease-threat; anxiety outweighs informative
The flesh-eating-parasite story slipped through as "calm public-health
monitoring" — the gate had no body-horror class and let "informative/public
health" rescue a viscerally alarming subject. Two fixes:

* Reject visceral-threat hooks (outbreaks, parasites, infestations,
  contamination, recalls, poisonings, "flesh-eating" infections) even when
  calmly framed as monitoring/surveillance/awareness/public health — judge the
  reader's gut, not the prose. Keep genuine health wins (treatments, recovery,
  prevention, wellbeing): the line is the hook, not the topic.
* A high cortisol_score is disqualifying on its own — anxiety outweighs how
  informative or constructive a piece is.

Verified: 3 flesh-eating-parasite variants now REJECT (cortisol 8) while calm
health/wellness (diabetes treatment, sleep tips, green-space study) still pass.
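"Disqualifying on its own" amounts to a veto checked before anything else. A sketch under assumptions: `CORTISOL_REJECT_AT = 6` is a hypothetical cutoff picked to match the scores quoted in these commits (6 and 8 reject, 1-2 accept), and `accept` stands in for the model's remaining verdict.

```python
# Hypothetical cutoff. The commits show cortisol 6 and 8 rejecting and 1-2
# accepting, but the real threshold isn't stated.
CORTISOL_REJECT_AT = 6


def gate_verdict(scores: dict) -> bool:
    """Return True to accept an article.

    Cortisol is checked first as a veto: no informative or constructive value
    can rescue a high-anxiety piece. `accept` is a hypothetical stand-in for
    the model's remaining verdict.
    """
    if scores["cortisol_score"] >= CORTISOL_REJECT_AT:
        return False
    return bool(scores.get("accept", False))
```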

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 11:42:17 -04:00
thejayman77 8653a46fd4 Classifier: explicit "no AI dread" boundary
Tighten the gate's AI handling per review: accept practical/beneficial/creative/
scientific/humane/bounded AI stories; reject AI framed around loss of control,
cognitive decline, job/surveillance/existential panic, child/social-harm panic,
"falling behind" productivity anxiety, or arms-race framing. Verified: the MIT
Technology Review pieces "lose control of our brains" and "flood of AI lawsuits"
now reject (both previously accepted).
2026-06-06 14:07:31 +00:00
thejayman77 a36b1a098e Retune classifier gate: calm/non-anxiety, absorbing-allowed
Shift the acceptance bar from "must be uplifting" to "will a reader finish this
calm or a little better, never worse." Keep neutral-but-absorbing (discoveries,
explainers, clever builds, useful insight), and reject anxiety-inducing content —
especially the comparison traps (inferior/behind/FOMO/hustle/status). Scores still
back the verdict. Lets us pull from mainstream sources and filter, rather than
relying on niche good-news outlets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 02:03:24 +00:00
thejayman77 ab5caada0b Fix summary LLM call: use raw chat text, not classifier-JSON parsing
client._chat() JSON-parses every response (for the classifier), so the plain-text
summary was rejected ("model did not return JSON") even though the model returned
a perfect summary. Split out _raw_content() and add chat_text() for free-form
output; summaries use it. _chat keeps parsing for classification.
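A sketch of the split, assuming an OpenAI-compatible reply body (`choices[0].message.content`). The injected `post` callable is a hypothetical stand-in for the real HTTP call so the example runs offline; only the method names come from the commit.

```python
import json
from typing import Any, Callable


class LocalModelClient:
    """Sketch only: `post` stands in for the HTTP call to an OpenAI-compatible
    /chat/completions endpoint and returns the decoded JSON body."""

    def __init__(self, post: Callable[[dict], dict]):
        self._post = post

    def _raw_content(self, messages: list[dict]) -> str:
        # Shared path: send the request, return the assistant text untouched.
        body = self._post({"messages": messages})
        return body["choices"][0]["message"]["content"]

    def _chat(self, messages: list[dict]) -> dict[str, Any]:
        # Classifier path: the reply must parse as JSON.
        content = self._raw_content(messages)
        try:
            return json.loads(content)
        except json.JSONDecodeError as exc:
            raise ValueError("model did not return JSON") from exc

    def chat_text(self, messages: list[dict]) -> str:
        # Free-form path (summaries): no JSON parsing at all.
        return self._raw_content(messages).strip()
```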
2026-06-03 18:12:20 +00:00
thejayman77 a47a1504c8 Phase B1: multi-tag groupings model (backend)
Three-layer organization: primary topic (one per article, for ranking and
brief balance) + grouping tags (1-4 per article from a controlled vocabulary,
the organic "wandering" axis) + tonal flavor.

- taxonomy: add technology + learning topics; 4 calm tag families
  (Discovery & Wonder, People & Kindness, Solutions & Progress, Mind & Craft)
  defined in code, not the DB; ALLOWED_TAGS union + coerce_tags validation.
- db: article_tags(article_id, tag) join table + tag index.
- llm: tags added to the classifier json_schema (enum-constrained, maxItems 4)
  and system prompt; normalize_scores coerces tags; upsert_article_score
  replaces a row's tags atomically on every (re)classification.
- queries: feed gains a tag filter and exposes tags via group_concat; tag_counts.
- api: Article.tags, feed tag param, and /api/families with per-tag counts.
- tests: coerce/normalize/upsert/tag-filter/reclassify-replace/tag_counts +
  /api/families. 99 passing.
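A sketch of tag coercion plus the atomic replace-on-reclassify, using stdlib `sqlite3`. The `ALLOWED_TAGS` values are placeholders (the real vocabulary lives in the taxonomy module and isn't listed here); the `article_tags(article_id, tag)` shape and the 4-tag cap follow the commit.

```python
import sqlite3

# Placeholder vocabulary; the real ALLOWED_TAGS is not shown in the log.
ALLOWED_TAGS = frozenset({"space", "animals", "kindness", "inventions", "craft"})
MAX_TAGS = 4


def coerce_tags(raw) -> list[str]:
    """Lowercase, drop unknown and repeated tags, keep order, cap at MAX_TAGS."""
    out: list[str] = []
    for tag in raw or []:
        tag = str(tag).strip().lower()
        if tag in ALLOWED_TAGS and tag not in out:
            out.append(tag)
    return out[:MAX_TAGS]


def replace_tags(conn: sqlite3.Connection, article_id: int, tags) -> None:
    """Swap an article's tags in one transaction (delete, then insert)."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM article_tags WHERE article_id = ?", (article_id,))
        conn.executemany(
            "INSERT INTO article_tags (article_id, tag) VALUES (?, ?)",
            [(article_id, t) for t in coerce_tags(tags)],
        )
```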

Corpus reclassify (re-tag + new primary topics) runs separately against the
local LLM. Frontend (B2) pairs with this; the live site is unchanged until then.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 18:35:25 +00:00
thejayman77 9cdcda5e02 Durability pass: tests, clearer diversity/classify behavior, Calm Filters foundation
- Add pytest suite (34 tests) covering scoring thresholds, dedup clustering +
  representative selection + time window, brief source/category diversity,
  avoid-term phrase matching, and text canonicalization/truncation.
- Rewrite _select_diverse with an explicit, tested contract (best-first, one
  per source, backfill, then inject a second category by evicting the
  lowest-ranked pick).
- classify_articles now returns attempted/succeeded/skipped (ClassifyReport) so
  silent model failures are visible in both the cycle and classify output.
- Fix clean_text truncation to stay within max_len (ellipsis no longer
  overshoots).
- New filters.py: canonical FilterPrefs shape (include/mute topics+flavors,
  avoid_terms, pauses) and a pure word/phrase-boundary matching engine, the seed
  of Calm Filters. Not yet wired into the API.
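The `_select_diverse` contract above, sketched as a standalone function. `Item` and `select_diverse` are hypothetical names; the steps follow the commit's description (best-first, one per source, backfill, then evict the lowest-ranked pick to inject a second category).

```python
from typing import NamedTuple


class Item(NamedTuple):
    # Hypothetical shape; the brief's real article fields aren't shown.
    id: int
    source: str
    category: str


def select_diverse(ranked: list[Item], k: int) -> list[Item]:
    """Pick k items from a best-first list.

    1. Best-first, at most one per source.
    2. Backfill from the remainder (best-first) if step 1 came up short.
    3. If every pick shares one category, evict the lowest-ranked pick in
       favor of the best-ranked item from another category.
    """
    picked: list[int] = []  # indices into `ranked`
    seen_sources: set[str] = set()
    for i, item in enumerate(ranked):
        if len(picked) == k:
            break
        if item.source not in seen_sources:
            picked.append(i)
            seen_sources.add(item.source)
    for i in range(len(ranked)):  # backfill, ignoring the source rule
        if len(picked) == k:
            break
        if i not in picked:
            picked.append(i)
    picked.sort()  # restore best-first order after backfill
    categories = {ranked[i].category for i in picked}
    if len(categories) == 1 and k >= 2:
        only = next(iter(categories))
        alt = next((i for i in range(len(ranked))
                    if i not in picked and ranked[i].category != only), None)
        if alt is not None:
            picked[-1] = alt  # evict the lowest-ranked pick
            picked.sort()
    return [ranked[i] for i in picked]
```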

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:07:31 +00:00
thejayman77 470e9ecbf8 Make cycle show classify progress and prevent overlapping runs
- cycle now prints per-article classify progress (flushed) so the long step is
  clearly alive rather than appearing hung.
- An exclusive flock guards the cycle so a manual run and the systemd timer (or
  two timer ticks) cannot overlap and contend on the database and model.
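A sketch of such a lock as a context manager on `fcntl.flock` (Unix only). The name matches the `cycle_lock` helper mentioned in a later commit, but the body is an assumption: non-blocking acquisition (`LOCK_NB`) makes a second run fail fast instead of queueing, and `CycleAlreadyRunning` is a hypothetical exception.

```python
import fcntl
import os
from contextlib import contextmanager


class CycleAlreadyRunning(RuntimeError):
    """Hypothetical: raised when another run already holds the cycle lock."""


@contextmanager
def cycle_lock(path: str):
    """Hold an exclusive, non-blocking flock on `path` for the block's duration.

    The kernel drops flock locks when the descriptor closes, including when
    the process dies, so a crashed run can't wedge the timer.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise CycleAlreadyRunning(path) from None
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```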

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 16:15:03 +00:00
thejayman77 5d44072fca Add semantic cross-source dedup via local embeddings
- LocalModelClient.embed() calls the OpenAI-compatible /embeddings endpoint
  (local nomic model); base_url shared with chat, model via GOODNEWS_EMBED_MODEL.
- New article_embeddings table and articles.duplicate_of column (+ migration).
- dedup module: embeds missing articles, clusters near-identical stories within
  a date window by cosine similarity (pure-stdlib, vectors normalized once), and
  marks all but the highest-ranked member of each cluster as a duplicate.
- 'dedup' CLI command; cycle now runs poll -> classify -> dedup -> brief.
- Feed and brief queries hide duplicates, so a story carried by multiple
  outlets shows once.
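A pure-stdlib sketch of the clustering step. The commit doesn't name its clustering rule, so this swaps in greedy leader clustering: articles are visited best-first, and each joins the first cluster head it matches within the date window. The threshold, window, and tuple layout are hypothetical. Normalizing once up front turns cosine similarity into a plain dot product.

```python
import math
from datetime import date

Article = tuple[int, date, float, list[float]]  # (id, published, score, embedding)


def normalize(vec: list[float]) -> list[float]:
    """Scale to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def find_duplicates(articles: list[Article], threshold: float = 0.9,
                    window_days: int = 2) -> dict[int, int]:
    """Return {duplicate_id: kept_id} for near-identical stories."""
    ranked = sorted(articles, key=lambda a: a[2], reverse=True)  # best first
    unit = {a[0]: normalize(a[3]) for a in ranked}  # normalized once
    heads: list[tuple[int, date]] = []  # each head is its cluster's best member
    duplicate_of: dict[int, int] = {}
    for art_id, published, _score, _vec in ranked:
        for head_id, head_date in heads:
            if abs((published - head_date).days) > window_days:
                continue  # outside the date window
            # Cosine similarity of unit vectors is just their dot product.
            sim = sum(x * y for x, y in zip(unit[art_id], unit[head_id]))
            if sim >= threshold:
                duplicate_of[art_id] = head_id
                break
        else:
            heads.append((art_id, published))  # starts a new cluster
    return duplicate_of
```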

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:40:55 +00:00
thejayman77 2414fd3ccb Add interval-aware polling and a 'cycle' command for scheduling
- poll_due_sources(): polls only sources whose last successful poll is older
  than their poll_interval_minutes (or never polled), finally giving that
  config field meaning.
- classify gains only_unclassified to spend the LLM solely on new articles that
  so far carry only heuristic scores, so a frequent scheduled run stays cheap.
- 'cycle' command runs poll-due -> classify-new -> rebuild-today's-brief, with
  each step non-fatal so a down model endpoint or empty day never aborts it.
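A sketch of the due check and the non-fatal step runner. Both function names are hypothetical, and treating a source as due once exactly `poll_interval_minutes` have elapsed is a choice made here.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def is_due(last_success: Optional[datetime], interval_minutes: int,
           now: Optional[datetime] = None) -> bool:
    """Due if never polled successfully, or the last success is at least
    interval_minutes old (exactly-on-interval counts as due here)."""
    if last_success is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_success >= timedelta(minutes=interval_minutes)


def run_cycle(steps: list[tuple[str, Callable[[], object]]]) -> dict[str, str]:
    """Run steps in order; a failure is recorded and the cycle moves on."""
    results: dict[str, str] = {}
    for name, step in steps:
        try:
            step()
            results[name] = "ok"
        except Exception as exc:  # e.g. the model endpoint is down
            results[name] = f"failed: {exc}"
    return results
```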

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 14:13:00 +00:00
thejayman77 38057d0354 Add topic/flavor categorization and category browsing
- New taxonomy module: single source of truth for 6 topics x 5 flavors,
  shared by the LLM response schema (enum-constrained) and validation.
- Classifier now assigns one topic + one flavor per article; json_schema
  enums force valid values, with coercion as a safety net.
- article_scores gains topic/flavor columns via an idempotent migration.
- New 'list-category' command to browse by topic and/or flavor, ranked by
  composite score.
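A sketch of the single-source-of-truth idea: the same tuples feed both the JSON Schema `enum`s sent to the model and the coercion safety net. Topic and flavor names are placeholders, since the commit gives only the counts.

```python
# Placeholder names; the log gives only the counts (6 topics x 5 flavors).
TOPICS = ("science", "environment", "health", "community", "culture", "animals")
FLAVORS = ("wonder", "kindness", "progress", "calm", "humor")


def classification_schema() -> dict:
    """JSON Schema for the model's reply, built from the same tuples used for
    validation so the two can't drift apart."""
    return {
        "type": "object",
        "properties": {
            "topic": {"type": "string", "enum": list(TOPICS)},
            "flavor": {"type": "string", "enum": list(FLAVORS)},
        },
        "required": ["topic", "flavor"],
        "additionalProperties": False,
    }


def coerce(value, allowed: tuple[str, ...], default: str) -> str:
    """Safety net for servers that ignore the schema: normalize case and
    whitespace, and fall back to a default for anything off-list."""
    v = str(value or "").strip().lower()
    return v if v in allowed else default
```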

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 11:21:53 +00:00
thejayman77 f4842ed100 Fix LLM classify for newer OpenAI-compatible servers
- Use json_schema structured output (newer LM Studio rejects json_object),
  escalating through json_schema -> json_object -> text and pinning the
  first format the server accepts to avoid wasted round-trips.
- Make per-article failures non-fatal and commit incrementally so a single
  timeout no longer discards the whole batch.
- Raise default timeout to 180s (configurable via GOODNEWS_LLM_TIMEOUT) for
  larger local reasoning models.
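A sketch of the escalation-and-pin logic. The `send` transport and the `FormatRejected` exception are hypothetical; a real client would map each name to an OpenAI-style `response_format` (`{"type": "json_schema", ...}`, `{"type": "json_object"}`, `{"type": "text"}`) and detect the server's refusal from its error response.

```python
from typing import Callable, Optional


class FormatRejected(Exception):
    """Hypothetical: raised by the transport when the server refuses a format."""


# Most to least structured, in the commit's escalation order.
FORMATS = ("json_schema", "json_object", "text")


class FormatNegotiator:
    """Try response formats in order and pin the first one the server accepts,
    so later requests skip formats already known to fail."""

    def __init__(self, send: Callable[[str], str]):
        self._send = send  # send(format_name) -> reply text
        self.pinned: Optional[str] = None

    def request(self) -> str:
        order = (self.pinned,) if self.pinned else FORMATS
        last_error: Optional[Exception] = None
        for fmt in order:
            try:
                reply = self._send(fmt)
            except FormatRejected as exc:
                last_error = exc
                continue  # escalate to the next, less structured format
            self.pinned = fmt  # remember what works; no more wasted round-trips
            return reply
        raise RuntimeError("server accepted no response format") from last_error
```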

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 01:21:05 +00:00
thejayman77 068073423f Initial commit: goodNews constructive-news ingestion prototype
Local-first RSS/Atom ingestion pipeline with metadata-only storage,
heuristic + local-LLM scoring, and daily brief builder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 00:48:26 +00:00