Files
upbeatBytes/README.md
T
thejayman77 5601022cf7 Build the SvelteKit frontend: calm home with mood modes
- New frontend/ SvelteKit static SPA (Svelte 5), served by FastAPI from
  frontend/build (falls back to the legacy page if unbuilt).
- Calm design system: cream/sage palette, serif headlines, generous space,
  no urgency colors, gentle motion (respects prefers-reduced-motion).
- Home screen: mood-mode nav (Today/Wonder/People Helping/Solutions/Light
  Only/Grounded), the daily brief as a hero + remaining four, browsable mood
  lanes, an explicit calm end-state, inline Not today / Less like this / Hide
  affordances, and device-local Calm Filters mirroring goodnews/filters.py.
- Backend: moods.py + GET /api/moods (single source of truth for the modes);
  FilterPrefs gains max_cortisol/max_ragebait ceilings (for Light Only).
- Push categorical filters (include/mute topics+flavors, ceilings) into SQL in
  queries.feed so low-ranked-but-matching items (e.g. discovery for Wonder)
  are not truncated by ranking; only avoid-terms stay a Python pass.
- PWA manifest + icon (installable; offline deferred per plan).
- Multi-stage Dockerfile builds the site then serves it from the API.
- Tests: queries.feed categorical filters (63 total). README updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 22:27:46 +00:00

247 lines
9.9 KiB
Markdown

# goodNews
Local-first constructive news ingestion prototype.
The first milestone is intentionally small: collect public RSS/Atom metadata, dedupe it, store short source-provided snippets, and attach early reason-coded heuristic scores. It does not store full article bodies.
## Commands
From this directory:
```bash
python3 -m goodnews init-db
python3 -m goodnews import-sources
python3 -m goodnews poll --limit 3
python3 -m goodnews rescore
python3 -m goodnews check-llm --base-url http://127.0.0.1:1234/v1 --model gpt-oss
python3 -m goodnews classify --limit 10 --base-url http://127.0.0.1:1234/v1 --model gpt-oss
python3 -m goodnews dedup --base-url http://127.0.0.1:1234/v1
python3 -m goodnews check-feeds
python3 -m goodnews preview-source https://example.com/feed/ --classify
python3 -m goodnews suggest-source https://example.com/feed/ --name "Example" --classify
python3 -m goodnews list-candidates
python3 -m goodnews promote-candidate 1 # copies into sources (inactive by default)
python3 -m goodnews reject-candidate 1
python3 -m goodnews review-sources # advisory health flags (never deactivates)
python3 -m goodnews build-brief --date 2026-05-27 --replace
python3 -m goodnews show-brief
python3 -m goodnews list-recent --limit 10
python3 -m goodnews list-recent --accepted-only --limit 10
python3 -m goodnews list-category --topic animals --flavor discovery
python3 -m goodnews list-category --topic environment --flavor solution
python3 -m goodnews source-report
python3 -m goodnews list-runs
```
The SQLite database lives at:
```txt
data/goodnews.sqlite3
```
Sources live at:
```txt
config/sources.toml
```
## Categories
When classified by the local model, each article is tagged with one **topic**
and one **flavor**, allowing browsable category feeds (e.g. "feel-good animals",
"environment solutions") via `list-category`:
- **Topics:** science, environment, health, community, culture, animals
- **Flavors:** breakthrough, discovery, solution, feelgood, perspective
The allowed values live in `goodnews/taxonomy.py`. The accept/reject gate is kept
deliberately broad ("not dreary"); ranking and category filters do the curation.
## Deduplication
Two layers:
- **Exact**: a URL hash UNIQUE constraint drops the literal same link at ingest.
- **Semantic**: `dedup` embeds each article's title+snippet with the local
embedding model, clusters near-identical stories within a few-day window
(cosine similarity), and marks all but the highest-ranked in each cluster as
`duplicate_of` the representative. Feed and brief queries hide duplicates, so
the same story carried by several outlets appears once. This runs as part of
`cycle`, so the scheduler keeps the corpus deduped automatically.
## Stored Article Data
For each article, the database stores:
- source
- canonical URL
- title
- short RSS/Atom description or summary
- author, if present
- published timestamp, if present
- image URL, if present
- language, if present
- hashes used for dedupe
- heuristic scores and reason codes
## Web / API
The optional `web` extra adds a FastAPI service and a small static site that
consumes it. The same JSON API backs both the website and any future companion
app; its auto-generated OpenAPI docs at `/docs` are the shared contract.
```bash
pip install -e '.[web]' # or: .venv/bin/pip install -e '.[web]'
python3 -m goodnews serve # http://127.0.0.1:8000
python3 -m goodnews serve --host 0.0.0.0 # expose on the network
```
Endpoints:
- `GET /` — the static site (daily five + topic/flavor browsing)
- `GET /healthz` — liveness + scored-article count
- `GET /api/categories` — the topic/flavor taxonomy
- `GET /api/moods` — mood modes (the humane front door: Today, Wonder, People Helping, Solutions, Light Only, Grounded)
- `GET /api/category-counts` — article counts per topic/flavor
- `GET /api/feed?topic=&flavor=&limit=&offset=` — ranked, filtered articles
- `GET /api/brief?date=&limit=` — a daily brief (latest if no date)
- `GET /api/brief-dates` — available brief dates
- `GET /api/source-preview?url=&classify=` — read-only scored sample of a feed (vet before adding)
- `GET /api/candidates?status=` — staged source candidates (read-only; curation is CLI-only for now)
- `GET /docs` — interactive OpenAPI documentation
The ingestion CLI stays pure-stdlib; only the `web` extra pulls in FastAPI/uvicorn,
so the two halves can be deployed and upgraded independently.
### Frontend
The site is a SvelteKit static SPA in `frontend/` (calm editorial design, mood-mode
navigation, the daily brief as a hero, browsable lanes, inline Calm Filters, PWA
manifest). It consumes the JSON API above, so the website and a future companion
app share one contract. Build it once and FastAPI serves the output:
```bash
cd frontend && npm install && npm run build # -> frontend/build
cd .. && python3 -m goodnews serve # serves frontend/build at /
```
If `frontend/build` is absent, the server falls back to the legacy single-page
harness in `goodnews/static/`. The Docker image builds the frontend automatically
(multi-stage), so deployment is just `docker build`.
## Calm Filters
Personal, device-local controls so a reader can stay informed without subjects
they'd rather not see right now. Preferences live in the browser (localStorage),
are sent to the read endpoints as a `prefs` JSON query param, and are applied
identically to the feed, the brief, and the category counts so the numbers always
match what's shown. The canonical shape (`goodnews/filters.py`):
```json
{
"include_topics": [], "include_flavors": [],
"mute_topics": [], "mute_flavors": [],
"avoid_terms": ["election", "stock market"],
"pauses": [{"kind": "topic", "value": "health", "until": "2026-06-02T00:00:00Z"}]
}
```
The site surfaces a humane ladder rather than a settings panel of dread:
- **Not today** → pause that article's topic for 24h.
- **Less like this** → ease off that flavor for ~3 days.
- **Always hide …** → a standing mute (undoable in the Calm filters panel).
Avoid-terms match whole words/phrases (case- and punctuation-insensitive, no
substring surprises like "pan" matching "pandemic"). The brief is filtered *down*
for MVP (no refill from outside the stored brief). No accounts; the same `prefs`
object is the clean migration path to server-side, multi-user preferences later.
## Deployment
The database is never baked into the image — the API and the ingestion CLI share
one SQLite file via a mounted volume. Run ingestion (`poll`, `classify`,
`build-brief`) on a schedule against the same file.
```bash
docker build -t goodnews .
docker run -p 8000:8000 -v /srv/goodnews/data:/data goodnews
```
`GOODNEWS_DB` controls the database path (defaults to `data/goodnews.sqlite3`).
Put a reverse proxy (Caddy/nginx) in front for TLS once a domain is attached.
## Scheduling
A single idempotent command runs the whole pipeline and is safe to invoke as
often as you like — it only polls sources that are *due* (per each source's
`poll_interval_minutes`), only classifies articles the model hasn't seen, and
rebuilds the current day's brief:
```bash
python3 -m goodnews cycle # poll due -> classify -> dedup -> brief -> review flags
python3 -m goodnews cycle --force # poll every active source regardless of interval
python3 -m goodnews cycle --no-classify # skip the LLM step (e.g. model box offline)
```
A systemd timer runs it every 15 minutes. Unit files live in `deploy/`:
```bash
sudo install -d /etc/goodnews
sudo install -m 644 deploy/goodnews.env.example /etc/goodnews/goodnews.env # then edit
sudo install -m 644 deploy/goodnews.service deploy/goodnews.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now goodnews.timer
systemctl list-timers goodnews.timer # when it next runs
journalctl -u goodnews.service -f # watch cycle output
```
`/etc/goodnews/goodnews.env` supplies `GOODNEWS_LLM_BASE_URL`, `GOODNEWS_LLM_MODEL`,
and `GOODNEWS_DB` to the scheduled run. The timer uses `Persistent=true`, so a
run missed while the machine was off is caught up on the next boot.
## Next Steps
Done so far: RSS/Atom ingestion with exact + semantic dedup, heuristic + local-LLM
classification with topic/flavor tagging, the daily brief, the FastAPI web/API layer
and site, scheduled `cycle` via systemd, a pytest suite, and device-local Calm Filters.
Still ahead:
1. **Supervised source pipeline** — preview + staging are done: `suggest-source`
previews a feed and stages it in the `source_candidates` table (status
suggested/quarantined/rejected/promoted); `promote-candidate` copies it into
`sources` (inactive by default — active on approval); promotion is never
automatic. Advisory health is done too: `review-sources` (also run at the end
of `cycle`) flags stale, failing, low-acceptance, duplicate-heavy, or
doom-skewed feeds for human review — it never deactivates anything. Still
ahead: an authenticated POST surface so the website can accept public
suggestions once accounts exist.
2. **Learned "Less like this" weighting** — replace the interim flavor-pause with
real preference down-ranking.
3. **Corpus rebalancing** — add calm/feelgood sources (currently science-heavy).
4. **Retention/pruning** — soft-delete + time-window indexes as the corpus grows
toward ~10k articles (don't rush; not yet needed).
5. **Go-public hardening** — TLS via a reverse proxy, then a domain.
## Local Model Configuration
The `classify` command expects an OpenAI-compatible local chat-completions server.
You can pass settings directly:
```bash
python3 -m goodnews classify --base-url http://127.0.0.1:1234/v1 --model gpt-oss --limit 10
```
Or use environment variables:
```bash
export GOODNEWS_LLM_BASE_URL=http://127.0.0.1:1234/v1
export GOODNEWS_LLM_MODEL=gpt-oss
python3 -m goodnews classify --limit 10
```
`classify` rewrites the current score/reason row for selected candidates. `rescore` can restore the fast heuristic scores.