5601022cf7
- New frontend/ SvelteKit static SPA (Svelte 5), served by FastAPI from frontend/build (falls back to the legacy page if unbuilt). - Calm design system: cream/sage palette, serif headlines, generous space, no urgency colors, gentle motion (respects prefers-reduced-motion). - Home screen: mood-mode nav (Today/Wonder/People Helping/Solutions/Light Only/Grounded), the daily brief as a hero + remaining four, browsable mood lanes, an explicit calm end-state, inline Not today / Less like this / Hide affordances, and device-local Calm Filters mirroring goodnews/filters.py. - Backend: moods.py + GET /api/moods (single source of truth for the modes); FilterPrefs gains max_cortisol/max_ragebait ceilings (for Light Only). - Push categorical filters (include/mute topics+flavors, ceilings) into SQL in queries.feed so low-ranked-but-matching items (e.g. discovery for Wonder) are not truncated by ranking; only avoid-terms stay a Python pass. - PWA manifest + icon (installable; offline deferred per plan). - Multi-stage Dockerfile builds the site then serves it from the API. - Tests: queries.feed categorical filters (63 total). README updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
247 lines
9.9 KiB
Markdown
247 lines
9.9 KiB
Markdown
# goodNews
|
|
|
|
Local-first constructive news ingestion prototype.
|
|
|
|
The first milestone is intentionally small: collect public RSS/Atom metadata, dedupe it, store short source-provided snippets, and attach early reason-coded heuristic scores. It does not store full article bodies.
|
|
|
|
## Commands
|
|
|
|
From this directory:
|
|
|
|
```bash
|
|
python3 -m goodnews init-db
|
|
python3 -m goodnews import-sources
|
|
python3 -m goodnews poll --limit 3
|
|
python3 -m goodnews rescore
|
|
python3 -m goodnews check-llm --base-url http://127.0.0.1:1234/v1 --model gpt-oss
|
|
python3 -m goodnews classify --limit 10 --base-url http://127.0.0.1:1234/v1 --model gpt-oss
|
|
python3 -m goodnews dedup --base-url http://127.0.0.1:1234/v1
|
|
python3 -m goodnews check-feeds
|
|
python3 -m goodnews preview-source https://example.com/feed/ --classify
|
|
python3 -m goodnews suggest-source https://example.com/feed/ --name "Example" --classify
|
|
python3 -m goodnews list-candidates
|
|
python3 -m goodnews promote-candidate 1 # copies into sources (inactive by default)
|
|
python3 -m goodnews reject-candidate 1
|
|
python3 -m goodnews review-sources # advisory health flags (never deactivates)
|
|
python3 -m goodnews build-brief --date 2026-05-27 --replace
|
|
python3 -m goodnews show-brief
|
|
python3 -m goodnews list-recent --limit 10
|
|
python3 -m goodnews list-recent --accepted-only --limit 10
|
|
python3 -m goodnews list-category --topic animals --flavor discovery
|
|
python3 -m goodnews list-category --topic environment --flavor solution
|
|
python3 -m goodnews source-report
|
|
python3 -m goodnews list-runs
|
|
```
|
|
|
|
The SQLite database lives at:
|
|
|
|
```txt
|
|
data/goodnews.sqlite3
|
|
```
|
|
|
|
Sources live at:
|
|
|
|
```txt
|
|
config/sources.toml
|
|
```
|
|
|
|
## Categories
|
|
|
|
When classified by the local model, each article is tagged with one **topic**
|
|
and one **flavor**, allowing browsable category feeds (e.g. "feel-good animals",
|
|
"environment solutions") via `list-category`:
|
|
|
|
- **Topics:** science, environment, health, community, culture, animals
|
|
- **Flavors:** breakthrough, discovery, solution, feelgood, perspective
|
|
|
|
The allowed values live in `goodnews/taxonomy.py`. The accept/reject gate is kept
|
|
deliberately broad ("not dreary"); ranking and category filters do the curation.
|
|
|
|
## Deduplication
|
|
|
|
Two layers:
|
|
|
|
- **Exact**: a URL hash UNIQUE constraint drops the literal same link at ingest.
|
|
- **Semantic**: `dedup` embeds each article's title+snippet with the local
|
|
embedding model, clusters near-identical stories within a few-day window
|
|
(cosine similarity), and marks all but the highest-ranked in each cluster as
|
|
`duplicate_of` the representative. Feed and brief queries hide duplicates, so
|
|
the same story carried by several outlets appears once. This runs as part of
|
|
`cycle`, so the scheduler keeps the corpus deduped automatically.
|
|
|
|
## Stored Article Data
|
|
|
|
For each article, the database stores:
|
|
|
|
- source
|
|
- canonical URL
|
|
- title
|
|
- short RSS/Atom description or summary
|
|
- author, if present
|
|
- published timestamp, if present
|
|
- image URL, if present
|
|
- language, if present
|
|
- hashes used for dedupe
|
|
- heuristic scores and reason codes
|
|
|
|
## Web / API
|
|
|
|
The optional `web` extra adds a FastAPI service and a small static site that
|
|
consumes it. The same JSON API backs both the website and any future companion
|
|
app; its auto-generated OpenAPI docs at `/docs` are the shared contract.
|
|
|
|
```bash
|
|
pip install -e '.[web]' # or: .venv/bin/pip install -e '.[web]'
|
|
python3 -m goodnews serve # http://127.0.0.1:8000
|
|
python3 -m goodnews serve --host 0.0.0.0 # expose on the network
|
|
```
|
|
|
|
Endpoints:
|
|
|
|
- `GET /` — the static site (daily five + topic/flavor browsing)
|
|
- `GET /healthz` — liveness + scored-article count
|
|
- `GET /api/categories` — the topic/flavor taxonomy
|
|
- `GET /api/moods` — mood modes (the humane front door: Today, Wonder, People Helping, Solutions, Light Only, Grounded)
|
|
- `GET /api/category-counts` — article counts per topic/flavor
|
|
- `GET /api/feed?topic=&flavor=&limit=&offset=` — ranked, filtered articles
|
|
- `GET /api/brief?date=&limit=` — a daily brief (latest if no date)
|
|
- `GET /api/brief-dates` — available brief dates
|
|
- `GET /api/source-preview?url=&classify=` — read-only scored sample of a feed (vet before adding)
|
|
- `GET /api/candidates?status=` — staged source candidates (read-only; curation is CLI-only for now)
|
|
- `GET /docs` — interactive OpenAPI documentation
|
|
|
|
The ingestion CLI stays pure-stdlib; only the `web` extra pulls in FastAPI/uvicorn,
|
|
so the two halves can be deployed and upgraded independently.
|
|
|
|
### Frontend
|
|
|
|
The site is a SvelteKit static SPA in `frontend/` (calm editorial design, mood-mode
|
|
navigation, the daily brief as a hero, browsable lanes, inline Calm Filters, PWA
|
|
manifest). It consumes the JSON API above, so the website and a future companion
|
|
app share one contract. Build it once and FastAPI serves the output:
|
|
|
|
```bash
|
|
cd frontend && npm install && npm run build # -> frontend/build
|
|
cd .. && python3 -m goodnews serve # serves frontend/build at /
|
|
```
|
|
|
|
If `frontend/build` is absent, the server falls back to the legacy single-page
|
|
harness in `goodnews/static/`. The Docker image builds the frontend automatically
|
|
(multi-stage), so deployment is just `docker build`.
|
|
|
|
## Calm Filters
|
|
|
|
Personal, device-local controls so a reader can stay informed without subjects
|
|
they'd rather not see right now. Preferences live in the browser (localStorage),
|
|
are sent to the read endpoints as a `prefs` JSON query param, and are applied
|
|
identically to the feed, the brief, and the category counts so the numbers always
|
|
match what's shown. The canonical shape (`goodnews/filters.py`):
|
|
|
|
```json
|
|
{
|
|
"include_topics": [], "include_flavors": [],
|
|
"mute_topics": [], "mute_flavors": [],
|
|
"avoid_terms": ["election", "stock market"],
|
|
"pauses": [{"kind": "topic", "value": "health", "until": "2026-06-02T00:00:00Z"}]
|
|
}
|
|
```
|
|
|
|
The site surfaces a humane ladder rather than a settings panel of dread:
|
|
|
|
- **Not today** → pause that article's topic for 24h.
|
|
- **Less like this** → ease off that flavor for ~3 days.
|
|
- **Always hide …** → a standing mute (undoable in the Calm filters panel).
|
|
|
|
Avoid-terms match whole words/phrases (case- and punctuation-insensitive, no
|
|
substring surprises like "pan" matching "pandemic"). The brief is filtered *down*
|
|
for MVP (no refill from outside the stored brief). No accounts; the same `prefs`
|
|
object is the clean migration path to server-side, multi-user preferences later.
|
|
|
|
## Deployment
|
|
|
|
The database is never baked into the image — the API and the ingestion CLI share
|
|
one SQLite file via a mounted volume. Run ingestion (`poll`, `classify`,
|
|
`build-brief`) on a schedule against the same file.
|
|
|
|
```bash
|
|
docker build -t goodnews .
|
|
docker run -p 8000:8000 -v /srv/goodnews/data:/data goodnews
|
|
```
|
|
|
|
`GOODNEWS_DB` controls the database path (defaults to `data/goodnews.sqlite3`).
|
|
Put a reverse proxy (Caddy/nginx) in front for TLS once a domain is attached.
|
|
|
|
## Scheduling
|
|
|
|
A single idempotent command runs the whole pipeline and is safe to invoke as
|
|
often as you like — it only polls sources that are *due* (per each source's
|
|
`poll_interval_minutes`), only classifies articles the model hasn't seen, and
|
|
rebuilds the current day's brief:
|
|
|
|
```bash
|
|
python3 -m goodnews cycle # poll due -> classify -> dedup -> brief -> review flags
|
|
python3 -m goodnews cycle --force # poll every active source regardless of interval
|
|
python3 -m goodnews cycle --no-classify # skip the LLM step (e.g. model box offline)
|
|
```
|
|
|
|
A systemd timer runs it every 15 minutes. Unit files live in `deploy/`:
|
|
|
|
```bash
|
|
sudo install -d /etc/goodnews
|
|
sudo install -m 644 deploy/goodnews.env.example /etc/goodnews/goodnews.env # then edit
|
|
sudo install -m 644 deploy/goodnews.service deploy/goodnews.timer /etc/systemd/system/
|
|
sudo systemctl daemon-reload
|
|
sudo systemctl enable --now goodnews.timer
|
|
|
|
systemctl list-timers goodnews.timer # when it next runs
|
|
journalctl -u goodnews.service -f # watch cycle output
|
|
```
|
|
|
|
`/etc/goodnews/goodnews.env` supplies `GOODNEWS_LLM_BASE_URL`, `GOODNEWS_LLM_MODEL`,
|
|
and `GOODNEWS_DB` to the scheduled run. The timer uses `Persistent=true`, so a
|
|
run missed while the machine was off is caught up on the next boot.
|
|
|
|
## Next Steps
|
|
|
|
Done so far: RSS/Atom ingestion with exact + semantic dedup, heuristic + local-LLM
|
|
classification with topic/flavor tagging, the daily brief, the FastAPI web/API layer
|
|
and site, scheduled `cycle` via systemd, a pytest suite, and device-local Calm Filters.
|
|
|
|
Still ahead:
|
|
|
|
1. **Supervised source pipeline** — preview + staging are done: `suggest-source`
|
|
previews a feed and stages it in the `source_candidates` table (status
|
|
suggested/quarantined/rejected/promoted); `promote-candidate` copies it into
|
|
`sources` (inactive by default — active on approval); promotion is never
|
|
automatic. Advisory health is done too: `review-sources` (also run at the end
|
|
of `cycle`) flags stale, failing, low-acceptance, duplicate-heavy, or
|
|
doom-skewed feeds for human review — it never deactivates anything. Still
|
|
ahead: an authenticated POST surface so the website can accept public
|
|
suggestions once accounts exist.
|
|
2. **Learned "Less like this" weighting** — replace the interim flavor-pause with
|
|
real preference down-ranking.
|
|
3. **Corpus rebalancing** — add calm/feelgood sources (currently science-heavy).
|
|
4. **Retention/pruning** — soft-delete + time-window indexes as the corpus grows
|
|
toward ~10k articles (don't rush; not yet needed).
|
|
5. **Go-public hardening** — TLS via a reverse proxy, then a domain.
|
|
|
|
## Local Model Configuration
|
|
|
|
The `classify` command expects an OpenAI-compatible local chat-completions server.
|
|
|
|
You can pass settings directly:
|
|
|
|
```bash
|
|
python3 -m goodnews classify --base-url http://127.0.0.1:1234/v1 --model gpt-oss --limit 10
|
|
```
|
|
|
|
Or use environment variables:
|
|
|
|
```bash
|
|
export GOODNEWS_LLM_BASE_URL=http://127.0.0.1:1234/v1
|
|
export GOODNEWS_LLM_MODEL=gpt-oss
|
|
python3 -m goodnews classify --limit 10
|
|
```
|
|
|
|
`classify` rewrites the current score/reason row for selected candidates. `rescore` can restore the fast heuristic scores.
|