thejayman77 2a9c49e2a9 Sparse-day-proof briefs, feed health check, and 16 new sources
- Briefs now fill from a rolling window (prefer today, backfill up to
  window_days) and exclude anything featured in the last 7 days of briefs, so
  slow days still produce five items without stories lingering day to day.
- New 'check-feeds' command fetches and parses every feed to catch dead ones.
- Added 16 validated sources (science, environment, animals, culture),
  expanding coverage from 12 to 28 feeds to reduce staleness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 15:30:03 +00:00

goodNews

Local-first constructive news ingestion prototype.

The first milestone is intentionally small: collect public RSS/Atom metadata, dedupe it, store short source-provided snippets, and attach early reason-coded heuristic scores. It does not store full article bodies.

Commands

From this directory:

python3 -m goodnews init-db
python3 -m goodnews import-sources
python3 -m goodnews poll --limit 3
python3 -m goodnews rescore
python3 -m goodnews check-llm --base-url http://127.0.0.1:1234/v1 --model gpt-oss
python3 -m goodnews classify --limit 10 --base-url http://127.0.0.1:1234/v1 --model gpt-oss
python3 -m goodnews build-brief --date 2026-05-27 --replace
python3 -m goodnews show-brief
python3 -m goodnews list-recent --limit 10
python3 -m goodnews list-recent --accepted-only --limit 10
python3 -m goodnews list-category --topic animals --flavor discovery
python3 -m goodnews list-category --topic environment --flavor solution
python3 -m goodnews source-report
python3 -m goodnews list-runs

The SQLite database lives at:

data/goodnews.sqlite3

Sources live at:

config/sources.toml

Categories

When classified by the local model, each article is tagged with one topic and one flavor, allowing browsable category feeds (e.g. "feel-good animals", "environment solutions") via list-category:

  • Topics: science, environment, health, community, culture, animals
  • Flavors: breakthrough, discovery, solution, feelgood, perspective

The allowed values live in goodnews/taxonomy.py. The accept/reject gate is kept deliberately broad ("not dreary"); ranking and category filters do the curation.

Stored Article Data

For each article, the database stores:

  • source
  • canonical URL
  • title
  • short RSS/Atom description or summary
  • author, if present
  • published timestamp, if present
  • image URL, if present
  • language, if present
  • hashes used for dedupe
  • heuristic scores and reason codes

Web / API

The optional web extra adds a FastAPI service and a small static site that consumes it. The same JSON API backs both the website and any future companion app; its auto-generated OpenAPI docs at /docs are the shared contract.

pip install -e '.[web]'          # or: .venv/bin/pip install -e '.[web]'
python3 -m goodnews serve                  # http://127.0.0.1:8000
python3 -m goodnews serve --host 0.0.0.0   # expose on the network

Endpoints:

  • GET / — the static site (daily five + topic/flavor browsing)
  • GET /healthz — liveness + scored-article count
  • GET /api/categories — the topic/flavor taxonomy
  • GET /api/category-counts — article counts per topic/flavor
  • GET /api/feed?topic=&flavor=&limit=&offset= — ranked, filtered articles
  • GET /api/brief?date=&limit= — a daily brief (latest if no date)
  • GET /api/brief-dates — available brief dates
  • GET /docs — interactive OpenAPI documentation

The ingestion CLI stays pure-stdlib; only the web extra pulls in FastAPI/uvicorn, so the two halves can be deployed and upgraded independently.

Deployment

The database is never baked into the image — the API and the ingestion CLI share one SQLite file via a mounted volume. Run ingestion (poll, classify, build-brief) on a schedule against the same file.

docker build -t goodnews .
docker run -p 8000:8000 -v /srv/goodnews/data:/data goodnews

GOODNEWS_DB controls the database path (defaults to data/goodnews.sqlite3). Put a reverse proxy (Caddy/nginx) in front for TLS once a domain is attached.

Scheduling

A single idempotent command runs the whole pipeline and is safe to invoke as often as you like — it only polls sources that are due (per each source's poll_interval_minutes), only classifies articles the model hasn't seen, and rebuilds the current day's brief:

python3 -m goodnews cycle                 # poll due -> classify new -> rebuild today's brief
python3 -m goodnews cycle --force         # poll every active source regardless of interval
python3 -m goodnews cycle --no-classify   # skip the LLM step (e.g. model box offline)

A systemd timer runs it every 15 minutes. Unit files live in deploy/:

sudo install -d /etc/goodnews
sudo install -m 644 deploy/goodnews.env.example /etc/goodnews/goodnews.env  # then edit
sudo install -m 644 deploy/goodnews.service deploy/goodnews.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now goodnews.timer

systemctl list-timers goodnews.timer          # when it next runs
journalctl -u goodnews.service -f             # watch cycle output

/etc/goodnews/goodnews.env supplies GOODNEWS_LLM_BASE_URL, GOODNEWS_LLM_MODEL, and GOODNEWS_DB to the scheduled run. The timer uses Persistent=true, so a run missed while the machine was off is caught up on the next boot.

Next Steps

  1. Run the poller for a few days and inspect which sources produce useful candidates.
  2. Add source-level quality notes and deactivate noisy feeds.
  3. Replace or supplement heuristic-v0 with a local model classifier.
  4. Add a daily brief builder that selects 5 items using scores and source diversity.
  5. Add a small web/API layer once the ingest data looks trustworthy.

Local Model Configuration

The classify command expects an OpenAI-compatible local chat-completions server.

You can pass settings directly:

python3 -m goodnews classify --base-url http://127.0.0.1:1234/v1 --model gpt-oss --limit 10

Or use environment variables:

export GOODNEWS_LLM_BASE_URL=http://127.0.0.1:1234/v1
export GOODNEWS_LLM_MODEL=gpt-oss
python3 -m goodnews classify --limit 10

classify rewrites the current score/reason row for selected candidates. rescore can restore the fast heuristic scores.

S
Description
No description provided
Readme 49 MiB
Languages
Python 59.1%
Svelte 31.7%
JavaScript 6.9%
HTML 1.8%
Shell 0.2%
Other 0.2%