Add systemd timer deployment for scheduled ingestion cycle

- deploy/goodnews.service: oneshot unit running 'goodnews cycle' with a
  generous TimeoutStartSec (20 min) that leaves room for long classify runs.
- deploy/goodnews.timer: first run 2 min after boot, then every 15 min.
  Persistent=true is set, though systemd honors it only for OnCalendar= timers.
- deploy/goodnews.env.example: LLM endpoint + DB path for the scheduled run.
- README: scheduling/install docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jay
2026-05-30 14:28:30 +00:00
parent 2414fd3ccb
commit cef272a8fc
4 changed files with 72 additions and 0 deletions
README (+30)
@@ -104,6 +104,36 @@ docker run -p 8000:8000 -v /srv/goodnews/data:/data goodnews
`GOODNEWS_DB` controls the database path (defaults to `data/goodnews.sqlite3`).
Put a reverse proxy (Caddy/nginx) in front for TLS once a domain is attached.
## Scheduling
A single idempotent command runs the whole pipeline and is safe to invoke as
often as you like — it only polls sources that are *due* (per each source's
`poll_interval_minutes`), only classifies articles the model hasn't seen, and
rebuilds the current day's brief:
```bash
python3 -m goodnews cycle # poll due -> classify new -> rebuild today's brief
python3 -m goodnews cycle --force # poll every active source regardless of interval
python3 -m goodnews cycle --no-classify # skip the LLM step (e.g. model box offline)
```
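As a minimal sketch of the "due" check described above: the function name `is_due` and the `last_polled_at` argument are hypothetical and not taken from the goodnews code; only `poll_interval_minutes` appears in the source. It assumes a source that has never been polled is always due.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_polled_at: Optional[datetime],
           poll_interval_minutes: int,
           now: datetime) -> bool:
    """Return True when a source should be polled on this cycle.

    Hypothetical helper: the real goodnews implementation may differ.
    """
    if last_polled_at is None:
        # Assumption: a source that was never polled is always due.
        return True
    return now - last_polled_at >= timedelta(minutes=poll_interval_minutes)


now = datetime(2026, 5, 30, 14, 30, tzinfo=timezone.utc)
print(is_due(None, 60, now))                          # never polled
print(is_due(now - timedelta(minutes=20), 60, now))   # polled 20 min ago
print(is_due(now - timedelta(minutes=75), 60, now))   # polled 75 min ago
```

Because the check depends only on stored timestamps, running the cycle more often than a source's interval is harmless: the extra runs simply skip that source.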
A systemd timer runs it every 15 minutes. Unit files live in `deploy/`:
```bash
sudo install -d /etc/goodnews
sudo install -m 644 deploy/goodnews.env.example /etc/goodnews/goodnews.env # then edit
sudo install -m 644 deploy/goodnews.service deploy/goodnews.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now goodnews.timer
systemctl list-timers goodnews.timer # when it next runs
journalctl -u goodnews.service -f # watch cycle output
```
`/etc/goodnews/goodnews.env` supplies `GOODNEWS_LLM_BASE_URL`, `GOODNEWS_LLM_MODEL`,
and `GOODNEWS_DB` to the scheduled run. The timer sets `OnBootSec=2min`, so a run
missed while the machine was off is made up shortly after the next boot.
`Persistent=true` is also set, but systemd applies it only to `OnCalendar=` timers.
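To illustrate how a process started by the service might pick these variables up, here is a hedged sketch. The function `load_settings` and the dictionary keys are hypothetical; the only default known from this README is the one for `GOODNEWS_DB`, so the two LLM settings are left as `None` when unset rather than given invented defaults.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect goodnews settings from environment variables.

    Hypothetical helper for illustration; not the project's actual loader.
    """
    return {
        # Default documented in the README.
        "db_path": env.get("GOODNEWS_DB", "data/goodnews.sqlite3"),
        # No defaults are documented for these, so None means "not configured".
        "llm_base_url": env.get("GOODNEWS_LLM_BASE_URL"),
        "llm_model": env.get("GOODNEWS_LLM_MODEL"),
    }


# Simulate the environment systemd would build from the env file.
example_env = {
    "GOODNEWS_LLM_MODEL": "qwen/qwen3-14b",
    "GOODNEWS_DB": "/srv/goodnews/data/goodnews.sqlite3",
}
print(load_settings(example_env))
print(load_settings({}))
```

Since the unit lists the file as `EnvironmentFile=-/etc/goodnews/goodnews.env`, a missing file just means the process sees none of these variables and falls back to whatever defaults the code provides.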
## Next Steps
1. Run the poller for a few days and inspect which sources produce useful candidates.
deploy/goodnews.env.example (+6)
@@ -0,0 +1,6 @@
# Copy to /etc/goodnews/goodnews.env and adjust. Read by goodnews.service.
# These let the scheduled cycle reach the local model and the shared database.
GOODNEWS_LLM_BASE_URL=http://192.168.50.100:1234/v1
GOODNEWS_LLM_MODEL=qwen/qwen3-14b
GOODNEWS_DB=/home/jay/goodNews/data/goodnews.sqlite3
deploy/goodnews.service (+21)
@@ -0,0 +1,21 @@
[Unit]
Description=goodNews ingestion cycle (poll due sources, classify, rebuild brief)
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
User=jay
Group=jay
WorkingDirectory=/home/jay/goodNews
# Optional config (LLM endpoint, DB path). The leading '-' makes it non-fatal
# if the file is absent.
EnvironmentFile=-/etc/goodnews/goodnews.env
ExecStart=/home/jay/goodNews/.venv/bin/python -m goodnews cycle
# A cycle may classify dozens of articles through a local model; allow up to
# 20 minutes. Type=oneshot has no start timeout by default, so this value acts
# as an upper bound that stops a hung run rather than as an extension.
TimeoutStartSec=1200
Nice=10
[Install]
WantedBy=multi-user.target
deploy/goodnews.timer (+15)
@@ -0,0 +1,15 @@
[Unit]
Description=Run the goodNews ingestion cycle periodically
[Timer]
# First run shortly after boot, then every 15 minutes. Per-source
# poll_interval_minutes still governs which feeds are actually hit each tick.
OnBootSec=2min
OnUnitActiveSec=15min
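# Intended to catch up a missed run after downtime. Note that systemd honors
# Persistent= only for OnCalendar= timers; with the monotonic triggers above,
# OnBootSec= is what provides the first run after a reboot.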
Persistent=true
AccuracySec=1min
Unit=goodnews.service
[Install]
WantedBy=timers.target
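The interaction between the 15-minute tick and each source's `poll_interval_minutes` can be sketched as a small simulation. This is an illustration under simplifying assumptions: it ignores `AccuracySec` jitter and the duration of each run, and the function names `tick_minutes` and `polled_ticks` are hypothetical.

```python
from typing import List, Optional


def tick_minutes(boot_delay: int = 2, period: int = 15, count: int = 9) -> List[int]:
    """Minutes after boot at which the timer fires (OnBootSec, then OnUnitActiveSec)."""
    return [boot_delay + i * period for i in range(count)]


def polled_ticks(ticks: List[int], poll_interval_minutes: int) -> List[int]:
    """Ticks on which a source with the given interval is actually polled."""
    polled: List[int] = []
    last: Optional[int] = None
    for t in ticks:
        # A source is due if it was never polled or its interval has elapsed.
        if last is None or t - last >= poll_interval_minutes:
            polled.append(t)
            last = t
    return polled


ticks = tick_minutes()
print(ticks)                    # [2, 17, 32, 47, 62, 77, 92, 107, 122]
print(polled_ticks(ticks, 60))  # [2, 62, 122]
print(polled_ticks(ticks, 20))  # [2, 32, 62, 92, 122]
```

The last line shows one consequence of this design: an interval that is not a multiple of the tick period is effectively rounded up to the next tick, so a 20-minute source is polled every 30 minutes.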