Durability pass: tests, clearer diversity/classify behavior, Calm Filters foundation

- Add pytest suite (34 tests) covering scoring thresholds, dedup clustering +
  representative selection + time window, brief source/category diversity,
  avoid-term phrase matching, and text canonicalization/truncation.
- Rewrite _select_diverse with an explicit, tested contract (best-first, one
  per source, backfill, then inject a second category by evicting the
  lowest-ranked pick).
- classify_articles now returns a ClassifyReport (attempted/succeeded/skipped)
  so silent model failures are visible in both the cycle and classify output.
- Fix clean_text truncation to stay within max_len (ellipsis no longer
  overshoots).
- New filters.py: canonical FilterPrefs shape (include/mute topics+flavors,
  avoid_terms, pauses) and pure word/phrase-boundary matching engine seeding
  Calm Filters. Not yet wired into the API.
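
The _select_diverse contract described above can be sketched roughly as
follows. This is an illustrative reading of the four steps, not the code in
this commit: the Article fields, the function name select_diverse_sketch, and
the rule "inject only when every pick shares one category" are assumptions
made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    # Hypothetical shape; the real article model may differ.
    title: str
    source: str
    category: str
    score: float


def select_diverse_sketch(articles, limit):
    """Pick up to `limit` articles following the four-step contract."""
    ranked = sorted(articles, key=lambda a: a.score, reverse=True)
    picks = []
    seen_sources = set()

    # Steps 1 and 2: walk best-first, taking at most one article per source.
    for article in ranked:
        if len(picks) == limit:
            break
        if article.source not in seen_sources:
            picks.append(article)
            seen_sources.add(article.source)

    # Step 3: backfill remaining slots best-first, ignoring the source rule.
    for article in ranked:
        if len(picks) == limit:
            break
        if article not in picks:
            picks.append(article)

    # Step 4 (assumed trigger): if every pick shares one category, swap the
    # lowest-ranked pick for the best article from any other category.
    if len(picks) > 1 and len({a.category for a in picks}) == 1:
        only_category = picks[0].category
        other = next(
            (a for a in ranked
             if a not in picks and a.category != only_category),
            None,
        )
        if other is not None:
            picks.sort(key=lambda a: a.score, reverse=True)
            picks[-1] = other

    return picks
```

In this sketch the eviction happens only after backfill, so a second category
never costs more than the single weakest pick.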

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jay
2026-05-30 19:07:31 +00:00
parent 470e9ecbf8
commit 9cdcda5e02
12 changed files with 479 additions and 18 deletions
@@ -16,6 +16,9 @@ web = [
 "fastapi>=0.110",
 "uvicorn[standard]>=0.29",
 ]
+test = [
+"pytest>=8",
+]

 [project.scripts]
 goodnews = "goodnews.cli:main"