Sources: LLM deep-preview, source search, duplicate-add guard

Three admin Sources upgrades:
- Deep preview: a per-candidate "🔬 Deep preview" button runs the REAL
  classifier (the same model that judges live articles) on an 8-item sample,
  instead of the fast keyword heuristic used by the add and Re-preview paths.
  Previews now carry a `classified` flag, surfaced as a "model-checked" vs
  "quick estimate" badge, so it is clear whether the acceptance % came from
  the model or the heuristic. conn is released during the ~30-60 s model
  pass; postJSON has no client-side timeout.
- Search: a free-text box over the sources table (matching name, category,
  feed URL, or homepage), combined with the existing status filter, with a
  live match count and an empty state. Answering "is this already added?"
  now takes a glance.
- Duplicate-add guard: sources.find_existing_feed() + feed_key() normalize
  scheme, www, trailing slash, and case, so re-adding a feed that is already a
  live source or a queued candidate is refused with a 409 naming where it
  lives. (The DB already enforced exact-URL uniqueness; this catches the
  near-miss variants and the overwrite-on-promote footgun.)
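
The Deep preview bullet's point about releasing conn during the slow model pass can be sketched as below. This is a minimal, hypothetical sketch assuming a DB-API/SQLite connection factory and a `candidates` table with `feed_url` and `preview` columns; `deep_preview`, `preview_badge`, the schema, and the parameter names are illustrative, not the project's code. Only the 8-item sample, the `classified` flag, and the two badge labels come from this commit.

```python
import json
import sqlite3
from typing import Callable

DEEP_SAMPLE = 8  # items the deep path classifies (the value the test asserts)


def deep_preview(
    connect: Callable[[], sqlite3.Connection],
    candidate_id: int,
    preview_feed: Callable[..., dict],
    make_client: Callable[[], object],
) -> dict:
    """Hypothetical sketch: hold no DB connection during the slow model pass."""
    # 1. Short read: fetch the candidate's feed URL, then release the connection.
    conn = connect()
    try:
        row = conn.execute(
            "SELECT feed_url FROM candidates WHERE id = ?", (candidate_id,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise KeyError(candidate_id)

    # 2. Slow part (~30-60 s): run the real classifier with no connection open.
    preview = preview_feed(row[0], sample=DEEP_SAMPLE, client=make_client())

    # 3. Short write: reopen just long enough to store the result.
    conn = connect()
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE candidates SET preview = ? WHERE id = ?",
                (json.dumps(preview), candidate_id),
            )
    finally:
        conn.close()
    return preview


def preview_badge(preview: dict) -> str:
    """Map the `classified` flag to the badge text shown in the admin UI."""
    return "model-checked" if preview.get("classified") else "quick estimate"
```

Splitting the work into a short read, an unlocked slow call, and a short write keeps other requests from waiting on a connection for the whole model pass.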
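
The Search bullet's filtering can be sketched as follows, in Python for consistency even though the real filter presumably lives in the admin frontend (the vitest side). `SourceRow`, `filter_sources`, and the status values "active"/"paused" are hypothetical; only the four searched fields and the combination with a status filter come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Columns the search box matches against (from the commit message).
SEARCH_FIELDS = ("name", "category", "feed_url", "homepage")


@dataclass
class SourceRow:
    """Hypothetical row shape for the admin sources table."""
    name: str
    category: str
    feed_url: str
    homepage: str
    status: str  # illustrative values such as "active" / "paused"


def filter_sources(
    rows: Iterable[SourceRow], query: str, status: Optional[str] = None
) -> list[SourceRow]:
    """Apply the status filter, then a case-insensitive substring search."""
    needle = query.strip().lower()
    matches = []
    for row in rows:
        # status=None means "all statuses".
        if status is not None and row.status != status:
            continue
        # An empty query keeps every row that passed the status filter.
        if needle and not any(needle in getattr(row, f).lower() for f in SEARCH_FIELDS):
            continue
        matches.append(row)
    return matches


rows = [
    SourceRow("Good Feed", "tech", "https://news.test/feed", "https://news.test", "active"),
    SourceRow("Other", "sports", "https://other.test/rss", "https://other.test", "paused"),
]
hits = filter_sources(rows, "NEWS.test")
print(f"{len(hits)} match(es)")  # prints "1 match(es)"; 0 would show the empty state
```

The live match count is just the length of the filtered list, which is why folding search into the existing status filter costs nothing extra.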
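
One plausible shape for the Duplicate-add guard's normalization is sketched below. `feed_key` here is a guess at the described behavior (ignore scheme, strip `www.` and a trailing slash, fold case), and `find_duplicate` is a hypothetical stand-in for sources.find_existing_feed(), whose real signature isn't shown. This sketch lowercases only the host, because URL paths can be case-sensitive; the real feed_key() may fold the whole URL. It also assumes absolute URLs with a scheme.

```python
from typing import Iterable, Optional, Tuple, Union
from urllib.parse import urlsplit


def feed_key(url: str) -> str:
    """Guessed normalization: ignore scheme, 'www.', a trailing '/', and host case."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit already lowercases .hostname
    if host.startswith("www."):
        host = host[len("www."):]
    port = f":{parts.port}" if parts.port else ""
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{port}{path}{query}"


def find_duplicate(
    url: str, sources: Iterable[dict], candidates: Iterable[dict]
) -> Optional[Tuple[str, Union[str, int]]]:
    """Hypothetical stand-in for sources.find_existing_feed(): report where a feed lives."""
    key = feed_key(url)
    # Live sources first, then queued candidates.
    for src in sources:
        if feed_key(src["feed_url"]) == key:
            return ("source", src["name"])
    for cand in candidates:
        if feed_key(cand["feed_url"]) == key:
            return ("candidate", cand["id"])
    return None
```

Each variant from the test (`http://www.news.test/feed/`, `https://NEWS.test/feed`) maps to `news.test/feed`, so the endpoint can refuse it with a 409 that names the matching source or candidate.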

Tests: test_candidate_deep_preview_and_dedup (the deep flag wires in the model
and uses the smaller sample; exact, scheme, www, trailing-slash, and case
variants all get a 409). 224 pytest + 11 vitest tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-11 21:19:15 -04:00
parent ba1a29d12a
commit e1ac19351e
4 changed files with 126 additions and 11 deletions
+25
@@ -141,6 +141,31 @@ def test_candidate_suggest_promote_paused(tmp_path, monkeypatch):
    assert any(s["name"] == "Good Feed" for s in tc.get("/api/admin/stats").json()["sources"])


def test_candidate_deep_preview_and_dedup(tmp_path, monkeypatch):
    app, api = _make(tmp_path, monkeypatch, admin_email="boss@x.com")
    def fake_preview(url, **k):
        # Echo back whether the LLM client was wired in + the sample size used.
        return {"url": url, "sampled": k.get("sample"), "accepted": 4,
                "classified": k.get("client") is not None}
    monkeypatch.setattr(api.feeds, "preview_feed", fake_preview)
    # Deep preview builds a model client; stub it so we never touch the real LAN model.
    monkeypatch.setattr(api, "LocalModelClient", type("C", (), {"from_env": staticmethod(lambda: object())}))
    tc = _signin(app, api, "boss@x.com")
    cand = tc.post("/api/admin/candidates", json={"feed_url": "https://news.test/feed"}).json()
    assert cand["preview"]["classified"] is False  # add uses the fast heuristic
    # Deep preview runs the real classifier on the smaller sample.
    deep = tc.post(f"/api/admin/candidates/{cand['id']}/preview?deep=true").json()
    assert deep["preview"]["classified"] is True and deep["preview"]["sampled"] == 8
    # Dedup: exact + trivial variants (scheme / www / trailing slash / case) are refused.
    assert tc.post("/api/admin/candidates", json={"feed_url": "https://news.test/feed"}).status_code == 409
    assert tc.post("/api/admin/candidates", json={"feed_url": "http://www.news.test/feed/"}).status_code == 409
    # Once promoted to a live source, re-adding is still refused.
    tc.post(f"/api/admin/candidates/{cand['id']}/promote", json={})
    assert tc.post("/api/admin/candidates", json={"feed_url": "https://NEWS.test/feed"}).status_code == 409


def test_candidate_reject_and_gating(tmp_path, monkeypatch):
    app, api = _make(tmp_path, monkeypatch, admin_email="boss@x.com")
    monkeypatch.setattr(api.feeds, "preview_feed", lambda url, **k: {"url": url, "sampled": 1, "accepted": 0})