Sources hardening (Codex audit): promote-time dedup, postJSON timeout, host-only feed_key

Three follow-ups from Codex's audit of the deep-preview/search/dedup work:
- Promote-time duplicate guard: promote_candidate() now re-checks
  find_existing_feed() and raises DuplicateFeedError → 409, so an
  old/CLI/direct-DB candidate or a race can't bypass the add-time check and
  silently overwrite a live source's settings via upsert. (sources scanned
  first, so a real source collision wins over the candidate matching itself.)
- postJSON/putJSON/delJSON gain opt-in {timeout} (AbortController, default
  none so other calls are unchanged); deep preview uses 120s and surfaces a
  calm "timed out" message instead of pinning the button on "Deep-checking…"
  if the LAN model stalls.
- feed_key() now lowercases the host only, not the whole URL — paths/queries
  can be case-significant; scheme/www/trailing-slash/host-case still collapse.
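
The host-only normalization in the last bullet can be illustrated with a minimal sketch. It is not the project's actual feed_key(); the name feed_key_sketch, the use of urllib.parse, and the exact key format are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def feed_key_sketch(url: str) -> str:
    """Hypothetical dedup key: collapse scheme, www, trailing slash, and host case.

    The path and query are kept verbatim because they can be case-significant.
    """
    parts = urlsplit(url.strip())
    # urlsplit().hostname is already lowercased; .lower() makes the intent explicit.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Drop the scheme entirely and strip only trailing slashes from the path.
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key
```

Under these assumptions, https://news.test/feed, http://www.news.test/feed/ and https://NEWS.test/feed share one key, while https://news.test/Feed gets a different one.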

Tests: test_candidate_deep_preview_and_dedup extended — promote succeeds once,
then a re-promote of the same candidate is refused 409. 224 pytest + 11 vitest.
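
The promote-once-then-refused behavior the test exercises can be sketched with in-memory lists. This is an illustration only: the names DuplicateFeedError, find_existing_feed and promote_candidate are taken from the message above, but their signatures, the dict-based storage, and the exclude_candidate_id parameter are assumptions, and the mapping of the error to HTTP 409 is left to the caller.

```python
class DuplicateFeedError(Exception):
    """Raised when a feed key collides with an existing source or candidate."""


def find_existing_feed(key, sources, candidates, exclude_candidate_id=None):
    # Sources are scanned first, so a live-source collision is reported
    # ahead of any candidate match.
    for src in sources:
        if src["key"] == key:
            return ("source", src["id"])
    for cand in candidates:
        # Hypothetical parameter: skip the candidate being promoted so it
        # does not count as a duplicate of itself.
        if cand["key"] == key and cand["id"] != exclude_candidate_id:
            return ("candidate", cand["id"])
    return None


def promote_candidate(cand_id, sources, candidates):
    cand = next(c for c in candidates if c["id"] == cand_id)
    # Re-check at promote time instead of trusting the add-time check.
    hit = find_existing_feed(cand["key"], sources, candidates,
                             exclude_candidate_id=cand_id)
    if hit is not None:
        raise DuplicateFeedError(f"{cand['key']} already exists as {hit[0]} {hit[1]}")
    source = {"id": len(sources) + 1, "key": cand["key"]}
    sources.append(source)
    return source
```

In this sketch the first promote_candidate() call appends a source and a second call for the same candidate raises DuplicateFeedError, which an HTTP layer could translate to 409.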

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-11 21:31:39 -04:00
parent e1ac19351e
commit 3afc1ed37e
5 changed files with 75 additions and 27 deletions
+6 -3
@@ -158,12 +158,15 @@ def test_candidate_deep_preview_and_dedup(tmp_path, monkeypatch):
     # Deep preview runs the real classifier on the smaller sample.
     deep = tc.post(f"/api/admin/candidates/{cand['id']}/preview?deep=true").json()
     assert deep["preview"]["classified"] is True and deep["preview"]["sampled"] == 8
-    # Dedup: exact + trivial variants (scheme / www / trailing slash / case) are refused.
+    # Dedup at ADD: exact + trivial variants (scheme / www / trailing slash / host case).
     assert tc.post("/api/admin/candidates", json={"feed_url": "https://news.test/feed"}).status_code == 409
     assert tc.post("/api/admin/candidates", json={"feed_url": "http://www.news.test/feed/"}).status_code == 409
-    # Once promoted to a live source, re-adding is still refused.
-    tc.post(f"/api/admin/candidates/{cand['id']}/promote", json={})
+    # Promote succeeds the first time and creates the live source.
+    assert tc.post(f"/api/admin/candidates/{cand['id']}/promote", json={}).status_code == 200
     assert tc.post("/api/admin/candidates", json={"feed_url": "https://NEWS.test/feed"}).status_code == 409
+    # Dedup at PROMOTE too: a stale/duplicate candidate (here, re-promoting the
+    # same one) can't bypass add and overwrite the live source's settings.
+    assert tc.post(f"/api/admin/candidates/{cand['id']}/promote", json={}).status_code == 409


 def test_candidate_reject_and_gating(tmp_path, monkeypatch):