Sources hardening (Codex audit): promote-time dedup, postJSON timeout, host-only feed_key

Three follow-ups from Codex's audit of the deep-preview/search/dedup work:
- Promote-time duplicate guard: promote_candidate() now re-checks
  find_existing_feed() and raises DuplicateFeedError → 409, so an
  old/CLI/direct-DB candidate or a race can't bypass the add-time check and
  silently overwrite a live source's settings via upsert. (sources scanned
  first, so a real source collision wins over the candidate matching itself.)
- postJSON/putJSON/delJSON gain opt-in {timeout} (AbortController, default
  none so other calls are unchanged); deep preview uses 120s and surfaces a
  calm "timed out" message instead of pinning the button on "Deep-checking…"
  if the LAN model stalls.
- feed_key() now lowercases the host only, not the whole URL — paths/queries
  can be case-significant; scheme/www/trailing-slash/host-case still collapse.
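
To illustrate the third item, here is a minimal sketch of the host-only key, reconstructed from the diff in this commit. It leaves out the try/except fallback (its return value is not shown in the diff), and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def feed_key(url: str) -> str:
    """Loose comparison key: ignores scheme, leading www., trailing slash, host case."""
    p = urlsplit((url or "").strip())
    host = p.netloc.lower().removeprefix("www.")  # only the host is case-folded
    path = p.path.rstrip("/")                      # path case is preserved
    return host + path + (("?" + p.query) if p.query else "")


# Scheme, www., host case and trailing slash still collapse to one key.
same = feed_key("https://WWW.Example.com/Feed/") == feed_key("http://example.com/Feed")

# Paths that differ only by case now produce different keys.
distinct = feed_key("https://example.com/Feed") != feed_key("https://example.com/feed")

print(same, distinct)
```

Under the previous whole-URL lowercasing, the second pair would have compared equal and been flagged as a duplicate.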

Tests: test_candidate_deep_preview_and_dedup extended — promote succeeds once,
then a re-promote of the same candidate is refused 409. 224 pytest + 11 vitest.
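
The promote-once-then-refuse behaviour can be sketched roughly as below. This is not the project's code: the two-column schema, the exact-URL match inside find_existing_feed(), and the plain INSERT standing in for upsert_sources() are all assumptions made to keep the example self-contained.

```python
import sqlite3


class DuplicateFeedError(Exception):
    def __init__(self, existing: dict):
        self.existing = existing
        super().__init__(f"feed already exists as {existing['kind']} {existing['name']}")


def find_existing_feed(conn, feed_url):
    # Hypothetical lookup: exact URL match, with sources scanned before candidates
    # so a real source collision wins over the candidate matching itself.
    for kind, table in (("source", "sources"), ("candidate", "candidates")):
        row = conn.execute(
            f"SELECT name FROM {table} WHERE feed_url = ?", (feed_url,)
        ).fetchone()
        if row:
            return {"kind": kind, "name": row[0]}
    return None


def promote_candidate(conn, candidate_id):
    cand = conn.execute(
        "SELECT name, feed_url FROM candidates WHERE id = ?", (candidate_id,)
    ).fetchone()
    if cand is None:
        raise ValueError(f"no candidate with id {candidate_id}")
    existing = find_existing_feed(conn, cand[1])
    if existing and existing["kind"] == "source":
        raise DuplicateFeedError(existing)  # the API layer maps this to 409
    # Stand-in for upsert_sources(): a plain insert is enough for the sketch.
    conn.execute("INSERT INTO sources (name, feed_url) VALUES (?, ?)", cand)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (name TEXT, feed_url TEXT)")
conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT, feed_url TEXT)")
conn.execute(
    "INSERT INTO candidates (id, name, feed_url) VALUES (1, 'Example', 'https://example.com/feed')"
)

promote_candidate(conn, 1)  # first promote succeeds
try:
    promote_candidate(conn, 1)  # second promote collides with the live source
    refused = False
except DuplicateFeedError as exc:
    refused = exc.existing["name"] == "Example"

print(refused)
```

The second call finds the source row created by the first, so the guard raises before anything is written and the sources table still holds a single row.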

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-11 21:31:39 -04:00
parent e1ac19351e
commit 3afc1ed37e
5 changed files with 75 additions and 27 deletions
+7
@@ -1207,6 +1207,13 @@ def create_app() -> FastAPI:
trust_score=body.trust_score, pr_risk_score=body.pr_risk_score,
poll_interval_minutes=body.poll_interval_minutes,
)
except sources.DuplicateFeedError as exc:
ex = exc.existing
raise HTTPException(
status_code=409,
detail=f"“{ex['name']}” is already a source ({ex['status']}) — "
"promote skipped so its settings aren't overwritten.",
)
except ValueError:
raise HTTPException(status_code=404, detail="candidate not found")
src = conn.execute(
+22 -4
@@ -70,13 +70,23 @@ def upsert_sources(conn: sqlite3.Connection, source_defs: list[dict]) -> int:
 # --- Duplicate detection (catch the same feed added twice) --------------------
+class DuplicateFeedError(Exception):
+    """Raised when an operation would create a second copy of an existing feed.
+    Carries the existing match so the caller can name it in the response."""
+    def __init__(self, existing: dict):
+        self.existing = existing
+        super().__init__(f"feed already exists as {existing['kind']}{existing['name']}")
 def feed_key(url: str) -> str:
     """A loose comparison key for spotting the same feed added twice despite
-    trivial differences (scheme, www, trailing slash, case). Compare-only — the
-    feed_url is always STORED exactly as entered; this just powers dup warnings."""
+    trivial differences (scheme, www, trailing slash, host case). Compare-only —
+    the feed_url is always STORED exactly as entered; this just powers dup checks.
+    Only the host is lowercased: URL paths/queries can be case-significant."""
     try:
-        p = urlsplit((url or "").strip().lower())
-        host = p.netloc.removeprefix("www.")
+        p = urlsplit((url or "").strip())
+        host = p.netloc.lower().removeprefix("www.")
         path = p.path.rstrip("/")
         return host + path + (("?" + p.query) if p.query else "")
     except Exception:  # noqa: BLE001 — never let a weird URL break add
@@ -170,6 +180,14 @@ def promote_candidate(
if cand is None:
raise ValueError(f"no candidate with id {candidate_id}")
# Re-check duplicates at promote time too — the add-time guard can be bypassed
# by old/CLI/direct-DB candidates or a race, and upsert_sources would silently
# overwrite the existing source's settings. (sources are scanned first, so a
# real source collision wins over this candidate matching itself.)
existing = find_existing_feed(conn, cand["feed_url"])
if existing and existing["kind"] == "source":
raise DuplicateFeedError(existing)
name = cand["name"] or urlsplit(cand["feed_url"]).netloc or cand["feed_url"]
upsert_sources(
conn,