Sources hardening (Codex audit): promote-time dedup, postJSON timeout, host-only feed_key
Three follow-ups from Codex's audit of the deep-preview/search/dedup work:
- Promote-time duplicate guard: promote_candidate() now re-checks
find_existing_feed() and raises DuplicateFeedError → 409, so an
old/CLI/direct-DB candidate or a race can't bypass the add-time check and
silently overwrite a live source's settings via upsert. (sources scanned
first, so a real source collision wins over the candidate matching itself.)
- postJSON/putJSON/delJSON gain opt-in {timeout} (AbortController, default
none so other calls are unchanged); deep preview uses 120s and surfaces a
calm "timed out" message instead of pinning the button on "Deep-checking…"
if the LAN model stalls.
- feed_key() now lowercases the host only, not the whole URL — paths/queries
can be case-significant; scheme/www/trailing-slash/host-case still collapse.
Tests: test_candidate_deep_preview_and_dedup extended: promote succeeds once,
then a re-promote of the same candidate is refused with 409. 224 pytest + 11 vitest.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -1207,6 +1207,13 @@ def create_app() -> FastAPI:
                 trust_score=body.trust_score, pr_risk_score=body.pr_risk_score,
                 poll_interval_minutes=body.poll_interval_minutes,
             )
+        except sources.DuplicateFeedError as exc:
+            ex = exc.existing
+            raise HTTPException(
+                status_code=409,
+                detail=f"“{ex['name']}” is already a source ({ex['status']}) — "
+                "promote skipped so its settings aren't overwritten.",
+            )
         except ValueError:
             raise HTTPException(status_code=404, detail="candidate not found")
         src = conn.execute(
@@ -70,13 +70,23 @@ def upsert_sources(conn: sqlite3.Connection, source_defs: list[dict]) -> int:
 # --- Duplicate detection (catch the same feed added twice) --------------------
 
 
+class DuplicateFeedError(Exception):
+    """Raised when an operation would create a second copy of an existing feed.
+    Carries the existing match so the caller can name it in the response."""
+
+    def __init__(self, existing: dict):
+        self.existing = existing
+        super().__init__(f"feed already exists as {existing['kind']} “{existing['name']}”")
+
+
 def feed_key(url: str) -> str:
     """A loose comparison key for spotting the same feed added twice despite
-    trivial differences (scheme, www, trailing slash, case). Compare-only — the
-    feed_url is always STORED exactly as entered; this just powers dup warnings."""
+    trivial differences (scheme, www, trailing slash, host case). Compare-only —
+    the feed_url is always STORED exactly as entered; this just powers dup checks.
+    Only the host is lowercased: URL paths/queries can be case-significant."""
     try:
-        p = urlsplit((url or "").strip().lower())
-        host = p.netloc.removeprefix("www.")
+        p = urlsplit((url or "").strip())
+        host = p.netloc.lower().removeprefix("www.")
         path = p.path.rstrip("/")
         return host + path + (("?" + p.query) if p.query else "")
     except Exception:  # noqa: BLE001 — never let a weird URL break add
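To illustrate what the feed_key() change means in practice, here is a minimal standalone sketch that copies the old and new key logic from the hunk above and compares them on a few made-up URLs. The name feed_key_old and the example URLs are assumptions for illustration only. The try/except wrapper is left out because its fallback body is not shown in the diff. str.removeprefix needs Python 3.9 or later.

```python
from urllib.parse import urlsplit


def feed_key(url: str) -> str:
    """New behaviour: only the host is lowercased (try/except omitted here)."""
    p = urlsplit((url or "").strip())
    host = p.netloc.lower().removeprefix("www.")
    path = p.path.rstrip("/")
    return host + path + (("?" + p.query) if p.query else "")


def feed_key_old(url: str) -> str:
    """Old behaviour (hypothetical name): the whole URL is lowercased first."""
    p = urlsplit((url or "").strip().lower())
    host = p.netloc.removeprefix("www.")
    path = p.path.rstrip("/")
    return host + path + (("?" + p.query) if p.query else "")


# Made-up URLs, used only to exercise the two functions.
a = "https://WWW.Example.com/Feeds/News.xml/"
b = "http://example.com/Feeds/News.xml"
c = "http://example.com/feeds/news.xml"

# Scheme, www, trailing slash and host case still collapse to one key.
same_feed = feed_key(a) == feed_key(b)
# Path case now keeps two URLs apart ...
distinct_now = feed_key(b) != feed_key(c)
# ... whereas the old key treated them as the same feed.
merged_before = feed_key_old(b) == feed_key_old(c)
```

With the old key, b and c would have been flagged as duplicates even if the server treats their paths as different resources. With the new key, only a and b match.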
@@ -170,6 +180,14 @@ def promote_candidate(
     if cand is None:
         raise ValueError(f"no candidate with id {candidate_id}")
 
+    # Re-check duplicates at promote time too — the add-time guard can be bypassed
+    # by old/CLI/direct-DB candidates or a race, and upsert_sources would silently
+    # overwrite the existing source's settings. (sources are scanned first, so a
+    # real source collision wins over this candidate matching itself.)
+    existing = find_existing_feed(conn, cand["feed_url"])
+    if existing and existing["kind"] == "source":
+        raise DuplicateFeedError(existing)
+
     name = cand["name"] or urlsplit(cand["feed_url"]).netloc or cand["feed_url"]
     upsert_sources(
         conn,
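The promote-time guard can be exercised end to end with a small sketch like the one below. It is not the project's code. The two-table schema, the body of find_existing_feed(), and the plain INSERT standing in for upsert_sources() are all assumptions, since the diff does not show them. Only the guard itself (re-check, then raise when the match is a source) follows the hunk above.

```python
import sqlite3
from urllib.parse import urlsplit


class DuplicateFeedError(Exception):
    def __init__(self, existing: dict):
        self.existing = existing
        super().__init__(f"feed already exists as {existing['kind']} {existing['name']!r}")


def feed_key(url: str) -> str:
    p = urlsplit((url or "").strip())
    host = p.netloc.lower().removeprefix("www.")
    return host + p.path.rstrip("/") + (("?" + p.query) if p.query else "")


def find_existing_feed(conn, feed_url):
    """Hypothetical stand-in: scan sources first, then candidates."""
    key = feed_key(feed_url)
    for kind, table in (("source", "sources"), ("candidate", "candidates")):
        rows = conn.execute(f"SELECT name, feed_url FROM {table}").fetchall()
        for name, url in rows:
            if feed_key(url) == key:
                return {"kind": kind, "name": name}
    return None


def promote_candidate(conn, candidate_id):
    cand = conn.execute(
        "SELECT name, feed_url FROM candidates WHERE id = ?", (candidate_id,)
    ).fetchone()
    if cand is None:
        raise ValueError(f"no candidate with id {candidate_id}")
    name, feed_url = cand
    existing = find_existing_feed(conn, feed_url)
    # A "candidate" match is the row being promoted, so only a source blocks.
    if existing and existing["kind"] == "source":
        raise DuplicateFeedError(existing)
    # Assumed stand-in for upsert_sources(): a plain insert.
    conn.execute("INSERT INTO sources (name, feed_url) VALUES (?, ?)", (name, feed_url))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (name TEXT, feed_url TEXT)")
conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT, feed_url TEXT)")
conn.execute("INSERT INTO candidates VALUES (1, 'Example', 'https://www.example.com/feed/')")

promote_candidate(conn, 1)  # first promote: no source yet, so it goes through
try:
    promote_candidate(conn, 1)  # second promote: the source now exists
    refused = False
except DuplicateFeedError as exc:
    refused = exc.existing["kind"] == "source"

source_count = conn.execute("SELECT COUNT(*) FROM sources").fetchone()[0]
```

This mirrors the scenario the extended test describes: the first promote succeeds, the repeat is refused, and the existing source row is left alone. Scanning sources before candidates matters because the candidate always matches its own feed_url, so a candidate-first scan would hide a real source collision.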