Replaces the gist-based read-time with the SOURCE article's full read time — the
contrast that sells the gist ("calm 1-min version here; ~10 min for the deep dive").
- goodnews/readtime.py: word_count_from_html (strips script/style/nav/header/
footer/form/button/aside furniture before counting) + source_read_minutes
(~225 wpm, 200-word floor, None when extraction looks failed/too thin).
- articles.source_words + read_checked_at columns (count only, never the body;
fits the privacy posture). Idempotent migration.
- enrich.fetch_source_words + enrich_read_times: a bounded, retry-guarded cycle
step (mirrors the image enrichers) that counts words for recent accepted
articles. Only ever writes a real count; never overwrites good with zero. Wired
into the cycle after recent-image enrichment.
- queries: source_words flows through _ARTICLE_COLUMNS; api exposes
source_read_minutes on Article (null when unknown).
- home3: News card shows "Full story · ~N min", hidden entirely when null (no
misleading "1 min").
- Tests: furniture stripping, threshold/rounding, enrich idempotency + no
zero-overwrite, API null handling. 412 backend.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make "no blurry images" sustainable, not a one-off cleanup. RSS feed thumbnails
(~44% were ~90px) were stored at ingest and upscaled to mush, so new articles
would reintroduce them. Now image_url is filled ONLY by the quality-gated
og:image enrichment:
* insert_article no longer stores the feed image (was canonicalize_url(item...)).
* enrich_recent_images(): the cycle fetches a quality og:image for the newest
accepted, imageless articles each run (bounded), keeping Latest photo-rich.
* Brief + on-open enrichment unchanged.
Net: every stored image is a validated, ≥450px og:image; the rest are clean
placeholders.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Some cards showed blurry photos — feed RSS thumbnails (~90×90, e.g. Phys.org's
/tmb/ path) that load fine but upscale to mush in the banner. Add a header-based
dimension parser (PNG/GIF/JPEG/WebP, stdlib only) and fold a minimum-size gate
(450×250) into the image validation, alongside the existing load check. Images
we can't measure (SVG/AVIF) still pass on content-type. A re-prune clears the
small ones already stored so those cards fall back to the clean placeholder.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A stored og:image isn't proof it renders: signed/hotlink-protected URLs (e.g.
the Guardian's i.guim.co.uk) 401 on a direct browser load, so they counted
toward coverage yet always fell back. Now fetch_og_image confirms the image
truly returns 200 + image/* (requested no-referrer, same SSRF-safe redirect
handling) before storing it. Add prune_broken_images() to clear already-stored
URLs that no longer load, so coverage is honest and those cards show the
placeholder cleanly. The browser onerror→placeholder remains the final safety net.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tie image enrichment to attention (per review): when an article earns a summary
(i.e. a reader reached it), best-effort fetch a real og:image if it lacks one —
never blanket-fetch every ingested article. Adds:
* enrich_article_image() — single-article fetch, leaves existing images alone,
retries an imageless article only after 7 days, stamps image_checked_at.
* generate_summary() calls it after caching (wrapped; never breaks summaries).
* enrich_summarized_images() + `goodnews enrich-images` CLI — slow background
backfill of already-summarized, accepted, imageless articles.
* Quality gate: extend the generic-image skip list with data:/tracking-pixel/
spacer markers (on top of the existing logo/placeholder + unbranded-BBC logic).
This is coverage only; display (editorial rhythm, tile treatment) comes next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
BBC's og:image comes from the "branded_news" CDN path with a "BBC NEWS" logo
baked into the picture (shows as "…EWS" once the hero crops it). The identical
photo is served under "cpsprodpb" with no logo, so rewrite branded_news →
cpsprodpb. Best of both: full-resolution hero, no burned-in branding. Re-enriched
recent briefs so live images swap over. 99 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
og:image extraction rejected any URL containing "branded_news" as a generic share
image, but that's BBC's normal CDN path for real article photos. So every BBC hero
fell back to the 240px RSS thumbnail (blurry when shown large). Drop that marker;
keep the genuine placeholder markers (facebook-default, og-default, etc.). Updated
the test to assert BBC branded_news paths pass through. 99 tests pass.
(One-time: cleared image_checked_at on the 57 previously-checked articles and
re-enriched recent briefs so existing thumbnails upgrade to og:images.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause (Codex audit): the client pins the brief by generated_at, but image
enrichment populates image_url AFTER the brief is built without bumping
generated_at — so a verbatim pinned copy stays imageless even once the server
has the image. The reclassify rebuilt the brief and the early pin stuck.
- Frontend: when reusing a pinned brief (same generated_at), refresh server-owned
metadata by article id (esp. image_url) while preserving the user's order and
replacements. Re-saves the merged view so it stays current.
- enrich_brief_images: default limit 5 -> 7 (any brief item can become the hero
via the client fallback or a replace, so cover the whole brief).
- Don't cache image failures forever: retry brief items still missing an image
after a TTL (retry_days=2) instead of stamping them imageless permanently.
Pairs with the hero image fallback (dd0087b). 99 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- og:image enrichment now skips branded/generic share images (BBC 'branded_news'
with its burned-in logo, NPR 'facebook-default', etc.) and keeps the first
real article image — so no competitor logo lands on our hero. Cleared the few
already-stored branded URLs so they re-enrich.
- Hero selection now prefers a gentle + readable story that also HAS a (clean)
image, falling back to gentle-readable, then gentle. The lead is visual when
possible, typographic otherwise — never branded.
(The '7 cards' report was a stale browser cache: the brief stores 7 and the
built JS requests 7; a hard refresh shows all seven.)
Tests: branded/generic image rejection (87 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Hero blur fix: brief enrichment now prefers a page's og:image even when a
feed thumbnail exists (feed thumbs are often tiny; the hero is shown large).
Verified: BBC hero upgrades to the 1024px share image, ScienceDaily to 1920px.
- Today is now 'Highlights from Today' — hero + 6 (brief size 7), which also
makes the secondary grid a balanced 3+3 instead of an orphaned 3+1.
- Replace now excludes every article seen this session (a client-side seen-set),
so it never cycles back to something already shown.
- New session History panel (this tab only, no account): lists everything seen,
including swapped-away stories, so they stay recoverable. Persistent
history/favorites are tabled for sign-in later.
Tests: og:image upgrade of an existing feed image (86 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The grid stays typographic; the hero is the one intentional visual slot. At
brief-build time we fetch a hero-quality image for the daily five that lack one:
- enrich.py reads ONLY a page's <head> og:image/twitter:image and stores just
the URL (never the body).
- SSRF-guarded: http(s) only, 6s timeout, 300KB cap, <=3 manual redirects each
re-validated, and hosts rejected if any resolved address is private, loopback,
link-local, multicast, reserved, or unspecified.
- image_checked_at column caches success AND failure, so an article is never
retried forever.
- Wired into build-brief and cycle (brief items only, only if image missing and
unchecked). Everything else stays metadata-only.
- Verified live: today's five all carry images (feed + enriched).
Tests: og:image parser, head-only scope, IP guard across internal ranges, and
enrich success + failure-caching (85 total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>