On This Day: serve sharp images (originalimage, not the 330px thumbnail)

The Wikimedia feed's thumbnail is 330px, which looks blurry when upscaled in our
hero. Use originalimage.source instead; it's reliably sharp. (We can't just request
a bigger thumbnail width: for very large source images Wikimedia only serves
pre-generated bucket sizes and returns 400 for arbitrary widths, e.g. 500px works
but 800px and 1024px fail.)

- onthisday._best_image() prefers originalimage, falls back to the thumbnail.
- scripts/otd_image_upsize_backfill.py re-fetches each stored MM-DD and upgrades
  image_url in place in onthisday_pool and daily_onthisday (ran on host: the pool
  and 6 daily rows are now sharp; today's hero verified returning 200). Only the
  /onthisday hero loads this image (the home card is text-only), so the larger
  file is a one-time load on a single page.
- test_best_image locks the prefer-original/fallback behavior.
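The backfill script is not shown in this view. The following is a minimal sketch of an in-place upgrade like the one described above. It makes several assumptions: sqlite3 as the database, a hypothetical schema in which both tables have `md`, `text` and `image_url` columns and rows are matched by `md` plus event `text`, and an injected `fetch_events` callable standing in for `_fetch_events`. The real `scripts/otd_image_upsize_backfill.py` may match rows and access the database differently.

```python
import sqlite3
from typing import Callable

# Hypothetical: returns normalized events for "MM-DD", each with "text" and "image_url".
FetchEvents = Callable[[str], list[dict]]


def backfill_images(conn: sqlite3.Connection, fetch_events: FetchEvents) -> int:
    """Re-fetch each stored MM-DD and upgrade image_url in place; return rows changed."""
    cache: dict[str, dict[str, str]] = {}  # md -> {event text: fresh image URL}
    changed = 0
    # Table names come from this fixed tuple, so interpolating them is safe.
    for table in ("onthisday_pool", "daily_onthisday"):
        mds = [row[0] for row in conn.execute(f"SELECT DISTINCT md FROM {table}")]
        for md in mds:
            if md not in cache:  # fetch each date once, even if both tables hold it
                cache[md] = {e["text"]: e["image_url"]
                             for e in fetch_events(md) if e.get("image_url")}
            for text, new_url in cache[md].items():
                # Assumed matching rule: same date and same event text.
                # "IS NOT" is SQLite's null-safe inequality, so unchanged rows are skipped.
                cur = conn.execute(
                    f"UPDATE {table} SET image_url = ? "
                    f"WHERE md = ? AND text = ? AND image_url IS NOT ?",
                    (new_url, md, text, new_url),
                )
                changed += cur.rowcount
    conn.commit()
    return changed
```

Rows whose URL already matches are left alone, so rerunning the sketch is a no-op.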

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-27 17:07:37 -04:00
parent e3e6f24753
commit 6c10ad99a9
3 changed files with 75 additions and 1 deletion
+13 -1
@@ -31,6 +31,18 @@ _NEG = (
 )
 
 
+# Wikimedia's feed hands us a 330px `thumbnail`, which upscales (blurry) in our hero. It also
+# gives `originalimage` — a sharp, full-size URL that's always valid. We can't just request a
+# bigger thumbnail width: for very large source images Wikimedia only serves pre-generated
+# bucket sizes and 400s on arbitrary widths (e.g. 500px ok, 800/1024px fail, 1280px ok). So
+# prefer the originalimage (reliably sharp), falling back to the thumbnail.
+def _best_image(page: dict) -> str | None:
+    """The sharpest reliably-served image URL: originalimage, else the 330px thumbnail."""
+    orig = (page.get("originalimage") or {}).get("source")
+    thumb = (page.get("thumbnail") or {}).get("source")
+    return orig or thumb or None
+
+
 def _fetch_events(md: str) -> list[dict]:
     """All events for a MM-DD from Wikimedia, normalized to our candidate shape."""
     mm, dd = md.split("-")
@@ -46,7 +58,7 @@ def _fetch_events(md: str) -> list[dict]:
             "year": e.get("year"),
             "text": text,
             "summary": (page.get("extract") or "").strip() or None,
-            "image_url": ((page.get("thumbnail") or {}).get("source")) or None,
+            "image_url": _best_image(page),
             "page_url": (((page.get("content_urls") or {}).get("desktop") or {}).get("page")) or None,
         })
     return out