upbeatBytes/tests/test_feed_images.py
thejayman77 f46fee1197 Typographic-first imagery + opportunistic feed-HTML image extraction
Per the calm north star (images support reading, never become a stimulation
layer; metadata-only stays the posture):
- Image-less cards are now designed, not missing: secondary cards are text-first
  (no empty media band), and an image-less hero becomes a fully typographic lead
  with a faint topic wordmark behind it (CSS attr(data-topic)). No big empty
  image space is ever reserved.
- Opportunistic extraction: parse the first <img src> from a feed's
  content/description HTML when present, canonicalized — never fetching the
  article page. Applies to new ingests (existing rows keep their current image).
- Held by deliberate choice: og:image page enrichment, stock/AI imagery, and any
  image-coverage requirement for sources.
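
A minimal sketch of the opportunistic extraction described above, using only the standard library. The names `first_img_src` and `base_url` are hypothetical, and the canonicalization shown (resolve against a base URL, keep only http/https) is an assumption; the actual `_img_from_html` in `goodnews.feeds` may work differently.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class _FirstImg(HTMLParser):
    """Records the src of the first <img> tag that carries one."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../>, because the default
        # handle_startendtag delegates to handle_starttag.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src") or None


def first_img_src(html: Optional[str], base_url: str = "") -> Optional[str]:
    """Return the first image URL found in feed HTML, or None.

    Only the HTML already present in the feed is parsed; nothing is fetched.
    """
    if not html:
        return None
    parser = _FirstImg()
    parser.feed(html)
    if parser.src is None:
        return None
    # Assumed canonicalization: resolve relative URLs, drop non-web schemes.
    url = urljoin(base_url, parser.src.strip())
    return url if urlsplit(url).scheme in ("http", "https") else None
```

Returning None rather than a placeholder matches the posture above: an item without an image stays image-less and falls through to the typographic card.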

Tests: feed HTML image extraction (72 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 23:59:36 +00:00


from goodnews.feeds import _img_from_html, parse_feed


def test_img_from_html_finds_first_src():
    # Returns the first <img> src; None when there is no image or no input.
    assert _img_from_html('<p>hi</p><img src="https://x.com/a.jpg" alt="">') == "https://x.com/a.jpg"
    assert _img_from_html("no images here") is None
    assert _img_from_html(None) is None


# Two-item feed: the first item embeds an <img> in content:encoded, the second is text only.
RSS = b"""<?xml version="1.0"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel>
<item><title>Story</title><link>https://e.com/1</link>
<content:encoded><![CDATA[<p>lead</p><img src="https://e.com/photo.jpg"/> more]]></content:encoded></item>
<item><title>NoImg</title><link>https://e.com/2</link><description>just text</description></item>
</channel></rss>"""


def test_parse_feed_pulls_image_from_content_html():
    items = parse_feed(RSS)
    assert items[0].image_url == "https://e.com/photo.jpg"
    assert items[1].image_url is None  # opportunistic: none when absent