f46fee1197
Per the calm north star (images support reading, never become a stimulation layer; metadata-only stays the posture):

- Image-less cards are now designed, not missing: secondary cards are text-first (no empty media band), and an image-less hero becomes a fully typographic lead with a faint topic wordmark behind it (CSS attr(data-topic)). No big empty image space is ever reserved.
- Opportunistic extraction: parse the first <img src> from a feed's content/description HTML when present, canonicalized, without ever fetching the article page. Applies to new ingests (existing rows keep their current image).
- Held by deliberate choice: og:image page enrichment, stock/AI imagery, and any image-coverage requirement for sources.

Tests: feed HTML image extraction (72 total).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
18 lines
889 B
Python
from goodnews.feeds import _img_from_html, parse_feed


def test_img_from_html_finds_first_src():
    assert _img_from_html('<p>hi</p><img src="https://x.com/a.jpg" alt="">') == "https://x.com/a.jpg"
    assert _img_from_html("no images here") is None
    assert _img_from_html(None) is None


RSS = b"""<?xml version="1.0"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel>
<item><title>Story</title><link>https://e.com/1</link>
<content:encoded><![CDATA[<p>lead</p><img src="https://e.com/photo.jpg"/> more]]></content:encoded></item>
<item><title>NoImg</title><link>https://e.com/2</link><description>just text</description></item>
</channel></rss>"""


def test_parse_feed_pulls_image_from_content_html():
    items = parse_feed(RSS)
    assert items[0].image_url == "https://e.com/photo.jpg"
    assert items[1].image_url is None  # opportunistic: none when absent
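The tests above exercise `_img_from_html`, whose implementation is not part of this file. For orientation only, here is a minimal sketch of how such a helper could behave, assuming the standard library's `html.parser` and `urllib.parse`. The names `first_img_src` and `_FirstImgParser` are hypothetical, and the canonicalization rule (absolute http/https only, lowercased host, fragment dropped) is an assumption; the real `goodnews.feeds` code may differ.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


class _FirstImgParser(HTMLParser):
    """Hypothetical helper: remembers the src of the first <img> that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img .../>, because the default
        # handle_startendtag delegates to handle_starttag.
        if tag == "img" and self.src is None:
            src = dict(attrs).get("src")
            if src:
                self.src = src.strip()


def first_img_src(html: Optional[str]) -> Optional[str]:
    """Return a canonicalized URL for the first <img src> in html, or None.

    Works only on the HTML string it is given; it never performs a network
    request, matching the metadata-only posture in the commit message.
    """
    if not html:
        return None
    parser = _FirstImgParser()
    parser.feed(html)
    parser.close()
    if parser.src is None:
        return None
    parts = urlsplit(parser.src)
    # Assumed canonicalization: keep absolute http(s) URLs only,
    # lowercase the host, and drop any fragment.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, parts.query, ""))
```

Using a real HTML parser rather than a regular expression keeps the sketch tolerant of attribute order, quoting style, and self-closing tags, which vary widely across feeds.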