Geo Stage 1-2: subject-geography model + classifier + pipeline wiring

"Closer to Home" foundation (audit greenlit by Codex). Durable geography, kept
decoupled from volatile scoring.

- Schema: article_geo (breadth/confidence/rationale/geo_version) + article_places
  (0..N ISO-coded places), separate from article_scores so re-runs/audits never
  disturb scoring or acceptance. "local" is never stored — it's relative to the
  reader; the UI computes "Near you" later.
- geo.py: LLM proposes place NAMES, code disposes to ISO codes (country alpha-2,
  US state 2-letter); region words like "Europe" can never become a country.
  'global'/placeless is first-class, not failure. Confidence calibrated so 'high'
  needs an explicit location. Geo is its OWN LLM pass, not merged into the scoring
  prompt (durable metadata, re-runnable, keeps the sensitive prompt untouched).
- store_geo replaces places (geo is re-derivable, unlike scores). tag_articles is
  idempotent by geo_version and only touches accepted, non-duplicate articles.
- CLI `geo` command (cycle-locked, --limit/--reclassify) for backfill, plus a
  bounded geo step in the cycle (--geo-limit 60, --no-geo). scripts/geo_audit.py
  is the prototype audit tool.
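
A minimal sketch of the storage and idempotency behaviour described above, not
the actual geo.py code. The table names, the four article_geo fields, and the
function names store_geo and tag_articles come from this message; the
`articles` table, its `accepted`/`duplicate_of` columns, the article_places
column layout, the `GEO_VERSION` constant, and the `classify` callback are
hypothetical stand-ins chosen for illustration.

```python
import sqlite3

GEO_VERSION = 1  # hypothetical constant; bumping it would make articles eligible again

# Assumed layout: only the table names and article_geo fields come from the commit message.
SCHEMA = """
CREATE TABLE articles (id INTEGER PRIMARY KEY, accepted INTEGER, duplicate_of INTEGER);
CREATE TABLE article_geo (
    article_id INTEGER PRIMARY KEY,
    breadth TEXT, confidence TEXT, rationale TEXT, geo_version INTEGER
);
CREATE TABLE article_places (article_id INTEGER, code TEXT);
"""


def store_geo(db, article_id, geo, places, version=GEO_VERSION):
    """Replace (not append to) any earlier geo rows for this article."""
    with db:  # one transaction: old places out, new rows in
        db.execute("DELETE FROM article_places WHERE article_id = ?", (article_id,))
        db.execute(
            "INSERT OR REPLACE INTO article_geo VALUES (?, ?, ?, ?, ?)",
            (article_id, geo["breadth"], geo["confidence"], geo["rationale"], version),
        )
        db.executemany(
            "INSERT INTO article_places VALUES (?, ?)",
            [(article_id, code) for code in places],
        )


def pending_articles(db, version=GEO_VERSION, limit=60, reclassify=False):
    """Accepted, non-duplicate articles not yet tagged at this geo_version."""
    rows = db.execute(
        """
        SELECT a.id FROM articles a
        LEFT JOIN article_geo g ON g.article_id = a.id
        WHERE a.accepted = 1 AND a.duplicate_of IS NULL
          AND (? = 1 OR g.geo_version IS NULL OR g.geo_version <> ?)
        ORDER BY a.id LIMIT ?
        """,
        (int(reclassify), version, limit),
    ).fetchall()
    return [row[0] for row in rows]


def tag_articles(db, classify, version=GEO_VERSION, limit=60, reclassify=False):
    """Tag pending articles; `classify` stands in for the separate geo LLM pass."""
    ids = pending_articles(db, version, limit, reclassify)
    for article_id in ids:
        geo, places = classify(article_id)
        store_geo(db, article_id, geo, places, version)
    return len(ids)
```

Under these assumptions a second run at the same version finds nothing to do,
while a reclassify run or a version bump re-tags and overwrites the earlier
places. Nothing here reads or writes article_scores.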

360 tests green; live smoke tagged real articles correctly (Gaza->PS, London->GB,
placeless science->global). No UI / SEO pages yet — ranking/personalization only.
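
The smoke-test mappings follow from the "LLM proposes place NAMES, code
disposes to ISO codes" split. A minimal sketch of that idea, assuming a small
hand-written lookup table; `KNOWN_PLACES`, `REGION_WORDS`, `Place`,
`resolve_place`, `resolve_places`, and `describe` are all hypothetical names,
and the real geo.py may resolve names quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical region words that must never resolve to a country.
REGION_WORDS = {"europe", "asia", "africa", "middle east", "latin america"}

# Tiny illustrative lookup: normalized name -> (kind, code).
# Countries use ISO 3166-1 alpha-2; US states use their 2-letter code.
KNOWN_PLACES = {
    "gaza": ("country", "PS"),
    "london": ("country", "GB"),
    "united kingdom": ("country", "GB"),
    "ohio": ("us_state", "OH"),
}


@dataclass(frozen=True)
class Place:
    kind: str  # "country" or "us_state"
    code: str


def resolve_place(name: str) -> Optional[Place]:
    """Map one LLM-proposed name to a code, or None if code cannot vouch for it."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    if key in REGION_WORDS:
        return None  # "Europe" is a region word, never a country
    hit = KNOWN_PLACES.get(key)
    return Place(*hit) if hit else None


def resolve_places(names: List[str]) -> List[Place]:
    """Resolve all proposed names, dropping unknowns and duplicates, keeping order."""
    out: List[Place] = []
    for name in names:
        place = resolve_place(name)
        if place is not None and place not in out:
            out.append(place)
    return out


def describe(names: List[str]) -> str:
    """Smoke-test style summary: resolved codes, or 'global' when placeless."""
    places = resolve_places(names)
    return ",".join(p.code for p in places) or "global"


print(describe(["Gaza"]))    # PS
print(describe(["London"]))  # GB
print(describe([]))          # global
```

The point of the split is that the model's output is treated as a suggestion
only: a name reaches storage only if the lookup recognises it, and an empty
result is a valid outcome rather than an error.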

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jay
2026-06-19 16:56:49 -04:00
parent 59ff48ae90
commit 1c05554a28
7 changed files with 613 additions and 1 deletions
+2 -1
@@ -16,7 +16,8 @@ $ = informational
- Date showed 6/2/2026 while it was still 6/1/2026 at 10:32pm
- For account-based usage, we should have a thumbs up button that shows up to track the articles the user likes the most. We can then curate a special feed of articles that match the categories the user likes the most. Not social-based, just for seeing news that means the most to you.
- Feasibility of allowing users to add their own custom feeds for news sources
- Joke corner: a curated, clean, non-offensive daily/rotating joke spot. On-brand "escape the grind" — light, professional-but-fun. Curation bar same as the rest of UB (nothing mean or edgy).
- Joke corner: a curated, clean, non-offensive daily/rotating joke spot. On-brand "escape the grind" — light, professional-but-fun. Curation bar same as the rest of UB (nothing mean or edgy). PARTICIPATION LOOP: let people SUBMIT jokes → AI pre-screen (clean/non-insulting/actually-funny, conservative gate) → human batch-approval queue (user is fine doing batches to drive engagement) → approved ones go live. Same "LLM proposes, code disposes" + admin-approval-queue pattern already used for Bloom words, Daily Word pool, and source candidates — known architecture, not net-new. Drivers: submission gives a reason to RETURN ("did mine get approved?"), attribution ("submitted by …") deepens ownership, approved jokes are shareable. Guardrails: jokes are an offense minefield (punching-down/stereotypes) so AI gate stays conservative + human is final say; reuse feedback-form anti-abuse (honeypot + rate-limit) on the submit endpoint.
- Bubble shooter / "bubble blaster" game for /play (casual, calm-satisfying arcade — different fun than the word/brain games). Strategic point: own the destination + widen the funnel, NOT literally steal a clone's community. Make it feed the share loop: DAILY SEEDED board + shareable SCORE ("I scored 14,200 🫧") deep-linked like the other games. Scope flag: bigger than the turn-based grid games — it's a real-time CANVAS game (aim, projectile physics, collision, color-cluster pop, cascade/drop, animation loop). Post-launch build, our own art/calm aesthetic (no cloned name/assets).
- Text adventure that SAVES YOUR SPOT in time (resume where you left off — a reason to come back). Start single-player/choose-your-path; dream stretch goal = broaden to co-op/multiplayer where people work through it together. Theme TBD. Fits "UB isn't just news — it's somewhere between professional and fun, a place to escape." (Would live under /play.)