Stage 8b: filter low-support trends post-parse, don't fail the run

Claude reliably generates 35-50 trends, but roughly half of them have only
1-4 supporting videos: a long tail of "patterns I noticed but can't strongly
back". The strict Zod min(5) on supporting_video_ids rejected the ENTIRE
response whenever any such trend appeared, throwing away the 20+ genuinely
strong trends in the same call.

The Dove run on prod just hit this: 49 trends generated, 27 with <5
supporting videos, whole batch lost.

Fix:
- RAW_TRENDS_SCHEMA no longer enforces the supporting-videos minimum at
  parse time. Lenient parse keeps the response intact.
- After parse, filter to ≥MIN_SUPPORTING_VIDEOS_PER_TREND (default 5,
  env-overridable for small corpora).
- Dropped tail is logged to qa/dropped_low_support_trends.json for
  forensics; it shows what Claude noticed but couldn't strongly support.
- Hard fail only if ALL trends drop, with a clear remedy in the message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
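
The parse-then-filter flow described in this message can be sketched in isolation as below. This is a minimal illustration, not the code from the commit: RawTrend, parseFloor, splitBySupport and requireSomeTrends are hypothetical names, the trend shape is reduced to the two fields the filter needs, and falling back to the default on an unparseable env value is an assumption of this sketch rather than something the commit does.

```typescript
// Hypothetical, reduced trend shape: only the fields the filter looks at.
interface RawTrend {
  slug: string;
  supporting_video_ids: string[];
}

// Read the floor from an env-style string. Falling back to the default on a
// missing, non-numeric, or non-positive value is an assumption of this sketch.
function parseFloor(raw: string | undefined, fallback: number = 5): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(parsed) && parsed >= 1 ? parsed : fallback;
}

// Split trends in one pass: those at or above the floor, and the dropped tail.
function splitBySupport<T extends RawTrend>(
  trends: T[],
  minSupporting: number,
): { kept: T[]; dropped: T[] } {
  const kept: T[] = [];
  const dropped: T[] = [];
  for (const trend of trends) {
    (trend.supporting_video_ids.length >= minSupporting ? kept : dropped).push(trend);
  }
  return { kept, dropped };
}

// Hard-fail only when nothing survives the filter.
function requireSomeTrends<T>(kept: T[], minSupporting: number): T[] {
  if (kept.length === 0) {
    throw new Error(`every trend had <${minSupporting} supporting videos`);
  }
  return kept;
}

// Made-up sample data for illustration only.
const sample: RawTrend[] = [
  { slug: 'strong', supporting_video_ids: ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'] },
  { slug: 'borderline', supporting_video_ids: ['v1', 'v2', 'v3', 'v4', 'v5'] },
  { slug: 'weak', supporting_video_ids: ['v7', 'v8'] },
  { slug: 'empty', supporting_video_ids: [] },
];

const floor = parseFloor(undefined); // no override, so the default applies
const split = splitBySupport(sample, floor);
const survivors = requireSomeTrends(split.kept, floor);
console.log(`kept ${survivors.length}, dropped ${split.dropped.length}`);
```

Doing the split in one pass keeps the kept and dropped sets complementary by construction, so the dropped tail written out for forensics is always exactly what the filter removed.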
2026-04-30 11:00:48 -04:00
commit f5802cbbb9
parent b1f18915bb
1 changed file with 24 additions and 2 deletions
--- a/v2/pipeline/stages/stage_8_trends.ts
+++ b/v2/pipeline/stages/stage_8_trends.ts
@@ -6,7 +6,7 @@ import { writeFileSync, readFileSync, readdirSync, existsSync } from 'node:fs';
 import { z } from 'zod';
 import { callClaudeJSON } from '../lib/claude.js';
 import { loadRubric } from '../lib/rubrics.js';
-import { PATHS } from '../lib/paths.js';
+import { PATHS, ensureDir } from '../lib/paths.js';
 import type { AtomicInsight } from './stage_7_atomic_insights.js';
 import type { BriefInput } from '../../server/schemas/brief.js';
 import { ANALYSIS_SCHEMA, type Analysis } from './stage_6_analyse.js';
@@ -22,6 +22,12 @@ type Categories = z.infer<typeof CATEGORIES_SCHEMA>;
 // V3 brief mandates ≥5 supporting videos per trend in production. Override via env
 // for small corpora (smoke tests, brand-new accounts) where 5 isn't always reachable.
 const MIN_SUPPORTING = parseInt(process.env.MIN_SUPPORTING_VIDEOS_PER_TREND ?? '5', 10);
+// Lenient schema: doesn't enforce the supporting-videos minimum here. Claude
+// reliably overshoots the requested count and includes a long tail of small
+// "noticed but unsupported" trends with 1-4 videos. Failing the whole parse on
+// the first short trend was discarding the 20+ genuinely strong trends in the
+// same response. Instead we parse leniently, filter to ≥MIN_SUPPORTING below,
+// and record the dropped tail for forensics.
 const RAW_TRENDS_SCHEMA = z.object({
  trends: z.array(z.object({
    slug: z.string().min(2),
@@ -30,7 +36,7 @@ const RAW_TRENDS_SCHEMA = z.object({
    narrative: z.string().min(20),
    lens_tags: z.array(z.enum(['hooks', 'visual', 'audio', 'sentiment', 'narrative'])).min(1),
    top_atomic_ids: z.array(z.string()).default([]),
-    supporting_video_ids: z.array(z.string()).min(MIN_SUPPORTING),
+    supporting_video_ids: z.array(z.string()),
  })).min(1),
 });

@@ -256,6 +262,22 @@ export async function runStage8Trends(reportId: string, brief: BriefInput): Prom
  // 8b — cluster
  const rawTrends = await clusterTrends(brief, categoryNames, atomicSummary);

+  // Filter out trends below the supporting-videos floor BEFORE relevance scoring.
+  // Claude reliably generates a long tail of small (1-4 video) trends — they're
+  // useful as "things worth watching" but they fail the V3 quality bar of ≥5
+  // supporting videos. Keeping them would also waste a relevance-scoring Claude
+  // call per trend. Logged to dropped_low_support_trends for forensics.
+  const lowSupport = rawTrends.trends.filter((t) => t.supporting_video_ids.length < MIN_SUPPORTING);
+  rawTrends.trends = rawTrends.trends.filter((t) => t.supporting_video_ids.length >= MIN_SUPPORTING);
+  if (lowSupport.length > 0) {
+    console.log(`[stage 8b] dropped ${lowSupport.length} trends with <${MIN_SUPPORTING} supporting videos; ${rawTrends.trends.length} kept for relevance scoring`);
+    ensureDir(PATHS.qaDir(reportId));
+    writeFileSync(`${PATHS.qaDir(reportId)}/dropped_low_support_trends.json`, JSON.stringify(lowSupport, null, 2));
+  }
+  if (rawTrends.trends.length === 0) {
+    throw new Error(`Stage 8b: every trend Claude generated had <${MIN_SUPPORTING} supporting videos. The dataset is producing too many small patterns and no broad ones. Lower MIN_SUPPORTING_VIDEOS_PER_TREND env (e.g. 3) for small corpora, or widen seeds + Force re-run.`);
+  }
+
  // 8b.5 — relevance scoring + filter
  const finalTrends: Trend[] = [];
  let dropped = 0, core = 0, peripheral = 0;