gmal-scope-builder

Author	SHA1	Message	Date
DJP	06bb1b9bfd	Fix deep extraction crash: unescaped curly braces in f-string Root cause: "name 'name' is not defined" error on line 300 The f-string example {name:"KV 360", tier:"Tier B"} was parsed as a replacement field that evaluates the expression name, not as literal JSON text. Changed to parentheses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 13:23:25 -04:00
DJP	68d342575e	Fix tier extraction: separate entry per tier + user context box Tier fix (reverses previous "extract once" mistake): - SEPARATE entry for EACH tier where volume > 0 - "KV 360" Tier A=No/0, Tier B=Yes/1, Tier C=Yes/1 → TWO entries - Tier field matches column header exactly ("Tier B", "Tier C") - Tiers with volume=0 or status=No are skipped - Applied to both normal and deep extraction prompts User context box (new Step 3 on Upload tab): - Textarea where users give hints before extraction runs - Examples: "Focus on Toolbox sheet", "Tier columns are D/F/H" - Context prepended to Claude prompt in both normal and deep modes - Passed through upload endpoint → background parse → AI calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 09:24:07 -04:00
DJP	6273179741	Fix duplicate asset extraction: one entry per unique asset Problem: Claude extracted the same asset 3 times (once per tier A/B/C), creating duplicate entries like "Toolbox presentation deck" x3. Fix: Both normal and deep extraction prompts now say: - Extract each UNIQUE asset ONCE only - Do NOT create duplicates for same asset at different tiers - Use the "tier" field to record the tier label - Skip assets with volume 0 across all tiers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 17:15:03 -04:00
DJP	09441a20b8	Fix deep extraction max_tokens: 16k→32k, shorter descriptions Root cause: stop_reason=max_tokens - Claude ran out of output tokens before finishing the tool call JSON for 50+ assets. Fix: - Bump max_tokens from 16000 to 32000 for both normal and deep extraction - Tell Claude to keep descriptions SHORT (1 sentence max) - Reduce input data to 35k chars (from 40k) to leave more room for output - Better stop_reason logging on normal extraction too Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:51:08 -04:00
DJP	617c1e3ca3	Debug deep extraction Pass 2: better logging, truncate analysis, force tool use - Log structure analysis length and data length before Pass 2 - Log stop_reason from Claude response - If no assets returned, log the text response for debugging - Truncate structure analysis to 4k chars if too long (leaves room for data) - Reduce data to 40k chars (was 45k, combined with analysis was too large) - Add instruction: "You MUST call extract_assets with at least one asset" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:43:38 -04:00
DJP	a79529470e	Fix header detection: distinguish header text from data values Problem: Header detection picked data rows (with Yes/No/numbers) as headers because they had more filled cells than the actual header row (which had merged cells with gaps). Result: data values became column labels, deep extraction failed. Fix: - Header values must be text-like (not numbers, Yes/No, 0/1, ü, x, -) - Only consecutive header rows count - stop scanning at first data row - Multi-row headers combined (row 1 + row 2 both contribute) - Tested against Wella Job Routes 2: correctly identifies row 2 as header with "Buckets | Categories | Top 10 deliverables | Tier A | Tier B | Tier C" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:15:00 -04:00
DJP	8a2b45ae31	Deep extraction: live progress between passes + elapsed timer - Split deep extraction into two separate functions (pass1 + pass2) so the background task can update DB between them - Progress now shows: "Pass 1/2: Analyzing structure... (this takes 20-40 seconds)" "Pass 1 complete (23s). Pass 2/2: Extracting assets..." "Deep extraction complete (52s total). Found 45 assets." - Live elapsed timer (seconds) shown in the upload spinner - Timer ticks every second so user knows it's not hung Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 10:03:04 -04:00
DJP	714ab98388	Normal + Deep Extraction modes for complex client files Problem: Complex client Excel files (30+ columns, merged cells, Q&A columns, tier data) produced zero assets because the extraction was a dumb pipe dump that lost all column context. Fix: - Smart Excel extraction: detects header rows, labels each value with its column name, skips empty sheets, handles merged cells. Claude now sees "Top 10 deliverables: Toolbox presentation deck | Tier A: Yes | 1" instead of "Toolbox | Base | Toolbox presentation deck | ü' | Yes | 1" - Two extraction modes on Upload tab: - Normal: fast single-pass extraction (~$0.05) - Deep Extraction: two-pass AI analysis (~$0.15-0.30) Pass 1: Claude analyzes the spreadsheet structure Pass 2: Claude extracts assets using the structural understanding - Upload endpoint accepts ?mode=normal|deep query parameter - Background parse shows "Deep extraction: analyzing structure (Pass 1 of 2)" - Tested against both Wella files - header-aware extraction produces clear labelled output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 09:46:04 -04:00
DJP	3cb1973f57	Fix tier matching: use client tier to pick correct complexity variant - Doc parser now extracts tier labels (Tier A, A, Gold, etc.) per asset - Matching uses tier to find the correct GMAL complexity variant: - Claude matches to the GMAL family (asset type) - Post-match lookup: (asset_name + target_complexity_level) finds exact variant - e.g. "Banner - Tier A" with A=Complex → finds Complex variant by asset_name query - Tier hint passed to Claude prompt for better matching - No blind expansion - only the tier-appropriate GMAL is matched - Expand to Tiers button still available for when client doesn't specify tiers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 15:17:20 -04:00
DJP	26d3435be0	Improve matching, upload UX, collapse fix, full catalog approach - Upload now shows live stage progress (uploading -> extracting -> AI parsing -> done) - Fix match group collapse: proper React state instead of DOM manipulation - Replace pre-filter with full GMAL catalog sent to Claude (~3k tokens, <$0.01) - FTS and keyword matching missed too many semantic matches - Claude now sees all 243 assets and uses semantic understanding - Improved system prompt with terminology bridges for better scoring - Per-project AI cost tracking persisted to DB - Parallel matching with cancel support - Auto-select matches >= 80%, YOLO button for rest - Debug panel for AI call inspection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 19:22:08 -04:00
DJP	e18976fdb2	Initial commit - GMAL Scope Builder Dockerized web app (FastAPI + React + PostgreSQL) for scoping client ratecards against the GMAL master asset database. Features: - GMAL data ingestion from Excel (390 assets, 120 roles, 5 model types) - AI-powered document parsing and asset extraction (Claude Opus 4.6) - AI matching engine with parallel batching, confidence scoring, caveats - Ratecard builder with hours x volume calculation - Excel and PDF export - GMAL browser and inline editor - AI cost tracking per project (persisted to DB) - Debug panel for AI call inspection - Dark theme UI with gold (#FFC407) accent Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 17:35:14 -04:00
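
The header-detection rules listed in a79529470e (text-like values only, consecutive header rows, stop at the first data row, multi-row headers combined) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the `DATA_MARKERS` set, the sample rows, and the assumption that a row counts as a header only when every filled cell is text-like are all hypothetical.

```python
from typing import List, Optional, Sequence

# Values the commit message lists as data rather than header text.
# Numbers (including 0/1) are caught separately by the float() check below.
DATA_MARKERS = {"yes", "no", "ü", "x", "-"}


def is_text_like(value: Optional[object]) -> bool:
    """Return True if a cell looks like a header label, not a data value."""
    if value is None:
        return False
    text = str(value).strip()
    if not text or text.lower() in DATA_MARKERS:
        return False
    try:
        float(text)  # parses as a number, so treat it as data
        return False
    except ValueError:
        return True


def filled_cells(row: Sequence[Optional[object]]) -> List[object]:
    """Return the non-empty cells of a row."""
    return [c for c in row if c is not None and str(c).strip()]


def is_header_row(row: Sequence[Optional[object]]) -> bool:
    """Assumption: a header row has at least one cell and only text-like cells."""
    cells = filled_cells(row)
    return bool(cells) and all(is_text_like(c) for c in cells)


def detect_headers(rows: Sequence[Sequence[Optional[object]]]) -> List[str]:
    """Combine the first consecutive run of header rows into column labels."""
    header_rows = []
    for row in rows:
        if is_header_row(row):
            header_rows.append(row)
        elif header_rows or filled_cells(row):
            break  # first data row (or a gap after the headers) ends the scan
    if not header_rows:
        return []
    width = max(len(r) for r in header_rows)
    labels = []
    for col in range(width):
        parts = [
            str(r[col]).strip()
            for r in header_rows
            if col < len(r) and is_text_like(r[col])
        ]
        labels.append(" ".join(parts))
    return labels


# Made-up sample: two header rows (with merged-cell gaps) above one data row.
sample_rows = [
    ["Buckets", "Categories", "Top 10 deliverables", "Tier A", None, "Tier B", None],
    [None, None, None, "Status", "Volume", "Status", "Volume"],
    ["Toolbox", "Base", "Toolbox presentation deck", "Yes", "1", "No", "0"],
]
labels = detect_headers(sample_rows)
```

The sketch does not forward-fill merged cells, so a column sitting under a gap in the first header row keeps only its second-row label. A real implementation would likely need to handle that case.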

11 commits
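
The crash described in 06bb1b9bfd can be reproduced in isolation. The sketch below assumes the prompt was built with a triple-quoted f-string. The function names and prompt wording are hypothetical stand-ins, not the project's code. It shows the failing pattern, the parentheses workaround the commit describes, and the alternative of doubling the braces.

```python
def build_prompt_broken(sheet_name: str) -> str:
    # Single braces open a replacement field, so Python evaluates `name`
    # as an expression and raises NameError when no such variable exists.
    return f"""Sheet: {sheet_name}
Example entry: {name:"KV 360", tier:"Tier B"}"""


def build_prompt_parens(sheet_name: str) -> str:
    # Workaround described in the commit: parentheses are plain text
    # inside an f-string, so nothing is evaluated.
    return f"""Sheet: {sheet_name}
Example entry: (name:"KV 360", tier:"Tier B")"""


def build_prompt_escaped(sheet_name: str) -> str:
    # Alternative: doubled braces are emitted as literal single braces.
    return f"""Sheet: {sheet_name}
Example entry: {{name:"KV 360", tier:"Tier B"}}"""


try:
    build_prompt_broken("Toolbox")
    broken_raised = False
except NameError:
    broken_raised = True

parens_prompt = build_prompt_parens("Toolbox")
escaped_prompt = build_prompt_escaped("Toolbox")
```

Doubling the braces keeps the example looking like JSON in the prompt, while the parentheses version avoids any escaping at the cost of a less literal example.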