Commit graph

11 commits

Author SHA1 Message Date
DJP
06bb1b9bfd Fix deep extraction crash: unescaped curly braces in f-string
Root cause: NameError "name 'name' is not defined" on line 300.
The example {name:"KV 360", tier:"Tier B"} inside the f-string prompt
was parsed as a replacement field (expression name, format spec
"KV 360", tier:"Tier B"), not as literal JSON text. Replaced the braces
with parentheses.
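A minimal illustration of the failure and two ways around it, assuming the prompt is built with an f-string. The function names are hypothetical. The commit's fix used parentheses; doubled braces ({{ and }}) are the standard f-string escape for literal braces and would also keep the JSON-style example intact.

```python
def build_prompt_broken(sheet: str) -> str:
    # Inside an f-string, {name:...} is a replacement field: Python evaluates
    # the expression "name" (undefined here) and treats the rest as a format spec.
    return f'Sheet: {sheet}. Example: {name:"KV 360", tier:"Tier B"}'


def build_prompt_parens(sheet: str) -> str:
    # The approach taken in this commit: no braces, so nothing extra is evaluated.
    return f'Sheet: {sheet}. Example: (name:"KV 360", tier:"Tier B")'


def build_prompt_escaped(sheet: str) -> str:
    # Alternative: doubled braces render as literal { and } in an f-string.
    return f'Sheet: {sheet}. Example: {{name:"KV 360", tier:"Tier B"}}'


try:
    build_prompt_broken("Toolbox")
except NameError as exc:
    print(exc)  # name 'name' is not defined

print(build_prompt_escaped("Toolbox"))
# Sheet: Toolbox. Example: {name:"KV 360", tier:"Tier B"}
```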

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 13:23:25 -04:00
DJP
68d342575e Fix tier extraction: separate entry per tier + user context box
Tier fix (reverses the "extract once" rule from 6273179741):
- SEPARATE entry for EACH tier where volume > 0
- "KV 360" Tier A=No/0, Tier B=Yes/1, Tier C=Yes/1 → TWO entries
- Tier field matches column header exactly ("Tier B", "Tier C")
- Tiers with volume=0 or status=No are skipped
- Applied to both normal and deep extraction prompts
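The per-tier rule above can be sketched as a plain-Python post-processing step. This is an illustration only: the commit enforces the rule through the Claude prompt, and the row shape (tier header mapped to a (status, volume) pair) and the name expand_tiers are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: tier column header -> (status, volume), e.g. ("Yes", 1)
TierCells = Dict[str, Tuple[str, int]]


def expand_tiers(name: str, tiers: TierCells) -> List[dict]:
    """Emit one entry per tier whose status is not "No" and whose volume is > 0."""
    entries = []
    for tier_label, (status, volume) in tiers.items():
        if status.strip().lower() == "no" or volume <= 0:
            continue  # skipped tiers produce no entry at all
        # The tier field copies the column header exactly ("Tier B", "Tier C").
        entries.append({"name": name, "tier": tier_label, "volume": volume})
    return entries


rows = expand_tiers(
    "KV 360",
    {"Tier A": ("No", 0), "Tier B": ("Yes", 1), "Tier C": ("Yes", 1)},
)
print([r["tier"] for r in rows])  # ['Tier B', 'Tier C']
```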

User context box (new Step 3 on Upload tab):
- Textarea where users give hints before extraction runs
- Examples: "Focus on Toolbox sheet", "Tier columns are D/F/H"
- Context prepended to Claude prompt in both normal and deep modes
- Passed through upload endpoint → background parse → AI calls
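A minimal sketch of how the hint could be prepended once it reaches the AI call, assuming it arrives as an optional string. The helper name build_extraction_prompt and the "User context:" label are hypothetical, not taken from the codebase.

```python
from typing import Optional


def build_extraction_prompt(base_prompt: str, user_context: Optional[str] = None) -> str:
    """Prepend the user's hint, if any, to the extraction prompt."""
    hint = (user_context or "").strip()
    if not hint:
        return base_prompt  # empty box: prompt is unchanged
    return f"User context: {hint}\n\n{base_prompt}"


# The same helper would serve both normal and deep mode.
print(build_extraction_prompt("Extract all assets.", "Focus on Toolbox sheet"))
```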

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 09:24:07 -04:00
DJP
6273179741 Fix duplicate asset extraction: one entry per unique asset
Problem: Claude extracted the same asset 3 times (once per tier A/B/C),
creating duplicate entries like "Toolbox presentation deck" x3.

Fix: Both normal and deep extraction prompts now say:
- Extract each UNIQUE asset ONCE only
- Do NOT create duplicates for same asset at different tiers
- Use the "tier" field to record the tier label
- Skip assets with volume 0 across all tiers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:15:03 -04:00
DJP
09441a20b8 Fix deep extraction max_tokens: 16k→32k, shorter descriptions
Root cause: stop_reason=max_tokens - Claude ran out of output tokens
before finishing the tool call JSON for 50+ assets.

Fix:
- Bump max_tokens from 16000 to 32000 for both normal and deep extraction
- Tell Claude to keep descriptions SHORT (1 sentence max)
- Reduce input data to 35k chars (from 40k) to leave more room for output
- Better stop_reason logging on normal extraction too
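A hedged sketch of the stop_reason check, kept independent of any SDK: it takes the stop_reason string from a Messages API response plus the parsed asset list. "max_tokens" is the API's stop_reason for output cut off at the limit; the helper name and warning wording are hypothetical.

```python
import logging
from typing import List

log = logging.getLogger("extraction")

MAX_OUTPUT_TOKENS = 32_000  # raised from 16_000 in this commit


def check_extraction(stop_reason: str, assets: List[dict]) -> List[str]:
    """Return (and log) warnings about one extraction response."""
    warnings = []
    if stop_reason == "max_tokens":
        # Output hit the cap, so the tool call JSON is likely incomplete.
        warnings.append(f"stop_reason=max_tokens (max_tokens={MAX_OUTPUT_TOKENS})")
    if not assets:
        warnings.append(f"no assets returned (stop_reason={stop_reason})")
    for w in warnings:
        log.warning(w)
    return warnings
```

Returning the warnings as well as logging them lets the same check drive both the server log and any status message shown to the user.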

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:51:08 -04:00
DJP
617c1e3ca3 Debug deep extraction Pass 2: better logging, truncate analysis, force tool use
- Log structure analysis length and data length before Pass 2
- Log stop_reason from Claude response
- If no assets returned, log the text response for debugging
- Truncate structure analysis to 4k chars if too long (leaves room for data)
- Reduce data to 40k chars (was 45k; combined with the analysis, the input was too large)
- Add instruction: "You MUST call extract_assets with at least one asset"
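The Pass 2 input budget described above could look roughly like this. It is a sketch assuming both inputs are plain strings; the function name, limits-as-constants and prompt layout are illustrative, with only the 4k/40k caps and the MUST instruction taken from this commit.

```python
import logging

log = logging.getLogger("deep_extraction")

ANALYSIS_LIMIT = 4_000  # chars of Pass 1 structure analysis kept
DATA_LIMIT = 40_000     # chars of spreadsheet data kept


def build_pass2_input(analysis: str, data: str) -> str:
    """Combine Pass 1 analysis and sheet data under fixed character caps."""
    # Log the raw sizes before truncation, as the commit does before Pass 2.
    log.info("Pass 2 input: analysis=%d chars, data=%d chars", len(analysis), len(data))
    analysis = analysis[:ANALYSIS_LIMIT]  # slicing is a no-op when already short
    data = data[:DATA_LIMIT]
    return (
        f"Structure analysis:\n{analysis}\n\n"
        f"Spreadsheet data:\n{data}\n\n"
        "You MUST call extract_assets with at least one asset."
    )
```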

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:43:38 -04:00
DJP
a79529470e Fix header detection: distinguish header text from data values
Problem: Header detection picked data rows (with Yes/No/numbers) as headers
because they had more filled cells than the actual header row (which had
merged cells with gaps). Result: data values became column labels and deep
extraction failed.

Fix:
- Header values must be text-like (not numbers, Yes/No, 0/1, ü, x, -)
- Only consecutive header rows count: stop scanning at the first data row
- Multi-row headers combined (row 1 + row 2 both contribute)
- Tested against Wella Job Routes 2: correctly identifies row 2 as header
  with "Buckets | Categories | Top 10 deliverables | Tier A | Tier B | Tier C"
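The three header rules above can be sketched as follows, assuming each sheet row is a list of cell strings. The names is_header_value and detect_header, the 10-row scan limit and the space-joined multi-row labels are assumptions, not the project's code.

```python
import re
from typing import List, Tuple

# Cell values treated as data, not header text (list taken from this commit).
DATA_TOKENS = {"yes", "no", "0", "1", "ü", "x", "-"}
NUMBER_RE = re.compile(r"^-?\d+(?:\.\d+)?$")


def is_header_value(value: str) -> bool:
    v = value.strip()
    return bool(v) and v.lower() not in DATA_TOKENS and not NUMBER_RE.match(v)


def detect_header(rows: List[List[str]], max_scan: int = 10) -> Tuple[List[str], int]:
    """Return (combined column labels, index of the row after the header block)."""
    header_rows: List[List[str]] = []
    data_start = 0
    for i, row in enumerate(rows[:max_scan]):
        cells = [str(c).strip() for c in row]
        filled = [c for c in cells if c]
        if not filled:
            if header_rows:
                break  # a blank row closes the header block
            continue  # skip leading blank rows
        if not all(is_header_value(c) for c in filled):
            break  # first data row: stop, however many cells it fills
        header_rows.append(cells)
        data_start = i + 1
    # Multi-row headers: join each column's non-empty parts top to bottom.
    width = max((len(r) for r in header_rows), default=0)
    labels = [
        " ".join(r[col] for r in header_rows if col < len(r) and r[col])
        for col in range(width)
    ]
    return labels, data_start
```

One limitation of this sketch: a purely textual data row directly under the header would be absorbed into it.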

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:15:00 -04:00
DJP
8a2b45ae31 Deep extraction: live progress between passes + elapsed timer
- Split deep extraction into two separate functions (pass1 + pass2)
  so the background task can update DB between them
- Progress now shows:
  "Pass 1/2: Analyzing structure... (this takes 20-40 seconds)"
  "Pass 1 complete (23s). Pass 2/2: Extracting assets..."
  "Deep extraction complete (52s total). Found 45 assets."
- Live elapsed timer (seconds) shown in the upload spinner
- Timer ticks every second so user knows it's not hung
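Splitting the passes lets the background task write a status between them. A minimal sketch, assuming pass1, pass2 and update_status (standing in for the DB write) are injected callables; all three names are hypothetical, and the clock parameter exists only so the timing can be tested deterministically.

```python
import time
from typing import Callable


def run_deep_extraction(
    pass1: Callable[[], str],
    pass2: Callable[[str], list],
    update_status: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Run both passes, writing a status message before, between and after."""
    start = clock()
    update_status("Pass 1/2: Analyzing structure... (this takes 20-40 seconds)")
    analysis = pass1()
    t1 = clock() - start
    update_status(f"Pass 1 complete ({t1:.0f}s). Pass 2/2: Extracting assets...")
    assets = pass2(analysis)
    total = clock() - start
    update_status(f"Deep extraction complete ({total:.0f}s total). Found {len(assets)} assets.")
    return assets


ticks = iter([0.0, 23.0, 52.0])  # fake clock readings for the example
run_deep_extraction(lambda: "analysis", lambda a: ["asset"] * 45, print, clock=lambda: next(ticks))
# last line printed: Deep extraction complete (52s total). Found 45 assets.
```

The frontend's ticking timer is separate: it only needs the start time, so it keeps moving even while a pass is blocked on Claude.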

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:03:04 -04:00
DJP
714ab98388 Normal + Deep Extraction modes for complex client files
Problem: Complex client Excel files (30+ columns, merged cells, Q&A columns,
tier data) produced zero assets because extraction dumped raw cell values
as pipe-separated text, losing all column context.

Fix:
- Smart Excel extraction: detects header rows, labels each value with its
  column name, skips empty sheets, handles merged cells. Claude now sees
  "Top 10 deliverables: Toolbox presentation deck | Tier A: Yes | 1"
  instead of "Toolbox | Base | Toolbox presentation deck | ü' | Yes | 1"

- Two extraction modes on Upload tab:
  - Normal: fast single-pass extraction (~$0.05)
  - Deep Extraction: two-pass AI analysis (~$0.15-0.30)
    Pass 1: Claude analyzes the spreadsheet structure
    Pass 2: Claude extracts assets using the structural understanding

- Upload endpoint accepts ?mode=normal|deep query parameter
- Background parse shows "Deep extraction: analyzing structure (Pass 1 of 2)"
- Tested against both Wella files: header-aware extraction produces
  clearly labelled output
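The header-aware output format shown above can be reproduced with a small formatter. A sketch assuming column labels have already been detected; label_row is a hypothetical name, and unlabelled columns (such as gaps left by merged header cells) are emitted as bare values.

```python
from typing import List


def label_row(headers: List[str], row: List[str]) -> str:
    """Render one data row as "Header: value" pairs joined by " | "."""
    parts = []
    for col, raw in enumerate(row):
        value = str(raw).strip()
        if not value:
            continue  # drop empty cells instead of emitting "Header: "
        header = headers[col].strip() if col < len(headers) else ""
        # Columns with no header label stay bare.
        parts.append(f"{header}: {value}" if header else value)
    return " | ".join(parts)


print(label_row(["Top 10 deliverables", "Tier A", ""], ["Toolbox presentation deck", "Yes", "1"]))
# Top 10 deliverables: Toolbox presentation deck | Tier A: Yes | 1
```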

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 09:46:04 -04:00
DJP
3cb1973f57 Fix tier matching: use client tier to pick correct complexity variant
- Doc parser now extracts tier labels (Tier A, A, Gold, etc.) per asset
- Matching uses tier to find the correct GMAL complexity variant:
  - Claude matches to the GMAL family (asset type)
  - Post-match lookup: (asset_name + target_complexity_level) finds exact variant
  - e.g. "Banner - Tier A" with A=Complex → finds Complex variant by asset_name query
- Tier hint passed to Claude prompt for better matching
- No blind expansion: only the tier-appropriate GMAL variant is matched
- Expand to Tiers button still available for when the client doesn't specify tiers
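The post-match lookup can be sketched as an in-memory search over GMAL variants. This assumes Claude has already returned the family name and that a per-client mapping from tier label to complexity level exists. GmalVariant, pick_variant and that mapping are hypothetical; the real lookup queries the database by asset_name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GmalVariant:
    asset_name: str        # GMAL family, e.g. "Banner"
    complexity_level: str  # e.g. "Simple", "Medium", "Complex"


def pick_variant(
    catalog: List[GmalVariant],
    matched_asset_name: str,
    client_tier: Optional[str],
    tier_to_complexity: Dict[str, str],
) -> Optional[GmalVariant]:
    """Find the variant matching (asset_name, target complexity level)."""
    target = tier_to_complexity.get(client_tier or "")
    if target is None:
        return None  # no usable tier: leave it to "Expand to Tiers"
    for variant in catalog:
        if variant.asset_name == matched_asset_name and variant.complexity_level == target:
            return variant
    return None


catalog = [GmalVariant("Banner", lvl) for lvl in ("Simple", "Medium", "Complex")]
print(pick_variant(catalog, "Banner", "Tier A", {"Tier A": "Complex", "A": "Complex"}))
```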

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:17:20 -04:00
DJP
26d3435be0 Improve matching, upload UX, collapse fix, full catalog approach
- Upload now shows live stage progress (uploading -> extracting -> AI parsing -> done)
- Fix match group collapse: proper React state instead of DOM manipulation
- Replace pre-filter with full GMAL catalog sent to Claude (~3k tokens, <$0.01)
  - FTS and keyword matching missed too many semantic matches
  - Claude now sees all 243 assets and uses semantic understanding
- Improved system prompt with terminology bridges for better scoring
- Per-project AI cost tracking persisted to DB
- Parallel matching with cancel support
- Auto-select matches >= 80%, YOLO button for the rest
- Debug panel for AI call inspection
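Parallel matching with cancel support and the 80% auto-select can be sketched with asyncio. This assumes match_one is an async callable returning a dict with a 0-100 "confidence" field; the names, the concurrency of 4 and that field are assumptions. In this sketch, cancelling skips calls that have not started yet; in-flight calls finish normally.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

AUTO_SELECT_THRESHOLD = 80  # percent, per this commit


async def match_all(
    items: List[str],
    match_one: Callable[[str], Awaitable[dict]],
    cancel: asyncio.Event,
    concurrency: int = 4,
) -> List[Optional[dict]]:
    """Match items in parallel, at most `concurrency` calls at a time."""
    sem = asyncio.Semaphore(concurrency)

    async def worker(item: str) -> Optional[dict]:
        async with sem:
            if cancel.is_set():
                return None  # cancelled before this call started
            return await match_one(item)

    return await asyncio.gather(*(worker(i) for i in items))


def auto_select(matches: List[Optional[dict]]) -> List[dict]:
    """Keep matches whose confidence meets the threshold."""
    return [m for m in matches if m and m["confidence"] >= AUTO_SELECT_THRESHOLD]
```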

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 19:22:08 -04:00
DJP
e18976fdb2 Initial commit - GMAL Scope Builder
Dockerized web app (FastAPI + React + PostgreSQL) for scoping client ratecards
against the GMAL master asset database. Features:
- GMAL data ingestion from Excel (390 assets, 120 roles, 5 model types)
- AI-powered document parsing and asset extraction (Claude Opus 4.6)
- AI matching engine with parallel batching, confidence scoring, caveats
- Ratecard builder with hours x volume calculation
- Excel and PDF export
- GMAL browser and inline editor
- AI cost tracking per project (persisted to DB)
- Debug panel for AI call inspection
- Dark theme UI with gold (#FFC407) accent
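The ratecard's hours x volume calculation reduces to a per-line product summed across lines. A minimal sketch; RatecardLine, its fields and the example figures are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatecardLine:
    asset: str
    hours_per_unit: float  # from the matched GMAL variant
    volume: int            # client-requested quantity

    @property
    def total_hours(self) -> float:
        return self.hours_per_unit * self.volume


def ratecard_total(lines: List[RatecardLine]) -> float:
    return sum(line.total_hours for line in lines)


lines = [RatecardLine("Banner (Complex)", 6.5, 4), RatecardLine("KV 360", 12.0, 2)]
print(ratecard_total(lines))  # 50.0
```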

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 17:35:14 -04:00