Problem: Header detection picked data rows (containing Yes/No values and numbers) as headers because they had more filled cells than the actual header row, which had merged cells with gaps. As a result, data values became column labels and deep extraction failed.

Fix:
- Header values must be text-like (not numbers, Yes/No, 0/1, ü, x, or -)
- Only consecutive header rows count: scanning stops at the first data row
- Multi-row headers are combined (row 1 and row 2 both contribute)
- Tested against Wella Job Routes 2: row 2 is correctly identified as the header, with "Buckets | Categories | Top 10 deliverables | Tier A | Tier B | Tier C"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
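
The header-detection rules in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`is_text_like`, `is_header_row`, `detect_header`), the representation of a sheet as a list of rows with `None` for empty or merged-gap cells, and the choice to join multi-row header parts per column with " / " are all hypothetical.

```python
import re
from typing import Any, List, Sequence, Tuple

# Tokens the commit message says must never count as header text
# (compared case-insensitively after stripping whitespace).
NON_HEADER_TOKENS = {"yes", "no", "0", "1", "ü", "x", "-"}

# Matches plain numbers such as "3", "12.5", "1,234" or "40%".
_NUMBER_RE = re.compile(r"^[+-]?\d+(?:[.,]\d+)*%?$")


def _filled(value: Any) -> bool:
    """True if the cell holds something other than None or whitespace."""
    return value is not None and str(value).strip() != ""


def is_text_like(value: Any) -> bool:
    """True if a cell value could plausibly be a header label."""
    if not _filled(value):
        return False
    if isinstance(value, (int, float)):  # also catches bool, a subclass of int
        return False
    text = str(value).strip()
    if text.lower() in NON_HEADER_TOKENS:
        return False
    return not _NUMBER_RE.match(text)


def is_header_row(row: Sequence[Any]) -> bool:
    """A row qualifies only if it has content and every filled cell is text-like.

    Empty cells (for example, gaps left by merged cells) are ignored, so a
    sparse header row is not penalized for having fewer filled cells.
    """
    filled = [cell for cell in row if _filled(cell)]
    return bool(filled) and all(is_text_like(cell) for cell in filled)


def detect_header(rows: Sequence[Sequence[Any]]) -> Tuple[List[str], int]:
    """Return (column labels, index of the first data row)."""
    start = 0
    # Skip leading blank rows before the header block.
    while start < len(rows) and not any(_filled(c) for c in rows[start]):
        start += 1

    # Collect consecutive header rows; stop at the first data row.
    end = start
    while end < len(rows) and is_header_row(rows[end]):
        end += 1

    width = max((len(r) for r in rows), default=0)
    labels: List[str] = []
    for col in range(width):
        # Stack the non-empty parts of every header row in this column.
        parts: List[str] = []
        for row in rows[start:end]:
            cell = row[col] if col < len(row) else None
            if _filled(cell):
                text = str(cell).strip()
                if not parts or parts[-1] != text:
                    parts.append(text)
        labels.append(" / ".join(parts) if parts else f"column_{col + 1}")
    return labels, end


if __name__ == "__main__":
    # Hypothetical sheet: row 1 has merged-cell gaps, and row 3 is a data
    # row with more filled cells than row 1.
    sample = [
        ["Region", None, "Targets", None],
        ["Bucket", "Category", "Tier A", "Tier B"],
        ["Retail", "Yes", 3, "x"],
        ["Salon", "No", 5, "-"],
    ]
    print(detect_header(sample))
    # → (['Region / Bucket', 'Category', 'Targets / Tier A', 'Tier B'], 2)
```

In this sketch a fill count never decides which row is the header: the data row wins on filled cells (4 versus 2) but is rejected because "Yes", 3 and "x" are not text-like, and scanning stops there, so a later text-only row cannot be mistaken for part of the header.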

| Name |
|---|
| `api` |
| `middleware` |
| `models` |
| `schemas` |
| `services` |
| `utils` |
| `__init__.py` |
| `config.py` |
| `database.py` |
| `main.py` |