Commit graph

3 commits

Author SHA1 Message Date
nickviljoen
c51e0729ce fix(excel-processor): wrap extraction in try/except to honour 'never raises'
Code review found that _extract_workbook_text was unwrapped — a
corrupt/locked .xlsx or InvalidFileException would leak out of
process_excel_file despite the docstring promising 'Never raises'.
Wrap the extraction call too; on extraction failure, write a
degraded summary explaining the failure and return cleanly.

Verified by passing a non-existent file: the function returns a
degraded summary instead of raising FileNotFoundError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:55:54 +02:00
nickviljoen
abd36a9abe fix(excel-processor): use literal trademark glyphs in summary prompt
Spec requires "™, ®, ©" in the Approved Brand and Product Names
section instructions; first pass wrote "TM, R, C" out of unfounded
caution about encoding. Python 3 source handles UTF-8 fine and
pdf_processor.py uses smart punctuation throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:52:38 +02:00
nickviljoen
ed46504ac6 feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging
Mirrors pdf_processor.py — public process_excel_file() reads any HP
Source Messaging Excel, extracts cells via openpyxl (skipping empty
rows, capped at 50K chars), and summarises into structured Markdown
via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md.

On Gemini failure the processor writes a degraded summary containing
the raw extraction so the reference asset stays usable. Test fixtures
(real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored.
2026-05-17 20:49:50 +02:00