amazon-transcreation/backend/app/pipeline
DJP 2b44c3b4ee Round 2.8: ref loaders read the actual on-disk file shape
The agent reported (for an nl-BE job) that glossary and blacklist were
"not provided" and date/percentage formats were "provided but empty".
The files are on disk with real content — the bug was in the loaders,
which expected shapes that didn't match what's actually shipped:

- load_glossary expected a top-level JSON list, but files use
  {"locale": "...", "entries": [...]}. RefFileLoadError was raised and
  silently caught by load_all_reference_files, so the result became None.
- load_blacklist had the same mismatch, same outcome.
- load_date_pct_formats accepted the dict shape but only knew about
  the "date_formats"/"percentage_formats" keys. The files use
  "entries", so it returned
  {"date_formats": [], "percentage_formats": []}, which is exactly
  what the agent reported.
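
A minimal reproduction of the date/percentage case, as a sketch only. The function body below is a hypothetical reconstruction of the pre-fix behaviour described above, not the actual loader code, and the sample entry is made up for illustration:

```python
def old_load_date_pct_formats(data: dict) -> dict:
    # Hypothetical reconstruction: only the two legacy keys are read,
    # so a file that keeps its content under "entries" looks empty.
    return {
        "date_formats": data.get("date_formats", []),
        "percentage_formats": data.get("percentage_formats", []),
    }


# Shape of the shipped files as described in this commit message;
# the entry content itself is an invented placeholder.
shipped = {"locale": "nl-BE", "entries": [{"example": "placeholder"}]}

result = old_load_date_pct_formats(shipped)
# The file is accepted, but nothing under "entries" reaches the caller.
print(result)
```

This is the "provided but empty" state: the load succeeds, so nothing is raised, and the content is dropped without any signal.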

Fix:
- New _extract_entries() helper that accepts both the wrapper shape
  {entries: [...]} and a bare list. load_glossary / load_blacklist
  both delegate to it.
- load_date_pct_formats now passes entries through alongside the
  legacy date_formats / percentage_formats keys (back-compat).
- load_all_reference_files now logs a warning when a loader raises
  RefFileLoadError instead of silently swallowing it — so any future
  loader/file-shape drift surfaces in the celery logs.

Verified inside the backend container against nl-BE, de-DE, fr-FR:
- 58 / 68 / 64 glossary entries respectively (was 0)
- 14 / 9 / 4 blacklist entries (was 0)
- 10 / 10 / 10 date/pct entries (was empty)
- locale_considerations and tov_global still load correctly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:26:59 -04:00
agents Round 2.7: three broken promises — empty TM, supplementary files, new-TM casing 2026-05-11 10:57:21 -04:00
modules Round 2.8: ref loaders read the actual on-disk file shape 2026-05-18 14:26:59 -04:00
__init__.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
contracts.py Round 2.5 feedback: TM replacements take effect, supplementary files reach LLM, larger briefs fit, free-text channel uploads 2026-05-05 14:28:20 -04:00
orchestrator.py Implement standalone agent feedback: consolidated locale selector, multi-TM selection, single-agent pipeline, and linguistic summary 2026-04-14 12:09:51 -04:00