amazon-transcreation/backend/app/pipeline/modules
DJP 2b44c3b4ee Round 2.8: ref loaders read the actual on-disk file shape
The agent reported (for an nl-BE job) that glossary and blacklist were
"not provided" and date/percentage formats were "provided but empty".
The files are on disk with real content — the bug was in the loaders,
which expected shapes that didn't match what's actually shipped:

- load_glossary expected a top-level JSON list, but the files use
  {"locale": "...", "entries": [...]}. It raised RefFileLoadError, which
  load_all_reference_files caught silently, so the result became None.
- load_blacklist had the same mismatch and the same outcome.
- load_date_pct_formats accepted the dict shape but only knew the
  "date_formats"/"percentage_formats" keys. The files use "entries", so
  it returned {"date_formats": [], "percentage_formats": []}, which is
  exactly what the agent reported.
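
For illustration only, the two failure modes above can be reproduced with a
minimal sketch. This is a reconstruction under assumptions, not the pre-fix
source: the old_* function names, the string-based signatures, and the
"term" field in the sample entry are all hypothetical; only RefFileLoadError,
the wrapper shape, and the observed results come from the description above.

```python
import json


class RefFileLoadError(Exception):
    """Stand-in for the project's exception of the same name."""


def old_load_glossary(raw: str) -> list:
    # Assumed pre-fix behaviour: only a top-level JSON list is accepted.
    data = json.loads(raw)
    if not isinstance(data, list):
        raise RefFileLoadError("expected a top-level JSON list")
    return data


def old_load_date_pct_formats(raw: str) -> dict:
    # Assumed pre-fix behaviour: only the two legacy keys are read,
    # so an "entries" key is ignored and both lists come back empty.
    data = json.loads(raw)
    return {
        "date_formats": data.get("date_formats", []),
        "percentage_formats": data.get("percentage_formats", []),
    }


def old_load_all(raw_glossary: str):
    # Assumed pre-fix behaviour: the error is swallowed without a log line.
    try:
        return old_load_glossary(raw_glossary)
    except RefFileLoadError:
        return None


# Wrapper shape as shipped; the entry content is made up for the example.
shipped = json.dumps({"locale": "nl-BE", "entries": [{"term": "example"}]})

glossary_result = old_load_all(shipped)
formats_result = old_load_date_pct_formats(shipped)
```

Under these assumptions, glossary_result is None ("not provided") and
formats_result holds two empty lists ("provided but empty"), matching the
agent's report.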

Fix:
- New _extract_entries() helper that accepts both the wrapper shape
  {entries: [...]} and a bare list. load_glossary / load_blacklist
  both delegate to it.
- load_date_pct_formats now passes entries through alongside the
  legacy date_formats / percentage_formats keys (back-compat).
- load_all_reference_files now logs a warning when a loader raises
  RefFileLoadError instead of silently swallowing it — so any future
  loader/file-shape drift surfaces in the celery logs.

Verified inside the backend container against nl-BE, de-DE, fr-FR:
- 58 / 68 / 64 glossary entries respectively (was 0)
- 14 / 9 / 4 blacklist entries (was 0)
- 10 / 10 / 10 date/pct entries (was empty)
- locale_considerations and tov_global still load correctly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:26:59 -04:00
__init__.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
blacklist_scanner.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
character_counter.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
date_format_validator.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
domain_substitutor.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
excel_writer.py Round 2 feedback: parser fix, dynamic max_tokens, polling, TM auto-discovery, reviewer comments in export 2026-05-04 16:12:47 -04:00
line_break_normaliser.py feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
ref_file_loader.py Round 2.8: ref loaders read the actual on-disk file shape 2026-05-18 14:26:59 -04:00
source_file_parser.py feat: wire job wizard and dashboard to real backend API 2026-04-10 14:18:47 -04:00
tm_file_loader.py fix: improve TM parser EN/TX split and fix report SQL errors 2026-04-10 17:47:53 -04:00