Commit graph

12 commits

Author SHA1 Message Date
DJP
d3f6a57386 Round 2.5 feedback: TM replacements take effect, supplementary files reach LLM, larger briefs fit, free-text channel uploads
TM upload-replacement bug (critical):
- Uploads were writing to /storage/clients/<uuid>/tm/... but the pipeline
  reads from /storage/amazon/tm/... — replacements were silently ignored
- upload_tm_file now writes to the canonical pipeline path
  /storage/amazon/tm/<locale>/flat_<channel>_<lc>.json (overwrites in place)
- Filename casing is preserved when an existing file is being replaced
  (the on-disk seeded files use mixed casing: flat_MASS, flat_value,
  flat_PrimeSpeed); falls back to CHANNEL_FILE_MAP, then user-typed case
- Registry upsert by (client_id, locale_code, channel): replaces row in
  place rather than inserting duplicates
- Verified: replacement file at canonical path, registry COUNT=1, no dupes

Supplementary files now reach the LLM (critical):
- New supplementary_files field on FileManifest
- _resolve_file_manifest scans /storage/jobs/<job_id>/supplementary/ and
  populates the manifest, with per-locale gating by filename prefix
  (e.g. de-DE_glossary.txt only goes to de-DE; global_brief.txt goes to all)
- _format_supplementary_for_prompt reads each file (.txt/.md/.json/.csv/.tsv
  /.docx) and inlines its text into the LLM user message under a
  "## SUPPLEMENTARY MATERIAL" header, capped at 40k chars per file
- .docx files are extracted via inline zipfile read (no new dependency)

New job wizard:
- Per-supplementary-file locale dropdown ("Global" or one of 12 locales)
- Filename gets prefixed with the locale on upload (de-DE_brief.docx)

Admin TM upload:
- Channel field is now a free-text input with autocomplete suggestions
  (datalist of known channels) — lets users add brand-new channels like
  PrimeCBM that didn't exist before

Pipeline scaling:
- Bumped dynamic max_tokens tiers: 80+ lines now gets 64k output budget
  (was 32k); 132-line briefs no longer truncate. Sonnet 4.6 caps at 64k
- Added stop_reason logging — "max_tokens" stop now shows up in logs
  loud and clear rather than silently truncating

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:28:20 -04:00
DJP
9825b0497c Round 2 feedback: parser fix, dynamic max_tokens, polling, TM auto-discovery, reviewer comments in export
A1 Export columns shifted (critical):
- V25 LLM occasionally emits 12/13-col tables with Copy Type/Char Limit prefix
- Parser now anchors on "Option 1" header position; robust to any prefix shift
- Verified with 23/23 unit tests covering 11/12/13-col variants
- Source-line block in prompt no longer uses pipe separators (defence in depth)

A2 Linguistic summary fallback:
- Drop the metadata key/value table fallback on Tab 2
- Show "No linguistic summary was generated" when the agent didn't produce one

A3 Dashboard stuck on "Running":
- useJobs / useJob now poll every 5s while any job/locale is in an active state
- Stops polling once everything is COMPLETED or ERROR

B1 TM auto-config: respect empty selection
- Send no TM files when user unchecks all (was auto-adding campaign channel)
- Backend distinguishes empty list vs missing field

B2 Auto-discover channels from TM registry:
- New GET /api/v1/files/tm/channels endpoint reads distinct channels from registry
- Frontend StepConfigure fetches channels per client; falls back to static list
- Pipeline TM resolution falls back to flat_<Channel>_<lc>.json pattern for any
  registered channel (no hardcoded map needed for new channels like PrimeCBM)

B3 Job inputs visible on monitoring:
- New "Inputs sent to the agent" card on /jobs/[id] showing AI model, TM files,
  supplementary file list, and context override
- New GET /api/v1/jobs/{id}/supplementary endpoint listing on-disk supplementary files

C1 Context cap (large briefs truncating):
- max_tokens scales with source line count (8k/16k/32k/64k by tier)
- 172-line briefs now have ~64k output budget instead of fixed 16k

D1 Reviewer comments in xlsx export:
- Export endpoint now copies xlsx to temp path on download, queries Feedback
  joined with User, and appends "Reviewer (Name): comment" to the rationale
  cells of options that have feedback
- Original generated file remains untouched

D2 Hide Clients & Voice from sidebar (page still reachable by URL)
D3 Remove dead notifications + settings icons from header
D4 Cost by Locale table added to Analytics with total + avg cost per brief

Makefile seed target now also runs register_storage_files so TM registry is
populated from disk on first setup (deploy.sh already does this via --init).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-04 16:12:47 -04:00
DJP
d5fa4e49f7 Fix markdown table parser losing backtranslations/rationales, add model selection, update help page
The V25 table has duplicate column names (Backtranslation x3, Rationale x3).
The dict-based parser collapsed these — only the last value survived (Option 3's
"N/A"), causing all BT/rationale fields to be "N/A" in the output Excel.

Fixed by switching to positional list-based parsing instead of dicts.

Also adds per-job model selection (Sonnet 4.6 / Opus 4.6) through the full
stack: DB column, API schema, job wizard UI dropdown, pipeline contracts, and
LLM client with model-aware cost tracking. Includes Alembic migration.

Updated help page and README to reflect single-agent pipeline, multi-TM
selection, flat locale grid, model selector, and linguistic summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 12:40:17 -04:00
DJP
d56311b862 Implement standalone agent feedback: consolidated locale selector, multi-TM selection, single-agent pipeline, and linguistic summary
Four changes from user testing feedback:
1. Merge MAIN/DERIVED locale selectors into single 12-locale grid, auto-classify locale_type
2. Add multi-TM channel selection (checkbox grid, tm_channels JSON column, multi-file resolution)
3. Replace 6-agent pipeline with single V25-based agent (feature-flagged via USE_SINGLE_AGENT)
4. Replace Excel Tab 2 metadata with linguistic summary from agent output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 12:09:51 -04:00
DJP
437b9bd793 fix: add db.refresh after rerun_locale flush to prevent MissingGreenlet
The rerun endpoint returned 500 because Pydantic tried to serialize
updated_at from a stale SQLAlchemy instance after flush(). Added
db.refresh(instance) to ensure all attributes are loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 17:59:09 -04:00
DJP
52bc499272 fix: sidebar highlighting for shared paths and report SQL errors
Fix sidebar nav so Dashboard/Monitoring and Audit Trail/System Logs
highlight independently by using useSearchParams to distinguish
query-param-based routes. Fix get_jobs_over_time SQL GroupingError
by using literal_column for date_trunc interval. Add date filters to
status_breakdown query and fix Decimal serialization in locale stats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 17:25:34 -04:00
DJP
5ef7e588b6 feat: wire analytics to real data and add audit logging across all endpoints
Replace mock chart data on reports page with real backend queries (jobs over
time, locale stats, usage stats, quality metrics). Add audit logging to auth
(login/login_failed), file management (upload/delete TM and reference files),
and feedback submission so the system logs page shows complete activity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 17:17:14 -04:00
DJP
84f37a4649 feat: wire audit trail page to real backend data
- Fix API path: frontend now calls /audit/logs (was /audit)
- Backend eagerly loads User relationship for audit entries
- Backend response includes user_name field instead of just user_id
- Frontend logs page fetches real data with pagination
- Derive INFO/WARN/ERROR levels from action type
- Format details JSON into readable descriptions
- Add loading state and empty state handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 16:59:36 -04:00
DJP
8b07a59da0 fix: persist feedback/comments across page reloads on review page
Feedback was saving to DB but never loaded back on page revisit.
Three-point fix:
- Backend schema: add feedback list to OutputRowResponse
- Backend service: eagerly load feedback relationship in preview query
- Frontend mapper: map latest feedback entry to OutputRow.feedback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 16:54:49 -04:00
DJP
f2398e04c4 feat: add real-time progress tracking and admin job deletion
Progress tracking:
- Add progress (0-100%) and current_stage columns to locale_instances
- Wire orchestrator on_progress callback to update DB at each pipeline stage
- Agent 4 reports batch-level sub-progress (e.g. "Translating batch 2/4")
- Frontend reads real progress/stage data instead of hardcoding 50%
- Stages: Loading Files → Matching TM → Ranking → Translating (per-batch) → Reviewing → Formatting → Complete

Job deletion:
- DELETE /jobs/{id} endpoint (admin-only, 403 for non-admins)
- Cannot delete running jobs (must cancel first)
- Cascades to locale instances, output rows, source lines
- Frontend: Delete button with confirmation on job monitoring page (admin only)

Also: compute live duration_seconds from started_at, map pipeline stages to UI status badges.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 16:18:59 -04:00
DJP
f271343bc0 feat: wire job wizard and dashboard to real backend API
- Job wizard now calls real API: create job → upload source → launch
- Dashboard and monitoring pages use live data instead of mock data
- Monitoring page polls every 3s while job is active
- Backend enriches job responses with client_name, created_by_name,
  source_line_count from eager-loaded relationships
- Frontend response mappers handle backend→frontend type differences
  (lowercase enum values, field name mapping, computed progress/stage)
- Source file parser accepts column aliases (Line type, Context notes)
  with case-insensitive matching for real-world Excel files
- Clients list endpoint accessible to all authenticated users
- Fixed uploadSource to use PUT, uploadSupplementary per-file
- Removed all hardcoded mock data from useJobs hook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 14:18:47 -04:00
DJP
98fa16bfc3 feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton
Full-stack Amazon AI Transcreation Platform with:
- FastAPI backend (async, PostgreSQL, Redis, Celery) with 11 DB tables
- JWT auth (SSO-ready abstract provider pattern)
- 6-agent pipeline orchestrator with deterministic modules
- Next.js 14 frontend with Amazon branding (Ember fonts, orange/dark theme)
- Job wizard, monitoring HUD, output review, admin screens
- 154 TM/reference files imported, 12 locales configured
- Docker Compose for all services

Agents 2-5 (TM retrieval, ranker, transcreator, compliance) are stubs
pending Phase 3 LLM integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:31:43 -04:00