# Amazon AI Transcreation Platform

An AI-powered transcreation platform that adapts Amazon marketing copy across 12 European locales using Claude LLM agents. It replaces a manual LibreChat workflow with structured, one-click multi-locale processing, real-time monitoring, in-app review, and proper job/file management.

---

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [How It Works](#how-it-works)
- [The Agent Pipeline](#the-agent-pipeline)
- [Tech Stack](#tech-stack)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [Storage Layout](#storage-layout)
- [Supported Locales & Channels](#supported-locales--channels)
- [API Reference](#api-reference)
- [Database Schema](#database-schema)
- [User Guide](#user-guide)
- [Development](#development)
- [Deployment](#deployment)

---

## Architecture Overview

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                        AMAZON TRANSCREATION PLATFORM                         │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────┐        ┌──────────────────────────────────────────────────┐
│                  │  HTTP  │                 FastAPI Backend                  │
│   Next.js 14     │ ◄─────►│                                                  │
│   Frontend       │  REST  │  ┌────────────┐  ┌──────────┐  ┌─────────────┐   │
│                  │        │  │    Auth    │  │   Jobs   │  │   Output    │   │
│  ┌────────────┐  │  Poll  │  │  Service   │  │   API    │  │    API      │   │
│  │ Dashboard  │  │ (3sec) │  └────────────┘  └────┬─────┘  └─────────────┘   │
│  │ Job Wizard │  │        │                       │                          │
│  │ Monitor    │  │        │  ┌────────────────────▼───────────────────────┐  │
│  │ Review     │  │        │  │             Celery Task Queue              │  │
│  │ Admin      │  │        │  │           (4 concurrent workers)           │  │
│  └────────────┘  │        │  └────────────────────┬───────────────────────┘  │
└──────────────────┘        │                       │                          │
                            │  ┌────────────────────▼───────────────────────┐  │
                            │  │           Pipeline Orchestrator            │  │
                            │  │                                            │  │
                            │  │  VALIDATE ► SINGLE_AGENT ► FORMAT ► DONE   │  │
                            │  │                                            │  │
                            │  │   (Single LLM call with full V25 prompt)   │  │
                            │  └────────────────────────────────────────────┘  │
                            └──────────┬──────────────────────────┬────────────┘
                                       │                          │
                            ┌──────────▼──────┐        ┌──────────▼──────────┐
                            │  PostgreSQL 16  │        │       Redis 7       │
                            │                 │        │                     │
                            │  11 tables      │        │  Celery broker      │
                            │  Jobs, Output,  │        │  Task results       │
                            │  Users, Audit   │        │  WebSocket pub/sub  │
                            └─────────────────┘        └─────────────────────┘

                            ┌─────────────────┐        ┌─────────────────────┐
                            │  Claude API     │        │  File Storage       │
                            │  (Anthropic)    │        │                     │
                            │                 │        │  /storage/amazon/   │
                            │  Single agent   │        │    tm/  (JSONL)     │
                            │  (1 LLM call)   │        │    ref/ (JSON)      │
                            └─────────────────┘        └─────────────────────┘
```

---

## How It Works

### The Workflow (End to End)

```
USER                     PLATFORM                        CLAUDE API
│                            │                                │
│ 1. Create Job              │                                │
│    (campaign, locale,      │                                │
│     channel, programme)    │                                │
│ ──────────────────────────►│                                │
│                            │                                │
│ 2. Upload Source xlsx      │                                │
│    (EN_GB lines, char      │                                │
│     limits, copy types)    │                                │
│ ──────────────────────────►│                                │
│                            │                                │
│ 3. Launch                  │                                │
│ ──────────────────────────►│                                │
│                            │ Celery dispatches per-locale   │
│                            │ tasks in PARALLEL (up to 4)    │
│                            │ ─────────┐                     │
│                            │          │                     │
│ 4. Monitor Progress        │  ┌───────▼────────┐            │
│    (polls every 3 sec)     │  │ Agent Pipeline │            │
│ ◄─── 10% Loading Files ────│  │                │            │
│ ◄─── 20% Transcreating ────│  │ Single agent   │─── LLM ───►│
│ ◄─── 90% Formatting ───────│  │ per locale     │◄── table ──│
│ ◄── 100% Complete ─────────│  │                │            │
│                            │  └───────┬────────┘            │
│                            │  ┌───────▼────────┐            │
│ 5. Review Output           │  │ Output saved   │            │
│    (per-locale, per-line   │  │ to DB + xlsx   │            │
│     with confidence tiers) │  └────────────────┘            │
│ ──────────────────────────►│                                │
│                            │                                │
│ 6. Approve / Revise        │                                │
│ ──────────────────────────►│                                │
│                            │                                │
│ 7. Download xlsx           │                                │
│ ◄──────────────────────────│                                │
```

### What Happens When You Launch a Job

1. **Job created** with campaign name, programme (Retail/Prime/Brand), channel, one or more TM files, and target locales (all 12 selectable in a single flat list)
2. **Source file uploaded** - an xlsx with English (en_GB) marketing copy, character limits, copy types, and creative guidance
3. **Launch** dispatches one Celery task per locale - up to 4 run in parallel
4. Each locale runs through the **single-agent pipeline**: one LLM call with the full V25 prompt (see below)
5. Real-time **progress updates** are stored in the database and polled by the frontend every 3 seconds
6. On completion, output is viewable in the **review interface** with confidence badges, backtranslations, and rationale
7. **Export** downloads a formatted xlsx (Tab 1: output table, Tab 2: linguistic summary)

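
The polling in step 5 can be sketched as follows. This is a minimal sketch rather than the frontend's actual code (which is TypeScript): `fetch_progress` is a hypothetical callable standing in for a `GET /api/v1/jobs/{id}` request, and the `status` / `progress` keys and the terminal state names are assumptions.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; the real status values are not documented here.
TERMINAL_STATES = {"complete", "failed", "cancelled"}


def poll_until_done(
    fetch_progress: Callable[[], Dict[str, dict]],
    interval_seconds: float = 3.0,
    max_polls: int = 400,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Poll per-locale progress until every locale reaches a terminal state.

    fetch_progress returns a mapping such as
    {"de-DE": {"status": "running", "progress": 20}, ...}.
    """
    for _ in range(max_polls):
        snapshot = fetch_progress()
        if snapshot and all(
            locale["status"] in TERMINAL_STATES for locale in snapshot.values()
        ):
            return snapshot
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; the 3-second default mirrors the interval described above.
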

---

## The Agent Pipeline

### Single-Agent Pipeline (Default)

The platform uses a **single consolidated LLM call** with the complete V25 Agent Instructions JSON as the system prompt. This replaces the earlier 6-agent sequential pipeline and produces better results by preserving inter-step context (TM reasoning, ranking rationale, cultural nuance) within a single prompt.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           PER-LOCALE PIPELINE                           │
│                                                                         │
│  ┌──────────────┐  Deterministic. Parses xlsx, loads glossary,          │
│  │   VALIDATE   │  blacklist, TOV, locale considerations, and           │
│  │   [no LLM]   │  date/percent format files. Builds PipelineContext.   │
│  └──────┬───────┘  ~0.1 seconds                                         │
│         │ 10%                                                           │
│  ┌──────▼───────┐  Single LLM call using V25 Agent Instructions.        │
│  │    SINGLE    │  System prompt: full V25 JSON (899 lines).            │
│  │    AGENT     │  User message: job params, ALL source lines,          │
│  │ [1 LLM call] │  ALL TM entries (multiple channels), ALL reference    │
│  │              │  files (glossary, blacklist, TOV, locale rules).      │
│  │              │                                                       │
│  │              │  The agent handles TM matching, ranking,              │
│  │              │  transcreation, and compliance in one pass.           │
│  │              │  Outputs a markdown table + linguistic summary.       │
│  └──────┬───────┘  ~2-4 min, ~$0.30-0.50                                │
│         │ 20-90%                                                        │
│  ┌──────▼───────┐  Deterministic. Generates output xlsx:                │
│  │    FORMAT    │  Tab 1: 11-column output table                        │
│  │   [no LLM]   │  Tab 2: Linguistic summary from the agent             │
│  └──────┬───────┘  ~0.1 seconds                                         │
│         │ 100%                                                          │
│         ▼                                                               │
│       DONE (~2-4 min total per locale)                                  │
└─────────────────────────────────────────────────────────────────────────┘
```

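
The stage sequence in the diagram can be sketched as a small state machine. This is an illustrative sketch, not the contents of `orchestrator.py`: the `Stage` enum, the `handlers` mapping, and the `report` callback are hypothetical names, and the percentages are the ones the monitoring view shows when each stage starts.

```python
from enum import Enum
from typing import Callable, Dict


class Stage(str, Enum):
    VALIDATE = "VALIDATE"
    SINGLE_AGENT = "SINGLE_AGENT"
    FORMAT = "FORMAT"
    DONE = "DONE"


# Progress reported when each stage starts (10% Loading Files,
# 20% Transcreating, 90% Formatting, 100% Complete).
STAGE_PROGRESS: Dict[Stage, int] = {
    Stage.VALIDATE: 10,
    Stage.SINGLE_AGENT: 20,
    Stage.FORMAT: 90,
    Stage.DONE: 100,
}


def run_pipeline(
    handlers: Dict[Stage, Callable[[dict], dict]],
    context: dict,
    report: Callable[[Stage, int], None],
) -> dict:
    """Run the three working stages in order, reporting progress as each starts."""
    for stage in (Stage.VALIDATE, Stage.SINGLE_AGENT, Stage.FORMAT):
        report(stage, STAGE_PROGRESS[stage])
        context = handlers[stage](context)  # each stage returns the updated context
    report(Stage.DONE, STAGE_PROGRESS[Stage.DONE])
    return context
```

Each handler receives and returns a plain `dict` here only to keep the sketch self-contained; the project itself passes Pydantic contract models between stages.
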
### Legacy 6-Agent Pipeline (Feature Flag)

The original 6-agent sequential pipeline is preserved behind a feature flag (`USE_SINGLE_AGENT=false`). It runs: VALIDATE → TM_RETRIEVE → RANK → TRANSCREATE → COMPLY (retry x3) → FORMAT → DONE. This path makes 2+ LLM calls (TM retrieval + transcreation in batches) and takes longer (~5.5 min per locale).

### Confidence Tiers and Option Counts

```
TM Match Quality              Confidence     Options Generated
─────────────────────────     ────────────   ──────────────────
Same channel + recent year    HIGH           1 option (anchored to TM)
Cross-channel or older        MODERATE       2 options
No TM match found             LOW            3 creative options
```

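
The tier table can be restated as a small function. In the single-agent pipeline this decision is made inside the LLM call, so the sketch below only illustrates the rule: `TMMatch`, `confidence_tier`, and the `RECENT_YEARS` threshold are hypothetical, since the document does not say how many years count as "recent".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RECENT_YEARS = 2  # assumption: how far back a TM entry still counts as "recent"


@dataclass(frozen=True)
class TMMatch:
    channel: str
    year: int


def confidence_tier(
    match: Optional[TMMatch], job_channel: str, current_year: int
) -> Tuple[str, int]:
    """Return (confidence tier, number of options to generate)."""
    if match is None:
        return ("LOW", 3)  # no TM match: three creative options
    same_channel = match.channel.lower() == job_channel.lower()
    recent = current_year - match.year <= RECENT_YEARS
    if same_channel and recent:
        return ("HIGH", 1)  # anchored to the TM entry
    return ("MODERATE", 2)  # cross-channel or older match
```
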
### Voice Profiles (per Programme)

| Programme | Voice Attributes |
|-----------|-----------------|
| **Retail** | Real, Clear, Playful, Witty |
| **Prime** | Optimistic, Honest, Self-aware, Witty, Relatable |
| **Brand** | Authentic, Customer-obsessed, Intelligent, Warm, Understated |

### Deterministic Modules

The pipeline uses 9 pure-Python modules (no LLM) for specific tasks:

| Module | Purpose |
|--------|---------|
| `source_file_parser` | Parse xlsx, validate columns, detect display format |
| `tm_file_loader` | Parse JSONL TM files (compact + multi-field formats) |
| `ref_file_loader` | Load glossary, blacklist, TOV, locale considerations |
| `character_counter` | Unicode grapheme cluster counting (not `len()`) |
| `blacklist_scanner` | Exact + root-based forbidden term matching |
| `date_format_validator` | Validate date/percent formats per locale |
| `domain_substitutor` | Amazon.co.uk to locale-specific domain mapping |
| `line_break_normaliser` | Handle `\n` for TM queries vs Excel output |
| `excel_writer` | Generate formatted xlsx (Tab 1: output table, Tab 2: linguistic summary) |

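
To illustrate why `character_counter` avoids `len()`, here is a standard-library approximation of grapheme counting. It is a simplified sketch and not the module's implementation: it folds combining marks, variation selectors, and zero-width-joiner sequences into the preceding character, but it does not implement the full Unicode segmentation rules (flags and skin-tone modifiers, for example, are not handled). The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner, glues emoji sequences together


def _extends_previous(ch: str) -> bool:
    """True if ch attaches to the preceding character instead of starting a new one."""
    if unicodedata.category(ch) in ("Mn", "Me", "Mc"):  # combining marks
        return True
    return ch == ZWJ or "\ufe00" <= ch <= "\ufe0f"  # joiner or variation selector


def approx_grapheme_count(text: str) -> int:
    """Approximate the number of user-perceived characters in text."""
    count = 0
    joined = False  # the previous char was a ZWJ, so the next one continues the cluster
    for ch in text:
        if _extends_previous(ch):
            joined = ch == ZWJ
            continue
        if not joined:
            count += 1
        joined = False
    return count
```

For example, `"e\u0301"` (an `e` followed by a combining acute accent) has `len()` 2 but is displayed, and counted here, as one character, which is what matters when checking a translation against a character limit.
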

---

## Tech Stack

```
┌───────────────────────────────────────────────────────────────┐
│ FRONTEND              │ BACKEND              │ INFRASTRUCTURE │
├───────────────────────┼──────────────────────┼────────────────┤
│ Next.js 14 (App Rtr)  │ Python 3.12          │ Docker Compose │
│ React 18              │ FastAPI              │ PostgreSQL 16  │
│ TypeScript 5.4        │ SQLAlchemy 2 (async) │ Redis 7        │
│ Tailwind CSS 3.4      │ Alembic (migrations) │ Nginx (prod)   │
│ Radix UI primitives   │ Celery 5.4           │                │
│ Recharts (charts)     │ Pydantic v2          │                │
│ Axios                 │ Anthropic SDK        │                │
│ Lucide (icons)        │ openpyxl             │                │
│                       │ bcrypt + JWT         │                │
└───────────────────────┴──────────────────────┴────────────────┘
```

---

## Getting Started

### Prerequisites

- Docker and Docker Compose v2
- An Anthropic API key (for Claude)
- Node.js 18+ (for frontend builds)
- Git

### Quick Start

```bash
# 1. Clone the repository
git clone git@bitbucket.org:zlalani/amazon-transcreation.git
cd amazon-transcreation

# 2. Copy environment file and set your API key
cp .env.example .env
# Edit .env and set:
# ANTHROPIC_API_KEY=sk-ant-your-key-here
# JWT_SECRET_KEY=a-random-secret-string

# 3. Start all services
make up
# or: docker compose up -d

# 4. Run database migrations
make migrate

# 5. Seed default data (Amazon client + test users)
make seed

# 6. Build the frontend
cd frontend && npm install && npm run build && cd ..

# 7. Access the application
# Backend API: http://localhost:8040/api/v1
# Frontend: http://localhost:3000
```

### Default Users (after seeding)

| Email | Role | Password |
|-------|------|----------|
| admin@amazon.com | Admin | admin123 |
| manager@amazon.com | TM Manager | admin123 |
| reviewer@amazon.com | Reviewer | admin123 |

### Makefile Commands

| Command | Description |
|---------|-------------|
| `make up` | Start all Docker services |
| `make down` | Stop all services |
| `make build` | Rebuild Docker images |
| `make migrate` | Run database migrations |
| `make seed` | Seed default client and test users |
| `make test` | Run backend test suite |
| `make shell` | Open a bash shell in the backend container |
| `make logs` | Stream all container logs |
| `make restart` | Restart backend + Celery worker |
| `make db-shell` | Open PostgreSQL interactive shell |
| `make redis-cli` | Open Redis CLI |

---

## Configuration

All configuration is via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql+asyncpg://...` | PostgreSQL async connection string |
| `REDIS_URL` | `redis://redis:6379/0` | Redis connection for Celery + pub/sub |
| `ANTHROPIC_API_KEY` | *(required)* | Your Anthropic API key for Claude |
| `JWT_SECRET_KEY` | *(required)* | Secret key for JWT token signing |
| `JWT_ALGORITHM` | `HS256` | JWT signing algorithm |
| `JWT_EXPIRY_HOURS` | `8` | Access token expiry in hours |
| `STORAGE_ROOT` | `/storage` | Root path for file storage |
| `LLM_MODEL` | `claude-sonnet-4-6` | Default Claude model (overridden per-job via UI: `claude-sonnet-4-6` or `claude-opus-4-6`) |
| `USE_SINGLE_AGENT` | `true` | Use single-agent pipeline (`true`) or legacy 6-agent (`false`) |

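
How these variables resolve to typed settings can be sketched with the standard library alone. The project's `config.py` uses pydantic-settings, so this is an illustration of the defaults and required keys in the table, not the real loader; `Settings` and `load_settings` are hypothetical names, and `DATABASE_URL` is left out because its default is elided above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("ANTHROPIC_API_KEY", "JWT_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    jwt_secret_key: str
    redis_url: str
    jwt_algorithm: str
    jwt_expiry_hours: int
    storage_root: str
    llm_model: str
    use_single_agent: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expiry_hours=int(env.get("JWT_EXPIRY_HOURS", "8")),
        storage_root=env.get("STORAGE_ROOT", "/storage"),
        llm_model=env.get("LLM_MODEL", "claude-sonnet-4-6"),
        # Anything other than "true" (case-insensitive) selects the legacy pipeline.
        use_single_agent=env.get("USE_SINGLE_AGENT", "true").strip().lower() == "true",
    )
```
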

---

## Storage Layout

```
storage/amazon/
├── tm/                                  # Translation Memory files (JSONL)
│   ├── de-DE/
│   │   ├── flat_MASS_de-de.json         # Mass channel TM
│   │   ├── flat_value_de-de.json        # Value channel TM
│   │   ├── flat_Onsite_de-de.json       # Onsite channel TM
│   │   ├── flat_Outbound_de-de.json     # Outbound channel TM
│   │   ├── flat_UEFA_de-de.json         # UEFA channel TM
│   │   └── ...                          # + BDA, DoubleDonut, EUSelection, etc.
│   ├── fr-FR/
│   │   └── ...
│   └── ... (12 locale directories)
│
└── ref/                                 # Reference files (JSON)
    ├── glossary/                        # Locale-specific term glossaries
    │   ├── de_DE_glossary.json
    │   └── ...
    ├── blacklist/                       # Forbidden terms per locale
    │   ├── de_DE_blacklist.json
    │   └── ...
    ├── tov_global/                      # Global Tone of Voice guidelines
    │   └── Amazon_TOV_Guidelines_for_Transcreation_290224.json
    ├── tov_supplement/                  # Supplementary TOV (de-DE, de-AT)
    │   └── DE_AT_TOV_Guidelines.json
    ├── locale_considerations/           # Locale-specific rules and notes
    │   └── ...
    └── date_pct_formats/                # Approved date/percentage formats
        └── ...
```

### TM File Format (JSONL)

Each line is a JSON object. Two formats are supported:

**Compact format** (existing files):
```json
{"t": "Value Q1 24 Radio 001 VO de-de As Sophie opened the door... Als Sophie die Tuer oeffnete..."}
```

**Multi-field format**:
```json
{"seg_key": "Value Q1 24 Radio 001", "en": "As Sophie opened...", "lc": "de-de", "tx": "Als Sophie...", "nt": "VO", "channel": "value"}
```

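
Reading a file that mixes the two formats can be sketched as below. This is an illustration under assumptions, not the `tm_file_loader` module: `iter_tm_segments` is a hypothetical name, and because the document does not describe how a compact `t` string is split into its parts, the sketch keeps it as one string.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_tm_segments(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per JSONL line, tagged with the format it was written in."""
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        if "seg_key" in record:
            yield {"format": "multi-field", **record}
        elif "t" in record:
            # Compact records carry everything in one string; kept whole here.
            yield {"format": "compact", "text": record["t"]}
        else:
            raise ValueError(f"line {line_no}: unrecognised TM record")
```

Because the function accepts any iterable of lines, it works equally on an open file handle (`iter_tm_segments(open(path, encoding="utf-8"))`) or on a list of strings in a test.
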

---

## Supported Locales & Channels

### Locales (12)

| Code | Language | Notes |
|------|----------|-------|
| de-DE | German (Germany) | Shares TM/TOV supplement with de-AT |
| de-AT | German (Austria) | Shares TM/TOV supplement with de-DE |
| fr-FR | French (France) | Shares TM with fr-BE |
| fr-BE | French (Belgium) | Shares TM with fr-FR |
| es-ES | Spanish (Spain) | Shares TM with ca-ES |
| ca-ES | Catalan (Spain) | Enforced as Catalan, not Spanish |
| it-IT | Italian (Italy) | - |
| nl-NL | Dutch (Netherlands) | Independent from nl-BE |
| nl-BE | Dutch (Belgium) | Independent from nl-NL |
| pl-PL | Polish (Poland) | - |
| pt-PT | Portuguese (Portugal) | - |
| sv-SE | Swedish (Sweden) | - |

### Channels & TM Files

Jobs can select **multiple TM channels** to load into the agent's context. The campaign channel is auto-selected, and users can add additional TM files for cross-channel reference (e.g. MASS as a fallback alongside the primary channel).

| Channel | TM File Pattern |
|---------|----------------|
| Mass | `flat_MASS_{lc}.json` |
| Value | `flat_value_{lc}.json` |
| Onsite | `flat_Onsite_{lc}.json` |
| Outbound | `flat_Outbound_{lc}.json` |
| UEFA | `flat_UEFA_{lc}.json` |
| BDA | `flat_BDA_{lc}.json` |
| DoubleDonut | `flat_DoubleDonut_{lc}.json` |
| EUSelection | `flat_EUSelection_{lc}.json` |
| PrimeDualBenefit | `flat_PrimeDualBenefit_{lc}.json` |
| PrimeGourmetGuard | `flat_PrimeGourmetGuard_{lc}.json` |
| PrimeMidfunnel | `flat_PrimeMidfunnel_{lc}.json` |
| PrimeSpeed | `flat_PrimeSpeed_{lc}.json` |
| TheKiss | `flat_TheKiss_{lc}.json` |

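
Resolving a channel and locale to a TM file path can be sketched as follows. The function and mapping names are hypothetical, and the sketch assumes `{lc}` is the lowercased locale code inside a directory named after the locale, as in the storage layout example `de-DE/flat_MASS_de-de.json`.

```python
from pathlib import PurePosixPath
from typing import List

# Channel name -> token used in the filename, copied from the table above.
CHANNEL_FILE_TOKEN = {
    "Mass": "MASS",
    "Value": "value",
    "Onsite": "Onsite",
    "Outbound": "Outbound",
    "UEFA": "UEFA",
    "BDA": "BDA",
    "DoubleDonut": "DoubleDonut",
    "EUSelection": "EUSelection",
    "PrimeDualBenefit": "PrimeDualBenefit",
    "PrimeGourmetGuard": "PrimeGourmetGuard",
    "PrimeMidfunnel": "PrimeMidfunnel",
    "PrimeSpeed": "PrimeSpeed",
    "TheKiss": "TheKiss",
}


def tm_file_path(
    locale_code: str, channel: str, root: str = "storage/amazon/tm"
) -> PurePosixPath:
    """Build the expected TM file path for one locale and channel."""
    token = CHANNEL_FILE_TOKEN[channel]  # KeyError for an unknown channel
    return PurePosixPath(root) / locale_code / f"flat_{token}_{locale_code.lower()}.json"


def tm_paths_for_job(locale_code: str, channels: List[str]) -> List[PurePosixPath]:
    """Paths for every selected TM channel, in order and without duplicates."""
    unique = list(dict.fromkeys(channels))
    return [tm_file_path(locale_code, channel) for channel in unique]
```

A job on the Value channel with Mass added as a fallback would call `tm_paths_for_job("de-DE", ["Value", "Mass"])`.
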
### Programmes & Voice Profiles

| Programme | Voice | Description |
|-----------|-------|-------------|
| Retail | Real, Clear, Playful, Witty | Everyday value messaging |
| Prime | Optimistic, Honest, Self-aware, Witty, Relatable | Prime membership benefits |
| Brand | Authentic, Customer-obsessed, Intelligent, Warm, Understated | Brand-level communications |

---

## API Reference

### Authentication

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/auth/login` | Login (email + password) |
| POST | `/api/v1/auth/refresh` | Refresh access token |
| GET | `/api/v1/auth/me` | Get current user claims |

### Jobs

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/jobs` | Create job |
| GET | `/api/v1/jobs` | List jobs (filterable) |
| GET | `/api/v1/jobs/{id}` | Get job detail + locale instances |
| DELETE | `/api/v1/jobs/{id}` | Delete job (admin only) |
| PUT | `/api/v1/jobs/{id}/source` | Upload source xlsx |
| POST | `/api/v1/jobs/{id}/supplementary` | Upload supplementary file |
| POST | `/api/v1/jobs/{id}/launch` | Launch processing |
| POST | `/api/v1/jobs/{id}/cancel` | Cancel job |
| POST | `/api/v1/jobs/{id}/locales/{code}/rerun` | Re-run single locale |

### Output & Feedback

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/output/jobs/{id}/locales/{code}/preview` | Output preview |
| GET | `/api/v1/output/jobs/{id}/locales/{code}/export` | Download xlsx |
| POST | `/api/v1/output/feedback` | Submit feedback |
| GET | `/api/v1/output/feedback/{output_id}` | Get feedback |

### File Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/files/tm` | Upload TM file |
| GET | `/api/v1/files/tm` | List TM files |
| DELETE | `/api/v1/files/tm/{id}` | Delete TM file |
| POST | `/api/v1/files/reference` | Upload reference file |
| GET | `/api/v1/files/reference` | List reference files |
| DELETE | `/api/v1/files/reference/{id}` | Delete reference file |

### Admin & Reports

| Method | Endpoint | Description |
|--------|----------|-------------|
| CRUD | `/api/v1/users` | User management (admin) |
| CRUD | `/api/v1/clients` | Client management (admin) |
| GET | `/api/v1/audit/logs` | Audit trail |
| GET | `/api/v1/reports/usage` | Usage statistics |
| GET | `/api/v1/reports/tokens` | Token cost breakdown |
| GET | `/api/v1/reports/quality` | Quality metrics |

---

## Database Schema

```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│   clients    │     │    users     │     │   user_clients   │
│──────────────│     │──────────────│     │──────────────────│
│ id (PK)      │◄────│ id (PK)      │     │ user_id (FK)     │
│ name         │     │ email        │     │ client_id (FK)   │
│ settings     │     │ name         │     │ role_override    │
└──────┬───────┘     │ password_hash│     └──────────────────┘
       │             │ role (enum)  │
       │             │ status       │
       │             └──────┬───────┘
       │                    │
┌──────▼───────┐            │
│     jobs     │            │
│──────────────│◄───────────┘  (created_by)
│ id (PK)      │
│ client_id    │     ┌──────────────────┐
│ campaign_name│     │   source_lines   │
│ programme    │     │──────────────────│
│ channel      │     │ id (PK)          │
│ tm_channels  │     │ job_id (FK)      │
│ status       │◄────│ en_gb            │
│ job_type     │     │ copy_type        │
└──────┬───────┘     │ char_limit       │
       │             │ char_limit       │
       │             │ is_display_format│
┌──────▼───────────┐ └──────────────────┘
│ locale_instances │
│──────────────────│  ┌──────────────────┐
│ id (PK)          │  │   output_rows    │
│ job_id (FK)      │  │──────────────────│
│ locale_code      │  │ id (PK)          │
│ status           │◄─│ instance_id (FK) │
│ progress         │  │ line_id (FK)     │
│ current_stage    │  │ confidence_tier  │
│ token_usage      │  │ option_1,2,3     │   ┌──────────────┐
│ started_at       │  │ backtranslation  │   │   feedback   │
│ completed_at     │  │ rationale        │   │──────────────│
└──────────────────┘  │ char_counts      │◄──│ output_id    │
                      └──────────────────┘   │ user_id      │
                                             │ flag_type    │
┌──────────────────┐  ┌──────────────────┐   │ comment      │
│ tm_file_registry │  │ reference_files  │   └──────────────┘
│──────────────────│  │──────────────────│
│ client_id        │  │ client_id        │   ┌──────────────┐
│ locale_code      │  │ file_type        │   │  audit_logs  │
│ channel          │  │ locale_scope     │   │──────────────│
│ filename         │  │ filename         │   │ user_id      │
│ segment_count    │  │ file_path        │   │ action       │
└──────────────────┘  └──────────────────┘   │ entity_type  │
                                             │ details      │
┌──────────────────┐                         └──────────────┘
│ token_usage_logs │
│──────────────────│
│ instance_id      │
│ agent_name       │
│ input_tokens     │
│ output_tokens    │
│ estimated_cost   │
└──────────────────┘
```

11 tables total. All primary keys are UUIDs. Deletes cascade from jobs down through locale_instances, output_rows, and source_lines.

---

## User Guide

### Creating a Job

1. Navigate to **Jobs > New Job**
2. Fill in the job details:
   - **Client** - Select the client (e.g. Amazon)
   - **Campaign Name** - Name of the campaign (e.g. "DDA 26 BFW")
   - **Programme** - Retail, Prime, or Brand (determines voice profile)
   - **Channel** - Campaign channel (e.g. Value, Mass, Onsite, Outbound)
   - **TM Files** - Select one or more TM channels to load (campaign channel auto-selected; add MASS as fallback or other channels for cross-reference)
   - **Locales** - All 12 locales in a single flat grid (main and derived locales are auto-classified, so no separate "Job Type" selection is needed)
3. Upload the **source xlsx** file with columns:
   - `EN_GB` (required) - English source copy
   - `Copy Type` - Type of copy (headline, body, CTA, script, etc.)
   - `Creative Guidance` - Context or instructions for the transcreator
   - `Visual Ref` - Reference to visual assets
   - `Char Limit` - Maximum character count for the translation
4. Optionally add a **context/override prompt** with special instructions
5. Review the summary and click **Launch**

### Monitoring Progress

Once launched, the job monitoring page shows real-time updates:
- Per-locale progress bars (0-100%)
- Current stage: Loading Files > Transcreating > Formatting Output > Complete
- Token usage and elapsed time
- Error details if any locale fails

Multiple locales process in **parallel** (up to 4 at once).

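
The "up to 4 at once" behaviour can be illustrated with a bounded worker pool. The platform uses Celery workers for this; the sketch below swaps in a standard-library thread pool purely to show the pattern, and `run_locales` and `process_locale` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

MAX_PARALLEL = 4  # mirrors the 4 concurrent workers described above


def run_locales(
    locales: Iterable[str], process_locale: Callable[[str], str]
) -> Dict[str, str]:
    """Process every locale with at most MAX_PARALLEL running at the same time."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = {code: pool.submit(process_locale, code) for code in locales}
        for code, future in futures.items():
            try:
                results[code] = future.result()
            except Exception as exc:  # one failed locale must not sink the others
                results[code] = f"failed: {exc}"
    return results
```

Each locale's outcome is recorded independently, which matches the monitoring view showing error details for a failed locale while the others continue.
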
### Reviewing Output

Click **Preview** on a completed locale to open the review interface:
- Each source line shows its **confidence tier** (High / Moderate / Low)
- **High confidence**: 1 option anchored to a TM match
- **Moderate confidence**: 2 creative options
- **Low confidence**: 3 creative options
- Every option includes a **backtranslation** and **character count**
- Expandable **rationale** explains the translation choices and TM citations
- Feedback buttons: **Approve**, **Needs Revision**, or add a **Comment**
- **Export** button downloads the formatted xlsx (Tab 1: output table, Tab 2: linguistic summary explaining the agent's approach and cultural choices)

### Admin Features

Admins have access to additional pages:
- **User Management** - Create, edit, and deactivate users
- **Client Management** - Manage client configurations
- **TM Files** - Upload and manage Translation Memory files
- **Reference Files** - Manage glossaries, blacklists, TOV guidelines
- **Reports** - Usage statistics, token costs, quality metrics
- **Audit Logs** - Complete trail of all system actions
- **Delete Jobs** - Remove old jobs (with confirmation)

---

## Development

### Project Structure

```
amazon-transcreation/
├── backend/
│   ├── app/
│   │   ├── main.py                  # FastAPI app factory
│   │   ├── config.py                # pydantic-settings env loader
│   │   ├── dependencies.py          # DI: get_db, get_current_user
│   │   ├── auth/                    # JWT auth (SSO-ready provider pattern)
│   │   ├── api/v1/                  # REST endpoint routers
│   │   ├── models/                  # SQLAlchemy models (11 tables)
│   │   ├── schemas/                 # Pydantic request/response models
│   │   ├── services/                # Business logic layer
│   │   ├── pipeline/
│   │   │   ├── orchestrator.py      # State machine (single-agent or legacy 6-agent)
│   │   │   ├── contracts.py         # Inter-agent Pydantic models
│   │   │   ├── agents/
│   │   │   │   ├── agent_single.py        # Consolidated single-agent (V25 prompt)
│   │   │   │   ├── agent_1_validator.py   # Deterministic file/input validation
│   │   │   │   ├── agent_6_formatter.py   # Excel output generation
│   │   │   │   ├── agent_2-5_*.py         # Legacy agents (behind feature flag)
│   │   │   │   └── prompts/
│   │   │   │       └── v25_instructions.json  # V25 Agent Instructions (system prompt)
│   │   │   └── modules/             # 9 deterministic modules
│   │   ├── tasks/                   # Celery task definitions
│   │   ├── llm/                     # Anthropic SDK wrapper + retry
│   │   └── ws/                      # WebSocket handler + manager
│   ├── alembic/                     # Database migrations
│   └── tests/
├── frontend/
│   └── src/
│       ├── app/                     # Next.js App Router pages
│       ├── components/              # React UI components
│       ├── hooks/                   # Custom React hooks
│       └── lib/                     # API client, types, utilities
├── storage/                         # Runtime file storage (mounted volume)
├── docker-compose.yml               # Development services
├── docker-compose.prod.yml          # Production services
├── deploy.sh                        # Server deployment script
├── Makefile                         # Dev convenience commands
└── .env.example                     # Environment variable template
```

### Running Tests

```bash
make test
# or
docker compose exec backend python -m pytest tests/ -v
```

### Adding a New Locale

1. Create TM files in `storage/amazon/tm/{locale_code}/`
2. Create reference files in the appropriate `storage/amazon/ref/` subdirectories
3. Add the locale code to `ALL_LOCALES` in `frontend/src/components/jobs/JobWizard/StepConfigure.tsx`
4. If it's a derived locale, add it to `DERIVED_LOCALE_CODES` in `backend/app/services/job_service.py`

---

## Deployment

### Using deploy.sh

```bash
# First time setup (clones repo, builds, migrates, seeds)
./deploy.sh --init

# Regular updates (pulls code, rebuilds changed services, migrates)
./deploy.sh

# Full rebuild (recreates all containers from scratch)
./deploy.sh --rebuild
```

### Docker Services

| Service | Internal Port | External Port | Description |
|---------|--------------|---------------|-------------|
| PostgreSQL | 5432 | 5492 | Database |
| Redis | 6379 | 6389 | Task broker |
| Backend (FastAPI) | 8000 | 8040 | API server |
| Celery Worker | - | - | 4 concurrent task workers |
| Frontend (Next.js) | 3000 | 3000 | SSR app |
| Nginx (prod only) | 80/443 | 80/443 | Reverse proxy + SSL |

### Cost Estimation

For a typical 53-line source brief (single-agent pipeline):

| | Per Locale | 12 Locales |
|---|-----------|------------|
| Single Agent (V25) | ~$0.30-0.50 | ~$3.60-6.00 |
| Processing time | ~2-4 min | ~2-4 min (parallel) |

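
The cost row is a straight multiplication of the per-locale range by the number of locales, which can be sketched as below for jobs that target fewer locales. `job_cost_range` is a hypothetical helper, and the per-locale range is the estimate quoted above for a 53-line brief, not a measured price for other brief sizes or models.

```python
from typing import Tuple

PER_LOCALE_COST_USD = (0.30, 0.50)  # range quoted above for a 53-line brief


def job_cost_range(
    locale_count: int, per_locale: Tuple[float, float] = PER_LOCALE_COST_USD
) -> Tuple[float, float]:
    """Return the (low, high) estimated cost in USD for a job."""
    low, high = per_locale
    return (round(low * locale_count, 2), round(high * locale_count, 2))
```

For all 12 locales this reproduces the 3.60 to 6.00 USD range in the table.
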