
# Amazon AI Transcreation Platform

An AI-powered transcreation platform that adapts Amazon marketing copy across 12 European locales using Claude LLM agents. Replaces a manual LibreChat workflow with structured, one-click multi-locale processing, real-time monitoring, in-app review, and proper job/file management.

---

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [How It Works](#how-it-works)
- [The Agent Pipeline](#the-agent-pipeline)
- [Tech Stack](#tech-stack)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [Storage Layout](#storage-layout)
- [Supported Locales & Channels](#supported-locales--channels)
- [API Reference](#api-reference)
- [Database Schema](#database-schema)
- [User Guide](#user-guide)
- [Development](#development)
- [Deployment](#deployment)

---

## Architecture Overview

```
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                          AMAZON TRANSCREATION PLATFORM                         │
 └─────────────────────────────────────────────────────────────────────────────────┘

 ┌──────────────────┐         ┌──────────────────────────────────────────────────┐
 │                  │  HTTP   │                 FastAPI Backend                  │
 │   Next.js 14     │ ◄──────►│                                                  │
 │   Frontend       │  REST   │  ┌────────────┐  ┌──────────┐  ┌─────────────┐  │
 │                  │         │  │  Auth      │  │  Jobs    │  │  Output     │  │
 │  ┌────────────┐  │  Poll   │  │  Service   │  │  API     │  │  API        │  │
 │  │ Dashboard  │  │ (3sec)  │  └────────────┘  └────┬─────┘  └─────────────┘  │
 │  │ Job Wizard │  │         │                       │                          │
 │  │ Monitor    │  │         │  ┌────────────────────▼───────────────────────┐  │
 │  │ Review     │  │         │  │              Celery Task Queue             │  │
 │  │ Admin      │  │         │  │         (4 concurrent workers)             │  │
 │  └────────────┘  │         │  └────────────────────┬───────────────────────┘  │
 └──────────────────┘         │                       │                          │
                              │  ┌────────────────────▼───────────────────────┐  │
                              │  │           Pipeline Orchestrator             │  │
                              │  │                                             │  │
                              │  │   VALIDATE ► SINGLE_AGENT ► FORMAT ► DONE  │  │
                              │  │                                             │  │
                              │  │   (Single LLM call with full V25 prompt)   │  │
                              │  └─────────────────────────────────────────────┘  │
                              └──────────┬──────────────────────────┬─────────────┘
                                         │                          │
                              ┌──────────▼──────┐       ┌──────────▼──────────┐
                              │  PostgreSQL 16   │       │    Redis 7          │
                              │                  │       │                     │
                              │  11 tables       │       │  Celery broker      │
                              │  Jobs, Output,   │       │  Task results       │
                              │  Users, Audit    │       │  WebSocket pub/sub  │
                              └─────────────────┘       └─────────────────────┘

                              ┌─────────────────┐       ┌─────────────────────┐
                              │  Claude API      │       │  File Storage       │
                              │  (Anthropic)     │       │                     │
                              │                  │       │  /storage/amazon/   │
                              │  Single agent    │       │    tm/   (JSONL)    │
                              │  (1 LLM call)    │       │    ref/  (JSON)     │
                              └─────────────────┘       └─────────────────────┘
```

---

## How It Works

### The Workflow (End to End)

```
  USER                       PLATFORM                           CLAUDE API
   │                            │                                   │
   │  1. Create Job             │                                   │
   │  (campaign, locale,        │                                   │
   │   channel, programme)      │                                   │
   │ ──────────────────────────►│                                   │
   │                            │                                   │
   │  2. Upload Source xlsx     │                                   │
   │  (EN_GB lines, char        │                                   │
   │   limits, copy types)      │                                   │
   │ ──────────────────────────►│                                   │
   │                            │                                   │
   │  3. Launch                 │                                   │
   │ ──────────────────────────►│                                   │
   │                            │  Celery dispatches per-locale     │
   │                            │  tasks in PARALLEL (up to 4)      │
   │                            │ ─────────┐                        │
   │                            │          │                        │
   │  4. Monitor Progress       │  ┌───────▼────────┐              │
   │  (polls every 3 sec)       │  │ Agent Pipeline  │              │
   │ ◄─── 10% Loading Files ───│  │                 │              │
   │ ◄─── 20% Transcreating ───│  │ Single agent    │──── LLM ────►│
   │ ◄─── 90% Formatting ──────│  │ per locale      │◄── table ────│
   │ ◄── 100% Complete ────────│  │                 │              │
   │                            │  └───────┬─────────┘              │
   │                            │  ┌───────▼────────┐              │
   │  5. Review Output          │  │ Output saved   │              │
   │  (per-locale, per-line     │  │ to DB + xlsx   │              │
   │   with confidence tiers)   │  └────────────────┘              │
   │ ──────────────────────────►│                                   │
   │                            │                                   │
   │  6. Approve / Revise       │                                   │
   │ ──────────────────────────►│                                   │
   │                            │                                   │
   │  7. Download xlsx          │                                   │
   │ ◄──────────────────────────│                                   │
```

### What Happens When You Launch a Job

1. **Job created** with campaign name, programme (Retail/Prime/Brand), channel, one or more TM channels, and target locales (all 12 selectable in a single flat list)
2. **Source file uploaded** - an xlsx with English (en_GB) marketing copy, character limits, copy types, and creative guidance
3. **Launch** dispatches one Celery task per locale - up to 4 run in parallel
4. Each locale runs through the **single-agent pipeline** — one LLM call with the full V25 prompt (see below)
5. Real-time **progress updates** are stored in the database and polled by the frontend every 3 seconds
6. On completion, output is viewable in the **review interface** with confidence badges, backtranslations, and rationale
7. **Export** downloads a formatted xlsx (Tab 1: output table, Tab 2: linguistic summary)

---

## The Agent Pipeline

### Single-Agent Pipeline (Default)

The platform uses a **single consolidated LLM call** with the complete V25 Agent Instructions JSON as the system prompt. This replaces the earlier 6-agent sequential pipeline and produces better results by preserving inter-step context (TM reasoning, ranking rationale, cultural nuance) within a single prompt.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        PER-LOCALE PIPELINE                             │
│                                                                        │
│  ┌──────────────┐    Deterministic. Parses xlsx, loads glossary,       │
│  │  VALIDATE    │    blacklist, TOV, locale considerations, and        │
│  │  [no LLM]    │    date/percent format files. Builds PipelineContext. │
│  └──────┬───────┘    ~0.1 seconds                                      │
│         │ 10%                                                          │
│  ┌──────▼───────┐    Single LLM call using V25 Agent Instructions.     │
│  │  SINGLE      │    System prompt: full V25 JSON (899 lines).         │
│  │  AGENT       │    User message: job params, ALL source lines,       │
│  │  [1 LLM call]│    ALL TM entries (multiple channels), ALL reference │
│  │              │    files (glossary, blacklist, TOV, locale rules).   │
│  │              │                                                      │
│  │              │    The agent handles TM matching, ranking,           │
│  │              │    transcreation, and compliance in one pass.        │
│  │              │    Outputs a markdown table + linguistic summary.    │
│  └──────┬───────┘    ~2-4 min, ~$0.30-0.50                            │
│         │ 20-90%                                                       │
│  ┌──────▼───────┐    Deterministic. Generates output xlsx:             │
│  │  FORMAT      │    Tab 1: 11-column output table                     │
│  │  [no LLM]    │    Tab 2: Linguistic summary from the agent          │
│  └──────┬───────┘    ~0.1 seconds                                      │
│         │ 100%                                                         │
│         ▼                                                              │
│       DONE  (~2-4 min total per locale)                                │
└─────────────────────────────────────────────────────────────────────────┘
```

### Legacy 6-Agent Pipeline (Feature Flag)

The original 6-agent sequential pipeline is preserved behind a feature flag (`USE_SINGLE_AGENT=false`). It runs: VALIDATE → TM_RETRIEVE → RANK → TRANSCREATE → COMPLY (retry x3) → FORMAT → DONE. This path makes 2+ LLM calls (TM retrieval + transcreation in batches) and takes longer (~5.5 min per locale).
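A minimal sketch of how the `USE_SINGLE_AGENT` flag could select between the two stage sequences described above. The names `Stage`, `stages_for`, and `SINGLE_AGENT_PROGRESS` are hypothetical and this is not the contents of `orchestrator.py`; the progress values and labels come from the workflow diagram, and the legacy path's progress values are not documented here, so they are left out.

```python
from enum import Enum


class Stage(str, Enum):
    # Stage names taken from the pipeline descriptions in this README.
    VALIDATE = "VALIDATE"
    SINGLE_AGENT = "SINGLE_AGENT"
    TM_RETRIEVE = "TM_RETRIEVE"
    RANK = "RANK"
    TRANSCREATE = "TRANSCREATE"
    COMPLY = "COMPLY"
    FORMAT = "FORMAT"
    DONE = "DONE"


SINGLE_AGENT_STAGES = (Stage.VALIDATE, Stage.SINGLE_AGENT, Stage.FORMAT, Stage.DONE)
LEGACY_STAGES = (
    Stage.VALIDATE, Stage.TM_RETRIEVE, Stage.RANK,
    Stage.TRANSCREATE, Stage.COMPLY, Stage.FORMAT, Stage.DONE,
)

# Progress (%) and monitor label per single-agent stage, as in the workflow
# diagram (10% Loading Files, 20% Transcreating, 90% Formatting, 100% Complete).
SINGLE_AGENT_PROGRESS = {
    Stage.VALIDATE: (10, "Loading Files"),
    Stage.SINGLE_AGENT: (20, "Transcreating"),
    Stage.FORMAT: (90, "Formatting Output"),
    Stage.DONE: (100, "Complete"),
}


def stages_for(use_single_agent: bool) -> list[Stage]:
    """Return the ordered stage sequence selected by USE_SINGLE_AGENT."""
    return list(SINGLE_AGENT_STAGES if use_single_agent else LEGACY_STAGES)


if __name__ == "__main__":
    for stage in stages_for(True):
        pct, label = SINGLE_AGENT_PROGRESS[stage]
        print(f"{pct:>3}%  {stage.value:<12} {label}")
```

The COMPLY retry loop of the legacy path is not modelled, since its retry conditions are not described in this README.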

### Confidence Tiers and Option Counts

```
  TM Match Quality              Confidence      Options Generated
  ─────────────────────────     ────────────    ──────────────────
  Same channel + recent year    HIGH            1 option (anchored to TM)
  Cross-channel or older        MODERATE        2 options
  No TM match found             LOW             3 creative options
```
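In the single-agent pipeline this tiering is applied by the LLM following the V25 instructions, so the sketch below only restates the table as code. It is a hypothetical illustration: `TMMatch`, `classify`, and in particular the `recent_years` threshold are assumptions, because the README does not define what counts as a "recent year".

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Confidence(str, Enum):
    HIGH = "HIGH"
    MODERATE = "MODERATE"
    LOW = "LOW"


# Number of options generated per tier, from the table above.
OPTION_COUNT = {Confidence.HIGH: 1, Confidence.MODERATE: 2, Confidence.LOW: 3}


@dataclass
class TMMatch:
    channel: str  # channel the TM segment came from, e.g. "value"
    year: int     # campaign year of the TM segment


def classify(match: TMMatch | None, job_channel: str, current_year: int,
             recent_years: int = 1) -> Confidence:
    """Map TM match quality to a confidence tier (hypothetical recency rule)."""
    if match is None:
        return Confidence.LOW
    same_channel = match.channel.lower() == job_channel.lower()
    recent = current_year - match.year <= recent_years
    if same_channel and recent:
        return Confidence.HIGH
    # Cross-channel or older matches fall back to MODERATE.
    return Confidence.MODERATE
```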

### Voice Profiles (per Programme)

| Programme | Voice Attributes |
|-----------|-----------------|
| **Retail** | Real, Clear, Playful, Witty |
| **Prime** | Optimistic, Honest, Self-aware, Witty, Relatable |
| **Brand** | Authentic, Customer-obsessed, Intelligent, Warm, Understated |

### Deterministic Modules

The pipeline uses 9 pure-Python modules (no LLM) for specific tasks:

| Module | Purpose |
|--------|---------|
| `source_file_parser` | Parse xlsx, validate columns, detect display format |
| `tm_file_loader` | Parse JSONL TM files (compact + multi-field formats) |
| `ref_file_loader` | Load glossary, blacklist, TOV, locale considerations |
| `character_counter` | Unicode grapheme cluster counting (not `len()`) |
| `blacklist_scanner` | Exact + root-based forbidden term matching |
| `date_format_validator` | Validate date/percent formats per locale |
| `domain_substitutor` | Amazon.co.uk to locale-specific domain mapping |
| `line_break_normaliser` | Handle `\n` for TM queries vs Excel output |
| `excel_writer` | Generate formatted xlsx (Tab 1: output table, Tab 2: linguistic summary) |
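To show why `character_counter` avoids `len()`: one visible character can span several code points. Full grapheme segmentation follows Unicode UAX #29 (in Python it is usually done with the third-party `regex` module's `\X` pattern). The stdlib-only sketch below is a deliberately simplified approximation, not the project's module: it only merges combining marks, ZWJ emoji sequences, and CRLF into the preceding character. `count_chars` and `within_limit` are hypothetical names.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner used in emoji sequences


def count_chars(text: str) -> int:
    """Approximate user-perceived character count (simplified, not full UAX #29)."""
    count = 0
    prev = ""
    after_zwj = False
    for ch in text:
        extends = (
            unicodedata.category(ch).startswith("M")  # combining/enclosing/spacing marks
            or ch == ZWJ
            or after_zwj                               # character glued on by a ZWJ
            or (ch == "\n" and prev == "\r")           # CRLF counts as one character
        )
        # A mark at the very start of the string still forms its own cluster.
        if not extends or count == 0:
            count += 1
        after_zwj = ch == ZWJ
        prev = ch
    return count


def within_limit(text: str, char_limit: int) -> bool:
    """Check a translation against the source file's Char Limit."""
    return count_chars(text) <= char_limit
```

For example, `"cafe\u0301"` (an `e` followed by a combining acute accent) has a `len()` of 5 but counts as 4 characters here.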

---

## Tech Stack

```
┌───────────────────────────────────────────────────────────────┐
│ FRONTEND              │ BACKEND              │ INFRASTRUCTURE │
├───────────────────────┼──────────────────────┼────────────────┤
│ Next.js 14 (App Rtr)  │ Python 3.12          │ Docker Compose │
│ React 18              │ FastAPI              │ PostgreSQL 16  │
│ TypeScript 5.4        │ SQLAlchemy 2 (async) │ Redis 7        │
│ Tailwind CSS 3.4      │ Alembic (migrations) │ Nginx (prod)   │
│ Radix UI primitives   │ Celery 5.4           │                │
│ Recharts (charts)     │ Pydantic v2          │                │
│ Axios                 │ Anthropic SDK        │                │
│ Lucide (icons)        │ openpyxl             │                │
│                       │ bcrypt + JWT         │                │
└───────────────────────┴──────────────────────┴────────────────┘
```

---

## Getting Started

### Prerequisites

- Docker and Docker Compose v2
- An Anthropic API key (for Claude)
- Node.js 18+ (for frontend builds)
- Git

### Quick Start

```bash
# 1. Clone the repository
git clone git@bitbucket.org:zlalani/amazon-transcreation.git
cd amazon-transcreation

# 2. Copy environment file and set your API key
cp .env.example .env
# Edit .env and set:
#   ANTHROPIC_API_KEY=sk-ant-your-key-here
#   JWT_SECRET_KEY=a-random-secret-string

# 3. Start all services
make up
# or: docker compose up -d

# 4. Run database migrations
make migrate

# 5. Seed default data (Amazon client + test users)
make seed

# 6. Build the frontend
cd frontend && npm install && npm run build && cd ..

# 7. Access the application
# Backend API: http://localhost:8040/api/v1
# Frontend:    http://localhost:3000
```

### Default Users (after seeding)

| Email | Role | Password |
|-------|------|----------|
| admin@amazon.com | Admin | admin123 |
| manager@amazon.com | TM Manager | admin123 |
| reviewer@amazon.com | Reviewer | admin123 |

### Makefile Commands

| Command | Description |
|---------|-------------|
| `make up` | Start all Docker services |
| `make down` | Stop all services |
| `make build` | Rebuild Docker images |
| `make migrate` | Run database migrations |
| `make seed` | Seed default client and test users |
| `make test` | Run backend test suite |
| `make shell` | Open a bash shell in the backend container |
| `make logs` | Stream all container logs |
| `make restart` | Restart backend + Celery worker |
| `make db-shell` | Open PostgreSQL interactive shell |
| `make redis-cli` | Open Redis CLI |

---

## Configuration

All configuration is via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql+asyncpg://...` | PostgreSQL async connection string |
| `REDIS_URL` | `redis://redis:6379/0` | Redis connection for Celery + pub/sub |
| `ANTHROPIC_API_KEY` | *(required)* | Your Anthropic API key for Claude |
| `JWT_SECRET_KEY` | *(required)* | Secret key for JWT token signing |
| `JWT_ALGORITHM` | `HS256` | JWT signing algorithm |
| `JWT_EXPIRY_HOURS` | `8` | Access token expiry in hours |
| `STORAGE_ROOT` | `/storage` | Root path for file storage |
| `LLM_MODEL` | `claude-sonnet-4-6` | Default Claude model (overridden per-job via UI: `claude-sonnet-4-6` or `claude-opus-4-6`) |
| `USE_SINGLE_AGENT` | `true` | Use single-agent pipeline (`true`) or legacy 6-agent (`false`) |
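The backend loads these values with pydantic-settings in `config.py`. To stay dependency-free, the sketch below reproduces the table with only the standard library, so it illustrates the defaults and required keys rather than the project's actual loader. `Settings`, `load_settings`, and `parse_bool` are hypothetical names, and `DATABASE_URL` is treated as required because its full default is elided above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# DATABASE_URL is required in this sketch because the README elides its default.
REQUIRED = ("DATABASE_URL", "ANTHROPIC_API_KEY", "JWT_SECRET_KEY")


def parse_bool(raw: str) -> bool:
    """Accept common spellings of a boolean environment flag."""
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    anthropic_api_key: str
    jwt_secret_key: str
    jwt_algorithm: str
    jwt_expiry_hours: int
    storage_root: str
    llm_model: str
    use_single_agent: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expiry_hours=int(env.get("JWT_EXPIRY_HOURS", "8")),
        storage_root=env.get("STORAGE_ROOT", "/storage"),
        llm_model=env.get("LLM_MODEL", "claude-sonnet-4-6"),
        use_single_agent=parse_bool(env.get("USE_SINGLE_AGENT", "true")),
    )
```

Passing the environment as a mapping keeps the loader testable without touching `os.environ`.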

---

## Storage Layout

```
storage/amazon/
├── tm/                                 # Translation Memory files (JSONL)
│   ├── de-DE/
│   │   ├── flat_MASS_de-de.json        # Mass channel TM
│   │   ├── flat_value_de-de.json       # Value channel TM
│   │   ├── flat_Onsite_de-de.json      # Onsite channel TM
│   │   ├── flat_Outbound_de-de.json    # Outbound channel TM
│   │   ├── flat_UEFA_de-de.json        # UEFA channel TM
│   │   └── ...                         # + BDA, DoubleDonut, EUSelection, etc.
│   ├── fr-FR/
│   │   └── ...
│   └── ... (12 locale directories)
│
└── ref/                                # Reference files (JSON)
    ├── glossary/                       # Locale-specific term glossaries
    │   ├── de_DE_glossary.json
    │   └── ...
    ├── blacklist/                      # Forbidden terms per locale
    │   ├── de_DE_blacklist.json
    │   └── ...
    ├── tov_global/                     # Global Tone of Voice guidelines
    │   └── Amazon_TOV_Guidelines_for_Transcreation_290224.json
    ├── tov_supplement/                 # Supplementary TOV (de-DE, de-AT)
    │   └── DE_AT_TOV_Guidelines.json
    ├── locale_considerations/          # Locale-specific rules and notes
    │   └── ...
    └── date_pct_formats/               # Approved date/percentage formats
        └── ...
```

### TM File Format (JSONL)

Each line is a JSON object. Two formats are supported:

**Compact format** (existing files):
```json
{"t": "Value Q1 24 Radio 001 VO de-de As Sophie opened the door... Als Sophie die Tuer oeffnete..."}
```

**Multi-field format**:
```json
{"seg_key": "Value Q1 24 Radio 001", "en": "As Sophie opened...", "lc": "de-de", "tx": "Als Sophie...", "nt": "VO", "channel": "value"}
```
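A minimal sketch of reading both formats line by line, assuming hypothetical names `TMEntry` and `parse_tm_lines` (this is not the `tm_file_loader` module). The compact `t` string packs segment key, note, locale, source, and target into one value with no reliable delimiter, so the sketch passes it through unsplit rather than guessing at boundaries.

```python
from __future__ import annotations

import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class TMEntry:
    raw: str                    # the original JSONL line
    text: str | None = None     # compact format: the packed "t" string
    seg_key: str | None = None
    en: str | None = None
    locale: str | None = None
    target: str | None = None
    note: str | None = None
    channel: str | None = None


def parse_tm_lines(lines: Iterable[str]) -> Iterator[TMEntry]:
    """Yield one TMEntry per non-blank JSONL line, in either supported format."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if "t" in obj:
            # Compact format: keep the packed string as-is for the LLM context.
            yield TMEntry(raw=line, text=obj["t"])
        elif "en" in obj and "tx" in obj:
            yield TMEntry(
                raw=line,
                seg_key=obj.get("seg_key"),
                en=obj["en"],
                locale=obj.get("lc"),
                target=obj["tx"],
                note=obj.get("nt"),
                channel=obj.get("channel"),
            )
        else:
            raise ValueError(f"line {lineno}: unrecognised TM record")
```

Typical use is `list(parse_tm_lines(open(path, encoding="utf-8")))`, since a text file iterates line by line.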

---

## Supported Locales & Channels

### Locales (12)

| Code | Language | Notes |
|------|----------|-------|
| de-DE | German (Germany) | Shares TM/TOV supplement with de-AT |
| de-AT | German (Austria) | Shares TM/TOV supplement with de-DE |
| fr-FR | French (France) | Shares TM with fr-BE |
| fr-BE | French (Belgium) | Shares TM with fr-FR |
| es-ES | Spanish (Spain) | Shares TM with ca-ES |
| ca-ES | Catalan (Spain) | Enforced as Catalan, not Spanish |
| it-IT | Italian (Italy) | - |
| nl-NL | Dutch (Netherlands) | Independent from nl-BE |
| nl-BE | Dutch (Belgium) | Independent from nl-NL |
| pl-PL | Polish (Poland) | - |
| pt-PT | Portuguese (Portugal) | - |
| sv-SE | Swedish (Sweden) | - |

### Channels & TM Files

Jobs can select **multiple TM channels** to load into the agent's context. The campaign channel is auto-selected, and users can add additional TM files for cross-channel reference (e.g. MASS as a fallback alongside the primary channel).

| Channel | TM File Pattern |
|---------|----------------|
| Mass | `flat_MASS_{lc}.json` |
| Value | `flat_value_{lc}.json` |
| Onsite | `flat_Onsite_{lc}.json` |
| Outbound | `flat_Outbound_{lc}.json` |
| UEFA | `flat_UEFA_{lc}.json` |
| BDA | `flat_BDA_{lc}.json` |
| DoubleDonut | `flat_DoubleDonut_{lc}.json` |
| EUSelection | `flat_EUSelection_{lc}.json` |
| PrimeDualBenefit | `flat_PrimeDualBenefit_{lc}.json` |
| PrimeGourmetGuard | `flat_PrimeGourmetGuard_{lc}.json` |
| PrimeMidfunnel | `flat_PrimeMidfunnel_{lc}.json` |
| PrimeSpeed | `flat_PrimeSpeed_{lc}.json` |
| TheKiss | `flat_TheKiss_{lc}.json` |
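Since the file-name prefixes use mixed casing (`MASS`, `value`, `Onsite`), a lookup table is safer than string formatting. The sketch below resolves TM paths from the storage layout and the table above; `CHANNEL_PREFIX`, `tm_path`, and `tm_paths_for_job` are hypothetical names. It does not model TM sharing between locales (such as de-AT using de-DE material), because how that sharing is resolved on disk is not described here.

```python
from __future__ import annotations

from pathlib import Path

# File-name prefixes copied from the channel table (note the mixed casing).
CHANNEL_PREFIX = {
    "mass": "MASS", "value": "value", "onsite": "Onsite", "outbound": "Outbound",
    "uefa": "UEFA", "bda": "BDA", "doubledonut": "DoubleDonut",
    "euselection": "EUSelection", "primedualbenefit": "PrimeDualBenefit",
    "primegourmetguard": "PrimeGourmetGuard", "primemidfunnel": "PrimeMidfunnel",
    "primespeed": "PrimeSpeed", "thekiss": "TheKiss",
}


def tm_path(storage_root: str | Path, locale_code: str, channel: str,
            client: str = "amazon") -> Path:
    """Resolve storage/<client>/tm/<locale>/flat_<Prefix>_<lc>.json."""
    prefix = CHANNEL_PREFIX[channel.lower()]  # KeyError for an unknown channel
    lc = locale_code.lower()                  # {lc} in file names is lowercase
    return Path(storage_root) / client / "tm" / locale_code / f"flat_{prefix}_{lc}.json"


def tm_paths_for_job(storage_root: str | Path, locale_code: str,
                     channels: list[str]) -> list[Path]:
    """Paths for every selected TM channel, de-duplicated in selection order."""
    paths: list[Path] = []
    for channel in channels:
        path = tm_path(storage_root, locale_code, channel)
        if path not in paths:
            paths.append(path)
    return paths
```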

### Programmes & Voice Profiles

| Programme | Voice | Description |
|-----------|-------|-------------|
| Retail | Real, Clear, Playful, Witty | Everyday value messaging |
| Prime | Optimistic, Honest, Self-aware, Witty, Relatable | Prime membership benefits |
| Brand | Authentic, Customer-obsessed, Intelligent, Warm, Understated | Brand-level communications |

---

## API Reference

### Authentication

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/auth/login` | Login (email + password) |
| POST | `/api/v1/auth/refresh` | Refresh access token |
| GET | `/api/v1/auth/me` | Get current user claims |

### Jobs

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/jobs` | Create job |
| GET | `/api/v1/jobs` | List jobs (filterable) |
| GET | `/api/v1/jobs/{id}` | Get job detail + locale instances |
| DELETE | `/api/v1/jobs/{id}` | Delete job (admin only) |
| PUT | `/api/v1/jobs/{id}/source` | Upload source xlsx |
| POST | `/api/v1/jobs/{id}/supplementary` | Upload supplementary file |
| POST | `/api/v1/jobs/{id}/launch` | Launch processing |
| POST | `/api/v1/jobs/{id}/cancel` | Cancel job |
| POST | `/api/v1/jobs/{id}/locales/{code}/rerun` | Re-run single locale |
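A minimal standard-library sketch of addressing these endpoints from a script. It builds `urllib.request.Request` objects without sending them (pass one to `urllib.request.urlopen` to send). Assumptions: the JWT is sent as a `Bearer` token, and the login body uses `email` and `password` keys; the login response shape is not documented here, so extracting the token is left out. `build_request` is a hypothetical helper.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8040/api/v1"  # local backend address from Quick Start


def build_request(path: str, *, token: str | None = None, method: str = "GET",
                  payload: dict | None = None) -> urllib.request.Request:
    """Build (but do not send) a request against the REST API."""
    headers: dict[str, str] = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Assumption: the JWT access token uses the Bearer scheme.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


if __name__ == "__main__":
    job_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
    token = "<access-token>"  # placeholder: obtained from POST /auth/login
    reqs = [
        build_request("/auth/login", method="POST",
                      payload={"email": "admin@amazon.com", "password": "admin123"}),
        build_request(f"/jobs/{job_id}/launch", token=token, method="POST"),
        build_request(f"/jobs/{job_id}", token=token),
        build_request(f"/jobs/{job_id}/locales/de-DE/rerun", token=token, method="POST"),
    ]
    for req in reqs:
        print(req.get_method(), req.full_url)
```

Uploading the source xlsx (`PUT /jobs/{id}/source`) is omitted because its body encoding is not specified in this README.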

### Output & Feedback

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/v1/output/jobs/{id}/locales/{code}/preview` | Output preview |
| GET | `/api/v1/output/jobs/{id}/locales/{code}/export` | Download xlsx |
| POST | `/api/v1/output/feedback` | Submit feedback |
| GET | `/api/v1/output/feedback/{output_id}` | Get feedback |

### File Management

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/files/tm` | Upload TM file |
| GET | `/api/v1/files/tm` | List TM files |
| DELETE | `/api/v1/files/tm/{id}` | Delete TM file |
| POST | `/api/v1/files/reference` | Upload reference file |
| GET | `/api/v1/files/reference` | List reference files |
| DELETE | `/api/v1/files/reference/{id}` | Delete reference file |

### Admin & Reports

| Method | Endpoint | Description |
|--------|----------|-------------|
| CRUD | `/api/v1/users` | User management (admin) |
| CRUD | `/api/v1/clients` | Client management (admin) |
| GET | `/api/v1/audit/logs` | Audit trail |
| GET | `/api/v1/reports/usage` | Usage statistics |
| GET | `/api/v1/reports/tokens` | Token cost breakdown |
| GET | `/api/v1/reports/quality` | Quality metrics |

---

## Database Schema

```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│   clients    │     │    users     │     │  user_clients    │
│──────────────│     │──────────────│     │──────────────────│
│ id (PK)      │◄────│ id (PK)      │     │ user_id (FK)     │
│ name         │     │ email        │     │ client_id (FK)   │
│ settings     │     │ name         │     │ role_override    │
└──────┬───────┘     │ password_hash│     └──────────────────┘
       │             │ role (enum)  │
       │             │ status       │
       │             └──────┬───────┘
       │                    │
┌──────▼───────┐            │
│    jobs      │            │
│──────────────│◄───────────┘ (created_by)
│ id (PK)      │
│ client_id    │     ┌──────────────────┐
│ campaign_name│     │  source_lines    │
│ programme    │     │──────────────────│
│ channel      │     │ id (PK)          │
│ tm_channels  │     │ job_id (FK)      │
│ status       │◄────│ en_gb            │
│ job_type     │     │ copy_type        │
└──────┬───────┘     │ char_limit       │
       │             │ is_display_format│
┌──────▼───────────┐ └──────────────────┘
│ locale_instances │
│──────────────────│  ┌──────────────────┐
│ id (PK)          │  │  output_rows     │
│ job_id (FK)      │  │──────────────────│
│ locale_code      │  │ id (PK)          │
│ status           │◄─│ instance_id (FK) │
│ progress         │  │ line_id (FK)     │
│ current_stage    │  │ confidence_tier  │
│ token_usage      │  │ option_1,2,3     │   ┌──────────────┐
│ started_at       │  │ backtranslation  │   │  feedback    │
│ completed_at     │  │ rationale        │   │──────────────│
└──────────────────┘  │ char_counts      │◄──│ output_id    │
                      └──────────────────┘   │ user_id      │
                                             │ flag_type    │
┌──────────────────┐  ┌──────────────────┐   │ comment      │
│ tm_file_registry │  │ reference_files  │   └──────────────┘
│──────────────────│  │──────────────────│
│ client_id        │  │ client_id        │   ┌──────────────┐
│ locale_code      │  │ file_type        │   │ audit_logs   │
│ channel          │  │ locale_scope     │   │──────────────│
│ filename         │  │ filename         │   │ user_id      │
│ segment_count    │  │ file_path        │   │ action       │
└──────────────────┘  └──────────────────┘   │ entity_type  │
                                             │ details      │
                      ┌──────────────────┐   └──────────────┘
                      │ token_usage_logs │
                      │──────────────────│
                      │ instance_id      │
                      │ agent_name       │
                      │ input_tokens     │
                      │ output_tokens    │
                      │ estimated_cost   │
                      └──────────────────┘
```

11 tables total. All primary keys are UUIDs. Cascading deletes from jobs down through locale_instances, output_rows, and source_lines.

---

## User Guide

### Creating a Job

1. Navigate to **Jobs > New Job**
2. Fill in the job details:
   - **Client** - Select the client (e.g. Amazon)
   - **Campaign Name** - Name of the campaign (e.g. "DDA 26 BFW")
   - **Programme** - Retail, Prime, or Brand (determines voice profile)
   - **Channel** - Campaign channel (e.g. Value, Mass, Onsite, Outbound)
   - **TM Files** - Select one or more TM channels to load (campaign channel auto-selected; add MASS as fallback or other channels for cross-reference)
   - **Locales** - All 12 locales in a single flat grid (main and derived locales are auto-classified — no separate "Job Type" selection needed)
3. Upload the **source xlsx** file with columns:
   - `EN_GB` (required) - English source copy
   - `Copy Type` - Type of copy (headline, body, CTA, script, etc.)
   - `Creative Guidance` - Context or instructions for the transcreator
   - `Visual Ref` - Reference to visual assets
   - `Char Limit` - Maximum character count for the translation
4. Optionally add a **context/override prompt** with special instructions
5. Review the summary and click **Launch**
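A stdlib-only sketch of the header check that `source_file_parser` performs on this file. Matching column names case- and whitespace-insensitively is an assumption, as are the helper names `map_source_columns` and `parse_char_limit`. With openpyxl, the header tuple could be read via `next(ws.iter_rows(max_row=1, values_only=True))`.

```python
from __future__ import annotations

from collections.abc import Sequence

REQUIRED_COLUMNS = ("EN_GB",)
OPTIONAL_COLUMNS = ("Copy Type", "Creative Guidance", "Visual Ref", "Char Limit")


def _normalise(name: object) -> str:
    # Collapse internal whitespace and ignore case (assumed matching rule).
    return " ".join(str(name).split()).casefold()


def map_source_columns(header: Sequence[object]) -> dict[str, int]:
    """Map known column names to their 0-based index; raise if EN_GB is missing."""
    positions: dict[str, int] = {}
    for idx, cell in enumerate(header):
        if cell is not None:
            positions.setdefault(_normalise(cell), idx)  # first occurrence wins
    mapping = {
        name: positions[_normalise(name)]
        for name in REQUIRED_COLUMNS + OPTIONAL_COLUMNS
        if _normalise(name) in positions
    }
    missing = [name for name in REQUIRED_COLUMNS if name not in mapping]
    if missing:
        raise ValueError(f"source file is missing required column(s): {missing}")
    return mapping


def parse_char_limit(value: object) -> int | None:
    """Blank cells mean no limit; Excel numbers may arrive as floats (35.0)."""
    if value is None or str(value).strip() == "":
        return None
    return int(float(str(value)))
```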

### Monitoring Progress

Once launched, the job monitoring page shows real-time updates:
- Per-locale progress bars (0-100%)
- Current stage: Loading Files > Transcreating > Formatting Output > Complete
- Token usage and elapsed time
- Error details if any locale fails

Multiple locales process in **parallel** (up to 4 at once).
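The monitor page polls the job every 3 seconds. A scripted equivalent might look like the sketch below, with `fetch_job` standing in for a call to `GET /api/v1/jobs/{id}`. The response keys (`locale_instances`, `status`, `progress`) mirror the database schema but are assumptions, as are the terminal status values; `poll_job` is a hypothetical helper.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Hypothetical terminal status values; check the backend's actual status enum.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_job(
    fetch_job: Callable[[], dict[str, Any]],
    *,
    interval: float = 3.0,
    max_polls: int = 400,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_job every `interval` seconds until every locale is terminal."""
    for _ in range(max_polls):
        job = fetch_job()
        instances = job.get("locale_instances", [])
        if instances and all(inst.get("status") in TERMINAL_STATUSES for inst in instances):
            return job
        sleep(interval)
    raise TimeoutError(f"job still running after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real waiting.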

### Reviewing Output

Click **Preview** on a completed locale to open the review interface:
- Each source line shows its **confidence tier** (High / Moderate / Low)
- **High confidence**: 1 option anchored to a TM match
- **Moderate confidence**: 2 creative options
- **Low confidence**: 3 creative options
- Every option includes a **backtranslation** and **character count**
- Expandable **rationale** explains the translation choices and TM citations
- Feedback buttons: **Approve**, **Needs Revision**, or add a **Comment**
- **Export** button downloads the formatted xlsx (Tab 1: output table, Tab 2: linguistic summary explaining the agent's approach and cultural choices)

### Admin Features

Admins have access to additional pages:
- **User Management** - Create, edit, and deactivate users
- **Client Management** - Manage client configurations
- **TM Files** - Upload and manage Translation Memory files
- **Reference Files** - Manage glossaries, blacklists, TOV guidelines
- **Reports** - Usage statistics, token costs, quality metrics
- **Audit Logs** - Complete trail of all system actions
- **Delete Jobs** - Remove old jobs (with confirmation)

---

## Development

### Project Structure

```
amazon-transcreation/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI app factory
│   │   ├── config.py                  # pydantic-settings env loader
│   │   ├── dependencies.py            # DI: get_db, get_current_user
│   │   ├── auth/                      # JWT auth (SSO-ready provider pattern)
│   │   ├── api/v1/                    # REST endpoint routers
│   │   ├── models/                    # SQLAlchemy models (11 tables)
│   │   ├── schemas/                   # Pydantic request/response models
│   │   ├── services/                  # Business logic layer
│   │   ├── pipeline/
│   │   │   ├── orchestrator.py        # State machine (single-agent or legacy 6-agent)
│   │   │   ├── contracts.py           # Inter-agent Pydantic models
│   │   │   ├── agents/
│   │   │   │   ├── agent_single.py    # Consolidated single-agent (V25 prompt)
│   │   │   │   ├── agent_1_validator.py  # Deterministic file/input validation
│   │   │   │   ├── agent_6_formatter.py  # Excel output generation
│   │   │   │   ├── agent_2-5_*.py     # Legacy agents (behind feature flag)
│   │   │   │   └── prompts/
│   │   │   │       └── v25_instructions.json  # V25 Agent Instructions (system prompt)
│   │   │   └── modules/              # 9 deterministic modules
│   │   ├── tasks/                     # Celery task definitions
│   │   ├── llm/                       # Anthropic SDK wrapper + retry
│   │   └── ws/                        # WebSocket handler + manager
│   ├── alembic/                       # Database migrations
│   └── tests/
├── frontend/
│   └── src/
│       ├── app/                       # Next.js App Router pages
│       ├── components/                # React UI components
│       ├── hooks/                     # Custom React hooks
│       └── lib/                       # API client, types, utilities
├── storage/                           # Runtime file storage (mounted volume)
├── docker-compose.yml                 # Development services
├── docker-compose.prod.yml            # Production services
├── deploy.sh                          # Server deployment script
├── Makefile                           # Dev convenience commands
└── .env.example                       # Environment variable template
```

### Running Tests

```bash
make test
# or
docker compose exec backend python -m pytest tests/ -v
```

### Adding a New Locale

1. Create TM files in `storage/amazon/tm/{locale_code}/`
2. Create reference files in the appropriate `storage/amazon/ref/` subdirectories
3. Add the locale code to `ALL_LOCALES` in `frontend/src/components/jobs/JobWizard/StepConfigure.tsx`
4. If it's a derived locale, add it to `DERIVED_LOCALE_CODES` in `backend/app/services/job_service.py`
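Before editing the frontend and backend constants in steps 3 and 4, a quick check that the storage files from steps 1 and 2 are in place can save a failed run. The sketch below is hypothetical (`missing_locale_files` is not part of the codebase). It follows the naming shown in the storage layout (`tm/<locale>/flat_*.json`, `ref/glossary/de_DE_glossary.json`) and checks only the glossary and blacklist, since the naming of the other per-locale reference files is not shown.

```python
from __future__ import annotations

from pathlib import Path


def missing_locale_files(storage_root: str | Path, locale_code: str,
                         client: str = "amazon") -> list[Path]:
    """Return expected TM/reference paths that are absent for a locale."""
    base = Path(storage_root) / client
    ref_tag = locale_code.replace("-", "_")  # de-DE -> de_DE, as in de_DE_glossary.json
    missing: list[Path] = []
    tm_dir = base / "tm" / locale_code
    if not tm_dir.is_dir() or not any(tm_dir.glob("flat_*.json")):
        missing.append(tm_dir)
    for kind in ("glossary", "blacklist"):
        path = base / "ref" / kind / f"{ref_tag}_{kind}.json"
        if not path.is_file():
            missing.append(path)
    return missing
```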

---

## Deployment

### Using deploy.sh

```bash
# First time setup (clones repo, builds, migrates, seeds)
./deploy.sh --init

# Regular updates (pulls code, rebuilds changed services, migrates)
./deploy.sh

# Full rebuild (recreates all containers from scratch)
./deploy.sh --rebuild
```

### Docker Services

| Service | Internal Port | External Port | Description |
|---------|--------------|---------------|-------------|
| PostgreSQL | 5432 | 5492 | Database |
| Redis | 6379 | 6389 | Task broker |
| Backend (FastAPI) | 8000 | 8040 | API server |
| Celery Worker | - | - | 4 concurrent task workers |
| Frontend (Next.js) | 3000 | 3000 | SSR app |
| Nginx (prod only) | 80/443 | 80/443 | Reverse proxy + SSL |

### Cost Estimation

For a typical 53-line source brief (single-agent pipeline):

| Metric | Per Locale | 12 Locales |
|---|-----------|------------|
| Single Agent (V25) cost | ~$0.30-0.50 | ~$3.60-6.00 |
| Processing time | ~2-4 min | ~6-12 min (3 waves of 4 parallel locales) |
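The 12-locale figures follow from the per-locale ranges and the 4-worker Celery concurrency: cost scales with the number of locales, while wall time scales with the number of 4-locale waves. A rough estimator, assuming ideal scheduling and no API rate limiting (`estimate_batch` is a hypothetical helper):

```python
from __future__ import annotations

import math


def estimate_batch(
    n_locales: int,
    cost_per_locale: tuple[float, float],
    minutes_per_locale: tuple[float, float],
    workers: int = 4,
) -> dict[str, object]:
    """Rough cost and wall-time range for one job, assuming ideal scheduling."""
    waves = math.ceil(n_locales / workers)  # locales run `workers` at a time
    low_cost, high_cost = cost_per_locale
    low_min, high_min = minutes_per_locale
    return {
        "cost_usd": (n_locales * low_cost, n_locales * high_cost),
        "waves": waves,
        "wall_minutes": (waves * low_min, waves * high_min),
    }


if __name__ == "__main__":
    # 12 locales / 4 workers = 3 waves, so roughly 6-12 minutes of wall time.
    print(estimate_batch(12, (0.30, 0.50), (2, 4)))
```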