amazon-transcreation/backend/app/pipeline/modules/domain_substitutor.py
DJP 98fa16bfc3 feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton
Full-stack Amazon AI Transcreation Platform with:
- FastAPI backend (async, PostgreSQL, Redis, Celery) with 11 DB tables
- JWT auth (SSO-ready abstract provider pattern)
- 6-agent pipeline orchestrator with deterministic modules
- Next.js 14 frontend with Amazon branding (Ember fonts, orange/dark theme)
- Job wizard, monitoring HUD, output review, admin screens
- 154 TM/reference files imported, 12 locales configured
- Docker Compose for all services

Agents 2-5 (TM retrieval, ranker, transcreator, compliance) are stubs
pending Phase 3 LLM integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:31:43 -04:00

"""Domain substitution for Amazon locales.

Maps the source domain, Amazon.co.uk, to the correct locale-specific domain.
Handles both full domain references and bare "Amazon" references:

- Emerging locales: bare "Amazon" stays as "Amazon".
- Non-emerging locales: bare "Amazon" is subject to locale-specific brand
  name rules (currently a pass-through in substitute_bare_amazon).
"""

# Full domain map for all 12 supported locales
DOMAIN_MAP: dict[str, str] = {
    "de_DE": "Amazon.de",
    "fr_FR": "Amazon.fr",
    "it_IT": "Amazon.it",
    "es_ES": "Amazon.es",
    "nl_NL": "Amazon.nl",
    "pl_PL": "Amazon.pl",
    "sv_SE": "Amazon.se",
    "pt_BR": "Amazon.com.br",
    "ja_JP": "Amazon.co.jp",
    "en_AU": "Amazon.com.au",
    "en_SG": "Amazon.sg",
    "ar_AE": "Amazon.ae",
}

# Emerging locales where bare "Amazon" stays as-is
EMERGING_LOCALES: set[str] = {
    "pl_PL",
    "sv_SE",
    "nl_NL",
    "ar_AE",
    "en_SG",
}

# Source domain to replace
SOURCE_DOMAIN = "Amazon.co.uk"
SOURCE_DOMAIN_LOWER = SOURCE_DOMAIN.lower()


def substitute_domains(text: str, locale_code: str) -> str:
    """Replace Amazon.co.uk with the locale-specific domain.

    Matching is case-insensitive. The text is returned unchanged when it is
    empty or when the locale has no entry in DOMAIN_MAP.

    Args:
        text: The text containing domain references.
        locale_code: The target locale code (e.g., "de_DE").

    Returns:
        Text with domains substituted.
    """
    if not text:
        return text

    target_domain = DOMAIN_MAP.get(locale_code)
    if target_domain is None:
        return text

    # Replace full domain (case-insensitive). Offsets found in the lowercased
    # copy are applied to `result`, which assumes lowercasing preserves the
    # string length (true for ASCII text).
    result = text
    idx = 0
    while True:
        lower_result = result.lower()
        pos = lower_result.find(SOURCE_DOMAIN_LOWER, idx)
        if pos == -1:
            break
        result = result[:pos] + target_domain + result[pos + len(SOURCE_DOMAIN):]
        # Resume after the inserted domain so the replacement is never rescanned.
        idx = pos + len(target_domain)
    return result


def substitute_bare_amazon(text: str, locale_code: str) -> str:
    """Handle bare 'Amazon' references based on locale type.

    For emerging locales: leave bare 'Amazon' as-is.
    For non-emerging locales: bare 'Amazon' is also kept as-is, since it is
    the brand name. Both branches currently return the text unchanged.

    Args:
        text: The text with potential bare Amazon references.
        locale_code: The target locale code.

    Returns:
        Text with bare Amazon references handled.
    """
    if not text:
        return text

    if locale_code in EMERGING_LOCALES:
        # Emerging locales: bare Amazon stays as Amazon
        return text

    # For non-emerging locales, bare "Amazon" (not followed by ".")
    # is kept as-is since it's the brand name
    return text


def get_locale_domain(locale_code: str) -> str | None:
    """Get the domain for a locale code.

    Args:
        locale_code: The target locale code.

    Returns:
        The domain string or None if locale is not supported.
    """
    return DOMAIN_MAP.get(locale_code)


def is_emerging_locale(locale_code: str) -> bool:
    """Check if a locale is classified as emerging."""
    return locale_code in EMERGING_LOCALES
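
The module's substitute_domains finds matches in a lowercased copy of the text and applies those offsets to the original string. A regex-based variant is one possible alternative, sketched here under stated assumptions rather than as the module's actual implementation. The names DEMO_DOMAIN_MAP, substitute_domains_re, BARE_AMAZON_RE, and count_bare_amazon are hypothetical and do not exist in the module. The sketch also shows one way to detect a bare "Amazon" (a whole word not followed by a dot and a letter), which is the case the comments in substitute_bare_amazon describe.

```python
import re

# Hypothetical demo data: a two-entry subset of the module's DOMAIN_MAP.
DEMO_DOMAIN_MAP: dict[str, str] = {
    "de_DE": "Amazon.de",
    "ja_JP": "Amazon.co.jp",
}

# re.escape makes the dots match literally; IGNORECASE gives the same
# case-insensitive behaviour without lowercasing the whole text.
SOURCE_DOMAIN_RE = re.compile(re.escape("Amazon.co.uk"), re.IGNORECASE)

# Assumed definition of a bare reference: the whole word "Amazon" that is
# not immediately followed by "." and a letter (so "Amazon.de" is skipped,
# but "Amazon." at the end of a sentence still counts).
BARE_AMAZON_RE = re.compile(r"\bAmazon\b(?!\.[A-Za-z])")


def substitute_domains_re(text: str, locale_code: str) -> str:
    """Regex-based sketch of full-domain substitution."""
    target = DEMO_DOMAIN_MAP.get(locale_code)
    if not text or target is None:
        return text
    # A function replacement inserts the target verbatim, with no
    # backslash or group-reference processing.
    return SOURCE_DOMAIN_RE.sub(lambda _match: target, text)


def count_bare_amazon(text: str) -> int:
    """Count bare 'Amazon' references under the assumed definition."""
    return len(BARE_AMAZON_RE.findall(text))
```

Calling substitute_domains_re("Shop on amazon.CO.UK now", "de_DE") returns "Shop on Amazon.de now", and an unsupported locale code leaves the text unchanged, mirroring the behaviour of the original function.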