CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
The H&M Quality Control (HMQC) system is a modular Python application that performs quality control checks on PDF documents and static images (JPG, PNG, PSD) for H&M marketing assets. Specialized check modules validate each asset against criteria such as filename formatting, image dimensions, imprint verification, language validation, pricing, and censorship requirements, among others.
The system supports both development and production environments, selected through the `HM_QC_ENV` environment variable.
Key Components
- Core Module: `qc_module.py` - The main engine that loads and runs QC checks based on profiles.
- Environment Config: `config.py` - Handles dev/production environment paths and configuration.
- Launchers: Scripts in `/launchers` that execute the QC process, including CLI and Box hotfolder integration.
- Check Modules: Individual validation components in `/checks` that implement specific QC criteria.
  - PDF checks: Parse, filename parse, imprint check
  - Image checks: Image parse, filename parse, dimension check
  - Shared checks: Language validation, currency check, censorship (CEN only)
- Profiles: JSON configuration files in `/profiles` that define which checks to run and their parameters.
  - `HM.json` - PDF document checks
  - `HM_image.json` - Static image checks
- HTML Reporting: Generated reports showing check results for each processed file.
Development Commands
Environment Setup
Set up development environment (uses local ./tmp/ paths):
export HM_QC_ENV=dev
For production (uses `/tmp/HM_working/` and `/opt/QC/reports/`):
unset HM_QC_ENV
# or
export HM_QC_ENV=production
See DEV_SETUP.md for complete setup guide.
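As an illustration of the environment switch, the sketch below resolves the working and report directories from `HM_QC_ENV`. The directory values come from the Environment Configuration notes in this file; the `resolve_paths` function and the `ENV_PATHS` dictionary are hypothetical and may not match how `config.py` is actually structured.

```python
import os
from pathlib import Path

# Hypothetical mapping; the paths are the ones documented for each environment.
ENV_PATHS = {
    "dev": {
        "working_dir": Path("./tmp/HM_working"),
        "reports_dir": Path("./tmp/reports"),
    },
    "production": {
        "working_dir": Path("/tmp/HM_working"),
        "reports_dir": Path("/opt/QC/reports"),
    },
}


def resolve_paths(env=None):
    """Return the working and report directories for the active environment.

    An unset HM_QC_ENV falls back to production, matching the setup notes.
    """
    if env is None:
        env = os.environ.get("HM_QC_ENV", "production")
    if env not in ENV_PATHS:
        # Fail loudly on typos instead of silently using production paths.
        raise ValueError(f"Unknown HM_QC_ENV value: {env!r}")
    return ENV_PATHS[env]
```

Rejecting unknown values means a typo such as `HM_QC_ENV=devv` raises an error rather than quietly writing into production directories.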
Running QC Checks
Run QC checks on a PDF file:
export HM_QC_ENV=dev
python launchers/HM_launcher_CLI.py <path_to_pdf> ./tmp/reports/report.html
Run QC checks on an image file (JPG, PNG, PSD):
export HM_QC_ENV=dev
python launchers/HM_launcher_CLI.py <path_to_image> ./tmp/reports/report.html
The launcher automatically detects file type and uses the appropriate profile.
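The extension-based profile selection can be sketched as a simple lookup. The mapping mirrors the File Type Detection notes in this file; `select_profile`, the lowercase normalization of extensions, and the `profiles` directory default are assumptions rather than the launcher's confirmed behavior.

```python
from pathlib import Path

# Extension-to-profile mapping taken from the file type detection notes.
PROFILE_BY_EXTENSION = {
    ".pdf": "HM.json",
    ".jpg": "HM_image.json",
    ".jpeg": "HM_image.json",
    ".png": "HM_image.json",
    ".psd": "HM_image.json",
}


def select_profile(asset_path, profiles_dir="profiles"):
    """Pick the QC profile for an asset based on its (case-insensitive) extension."""
    suffix = Path(asset_path).suffix.lower()
    try:
        return Path(profiles_dir) / PROFILE_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```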
Box Integration
Run the Box hotfolder integration (polls for files and processes them):
python launchers/ford_qc_box_hotfolder_process.py
Architecture Notes
Check Module Pattern
All check modules must implement the standard run_check(config, context, check_id) function:
- `config`: Dict with the check's configuration parameters from the profile JSON.
- `context`: Shared context dictionary between checks containing results from previous checks.
- `check_id`: String identifier for the specific check being run.
Check modules should return a dictionary with at least a status key that can be:
- `passed`: Check succeeded
- `error`: Check failed with an error
- `skipped`: Check was intentionally skipped
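A minimal, hypothetical check module following this contract might look like the sketch below. It reads an earlier parse result from `context` and returns one of the three statuses. The `source_check` and `required_fields` config keys, the `fields` and `message` result keys, and the assumption that prior results are keyed by check ID are all illustrative, not the project's actual schema.

```python
# Hypothetical check module illustrating the run_check(config, context, check_id) contract.


def run_check(config, context, check_id):
    """Verify that an earlier parse step produced all required fields."""
    source = config.get("source_check", "filename_parse")  # assumed config key
    parsed = context.get(source)

    # Nothing to validate if the upstream check did not run or did not pass.
    if not parsed or parsed.get("status") != "passed":
        return {"status": "skipped", "message": f"{source} result unavailable"}

    fields = parsed.get("fields", {})  # assumed result key
    missing = [
        name for name in config.get("required_fields", []) if not fields.get(name)
    ]
    if missing:
        return {
            "status": "error",
            "message": f"{check_id}: missing {', '.join(missing)}",
        }
    return {"status": "passed"}
```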
Context Sharing
Results from each check are stored in a shared context dictionary, allowing subsequent checks to build on prior results.
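One way an engine loop could thread results through the shared context is sketched below, assuming a profile shaped like `{"checks": [{"id": ..., "module": ..., "config": {...}}]}` and check modules importable as `checks.<module>`. Both that profile layout and `run_profile` are hypothetical; the optional `registry` argument exists only so the loop can be exercised without the real modules.

```python
import importlib


def run_profile(profile, registry=None):
    """Run each check listed in a profile, sharing results through one context dict."""
    context = {}
    for entry in profile.get("checks", []):
        check_id = entry["id"]
        if registry is not None and entry["module"] in registry:
            run_check = registry[entry["module"]]
        else:
            # Assumed layout: checks/<module>.py exposing run_check().
            run_check = importlib.import_module(f"checks.{entry['module']}").run_check
        try:
            result = run_check(entry.get("config", {}), context, check_id)
        except Exception as exc:
            # Keep the standard result shape even when a check crashes.
            result = {"status": "error", "message": str(exc)}
        # Later checks can read this result via context[check_id].
        context[check_id] = result
    return context
```

Because each result is stored before the next check runs, the order of checks in the profile determines which prior results a given check can see.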
API Dependencies
- LlamaParse: Used for PDF parsing and text extraction
- DSPy: Used for AI-based image analysis and content validation
- BoxSDK: Used for Box integration in the hotfolder processor
Important Implementation Details
- Environment Configuration: The system uses `config.py` to manage paths based on the environment:
  - Development (`HM_QC_ENV=dev`): Uses `./tmp/HM_working/` and `./tmp/reports/`
  - Production (default): Uses `/tmp/HM_working/` and `/opt/QC/reports/`
  - All check modules and reporters use environment-aware paths
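- API Keys: The code contains hardcoded API keys for OpenAI and LlamaParse. These should be moved out of the source code (for example, into environment variables or a secrets manager) rather than kept in the repository.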
- File Type Detection: The CLI launcher automatically detects file type:
  - `.pdf` files → Uses `HM.json` profile (PDF checks)
  - `.jpg`, `.jpeg`, `.png`, `.psd` → Uses `HM_image.json` profile (image checks)
- Report Generation: The HTML reporter creates reports in environment-aware directories:
  - Development: `./tmp/reports/`
  - Production: `/opt/QC/reports/`
- Error Handling: The system uses a standardized error reporting format in check results, which should be maintained.
- AI Integration: Several checks use OpenAI's GPT models for complex validation tasks through the DSPy framework.
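A bare-bones sketch of the reporting step, assuming results arrive as a mapping from check ID to result dictionary. `write_report` and the `<asset>_report.html` naming are hypothetical (the CLI itself takes an explicit output path), but escaping every value with `html.escape` is worth keeping, since messages can contain text extracted from the asset.

```python
import html
from pathlib import Path


def write_report(results, asset_name, reports_dir):
    """Write a minimal HTML table of check results and return the report path."""
    rows = "\n".join(
        f"<tr><td>{html.escape(check_id)}</td>"
        f"<td>{html.escape(result.get('status', 'unknown'))}</td>"
        f"<td>{html.escape(result.get('message', ''))}</td></tr>"
        for check_id, result in results.items()
    )
    page = (
        f"<html><head><title>QC report: {html.escape(asset_name)}</title></head><body>"
        f"<h1>{html.escape(asset_name)}</h1>"
        f"<table><tr><th>Check</th><th>Status</th><th>Message</th></tr>\n{rows}\n</table>"
        "</body></html>"
    )
    # reports_dir would be the environment-aware directory (dev or production).
    out_dir = Path(reports_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{Path(asset_name).stem}_report.html"
    out_path.write_text(page, encoding="utf-8")
    return out_path
```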
Check-Specific Details
Imprint Check (HM_imprint_check.py)
- Validates the full reference code, including the country code
- Example: Expected `9000_10107-06_el-CY` vs detected `9000_10107-06_el-GR` → ERROR
- Extracts full reference codes with numeric prefixes (e.g., `9000_10107-06`)
- Skips OOH (out-of-home) files
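The reference comparison can be sketched with a regular expression. The pattern is inferred from the single example above (numeric prefix, job number and version, then language and country), so treat it as an assumption; the real check may need to handle other code shapes, and this sketch omits the OOH skip.

```python
import re

# Assumed shape, inferred from 9000_10107-06_el-CY:
# prefix "_" job "-" version "_" language "-" country.
REFERENCE_RE = re.compile(r"(?<!\d)(\d+_\d+-\d+)_([a-z]{2})-([A-Z]{2})(?![A-Za-z])")


def find_reference(text):
    """Return the first full reference code (including country) found in text."""
    match = REFERENCE_RE.search(text)
    return match.group(0) if match else None


def compare_imprint(expected, detected_text):
    """Compare the expected reference with the one detected in the imprint text."""
    detected = find_reference(detected_text)
    if detected is None:
        return {"status": "error", "message": "No reference code found in imprint"}
    if detected != expected:
        # A country mismatch (e.g. el-CY vs el-GR) is treated as an error.
        return {"status": "error", "message": f"Expected {expected} vs detected {detected}"}
    return {"status": "passed"}
```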
Censorship Check (HM_censorship.py)
- CRITICAL RULE: Only runs on CEN files
- GEN files are SKIPPED (no censorship check required)
- Standard market files are SKIPPED
- Uses DSPy with training images from `./supporting/censorship_trainset/`
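The CEN-only rule amounts to a gate that runs before any DSPy analysis. This sketch assumes CEN and GEN variants are marked by a standalone `CEN` or `GEN` token in the filename; the real classification may come from parsed filename fields instead, and `asset_variant` and `censorship_gate` are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: variant tokens are separated by "_", "-", "." or whitespace.
TOKEN_SPLIT_RE = re.compile(r"[_\-.\s]+")


def asset_variant(filename):
    """Classify an asset as 'CEN', 'GEN' or 'standard' from its filename tokens."""
    tokens = {t.upper() for t in TOKEN_SPLIT_RE.split(Path(filename).stem) if t}
    if "CEN" in tokens:
        return "CEN"
    if "GEN" in tokens:
        return "GEN"
    return "standard"


def censorship_gate(filename):
    """Return a skipped result for non-CEN files, or None when the check should run."""
    variant = asset_variant(filename)
    if variant != "CEN":
        return {"status": "skipped", "message": f"{variant} file: censorship check not required"}
    return None
```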
Image Dimension Check (HM_image_dimension_check.py)
- Validates that the actual pixel dimensions match the filename specification
- Example: A filename containing `1200x400` must contain an image that is exactly 1200×400 pixels
- Works with various formats: `1200x400`, `1080x1920px`, `21.6x27.9cm`
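Parsing the dimension token and comparing it against the image's pixel size could look like the sketch below. The regular expression covers the three formats listed above; `parse_dimensions` and `check_pixel_dimensions` are hypothetical, and because converting centimeters to pixels requires a DPI value this file does not specify, the sketch skips physical units rather than guessing.

```python
import re

# Covers 1200x400, 1080x1920px and 21.6x27.9cm; a missing unit is treated as pixels.
DIMENSION_RE = re.compile(
    r"(?<![\d.])(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)(px|cm)?(?![A-Za-z\d])", re.IGNORECASE
)


def parse_dimensions(filename):
    """Extract (width, height, unit) from a filename, or None if absent."""
    match = DIMENSION_RE.search(filename)
    if not match:
        return None
    width, height, unit = match.groups()
    return float(width), float(height), (unit or "px").lower()


def check_pixel_dimensions(filename, actual_size):
    """Compare filename dimensions with the image's actual (width, height)."""
    spec = parse_dimensions(filename)
    if spec is None:
        return {"status": "skipped", "message": "No dimensions in filename"}
    width, height, unit = spec
    if unit != "px":
        # Physical units need a DPI to convert to pixels; not handled here.
        return {"status": "skipped", "message": f"Dimensions given in {unit}"}
    expected = (int(width), int(height))
    if expected != tuple(actual_size):
        return {
            "status": "error",
            "message": f"Expected {expected[0]}x{expected[1]}, "
            f"got {actual_size[0]}x{actual_size[1]}",
        }
    return {"status": "passed"}
```

In practice `actual_size` could come from Pillow, where `Image.open(path).size` returns a `(width, height)` tuple.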
Filename Parsing
- PDF filenames: Uses GPT to parse complex H&M naming conventions
- Image filenames: Uses regex pattern matching for various formats (DOOH, OOH, Display, POS, DS)
- Both extract: reference, language, dimensions, format, year (if applicable)