hm_qc/CLAUDE.md
2025-11-13 13:41:31 +02:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

The H&M Quality Control (HMQC) system is a modular Python application that performs quality control checks on PDF documents and static images (JPG, PNG, PSD) for H&M marketing assets. Specialized check modules validate assets against criteria such as filename formatting, image dimensions, imprint verification, language validation, pricing, censorship requirements, and more.

The system supports both development and production environments through environment configuration.

Key Components

  • Core Module: qc_module.py - The main engine that loads and runs QC checks based on profiles.
  • Environment Config: config.py - Handles dev/production environment paths and configuration.
  • Launchers: Scripts in /launchers that execute the QC process, including CLI and Box hotfolder integration.
  • Check Modules: Individual validation components in /checks that implement specific QC criteria.
    • PDF checks: Parse, filename parse, imprint check
    • Image checks: Image parse, filename parse, dimension check
    • Shared checks: Language validation, currency check, censorship (CEN only)
  • Profiles: JSON configuration files in /profiles that define which checks to run and their parameters.
    • HM.json - PDF document checks
    • HM_image.json - Static image checks
  • HTML Reporting: Generated reports showing check results for each processed file.

Development Commands

Environment Setup

Set up the development environment (uses local ./tmp/ paths):

export HM_QC_ENV=dev

For production (the default; uses /tmp/HM_working/ and /opt/QC/reports/):

unset HM_QC_ENV
# or
export HM_QC_ENV=production

See DEV_SETUP.md for complete setup guide.
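
As a rough illustration of how the HM_QC_ENV switch could map to paths, here is a minimal sketch that uses the directories listed under Important Implementation Details. The function names get_environment and get_paths are hypothetical; the real config.py may be organized differently.

```python
import os
from pathlib import Path


def get_environment() -> str:
    """Return 'dev' when HM_QC_ENV=dev, otherwise 'production' (the default)."""
    value = os.environ.get("HM_QC_ENV", "").strip().lower()
    return "dev" if value == "dev" else "production"


def get_paths() -> dict:
    """Map the current environment to working and report directories.

    Hypothetical helper: the directory values come from this document,
    the function name and return shape are assumptions.
    """
    if get_environment() == "dev":
        return {
            "working": Path("./tmp/HM_working"),
            "reports": Path("./tmp/reports"),
        }
    return {
        "working": Path("/tmp/HM_working"),
        "reports": Path("/opt/QC/reports"),
    }
```

Resolving the environment on each call, rather than once at import time, keeps the sketch easy to test by toggling HM_QC_ENV.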

Running QC Checks

Run QC checks on a PDF file:

export HM_QC_ENV=dev
python launchers/HM_launcher_CLI.py <path_to_pdf> ./tmp/reports/report.html

Run QC checks on an image file (JPG, PNG, PSD):

export HM_QC_ENV=dev
python launchers/HM_launcher_CLI.py <path_to_image> ./tmp/reports/report.html

The launcher automatically detects file type and uses the appropriate profile.
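
The detection step can be pictured roughly as follows. This is a sketch only: select_profile is a hypothetical name, and the actual launcher may detect file types differently (for example, by inspecting file contents rather than the extension).

```python
from pathlib import Path

# Extension sets taken from this document's File Type Detection notes.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".psd"}


def select_profile(file_path: str) -> str:
    """Return the profile filename for a given input file (hypothetical helper)."""
    suffix = Path(file_path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "HM.json"
    if suffix in IMAGE_EXTENSIONS:
        return "HM_image.json"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
```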

Box Integration

Run the Box hotfolder integration (polls for files and processes them):

python launchers/ford_qc_box_hotfolder_process.py

Architecture Notes

Check Module Pattern

All check modules must implement the standard run_check(config, context, check_id) function:

  • config: Dict with the check's configuration parameters from the profile JSON.
  • context: Shared context dictionary between checks containing results from previous checks.
  • check_id: String identifier for the specific check being run.

Check modules should return a dictionary with at least a status key, whose value is one of:

  • passed: Check succeeded
  • error: Check failed or encountered an error
  • skipped: Check was intentionally skipped
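
A minimal sketch of a module that follows this contract is shown below. The check itself (an extension allow-list) and the keys enabled, allowed_extensions, filename, and message are invented for illustration; only the run_check signature and the status values come from this document.

```python
def run_check(config: dict, context: dict, check_id: str) -> dict:
    """Hypothetical check: verify that the file extension is allowed."""
    # 'enabled' is an assumed config key, not part of the documented contract.
    if not config.get("enabled", True):
        return {"status": "skipped", "message": f"{check_id}: disabled in profile"}

    # 'filename' is an assumed context key populated before checks run.
    filename = context.get("filename")
    if not filename:
        return {"status": "error", "message": f"{check_id}: no filename in context"}

    allowed = tuple(ext.lower() for ext in config.get("allowed_extensions", [".pdf"]))
    if filename.lower().endswith(allowed):
        return {"status": "passed", "message": f"{check_id}: extension accepted"}
    return {"status": "error", "message": f"{check_id}: extension not in {allowed}"}
```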

Context Sharing

Results from each check are stored in a shared context dictionary, allowing subsequent checks to build on prior results.
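
One way to picture this is a runner that stores each result in the context under its check_id, so later checks can read it. The sketch below assumes that storage convention; the two toy checks and the key names (filename, language, allowed_languages) are hypothetical, and qc_module.py may organize the context differently.

```python
from typing import Callable, List, Tuple

CheckFn = Callable[[dict, dict, str], dict]


def run_checks(checks: List[Tuple[str, CheckFn, dict]], context: dict) -> dict:
    """Run checks in order, saving each result under its check_id (assumed layout)."""
    for check_id, check_fn, config in checks:
        try:
            result = check_fn(config, context, check_id)
        except Exception as exc:  # keep one failing check from stopping the run
            result = {"status": "error", "message": str(exc)}
        context[check_id] = result
    return context


def parse_filename(config: dict, context: dict, check_id: str) -> dict:
    """Toy parser: treat the last '_'-separated token as the language code."""
    stem = context["filename"].rsplit(".", 1)[0]
    return {"status": "passed", "language": stem.rsplit("_", 1)[-1]}


def check_language(config: dict, context: dict, check_id: str) -> dict:
    """Toy check that builds on the earlier parse result."""
    parsed = context.get("filename_parse", {})
    if parsed.get("status") != "passed":
        return {"status": "skipped", "message": "filename was not parsed"}
    ok = parsed["language"] in config["allowed_languages"]
    return {"status": "passed" if ok else "error"}
```

For example, running parse_filename as filename_parse and then check_language lets the second check read the language extracted by the first.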

API Dependencies

  • LlamaParse: Used for PDF parsing and text extraction
  • DSPy: Used for AI-based image analysis and content validation
  • BoxSDK: Used for Box integration in the hotfolder processor

Important Implementation Details

  1. Environment Configuration: config.py manages paths based on the environment:
    • Development (HM_QC_ENV=dev): Uses ./tmp/HM_working/ and ./tmp/reports/
    • Production (default): Uses /tmp/HM_working/ and /opt/QC/reports/
    • All check modules and reporters use environment-aware paths
  2. API Keys: The code currently contains hardcoded API keys for OpenAI and LlamaParse. These should be moved out of the source, for example into environment variables.
  3. File Type Detection: The CLI launcher automatically detects file type:
    • .pdf files → Uses HM.json profile (PDF checks)
    • .jpg, .jpeg, .png, .psd → Uses HM_image.json profile (image checks)
  4. Report Generation: The HTML reporter creates reports in environment-aware directories:
    • Development: ./tmp/reports/
    • Production: /opt/QC/reports/
  5. Error Handling: Check results use a standardized error reporting format; new and modified checks should keep to it.
  6. AI Integration: Several checks use OpenAI's GPT models for complex validation tasks through the DSPy framework.
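
Regarding the API keys in item 2, one common option is to read them from environment variables at startup. The sketch below is a suggestion, not how the code currently works; require_api_key is a hypothetical helper, and the variable names in the trailing comment are assumptions to be confirmed against each SDK's documentation.

```python
import os


def require_api_key(env_var: str) -> str:
    """Return the value of env_var, failing loudly if it is missing or empty."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {env_var}")
    return value


# Assumed variable names, shown for illustration only:
# openai_key = require_api_key("OPENAI_API_KEY")
# llama_key = require_api_key("LLAMA_CLOUD_API_KEY")
```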

Check-Specific Details

Imprint Check (HM_imprint_check.py)

  • Validates the full reference code, including the country code
  • Example: Expected 9000_10107-06_el-CY vs Detected 9000_10107-06_el-GR → ERROR
  • Extracts full reference codes, including numeric prefixes (e.g., 9000_10107-06)
  • Skips OOH (out-of-home) files
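
The comparison rule can be sketched as below, assuming the reference has the shape code_locale as in the example (9000_10107-06_el-CY). The helper names and the is_ooh flag are hypothetical; how HM_imprint_check.py detects the reference or identifies OOH files is not shown here.

```python
from typing import Tuple


def split_reference(reference: str) -> Tuple[str, str]:
    """Split '9000_10107-06_el-CY' into ('9000_10107-06', 'el-CY')."""
    code, _, locale = reference.strip().rpartition("_")
    return code, locale


def compare_reference(expected: str, detected: str, is_ooh: bool = False) -> dict:
    """Compare full references, including the country part of the locale."""
    if is_ooh:
        return {"status": "skipped", "message": "OOH file: imprint check skipped"}
    exp_code, exp_locale = split_reference(expected)
    det_code, det_locale = split_reference(detected)
    if exp_code != det_code:
        return {"status": "error", "message": f"Code mismatch: {exp_code} vs {det_code}"}
    if exp_locale != det_locale:
        return {"status": "error", "message": f"Locale mismatch: {exp_locale} vs {det_locale}"}
    return {"status": "passed"}
```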

Censorship Check (HM_censorship.py)

  • CRITICAL RULE: Only runs on CEN files
  • GEN files are SKIPPED (no censorship check required)
  • Standard market files are SKIPPED
  • Uses DSPy with training images from ./supporting/censorship_trainset/
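
The CEN-only rule amounts to a guard at the top of the check. In this sketch, censorship_gate and its variant argument are hypothetical; this document does not say how the real module determines whether a file is CEN, GEN, or a standard market file.

```python
from typing import Optional


def censorship_gate(variant: str) -> Optional[dict]:
    """Return a 'skipped' result for non-CEN files, or None if the check should run."""
    if variant.strip().upper() == "CEN":
        return None
    return {
        "status": "skipped",
        "message": f"Censorship check only runs on CEN files (got {variant!r})",
    }
```

A check would call the gate first and return its result immediately when it is not None, before doing any DSPy work.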

Image Dimension Check (HM_image_dimension_check.py)

  • Validates that the actual pixel dimensions match the dimensions specified in the filename
  • Example: A file named with 1200x400 must contain an image that is exactly 1200×400 pixels
  • Recognizes several dimension formats in filenames: 1200x400, 1080x1920px, 21.6x27.9cm
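
A simplified sketch of this comparison follows. It assumes a unitless size means pixels and skips physical units such as cm, since converting them would need a DPI value that this document does not specify. The regex and function names are illustrative, not the ones used in HM_image_dimension_check.py.

```python
import re
from typing import Optional, Tuple

# Matches '1200x400', '1080x1920px', '21.6x27.9cm' (illustrative pattern).
_DIMENSION_RE = re.compile(r"(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?)(px|cm)?", re.IGNORECASE)


def parse_dimensions(filename: str) -> Optional[Tuple[float, float, str]]:
    """Return (width, height, unit) from a filename, or None if absent."""
    match = _DIMENSION_RE.search(filename)
    if match is None:
        return None
    width, height, unit = match.groups()
    return float(width), float(height), (unit or "px").lower()  # assumption: no unit means px


def check_dimensions(filename: str, actual_width: int, actual_height: int) -> dict:
    """Compare the filename's pixel size with the image's actual pixel size."""
    parsed = parse_dimensions(filename)
    if parsed is None:
        return {"status": "skipped", "message": "No dimensions in filename"}
    width, height, unit = parsed
    if unit != "px":
        return {"status": "skipped", "message": f"Unit {unit!r} needs a DPI; not handled here"}
    if (width, height) == (actual_width, actual_height):
        return {"status": "passed"}
    return {
        "status": "error",
        "message": f"Expected {width:g}x{height:g}, got {actual_width}x{actual_height}",
    }
```

The actual width and height would come from the image itself, for example from the size attribute of an image opened with Pillow.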

Filename Parsing

  • PDF filenames: Uses GPT to parse complex H&M naming conventions
  • Image filenames: Uses regex pattern matching for various formats (DOOH, OOH, Display, POS, DS)
  • Both extract: reference, language, dimensions, format, year (if applicable)