Two bugs surfaced by the first dev smoke test:
1. Profile JSON declared "llm": "gemini" (lowercase). llm_config's
dispatcher compares model_name == "Gemini" case-sensitively
(matches the rest of the codebase), so the check fell through to
"Invalid model selected" and never reached the API. Every other
profile uses "Gemini" with capital G. Spec mistake — fixed.
2. get_client_from_profile() resolves the per-report output folder
from the profile_id via hardcoded prefix matches. No 'hp_' branch
existed, so hp_copy_review reports landed under output-dev/general/
instead of output-dev/hp/ — the UI then couldn't find them. Added
'hp_' → 'hp' alongside the existing mappings.
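A minimal sketch of the prefix lookup described in bug 2. The table name, the helper name, and the `boots_` entry are illustrative assumptions; only the `hp_` to `hp` mapping and the `general` fallback come from this commit.

```python
# Hypothetical prefix table. Only 'hp_' -> 'hp' is confirmed by this commit;
# the 'boots_' entry stands in for "the existing mappings".
PREFIX_TO_FOLDER = {
    "hp_": "hp",
    "boots_": "boots",
}
DEFAULT_FOLDER = "general"  # where hp_copy_review reports landed before the fix


def resolve_output_folder(profile_id: str) -> str:
    """Return the per-client output folder for a profile_id."""
    for prefix, folder in PREFIX_TO_FOLDER.items():
        if profile_id.startswith(prefix):
            return folder
    # No prefix matched: fall back to the shared folder.
    return DEFAULT_FOLDER
```

Keeping the mappings in one table means a new client prefix is a one-line change, and the fallback makes a missing entry visible as a misplaced report rather than an exception.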
The check itself worked correctly otherwise: profile_source was
user_selected, brand resolved to 'hp', and the reference asset was
successfully attached. Bug 1 just prevented Gemini from being called.
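A sketch of why bug 1 never reached the API, assuming the dispatcher is a plain string comparison. `dispatch`, `near_miss`, and `KNOWN_MODELS` are hypothetical names, and whether the real code raises or returns the "Invalid model selected" message is not stated; this sketch raises.

```python
KNOWN_MODELS = ("Gemini",)  # only "Gemini" is named in this commit


def dispatch(model_name: str) -> str:
    # Case-sensitive comparison, as described above.
    if model_name == "Gemini":
        return "gemini-client"  # stand-in for the real API client
    raise ValueError("Invalid model selected")


def near_miss(model_name: str):
    """Return the canonical spelling if only the casing is wrong, else None."""
    for known in KNOWN_MODELS:
        if model_name != known and model_name.casefold() == known.casefold():
            return known
    return None
```

A check like `near_miss` could run when a profile is loaded, so a casing slip in the JSON is reported before any analysis starts.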
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HP is no longer a placeholder. The client gets a new hp_copy_review
profile (single weighted check, client-specific visibility) as its
default, plus the generic static_general and video_general profiles
it already had visibility into.
Moves the Dow Jones / MarketWatch / WSJ profile JSONs (4), check apps
(22), and CLAUDE_DOW_JONES.md into backend/_archive/dow_jones/. All
moves use git mv so history follows. Adds a restore-instructions
README. No loader changes needed — the archive lives outside the
scanned directories.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed axa_pdf_accessibility from axa_policy_document (was 8 checks, now 7)
and created a new axa_accessibility profile that contains only that check.
Marked the new profile strict_grade: true so a single PDF/UA-1 rule failure
forces an unmistakable Fail badge on the report — mirrors how axes4 PAC is
used in practice (single-purpose, binary verdict).
Lets users run accessibility-only QC without sitting through the rest of
the policy-document checks, and removes weight from the policy-document
score that the accessibility check wasn't really earning (its 0/10 verdict
was dragging the overall grade in a way that obscured the content checks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New profile boots_ppack for QCing multi-page Boots production packs
(PowerPoint-exported PDFs, 4-18 pages each). Built on top of AXA's
document-mode infrastructure — branched off feature/axa-document-mode
because it reuses the dispatcher, ingest, and result writer.
New checks:
- boots_logo_compliance — three-path scoring (master wordmark / partner
lock-up / no branding) so OLIVER x BOOTS-style footer lock-ups aren't
scored against master wordmark rules. Conservative without a formal
Boots logo guideline.
- boots_colour_palette — verifies CMYK/RGB/Hex spec values on creative-
guidance pages against canonical Boots Blue / Health Primary Blue /
Offer Red, plus visual sanity-check on artwork pages.
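A rough sketch of the spec-value comparison in boots_colour_palette, assuming the stated values have already been extracted from the page. The colour entry below is a placeholder, not a real Boots value, and all names are hypothetical.

```python
# Placeholder canonical entry; NOT a real Boots colour value.
CANONICAL = {
    "Example Blue": {"hex": "#102030", "rgb": (16, 32, 48)},
}


def hex_to_rgb(hex_code: str) -> tuple:
    """Convert '#RRGGBB' to an (R, G, B) tuple of ints."""
    h = hex_code.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def check_spec(name: str, stated_hex: str, stated_rgb) -> list:
    """Return a list of issues for one colour spec stated on a page."""
    canon = CANONICAL[name]
    issues = []
    if stated_hex.upper() != canon["hex"].upper():
        issues.append("hex differs from canonical value")
    if tuple(stated_rgb) != canon["rgb"]:
        issues.append("RGB differs from canonical value")
    # Internal consistency: the stated hex and RGB should describe the same colour.
    if hex_to_rgb(stated_hex) != tuple(stated_rgb):
        issues.append("stated hex and RGB disagree with each other")
    return issues
```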
Existing checks tuned:
- boots_brand_name_accuracy: closed-world list semantics. Brands not on
the approved list now go to names_not_on_list (manual review) instead
of failing — the list is sourced from the original 7 docs and is known
incomplete (Remington, Imodium, Maybelline etc. are legitimate Boots-
stocked brands not on it).
- boots_tandc_wording: explicit font-weight caveat — Boots Sharp Regular
vs Light isn't reliably distinguishable by vision LLM at small sizes.
Surfaced via font_weight_caveat field + needs_manual_check value.
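The brand-name routing above can be sketched as follows. The function name and the `names_on_list` key are assumptions; `names_not_on_list` is the field named in this commit. Matching is exact here, which may differ from the real check.

```python
def classify_brand_names(found, approved):
    """Split detected brand names into on-list and not-on-list buckets.

    Names missing from the approved list are routed to manual review
    rather than counted as failures, because the list is known incomplete.
    """
    approved_set = set(approved)
    on_list, not_on_list = [], []
    for name in found:
        if name in approved_set:
            on_list.append(name)
        else:
            not_on_list.append(name)
    return {"names_on_list": on_list, "names_not_on_list": not_on_list}
```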
Page classifier (document_mode/page_classifier.py):
Heuristic tags each page as cover / checklist / palette / notes /
artwork. Validated on all 10 sample packs.
Strict-grade exemption (Profile.strict_grade flag):
Only artwork-classified pages count towards Pass/Fail. Cover, checklist,
palette, and notes pages are still QC'd and reported as Informational
but cannot trigger a Fail. Banner shows exactly which artwork-page
checks fell below 6.
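A minimal sketch of the exemption rule, assuming per-page results are dicts with a `page_type` and a 0-10 `score`. The record layout and function name are hypothetical; the threshold of 6 and the artwork-only rule come from this commit.

```python
STRICT_THRESHOLD = 6.0  # a score below 6 is a strict-grade violation


def strict_grade(page_results):
    """Return ('Pass' | 'Fail', violations) for a list of per-page results.

    Only artwork pages can trigger a Fail; other page types are
    informational and are ignored here.
    """
    violations = [
        r for r in page_results
        if r["page_type"] == "artwork" and r["score"] < STRICT_THRESHOLD
    ]
    return ("Fail" if violations else "Pass", violations)
```

Returning the violations alongside the verdict gives the banner what it needs to list which artwork-page checks fell short.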
Result writer extended:
- Per-page table with score + page_type pill for any page_each-scope
check (auto-applied as fallback)
- Strict-grade banner (red on violation, green when clean)
- Page_type pills throughout the per-page strip
Smoke-test result (Remington 4-page pack, 2026-05-05):
Overall 70.75/100, strict-grade Fail. After two iterations of prompt
tuning, all three remaining strict-grade violations are real catches:
orphan asterisk in T&Cs, "they may not be stocked" wording deviation,
missing "Charges may apply". brand_name_accuracy 7.0 (was 3.0 before
list fix), logo_compliance 9.5 (was 1.5 before lock-up path fix).
Local-only — not pushed to dev or merged to develop until after Boots
show-and-tell. Same posture as feature/axa-document-mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add Honda client with static_general and video_general profiles
- Add video QC capability using Gemini native video analysis (4 checks:
visual_quality, brand_consistency, text_legibility, pacing_flow)
- Add video_general profile assigned to all 8 clients
- Extend session lifetime with MSAL silent token refresh (proactive
every 45min + reactive on expiry), switch cache to localStorage
- Re-enable OCR layout measurements for Amazon checks
- Add scope boundary notes to all 6 Amazon checks to prevent cross-
check penalization (locale errors isolated to logo_country only)
- Relax margins left-alignment tolerance from 1% to 4% to account
for logo lockup internal padding
- Update brand guidelines DB with Amazon localization matrix and
processed Dove PDF summary
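A sketch of what the relaxed left-alignment tolerance means in practice, assuming the tolerance is a fraction of canvas width and the edges are measured in pixels. Both assumptions and the function name are illustrative; the commit only states the change from 1% to 4%.

```python
def is_left_aligned(logo_left_px, text_left_px, canvas_width_px, tolerance=0.04):
    """True if the two left edges differ by at most `tolerance` of canvas width."""
    offset = abs(logo_left_px - text_left_px) / canvas_width_px
    return offset <= tolerance
```

With a 1000 px canvas, a 30 px offset (3%) passes at the new 4% tolerance but would have failed at 1%, which is the kind of gap logo lockup padding can introduce.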
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New boots_static profile (5 checks, 2.0 weight each) for retail promotional
artwork compliance: caveat rules, brand name accuracy (~170 names), offer
mechanics, T&C wording, and currency/locale. Strict grading override (any
check <6 = Fail). Guidelines embedded from 7 thematic guidance documents.
Also splits client-specific documentation out of CLAUDE.md into separate
CLAUDE_LOREAL.md, CLAUDE_AMAZON.md, CLAUDE_BOOTS.md, and CLAUDE_DOW_JONES.md
files to reduce main file size.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert text_readability to original (overlap is a layout issue, not a
readability one — LLM kept scoring it Pass because text was readable).
New text_product_overlap check uses a step-by-step approach:
1. Define the product hero zone (including translucent/glass elements)
2. Identify all marketing text
3. Check spatial overlap between text and hero zone
4. Compare good vs bad layout patterns
L'Oreal Static profile now has 4 checks at 2.5 weight each (was 3
checks at 3.33). Total check count: 66.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New client with embedded brand guidelines for Dow Jones Corporate,
MarketWatch, and Wall Street Journal sub-brands. Guidelines sourced
from live.standards.site scrapes and baked into check prompts.
- dow_jones_static: 5 checks (logo, color, typography, square motif, photography)
- marketwatch_static: 6 checks (logo, color, typography, image treatment, layout, art direction)
- wsj_static: 6 checks (logo, color tiers, typography, imagery, capitalization, layout)
- System now has 7 clients, 12 profiles, 65 QC checks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add silent auth check every 5 minutes to detect expired sessions proactively,
showing a "Session Expired" prompt instead of failing silently on next action
- Rename amazon_box_placement to amazon_element_placement across module directory,
profile config, class name, and documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverted loreal_static from 2-check (visual_readability_contrast) to
3-check setup (language_consistency, text_readability, background_contrast)
to avoid score dilution. Updated text_readability and background_contrast
prompts from POS-focused to digital marketing, and added critical hidden/
invisible text detection (black-on-black, white-on-white scanning).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adjusted all 6 Amazon check weights to equal 1.67 each based on test
results showing incorrect scoring. Refined prompts for box placement
(format-aware positioning, better tape description), required elements
(subhead now optional for OOH), logo country (country match as primary
factor), margins (visual assessment over pixel estimates), and headline
layout (natural language break detection, tall format awareness).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Boots client with static_general profile and Amazon client with
6 new brand-specific QC checks based on ASD 2025 design guidelines:
amazon_required_elements, amazon_logo_country, amazon_typography,
amazon_headline_layout, amazon_margins, and amazon_box_placement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create visual_readability_contrast combined check merging text readability
and background contrast into a single LLM call for L'Oreal Static profile
- Update loreal_static.json to use combined check (2 checks, 100-point scale)
- Add client_id filtering to brand guidelines (upload, fetch, backfill migration)
- Restructure settings modal from 5 tabs to 4: Profile, Create New Profile,
Reference Assets, Reporting (removed Model Selection, merged Tools into Profile)
- Add GET /api/profile_usage_stats endpoint with summary cards and recent analyses
- Add POST /api/consolidate_reports endpoint generating HTML summary with
pass/fail highlighting from multiple selected reports
- Add report selection checkboxes and consolidation controls to saved files list
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Created 4 new "_general" QC check modules optimized for digital static assets:
- visual_hierarchy_general: Digital hierarchy assessment (removed POS/physical viewing distances)
- product_visibility_general: Digital product presentation (removed POS terminology)
- logo_visibility_general: Digital logo prominence (removed 3m/1m viewing distance requirements)
- call_to_action_general: Digital CTA effectiveness (added clickability and mobile considerations)
Updated Static General profile (static_general.json):
- Now includes 10 AI vision-focused checks
- Even weighting: 1.0 per check for 100-point scale
- Total weight: 10.0 for proper scoring calculation
- All checks assigned to Gemini LLM
- Updated description to clarify focus on AI vision capabilities
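The weighting arithmetic can be sketched as below, assuming each check returns a 0-10 score and the overall score is the weighted sum. That formula is inferred from "total weight 10.0" and "100-point scale" and is not confirmed by the code; the function name is hypothetical.

```python
def overall_score(scores, weights):
    """Weighted sum of 0-10 check scores; weights totalling 10.0 give a 0-100 scale."""
    return sum(weights[name] * scores[name] for name in weights)
```

With ten checks at weight 1.0 each, a perfect 10 on every check yields 100, and a single check scoring 0 costs exactly 10 points.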
Profile focuses exclusively on checks that only AI vision models can perform,
excluding physical file properties that the Twist system handles (file size, format,
resolution, naming, aspect ratio, bleed, crop marks, etc.).
10 checks in Static General profile:
1. text_readability_general
2. background_contrast_general
3. language_consistency
4. visual_hierarchy_general (NEW)
5. element_alignment
6. product_visibility_general (NEW)
7. logo_visibility_general (NEW)
8. call_to_action_general (NEW)
9. accessibility
10. inclusive
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
## New Features
### L'Oréal Static General Profile
- Created new profile with 3 checks optimized for digital marketing assets
- Even weighting (33.3% each) for 100-point scoring scale
- Removed print-specific requirements (3m viewing distance)
- Focus on marketing text vs product packaging distinction
### Multi-File Queue System (web_ui.html)
- Added file queue functionality for batch processing
- Users can now upload and process multiple files simultaneously
- Queue displays file status (pending, analyzing, complete, error)
- Individual file removal and queue clearing options
- Progress tracking for batch operations
### New General QC Checks
1. background_contrast_general
   - Optimized for digital assets (no distance requirements)
   - Checks logo, product, and marketing text contrast
   - Detects overlapping and blending issues
   - Provides element-by-element breakdown
2. text_readability_general
   - Focus on marketing text only (excludes product packaging)
   - Checks for overlapping elements
   - Digital readability optimization
   - Specific issue identification
3. language_consistency (enhanced)
   - Better distinction between marketing and packaging text
   - Detailed language detection and reporting
   - Lists specific text analyzed
### Usage Tracking System
- Added usage_tracker.py for analysis logging
- Tracks user activity, profile usage, and costs
- Daily log files in JSONL format
- Cost estimation per LLM provider
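A minimal sketch of daily JSONL logging, assuming one file per UTC day and one JSON object per line. The function name, file naming scheme, and record fields are illustrative and are not taken from usage_tracker.py.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_analysis(log_dir, user, profile_id, provider, estimated_cost, now=None):
    """Append one usage record to the log file for the current day."""
    now = now or datetime.now(timezone.utc)
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"usage_{now:%Y-%m-%d}.jsonl"  # hypothetical naming scheme
    record = {
        "timestamp": now.isoformat(),
        "user": user,
        "profile_id": profile_id,
        "provider": provider,
        "estimated_cost": estimated_cost,
    }
    # Append mode keeps earlier records; each line is a standalone JSON object.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path
```

Appending one object per line means a partially written record affects only its own line, and daily files can be processed or archived independently.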
## Bug Fixes
### Authentication & User Management
- Fixed the missing Flask 'g' import
- Fixed user info access in background threads
- Pass user_info to threads instead of accessing g.user
- Improved error handling for usage logging
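Flask's `g` is bound to the application context of the current request, so a background thread started from a handler cannot rely on it. A dependency-free sketch of the fix pattern, with hypothetical function names; in the real handler `current_user` would be read from `g.user` before the thread starts.

```python
import threading


def run_analysis(file_name, user_info, results):
    # Runs in the worker thread and uses only the plain dict it was given.
    results.append({"file": file_name, "user": user_info["email"]})


def start_analysis(file_name, current_user, results):
    # Snapshot the user while still in the request thread, then hand the
    # copy to the worker instead of letting it reach for request-scoped state.
    user_info = dict(current_user)
    worker = threading.Thread(
        target=run_analysis, args=(file_name, user_info, results)
    )
    worker.start()
    return worker
```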
### HTML Report Generation
- Fixed missing analysis details in reports
- Now extracts and displays all JSON fields properly
- Shows comprehensive breakdowns:
  - Analysis details
  - Elements checked (logo, product, text)
  - Marketing text found
  - Issues identified
  - Specific recommendations
- No more blank "Pass/Fail" results
### Scoring System
- Fixed usage_tracker to handle dict of check results (not list)
- Better handling of model_used field variations
- Skip non-dict check results gracefully
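A sketch of the defensive iteration these fixes describe, assuming check results arrive as a dict keyed by check name. The function name and the fallback key `model` are assumptions; `model_used` is the field named above.

```python
def summarise_checks(check_results):
    """Return (check_name, model) pairs, skipping malformed entries."""
    if not isinstance(check_results, dict):
        return []
    summary = []
    for name, result in check_results.items():
        if not isinstance(result, dict):
            # For example an error string in place of a result; skip it.
            continue
        model = result.get("model_used") or result.get("model") or "unknown"
        summary.append((name, model))
    return summary
```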
## Configuration Changes
### Model Versions (llm_config.py)
- Replaced the invalid GPT-4.1 model ID with gpt-4o
- Added Gemini 3 Pro beta model option
- AVAILABLE_MODELS dict for UI selection
- Model version override support
### Profile Updates
- Static General: 3 checks, total weight 10.0
- Per-check weights: text_readability_general (3.33), background_contrast_general (3.33), language_consistency (3.34)
- Maximum score: 100 points
## Technical Improvements
- Enhanced prompt engineering for consistent LLM outputs
- Mandatory detailed explanations in all checks
- Structured JSON responses with comprehensive fields
- Better error messages and fallback handling
- Client configuration support (client_config.py)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>