nickviljoen cec11f1f6a Tune Boots PPack prompts: superscript guard, ALL CAPS / logotype exceptions, weight/sizing limits

Three rounds of prompt tuning against the Remington (4p), Easter Overlay
(18p), and Grenade (7p) sample packs. Easter Overlay (the noisiest)
climbed 72.38 → 78.97 → 80.04 across iterations, with strict-grade
violations dropping 27 → 18 → 14. Remaining violations are now genuine
compliance issues — the noise patterns are cleared.

boots_caveat_compliance:
- Superscript guard: vision LLM was flagging every roundel asterisk as
  superscript because the * glyph naturally sits high in its line.
  Strict two-feature rule now required (raised baseline AND visibly
  shrunk ~50-60% of body). Borderline cases → "needs_manual_check"
  with new superscript_caveat field. Caveat avg 4.4 → 7.27.
- Same vision-LLM caveat applied to weight_matching (Light vs Regular
  at small sizes is below detection threshold) and sizing_compliant
  (1-2pt size differences below detection threshold). New weight_caveat
  and sizing_caveat fields. Reserved 1-2 score band for unambiguous
  critical violations only.
- Explicit scoring principle: "when in doubt, prefer 7-8 with
  manual_check flags over a lower confident-violation score".

boots_brand_name_accuracy:
- ALL CAPS retail convention now explicitly acceptable. L'OREAL,
  ESTEE LAUDER, MAYBELLINE etc. no longer flagged as casing errors —
  only structural element mismatches (accents, hyphens, apostrophes,
  special chars) count.
- Stylised brand logotype exception: known logomarks like `17` for
  SEVENTEEN, &SISTERS ampersand styling, e.l.f. dot rendering are
  Pass — surfaced via new logotype_observations field.
- Brand name avg 5.53 → 7.47 → 6.67 (LLM run-to-run variability).

Strongest real catch in dataset: Easter Overlay page 14 is labelled
for the ROI market in production notes but uses £ instead of € on
the artwork. Exactly the pre-press error worth surfacing. Caught
consistently across all runs by boots_currency_locale.

CLAUDE_BOOTS.md updated with three-pack smoke-test table, vision-LLM
limitations summary, and the four reusable prompt-tuning patterns
that worked on this build.

Local-only — feature/boots-ppack remains unmerged until after Boots
show-and-tell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 16:26:11 +02:00

9.5 KiB


Boots Client Documentation

Referenced from main CLAUDE.md. This file contains detailed Boots QC check descriptions, guidance document sources, and known limitations.

Overview

Boots is a retail client whose checks cover promotional artwork compliance. Unlike other clients, whose checks focus on brand identity or marketing creative quality, the Boots checks cover compliance and technical specs only -- no creative/aesthetic assessment. They are derived from 7 thematic guidance documents that the client's team previously used with a LibreChat agent.

Scoring override: Same as L'Oreal -- any individual check scoring below 6 forces an overall Fail.

Guidance documents location: /Users/nickviljoen/Desktop/AI_QC_Bitbucket/boots/Recieved Docs/
Original agent prompt: /Users/nickviljoen/Desktop/AI_QC_Bitbucket/boots/System Instruction Existing Agent.rtf

Boots QC Tools

Seven checks for Boots retail promotional artwork compliance. Profiles: boots_static (single-asset, 5 checks) and boots_ppack (multi-page production-pack document mode, all 7 checks).

Tool	Source Document(s)	What it checks
`boots_caveat_compliance`	ASTERISK RULES	Caveat ordering (* -> dagger -> ** -> double dagger -> triangle -> clover), sizing per context (1/3 headline, 1/2 sub-headline, same as body), NO SUPERSCRIPT (critical), font weight matching, plus orphan-asterisk detection (smoke test caught a `*` in T&Cs with no matching marker in main copy)
`boots_brand_name_accuracy`	BRAND NAMES (3 pages)	Exact spelling of ~170 brand + product names including accents, apostrophes, hyphens, casing. Closed-world list — brands not on the list are surfaced in `names_not_on_list` for manual review and DO NOT cause a Fail; only spelling errors against listed brands fail.
`boots_offer_mechanics`	OFFER ROUNDELS + VALUE MECHANICS	Offer roundel format matches approved categories (price reductions, multibuys, threshold spend, FREE/GWP, points), spaced-caps styling, "Our best..." approved phrases
`boots_tandc_wording`	OFFER T&Cs + CLICK AND COLLECT + LOCK-UP T&Cs	Standard offer T&C wording (3FOR2, BOGOF, etc.), C&C exact text + font weight + hierarchy, lock-up T&Cs (Advantage Card, Parenting Club, Price Advantage, Pyramid), offer date formatting. Font weight is best-effort — flagged via `font_weight_caveat` field for manual verification.
`boots_currency_locale`	Agent prompt cross-cutting rules	Currency: GBP for UK / EUR for ROI, URLs: boots.com / boots.ie, consistent locale throughout asset
`boots_logo_compliance`	Built from PPack observation (no formal Boots logo guideline supplied)	Three-path scoring: A) master wordmark (strict — typeface, colour, orientation, distortion, clear space), B) partner / production lock-up (lenient — "OLIVER x BOOTS" footers etc. follow lock-up conventions, NOT master wordmark rules), C) no Boots branding (N/A neutral).
`boots_colour_palette`	Boots canonical palette derived from creative-guidance pages	Two modes: A) creative-guidance pages verify CMYK/RGB/Hex spec values match Boots Blue (#05054b), Health Primary Blue (#5dc4e9), Offer Red (#d3072a); B) artwork pages sanity-check dominant brand colours visually.

Boots Production Pack (`boots_ppack`) profile — multi-page document mode

For multi-page production packs (4-18 pages each, exported from PowerPoint as PDF). Built on top of AXA's document-mode infrastructure; all 7 checks run at scope: page_each with strict-grade override.

Page classifier (backend/document_mode/page_classifier.py): heuristic tags every page as cover / checklist / palette / notes / artwork. Decision order:

1. Strong palette (≥3 of CMYK/RGB/Hexadecimal headings + ≥2 hex colours) → palette
2. Strong checklist (≥3 of "Asset suitable", "Fonts present", "Resolution fine", etc.) → checklist
3. Artwork signals (T&Cs, offer mechanics, prices, GSL barcode) → artwork
4. Yellow Notes / Client Queries with no artwork signals → notes
5. Sparse Production Pack title block → cover (doubles as brief / context page)
6. Default → artwork (fail-safe: false positives on artwork are recoverable)
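The decision order above can be sketched roughly as follows. This is an illustration only, not the contents of page_classifier.py: the function name, the marker lists, the occurrence-counting rule, and the 40-word "sparse" cut-off are all assumptions made for the example.

```python
import re

# Hypothetical marker lists; the real phrases live in page_classifier.py.
PALETTE_HEADINGS = ("CMYK", "RGB", "Hexadecimal")
CHECKLIST_PHRASES = ("Asset suitable", "Fonts present", "Resolution fine")
ARTWORK_SIGNALS = ("T&Cs", "Offer ends", "£", "€", "GSL")
NOTES_SIGNALS = ("Yellow Notes", "Client Queries")
HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}\b")
SPARSE_WORD_LIMIT = 40  # assumed cut-off for a "sparse" title block


def classify_page(text: str) -> str:
    """Tag one page of extracted text, applying the rules in decision order."""
    lowered = text.lower()

    def hits(phrases):
        # Total occurrences of any phrase, case-insensitive (an assumption).
        return sum(lowered.count(p.lower()) for p in phrases)

    if hits(PALETTE_HEADINGS) >= 3 and len(HEX_COLOUR.findall(text)) >= 2:
        return "palette"
    if hits(CHECKLIST_PHRASES) >= 3:
        return "checklist"
    if hits(ARTWORK_SIGNALS) > 0:
        return "artwork"
    if hits(NOTES_SIGNALS) > 0:
        return "notes"
    if "production pack" in lowered and len(text.split()) < SPARSE_WORD_LIMIT:
        return "cover"
    return "artwork"  # fail-safe default
```

Because the rules are checked top to bottom, a notes page that also carries prices or T&Cs is tagged artwork, which matches the fail-safe intent of the default.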

Strict-grade exemption (Profile.strict_grade=True in profile_config.py): only artwork-classified pages count towards Pass/Fail. Cover, checklist, palette, and notes pages are scored and surfaced in the report as informational but cannot trigger a Fail. The strict-grade banner in the HTML report lists exactly which artwork-page checks fell below 6.
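A minimal sketch of the strict-grade rule, assuming each check result carries its page classification. The CheckResult type and strict_grade function are hypothetical names for illustration, not the actual structures in profile_config.py.

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 6  # a check scoring below this forces a Fail


@dataclass
class CheckResult:
    page: int
    page_type: str  # cover / checklist / palette / notes / artwork
    check: str
    score: float


def strict_grade(results):
    """Return (verdict, violations); only artwork pages can trigger a Fail."""
    violations = [
        r for r in results
        if r.page_type == "artwork" and r.score < FAIL_THRESHOLD
    ]
    return ("Fail" if violations else "Pass"), violations


results = [
    CheckResult(1, "cover", "boots_logo_compliance", 4.0),     # informational only
    CheckResult(2, "artwork", "boots_tandc_wording", 8.0),
    CheckResult(3, "artwork", "boots_currency_locale", 3.0),   # triggers the Fail
]
verdict, violations = strict_grade(results)
```

The violations list is what a report banner would enumerate; the low-scoring cover page is still present in the results but does not affect the verdict.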

Cost per pack: 7 checks × pages = roughly £0.05-0.30 per pack. 4-page packs ~£0.10, 18-page packs ~£0.30.

Smoke-test results (2026-05-05): all three test packs Fail by strict-grade — but the remaining violations are genuine compliance issues, not noise. Across three rounds of prompt tuning, Easter Overlay (the noisiest 18-page pack) climbed from 72.38 → 78.97 → 80.04. Strict-grade violations dropped from 27 → 18 → 14 across 10 pages.

Pack	Pages	Final overall	Strict-grade violations
Remington (1.8MB, 4 pages)	4	70.75	3 (orphan asterisk, T&C wording deviations)
Easter Overlay (3MB, 18 pages)	18	80.04	14 (real catches across brand_name / T&C / offer_mechanics / currency_locale)
Grenade (5.9MB, 7 pages)	7	78.0	3 (caveat orphan, meal-deal format)

The strongest real catch in the dataset: Easter Overlay page 14 is labelled for the ROI market in production notes but uses £ instead of € on the artwork — caught by boots_currency_locale. That's exactly the kind of pre-press error worth surfacing.
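The rule behind that catch can be illustrated with a small text-based sketch. The real boots_currency_locale check is a vision-LLM prompt, so this is only an approximation that assumes the artwork text and the target market have already been extracted; LOCALE_RULES and locale_mismatches are hypothetical names.

```python
import re

# Expected currency symbol and domain per market, as listed in the check table.
LOCALE_RULES = {
    "UK": {"currency": "£", "domain": "boots.com"},
    "ROI": {"currency": "€", "domain": "boots.ie"},
}


def locale_mismatches(market: str, artwork_text: str) -> list[str]:
    """List currency symbols or Boots domains that belong to another market."""
    expected = LOCALE_RULES[market]
    problems = []
    for other, rule in LOCALE_RULES.items():
        if other == market:
            continue
        if rule["currency"] in artwork_text:
            problems.append(
                f"{rule['currency']} used on {market} artwork, "
                f"expected {expected['currency']}"
            )
        if re.search(re.escape(rule["domain"]), artwork_text, re.IGNORECASE):
            problems.append(
                f"{rule['domain']} used on {market} artwork, "
                f"expected {expected['domain']}"
            )
    return problems
```

For an ROI page whose artwork reads "Only £4.99. Shop at boots.ie", this returns a single problem for the £ symbol, mirroring the page 14 finding.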

Vision-LLM limitations explicitly handled in prompts (so the Boots team understands what's reliable vs best-effort):

Font weight (Boots Sharp Regular vs Light) at small sizes — surfaced via font_weight_caveat (T&C check) and weight_caveat (caveat check)
Asterisk superscript at small sizes — surfaced via superscript_caveat (asterisk glyph naturally sits high; only flag when raised AND shrunk)
Caveat size comparison at small sizes — surfaced via sizing_caveat (1-2pt differences below detection threshold)
Subtle accent marks on brand names — accent_marks_verifiable flag

Tuning patterns that worked (worth knowing for future client onboards):

"Closed-world list" semantics — when an approved-list reference is incomplete (third-party brands, font lists, etc.), absence from list ≠ failure. Surface for manual review at neutral 7/10, flag misspellings of listed items as Fail.
"ALL CAPS retail convention" exception — brand names rendered in caps (L'OREAL, ESTEE LAUDER) are typographic choices, not spelling errors.
"Stylised brand logotype" exception — known logomarks like 17 for SEVENTEEN are Pass.
"Best-effort with manual_check flag" pattern — for vision-LLM limitations, score 7-8 with explicit caveat field rather than confident-but-wrong Fail.
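The first two patterns can be sketched together as a deterministic rule. This is an illustration under stated assumptions, not the prompt logic itself: the four-name list is a tiny stand-in for the real ~170 names, the 0.8 similarity cut-off and the scores 10 and 3 are invented for the example, and only the neutral 7 comes from the pattern described above.

```python
import difflib

# Tiny illustrative subset of the approved list.
APPROVED = ["La Roche-Posay", "Bio-Oil", "Burt's Bees", "BaByliss"]


def check_brand_name(seen: str, approved=APPROVED) -> dict:
    """Score one brand name as it appears on the artwork."""
    if seen in approved:
        return {"status": "Pass", "score": 10}
    # ALL CAPS retail convention: a casing difference alone is not an error.
    if seen.isupper() and seen in {name.upper() for name in approved}:
        return {"status": "Pass", "score": 10, "note": "ALL CAPS rendering"}
    # Near-miss against a listed brand: treated as a genuine spelling error.
    by_lower = {name.lower(): name for name in approved}
    close = difflib.get_close_matches(seen.lower(), list(by_lower), n=1, cutoff=0.8)
    if close:
        return {"status": "Fail", "score": 3, "expected": by_lower[close[0]]}
    # Closed-world semantics: absence from the list is not a failure.
    return {"status": "manual_review", "score": 7, "names_not_on_list": [seen]}
```

With this ordering, "BURT'S BEES" passes, "La Roche Posay" (missing hyphen) fails against the listed spelling, and an unlisted brand such as "Nivea" is surfaced for manual review at the neutral score.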

Guidance Document Summary

1. ASTERISK RULES (1 page)

Mandatory ordering: * -> dagger -> ** -> double dagger -> triangle -> clover
Sizing depends on context (headline = 1/3, sub-headline/roundels = 1/2, body = same size)
NO SUPERSCRIPT ever
Font weight must match between caveat and its T&Cs reference
Caveat in main copy must not be smaller than in T&Cs
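The ordering rule and the orphan detection mentioned in the tools table can be sketched as below. It assumes the markers have already been extracted as strings, with "triangle" and "clover" standing in for the actual glyphs, and it reads "mandatory ordering" as meaning the nth distinct caveat on an asset uses the nth marker with no skipping; that reading and the function name are assumptions.

```python
# Mandatory marker sequence; names stand in for the non-ASCII glyphs.
CAVEAT_ORDER = ["*", "dagger", "**", "double dagger", "triangle", "clover"]


def check_caveats(main_copy_markers, tandc_markers):
    """Check first-use order in main copy and report unmatched markers."""
    issues = []
    first_use = list(dict.fromkeys(main_copy_markers))  # dedupe, keep order
    expected = CAVEAT_ORDER[: len(first_use)]
    if first_use != expected:
        issues.append(f"order {first_use} should be {expected}")
    for marker in dict.fromkeys(tandc_markers):
        if marker not in main_copy_markers:
            issues.append(f"orphan in T&Cs: {marker}")
    for marker in first_use:
        if marker not in tandc_markers:
            issues.append(f"no T&C entry for: {marker}")
    return issues
```

Sizing, superscript, and font weight are visual properties and are not covered by this sketch.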

2. BRAND NAMES (3 pages)

~100 brand names with exact spelling requirements
~70 product/range names with exact spelling requirements
Key patterns: accent marks (Lancome, Tresemme), internal caps (BaByliss, SkinActive), hyphens (Bio-Oil, La Roche-Posay), apostrophes (Burt's Bees), special chars (e.l.f., So...?)

3. CLICK AND COLLECT (1 page)

One standard wording: "Available through Click & Collect, but may not be stocked in all stores. Charges may apply"
Set in Boots Sharp Regular (not Light)
Comes first in T&C hierarchy
Preferred line break after "not"
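The wording and hierarchy rules lend themselves to a simple text comparison, sketched here on the assumption that the T&C lines have already been extracted in reading order. The function names are hypothetical, and tolerating a trailing full stop is an assumption rather than something the guidance states; font weight and the preferred line break are not checked.

```python
import re

CLICK_AND_COLLECT = (
    "Available through Click & Collect, but may not be stocked in all stores. "
    "Charges may apply"
)


def normalise(text: str) -> str:
    """Collapse whitespace and line breaks so only the wording is compared."""
    return re.sub(r"\s+", " ", text).strip()


def click_and_collect_ok(tandc_lines: list[str]) -> bool:
    """True if the first T&C entry is the standard Click & Collect wording."""
    if not tandc_lines:
        return False
    # Assumption: a trailing full stop is tolerated.
    return normalise(tandc_lines[0]).rstrip(".") == CLICK_AND_COLLECT
```

Checking only the first entry reflects the rule that Click & Collect comes first in the T&C hierarchy.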

4. LOCK-UP T&Cs (1 page)

Advantage Card, Parenting Club, Price Advantage, Pyramid
Full vs condensed copy versions
UK vs ROI URL tailoring (boots.com / boots.ie)

5. OFFER ROUNDELS (2 pages)

Visual templates for all approved offer types
Spaced capitals in roundels
Categories: Everyday Low, Save/Price Reductions, Multibuys, Threshold Spend, FREE/GWP

6. OFFER TERMS AND CONDITIONS (1 page)

Standard T&Cs per offer type (exact wording mandated)
Offer date format rules (same month vs different months)
Qualifying lines

7. VALUE MECHANICS (1 page)

Client-approved offer mechanics and messages
"Our best..." approved phrases (10 variants)
Points-based offers (Double/Triple points, etc.)

Known Limitations

Accent marks on brand names: Vision LLMs may struggle with subtle accent differences (e vs é, o vs ô). The check is most reliable for casing, hyphens, apostrophes, and spacing. Accent accuracy should be verified manually for critical assets.
Font weight distinction: LLMs cannot reliably distinguish Boots Sharp Regular from Boots Sharp Light in the T&C wording check. This rule is documented but may require manual verification.
Caveat sizing ratios: LLMs can assess relative sizing (larger/smaller/similar) but cannot measure exact point size ratios (1/3 vs 1/2). The check focuses on visually obvious violations.
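Limited test assets: The checks were built from guidance documents and have so far been tuned only against the three sample production packs listed in the smoke-test table. Further prompt tuning will be needed as more test assets become available from the client.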

Client Contacts (from original agent prompt)

QC Lead: Lee Hammond (leehammond@oliver.agency) -- for guidance ambiguity/clarification
Agent Owner: George Colesmith (georgecolesmith@oliver.agency) -- for missing information
