feat(hp_copy_review): single-check LLM grader against Source Messaging

Single Gemini call per asset. Prompt assembles attached Source
Messaging summaries + media-plan language context + the asset image.
Returns structured JSON with score, summary, and a findings array
(priority, category, quote, issue, suggested fix, source reference).
Empty findings = clean asset; missing reference -> score 0 with a
clear message rather than running blind.

Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate,
expose a static prompt template, let process_single_check inject
reference-asset content and media-plan context at runtime. A
standalone build_prompt() helper mirrors that assembly for unit-
style smoke tests and ad-hoc prompt inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nickviljoen 2026-05-17 21:25:30 +02:00
parent 014a9cb8ff
commit 4c19a0fb9d
2 changed files with 179 additions and 0 deletions

@@ -0,0 +1,179 @@
"""HP Copy Review: single-call LLM grader against canonical Source Messaging.

This check compares all visible copy on an HP marketing asset (claims,
headlines, body, disclaimers, footnotes, spec call-outs, brand mentions)
against the canonical Source Messaging summaries attached as reference
assets (each .xlsx is summarised to Markdown via excel_processor).

It returns a structured JSON object with a 0-10 score, a one-paragraph
summary, and a `findings` array (priority / category / quote / issue /
suggested_fix / source_reference). Empty findings on a clean asset is a
valid result (score 9-10). When no Source Messaging is attached, the
LLM is instructed to return score 0 with an explanatory message rather
than grade blind.

Reference assets and media-plan context (including `language`) are
injected by `process_single_check` in `api_server.py`; this module
exposes only the static prompt template. A standalone `build_prompt()`
helper is provided for unit-style smoke tests and for any future caller
that wants to assemble the full prompt outside the production path.
"""
import os
import sys
from typing import Iterable, Mapping, Optional, Sequence, Tuple

# Add the directory two levels above this file's directory to sys.path
# so the shared template package can be imported.
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

from visual_qc_apps.flask_app_template import FlaskAppTemplate
# --- Canonical prompt template ------------------------------------------------
#
# The reference-asset summary block ("CANONICAL SOURCE MESSAGING") is
# prepended by `process_single_check` in `api_server.py` via
# `get_reference_asset_content()`. Likewise the media-plan context block
# ("=== MEDIA PLAN CONTEXT ===" with `- Language: <value>`) is appended
# by `process_single_check`. We embed instructions that *reference* both
# blocks so the LLM knows where to look.
HP_COPY_REVIEW_PROMPT = """You are a copy reviewer for HP marketing materials. Your job is to compare the marketing asset against the canonical Source Messaging that has been attached as a reference asset, and report every copy discrepancy as a structured finding.
WHAT YOU WILL BE GIVEN:
1. One or more canonical Source Messaging summaries, attached above as REFERENCE ASSET GUIDELINES. Each Source Messaging file (e.g. `messi_core.xlsx`, `messi_mainstream.xlsx`) has been pre-summarised into Markdown and is the single source of truth for product claims, KSPs, disclaimers, spec call-outs, variant naming, and approved tone.
2. A media-plan context block (appended below the prompt) which may include `- Language: <value>` and `- Country: <value>`. Treat the language value as the PRODUCT LANGUAGE the asset should be using (e.g. "UK English", "US English", "French (France)").
3. The marketing asset image itself.
WHAT TO DO:
For every claim, headline, body line, disclaimer, footnote, spec call-out, and brand mention visible on the asset, evaluate it against the canonical Source Messaging. Flag:
- Wording that disagrees with an approved KSP or claim.
- Missing or incorrect mandatory disclaimers / legal footnotes / asterisked notes.
- Spec call-outs that contradict the canonical spec (wrong number, wrong unit, wrong product variant).
- Variant / product-name errors (e.g. "OmniDesk Mini" vs "OmniDesk Mini Core").
- Tone / phrasing drift from the approved brand voice described in the source.
- Brand-name misuse (HP, sub-brand capitalisation, trademark glyph misuse).
- Language / locale mismatch against the media-plan PRODUCT LANGUAGE (e.g. "color" appearing in a UK English asset, or French copy on an asset specified as US English).
OUTPUT: return ONE JSON object, and nothing else (no prose, no markdown fences outside the JSON code block). The shape:
```json
{
  "score": <number 0-10>,
  "summary": "<one-paragraph headline finding>",
  "findings": [
    {
      "priority": "high" | "medium" | "low",
      "category": "ksp" | "disclaimer" | "spec" | "variant" | "tone" | "brand-name" | "language" | "other",
      "quote": "<exact quote from the asset>",
      "issue": "<what's wrong>",
      "suggested_fix": "<what it should say, citing the canonical source>",
      "source_reference": "<where in the source messaging this comes from, e.g. file name + section heading>"
    }
  ]
}
```
RULES:
- If no Source Messaging reference asset is attached (i.e. there is no "REFERENCE ASSET GUIDELINES" block above describing canonical HP messaging), return EXACTLY:
{"score": 0, "summary": "No HP Source Messaging reference was attached — cannot grade copy without a canonical source.", "findings": []}
Do not attempt to grade copy from prior knowledge.
- High-priority findings (factually-wrong claims, missing mandatory disclaimers, wrong product variant, wrong language) weight the score most heavily. A single high-priority finding should typically pull the score below 6.
- Medium-priority findings are wording drift that changes nuance but not meaning, or missing optional supporting copy.
- Low-priority findings are tone / style nits.
- An empty `findings` array is a valid and expected result for a clean asset; in that case score 9 or 10 and write a short, positive summary.
- The `quote` field must be the EXACT visible text from the asset, including punctuation. If you can read it, quote it.
- `source_reference` should make it easy for a reviewer to verify the finding: name the Source Messaging file and the section/heading you matched against.
- Return ONLY the JSON object inside a single ```json ... ``` code block. No surrounding prose, no explanations outside the JSON.
"""
def build_prompt(
    reference_summaries: Optional[Sequence[Tuple[str, str]]] = None,
    media_plan_row: Optional[Mapping[str, str]] = None,
    base_prompt: str = HP_COPY_REVIEW_PROMPT,
) -> str:
    """Assemble a fully-rendered HP copy-review prompt for testing / inspection.

    In production, `process_single_check` (api_server.py) does this
    assembly itself: it prepends `get_reference_asset_content(...)` and
    appends `build_media_plan_context(...)`. This helper mirrors that
    flow so we can smoke-test the prompt assembly without running the
    full server, and so callers that want to render the exact prompt
    text for logging / debugging have a single entry point.

    Args:
        reference_summaries: List of (filename, markdown_summary) tuples,
            one per attached Source Messaging .xlsx. Each summary is
            already a Markdown string produced by `excel_processor`.
            None or [] means "no canonical source attached"; in that
            case we still build the prompt but omit the canonical block,
            and the LLM will fall back to the score-0 rule.
        media_plan_row: Mapping with optional `language`, `country`,
            `placement`, etc. Only `language` and `country` are
            rendered into the prompt here; the production flow uses
            `build_media_plan_context` and includes more fields.
        base_prompt: Override for the canonical prompt template (used
            in tests where we want to inject a shorter stub).

    Returns:
        The fully-assembled prompt string, with the canonical source
        messaging block (if any) prepended, the media-plan language /
        country line(s) appended, and the base template in between.
    """
    parts = []

    # 1. Canonical source messaging block. Mirrors the shape of
    #    `get_reference_asset_content` so the LLM sees a consistent
    #    "REFERENCE ASSET GUIDELINES" heading whether it's running in
    #    production or via this helper.
    if reference_summaries:
        ref_lines = ["\n\n=== REFERENCE ASSET GUIDELINES ===",
                     "CANONICAL SOURCE MESSAGING:"]
        for filename, summary in reference_summaries:
            ref_lines.append(f"\n--- File: {filename} ---\n{summary}")
        ref_lines.append("=== END REFERENCE ASSET GUIDELINES ===\n")
        parts.append("\n".join(ref_lines))

    # 2. The static prompt template itself.
    parts.append(base_prompt)

    # 3. Media-plan context (language / country). Production appends
    #    the full `build_media_plan_context` block; here we render just
    #    the language + country fields, which is what Step 5.6 asserts.
    if media_plan_row:
        mp_lines = ["\n=== MEDIA PLAN CONTEXT ==="]
        if media_plan_row.get('language'):
            mp_lines.append(f"- Language: {media_plan_row['language']}")
        if media_plan_row.get('country'):
            mp_lines.append(f"- Country: {media_plan_row['country']}")
        mp_lines.append("=== END MEDIA PLAN CONTEXT ===")
        parts.append("\n".join(mp_lines))

    return "\n".join(parts)
class HpCopyReviewApp(FlaskAppTemplate):
    """HP Copy Review: single-call LLM copy grader against Source Messaging.

    Subclasses `FlaskAppTemplate` so the check is auto-discovered by
    `load_qc_apps()` in `api_server.py`. The class instance exposes
    `self.prompt` (the canonical template plus the standard scoring
    instructions appended by the template base class).

    Reference asset summaries and media-plan context are injected at
    runtime by `process_single_check`; this class does NOT call Gemini
    directly. Response parsing is handled by
    `extract_json_from_response` / `extract_score_from_result` in
    api_server.py, which will lift `score`, `summary`, and `findings`
    out of the JSON code block returned by the LLM.
    """

    def __init__(self):
        super().__init__(__name__, HP_COPY_REVIEW_PROMPT)
# Allow running this check standalone for ad-hoc testing
if __name__ == "__main__":
    app_instance = HpCopyReviewApp()
    app_instance.run()
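
The prompt above pins down a response shape (score, summary, findings with fixed `priority` and `category` values). As a rough illustration of what a reviewer-side sanity check on that shape could look like, here is a minimal sketch. It is not the `extract_json_from_response` / `extract_score_from_result` code in `api_server.py`, which this diff does not show. `parse_review` and `check_review` are hypothetical names, and the regex-based fence extraction is an assumption about how the model formats its reply.

```python
import json
import re

# Allowed values, copied from the JSON shape in HP_COPY_REVIEW_PROMPT.
PRIORITIES = {"high", "medium", "low"}
CATEGORIES = {"ksp", "disclaimer", "spec", "variant", "tone",
              "brand-name", "language", "other"}
FINDING_KEYS = {"priority", "category", "quote", "issue",
                "suggested_fix", "source_reference"}

# Three backticks, built programmatically so this sketch can itself be
# embedded in a fenced block.
FENCE = "`" * 3

# Hypothetical extraction pattern (an assumption, not api_server.py's parser):
# grab the outermost {...} inside a json-tagged fence.
_FENCE_RE = re.compile(FENCE + r"json\s*(\{.*\})\s*" + FENCE, re.DOTALL)


def parse_review(response_text):
    """Return the JSON object from a fenced block, or from bare JSON text."""
    match = _FENCE_RE.search(response_text)
    payload = match.group(1) if match else response_text.strip()
    return json.loads(payload)


def check_review(review):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []

    score = review.get("score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0 <= score <= 10:
        problems.append("score must be a number between 0 and 10")

    if not isinstance(review.get("summary"), str):
        problems.append("summary must be a string")

    findings = review.get("findings")
    if not isinstance(findings, list):
        problems.append("findings must be a list")
        return problems

    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding {i} must be an object")
            continue
        missing = FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"finding {i} is missing {sorted(missing)}")
        if finding.get("priority") not in PRIORITIES:
            problems.append(f"finding {i} has an unknown priority")
        if finding.get("category") not in CATEGORIES:
            problems.append(f"finding {i} has an unknown category")
    return problems


if __name__ == "__main__":
    # The score-0 reply the prompt mandates when no reference is attached.
    sample = (FENCE + 'json\n{"score": 0, "summary": "No reference attached.", '
              '"findings": []}\n' + FENCE)
    print(check_review(parse_review(sample)))
```

The check deliberately stops at structure. It does not try to enforce the scoring guidance (for example, that a high-priority finding should pull the score below 6), because the prompt words that as a tendency rather than a hard rule, and the score-0 "no reference" reply legitimately pairs an empty `findings` array with a low score.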