feat(hp_copy_review): single-check LLM grader against Source Messaging

Single Gemini call per asset. Prompt assembles attached Source Messaging summaries + media-plan language context + the asset image. Returns structured JSON with score, summary, and a findings array (priority, category, quote, issue, suggested fix, source reference). Empty findings = clean asset; missing reference -> score 0 with a clear message rather than running blind.

Mirrors the boots_tandc_wording pattern: subclass FlaskAppTemplate, expose a static prompt template, let process_single_check inject reference-asset content and media-plan context at runtime. A standalone build_prompt() helper mirrors that assembly for unit-style smoke tests and ad-hoc prompt inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:25:30 +02:00 · 2026-05-17 21:25:30 +02:00 · 4c19a0fb9d
commit 4c19a0fb9d
parent 014a9cb8ff
2 changed files with 179 additions and 0 deletions
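
The commit message describes the response contract (score, summary, findings array). A minimal sketch of how a caller might validate that shape is shown here. `parse_review_response` and its constants are hypothetical names for illustration; this is not the `extract_json_from_response` helper in `api_server.py`, whose implementation is not part of this diff.

```python
import json
import re
from typing import Any, Dict

# Hypothetical helper for illustration only; not the api_server.py code.
# Matches a single ```json ... ``` (or bare ```) fence around one JSON object.
_FENCE_RE = re.compile(r"```(?:json)?\s*(\{.*\})\s*```", re.DOTALL)

ALLOWED_PRIORITIES = {"high", "medium", "low"}
FINDING_KEYS = {
    "priority", "category", "quote",
    "issue", "suggested_fix", "source_reference",
}


def parse_review_response(raw: str) -> Dict[str, Any]:
    """Extract and validate the grader's JSON object from raw LLM text."""
    match = _FENCE_RE.search(raw)
    # Fall back to treating the whole reply as JSON when no fence is found.
    payload = match.group(1) if match else raw.strip()
    data = json.loads(payload)  # json.JSONDecodeError is a ValueError

    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    score = data.get("score")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")

    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")

    findings = data.get("findings")
    if not isinstance(findings, list):
        raise ValueError("findings must be a list (empty for a clean asset)")

    for index, finding in enumerate(findings):
        if not isinstance(finding, dict):
            raise ValueError(f"finding {index} must be an object")
        missing = FINDING_KEYS - set(finding)
        if missing:
            raise ValueError(f"finding {index} is missing {sorted(missing)}")
        if finding["priority"] not in ALLOWED_PRIORITIES:
            raise ValueError(f"finding {index} has an unknown priority")

    return data
```

An empty `findings` list passes validation, which matches the "empty findings = clean asset" rule; the score-0 "no reference attached" reply passes as well, so a caller would still need to inspect `summary` to tell the two apart.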
--- a/backend/visual_qc_apps/hp_copy_review/__init__.py
+++ b/backend/visual_qc_apps/hp_copy_review/__init__.py
--- a/backend/visual_qc_apps/hp_copy_review/app.py
+++ b/backend/visual_qc_apps/hp_copy_review/app.py
@@ -0,0 +1,179 @@
+"""HP Copy Review — single-call LLM grader against canonical Source Messaging.
+
+This check compares all visible copy on an HP marketing asset (claims,
+headlines, body, disclaimers, footnotes, spec call-outs, brand mentions)
+against the canonical Source Messaging summaries attached as reference
+assets (.xlsx → Markdown summary via excel_processor).
+
+It returns a structured JSON object with a 0-10 score, a one-paragraph
+summary, and a `findings` array (priority / category / quote / issue /
+suggested_fix / source_reference). Empty findings on a clean asset is a
+valid result (score 9-10). When no Source Messaging is attached, the
+LLM is instructed to return score 0 with an explanatory message rather
+than grade blind.
+
+Reference assets and media-plan context (including `language`) are
+injected by `process_single_check` in `api_server.py` — this module
+exposes only the static prompt template. A standalone `build_prompt()`
+helper is provided for unit-style smoke tests and for any future caller
+that wants to assemble the full prompt outside the production path.
+"""
+
+import os
+import sys
+from typing import Mapping, Optional, Sequence, Tuple
+
+# Add parent directory to path so we can import shared template
+sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+
+from visual_qc_apps.flask_app_template import FlaskAppTemplate
+
+
+# --- Canonical prompt template ------------------------------------------------
+#
+# The reference-asset summary block ("CANONICAL SOURCE MESSAGING") is
+# prepended by `process_single_check` in `api_server.py` via
+# `get_reference_asset_content()`. Likewise the media-plan context block
+# ("=== MEDIA PLAN CONTEXT ===" with `- Language: <value>`) is appended
+# by `process_single_check`. We embed instructions that *reference* both
+# blocks so the LLM knows where to look.
+
+HP_COPY_REVIEW_PROMPT = """You are a copy reviewer for HP marketing materials. Your job is to compare the marketing asset against the canonical Source Messaging that has been attached as a reference asset, and report every copy discrepancy as a structured finding.
+
+WHAT YOU WILL BE GIVEN:
+1. One or more canonical Source Messaging summaries, attached above as REFERENCE ASSET GUIDELINES. Each Source Messaging file (e.g. `messi_core.xlsx`, `messi_mainstream.xlsx`) has been pre-summarised into Markdown and is the single source of truth for product claims, KSPs, disclaimers, spec call-outs, variant naming, and approved tone.
+2. A media-plan context block (appended below the prompt) which may include `- Language: <value>` and `- Country: <value>`. Treat the language value as the PRODUCT LANGUAGE the asset should be using (e.g. "UK English", "US English", "French (France)").
+3. The marketing asset image itself.
+
+WHAT TO DO:
+For every claim, headline, body line, disclaimer, footnote, spec call-out, and brand mention visible on the asset, evaluate it against the canonical Source Messaging. Flag:
+- Wording that disagrees with an approved KSP or claim.
+- Missing or incorrect mandatory disclaimers / legal footnotes / asterisked notes.
+- Spec call-outs that contradict the canonical spec (wrong number, wrong unit, wrong product variant).
+- Variant / product-name errors (e.g. "OmniDesk Mini" vs "OmniDesk Mini Core").
+- Tone / phrasing drift from the approved brand voice described in the source.
+- Brand-name misuse (HP, sub-brand capitalisation, trademark glyph misuse).
+- Language / locale mismatch against the media-plan PRODUCT LANGUAGE (e.g. "color" appearing in a UK English asset, or French copy on an asset specified as US English).
+
+OUTPUT: return ONE JSON object inside a single ```json code block, and nothing else (no prose before or after it). The shape:
+
+```json
+{
+  "score": <number 0-10>,
+  "summary": "<one-paragraph headline finding>",
+  "findings": [
+    {
+      "priority": "high" | "medium" | "low",
+      "category": "ksp" | "disclaimer" | "spec" | "variant" | "tone" | "brand-name" | "language" | "other",
+      "quote": "<exact quote from the asset>",
+      "issue": "<what's wrong>",
+      "suggested_fix": "<what it should say, citing the canonical source>",
+      "source_reference": "<where in the source messaging this comes from, e.g. file name + section heading>"
+    }
+  ]
+}
+```
+
+RULES:
+- If no Source Messaging reference asset is attached (i.e. there is no "REFERENCE ASSET GUIDELINES" block above describing canonical HP messaging), return EXACTLY:
+  {"score": 0, "summary": "No HP Source Messaging reference was attached — cannot grade copy without a canonical source.", "findings": []}
+  Do not attempt to grade copy from prior knowledge.
+- High-priority findings (factually-wrong claims, missing mandatory disclaimers, wrong product variant, wrong language) weight the score most heavily. A single high-priority finding should typically pull the score below 6.
+- Medium-priority findings are wording drift that changes nuance but not meaning, or missing optional supporting copy.
+- Low-priority findings are tone / style nits.
+- An empty `findings` array is a valid and expected result for a clean asset — in that case score 9 or 10 and write a short, positive summary.
+- The `quote` field must be the EXACT visible text from the asset, including punctuation. If you can read it, quote it.
+- `source_reference` should make it easy for a reviewer to verify the finding — name the Source Messaging file and the section/heading you matched against.
+- Return ONLY the JSON object inside a single ```json ... ``` code block. No surrounding prose, no explanations outside the JSON.
+"""
+
+
+def build_prompt(
+    reference_summaries: Optional[Sequence[Tuple[str, str]]] = None,
+    media_plan_row: Optional[Mapping[str, str]] = None,
+    base_prompt: str = HP_COPY_REVIEW_PROMPT,
+) -> str:
+    """Assemble a fully-rendered HP copy-review prompt for testing / inspection.
+
+    In production, `process_single_check` (api_server.py) does this
+    assembly itself: it prepends `get_reference_asset_content(...)` and
+    appends `build_media_plan_context(...)`. This helper mirrors that
+    flow so we can smoke-test the prompt assembly without running the
+    full server, and so callers that want to render the exact prompt
+    text for logging / debugging have a single entry point.
+
+    Args:
+        reference_summaries: List of (filename, markdown_summary) tuples,
+            one per attached Source Messaging .xlsx. Each summary is
+            already a Markdown string produced by `excel_processor`.
+            None or [] means "no canonical source attached" — in that
+            case we still build the prompt but omit the canonical block,
+            and the LLM will fall back to the score-0 rule.
+        media_plan_row: Mapping with optional `language`, `country`,
+            `placement`, etc. Only `language` and `country` are
+            rendered into the prompt here; the production flow uses
+            `build_media_plan_context` and includes more fields.
+        base_prompt: Override for the canonical prompt template (used
+            in tests where we want to inject a shorter stub).
+
+    Returns:
+        The fully-assembled prompt string, with the canonical source
+        messaging block (if any) prepended, the media-plan language /
+        country line(s) appended, and the base template in between.
+    """
+    parts = []
+
+    # 1. Canonical source messaging block — mirrors the shape of
+    #    `get_reference_asset_content` so the LLM sees a consistent
+    #    "REFERENCE ASSET GUIDELINES" heading whether it's running in
+    #    production or via this helper.
+    if reference_summaries:
+        ref_lines = ["\n\n=== REFERENCE ASSET GUIDELINES ===",
+                     "CANONICAL SOURCE MESSAGING:"]
+        for filename, summary in reference_summaries:
+            ref_lines.append(f"\n--- File: {filename} ---\n{summary}")
+        ref_lines.append("=== END REFERENCE ASSET GUIDELINES ===\n")
+        parts.append("\n".join(ref_lines))
+
+    # 2. The static prompt template itself.
+    parts.append(base_prompt)
+
+    # 3. Media-plan context (language / country). Production appends
+    #    the full `build_media_plan_context` block; here we render just
+    #    the language + country fields, which is what Step 5.6 asserts.
+    if media_plan_row:
+        mp_lines = ["\n=== MEDIA PLAN CONTEXT ==="]
+        if media_plan_row.get('language'):
+            mp_lines.append(f"- Language: {media_plan_row['language']}")
+        if media_plan_row.get('country'):
+            mp_lines.append(f"- Country: {media_plan_row['country']}")
+        mp_lines.append("=== END MEDIA PLAN CONTEXT ===")
+        parts.append("\n".join(mp_lines))
+
+    return "\n".join(parts)
+
+
+class HpCopyReviewApp(FlaskAppTemplate):
+    """HP Copy Review — single-call LLM copy grader against Source Messaging.
+
+    Subclasses `FlaskAppTemplate` so the check is auto-discovered by
+    `load_qc_apps()` in `api_server.py`. The class instance exposes
+    `self.prompt` (the canonical template plus the standard scoring
+    instructions appended by the template base class).
+
+    Reference asset summaries and media-plan context are injected at
+    runtime by `process_single_check` — this class does NOT call Gemini
+    directly. Response parsing is handled by
+    `extract_json_from_response` / `extract_score_from_result` in
+    api_server.py, which will lift `score`, `summary`, and `findings`
+    out of the JSON code block returned by the LLM.
+    """
+
+    def __init__(self):
+        super().__init__(__name__, HP_COPY_REVIEW_PROMPT)
+
+
+# Allow running this check standalone for ad-hoc testing
+if __name__ == "__main__":
+    app_instance = HpCopyReviewApp()
+    app_instance.run()
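
The `build_prompt()` docstring promises a fixed layout: reference block first, base template in the middle, media-plan context last. A smoke test for that ordering could look like the sketch below. `assert_block_order` is a hypothetical helper, and `sample_prompt` is a hand-written stand-in for `build_prompt(...)` output so the snippet runs without the repository on the path; in the repo one would presumably pass the real return value instead.

```python
from typing import Sequence


def assert_block_order(prompt: str, markers: Sequence[str]) -> None:
    """Raise AssertionError unless every marker occurs, in the given order."""
    cursor = 0
    for marker in markers:
        index = prompt.find(marker, cursor)
        if index == -1:
            raise AssertionError(f"marker missing or out of order: {marker!r}")
        # Continue searching after this marker so order is enforced.
        cursor = index + len(marker)


# Marker strings copied from the diff above; "STUB PROMPT" stands in for
# a short base_prompt override, as the docstring suggests for tests.
EXPECTED_ORDER = (
    "=== REFERENCE ASSET GUIDELINES ===",
    "--- File: messi_core.xlsx ---",
    "=== END REFERENCE ASSET GUIDELINES ===",
    "STUB PROMPT",
    "=== MEDIA PLAN CONTEXT ===",
    "- Language: UK English",
    "=== END MEDIA PLAN CONTEXT ===",
)

# Hand-written stand-in for the assembled prompt (assumption: this mirrors
# what build_prompt would return for one reference file and one language).
sample_prompt = "\n".join([
    "\n\n=== REFERENCE ASSET GUIDELINES ===",
    "CANONICAL SOURCE MESSAGING:",
    "\n--- File: messi_core.xlsx ---\n# KSPs\n- Example claim",
    "=== END REFERENCE ASSET GUIDELINES ===\n",
    "STUB PROMPT",
    "\n=== MEDIA PLAN CONTEXT ===",
    "- Language: UK English",
    "=== END MEDIA PLAN CONTEXT ===",
])

assert_block_order(sample_prompt, EXPECTED_ORDER)
```

Checking marker order rather than comparing the whole string keeps the test stable when the prompt wording changes but the block layout does not.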