docs(plan): HP onboarding cycle 1 implementation plan

7-task plan against 2026-05-17-hp-cycle-1-onboarding-design.md:
excel_processor → .xlsx dispatch → media-plan language field →
HP client+profile → hp_copy_review check → findings-table renderer
→ dev smoke + deploy. Lightweight verification posture (py_compile +
imports + profile load + python3 -c mini-tests + dev smoke runs)
to match the project's existing style — no pytest scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nickviljoen 2026-05-17 20:32:56 +02:00
parent 53ba67c2c0
commit 7d178f11ee


@@ -0,0 +1,786 @@
# HP Onboarding — Cycle 1 Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Implement the `hp_copy_review` check and its supporting infrastructure per `docs/superpowers/specs/2026-05-17-hp-cycle-1-onboarding-design.md`, replacing the deprecated `hp-copy` PHP/Make.com POC.
**Architecture:** New `excel_processor.py` mirrors `pdf_processor.py` to convert HP Source Messaging Excels into structured Markdown summaries at upload time. A single new `hp_copy_review` QC check assembles those summaries + media-plan language metadata + the asset image into one Gemini prompt and returns a structured findings list. HP gets a real client config entry, a dedicated profile, and routing for `.xlsx` uploads through the existing `/api/brand_guidelines` endpoint.
**Tech Stack:**
- openpyxl 3.x (existing dep, used by `media_plan_processor.py`)
- Gemini 2.5 Pro via `llm_config.py` (existing)
- Existing reference-asset / brand-guidelines flow
- Existing media-plan processor
- No new external dependencies
**Branch:** `feature/hp-cycle-1-onboarding` from `develop`.
**Testing posture:** This project does not use pytest. Verification matches `backend/scripts/test-system.sh`: `py_compile`, import checks, profile-load tests, and real-asset smoke runs on the dev server. Inline `python3 -c "..."` snippets stand in for unit tests where helpful.
---
## File Structure
**New files:**
- `backend/excel_processor.py` — Excel ingestion + Gemini summarisation
- `backend/profiles/hp_copy_review.json` — new profile
- `backend/visual_qc_apps/hp_copy_review/app.py` — new QC check
- `backend/visual_qc_apps/hp_copy_review/__init__.py` — empty module marker
**Modified files:**
- `backend/client_config.py`: HP entry promoted from placeholder
- `backend/api_server.py`: `.xlsx` dispatch on `/api/brand_guidelines` POST + findings-table rendering in both HTML generators
- `backend/media_plan_processor.py`: `language` column extraction + metadata surfacing
- `CLAUDE.md`: HP row updated from "_scope pending_" to the new doc reference (small)
**Test fixtures (placed manually on disk, not committed):**
- `backend/tests/fixtures/hp/messi_core.xlsx`
- `backend/tests/fixtures/hp/messi_mainstream.xlsx`
- `backend/tests/fixtures/hp/gaston.xlsx`
The user-provided originals live at `/Users/nickviljoen/Desktop/AI_QC_Bitbucket/hp/recieved_docs/excel/` — those get *copied* (not symlinked) into `backend/tests/fixtures/hp/` for repeatable local verification. The directory is gitignored.
---
### Task 1: Excel processor module
Implement `excel_processor.py` mirroring `pdf_processor.py`. This is the foundational change that the later tasks build on, and the largest single module of new code.
**Files:**
- Create: `backend/excel_processor.py`
- Create: `backend/tests/fixtures/hp/` (gitignored)
- Modify: `.gitignore` (add `backend/tests/fixtures/`)
- [ ] **Step 1.1: Set up the fixtures directory**
```bash
mkdir -p backend/tests/fixtures/hp
cp '/Users/nickviljoen/Desktop/AI_QC_Bitbucket/hp/recieved_docs/excel/26C2 Messi Core HP OmniDesk Mini Desktop PC Source Messaging 04-10 (1).xlsx' backend/tests/fixtures/hp/messi_core.xlsx
cp '/Users/nickviljoen/Desktop/AI_QC_Bitbucket/hp/recieved_docs/excel/26C2 Messi Mainstream HP OmniDesk Mini Desktop PC Source Messaging 04-10 (1).xlsx' backend/tests/fixtures/hp/messi_mainstream.xlsx
cp '/Users/nickviljoen/Desktop/AI_QC_Bitbucket/hp/recieved_docs/excel/HP AluminiumBook Source Messaging - Gaston 05-06.xlsx' backend/tests/fixtures/hp/gaston.xlsx
ls backend/tests/fixtures/hp/
```
Expected: three `.xlsx` files listed.
- [ ] **Step 1.2: Add gitignore rule for fixtures**
Add to `.gitignore` near the existing legacy-env block:
```
# Local test fixtures (real HP Source Messaging files; not for commit)
backend/tests/fixtures/
```
- [ ] **Step 1.3: Read `pdf_processor.py` as the pattern source**
```bash
wc -l backend/pdf_processor.py
```
Read the file end-to-end. Identify: public surface (`process_pdf_file`), helper for raw extraction, helper for LLM summarisation, file path conventions (`brand_guidelines/files/{file_id}_summary.txt`), error handling shape, retry pattern, return tuple `(summary_text, summary_path)`.
- [ ] **Step 1.4: Create `excel_processor.py` skeleton**
Create `backend/excel_processor.py` with:
```python
"""Excel reference-asset processor for HP Source Messaging files.
Mirrors pdf_processor.py: openpyxl extracts raw cell content from
every sheet, Gemini summarises the result into structured Markdown
under brand_guidelines/files/{file_id}_summary.md. The check
hp_copy_review pulls that Markdown into its prompt at QC time.
"""
import os
from typing import Tuple

from openpyxl import load_workbook

from llm_config import call_gemini_text  # adjust to actual export name

BRAND_GUIDELINES_DIR = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), 'brand_guidelines', 'files'
)

# Cap raw extraction at ~50K chars to keep the summary prompt bounded.
# A 30-row, 12-column workbook is ~10-15K chars in practice; this leaves
# headroom for HP's larger source files without blowing the prompt budget.
_RAW_EXTRACTION_CAP = 50_000


def process_excel_file(file_path: str, file_id: str) -> Tuple[str, str]:
    """Extract + summarise an HP Source Messaging Excel.

    Returns (summary_text, summary_path). Saves the summary as
    {file_id}_summary.md under BRAND_GUIDELINES_DIR. Extraction and
    summarisation failures do not propagate: a degraded summary holding
    whatever raw extraction succeeded is written so the reference asset
    is still usable, and that is returned.
    """
    source_name = os.path.basename(file_path)
    raw_text = ''
    try:
        # Extraction sits inside the try so a corrupt workbook also degrades.
        raw_text = _extract_workbook_text(file_path)
        summary = _summarise_with_gemini(raw_text, source_name)
    except Exception as e:
        summary = (
            f"# {source_name} (degraded: summary failed)\n\n"
            f"Extraction or Gemini summarisation failed: {type(e).__name__}: {e}\n\n"
            f"## Raw extraction\n\n```\n{raw_text}\n```\n"
        )
    os.makedirs(BRAND_GUIDELINES_DIR, exist_ok=True)
    summary_path = os.path.join(BRAND_GUIDELINES_DIR, f"{file_id}_summary.md")
    with open(summary_path, 'w', encoding='utf-8') as f:
        f.write(summary)
    return summary, summary_path
```
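The degraded-summary fallback can be exercised without openpyxl or Gemini by injecting the summariser. This is a minimal stand-alone sketch of that pattern; `summarise_or_degrade` and the two stub summarisers are hypothetical names used for illustration and are not part of `excel_processor.py`.

```python
from typing import Callable


def summarise_or_degrade(raw_text: str, source_name: str,
                         summarise: Callable[[str, str], str]) -> str:
    """Return the summariser's output, or a degraded Markdown summary on failure."""
    try:
        return summarise(raw_text, source_name)
    except Exception as e:  # deliberately broad: any failure degrades
        return (
            f"# {source_name} (degraded: summary failed)\n\n"
            f"Summarisation failed: {type(e).__name__}: {e}\n\n"
            f"## Raw extraction\n\n{raw_text}\n"
        )


def _ok(raw_text: str, source_name: str) -> str:
    # Stub standing in for a successful LLM call
    return f"## Product / Variant\n{source_name}"


def _boom(raw_text: str, source_name: str) -> str:
    # Stub standing in for a failed LLM call
    raise TimeoutError("model did not respond")


good = summarise_or_degrade("A\tB", "messi_core.xlsx", _ok)
bad = summarise_or_degrade("A\tB", "messi_core.xlsx", _boom)
print(good.splitlines()[0])
print(bad.splitlines()[0])
```

Passing the summariser in as an argument keeps the failure path testable with `python3 -c`, in line with the plan's no-pytest posture.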
- [ ] **Step 1.5: Implement `_extract_workbook_text`**
Append:
```python
def _extract_workbook_text(file_path: str) -> str:
    """Read every sheet, dump as 'Sheet: <name>\\n<tab-separated rows>\\n\\n'."""
    wb = load_workbook(file_path, data_only=True, read_only=True)
    parts = []
    total_chars = 0
    try:
        for sheet in wb.worksheets:
            parts.append(f"Sheet: {sheet.title}\n")
            for row in sheet.iter_rows(values_only=True):
                # Skip rows where every cell is None/empty
                if not any((c is not None and str(c).strip()) for c in row):
                    continue
                line = '\t'.join(('' if c is None else str(c)) for c in row)
                parts.append(line + '\n')
                total_chars += len(line) + 1
                if total_chars >= _RAW_EXTRACTION_CAP:
                    parts.append(f"\n[truncated: exceeded {_RAW_EXTRACTION_CAP}-char cap]\n")
                    return ''.join(parts)
            parts.append('\n')
    finally:
        # read_only workbooks hold the file open until closed, so close
        # on the truncation path as well as the normal one.
        wb.close()
    return ''.join(parts)
```
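The row-skipping and cap behaviour can be checked without a workbook, because `iter_rows(values_only=True)` yields plain tuples. A minimal sketch of the same flattening logic over in-memory rows follows; `flatten_rows` and the demo data are hypothetical and exist only to illustrate the output format.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[Optional[object], ...]


def flatten_rows(sheets: Iterable[Tuple[str, Iterable[Row]]], cap: int) -> str:
    """Flatten (sheet_name, rows) pairs into tab-separated text, stopping at cap."""
    parts: List[str] = []
    total_chars = 0
    for name, rows in sheets:
        parts.append(f"Sheet: {name}\n")
        for row in rows:
            # Skip rows where every cell is None or whitespace
            if not any(c is not None and str(c).strip() for c in row):
                continue
            line = "\t".join("" if c is None else str(c) for c in row)
            parts.append(line + "\n")
            total_chars += len(line) + 1
            if total_chars >= cap:
                parts.append(f"\n[truncated: exceeded {cap}-char cap]\n")
                return "".join(parts)
        parts.append("\n")
    return "".join(parts)


demo = [("KSPs", [("Heading", "Body"), (None, "  "), ("Compact", None)])]
full = flatten_rows(demo, cap=1000)
cut = flatten_rows(demo, cap=5)
print(repr(full))
print(repr(cut))
```

Note that the cap is checked after a row is appended, so the output can overshoot the cap by up to one row plus the truncation marker.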
- [ ] **Step 1.6: Implement `_summarise_with_gemini`**
Append:
```python
_SYSTEM_PROMPT = """You're processing an HP Source Messaging Excel into a structured Markdown reference. Output these sections exactly, in this order:
## Product / Variant
(brand, product line, variant if any — e.g. "HP OmniDesk Mini — Core")
## Key Selling Points (KSPs)
For each KSP: heading, value proposition, supporting body copy, message-length variants (ultra-short / short / medium / long if present in the source).
## Disclaimers / Footnotes
Numbered list, exact wording, what claim each footnote anchors to.
## Approved Brand and Product Names
Exact spellings, including trademark glyphs (™, ®, ©).
## Variant Notes / Watch-outs
Anything explicitly marked variant-specific (e.g. "Mainstream only", "Core only", "must not appear in entry tier").
## Verboten Phrasing
Any explicitly disallowed or deprecated phrasing called out in the source.
Be exhaustive but concise. Quote exactly where the source is explicit. If a section has no content in this source, write 'None specified' under it — do not omit the section heading."""


def _summarise_with_gemini(raw_text: str, source_filename: str) -> str:
    user_prompt = (
        f"Source filename: {source_filename}\n\n"
        f"Raw cell content:\n\n```\n{raw_text}\n```"
    )
    # call_gemini_text is the existing text-only Gemini wrapper in llm_config.
    # If the actual export name differs, adjust in Step 1.7 verification.
    return call_gemini_text(
        system_prompt=_SYSTEM_PROMPT,
        user_prompt=user_prompt,
        model='gemini-2.5-pro',
    )
```
- [ ] **Step 1.7: Verify llm_config exports a usable text-only Gemini wrapper**
```bash
grep -nE "def (call_gemini|gemini_text|generate.*gemini)" backend/llm_config.py | head -20
```
If `call_gemini_text` doesn't exist under that name, find the closest analogue (look at how `pdf_processor.py` calls Gemini) and update the import + call site in `excel_processor.py` accordingly.
- [ ] **Step 1.8: Syntax + import verification**
```bash
cd backend && python3 -m py_compile excel_processor.py && python3 -c "import excel_processor; print('OK', excel_processor.BRAND_GUIDELINES_DIR)"
```
Expected: `OK <path>/brand_guidelines/files`
- [ ] **Step 1.9: Run the processor against the Messi-Core fixture**
```bash
cd backend && python3 -c "
import os, sys
sys.path.insert(0, '.')
from excel_processor import process_excel_file
summary, path = process_excel_file('tests/fixtures/hp/messi_core.xlsx', 'test-messi-core')
print('summary_path:', path)
print('summary_len:', len(summary))
print('first 800 chars:')
print(summary[:800])
"
```
Expected: summary is 1500 to 4000 chars, contains `## Key Selling Points`, `## Disclaimers`, `## Approved Brand and Product Names`, and at least one KSP-level content snippet referencing "OmniDesk" or "Mini".
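The heading check can be automated rather than eyeballed. The sketch below tests a summary for the six section headings named in the Step 1.6 system prompt; `missing_sections` is a hypothetical helper and the sample text is made up for illustration.

```python
from typing import List

# Section headings the Step 1.6 system prompt asks the model to emit
REQUIRED_SECTIONS = [
    "## Product / Variant",
    "## Key Selling Points (KSPs)",
    "## Disclaimers / Footnotes",
    "## Approved Brand and Product Names",
    "## Variant Notes / Watch-outs",
    "## Verboten Phrasing",
]


def missing_sections(summary: str) -> List[str]:
    """Return the required headings that do not start any line of the summary."""
    lines = [line.strip() for line in summary.splitlines()]
    return [h for h in REQUIRED_SECTIONS
            if not any(ln.startswith(h) for ln in lines)]


sample = (
    "## Product / Variant\nHP OmniDesk Mini\n\n"
    "## Key Selling Points (KSPs)\nCompact design\n\n"
    "## Disclaimers / Footnotes\nNone specified\n"
)
print(missing_sections(sample))
```

An empty list means every section heading is present; a degraded summary would report all six as missing.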
- [ ] **Step 1.10: Commit Task 1**
```bash
git add backend/excel_processor.py .gitignore
git commit -m "feat(excel-processor): add openpyxl + Gemini summary pipeline for HP Source Messaging
Mirrors pdf_processor.py — public process_excel_file() reads any HP
Source Messaging Excel, extracts cells via openpyxl (skipping empty
rows, capped at 50K chars), and summarises into structured Markdown
via Gemini 2.5 Pro. Output saved as brand_guidelines/files/{file_id}_summary.md.
On Gemini failure the processor writes a degraded summary containing
the raw extraction so the reference asset stays usable. Test fixtures
(real HP Excels) live under backend/tests/fixtures/hp/ and are gitignored."
```
---
### Task 2: `.xlsx` dispatch on the reference asset upload endpoint
Wire `excel_processor.process_excel_file` into the `/api/brand_guidelines` POST handler at `backend/api_server.py:4771` so `.xlsx` uploads route correctly.
**Files:**
- Modify: `backend/api_server.py` (around the existing `/api/brand_guidelines` POST handler near line 4771)
- [ ] **Step 2.1: Read the existing handler to find the PDF dispatch**
```bash
sed -n '4760,4900p' backend/api_server.py
```
Identify: where the extension is checked, where `pdf_processor.process_pdf_file` is called, and what's returned to the client.
- [ ] **Step 2.2: Add the `.xlsx` branch**
Edit the POST handler to dispatch by extension. The exact change depends on the existing code shape — pattern is:
- Where the handler currently checks for `.pdf` and calls `pdf_processor.process_pdf_file(...)`, add an `elif filename.lower().endswith('.xlsx')` branch that imports `excel_processor` and calls `excel_processor.process_excel_file(...)` with the same arg signature.
- The DB record / response shape should be identical to the PDF path — same `file_id`, same `status`, same return JSON.
- Cover image: PDF has one; Excel doesn't. If the DB record assigns a `cover_path`, set it to `None` for Excels.
- [ ] **Step 2.3: Syntax + import verification**
```bash
cd backend && python3 -m py_compile api_server.py && python3 -c "import api_server; print('api_server OK')"
```
- [ ] **Step 2.4: Commit Task 2**
```bash
git add backend/api_server.py
git commit -m "feat(brand-guidelines): route .xlsx uploads to excel_processor
The /api/brand_guidelines POST handler now dispatches by extension:
.pdf → pdf_processor.process_pdf_file (existing), .xlsx →
excel_processor.process_excel_file (new). Same DB record shape;
cover image is null for Excel since there's no first-page analogue."
```
---
### Task 3: Media plan `language` column
Add `language` to the media-plan column extraction and surface it into the prompt context.
**Files:**
- Modify: `backend/media_plan_processor.py`
- [ ] **Step 3.1: Locate the column-extraction logic**
```bash
grep -n -E "country|placement|vendor|dimensions" backend/media_plan_processor.py | head -10
```
These are the existing matched-row metadata fields. The `language` field will live alongside them.
- [ ] **Step 3.2: Add `language` to the case-insensitive header match list**
Edit the column-mapping section to recognise `Language` / `language` / `LANGUAGE` headers and store the value in the matched-row dict under the key `language`.
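A case-insensitive header match can be sketched as follows. The list of known headers is an assumption drawn from the fields grepped in Step 3.1, and `map_headers` and `row_to_dict` are hypothetical names; the real mapper in `media_plan_processor.py` may be shaped differently.

```python
from typing import Dict, Optional, Sequence

# Assumed header set: the Step 3.1 fields plus the new 'language' column
KNOWN_HEADERS = ("country", "placement", "vendor", "dimensions", "language")


def map_headers(header_row: Sequence[Optional[object]]) -> Dict[str, int]:
    """Map each known header (case-insensitive, trimmed) to its column index."""
    mapping: Dict[str, int] = {}
    for index, cell in enumerate(header_row):
        if cell is None:
            continue
        key = str(cell).strip().lower()
        if key in KNOWN_HEADERS and key not in mapping:
            mapping[key] = index  # first occurrence wins
    return mapping


def row_to_dict(row: Sequence[Optional[object]],
                mapping: Dict[str, int]) -> Dict[str, str]:
    """Pull the mapped cells out of a data row, skipping blank or missing values."""
    out: Dict[str, str] = {}
    for key, index in mapping.items():
        value = row[index] if index < len(row) else None
        if value is not None and str(value).strip():
            out[key] = str(value).strip()
    return out


headers = map_headers(["Country", " LANGUAGE ", None, "Placement"])
matched = row_to_dict(("UK", "UK English", None, "eTail tile"), headers)
print(headers)
print(matched)
```

Skipping blank cells means a plan without a populated `language` value yields no `language` key, which keeps the "no line if absent" behaviour in Step 3.3 straightforward.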
- [ ] **Step 3.3: Surface `language` in the prompt context block**
Locate where the matched-row dict is rendered as text injected into check prompts (the function that returns the "media plan context" string used by `process_single_check`). Add a line:
```python
if row.get('language'):
    lines.append(f"Language: {row['language']}")
```
— preserving the existing structure (no line if absent).
- [ ] **Step 3.4: Syntax + import verification**
```bash
cd backend && python3 -m py_compile media_plan_processor.py && python3 -c "import media_plan_processor; print('OK')"
```
- [ ] **Step 3.5: Quick functional test with a synthetic plan**
```bash
cd backend && python3 -c "
# Mock test: build a minimal row dict with a language field and confirm the
# prompt-context formatter emits 'Language: <value>'. Exact function name to
# locate during Step 3.3 — adjust below.
from media_plan_processor import format_matched_row_for_prompt # adjust if named differently
row = {'country': 'UK', 'language': 'UK English', 'placement': 'eTail tile'}
print(format_matched_row_for_prompt(row))
"
```
Expected: output includes a line `Language: UK English`.
- [ ] **Step 3.6: Commit Task 3**
```bash
git add backend/media_plan_processor.py
git commit -m "feat(media-plan): extract and surface 'language' column
Adds case-insensitive 'language' header recognition to the media-plan
column mapper. When present in a matched row, the value flows into
the prompt context block as 'Language: <value>'. Absent → no line
(graceful no-op for clients whose plans don't include the field).
Enables multilingual support for hp_copy_review (Cycle 1) and any
future check that wants to reason about asset language."
```
---
### Task 4: HP client config + profile
Promote HP from placeholder. Create the `hp_copy_review` profile JSON. Ensure the profile loader picks it up.
**Files:**
- Modify: `backend/client_config.py`
- Create: `backend/profiles/hp_copy_review.json`
- [ ] **Step 4.1: Update the HP entry in `CLIENT_PROFILES`**
Edit `backend/client_config.py`. Replace the existing `'hp'` entry with:
```python
'hp': {
    'name': 'HP',
    'profiles': ['hp_copy_review', 'static_general', 'video_general'],
    'display_name': 'HP',
    'description': 'HP marketing copy QC graded against canonical Source Messaging',
    'default_profile': 'hp_copy_review',
},
```
- [ ] **Step 4.2: Create the profile JSON**
Create `backend/profiles/hp_copy_review.json`:
```json
{
  "name": "HP Copy Review",
  "description": "Marketing copy graded against canonical HP Source Messaging",
  "mode": "asset",
  "visibility": "client_specific",
  "visible_to_clients": ["hp"],
  "checks": {
    "hp_copy_review": {
      "weight": 10.0,
      "llm": "gemini",
      "enabled": true
    }
  }
}
```
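Before relying on the profile loader, the JSON shape can be sanity-checked with the standard library alone. The rules in this sketch are illustrative assumptions, not what `profile_config` actually enforces, and `profile_problems` is a hypothetical helper.

```python
import json
from typing import List

PROFILE_JSON = """
{
  "name": "HP Copy Review",
  "mode": "asset",
  "visibility": "client_specific",
  "visible_to_clients": ["hp"],
  "checks": {
    "hp_copy_review": {"weight": 10.0, "llm": "gemini", "enabled": true}
  }
}
"""


def profile_problems(profile: dict) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks sane."""
    problems = []
    for key in ("name", "mode", "checks"):
        if key not in profile:
            problems.append(f"missing key: {key}")
    if (profile.get("visibility") == "client_specific"
            and not profile.get("visible_to_clients")):
        problems.append("client_specific profile lists no clients")
    for check_name, cfg in profile.get("checks", {}).items():
        if not isinstance(cfg.get("weight"), (int, float)):
            problems.append(f"{check_name}: weight must be a number")
        if not isinstance(cfg.get("enabled"), bool):
            problems.append(f"{check_name}: enabled must be true or false")
    return problems


profile = json.loads(PROFILE_JSON)
enabled = [name for name, cfg in profile["checks"].items() if cfg.get("enabled")]
print(profile_problems(profile), enabled)
```

`json.loads` also catches the most common hand-editing mistakes, such as a trailing comma or Python-style `True`, before the file reaches the loader.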
- [ ] **Step 4.3: Verify client config**
```bash
cd backend && python3 -c "
from client_config import get_client_profiles, get_default_profile
print('profiles:', get_client_profiles('hp'))
print('default:', get_default_profile('hp'))
"
```
Expected:
```
profiles: ['hp_copy_review', 'static_general', 'video_general']
default: hp_copy_review
```
- [ ] **Step 4.4: Verify profile load**
```bash
cd backend && python3 -c "
from profile_config import get_profile
p = get_profile('hp_copy_review')
print('name:', p.name)
print('mode:', getattr(p, 'mode', 'asset'))
print('enabled checks:', p.get_enabled_checks())
print('strict_grade:', getattr(p, 'strict_grade', False))
"
```
Expected: the profile loads, `mode` is `asset`, and the enabled checks are `['hp_copy_review']`. The check module does not exist yet, so the loader may print a "Loaded profile" line and then fail to resolve `hp_copy_review`; that is expected at this task boundary.
- [ ] **Step 4.5: Commit Task 4**
```bash
git add backend/client_config.py backend/profiles/hp_copy_review.json
git commit -m "feat(hp): promote HP client + add hp_copy_review profile
HP is no longer a placeholder. The client gets a new hp_copy_review
profile (single weighted check, client-specific visibility) as its
default, plus the generic static_general and video_general profiles
it already had visibility into."
```
---
### Task 5: `hp_copy_review` check module
The actual QC check — single LLM call per asset.
**Files:**
- Create: `backend/visual_qc_apps/hp_copy_review/__init__.py` (empty)
- Create: `backend/visual_qc_apps/hp_copy_review/app.py`
- [ ] **Step 5.1: Read `flask_app_template.py` and a comparable real check**
```bash
ls backend/flask_app_template.py 2>/dev/null && wc -l backend/flask_app_template.py
ls backend/visual_qc_apps/boots_tandc_wording/app.py && wc -l backend/visual_qc_apps/boots_tandc_wording/app.py
```
Read both. The boots_tandc_wording check is the closest analogue (copy-against-reference, image input, structured findings output). Use it as the implementation pattern.
- [ ] **Step 5.2: Create the directory + empty `__init__.py`**
```bash
mkdir -p backend/visual_qc_apps/hp_copy_review
touch backend/visual_qc_apps/hp_copy_review/__init__.py
```
- [ ] **Step 5.3: Create `app.py` with the standard check skeleton**
Copy the structure from `boots_tandc_wording/app.py` (Flask blueprint pattern, `run_check(...)` or equivalent entry point, the reference-asset summary injection, the media-plan context injection). Adapt the prompt to:
```
You are a copy reviewer for HP marketing materials. Compare the
marketing asset against the canonical Source Messaging provided.
PRODUCT LANGUAGE: <from media plan, or "not specified">
CANONICAL SOURCE MESSAGING:
<one or more Markdown summaries from attached Excel reference assets,
concatenated, each preceded by a header like "--- File: messi_core.xlsx ---">
MARKETING ASSET:
[image]
For every claim, headline, body line, disclaimer, footnote, spec
call-out, and brand mention visible on the asset, evaluate against
the canonical source. Output a JSON object with this shape:
{
"score": <number 0-10>,
"summary": "<one-paragraph headline finding>",
"findings": [
{
"priority": "high" | "medium" | "low",
"category": "ksp" | "disclaimer" | "spec" | "variant" | "tone" | "brand-name" | "language" | "other",
"quote": "<exact quote from the asset>",
"issue": "<what's wrong>",
"suggested_fix": "<what it should say, citing the canonical source>",
"source_reference": "<where in source messaging this comes from>"
}
]
}
Rules:
- If no Source Messaging is attached, return {"score": 0, "summary": "No HP Source Messaging reference was attached — cannot grade copy without a canonical source.", "findings": []}
- High-priority findings weight the score most heavily
- Empty findings (clean asset) is a valid result; score 9-10
- Return ONLY the JSON object, no surrounding prose
```
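The text part of this prompt could be assembled along the following lines. This is a sketch only: the signature follows the `build_prompt(reference_summaries=..., media_plan_row=...)` call used in the Step 5.6 dry run, the `MEDIA PLAN CONTEXT` header is a hypothetical label, and the instruction text is abbreviated rather than the full prompt.

```python
from typing import Dict, List, Optional, Tuple

NO_LANGUAGE = "not specified"


def build_prompt(reference_summaries: List[Tuple[str, str]],
                 media_plan_row: Optional[Dict[str, str]] = None) -> str:
    """Assemble the text part of the prompt; the image is attached separately."""
    row = media_plan_row or {}
    language = row.get("language") or NO_LANGUAGE

    if reference_summaries:
        # Each summary gets a file header so findings can cite their source
        source_block = "\n\n".join(
            f"--- File: {name} ---\n{summary.strip()}"
            for name, summary in reference_summaries
        )
    else:
        source_block = "(none attached)"

    # Media-plan context in the 'Language: <value>' form described in Task 3
    context_lines = [f"{key.capitalize()}: {value}"
                     for key, value in row.items() if value]

    sections = [
        "You are a copy reviewer for HP marketing materials. Compare the "
        "marketing asset against the canonical Source Messaging provided.",
        f"PRODUCT LANGUAGE: {language}",
        "CANONICAL SOURCE MESSAGING:\n" + source_block,
    ]
    if context_lines:
        sections.append("MEDIA PLAN CONTEXT:\n" + "\n".join(context_lines))
    sections.append("Return ONLY a JSON object with keys: score, summary, findings.")
    return "\n\n".join(sections)


prompt = build_prompt(
    reference_summaries=[("messi_core.xlsx", "## Product\nHP OmniDesk Mini Core")],
    media_plan_row={"language": "UK English", "country": "UK"},
)
print(prompt)
```

Keeping prompt assembly in a pure function, separate from the Gemini call, is what makes a no-LLM dry run possible.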
- [ ] **Step 5.4: Implement response parsing**
The check function must parse the LLM's JSON response. Handle:
- Valid JSON with the expected shape → extract `score`, `summary`, `findings` and return them in the standard check result shape (`{'score': ..., 'response': ..., 'findings': ...}` — match the existing checks' return shape so the report renderer can pick up `findings` later).
- Malformed JSON → score 0, response = raw LLM text, findings = `[]`, summary = "Failed to parse check output".
- The `findings` array gets attached to the check result dict so the report renderer in Task 6 can detect it.
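The parsing rules above might look like this in practice. It is a minimal sketch, assuming the model sometimes wraps its JSON in a Markdown code fence; `parse_check_response` is a hypothetical name, and the return keys follow the `{'score', 'response', 'findings'}` shape described in this step rather than a confirmed project contract.

```python
import json
import re
from typing import Any, Dict

# Strips a leading fence (optionally tagged json) and a trailing fence
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$", re.IGNORECASE)


def parse_check_response(raw: str) -> Dict[str, Any]:
    """Parse the model's JSON reply into {'score', 'response', 'findings'}."""
    failed = {"score": 0, "response": raw, "findings": [],
              "summary": "Failed to parse check output"}
    text = _FENCE.sub("", raw.strip())
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return failed
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        return failed
    try:
        score = float(data.get("score", 0))
    except (TypeError, ValueError):
        return failed
    return {
        "score": max(0.0, min(10.0, score)),  # clamp to the 0-10 scale
        "response": str(data.get("summary", "")),
        # Drop any finding that is not an object so the renderer can use .get()
        "findings": [f for f in data["findings"] if isinstance(f, dict)],
    }


ok = parse_check_response(
    '{"score": 12, "summary": "Two issues", "findings": [{"priority": "high"}]}'
)
bad = parse_check_response("Here is my review of the asset...")
print(ok["score"], bad["summary"])
```

Returning an empty `findings` list on failure, rather than omitting the key, means the report would show an empty table for a parse failure; whether that or the raw-text fallback is preferable is a decision for Task 6.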
- [ ] **Step 5.5: Syntax + import + profile load verification**
```bash
cd backend && python3 -m py_compile visual_qc_apps/hp_copy_review/app.py && python3 -c "
from profile_config import get_profile
from app_discovery import discover_qc_apps # or the actual loader path
apps = discover_qc_apps()
print('hp_copy_review in apps:', 'hp_copy_review' in apps)
p = get_profile('hp_copy_review')
print('profile enabled checks:', p.get_enabled_checks())
"
```
Expected: `hp_copy_review in apps: True`, profile lists it as enabled.
- [ ] **Step 5.6: Dry-run prompt-assembly test (no LLM call)**
```bash
cd backend && python3 -c "
# Smoke test: instantiate the check, call its prompt-assembly helper
# (without invoking Gemini) with mock reference summaries and a mock
# media-plan row including language='UK English'. Confirm output prompt
# contains 'Language: UK English', 'CANONICAL SOURCE MESSAGING', and
# the findings-format instructions.
from visual_qc_apps.hp_copy_review.app import build_prompt # adjust if named differently
prompt = build_prompt(
reference_summaries=[('messi_core.xlsx', '## Product\nHP OmniDesk Mini Core')],
media_plan_row={'language': 'UK English', 'country': 'UK'},
)
assert 'language: uk english' in prompt.lower(), 'language missing from prompt'
assert 'CANONICAL SOURCE MESSAGING' in prompt
assert 'findings' in prompt
print('prompt assembly OK')
"
```
- [ ] **Step 5.7: Commit Task 5**
```bash
git add backend/visual_qc_apps/hp_copy_review/
git commit -m "feat(hp_copy_review): single-check LLM grader against Source Messaging
Single Gemini call per asset. Prompt assembles attached Source
Messaging summaries + media-plan language context + the asset image.
Returns structured JSON with score, summary, and a findings array
(priority, category, quote, issue, suggested fix, source reference).
Empty findings = clean asset; missing reference → score 0 with a
clear message rather than running blind."
```
---
### Task 6: Findings-table rendering in both HTML report generators
Both HTML generators need a small case to render `findings` as a table.
**Files:**
- Modify: `backend/api_server.py` (`generate_html_content` and `generate_comprehensive_html_report` — see [[feedback_multi_html_generators]])
- [ ] **Step 6.1: Locate both generators**
```bash
grep -n "def generate_html_content\|def generate_comprehensive_html_report" backend/api_server.py
```
Expected: two function definitions, both render check results to HTML.
- [ ] **Step 6.2: Identify where each renders a per-check response**
In each generator, find the section that renders the per-check `response` text (often inside an expandable `<details>` block). The new case goes *before* that fallback: if the check's result dict contains a `findings` array, render the table; else fall back to the text response.
- [ ] **Step 6.3: Implement a shared helper `_render_findings_table(findings)`**
Add near the existing CSS/render helpers in `api_server.py`:
```python
from html import escape  # move to the module imports in api_server.py


def _render_findings_table(findings):
    """Render an hp_copy_review-style findings array as an HTML table."""
    if not findings:
        return '<p class="muted">No findings: copy is clean.</p>'

    def cell(finding, key):
        # Findings are LLM output quoting asset text, so escape before embedding.
        return escape(str(finding.get(key) or ''))

    rows = []
    for f in findings:
        # Tolerate a missing, null, or oddly cased priority value
        priority = str(f.get('priority') or 'low').lower()
        pri_class = {'high': 'score-bad', 'medium': 'score-ok', 'low': 'score-good'}.get(priority, 'muted')
        rows.append(
            f'<tr>'
            f'<td><span class="score-pill {pri_class}">{escape(priority.upper())}</span></td>'
            f'<td><code>{cell(f, "category")}</code></td>'
            f'<td><code>{escape(str(f.get("quote") or "")[:200])}</code></td>'
            f'<td>{cell(f, "issue")}</td>'
            f'<td>{cell(f, "suggested_fix")}</td>'
            f'<td class="muted">{cell(f, "source_reference")}</td>'
            f'</tr>'
        )
    return (
        '<table class="findings-table"><thead><tr>'
        '<th>Priority</th><th>Category</th><th>Quote</th>'
        '<th>Issue</th><th>Suggested fix</th><th>Source</th>'
        '</tr></thead><tbody>'
        + ''.join(rows) + '</tbody></table>'
    )
```
- [ ] **Step 6.4: Wire the helper into both generators**
In each generator, where it renders a check's response block, add (in pseudocode):
```python
findings = check_result.get('findings')
if findings is not None:
    body_html += _render_findings_table(findings)
else:
    body_html += render_response_text(check_result.get('response', ''))
```
Match the exact variable names and HTML scaffolding used by each generator.
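The `is not None` test is what separates "the check ran and found nothing" from "this check does not produce findings". A small stand-alone sketch of that branch follows; `render_check_body` and its simplified list markup are hypothetical stand-ins for the real generators and for `_render_findings_table`.

```python
from html import escape
from typing import Any, Dict


def render_check_body(check_result: Dict[str, Any]) -> str:
    """Pick the findings view when a 'findings' key is present, else plain text."""
    findings = check_result.get("findings")
    if findings is not None:
        # An empty list still counts: the check ran and found nothing.
        if not findings:
            return '<p class="muted">No findings.</p>'
        items = "".join(
            f"<li>{escape(str(f.get('issue', '')))}</li>" for f in findings
        )
        return f"<ul>{items}</ul>"
    # Checks without a findings key keep the existing text rendering
    return f"<pre>{escape(str(check_result.get('response', '')))}</pre>"


legacy = render_check_body({"response": "a < b"})
clean = render_check_body({"findings": [], "response": "ignored"})
flagged = render_check_body({"findings": [{"issue": "Wrong footnote"}]})
print(legacy, clean, flagged, sep="\n")
```

A truthiness test (`if findings:`) would send a clean asset down the text-response path, which is why the pseudocode compares against `None`.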
- [ ] **Step 6.5: Syntax verification + manual HTML inspection**
```bash
cd backend && python3 -m py_compile api_server.py && python3 -c "
from api_server import _render_findings_table
html = _render_findings_table([
{'priority': 'high', 'category': 'disclaimer', 'quote': 'must be linked to a boots.com account', 'issue': 'Wrong account type', 'suggested_fix': '...linked to an Advantage Card account...', 'source_reference': 'Messi Core T&Cs row 18'},
{'priority': 'low', 'category': 'tone', 'quote': 'a tiny powerhouse', 'issue': 'Not approved phrasing', 'suggested_fix': 'Use \"compact and capable\"', 'source_reference': 'KSP 1'},
])
with open('/tmp/findings_preview.html', 'w') as f:
    f.write('<!DOCTYPE html><html><head><style>table{border-collapse:collapse}td,th{border:1px solid #ddd;padding:6px}</style></head><body>' + html + '</body></html>')
print('wrote /tmp/findings_preview.html')
"
open /tmp/findings_preview.html
```
Eye-check: table renders, priority pills coloured correctly, quote in monospace.
- [ ] **Step 6.6: Commit Task 6**
```bash
git add backend/api_server.py
git commit -m "feat(report): render hp_copy_review findings as a structured table
Both HTML report generators (generate_html_content and
generate_comprehensive_html_report) get a small case: when a check
result has a 'findings' array, render it as a priority-coloured
table with quote/issue/suggested-fix/source columns instead of the
default response-text block. Fallback to text rendering when
findings is absent — every existing check is unaffected."
```
---
### Task 7: Dev smoke test + deployment
End-to-end verification on the dev server with real assets and real LLM calls.
- [ ] **Step 7.1: Run the full pre-session checklist**
```bash
cd backend && python3 -c "
from profile_config import get_profile
for p in ['general_check','static_general','unilever_key_visual','unilever_packaging','diageo_key_visual','diageo_packaging','loreal_static','amazon_static','boots_static','boots_ppack','inclusive_accessibility','video_general','axa_policy_document','axa_policy_document_diff','axa_accessibility','hp_copy_review']:
    prof = get_profile(p)
    print(f'OK {prof.name} ({len(prof.get_enabled_checks())} checks)')
"
cd .. && python3 -m py_compile backend/**/*.py
python3 -c "
import sys; sys.path.insert(0, 'backend')
import api_server, llm_config, profile_config, jwt_validator, auth_middleware
print('all imports OK')
"
```
Expected: every profile (including new `hp_copy_review`) loads; all syntax + imports green.
- [ ] **Step 7.2: Push the feature branch**
```bash
git push -u origin feature/hp-cycle-1-onboarding
```
- [ ] **Step 7.3: Open PR `feature/hp-cycle-1-onboarding → develop` via Bitbucket**
URL: `https://bitbucket.org/zlalani/ai_qc/pull-requests/new?source=feature/hp-cycle-1-onboarding&t=1`. Destination = `develop`. Title: "feat(hp): cycle 1 — hp_copy_review check + excel processor + language field". Body links to the spec.
- [ ] **Step 7.4: Merge PR, then deploy to dev**
SSH to `optical-production-dev`:
```bash
cd /opt/ai_qc
backend/scripts/deploy.sh dev
sudo journalctl -u ai-qc -n 30 --no-pager
```
Confirm clean deploy + service healthy.
- [ ] **Step 7.5: Manually upload Source Messaging fixtures to dev**
Via the UI at `optical-dev.oliver.solutions/ai_qc/`:
1. Sign in (admin).
2. Settings → Reference Assets (for client `hp`).
3. Upload `messi_core.xlsx`, `messi_mainstream.xlsx`, `gaston.xlsx` (from the original locations under `~/Desktop/AI_QC_Bitbucket/hp/recieved_docs/excel/`).
4. Watch the status badge — each should flip to `ready` within 60s. If degraded, inspect the saved `_summary.md` to see what failed.
- [ ] **Step 7.6: Run an HP marketing asset through `hp_copy_review`**
1. From the HP team, get a real Messi or Gaston marketing image (PNG/JPG).
2. Open a QC session as client `hp`, profile `hp_copy_review`.
3. Attach the relevant Source Messaging reference (e.g. `messi_core` for a Core-targeted asset).
4. (Optional) Upload a media plan with a `language` column populated so the prompt picks it up.
5. Run the QC.
6. Inspect the report: confirm findings table renders, priority pills coloured correctly, quotes are real text from the asset.
If output structure is wrong (e.g. LLM returns prose instead of JSON), iterate the prompt — small follow-up PRs against `develop`.
- [ ] **Step 7.7: PR `develop → main` and tag**
Once HP-side smoke testing confirms the output is useful:
```bash
# (laptop) sync local develop, open PR via Bitbucket UI:
# https://bitbucket.org/zlalani/ai_qc/pull-requests/new?source=develop&dest=main&t=1
```
After merge:
```bash
git fetch origin
git tag -a v1.4.0 origin/main -m "v1.4.0 — HP onboarding cycle 1 (hp_copy_review check + excel processor + media-plan language field)"
git push origin v1.4.0
git rev-parse v1.4.0^{commit}; git rev-parse origin/main # should match
```
- [ ] **Step 7.8: Deploy v1.4.0 to prod**
SSH to `optical-production`:
```bash
cd /opt/ai_qc
backend/scripts/deploy.sh prod v1.4.0
sudo journalctl -u ai-qc -n 30 --no-pager
```
No env-file backup dance needed — env files are now permanently gitignored (since v1.3.2).
- [ ] **Step 7.9: Upload Source Messaging files to prod**
Repeat Step 7.5 against the prod UI (`optical-prod.oliver.solutions/ai_qc/`). Source Messaging files are *per-server* — they live in `brand_guidelines/files/` on disk and don't sync between dev and prod.
- [ ] **Step 7.10: Hand off to HP team**
Confirm HP has access (via per-user client access — `Nick.Viljoen@oliver.agency` adds the HP team's email(s)). Walk them through:
1. Where to upload Source Messaging files (Settings → Reference Assets).
2. How to run a QC (select hp_copy_review, attach the right reference).
3. What feedback to send back (findings missed, findings wrong, output format suggestions).
Collect first-week feedback before opening Cycle 2 (Word/PPT processor).