On server restart, stale active jobs are automatically resumed rather
than failed. Docs already parsed in a prior run are skipped (resume from
cache), docs stuck at 'parsing' are reset to 'pending' and re-parsed.
- Repository: add get_all_stale_active_jobs() and reset_stuck_parsing_docs()
- Service: skip already-parsed docs in _parse_doc(), reset stuck docs on start
- Main: recover stale jobs via asyncio.create_task() in lifespan startup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updates all display labels (PDF report, campaign page, Knowledge Base card, analytics, status dashboard, checks overview) and aligns internal agent name in backend. Adds migration 010 to update the knowledge base display_name in production DB.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Now that REST polling removes the 30s GCP LB timeout constraint,
gemini-3.1-pro-preview is restored as primary and gemini-3-flash-preview
is used only when Pro fails or times out.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
POST /api/analyze submits an analysis job and returns job_id instantly.
GET /api/analyze/{job_id} returns progress + result; frontend polls every 2s.
Analysis runs as asyncio.create_task in the background — each HTTP request
completes in milliseconds, well within the 30s GCP Load Balancer limit.
- Add backend/app/services/job_store.py: in-memory AnalysisJob store with
30-min TTL cleanup
- Add backend/app/api/analysis_routes.py: POST + GET /api/analyze endpoints
with full analysis pipeline (hash check, DB persistence, PDF pages, etc.)
- Remove backend/app/websocket/: handlers.py, manager.py, __init__.py
- Update backend/app/main.py: wire analysis_router, store analysis_service
in app.state, drop all WebSocket imports and endpoint
- Update frontend/services/geminiService.ts: replace WS with fetch+poll;
function signatures unchanged so App.tsx / WIPReviewer.tsx need no edits
- Remove VITE_BACKEND_WS_URL from vite.config.ts, deploy.sh, .env.deploy.example
- Update cloudrun.yaml: remove WebSocket-specific session affinity annotation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gemini-3.1-pro-preview takes ~25s per call, hitting the GCP load
balancer's 30s hard timeout before analysis completes. Flash model
returns in ~5-8s, fitting comfortably within the limit. Pro model
kept as fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add 25s heartbeat ping from backend to prevent Apache/proxy idle-timeout
killing the connection during 1-3 min analysis runs
- Handle heartbeat silently in both analyzeProof and analyzeWIPProof frontend handlers
- Run PDF rasterization via asyncio.to_thread so heartbeats aren't blocked
- Wrap analyze_proof with asyncio.wait_for(timeout=300) for a hard 5-min cap
- Log dropped send_message calls in ConnectionManager instead of swallowing silently
- cloudrun.yaml: add sessionAffinity, startup probe, raise containerConcurrency 4→10,
document DISABLE_AUTH option
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- knowledge_base_service.py: wrap Gemini distillation call in try/except
to fall back to fallback_client/fallback_model if primary times out,
matching the fallback behaviour in GeminiService._generate_content()
- models.py: fix SpecVersion.source_document_ids ORM type annotation from
Mapped[Optional[dict]] to Mapped[Optional[list]] — the field stores a
JSON array of document ID strings, not an object
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace self.gemini.client with self.gemini.primary_client on line 295 of
knowledge_base_service.py. GeminiService only exposes primary_client and
fallback_client — there is no client attribute. This caused all processing
jobs to fail at the distillation step, which is also why Version History
was always blank (no SpecVersion records were ever created).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add LLAMA_CLOUD_BASE_URL config option so the LlamaCloud regional
endpoint can be set without code changes (fixes 401/region errors
on production); pass it through to AsyncLlamaCloud client init
- Document LLAMA_CLOUD_BASE_URL in .env.deploy.example with EU endpoint
- Copy BAR-ModComms-logo-v5.png to frontend/public
- Sidebar: update logo reference v4 → v5
- PDF header: update logo v4 → v5, wrap in black (#000) band for
legibility, remove duplicate "Oliver" wordmark
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend: thread on_fallback callback through analysis chain
(gemini_service → agents → analysis_service → handlers). The handler
sends a 'model_fallback' WebSocket message exactly once per analysis
when the primary model is unavailable.
Frontend: handle 'model_fallback' WS message and show a dismissible
yellow toast at the bottom of the screen with an 8-second auto-dismiss.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
google-genai SDK expects http_options 'timeout' in milliseconds.
Passing 45 (seconds) was interpreted as 45ms → ~1s deadline,
which Google API rejected with 400 INVALID_ARGUMENT
'Manually set deadline 1s is too short. Minimum allowed deadline is 10s.'
Primary: 45_000ms (45s), Fallback: 150_000ms (150s)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
asyncio.wait_for cannot reliably cancel SDK-internal HTTP connections.
Replace with two genai.Client instances — one per model — each configured
with http_options={'timeout': N} so the TCP connection is actually torn
down when the deadline is reached.
Primary model: 45s, Fallback model: 150s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Log analysis showed fallback model responses up to 154s under parallel
load. 60s was too aggressive and would cause false timeouts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Primary model (gemini-3.1-pro-preview): 45s timeout
Fallback model (gemini-3-flash-preview): 60s timeout
Without timeouts, the fallback model under high load would wait
indefinitely, causing analysis to hang for 10+ minutes per file.
asyncio.TimeoutError from the primary model is now handled the same
as other exceptions (falls through to fallback).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gemini_service.py: if the primary model (gemini-3.1-pro-preview) is
unavailable or returns a permission error, all three call sites now
automatically retry with gemini-3-flash-preview before propagating failure.
cloudrun.yaml: new Cloud Run service definition that ensures stable
WebSocket operation — 10-minute request timeout (vs 60s default),
2 vCPU / 4Gi RAM for PDF rasterisation, min 1 warm instance to prevent
cold-start disconnects, and GEMINI_API_KEY sourced from Secret Manager
so the service can actually reach the Gemini API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace single-line bullet format with a structured two-part format
(**Issue:** / **Recommendation:**) in all specialist and lead agent
prompts. Update Gemini response schema description to match. Update
frontend formatFeedbackText and formatFeedbackTextForPDF to parse
**bold** markdown and preserve line breaks within multi-line bullets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous prompts instructed Gemini to "remove redundancy, marketing
fluff, or content not relevant to..." which caused salient details —
especially unusual, granular, or edge-case instructions — to be lost
from spec output. Rewritten all 5 agent prompts (legal, brand_barclays,
brand_barclaycard, channel_best_practices, channel_tech_specs) to:
- Reframe the task as "restructure and organise" rather than "distil
and filter"
- Add a zero-tolerance detail-loss instruction with concrete examples
of unconventional rules that must be preserved
- Explicitly forbid omitting, summarising away, or paraphrasing
specific rules/values/conditions
- Allow merging only exact duplicates while keeping all unique content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After processing a new knowledge base spec, invalidate_cache() was
clearing the DB spec from the cache without replacing it. The next
analysis would then fall back to static prompts/*.md files instead of
using the newly generated DB spec.
Now invalidate_cache() accepts optional new_spec_content to immediately
populate the DB cache, and knowledge_base_service passes the freshly
distilled spec content so it's available for the next analysis without
a server restart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- LlamaParse service now returns a ParseResult dataclass with markdown,
total page count, and a list of failed pages (page number + error)
- Knowledge base service sets status to "partial" (instead of "parsed")
when some pages failed, with a descriptive error listing which pages
failed and why
- Frontend StatusBadge shows "partial parse" in orange for partial status
- Error details are shown inline below the document row for both partial
and error statuses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle MarkdownPageFailedMarkdownPage objects gracefully by checking for
the markdown attribute with hasattr instead of assuming all pages have
it. Failed pages now log their type and all attributes so the actual
LlamaParse error is visible in logs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parse documents concurrently (up to 10 at a time via semaphore) instead
of serially. Each coroutine uses its own DB session for per-document
status updates, while a shared lock serializes job progress increments
on the main session to avoid session-sharing issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All routine MSAL token verification logs now use DEBUG level so they
don't flood the console on every polling request.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace deprecated llama-cloud-services package with llama-cloud>=1.0 (API v2).
Use AsyncLlamaCloud client with tier="agentic_plus" for maximum parsing accuracy
on complex layouts, tables, and visual structure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, proof metadata collected during upload was only used for database
persistence. Now it flows through the entire analysis pipeline so agents can
tailor their feedback to the specific channel and format being reviewed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add validation to check MAILGUN_API_URL has a valid protocol prefix
and MAILGUN_API_KEY is set before attempting to make HTTP request.
Returns False gracefully with warning log instead of crashing with
httpx.UnsupportedProtocol error.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a subsequent revision of a proof is uploaded, the analysis now takes
place in context of the previous version's results. The system identifies:
- Resolved issues: fixed in the new revision
- Outstanding issues: still present from previous version
- New issues: introduced in the new revision
Key changes:
- Add resolvedIssues, outstandingIssues, newIssues fields to SubReview
- Add PreviousReviewContext model for passing previous review data
- Update all specialist agents to accept previous_review context
- Extend GeminiService with include_revision_fields parameter
- Add get_latest_version_review() repository method
- Update LeadAgent to synthesize cross-version context in summary
- Fetch previous analysis in WebSocket handler for revisions
First version analysis continues to work exactly as before with revision
fields set to null.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Run all 4 specialist agents (Legal, Brand, Channel Best Practices,
Channel Tech Specs) concurrently instead of sequentially. This reduces
total analysis time to roughly the duration of the slowest agent rather
than the sum of all agent times.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed the AI model used for proof analysis from gemini-2.5-flash
to gemini-3-pro-preview for improved analysis capabilities.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove Tone Agent (tone is now part of Brand specs)
- Split Channel Agent into Channel Best Practices Agent and Channel Tech Specs Agent
- Convert Legal Agent from stub to full Gemini-powered implementation
- Add new prompt files for channel_best_practices.md, channel_tech_specs.md, legal.md
- Update ReferenceDocsService with new methods for loading specs
- Update schemas and analysis service to use new agent structure
- Update all frontend components to use new agent names and properties
- Update mock data in Projects.tsx and Campaigns.tsx
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add brand field to AnalyzeProofOptions interface and WebSocket message
- Pass campaign's brandGuidelines to analyzeProof in App.tsx (upload & retry)
- Extract brand from WebSocket message in handlers.py and pass to analysis
- Update AnalysisService.analyze_proof to accept brand parameter
- Refactor BrandAgent to dynamically select brand spec based on brand param
- Add get_barclays_brand_spec() method to ReferenceDocsService (placeholder)
The brand agent now uses the appropriate specification (Barclaycard spec or
Barclays spec when available) based on the campaign's brandGuidelines setting.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create prompts/brand_barclaycard.md with structured brand guidelines
covering logo, Card Portal, colors, typography, and accessibility
- Update ReferenceDocsService with get_barclaycard_brand_spec() method
- Update BrandAgent to use the new spec instead of raw reference docs
- Spec is ~15KB vs ~293KB of raw docs for more efficient analysis
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit formatting instructions to agent prompts requesting bullet-point
output instead of verbose paragraphs. Update JSON schema descriptions for
feedback and summary fields to enforce concise, outline-style format.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
PDFs are now converted to PNG images at 200 DPI before being sent to
Gemini for analysis. This fixes the unreliable iframe-based PDF preview
and ensures all pages are properly analyzed.
- Add PyMuPDF dependency for PDF rasterization
- Create pdf_service.py with rasterize() and get_page_count()
- Update agent interfaces to accept list of images for multi-page support
- Add analyze_with_images() to Gemini service for multi-image analysis
- Return rasterized PDF pages via WebSocket for frontend display
- Add page navigation UI for multi-page PDFs in preview components
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Backend:
- Add email_service.py with Mailgun API integration
- Add SupportEmailRequest schema for email endpoint
- Add Mailgun config settings (API URL, key, from address, support email)
- Update .env.example with Mailgun configuration variables
Frontend:
- Update Login.tsx SupportModal to send emails via /api/support/email
- Update Profile.tsx question form to send emails via apiService
- Add loading states, success/error feedback, and auto-close on success
The support forms on both the login page and profile page now actually
send emails to the support team instead of just showing alerts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change frontend apiTokenRequest scopes from OpenID-only to CLIENT_ID/.default
This makes Azure AD issue tokens with audience = app client ID instead of Graph API
- Add diagnostic logging in backend to show token claims before verification
- Fixes 401 Unauthorized errors on all API calls after login
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change frontend scopes from api://{client_id}/.default to
openid, profile, email for simpler authentication
- Update backend token validation to expect ID token format:
- Audience: client_id (not api://{client_id})
- Issuer: v2.0 endpoint
This avoids requiring Application ID URI setup in Azure AD.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Backend:
- Add PostgreSQL service to docker-compose with health checks
- Add SQLAlchemy async models for all entities (Agency, User, Campaign,
Proof, ProofVersion, FlaggedItem, ResolvedItem, ErrorItem)
- Add Alembic migration framework with initial schema migration
- Add repository layer for CRUD operations
- Add REST API endpoints for campaigns, proofs, and audit items
- Add file storage service for proof uploads
- Update WebSocket handler to optionally persist analysis results
Frontend:
- Add apiService.ts for REST API communication
- Update geminiService.ts to support database persistence options
Deployment:
- Update deploy.sh to handle database migrations (6-step process)
- Update Dockerfile to include alembic configuration
- Add PostgreSQL environment variables to .env templates
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Frontend:
- Add @azure/msal-browser and @azure/msal-react packages
- Create authConfig.ts with MSAL configuration for PKCE flow
- Create authService.ts for token acquisition and user info
- Wrap App with MsalProvider in index.tsx
- Replace dummy login with real MSAL loginPopup() in Login.tsx
- Update App.tsx to use useIsAuthenticated/useMsal hooks
- Update Profile.tsx to display real user data from claims
- Update geminiService.ts to include access_token in WebSocket messages
- Update WIPReviewer.tsx to pass msalInstance for auth
Backend:
- Add python-jose and httpx dependencies for JWT verification
- Create auth_service.py with Azure AD JWKS fetching and token verification
- Create auth.py FastAPI dependency for protected REST endpoints
- Update main.py to verify tokens on WebSocket and protect /info endpoint
- Add AZURE_TENANT_ID, AZURE_CLIENT_ID, DISABLE_AUTH to config
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>