Subtle bug: the substitution code modified html_str (logo placeholder,
IG protocol-relative rewrites) but the response was sent from
html_bytes, which was only re-encoded INSIDE the print-CSS-splice
branch. New decks contain 'doc-view' from the skeleton update, so
that branch was skipped — and the response sent the unmodified
original bytes from the artefact, including the literal
{{LOGO_DATA_URI}} placeholder.
Fix: pull the html_bytes = html_str.encode() out of the conditional
so every modification (logo substitution, IG https rewrite, optional
print-CSS splice) is always reflected in the response.
|
||
|---|---|---|
| backend | ||
| Build-Information | ||
| deploy | ||
| frontend | ||
| infra | ||
| scripts | ||
| .env.example | ||
| .gitignore | ||
| CLAUDE.md | ||
| docker-compose.override.yml | ||
| docker-compose.yml | ||
| README.md | ||
| Social_Media_Reporting_Tool_Dev_Feedback Round2.docx | ||
| STATUS.md | ||
Social MI/BI — Multi-Agent Reporting
A web app that reproduces and extends the agency's four-agent social-media reporting workflow (Data Mapper → Input Collector → Strategy Writer → Report Builder). It turns monthly Meltwater / Meta exports into a standalone slide-deck HTML report — with a React wizard, Postgres-backed persistence, HITL checkpoints, optional YOLO auto-advance, Apify embed hydration, and clone-for-next-month.
Architecture
┌─────────────────────────────────────────────────────┐
│ React + Vite SPA │
│ │
│ Login → Dashboard → Report Wizard (stepper) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ │
│ │Data Mppr │ │Input Col │ │Strategy │ │Report │ │
│ │Panel │→│Panel │→│Panel │→│Panel │ │
│ └──────────┘ └──────────┘ └──────────┘ └────────┘ │
│ │ │ │ │ │
│ └── CheckpointPanel (approve / edit / reject)│
└────┬──────────────┬─────────────────────┬───────────┘
│ REST /api │ WebSocket /ws │
▼ ▼ ▼
┌─────────────────────────────────────────────────────┐
│ FastAPI (uvicorn) │
│ │
│ routes_auth routes_brands routes_reports │
│ routes_stages routes_uploads routes_embeds │
│ ws_stage_events (Redis pub/sub fan-out) │
└────┬──────────────┬──────────────┬──────────────────┘
│ arq enqueue │ SQLAlchemy │ boto3
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Redis │ │ Postgres │ │ MinIO │
│ (queue + │ │ (state + │ │ (uploads │
│ pub/sub)│ │ artefacts│ │ +reports)│
└────┬─────┘ └──────────┘ └──────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ arq Worker │
│ │
│ StateMachine.run_stage(report_id, stage_no) │
│ │ │
│ ├── Stage 1: Data Mapper (deterministic; │
│ │ openpyxl → Meltwater IG + TikTok adapters)│
│ │ │
│ ├── Stage 2: Input Collector (deterministic; │
│ │ validates logos / embeds / insights) │
│ │ │
│ ├── Stage 3: Strategy Writer (Anthropic │
│ │ Opus 4.7, streaming, prompt caching) │
│ │ │
│ └── Stage 4: Report Builder (Anthropic │
│ Sonnet 4.6, streaming, emits slide deck) │
│ │
│ After each stage: │
│ • stage_artefacts row written (JSON or HTML) │
│ • top_posts / learnings projected for UI │
│ • token usage + cost_cents recorded │
│ • WS events published (stage_status / │
│ awaiting_review / stage_complete / tokens) │
│ • HITL mode → pause at awaiting_review │
│ • YOLO mode → auto-advance to next stage │
└─────────────────────────────────────────────────────┘
External services: Phase-2 add-ons:
• Anthropic (Claude) • Apify (TikTok + IG scrapers)
• Google Fonts (Inter) • Playwright (HTML → PPTX)
• Chart.js CDN • MSFT SSO (Phase 3)
The wizard is the control surface; the worker does the work; every artefact is persisted so you can pause, edit, resume, or clone any report at any stage. The WebSocket stream is eventually-consistent sugar on top — the source of truth is always Postgres.
Quick start (dev)
Prereqs: Docker Desktop, ~4 GB free, no services already on 5173 / 8088 / 5433 / 6380 / 9002 / 9003.
# One-time
cp .env.example .env
openssl rand -hex 32 # paste into SESSION_SECRET in .env
# Optional but required for stages 3–4: set ANTHROPIC_API_KEY
# Optional for phase-2 embed hydration: set APIFY_API_TOKEN
# Bring up storage + seed schema
docker compose up -d db redis minio minio-init
docker compose run --rm api alembic upgrade head
# Start the app
docker compose up -d api worker web
Open:
- Web UI — http://localhost:5173
- API docs — http://localhost:8088/docs
- MinIO console — http://localhost:9003
Log in with BOOTSTRAP_ADMIN_EMAIL / BOOTSTRAP_ADMIN_PASSWORD from .env
(the first admin is created on first boot only).
End-to-end smoke against real Cif data
# From inside the api container
docker compose exec -e PYTHONPATH=/app api python scripts/run_e2e_cif.py
Writes rendered/cif-e2e-{report_id}.html — a ~55 KB standalone slide deck.
Ports
Defaults; override in .env if conflicted.
| Service | Host port |
|---|---|
| API | 8088 |
| Web (Vite) | 5173 |
| Postgres | 5433 |
| Redis | 6380 |
| MinIO S3 | 9002 |
| MinIO Console | 9003 |
Repository layout
SOCIAL-MI-BI/
├── docker-compose.yml ─ dev stack (db / redis / minio / api / worker / web)
├── docker-compose.override.yml ─ dev-only hot-reload + volume mounts
├── infra/ ─ Dockerfiles, nginx conf, minio init
├── Build-Information/ ─ spec inputs (READ-ONLY)
│ • Agents/Agent 1..4.txt ─ canonical agent prompts (verbatim)
│ • Workflow/*.html ─ kick-off workflow description
│ • Sample Data Input/*.xlsx ─ real Cif Meltwater exports (fixtures)
├── backend/
│ ├── pyproject.toml
│ ├── alembic/ ─ schema migrations
│ └── app/
│ ├── main.py ─ FastAPI factory + lifespan bootstrap
│ ├── config.py ─ pydantic-settings (env-driven)
│ ├── db.py ─ async SQLAlchemy session scope
│ ├── models/ ─ ORM (Brand, ReportRun, StageExecution,
│ │ StageArtefact, TopPost, Learning, …)
│ ├── schemas/ ─ pydantic DTOs (auth, brand, report)
│ ├── api/ ─ routers_* + ws_stage_events
│ ├── services/
│ │ • storage.py ─ MinIO / S3 wrapper (dual endpoint)
│ │ • anthropic_client.py ─ Claude SDK; always-stream, cache-aware
│ │ • cost_tracker.py ─ cents-per-MTok rate table
│ │ • apify_client.py ─ TikTok + IG actor calls, 30d cache
│ │ • pubsub.py ─ Redis pub/sub for WS events
│ │ • auth.py ─ bcrypt + itsdangerous sessions
│ │ • bootstrap.py ─ seed platforms, benchmarks, admin, buckets
│ ├── agents/
│ │ • prompts/ ─ VERBATIM copies of Agents/Agent 1..4.txt
│ │ • base.py ─ StageBase / StageInput / StageOutput
│ │ • data_mapper.py ─ (deterministic) runs ingest adapters
│ │ • input_collector.py ─ (deterministic) validates embeds/insights
│ │ • strategy_writer.py ─ Opus 4.7, British-English guard
│ │ • report_builder.py ─ Sonnet 4.6, deck-skeleton-guided HTML
│ ├── ingest/ ─ openpyxl Meltwater IG / TikTok adapters
│ ├── workflow/
│ │ • state_machine.py ─ per-report stage lifecycle
│ │ • queue.py ─ arq WorkerSettings
│ │ • events.py ─ stage event names / publisher
│ ├── report/
│ │ • deck_skeleton.html ─ style reference (slide deck)
│ │ • pptx_converter.py ─ phase-2 HTML→PPTX via Playwright
│ └── tests/ ─ unit / integration / e2e (real Cif data)
├── frontend/
│ ├── vite.config.ts ─ proxy /api → api container
│ └── src/
│ ├── pages/ ─ Dashboard, NewReport, ReportWizard, …
│ ├── components/
│ │ • wizard/ ─ Stage*Panel + CheckpointPanel
│ │ • EventLog.tsx ─ coalesced token stream viewer
│ │ • SlidePreview.tsx ─ iframe over the export endpoint
│ ├── lib/ ─ api.ts, ws.ts, auth.ts, toast.ts
│ └── styles/globals.css ─ Tailwind + amber (#FFC407) palette
├── scripts/
│ ├── run_e2e_cif.py ─ one-shot end-to-end harness
│ └── seed_benchmarks.py ─ idempotent benchmark seed
├── rendered/ ─ generated decks land here (git-ignored)
├── STATUS.md ─ current delivery status / verification log
└── README.md ─ this file
Stage pipeline
| Stage | Agent | Model | Streaming | Deterministic? |
|---|---|---|---|---|
| 1 | Data Mapper | — | — | Yes (Python + openpyxl) |
| 2 | Input Collector | — | — | Yes (validation heuristics) |
| 3 | Strategy Writer | claude-opus-4-7 |
yes | No — LLM narrative |
| 4 | Report Builder | claude-sonnet-4-6 |
yes | No — LLM HTML slide deck |
Stage 4 receives the deck skeleton (backend/app/report/deck_skeleton.html,
modelled on the Monks.Flow reference deck) as a cache-controlled system
block, plus the brand, data-mapper JSON, input-collector JSON, and
strategy-writer learnings. The emitted HTML always contains a Chart.js CDN
tag, .slide.active transitions, a keyboard handler for arrow-key
navigation, and --brand-primary / --brand-secondary / --accent CSS
variables seeded from the brand row.
HITL mode pauses after every stage at status awaiting_review. The Check-
pointPanel exposes approve / reject-with-feedback / edit-inline / re-run.
YOLO mode auto-advances end-to-end but still streams events.
Resume-from-stage: POST /reports/{id}/stages/{n}/run on a completed stage
re-enqueues stage n; stages n+1..4 cascade because the state machine
rebuilds StageInput from the latest artefacts.
Clone-for-next-month: POST /reports/{id}/clone copies brand, up-front
context, platform scope; sets reporting_month to parent + 1 month; fresh
uploads are then required.
Data retention / GDPR
DELETE /api/v1/reports/{id} (admin-only) cascades every row tied to the
report (stage executions, artefacts, top posts, embeds, insights, learnings,
uploaded files) and purges the corresponding MinIO keys. audit_events rows
are kept but their payloads are scrubbed.
Verification
docker compose exec -T api pytest tests/unit tests/integration -v
Runs:
tests/unit/test_ingest.py— the two Meltwater adapters against the real Cif Instagram and TikTok Excels. Assertsposts_current = 95 / 93,posts_prev = 76 / 77, engagement-rate deltas, top-3 URLs by platform.tests/unit/test_data_mapper_agent.py— full Stage 1 StageOutput against the Agent 1 JSON contract.tests/integration/test_api_smoke.py—/healthz+ login +/auth/me.
tests/e2e/test_cif_pipeline.py runs all four stages (skipped unless
RUN_LIVE_E2E=1 and ANTHROPIC_API_KEY are set).
Deploying to the shared server
A one-shot deploy script lives in deploy/. It assumes the optical-dev
pattern (Apache terminates TLS and fronts every app on this host; each app
gets a subpath and a free port pair).
# On the server, as the app owner (NOT root):
sudo mkdir -p /opt/social-mi-bi && sudo chown "$USER:" /opt/social-mi-bi
git clone git@bitbucket.org:zlalani/social-mi-bi.git /opt/social-mi-bi
cd /opt/social-mi-bi
cp .env.example .env
# Fill in ANTHROPIC_API_KEY and APIFY_API_TOKEN — the script picks them up.
bash deploy/deploy-local.sh
What it does:
- Preflight — checks docker / compose / apache2ctl / rsync / ss, bails on
missing deps or a missing
ANTHROPIC_API_KEY. - Port scan — picks two free TCP ports in
8300–8399(API + MinIO) and preserves the pair across re-runs. - Frontend build — runs
npm ci && npm run buildinside anode:20container withVITE_BASE=/social-mi-bi/so the SPA is built for the subpath. No Node on the host. - rsync --delete the built SPA to
/var/www/html/social-mi-bi/. - Generate / preserve secrets —
SESSION_SECRET,POSTGRES_PASSWORD,MINIO_ROOT_PASSWORD, andBOOTSTRAP_ADMIN_PASSWORDare generated on first run and read back on subsequent runs so volumes / DBs aren't invalidated. Anthropic + Apify keys come from the repo.env(so refreshing them there and re-running deploy picks them up). docker compose -f deploy/docker-compose.prod.yml up -d --build— api and minio bind to127.0.0.1; db and redis have no host ports at all; alembic migrations run automatically on API container boot.- Apache fragment + vhost include — renders
deploy/apache-social-mi-bi.conffrom the template, backs up the vhost, inserts a singleIncludeline inside<VirtualHost>,apache2ctl configtest,systemctl reload apache2. - Verify — waits on the loopback API, then curls
https://<host>/social-mi-bi/api/v1/healthzto confirm the public path.
It's idempotent — safe to re-run after a git pull:
cd /opt/social-mi-bi && git pull && bash deploy/deploy-local.sh
The Apache fragment proxies three paths into the app:
| Path | Target | Why |
|---|---|---|
/social-mi-bi/api/... |
127.0.0.1:<api_port>/api/... |
REST API |
/social-mi-bi/api/v1/ws/... (wss) |
127.0.0.1:<api_port>/api/v1/ws/... |
WebSocket stage events |
/social-mi-bi/s3/... |
127.0.0.1:<minio_port>/... |
MinIO presigned uploads/gets |
/social-mi-bi/* (static) |
/var/www/html/social-mi-bi/* |
SPA — SPA-fallback to index |
MinIO's MINIO_SERVER_URL is set to the public /s3 prefix so SigV4
signatures match when the browser PUTs through Apache. DB and Redis never
leave the docker bridge network.
Phase status
- Phase 1 — MVP (done). HITL wizard, Postgres persistence, 4 agents proven end-to-end against real Cif data, HTML slide-deck output, brand-theming via CSS variables, admin/user auth.
- Phase 2 — in flight. YOLO mode live in the state machine; Apify
client wired (manual-paste fallback today); PPTX converter scaffolded
behind
pip install .[phase2]+playwright install chromium; clone and resume-from-stage working; MSFT SSO deferred. - Phase 3 — planned. MSFT SSO (Entra ID / MSAL), auto-brand discovery (search + logo scrape + dominant-colour extraction), cost dashboard.