Dave Porter 33c6d1acc8 Fix: re-encode html_bytes unconditionally at end of /export
Subtle bug: the substitution code modified html_str (logo placeholder,
IG protocol-relative rewrites) but the response was sent from
html_bytes, which was only re-encoded INSIDE the print-CSS-splice
branch. New decks contain 'doc-view' from the skeleton update, so
that branch was skipped — and the response sent the unmodified
original bytes from the artefact, including the literal
{{LOGO_DATA_URI}} placeholder.

Fix: pull the html_bytes = html_str.encode() out of the conditional
so every modification (logo substitution, IG https rewrite, optional
print-CSS splice) is always reflected in the response.
2026-05-05 19:54:23 -04:00
backend Fix: re-encode html_bytes unconditionally at end of /export 2026-05-05 19:54:23 -04:00
Build-Information Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
deploy Revert subpath default — actual deployment is /social-mi-bi/ 2026-05-05 19:40:15 -04:00
frontend Fix logo URL for subpath deploys; drop auth on logo endpoint 2026-05-05 17:04:09 -04:00
infra Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
scripts scripts/reset_password.py — one-command admin password reset 2026-05-05 19:18:19 -04:00
.env.example Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
.gitignore Merge: combine project .gitignore with Bitbucket defaults 2026-04-17 17:08:25 -04:00
CLAUDE.md Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
docker-compose.override.yml Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
docker-compose.yml Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00
README.md Add deploy/ artifacts for subpath hosting behind Apache 2026-04-17 21:06:57 -04:00
Social_Media_Reporting_Tool_Dev_Feedback Round2.docx Round 2 QA fixes — bugs + selected feature requests 2026-05-05 16:26:05 -04:00
STATUS.md Initial commit: social MI/BI multi-agent reporting tool 2026-04-17 17:07:44 -04:00

Social MI/BI — Multi-Agent Reporting

A web app that reproduces and extends the agency's four-agent social-media reporting workflow (Data Mapper → Input Collector → Strategy Writer → Report Builder). It turns monthly Meltwater / Meta exports into a standalone slide-deck HTML report — with a React wizard, Postgres-backed persistence, HITL checkpoints, optional YOLO auto-advance, Apify embed hydration, and clone-for-next-month.


Architecture

                  ┌─────────────────────────────────────────────────────┐
                  │                    React + Vite SPA                 │
                  │                                                     │
                  │   Login → Dashboard → Report Wizard (stepper)       │
                  │                                                     │
                  │   ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ │
                  │   │Data Mppr │ │Input Col │ │Strategy  │ │Report  │ │
                  │   │Panel     │→│Panel     │→│Panel     │→│Panel   │ │
                  │   └──────────┘ └──────────┘ └──────────┘ └────────┘ │
                  │        │            │            │           │     │
                  │        └── CheckpointPanel (approve / edit / reject)│
                  └────┬──────────────┬─────────────────────┬───────────┘
                       │   REST /api  │  WebSocket /ws      │
                       ▼              ▼                     ▼
                  ┌─────────────────────────────────────────────────────┐
                  │                 FastAPI (uvicorn)                   │
                  │                                                     │
                  │   routes_auth   routes_brands   routes_reports      │
                  │   routes_stages routes_uploads  routes_embeds       │
                  │   ws_stage_events (Redis pub/sub fan-out)           │
                  └────┬──────────────┬──────────────┬──────────────────┘
                       │  arq enqueue │  SQLAlchemy  │  boto3
                       ▼              ▼              ▼
                  ┌──────────┐  ┌──────────┐   ┌──────────┐
                  │  Redis   │  │ Postgres │   │  MinIO   │
                  │ (queue + │  │ (state + │   │ (uploads │
                  │  pub/sub)│  │ artefacts│   │ +reports)│
                  └────┬─────┘  └──────────┘   └──────────┘
                       │
                       ▼
                  ┌─────────────────────────────────────────────────────┐
                  │                  arq Worker                         │
                  │                                                     │
                  │   StateMachine.run_stage(report_id, stage_no)       │
                  │       │                                             │
                  │       ├── Stage 1: Data Mapper   (deterministic;    │
                  │       │   openpyxl → Meltwater IG + TikTok adapters)│
                  │       │                                             │
                  │       ├── Stage 2: Input Collector (deterministic;  │
                  │       │   validates logos / embeds / insights)      │
                  │       │                                             │
                  │       ├── Stage 3: Strategy Writer (Anthropic       │
                  │       │   Opus 4.7, streaming, prompt caching)      │
                  │       │                                             │
                  │       └── Stage 4: Report Builder (Anthropic        │
                  │           Sonnet 4.6, streaming, emits slide deck)  │
                  │                                                     │
                  │   After each stage:                                 │
                  │     • stage_artefacts row written (JSON or HTML)    │
                  │     • top_posts / learnings projected for UI        │
                  │     • token usage + cost_cents recorded             │
                  │     • WS events published (stage_status /           │
                  │       awaiting_review / stage_complete / tokens)    │
                  │     • HITL mode → pause at awaiting_review          │
                  │     • YOLO mode → auto-advance to next stage        │
                  └─────────────────────────────────────────────────────┘

    External services:                Phase-2 add-ons:
      • Anthropic (Claude)              • Apify (TikTok + IG scrapers)
      • Google Fonts (Inter)            • Playwright (HTML → PPTX)
      • Chart.js CDN                    • MSFT SSO (Phase 3)

The wizard is the control surface; the worker does the work; every artefact is persisted so you can pause, edit, resume, or clone any report at any stage. The WebSocket stream is eventually-consistent sugar on top — the source of truth is always Postgres.


Quick start (dev)

Prereqs: Docker Desktop, ~4 GB free, no services already on 5173 / 8088 / 5433 / 6380 / 9002 / 9003.

# One-time
cp .env.example .env
openssl rand -hex 32         # paste into SESSION_SECRET in .env
# Required only for stages 3 and 4 (the LLM stages): set ANTHROPIC_API_KEY
# Optional for phase-2 embed hydration: set APIFY_API_TOKEN

# Bring up storage + seed schema
docker compose up -d db redis minio minio-init
docker compose run --rm api alembic upgrade head

# Start the app
docker compose up -d api worker web
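If openssl isn't available on the host, Python's standard-library `secrets` module produces an equivalent SESSION_SECRET value. This is a minimal sketch, not one of the repo's scripts:

```python
import secrets

# 32 random bytes rendered as 64 lowercase hex characters, the same shape
# that `openssl rand -hex 32` prints.
session_secret = secrets.token_hex(32)
print(f"SESSION_SECRET={session_secret}")
```

Paste the printed line into .env in place of the empty SESSION_SECRET entry.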

Open http://localhost:5173 (the Vite dev server; see Ports below).

Log in with BOOTSTRAP_ADMIN_EMAIL / BOOTSTRAP_ADMIN_PASSWORD from .env (the first admin is created on first boot only).

End-to-end smoke test against real Cif data

# Runs the harness inside the api container
docker compose exec -e PYTHONPATH=/app api python scripts/run_e2e_cif.py

Writes rendered/cif-e2e-{report_id}.html — a ~55 KB standalone slide deck.


Ports

Defaults; override in .env if conflicted.

Service          Host port
API              8088
Web (Vite)       5173
Postgres         5433
Redis            6380
MinIO S3         9002
MinIO Console    9003
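The quick-start prerequisites assume none of these ports are taken. A rough pre-flight check can be sketched with the standard library. This helper is hypothetical (not part of scripts/), and it only detects listeners reachable on 127.0.0.1, which covers Docker's default 0.0.0.0 port publishing:

```python
import socket

# Default host ports from the Ports table; edit if you overrode them in .env.
DEFAULT_PORTS = {
    "API": 8088,
    "Web (Vite)": 5173,
    "Postgres": 5433,
    "Redis": 6380,
    "MinIO S3": 9002,
    "MinIO Console": 9003,
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising.
        return s.connect_ex((host, port)) == 0


def conflicts(ports: dict[str, int]) -> dict[str, int]:
    """Subset of `ports` that already has a listener on localhost."""
    return {name: p for name, p in ports.items() if port_in_use(p)}


if __name__ == "__main__":
    busy = conflicts(DEFAULT_PORTS)
    for name, port in busy.items():
        print(f"{name}: port {port} is already in use; override it in .env")
    if not busy:
        print("All default ports look free.")
```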

Repository layout

SOCIAL-MI-BI/
├── docker-compose.yml            ─ dev stack (db / redis / minio / api / worker / web)
├── docker-compose.override.yml   ─ dev-only hot-reload + volume mounts
├── infra/                        ─ Dockerfiles, nginx conf, minio init
├── Build-Information/            ─ spec inputs (READ-ONLY)
│     • Agents/Agent 1..4.txt     ─ canonical agent prompts (verbatim)
│     • Workflow/*.html           ─ kick-off workflow description
│     • Sample Data Input/*.xlsx  ─ real Cif Meltwater exports (fixtures)
├── backend/
│   ├── pyproject.toml
│   ├── alembic/                  ─ schema migrations
│   └── app/
│     ├── main.py                 ─ FastAPI factory + lifespan bootstrap
│     ├── config.py               ─ pydantic-settings (env-driven)
│     ├── db.py                   ─ async SQLAlchemy session scope
│     ├── models/                 ─ ORM (Brand, ReportRun, StageExecution,
│     │                             StageArtefact, TopPost, Learning, …)
│     ├── schemas/                ─ pydantic DTOs (auth, brand, report)
│     ├── api/                    ─ routers_* + ws_stage_events
│     ├── services/
│     │     • storage.py          ─ MinIO / S3 wrapper (dual endpoint)
│     │     • anthropic_client.py ─ Claude SDK; always-stream, cache-aware
│     │     • cost_tracker.py     ─ cents-per-MTok rate table
│     │     • apify_client.py     ─ TikTok + IG actor calls, 30d cache
│     │     • pubsub.py           ─ Redis pub/sub for WS events
│     │     • auth.py             ─ bcrypt + itsdangerous sessions
│     │     • bootstrap.py        ─ seed platforms, benchmarks, admin, buckets
│     ├── agents/
│     │     • prompts/            ─ VERBATIM copies of Agents/Agent 1..4.txt
│     │     • base.py             ─ StageBase / StageInput / StageOutput
│     │     • data_mapper.py      ─ (deterministic) runs ingest adapters
│     │     • input_collector.py  ─ (deterministic) validates embeds/insights
│     │     • strategy_writer.py  ─ Opus 4.7, British-English guard
│     │     • report_builder.py   ─ Sonnet 4.6, deck-skeleton-guided HTML
│     ├── ingest/                 ─ openpyxl Meltwater IG / TikTok adapters
│     ├── workflow/
│     │     • state_machine.py    ─ per-report stage lifecycle
│     │     • queue.py            ─ arq WorkerSettings
│     │     • events.py           ─ stage event names / publisher
│     ├── report/
│     │     • deck_skeleton.html  ─ style reference (slide deck)
│     │     • pptx_converter.py   ─ phase-2 HTML→PPTX via Playwright
│     └── tests/                  ─ unit / integration / e2e (real Cif data)
├── frontend/
│   ├── vite.config.ts            ─ proxy /api → api container
│   └── src/
│     ├── pages/                  ─ Dashboard, NewReport, ReportWizard, …
│     ├── components/
│     │     • wizard/             ─ Stage*Panel + CheckpointPanel
│     │     • EventLog.tsx        ─ coalesced token stream viewer
│     │     • SlidePreview.tsx    ─ iframe over the export endpoint
│     ├── lib/                    ─ api.ts, ws.ts, auth.ts, toast.ts
│     └── styles/globals.css      ─ Tailwind + amber (#FFC407) palette
├── scripts/
│   ├── run_e2e_cif.py            ─ one-shot end-to-end harness
│   └── seed_benchmarks.py        ─ idempotent benchmark seed
├── rendered/                     ─ generated decks land here (git-ignored)
├── STATUS.md                     ─ current delivery status / verification log
└── README.md                     ─ this file

Stage pipeline

Stage  Agent            Model              Streaming  Deterministic?
1      Data Mapper      -                  -          Yes (Python + openpyxl)
2      Input Collector  -                  -          Yes (validation heuristics)
3      Strategy Writer  claude-opus-4-7    Yes        No (LLM narrative)
4      Report Builder   claude-sonnet-4-6  Yes        No (LLM HTML slide deck)

Stage 4 receives the deck skeleton (backend/app/report/deck_skeleton.html, modelled on the Monks.Flow reference deck) as a cache-controlled system block, plus the brand, data-mapper JSON, input-collector JSON, and strategy-writer learnings. The emitted HTML always contains a Chart.js CDN tag, .slide.active transitions, a keyboard handler for arrow-key navigation, and --brand-primary / --brand-secondary / --accent CSS variables seeded from the brand row.
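Because Stage 4 output is LLM-generated, those guarantees are worth spot-checking on a rendered deck (for example the file written to rendered/). The sketch below is a hypothetical helper, not the project's own validation; the patterns are loose regex matches rather than an HTML parse, and the accepted Chart.js filenames (`chart.js`, `chart.umd...`) are an assumption about which CDN build is used:

```python
import re

# (description, pattern) pairs for the markers every Stage 4 deck should contain.
DECK_MARKERS = [
    ("Chart.js CDN tag", r"""<script[^>]*src=["'][^"']*chart(?:\.js|\.umd)"""),
    (".slide.active transition rule", r"\.slide\.active"),
    ("keydown listener", r"""addEventListener\(\s*["']keydown"""),
    ("arrow-key handling", r"Arrow(?:Left|Right)"),
    ("--brand-primary variable", r"--brand-primary\s*:"),
    ("--brand-secondary variable", r"--brand-secondary\s*:"),
    ("--accent variable", r"--accent\s*:"),
]


def missing_deck_markers(html: str) -> list[str]:
    """Return descriptions of required markers absent from `html`."""
    return [
        desc
        for desc, pattern in DECK_MARKERS
        if not re.search(pattern, html, re.IGNORECASE)
    ]
```

An empty list means every marker was found; anything else names what the deck is missing.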

HITL mode pauses after every stage at status awaiting_review. The CheckpointPanel exposes approve / reject-with-feedback / edit-inline / re-run. YOLO mode auto-advances end-to-end but still streams events.
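The difference between the two modes comes down to what happens once a stage finishes. Here is a minimal sketch of that decision, assuming hypothetical names (`Mode`, `after_stage`) and hypothetical status strings `running` / `complete`; only `awaiting_review` comes from this README, and the real logic lives in workflow/state_machine.py:

```python
from enum import Enum
from typing import Optional, Tuple

LAST_STAGE = 4


class Mode(str, Enum):
    HITL = "hitl"
    YOLO = "yolo"


def after_stage(mode: Mode, stage_no: int) -> Tuple[str, Optional[int]]:
    """Return (report status, next stage to enqueue or None)."""
    if mode is Mode.HITL:
        # Pause after every stage, including the last; the reviewer's
        # approval is what enqueues the next stage.
        return "awaiting_review", None
    if stage_no < LAST_STAGE:
        # YOLO: enqueue the next stage immediately.
        return "running", stage_no + 1
    return "complete", None
```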

Resume-from-stage: POST /reports/{id}/stages/{n}/run on a completed stage re-enqueues stage n; stages n+1..4 cascade because the state machine rebuilds StageInput from the latest artefacts.
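The cascade works because each stage only reads the newest artefact of every upstream stage. Here is a sketch of that selection, with hypothetical `Artefact` / `inputs_for` names standing in for the real StageArtefact model and StageInput construction:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Artefact:
    stage_no: int
    created_at: datetime
    payload: dict


def stages_to_rerun(n: int, last_stage: int = 4) -> list[int]:
    """Re-running stage n implies n, n+1, ..., last_stage."""
    if not 1 <= n <= last_stage:
        raise ValueError(f"stage must be 1..{last_stage}, got {n}")
    return list(range(n, last_stage + 1))


def latest_by_stage(artefacts: Iterable[Artefact]) -> dict[int, Artefact]:
    """Keep only the most recent artefact per stage."""
    latest: dict[int, Artefact] = {}
    for a in artefacts:
        current = latest.get(a.stage_no)
        if current is None or a.created_at > current.created_at:
            latest[a.stage_no] = a
    return latest


def inputs_for(stage_no: int, artefacts: Iterable[Artefact]) -> dict[int, dict]:
    """Upstream payloads that stage `stage_no` would be rebuilt from."""
    latest = latest_by_stage(artefacts)
    return {s: latest[s].payload for s in range(1, stage_no) if s in latest}
```

An edited or re-run Stage 1 artefact therefore flows into every later stage without any explicit invalidation step.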

Clone-for-next-month: POST /reports/{id}/clone copies brand, up-front context, platform scope; sets reporting_month to parent + 1 month; fresh uploads are then required.
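"Parent + 1 month" needs to roll over the year boundary. A minimal sketch, assuming reporting_month is stored as the first day of the month (an assumption; the column type isn't documented here):

```python
from datetime import date


def next_reporting_month(parent: date) -> date:
    """First day of the month after `parent`'s month; December rolls to January."""
    # Count months from year 0, add one, then split back into (year, month index).
    year, month_index = divmod(parent.year * 12 + (parent.month - 1) + 1, 12)
    return date(year, month_index + 1, 1)
```

Working in whole months avoids the day-of-month problems of adding 30 or 31 days (e.g. 31 January plus 30 days lands in March).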


Data retention / GDPR

DELETE /api/v1/reports/{id} (admin-only) cascades every row tied to the report (stage executions, artefacts, top posts, embeds, insights, learnings, uploaded files) and purges the corresponding MinIO keys. audit_events rows are kept but their payloads are scrubbed.


Verification

docker compose exec -T api pytest tests/unit tests/integration -v

Runs:

  • tests/unit/test_ingest.py — the two Meltwater adapters against the real Cif Instagram and TikTok Excels. Asserts posts_current = 95 / 93, posts_prev = 76 / 77, engagement-rate deltas, top-3 URLs by platform.
  • tests/unit/test_data_mapper_agent.py — full Stage 1 StageOutput against the Agent 1 JSON contract.
  • tests/integration/test_api_smoke.py: /healthz, login, and /auth/me.

tests/e2e/test_cif_pipeline.py runs all four stages (skipped unless RUN_LIVE_E2E=1 and ANTHROPIC_API_KEY are set).
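The skip condition amounts to a simple environment check. The following is a sketch with a hypothetical `live_e2e_enabled` helper; whether the real test compares RUN_LIVE_E2E to exactly "1" is an assumption:

```python
import os
from typing import Mapping


def live_e2e_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Live E2E runs only when explicitly opted in AND an API key is present."""
    return env.get("RUN_LIVE_E2E") == "1" and bool(env.get("ANTHROPIC_API_KEY"))
```

In a pytest module this would typically back a module-level `pytest.mark.skipif(not live_e2e_enabled(), reason="...")`, so an unconfigured CI run reports the test as skipped rather than failed.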


Deploying to the shared server

A one-shot deploy script lives in deploy/. It assumes the optical-dev pattern (Apache terminates TLS and fronts every app on this host; each app gets a subpath and a free port pair).

# On the server, as the app owner (NOT root):
sudo mkdir -p /opt/social-mi-bi && sudo chown "$USER:" /opt/social-mi-bi
git clone git@bitbucket.org:zlalani/social-mi-bi.git /opt/social-mi-bi
cd /opt/social-mi-bi
cp .env.example .env
# Fill in ANTHROPIC_API_KEY and APIFY_API_TOKEN — the script picks them up.

bash deploy/deploy-local.sh

What it does:

  1. Preflight — checks docker / compose / apache2ctl / rsync / ss, bails on missing deps or a missing ANTHROPIC_API_KEY.
  2. Port scan — picks two free TCP ports in the 8300-8399 range (API + MinIO) and preserves the pair across re-runs.
  3. Frontend build — runs npm ci && npm run build inside a node:20 container with VITE_BASE=/social-mi-bi/ so the SPA is built for the subpath. No Node on the host.
  4. rsync --delete the built SPA to /var/www/html/social-mi-bi/.
  5. Generate / preserve secrets: SESSION_SECRET, POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, and BOOTSTRAP_ADMIN_PASSWORD are generated on first run and read back on subsequent runs so volumes / DBs aren't invalidated. Anthropic + Apify keys come from the repo .env (so refreshing them there and re-running deploy picks them up).
  6. docker compose -f deploy/docker-compose.prod.yml up -d --build — api and minio bind to 127.0.0.1; db and redis have no host ports at all; alembic migrations run automatically on API container boot.
  7. Apache fragment + vhost include — renders deploy/apache-social-mi-bi.conf from the template, backs up the vhost, inserts a single Include line inside <VirtualHost>, apache2ctl configtest, systemctl reload apache2.
  8. Verify — waits on the loopback API, then curls https://<host>/social-mi-bi/api/v1/healthz to confirm the public path.
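Steps 2 and 5 share one idea: whatever was chosen on the first run is reused on every later run. The deploy script itself is bash, so the following is only a Python sketch of that logic. The function names and the `saved` / `existing` inputs (standing in for wherever the script persists its state) are hypothetical:

```python
import secrets
import socket
from typing import Callable, Optional, Tuple

PORT_RANGE = range(8300, 8400)  # 8300-8399 inclusive
SECRET_NAMES = (
    "SESSION_SECRET",
    "POSTGRES_PASSWORD",
    "MINIO_ROOT_PASSWORD",
    "BOOTSTRAP_ADMIN_PASSWORD",
)


def port_free(port: int) -> bool:
    """Treat a port as free if nothing accepts connections on it locally."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex(("127.0.0.1", port)) != 0


def pick_port_pair(
    saved: Optional[Tuple[int, int]],
    is_free: Callable[[int], bool] = port_free,
) -> Tuple[int, int]:
    """Reuse the saved (api, minio) pair, else take the first two free ports."""
    if saved is not None:
        # On a re-run our own containers hold these ports, so they would look
        # busy; reusing them keeps the Apache fragment and compose file stable.
        return saved
    free = []
    for port in PORT_RANGE:
        if is_free(port):
            free.append(port)
            if len(free) == 2:
                return free[0], free[1]
    raise RuntimeError("fewer than two free ports in 8300-8399")


def ensure_secrets(existing: dict[str, str]) -> dict[str, str]:
    """Generate only the secrets that are missing; never rotate existing ones."""
    result = dict(existing)
    for name in SECRET_NAMES:
        if not result.get(name):
            result[name] = secrets.token_hex(24)
    return result
```

Never rotating an existing POSTGRES_PASSWORD matters because Postgres only reads it when the data volume is first initialised; a new value on re-run would lock the API out of its own database.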

It's idempotent — safe to re-run after a git pull:

cd /opt/social-mi-bi && git pull && bash deploy/deploy-local.sh

The Apache fragment proxies three paths into the app and serves the built SPA directly:

Path                                Target                                  Why
/social-mi-bi/api/...               127.0.0.1:<api_port>/api/...            REST API
/social-mi-bi/api/v1/ws/... (wss)   127.0.0.1:<api_port>/api/v1/ws/...      WebSocket stage events
/social-mi-bi/s3/...                127.0.0.1:<minio_port>/...              MinIO presigned uploads/gets
/social-mi-bi/* (static)            /var/www/html/social-mi-bi/*            SPA, with fallback to index.html
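Rule order matters here: the WebSocket path is itself under /social-mi-bi/api/, so it must be matched before the generic REST rule. The sketch below models the table as longest-prefix routing. It is illustrative only: the port numbers are placeholders for whatever pair the deploy script picks, the `ws://` upstream scheme is an assumption about how the wss proxy is configured, and `resolve` is a hypothetical name, not part of the Apache fragment:

```python
from typing import Callable

API_PORT, MINIO_PORT = 8301, 8302  # placeholders for the deploy-time pair

# (public prefix, upstream base), mirroring the proxy table.
ROUTES = [
    ("/social-mi-bi/api/v1/ws/", f"ws://127.0.0.1:{API_PORT}/api/v1/ws/"),
    ("/social-mi-bi/api/", f"http://127.0.0.1:{API_PORT}/api/"),
    ("/social-mi-bi/s3/", f"http://127.0.0.1:{MINIO_PORT}/"),
]
STATIC_ROOT = "/var/www/html/social-mi-bi/"


def resolve(path: str, static_exists: Callable[[str], bool] = lambda rel: False) -> str:
    """Map a public path to its upstream URL or static file."""
    # Longest prefix first, so /api/v1/ws/ wins over the broader /api/ rule.
    for prefix, upstream in sorted(ROUTES, key=lambda r: len(r[0]), reverse=True):
        if path.startswith(prefix):
            return upstream + path[len(prefix):]
    rel = path.removeprefix("/social-mi-bi/")
    if rel and static_exists(rel):
        return STATIC_ROOT + rel
    # SPA fallback: unknown paths get index.html and the client router takes over.
    return STATIC_ROOT + "index.html"
```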

MinIO's MINIO_SERVER_URL is set to the public /s3 prefix so SigV4 signatures match when the browser PUTs through Apache. DB and Redis never leave the docker bridge network.

Phase status

  • Phase 1 — MVP (done). HITL wizard, Postgres persistence, 4 agents proven end-to-end against real Cif data, HTML slide-deck output, brand-theming via CSS variables, admin/user auth.
  • Phase 2 — in flight. YOLO mode live in the state machine; Apify client wired (manual-paste fallback today); PPTX converter scaffolded behind pip install .[phase2] + playwright install chromium; clone and resume-from-stage working; MSFT SSO deferred.
  • Phase 3 — planned. MSFT SSO (Entra ID / MSAL), auto-brand discovery (search + logo scrape + dominant-colour extraction), cost dashboard.