BTG Rackham Video Sales Coach — Development Plan (MVP, Python Backend, Single‑Pass)
Author: @michaelclervi (OML Inc)
Date: 2025-10-27
Status: MVP build plan — single-pass Gemini video → structured JSON; React/Vite frontend; FastAPI backend.
0) Locked Scope (from answers)
- Phase 1 (MVP): Video upload only (no live coaching, no calendar/Slack/Teams/Zoom integrations).
- Backend calls Gemini 2.5 Pro (video) once to produce one deterministic JSON (includes both diarized transcript and Rackham analysis).
- LLM-only: no custom ML/embeddings/classical NLP.
- Deterministic output: strict JSON Schema + validation + up to 2 retries.
- UI: upload → progress → dashboard → PDF export → 30/60/90 day history.
- Storage: local MongoDB, local disk for video. TTL 90 days.
- Infra: Docker Compose; Apache reverse proxy.
- Uploads: drag & drop chunked (no resume), max 2 GB.
- Concurrency: single-job, FIFO queue.
- Privacy: trust-based “internal meetings only”.
- Branding: placeholder BTG theme (refine later).
- QA: best-effort analysis; must export PDF.
1) Product Objectives
- Convert an uploaded meeting video into actionable coaching based on Rackham’s communication behaviors.
- Provide metrics and coaching: Pull:Push, speaking time, clarity / impact / inclusion, filler rates, question quality, building, timestamped quotes, and 2–3 action items per participant.
- Deliver interactive dashboard + PDF.
2) Architecture Overview (Python Backend, Single‑Pass)
[Browser UI (React + Vite + TS)]
  └── Upload (≤2GB, chunked) → /api/uploads
      Poll /api/jobs/:id → status/progress
[Backend (FastAPI, Python 3.11+)]
  ├── /uploads: chunk assembly → /data/videos/{jobId}.mp4
  ├── /jobs: create, start, status
  ├── Single-concurrency Worker (async FIFO)
  │     └─ Single-Pass: Gemini(video) → Unified JSON (transcript + analysis)
  │          ↳ jsonschema validate → retry up to 2x
  ├── /analyses: JSON + PDF endpoints (WeasyPrint)
  └── MongoDB (Motor): jobs, analyses (TTL 90d)
[Apache] http :80/443 → Reverse proxy
[Docker Compose] frontend | backend | mongo | (apache optional)
Volumes
- /data/videos: raw uploads
- Mongo volume: DB data
- /tmp/chunks: assembly, ephemeral
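The single-concurrency FIFO worker in the diagram above could be sketched with one asyncio.Queue drained by exactly one task. This is a minimal sketch, not the planned services/queue.py: the names JobQueue, handler, and the status strings are hypothetical, and a real worker would persist status to MongoDB rather than keep it in memory.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class JobQueue:
    """FIFO queue drained by a single worker task (one job at a time)."""

    def __init__(self, handler: Callable[[str], Awaitable[None]]) -> None:
        self._queue: "asyncio.Queue[str]" = asyncio.Queue()
        self._handler = handler  # hypothetical: runs the single-pass analysis
        self.status: Dict[str, str] = {}  # job_id -> queued | running | done | failed

    def enqueue(self, job_id: str) -> None:
        self.status[job_id] = "queued"
        self._queue.put_nowait(job_id)

    async def run_forever(self) -> None:
        # Only one coroutine ever calls this, so jobs never overlap.
        while True:
            job_id = await self._queue.get()
            self.status[job_id] = "running"
            try:
                await self._handler(job_id)
                self.status[job_id] = "done"
            except Exception:
                # A failed job must not kill the worker loop.
                self.status[job_id] = "failed"
            finally:
                self._queue.task_done()

    async def join(self) -> None:
        """Wait until every enqueued job has been processed."""
        await self._queue.join()
```

In FastAPI, run_forever would typically be started once at application startup with asyncio.create_task, and POST /api/jobs/:jobId/start would only call enqueue.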
3) Directory Structure
repo/
  frontend/                # React + Vite (TS)
  backend/                 # FastAPI (Python)
    app/
      api/
        uploads.py
        jobs.py
        analyses.py
      core/
        config.py
        deps.py
      services/
        gemini.py
        pdf.py
        queue.py
        storage.py
        validation.py
        history.py
      schemas/
        video_analysis.schema.json
      models/
        job.py
        analysis.py
      main.py
    tests/
      test_validation.py
      test_routes.py
  infra/
    docker-compose.yml
    apache/btg.conf
  README.md
4) Environment & Dependencies
backend/.env.example
PORT=8080
ENV=production
MONGO_URL=mongodb://mongo:27017/btg
MONGO_DB=btg
UPLOAD_DIR=/data/videos
TMP_DIR=/tmp/chunks
GEMINI_API_KEY=replace-me
GEMINI_MODEL=gemini-2.5-pro
JWT_SECRET=change-me
backend/requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.6
python-multipart==0.0.9
pydantic==2.9.2
motor==3.6.0
jsonschema==4.23.0
weasyprint==62.3
jinja2==3.1.4
python-dotenv==1.0.1
httpx==0.27.2
google-generativeai==0.7.2
WeasyPrint note: the container needs libcairo2, pango, and gdk-pixbuf (see Dockerfile).
5) Rackham Framework (MVP Adaptation)
Pull behaviors
open_question, closed_question, testing_understanding, summarizing, bringing_in
Push behaviors
proposing (flag build_on when extending a prior idea), giving_info_fact, giving_info_opinion, disagreeing, defending_attacking, shutting_out_interrupting
Appropriate Push timing
- High Urgency: deadlines ≤ 48h, crisis/outage/compliance.
- Low Rejection Risk: agreement/low effort/explicit support.
- If a Push does not have both conditions present within ±60s, flag it as inappropriate and suggest Pull alternatives.
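The timing rule can be illustrated with a small check. This is a sketch only: in the MVP the model sets appropriate_push itself, so code like this would serve as a post-hoc sanity check at most. It follows the §8 prompt wording (both conditions required); Cue, the kind labels, and is_appropriate_push are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

WINDOW_SEC = 60.0  # the ±60s window from the rule above


@dataclass(frozen=True)
class Cue:
    kind: str        # assumed labels: "high_urgency" or "low_rejection_risk"
    time_sec: float  # when the cue was spoken


def is_appropriate_push(push_time_sec: float, cues: List[Cue],
                        window_sec: float = WINDOW_SEC) -> bool:
    """True only if both cue kinds occur within the window around the Push."""
    nearby = {c.kind for c in cues if abs(c.time_sec - push_time_sec) <= window_sec}
    return {"high_urgency", "low_rejection_risk"} <= nearby


cues = [Cue("high_urgency", 100.0), Cue("low_rejection_risk", 150.0)]
print(is_appropriate_push(120.0, cues))  # both cues within 60s of t=120
print(is_appropriate_push(200.0, cues))  # urgency cue is 100s away
```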
6) Metrics & Formulas
- Behavior Frequency per speaker; normalized per minute spoken.
- Pull:Push = Pull / Push (exclude neutral filler turns).
- Speaking Time %: diarized voice time per speaker / total.
- Filler Words per min: {"um", "uh", "er", "ah", "you know", "like" (discourse use), "kind of", "sort of", "basically", "actually" (empty use)}.
- Question Quality: open / (open + closed)
- Clarity/Impact/Inclusion (0–100): composite indices (as guidance in prompt).
- Transitions: detect Pull→Push within rolling 60s; mark timeline.
- Alerts defaults: Pull:Push < 0.7; Filler > 5/min for ≥2 min; Question Quality < 0.4; any defending_attacking; top speaker > 60%.
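The formulas and default alert thresholds above can be recomputed from behavior counts, for example to cross-check the numbers the model returns. This is a sketch under assumptions: the function names and alert labels are hypothetical, a ratio with a zero denominator is returned as None (the plan does not define that case), and the windowed filler alert (> 5/min for ≥2 min) is omitted because it needs per-window data.

```python
from typing import Dict, List, Optional

PULL = {"open_question", "closed_question", "testing_understanding",
        "summarizing", "bringing_in"}
PUSH = {"proposing", "giving_info_fact", "giving_info_opinion",
        "disagreeing", "defending_attacking", "shutting_out_interrupting"}


def safe_ratio(num: float, den: float) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (assumed convention)."""
    return num / den if den else None


def participant_metrics(counts: Dict[str, int], speaking_sec: float,
                        total_sec: float, filler_count: int) -> dict:
    pull = sum(n for b, n in counts.items() if b in PULL)
    push = sum(n for b, n in counts.items() if b in PUSH)
    open_q = counts.get("open_question", 0)
    closed_q = counts.get("closed_question", 0)
    return {
        "pull_push": safe_ratio(pull, push),
        "question_quality": safe_ratio(open_q, open_q + closed_q),
        "speaking_share": safe_ratio(speaking_sec, total_sec),
        "filler_per_min": safe_ratio(filler_count, speaking_sec / 60),
        "defending_attacking": counts.get("defending_attacking", 0),
    }


def default_alerts(m: dict) -> List[str]:
    """Apply the default thresholds; undefined ratios raise no alert."""
    alerts = []
    if m["pull_push"] is not None and m["pull_push"] < 0.7:
        alerts.append("low_pull_push")
    if m["question_quality"] is not None and m["question_quality"] < 0.4:
        alerts.append("low_question_quality")
    if m["defending_attacking"] > 0:
        alerts.append("defending_attacking")
    if m["speaking_share"] is not None and m["speaking_share"] > 0.6:
        alerts.append("dominant_speaker")
    return alerts
```

For a speaker with 1 open and 3 closed questions, 5 proposals, and 1 defending_attacking turn over 420 of 600 spoken seconds, Pull:Push is 4/6 and Question Quality is 1/4, so all four alerts fire.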
7) Single-Pass Unified JSON Schema
Save as backend/app/schemas/video_analysis.schema.json.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["version", "transcript", "analysis"],
"properties": {
"version": { "type": "string", "const": "v1" },
"transcript": {
"type": "object",
"required": ["duration_sec", "speakers", "utterances"],
"properties": {
"duration_sec": { "type": "number" },
"speakers": {
"type": "array",
"items": { "type": "string", "pattern": "^S[0-9]+$" }
},
"utterances": {
"type": "array",
"items": {
"type": "object",
"required": ["speaker", "start_sec", "end_sec", "text"],
"properties": {
"speaker": { "type": "string", "pattern": "^S[0-9]+$" },
"start_sec": { "type": "number" },
"end_sec": { "type": "number" },
"text": { "type": "string" }
},
"additionalProperties": false
}
}
},
"additionalProperties": false
},
"analysis": {
"type": "object",
"required": ["participants", "metrics", "timeline", "feedback"],
"properties": {
"participants": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "behavior_counts", "speaking_time_sec", "pull_push", "filler_per_min", "question_quality", "scores", "action_items"],
"properties": {
"id": { "type": "string", "pattern": "^S[0-9]+$" },
"behavior_counts": {
"type": "object",
"properties": {
"open_question": { "type": "integer", "minimum": 0 },
"closed_question": { "type": "integer", "minimum": 0 },
"testing_understanding": { "type": "integer", "minimum": 0 },
"summarizing": { "type": "integer", "minimum": 0 },
"bringing_in": { "type": "integer", "minimum": 0 },
"proposing": { "type": "integer", "minimum": 0 },
"giving_info_fact": { "type": "integer", "minimum": 0 },
"giving_info_opinion": { "type": "integer", "minimum": 0 },
"disagreeing": { "type": "integer", "minimum": 0 },
"defending_attacking": { "type": "integer", "minimum": 0 },
"shutting_out_interrupting": { "type": "integer", "minimum": 0 }
},
"additionalProperties": false
},
"speaking_time_sec": { "type": "number", "minimum": 0 },
"pull_push": {
"type": "object",
"required": ["pull_count", "push_count", "ratio"],
"properties": {
"pull_count": { "type": "integer", "minimum": 0 },
"push_count": { "type": "integer", "minimum": 0 },
"ratio": { "type": "number", "minimum": 0 }
},
"additionalProperties": false
},
"filler_per_min": { "type": "number", "minimum": 0 },
"question_quality": {
"type": "object",
"required": ["open", "closed", "ratio"],
"properties": {
"open": { "type": "integer", "minimum": 0 },
"closed": { "type": "integer", "minimum": 0 },
"ratio": { "type": "number", "minimum": 0 }
},
"additionalProperties": false
},
"scores": {
"type": "object",
"required": ["clarity", "impact", "inclusion"],
"properties": {
"clarity": { "type": "number", "minimum": 0, "maximum": 100 },
"impact": { "type": "number", "minimum": 0, "maximum": 100 },
"inclusion": { "type": "number", "minimum": 0, "maximum": 100 }
},
"additionalProperties": false
},
"action_items": {
"type": "array",
"minItems": 2,
"maxItems": 3,
"items": {
"type": "object",
"required": ["title", "why", "how", "example_utterance_id"],
"properties": {
"title": { "type": "string" },
"why": { "type": "string" },
"how": { "type": "string" },
"example_utterance_id": { "type": "integer", "minimum": 0 }
},
"additionalProperties": false
}
}
},
"additionalProperties": false
}
},
"timeline": {
"type": "array",
"items": {
"type": "object",
"required": ["utterance_id", "speaker", "behavior", "start_sec", "end_sec"],
"properties": {
"utterance_id": { "type": "integer", "minimum": 0 },
"speaker": { "type": "string", "pattern": "^S[0-9]+$" },
"behavior": { "type": "string" },
"start_sec": { "type": "number" },
"end_sec": { "type": "number" },
"proposal": {
"type": "object",
"required": ["build_on", "appropriate_push"],
"properties": {
"build_on": { "type": "boolean" },
"appropriate_push": { "type": "boolean" }
},
"additionalProperties": false
}
},
"additionalProperties": false
}
},
"metrics": {
"type": "object",
"required": ["speaking_time", "pull_push_transitions", "alerts"],
"properties": {
"speaking_time": {
"type": "array",
"items": {
"type": "object",
"required": ["speaker", "seconds"],
"properties": {
"speaker": { "type": "string", "pattern": "^S[0-9]+$" },
"seconds": { "type": "number" }
},
"additionalProperties": false
}
},
"pull_push_transitions": {
"type": "array",
"items": {
"type": "object",
"required": ["time_sec", "from", "to", "speaker"],
"properties": {
"time_sec": { "type": "number" },
"from": { "type": "string" },
"to": { "type": "string" },
"speaker": { "type": "string", "pattern": "^S[0-9]+$" }
},
"additionalProperties": false
}
},
"alerts": {
"type": "array",
"items": {
"type": "object",
"required": ["type", "severity", "message", "utterance_id"],
"properties": {
"type": { "type": "string" },
"severity": { "type": "string", "enum": ["info", "warn", "critical"] },
"message": { "type": "string" },
"utterance_id": { "type": "integer", "minimum": 0 }
},
"additionalProperties": false
}
}
},
"additionalProperties": false
},
"feedback": {
"type": "object",
"required": ["overall", "by_participant"],
"properties": {
"overall": {
"type": "object",
"required": ["strengths", "opportunities"],
"properties": {
"strengths": { "type": "array", "items": { "type": "string" } },
"opportunities": { "type": "array", "items": { "type": "string" } }
},
"additionalProperties": false
},
"by_participant": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "notes"],
"properties": {
"id": { "type": "string", "pattern": "^S[0-9]+$" },
"notes": { "type": "array", "items": { "type": "string" } }
},
"additionalProperties": false
}
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
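JSON Schema cannot express cross-field rules, so a document can pass validation and still be inconsistent. The sketch below shows the kind of extra checks that could run after schema validation. It assumes utterance_id is a 0-based index into transcript.utterances (as the examples in this plan suggest) and that behavior should be one of the eleven §5 labels, which the schema as written does not enforce. semantic_errors is a hypothetical helper, not part of the plan.

```python
from typing import List

BEHAVIORS = {
    "open_question", "closed_question", "testing_understanding", "summarizing",
    "bringing_in", "proposing", "giving_info_fact", "giving_info_opinion",
    "disagreeing", "defending_attacking", "shutting_out_interrupting",
}


def semantic_errors(doc: dict) -> List[str]:
    """Cross-field checks for a document that already passed schema validation."""
    errors: List[str] = []
    speakers = set(doc["transcript"]["speakers"])
    utterances = doc["transcript"]["utterances"]

    for i, utt in enumerate(utterances):
        if utt["speaker"] not in speakers:
            errors.append(f"utterances/{i}: speaker {utt['speaker']} not in speakers")
        if utt["end_sec"] < utt["start_sec"]:
            errors.append(f"utterances/{i}: end_sec precedes start_sec")

    def check_id(where: str, uid: int) -> None:
        # Assumption: ids index into transcript.utterances, starting at 0.
        if uid >= len(utterances):
            errors.append(f"{where}: utterance_id {uid} out of range")

    analysis = doc["analysis"]
    for i, entry in enumerate(analysis["timeline"]):
        check_id(f"timeline/{i}", entry["utterance_id"])
        if entry["behavior"] not in BEHAVIORS:
            errors.append(f"timeline/{i}: unknown behavior {entry['behavior']}")
    for part in analysis["participants"]:
        if part["id"] not in speakers:
            errors.append(f"participants/{part['id']}: not in speakers")
        for j, item in enumerate(part["action_items"]):
            check_id(f"participants/{part['id']}/action_items/{j}",
                     item["example_utterance_id"])
    for i, alert in enumerate(analysis["metrics"]["alerts"]):
        check_id(f"alerts/{i}", alert["utterance_id"])
    return errors
```

Any message returned here could be fed back to the model in the same way as a schema error.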
8) Prompt Template (Single‑Pass, Copy/Paste)
System / Instruction
You are an expert meeting analyzer using Rackham’s behavior framework.
Given a meeting video, return ONLY a valid JSON object that matches the provided JSON Schema exactly.
Tasks:
- Transcribe with speaker diarization (S1, S2, ...) and utterance-level timestamps (start_sec, end_sec).
- Classify each utterance into one Rackham behavior.
- Compute per-participant Pull:Push, speaking time, filler rate, question quality, and scores (clarity/impact/inclusion).
- Detect Pull→Push transitions; for proposals, set proposal.build_on and proposal.appropriate_push
(appropriate_push = true only if High Urgency AND Low Rejection Risk occur within ±60 seconds).
- Provide 2–3 action_items per participant with one real example_utterance_id each.
- No extra keys. No prose. Output MUST validate against the JSON Schema.
Behavior enum
"open_question","closed_question","testing_understanding","summarizing","bringing_in",
"proposing","giving_info_fact","giving_info_opinion","disagreeing","defending_attacking","shutting_out_interrupting"
Heuristic reminders
- High Urgency: “deadline”, “by EOD”, “outage”, “compliance”, “penalty”, “within 24 hours”
- Low Rejection Risk: “agree”, “sounds good”, “quick”, “easy”, “I’ll help”
User content
- Attach video bytes/URL (per Gemini video API).
- Paste the Unified JSON Schema from §7.
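One way to assemble the text parts of this prompt is sketched below. It serializes the schema and optional example with sorted keys so the prompt text is byte-for-byte stable between runs, which helps when comparing outputs. build_prompt_parts is a hypothetical helper; the instructions string is whatever §8 text is pasted in, and the video attachment is handled separately by the Gemini SDK.

```python
import json
from typing import List, Optional

BEHAVIOR_ENUM = [
    "open_question", "closed_question", "testing_understanding", "summarizing",
    "bringing_in", "proposing", "giving_info_fact", "giving_info_opinion",
    "disagreeing", "defending_attacking", "shutting_out_interrupting",
]


def _canonical(obj: dict) -> str:
    """Compact JSON with sorted keys, so equal dicts give equal text."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def build_prompt_parts(instructions: str, schema: dict,
                       example: Optional[dict] = None) -> List[str]:
    """Return the text parts to send alongside the uploaded video."""
    parts = [
        instructions.strip(),
        "Behavior enum: " + ", ".join(BEHAVIOR_ENUM),
        "JSON Schema:\n" + _canonical(schema),
    ]
    if example is not None:
        parts.append("Example (shape only):\n" + _canonical(example))
    return parts
```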
Tiny Example (shape only)
{
"version":"v1",
"transcript":{"duration_sec":120,"speakers":["S1","S2"],"utterances":[
{"speaker":"S1","start_sec":3.2,"end_sec":6.5,"text":"How are we qualifying inbound leads today?"},
{"speaker":"S2","start_sec":7.0,"end_sec":11.3,"text":"Um, mostly manually in the CRM."}
]},
"analysis":{
"participants":[
{
"id":"S1",
"behavior_counts":{"open_question":1,"closed_question":0,"testing_understanding":0,"summarizing":0,"bringing_in":0,"proposing":0,"giving_info_fact":0,"giving_info_opinion":0,"disagreeing":0,"defending_attacking":0,"shutting_out_interrupting":0},
"speaking_time_sec":3.3,
"pull_push":{"pull_count":1,"push_count":0,"ratio":1.0},
"filler_per_min":0,
"question_quality":{"open":1,"closed":0,"ratio":1.0},
"scores":{"clarity":90,"impact":70,"inclusion":60},
"action_items":[{"title":"Ask follow-ups","why":"Increase Pull","how":"Use 'walk me through'","example_utterance_id":1},{"title":"Test understanding","why":"Confirm what was heard","how":"Restate the answer and check it","example_utterance_id":1}]
}
],
"timeline":[{"utterance_id":0,"speaker":"S1","behavior":"open_question","start_sec":3.2,"end_sec":6.5}],
"metrics":{"speaking_time":[{"speaker":"S1","seconds":3.3},{"speaker":"S2","seconds":4.3}],"pull_push_transitions":[],"alerts":[]},
"feedback":{"overall":{"strengths":["Good discovery"],"opportunities":["Invite S2 more"]},"by_participant":[{"id":"S1","notes":["Good opener; build with follow-ups."]}]}
}
}
9) Backend Implementation (FastAPI)
Routes
POST /api/uploads/init -> { jobId, chunkSize }
POST /api/uploads/chunk (jobId, index; binary body) -> 204
POST /api/uploads/finish -> { jobId }
POST /api/jobs/:jobId/start -> enqueue/run single-pass analysis
GET /api/jobs/:jobId -> { status, progress, error? }
GET /api/analyses/:jobId -> unified JSON
GET /api/analyses/:jobId/pdf -> PDF
GET /api/history?range=30|60|90 -> summaries
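The chunk and finish routes imply a store-then-assemble step. The sketch below shows one way it could work with plain files: chunks are stored under TMP_DIR/{jobId}/ with zero-padded names, then concatenated in index order into UPLOAD_DIR/{jobId}.mp4. The helper names, the job-id pattern, and the .part naming are assumptions; the job-id check is what prevents path traversal here.

```python
import re
import shutil
from pathlib import Path

MAX_UPLOAD_BYTES = 2 * 1024**3  # 2 GB cap from the locked scope
JOB_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")  # assumed job id format


def chunk_path(tmp_dir: Path, job_id: str, index: int) -> Path:
    # Rejecting anything but a plain token keeps "../" out of the path.
    if not JOB_ID_RE.fullmatch(job_id) or index < 0:
        raise ValueError("invalid job id or chunk index")
    return tmp_dir / job_id / f"{index:06d}.part"


def save_chunk(tmp_dir: Path, job_id: str, index: int, data: bytes) -> None:
    path = chunk_path(tmp_dir, job_id, index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def assemble(tmp_dir: Path, upload_dir: Path, job_id: str,
             max_bytes: int = MAX_UPLOAD_BYTES) -> Path:
    """Concatenate chunks 0..n-1 in order and remove the temporary directory."""
    job_dir = chunk_path(tmp_dir, job_id, 0).parent
    parts = sorted(job_dir.glob("*.part"))  # zero-padded names sort by index
    expected = [f"{i:06d}.part" for i in range(len(parts))]
    if not parts or [p.name for p in parts] != expected:
        raise ValueError("missing or non-contiguous chunks")
    if sum(p.stat().st_size for p in parts) > max_bytes:
        raise ValueError("upload exceeds size cap")
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / f"{job_id}.mp4"
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)  # streams; no full file in memory
    shutil.rmtree(job_dir)
    return target
```

Because uploads are not resumable in the MVP, a failed assembly can simply discard the job's temporary directory and ask the user to upload again.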
Minimal Sketches
# app/main.py
from fastapi import FastAPI
from app.api.uploads import router as uploads_router
from app.api.jobs import router as jobs_router
from app.api.analyses import router as analyses_router
app = FastAPI(title="BTG Rackham Coach API")
app.include_router(uploads_router, prefix="/api/uploads")
app.include_router(jobs_router, prefix="/api/jobs")
app.include_router(analyses_router, prefix="/api/analyses")
# app/services/gemini.py (single-pass pseudo)
import google.generativeai as genai
import json, os
from jsonschema import Draft7Validator
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-pro")
async def analyze_video_singlepass(video_path: str, unified_schema: dict, fewshot_example: dict | None = None) -> dict:
    # 1) Upload video (SDK or signed URL)
    # 2) Send system instructions + unified schema + optional tiny example
    # 3) Parse JSON; validate; if invalid, send first error back for 1–2 fix attempts
    # 4) Return parsed dict
    ...
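# app/services/validation.py
from jsonschema import Draft7Validator
def first_error(validator: Draft7Validator, data: dict) -> str | None:
    """Return the first schema violation as 'message at path', or None if valid."""
    for err in validator.iter_errors(data):
        # err.path is empty for top-level errors, so fall back to a root marker
        location = "/".join(map(str, err.path)) or "<root>"
        return f"{err.message} at {location}"
    return None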
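The "validate, then retry up to 2x" loop can be sketched without tying it to the Gemini SDK by injecting the model call and the validator as callables. This is an illustration under assumptions: generate_validated, call_model, find_error, and AnalysisFailed are hypothetical names; in the real service call_model would wrap the Gemini request and find_error would wrap first_error.

```python
import json
from typing import Awaitable, Callable, Optional


class AnalysisFailed(Exception):
    """Raised when no attempt produced a valid document."""


async def generate_validated(
    call_model: Callable[[Optional[str]], Awaitable[str]],
    find_error: Callable[[dict], Optional[str]],
    max_retries: int = 2,
) -> dict:
    """Call the model once, then retry up to max_retries times with feedback."""
    feedback: Optional[str] = None
    for _attempt in range(1 + max_retries):
        raw = await call_model(feedback)  # feedback is None on the first try
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"Output was not valid JSON: {exc.msg}"
            continue
        if not isinstance(data, dict):
            feedback = "Top-level value must be a JSON object"
            continue
        error = find_error(data)
        if error is None:
            return data
        feedback = f"Schema validation failed: {error}"
    raise AnalysisFailed(feedback)
```

Passing only the first error back keeps the correction prompt short; whether the model converges faster with one error or the full list is something to measure on the golden fixtures.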
# app/services/pdf.py
from weasyprint import HTML
from jinja2 import Environment, PackageLoader, select_autoescape
# PackageLoader("app") searches app/templates/ by default
env = Environment(loader=PackageLoader("app"), autoescape=select_autoescape())
def render_pdf(report: dict) -> bytes:
    # Template names are relative to app/templates/, so no "templates/" prefix
    tpl = env.get_template("report.html")
    return HTML(string=tpl.render(data=report)).write_pdf()
10) Frontend (React + Vite + TS)
- Upload: drag & drop, 2 GB cap, chunk progress.
- Progress: stepper “Upload → Analyze → Render”.
- Results:
  - Behavior table per participant
  - Pull:Push gauges (overall + per participant)
  - Speaking-time donut
  - Timeline with Pull→Push markers & inappropriate push flags
  - Score cards: Clarity / Impact / Inclusion
  - Coaching: 2–3 action items per participant with jump-to timestamp
  - Download PDF button
- History: 30/60/90 day filters.
- A11y: WCAG AA, keyboard, ARIA.
BTG placeholder theme
:root{
--btg-primary:#2B6CB0; --btg-accent:#38B2AC; --btg-warn:#DD6B20;
--btg-fg:#E5EEF9; --btg-bg:#0B1016;
}
11) PDF Export (WeasyPrint)
Sections: Cover → Overview → Metrics → Behavior Breakdown → Timeline Highlights → Per-Participant Coaching.
Footer: “Internal use only. 90-day retention.”
12) Docker & Apache
infra/docker-compose.yml
version: "3.9"
services:
  frontend:
    build: ../frontend
    ports: ["3000:3000"]
    environment: [ "VITE_API_BASE=/api" ]
    depends_on: [ backend ]
  backend:
    build: ../backend
    ports: ["8080:8080"]
    environment:
      - MONGO_URL=mongodb://mongo:27017/btg
      - GEMINI_API_KEY=${GEMINI_API_KEY}
      - GEMINI_MODEL=gemini-2.5-pro
      - UPLOAD_DIR=/data/videos
      - TMP_DIR=/tmp/chunks
    volumes:
      - videos:/data/videos
      - tmp:/tmp/chunks
    depends_on: [ mongo ]
  mongo:
    image: mongo:7
    ports: [ "27017:27017" ]
    volumes:
      - mongodata:/data/db
volumes: { mongodata: {}, videos: {}, tmp: {} }
infra/apache/btg.conf
<VirtualHost *:80>
  ServerName btg.example.com
  ProxyPreserveHost On
  # Keep the /api prefix: the FastAPI routers are mounted under /api/...
  ProxyPass /api http://backend:8080/api
  ProxyPassReverse /api http://backend:8080/api
  ProxyPass / http://frontend:3000/
  ProxyPassReverse / http://frontend:3000/
</VirtualHost>
backend/Dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y libpango-1.0-0 libpangoft2-1.0-0 libcairo2 libjpeg62-turbo libharfbuzz0b libgdk-pixbuf-2.0-0 fonts-dejavu-core && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Paths are relative to the build context (../backend in docker-compose.yml)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
ENV PYTHONUNBUFFERED=1
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
13) Security & Privacy (MVP)
- Optional local auth (email + password) with HttpOnly cookies.
- Validate upload size/type; sanitize filenames; prevent path traversal.
- CORS locked to site origin.
- TTL purge (90 days) for both MongoDB documents and video files on disk.
- UI reminder: “Internal meetings only; user owns data.”
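A MongoDB TTL index can expire documents on its own, but files under /data/videos need a separate sweep. The sketch below shows one possible sweep that deletes videos whose modification time is older than the retention window. purge_expired and its parameters are hypothetical, and using file mtime as the age signal is an assumption; the job's creation time from the database would work as well.

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 90  # matches the 90-day TTL in the locked scope


def purge_expired(upload_dir: Path, now: Optional[float] = None,
                  retention_days: int = RETENTION_DAYS) -> List[Path]:
    """Delete *.mp4 files older than the retention window; return what was removed."""
    current = time.time() if now is None else now
    cutoff = current - retention_days * 86400
    removed: List[Path] = []
    for path in upload_dir.glob("*.mp4"):
        # Only plain files with an old enough mtime are touched.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```

Running this once a day from a background task would be enough for a 90-day window; the now parameter exists so tests can pin the clock.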
14) Testing (Lightweight)
- jsonschema validation tests for unified schema.
- Golden fixtures: a few demo videos → stable unified JSON.
- PDF smoke (WeasyPrint).
- Performance smoke: 60‑min video processes under target constraints (single-concurrency).
15) Suggested Build Order
Week 1 — Infra + Uploads + Models
Week 2 — Single‑pass Gemini call + Validation + Progress UI
Week 3 — Dashboard visualizations + Coaching panes
Week 4 — PDF, History, TTL cleanup, Accessibility polish
16) Tiny Unified JSON Examples
Minimal
{
"version":"v1",
"transcript":{"duration_sec":90,"speakers":["S1","S2"],"utterances":[
{"speaker":"S1","start_sec":2.1,"end_sec":5.8,"text":"What outcomes matter most this quarter?"},
{"speaker":"S2","start_sec":6.2,"end_sec":10.1,"text":"Uh, revenue from self-serve signups mainly."}
]},
"analysis":{
"participants":[
{"id":"S1","behavior_counts":{"open_question":1,"closed_question":0,"testing_understanding":0,"summarizing":0,"bringing_in":0,"proposing":0,"giving_info_fact":0,"giving_info_opinion":0,"disagreeing":0,"defending_attacking":0,"shutting_out_interrupting":0},"speaking_time_sec":3.7,"pull_push":{"pull_count":1,"push_count":0,"ratio":1.0},"filler_per_min":0,"question_quality":{"open":1,"closed":0,"ratio":1.0},"scores":{"clarity":92,"impact":70,"inclusion":60},"action_items":[{"title":"Add follow-ups","why":"Increase Pull depth","how":"Use 'walk me through'","example_utterance_id":1},{"title":"Test understanding","why":"Confirm what was heard","how":"Restate the answer and check it","example_utterance_id":1}]}
],
"timeline":[{"utterance_id":0,"speaker":"S1","behavior":"open_question","start_sec":2.1,"end_sec":5.8}],
"metrics":{"speaking_time":[{"speaker":"S1","seconds":3.7},{"speaker":"S2","seconds":3.9}],"pull_push_transitions":[],"alerts":[]},
"feedback":{"overall":{"strengths":["Strong discovery"],"opportunities":["Invite S2 explicitly"]},"by_participant":[{"id":"S1","notes":["Good opener; build with follow-ups."]}]}
}
}
17) Future (Phase 2 — not in MVP)
- Resume uploads; multi-concurrency; estimates.
- Prosody/interruptions, slide detection, non‑verbal cues.
- Team rollups with anonymization; trends.
- Calendar/meeting-platform integrations.
- Fine‑tuned classifiers if needed.