BTG Rackham Video Sales Coach — Development Plan (MVP, Python Backend, Single‑Pass)

Author: @michaelclervi (OML Inc)
Date: 2025-10-27
Status: MVP build plan — single-pass Gemini video → structured JSON; React/Vite frontend; FastAPI backend.

0) Locked Scope (from answers)

Phase 1 (MVP): Video upload only (no live coaching, no calendar/Slack/Teams/Zoom integrations).
Backend calls Gemini 2.5 Pro (video) once to produce one deterministic JSON (includes both diarized transcript and Rackham analysis).
LLM-only: no custom ML/embeddings/classical NLP.
Deterministic output: strict JSON Schema + validation + up to 2 retries.
UI: upload → progress → dashboard → PDF export → 30/60/90 day history.
Storage: local MongoDB, local disk for video. TTL 90 days.
Infra: Docker Compose; Apache reverse proxy.
Uploads: drag & drop chunked (no resume), max 2 GB.
Concurrency: single-job, FIFO queue.
Privacy: trust-based “internal meetings only”.
Branding: placeholder BTG theme (refine later).
QA: best-effort analysis; must export PDF.

1) Product Objectives

Convert an uploaded meeting video into actionable coaching based on Rackham’s communication behaviors.
Provide metrics and coaching: Pull:Push, speaking time, clarity / impact / inclusion, filler rates, question quality, building, timestamped quotes, and 2–3 action items per participant.
Deliver interactive dashboard + PDF.

2) Architecture Overview (Python Backend, Single‑Pass)

[Browser UI  (React + Vite + TS)]
  └── Upload (≤2GB, chunked) → /api/uploads
      Poll /api/jobs/:id → status/progress

[Backend (FastAPI, Python 3.11+)]
  ├── /uploads: chunk assembly → /data/videos/{jobId}.mp4
  ├── /jobs: create, start, status
  ├── Single-concurrency Worker (async FIFO)
  │    └─ Single-Pass: Gemini(video) → Unified JSON (transcript + analysis)
  │         ↳ jsonschema validate → retry up to 2x
  ├── /analyses: JSON + PDF endpoints (WeasyPrint)
  └── MongoDB (Motor): jobs, analyses (TTL 90d)

[Apache]  http :80/443  →  Reverse proxy
[Docker Compose] frontend | backend | mongo | (apache optional)
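
The single-concurrency worker in the diagram could be sketched as one asyncio task draining a FIFO queue. This is a minimal sketch, not the plan's actual services/queue.py: the class name SingleJobQueue, its status strings, and the handler callable are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class SingleJobQueue:
    """FIFO job queue drained by exactly one worker task (hypothetical helper)."""

    def __init__(self, handler: Callable[[str], Awaitable[None]]) -> None:
        self._handler = handler  # assumed: coroutine that runs one analysis job
        self._queue: asyncio.Queue = asyncio.Queue()
        self.status: dict = {}   # job_id -> "queued" | "running" | "done" | "error: ..."

    def enqueue(self, job_id: str) -> None:
        self.status[job_id] = "queued"
        self._queue.put_nowait(job_id)

    async def run(self) -> None:
        # One loop, one job at a time: the next job starts only after the
        # previous handler call has returned or raised.
        while True:
            job_id = await self._queue.get()
            self.status[job_id] = "running"
            try:
                await self._handler(job_id)
                self.status[job_id] = "done"
            except Exception as exc:  # a failed job must not kill the worker
                self.status[job_id] = f"error: {exc}"
            finally:
                self._queue.task_done()

    async def join(self) -> None:
        # Wait until every enqueued job has been processed.
        await self._queue.join()
```

In a FastAPI app, run() would typically be started once as a background task at startup, and GET /api/jobs/:jobId could read from status; both wiring choices are assumptions here.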

Volumes

/data/videos — raw uploads
Mongo volume — DB data
/tmp/chunks — assembly, ephemeral

3) Directory Structure

repo/
  frontend/                       # React + Vite (TS)
  backend/                        # FastAPI (Python)
    app/
      api/
        uploads.py
        jobs.py
        analyses.py
      core/
        config.py
        deps.py
      services/
        gemini.py
        pdf.py
        queue.py
        storage.py
        validation.py
        history.py
      schemas/
        video_analysis.schema.json
      models/
        job.py
        analysis.py
      main.py
    tests/
      test_validation.py
      test_routes.py
  infra/
    docker-compose.yml
    apache/btg.conf
  README.md

4) Environment & Dependencies

backend/.env.example

PORT=8080
ENV=production

MONGO_URL=mongodb://mongo:27017/btg
MONGO_DB=btg

UPLOAD_DIR=/data/videos
TMP_DIR=/tmp/chunks

GEMINI_API_KEY=replace-me
GEMINI_MODEL=gemini-2.5-pro

JWT_SECRET=change-me

backend/requirements.txt

fastapi==0.115.0
uvicorn[standard]==0.30.6
python-multipart==0.0.9
pydantic==2.9.2
motor==3.6.0
jsonschema==4.23.0
weasyprint==62.3
jinja2==3.1.4
python-dotenv==1.0.1
httpx==0.27.2
google-generativeai==0.7.2

WeasyPrint note: container needs libcairo2, pango, gdk-pixbuf (see Dockerfile).

5) Rackham Framework (MVP Adaptation)

Pull behaviors

open_question, closed_question, testing_understanding, summarizing, bringing_in

Push behaviors

proposing (flag build_on when extending prior idea), giving_info_fact, giving_info_opinion, disagreeing, defending_attacking, shutting_out_interrupting

Appropriate Push timing

High Urgency: deadlines ≤ 48h, crisis/outage/compliance.
Low Rejection Risk: agreement/low effort/explicit support.
If a Push is not accompanied by both conditions within ±60s → flag as inappropriate and suggest Pull alternatives (same rule as the prompt in §8).
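
The timing rule can be sketched as a window check over detected cues. This is only an illustration: in the MVP the model itself sets appropriate_push, and the Cue type, the cue kind strings, and the require_both switch are hypothetical. The default follows the prompt in §8, which requires both High Urgency and Low Rejection Risk; require_both=False gives the looser either-condition reading.

```python
from dataclasses import dataclass

WINDOW_SEC = 60.0  # the ±60s window from the rule above


@dataclass(frozen=True)
class Cue:
    kind: str        # assumed labels: "high_urgency" or "low_rejection_risk"
    time_sec: float  # when the cue was heard


def is_appropriate_push(push_start_sec: float, cues: list,
                        window_sec: float = WINDOW_SEC,
                        require_both: bool = True) -> bool:
    """Return True if the Push at push_start_sec is backed by nearby cues."""
    # Collect the kinds of cues that fall inside the symmetric window.
    nearby = {c.kind for c in cues
              if abs(c.time_sec - push_start_sec) <= window_sec}
    needed = {"high_urgency", "low_rejection_risk"}
    if require_both:
        return needed <= nearby      # both conditions present
    return bool(needed & nearby)     # at least one condition present
```

For example, with a High Urgency cue at 100s and a Low Rejection Risk cue at 150s, a proposal starting at 120s passes under either reading, while one starting at 45s passes only with require_both=False.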

6) Metrics & Formulas

Behavior Frequency per speaker; normalized per minute spoken.
Pull:Push = Pull / Push (exclude neutral filler turns).
Speaking Time %: diarized voice time per speaker / total.
Filler Words per min: {"um","uh","er","ah","you know","like"(discourse),"kind of","sort of","basically","actually"(empty)}.
Question Quality: open / (open + closed)
Clarity/Impact/Inclusion (0–100): composite indices (as guidance in prompt).
Transitions: detect Pull→Push within rolling 60s; mark timeline.
Alerts defaults: Pull:Push < 0.7; Filler > 5/min for ≥2 min; Question Quality < 0.4; any defending_attacking; top speaker > 60%.
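
The formulas and alert defaults above can be checked with a few lines of Python, for example to sanity-check the numbers the model returns. This is a sketch under assumptions: the alert type names are hypothetical, the ratios return None when the denominator is zero (the plan does not define that case, and the schema in §7 requires a number), and the filler rule is read as two or more consecutive minutes above 5 fillers per minute.

```python
from typing import Optional

PULL = frozenset({"open_question", "closed_question", "testing_understanding",
                  "summarizing", "bringing_in"})
PUSH = frozenset({"proposing", "giving_info_fact", "giving_info_opinion",
                  "disagreeing", "defending_attacking", "shutting_out_interrupting"})


def pull_push_ratio(counts: dict) -> Optional[float]:
    """Pull:Push = Pull / Push; None when there are no Push behaviors."""
    pull = sum(n for b, n in counts.items() if b in PULL)
    push = sum(n for b, n in counts.items() if b in PUSH)
    return pull / push if push else None


def question_quality(counts: dict) -> Optional[float]:
    """open / (open + closed); None when no questions were asked."""
    opened = counts.get("open_question", 0)
    closed = counts.get("closed_question", 0)
    total = opened + closed
    return opened / total if total else None


def sustained_filler(per_minute: list, limit: float = 5, minutes: int = 2) -> bool:
    """True if the filler rate exceeds limit for `minutes` consecutive minutes."""
    run = 0
    for rate in per_minute:
        run = run + 1 if rate > limit else 0
        if run >= minutes:
            return True
    return False


def default_alerts(counts: dict, speaker_share: float, filler_per_minute: list) -> list:
    """Apply the default thresholds to one speaker; speaker_share is 0..1."""
    alerts = []
    ratio = pull_push_ratio(counts)
    if ratio is not None and ratio < 0.7:
        alerts.append("pull_push_low")
    if sustained_filler(filler_per_minute):
        alerts.append("filler_high")
    quality = question_quality(counts)
    if quality is not None and quality < 0.4:
        alerts.append("question_quality_low")
    if counts.get("defending_attacking", 0) > 0:
        alerts.append("defending_attacking")
    if speaker_share > 0.60:
        alerts.append("dominant_speaker")
    return alerts
```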

7) Single-Pass Unified JSON Schema

Save as backend/app/schemas/video_analysis.schema.json.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["version", "transcript", "analysis"],
  "properties": {
    "version": { "type": "string", "const": "v1" },

    "transcript": {
      "type": "object",
      "required": ["duration_sec", "speakers", "utterances"],
      "properties": {
        "duration_sec": { "type": "number" },
        "speakers": {
          "type": "array",
          "items": { "type": "string", "pattern": "^S[0-9]+$" }
        },
        "utterances": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["speaker", "start_sec", "end_sec", "text"],
            "properties": {
              "speaker": { "type": "string", "pattern": "^S[0-9]+$" },
              "start_sec": { "type": "number" },
              "end_sec": { "type": "number" },
              "text": { "type": "string" }
            },
            "additionalProperties": false
          }
        }
      },
      "additionalProperties": false
    },

    "analysis": {
      "type": "object",
      "required": ["participants", "metrics", "timeline", "feedback"],
      "properties": {
        "participants": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["id", "behavior_counts", "speaking_time_sec", "pull_push", "filler_per_min", "question_quality", "scores", "action_items"],
            "properties": {
              "id": { "type": "string", "pattern": "^S[0-9]+$" },
              "behavior_counts": {
                "type": "object",
                "properties": {
                  "open_question": { "type": "integer", "minimum": 0 },
                  "closed_question": { "type": "integer", "minimum": 0 },
                  "testing_understanding": { "type": "integer", "minimum": 0 },
                  "summarizing": { "type": "integer", "minimum": 0 },
                  "bringing_in": { "type": "integer", "minimum": 0 },
                  "proposing": { "type": "integer", "minimum": 0 },
                  "giving_info_fact": { "type": "integer", "minimum": 0 },
                  "giving_info_opinion": { "type": "integer", "minimum": 0 },
                  "disagreeing": { "type": "integer", "minimum": 0 },
                  "defending_attacking": { "type": "integer", "minimum": 0 },
                  "shutting_out_interrupting": { "type": "integer", "minimum": 0 }
                },
                "additionalProperties": false
              },
              "speaking_time_sec": { "type": "number", "minimum": 0 },
              "pull_push": {
                "type": "object",
                "required": ["pull_count", "push_count", "ratio"],
                "properties": {
                  "pull_count": { "type": "integer", "minimum": 0 },
                  "push_count": { "type": "integer", "minimum": 0 },
                  "ratio": { "type": "number", "minimum": 0 }
                }
              },
              "filler_per_min": { "type": "number", "minimum": 0 },
              "question_quality": {
                "type": "object",
                "required": ["open", "closed", "ratio"],
                "properties": {
                  "open": { "type": "integer", "minimum": 0 },
                  "closed": { "type": "integer", "minimum": 0 },
                  "ratio": { "type": "number", "minimum": 0 }
                }
              },
              "scores": {
                "type": "object",
                "required": ["clarity", "impact", "inclusion"],
                "properties": {
                  "clarity": { "type": "number", "minimum": 0, "maximum": 100 },
                  "impact": { "type": "number", "minimum": 0, "maximum": 100 },
                  "inclusion": { "type": "number", "minimum": 0, "maximum": 100 }
                }
              },
              "action_items": {
                "type": "array",
                "minItems": 2,
                "maxItems": 3,
                "items": {
                  "type": "object",
                  "required": ["title", "why", "how", "example_utterance_id"],
                  "properties": {
                    "title": { "type": "string" },
                    "why": { "type": "string" },
                    "how": { "type": "string" },
                    "example_utterance_id": { "type": "integer", "minimum": 0 }
                  },
                  "additionalProperties": false
                }
              }
            },
            "additionalProperties": false
          }
        },

        "timeline": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["utterance_id", "speaker", "behavior", "start_sec", "end_sec"],
            "properties": {
              "utterance_id": { "type": "integer", "minimum": 0 },
              "speaker": { "type": "string", "pattern": "^S[0-9]+$" },
              "behavior": { "type": "string" },
              "start_sec": { "type": "number" },
              "end_sec": { "type": "number" },
              "proposal": {
                "type": "object",
                "required": ["build_on", "appropriate_push"],
                "properties": {
                  "build_on": { "type": "boolean" },
                  "appropriate_push": { "type": "boolean" }
                },
                "additionalProperties": false
              }
            },
            "additionalProperties": false
          }
        },

        "metrics": {
          "type": "object",
          "required": ["speaking_time", "pull_push_transitions", "alerts"],
          "properties": {
            "speaking_time": {
              "type": "array",
              "items": {
                "type": "object",
                "required": ["speaker", "seconds"],
                "properties": {
                  "speaker": { "type": "string", "pattern": "^S[0-9]+$" },
                  "seconds": { "type": "number" }
                },
                "additionalProperties": false
              }
            },
            "pull_push_transitions": {
              "type": "array",
              "items": {
                "type": "object",
                "required": ["time_sec", "from", "to", "speaker"],
                "properties": {
                  "time_sec": { "type": "number" },
                  "from": { "type": "string" },
                  "to": { "type": "string" },
                  "speaker": { "type": "string", "pattern": "^S[0-9]+$" }
                },
                "additionalProperties": false
              }
            },
            "alerts": {
              "type": "array",
              "items": {
                "type": "object",
                "required": ["type", "severity", "message", "utterance_id"],
                "properties": {
                  "type": { "type": "string" },
                  "severity": { "type": "string", "enum": ["info", "warn", "critical"] },
                  "message": { "type": "string" },
                  "utterance_id": { "type": "integer", "minimum": 0 }
                },
                "additionalProperties": false
              }
            }
          },
          "additionalProperties": false
        },

        "feedback": {
          "type": "object",
          "required": ["overall", "by_participant"],
          "properties": {
            "overall": {
              "type": "object",
              "required": ["strengths", "opportunities"],
              "properties": {
                "strengths": { "type": "array", "items": { "type": "string" } },
                "opportunities": { "type": "array", "items": { "type": "string" } }
              },
              "additionalProperties": false
            },
            "by_participant": {
              "type": "array",
              "items": {
                "type": "object",
                "required": ["id", "notes"],
                "properties": {
                  "id": { "type": "string", "pattern": "^S[0-9]+$" },
                  "notes": { "type": "array", "items": { "type": "string" } }
                },
                "additionalProperties": false
              }
            }
          },
          "additionalProperties": false
        }
      },
      "additionalProperties": false
    }
  },
  "additionalProperties": false
}
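
JSON Schema cannot express cross-references, so a second pass may be worth running after jsonschema validation. The sketch below assumes utterance_id and example_utterance_id are 0-based indexes into transcript.utterances (the schema does not say so, but the examples in this plan use 0 for the first utterance). The function name semantic_errors is hypothetical, and it expects a document that has already passed schema validation.

```python
def semantic_errors(doc: dict) -> list:
    """Return human-readable cross-reference problems; empty list means OK."""
    errors = []
    transcript = doc["transcript"]
    speakers = set(transcript["speakers"])
    utterances = transcript["utterances"]
    count = len(utterances)

    # Every utterance must use a declared speaker and a non-negative duration.
    for i, utt in enumerate(utterances):
        if utt["speaker"] not in speakers:
            errors.append(f"utterances/{i}: unknown speaker {utt['speaker']}")
        if utt["end_sec"] < utt["start_sec"]:
            errors.append(f"utterances/{i}: end_sec before start_sec")

    analysis = doc["analysis"]

    # Participants must be declared speakers; cited utterances must exist.
    for i, person in enumerate(analysis["participants"]):
        if person["id"] not in speakers:
            errors.append(f"participants/{i}: unknown speaker {person['id']}")
        for j, item in enumerate(person["action_items"]):
            if item["example_utterance_id"] >= count:
                errors.append(
                    f"participants/{i}/action_items/{j}: utterance id out of range")

    # Timeline entries must point at a real utterance by the same speaker.
    for i, entry in enumerate(analysis["timeline"]):
        uid = entry["utterance_id"]
        if uid >= count:
            errors.append(f"timeline/{i}: utterance id out of range")
        elif utterances[uid]["speaker"] != entry["speaker"]:
            errors.append(f"timeline/{i}: speaker does not match utterance {uid}")

    return errors
```

Any message returned here could be fed back to the model in the same way as a schema validation error.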

8) Prompt Template (Single‑Pass, Copy/Paste)

System / Instruction

You are an expert meeting analyzer using Rackham’s behavior framework.
Given a meeting video, return ONLY a valid JSON object that matches the provided JSON Schema exactly.
Tasks:
- Transcribe with speaker diarization (S1, S2, ...) and utterance-level timestamps (start_sec, end_sec).
- Classify each utterance into one Rackham behavior.
- Compute per-participant Pull:Push, speaking time, filler rate, question quality, and scores (clarity/impact/inclusion).
- Detect Pull→Push transitions; for proposals, set proposal.build_on and proposal.appropriate_push
  (appropriate_push = true only if High Urgency AND Low Rejection Risk occur within ±60 seconds).
- Provide 2–3 action_items per participant with one real example_utterance_id each.
- No extra keys. No prose. Output MUST validate against the JSON Schema.

Behavior enum

"open_question","closed_question","testing_understanding","summarizing","bringing_in",
"proposing","giving_info_fact","giving_info_opinion","disagreeing","defending_attacking","shutting_out_interrupting"

Heuristic reminders

High Urgency: “deadline”, “by EOD”, “outage”, “compliance”, “penalty”, “within 24 hours”
Low Rejection Risk: “agree”, “sounds good”, “quick”, “easy”, “I’ll help”

User content

Attach video bytes/URL (per Gemini video API).
Paste the Unified JSON Schema from §7.

Tiny Example (shape only)

{
  "version":"v1",
  "transcript":{"duration_sec":120,"speakers":["S1","S2"],"utterances":[
    {"speaker":"S1","start_sec":3.2,"end_sec":6.5,"text":"How are we qualifying inbound leads today?"},
    {"speaker":"S2","start_sec":7.0,"end_sec":11.3,"text":"Um, mostly manually in the CRM."}
  ]},
  "analysis":{
    "participants":[
      {
        "id":"S1",
        "behavior_counts":{"open_question":1,"closed_question":0,"testing_understanding":0,"summarizing":0,"bringing_in":0,"proposing":0,"giving_info_fact":0,"giving_info_opinion":0,"disagreeing":0,"defending_attacking":0,"shutting_out_interrupting":0},
        "speaking_time_sec":3.3,
        "pull_push":{"pull_count":1,"push_count":0,"ratio":1.0},
        "filler_per_min":0,
        "question_quality":{"open":1,"closed":0,"ratio":1.0},
        "scores":{"clarity":90,"impact":70,"inclusion":60},
        "action_items":[
          {"title":"Ask follow-ups","why":"Increase Pull","how":"Use 'walk me through'","example_utterance_id":1},
          {"title":"Summarize answers","why":"Confirm understanding","how":"Restate the key point before moving on","example_utterance_id":1}
        ]
      }
    ],
    "timeline":[{"utterance_id":0,"speaker":"S1","behavior":"open_question","start_sec":3.2,"end_sec":6.5}],
    "metrics":{"speaking_time":[{"speaker":"S1","seconds":3.3},{"speaker":"S2","seconds":4.3}],"pull_push_transitions":[],"alerts":[]},
    "feedback":{"overall":{"strengths":["Good discovery"],"opportunities":["Invite S2 more"]},"by_participant":[{"id":"S1","notes":["Good opener; build with follow-ups."]}]}
  }
}

9) Backend Implementation (FastAPI)

Routes

POST   /api/uploads/init           -> { jobId, chunkSize }
POST   /api/uploads/chunk          -> 204 No Content (params: jobId, index; binary body)
POST   /api/uploads/finish         -> { jobId }
POST   /api/jobs/:jobId/start      -> enqueue/run single-pass analysis
GET    /api/jobs/:jobId            -> { status, progress, error? }
GET    /api/analyses/:jobId        -> unified JSON
GET    /api/analyses/:jobId/pdf    -> PDF
GET    /api/history?range=30|60|90 -> summaries
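
The three upload routes imply a save-then-assemble flow on disk. A minimal standard-library sketch of that flow follows, assuming chunks are written as zero-padded .part files under TMP_DIR/{jobId}/ and that job IDs are short alphanumeric strings; the function names, the file naming, and the job-ID pattern are all hypothetical.

```python
import re
import shutil
from pathlib import Path

JOB_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")  # assumed job-id format
MAX_BYTES = 2 * 1024 ** 3                         # 2 GB upload cap


def chunk_path(tmp_dir: Path, job_id: str, index: int) -> Path:
    # Rejecting anything but a plain identifier blocks path traversal
    # such as "../etc" arriving in the jobId parameter.
    if not JOB_ID_RE.fullmatch(job_id) or index < 0:
        raise ValueError("invalid job id or chunk index")
    return tmp_dir / job_id / f"{index:06d}.part"


def save_chunk(tmp_dir: Path, job_id: str, index: int, data: bytes) -> None:
    path = chunk_path(tmp_dir, job_id, index)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def assemble(tmp_dir: Path, upload_dir: Path, job_id: str,
             max_bytes: int = MAX_BYTES) -> Path:
    """Concatenate chunks 0..n-1 into upload_dir/{job_id}.mp4 and clean up."""
    if not JOB_ID_RE.fullmatch(job_id):
        raise ValueError("invalid job id")
    parts = sorted((tmp_dir / job_id).glob("*.part"))
    # Chunks must form a gap-free sequence starting at 0.
    if not parts or [int(p.stem) for p in parts] != list(range(len(parts))):
        raise ValueError("missing or out-of-sequence chunks")
    if sum(p.stat().st_size for p in parts) > max_bytes:
        raise ValueError("upload exceeds size cap")
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / f"{job_id}.mp4"
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)  # streams; never loads 2 GB in memory
    shutil.rmtree(tmp_dir / job_id)           # chunk directory is ephemeral
    return target
```

Here /api/uploads/chunk would call save_chunk and /api/uploads/finish would call assemble; checking the running total per chunk, rather than only at assembly, would reject oversized uploads earlier.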

Minimal Sketches

# app/main.py
from fastapi import FastAPI
from app.api.uploads import router as uploads_router
from app.api.jobs import router as jobs_router
from app.api.analyses import router as analyses_router

app = FastAPI(title="BTG Rackham Coach API")
app.include_router(uploads_router, prefix="/api/uploads")
app.include_router(jobs_router,    prefix="/api/jobs")
app.include_router(analyses_router, prefix="/api/analyses")

# app/services/gemini.py (single-pass pseudo)
import google.generativeai as genai
import json, os
from jsonschema import Draft7Validator

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
MODEL = os.getenv("GEMINI_MODEL", "gemini-2.5-pro")

async def analyze_video_singlepass(video_path: str, unified_schema: dict, fewshot_example: dict | None = None) -> dict:
    # 1) Upload video (SDK or signed URL)
    # 2) Send system instructions + unified schema + optional tiny example
    # 3) Parse JSON; validate; if invalid, send first error back for 1–2 fix attempts
    # 4) Return parsed dict
    ...

# app/services/validation.py
from jsonschema import Draft7Validator
def first_error(validator: Draft7Validator, data: dict) -> str | None:
    for err in validator.iter_errors(data):
        return f"{err.message} at {'/'.join(map(str, err.path))}"
    return None

# app/services/pdf.py
from weasyprint import HTML
from jinja2 import Environment, PackageLoader, select_autoescape

# PackageLoader("app") already resolves template names relative to app/templates/.
env = Environment(loader=PackageLoader("app"), autoescape=select_autoescape())

def render_pdf(report: dict) -> bytes:
    # Render the Jinja2 HTML template, then convert the HTML string to PDF bytes.
    tpl = env.get_template("report.html")
    return HTML(string=tpl.render(data=report)).write_pdf()
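
The validate-and-retry step (steps 3 and 4 in the gemini.py sketch) can be written independently of the SDK by injecting the model call and the validator as callables. This is a sketch under assumptions: generate stands in for whatever wraps the Gemini request, find_error for something like first_error bound to a Draft7Validator, and the names generate_valid_json and AnalysisFailed are hypothetical.

```python
import json
from typing import Awaitable, Callable, Optional

MAX_RETRIES = 2  # "up to 2 retries" from the locked scope


class AnalysisFailed(Exception):
    """Raised when no attempt produced schema-valid JSON."""


async def generate_valid_json(
    generate: Callable[[Optional[str]], Awaitable[str]],
    find_error: Callable[[dict], Optional[str]],
    max_retries: int = MAX_RETRIES,
) -> dict:
    """Call generate until its output parses and validates, or retries run out.

    generate(feedback) receives None on the first attempt and the previous
    error message on each retry, so the model can be asked to fix it.
    """
    feedback: Optional[str] = None
    for _ in range(1 + max_retries):  # first attempt plus retries
        raw = await generate(feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"Output was not valid JSON: {exc.msg}"
            continue
        if not isinstance(data, dict):
            feedback = "Top-level value must be a JSON object"
            continue
        error = find_error(data)
        if error is None:
            return data
        feedback = f"Schema validation failed: {error}"
    raise AnalysisFailed(feedback)
```

Keeping the loop free of SDK calls makes it testable with a fake generate function, which fits the lightweight testing approach in §14.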

10) Frontend (React + Vite + TS)

Upload: drag & drop, 2 GB cap, chunk progress.
Progress: stepper “Upload → Analyze → Render”.
Results:
- Behavior table per participant
- Pull:Push gauges (overall + per participant)
- Speaking-time donut
- Timeline with Pull→Push markers & inappropriate push flags
- Score cards: Clarity / Impact / Inclusion
- Coaching: 2–3 action items per participant with jump-to timestamp
- Download PDF button
History: 30/60/90 day filters.
A11y: WCAG AA, keyboard, ARIA.

BTG placeholder theme

:root{
  --btg-primary:#2B6CB0; --btg-accent:#38B2AC; --btg-warn:#DD6B20;
  --btg-fg:#E5EEF9; --btg-bg:#0B1016;
}

11) PDF Export (WeasyPrint)

Sections: Cover → Overview → Metrics → Behavior Breakdown → Timeline Highlights → Per-Participant Coaching.
Footer: “Internal use only. 90-day retention.”

12) Docker & Apache

infra/docker-compose.yml

version: "3.9"
services:
  frontend:
    build: ../frontend
    ports: ["3000:3000"]
    environment: [ "VITE_API_BASE=/api" ]
    depends_on: [ backend ]
  backend:
    build: ../backend
    ports: ["8080:8080"]
    environment:
      - MONGO_URL=mongodb://mongo:27017/btg
      - GEMINI_API_KEY=${GEMINI_API_KEY}
      - GEMINI_MODEL=gemini-2.5-pro
      - UPLOAD_DIR=/data/videos
      - TMP_DIR=/tmp/chunks
    volumes:
      - videos:/data/videos
      - tmp:/tmp/chunks
    depends_on: [ mongo ]
  mongo:
    image: mongo:7
    ports: [ "27017:27017" ]
    volumes:
      - mongodata:/data/db
volumes: { mongodata: {}, videos: {}, tmp: {} }

infra/apache/btg.conf

<VirtualHost *:80>
  ServerName btg.example.com
  ProxyPreserveHost On
  ProxyPass        /api http://backend:8080/api
  ProxyPassReverse /api http://backend:8080/api
  ProxyPass        /    http://frontend:3000/
  ProxyPassReverse /    http://frontend:3000/
</VirtualHost>

backend/Dockerfile

FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
    libpango-1.0-0 libpangoft2-1.0-0 libcairo2 libjpeg62-turbo \
    libharfbuzz0b libgdk-pixbuf-2.0-0 fonts-dejavu-core \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Build context is ../backend (see docker-compose.yml), so paths are relative to backend/.
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /app
ENV PYTHONUNBUFFERED=1
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

13) Security & Privacy (MVP)

Optional local auth (email + password) with HttpOnly cookies.
Validate upload size/type; sanitize filenames; prevent path traversal.
CORS locked to site origin.
TTL purge (90 days) for both the DB and the filesystem.
UI reminder: “Internal meetings only; user owns data.”
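
A MongoDB TTL index only removes documents, so the video files need their own purge. One possible sketch is below, assuming file modification time is an acceptable proxy for upload time and that videos are stored as *.mp4 directly under UPLOAD_DIR; the function name and the idea of running it from a periodic task are assumptions.

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_DAYS = 90  # TTL from the locked scope


def purge_expired_videos(upload_dir: Path,
                         retention_days: int = RETENTION_DAYS,
                         now: Optional[float] = None) -> list:
    """Delete *.mp4 files older than the retention window; return their paths."""
    current = time.time() if now is None else now
    cutoff = current - retention_days * 86400  # seconds per day
    removed = []
    for path in upload_dir.glob("*.mp4"):
        # st_mtime stands in for the upload time (assumption).
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Deriving the cutoff from the job's creation timestamp in MongoDB instead of the file's mtime would keep the two purges in step, at the cost of a database lookup.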

14) Testing (Lightweight)

jsonschema validation tests for unified schema.
Golden fixtures: few demo videos → stable unified JSON.
PDF smoke (WeasyPrint).
Performance smoke: 60‑min video processes under target constraints (single-concurrency).

15) Suggested Build Order

Week 1 — Infra + Uploads + Models
Week 2 — Single‑pass Gemini call + Validation + Progress UI
Week 3 — Dashboard visualizations + Coaching panes
Week 4 — PDF, History, TTL cleanup, Accessibility polish

16) Tiny Unified JSON Examples

Minimal

{
  "version":"v1",
  "transcript":{"duration_sec":90,"speakers":["S1","S2"],"utterances":[
    {"speaker":"S1","start_sec":2.1,"end_sec":5.8,"text":"What outcomes matter most this quarter?"},
    {"speaker":"S2","start_sec":6.2,"end_sec":10.1,"text":"Uh, revenue from self-serve signups mainly."}
  ]},
  "analysis":{
    "participants":[
      {"id":"S1","behavior_counts":{"open_question":1,"closed_question":0,"testing_understanding":0,"summarizing":0,"bringing_in":0,"proposing":0,"giving_info_fact":0,"giving_info_opinion":0,"disagreeing":0,"defending_attacking":0,"shutting_out_interrupting":0},"speaking_time_sec":3.7,"pull_push":{"pull_count":1,"push_count":0,"ratio":1.0},"filler_per_min":0,"question_quality":{"open":1,"closed":0,"ratio":1.0},"scores":{"clarity":92,"impact":70,"inclusion":60},"action_items":[{"title":"Add follow-ups","why":"Increase Pull depth","how":"Use 'walk me through'","example_utterance_id":1},{"title":"Summarize answers","why":"Confirm understanding","how":"Restate the key point before moving on","example_utterance_id":1}]}
    ],
    "timeline":[{"utterance_id":0,"speaker":"S1","behavior":"open_question","start_sec":2.1,"end_sec":5.8}],
    "metrics":{"speaking_time":[{"speaker":"S1","seconds":3.7},{"speaker":"S2","seconds":3.9}],"pull_push_transitions":[],"alerts":[]},
    "feedback":{"overall":{"strengths":["Strong discovery"],"opportunities":["Invite S2 explicitly"]},"by_participant":[{"id":"S1","notes":["Good opener; build with follow-ups."]}]}
  }
}

17) Future (Phase 2 — not in MVP)

Resume uploads; multi-concurrency; estimates.
Prosody/interruptions, slide detection, non‑verbal cues.
Team rollups with anonymization; trends.
Calendar/meeting-platform integrations.
Fine‑tuned classifiers if needed.

End of Document
