Dev cutover runbook — optical-dev.oliver.solutions
Self-contained "SSH in and paste these" instructions for the first
deploy of HM AI QC to the new Dev server. Phase 3 of
deploy/DEV_PROD_MIGRATION_PLAN.md.
Estimated time: 20 minutes if everything works first try.
0 — Prereqs (verify before starting)
| Check | How |
|---|---|
| Entra redirect URIs added | Confirmed 2026-05-09: `https://optical-dev.oliver.solutions/hm-aiqc/` and `https://optical-prod.oliver.solutions/hm-aiqc/` are registered as SPA URIs on app `9079054c-9620-4757-a256-23413042f1ef`. |
| `develop` branch pushed | `git ls-remote origin develop`, run from a local clone on your laptop, prints a hash beginning with `e772095` (Phase 2). |
| You have SSH access to optical-dev.oliver.solutions | `ssh optical-dev.oliver.solutions` lands you in a shell. |
| You have Docker permissions on the server | `docker ps` works without sudo. |
| You have sudo for the Apache reload | `sudo systemctl status apache2` works. |
If anything's missing, stop and resolve before proceeding.
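If you'd rather script the table, here is a minimal sketch. `check`, `run_laptop_prereqs` and `run_server_prereqs` are hypothetical helper names, not part of the repo; they wrap the same commands as the How column, except that the SSH check adds `-o BatchMode=yes` so it fails instead of prompting. The Entra row is a portal check and isn't covered.

```bash
# check LABEL COMMAND [ARGS...]
# Runs COMMAND with its output discarded; prints PASS or FAIL for LABEL and
# returns 0 on success, 1 on failure.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Laptop-side rows: run from inside a local clone of the repo.
run_laptop_prereqs() {
  local failures=0
  check "develop is at e772095" \
    sh -c 'git ls-remote origin develop | grep -q "^e772095"' || failures=$((failures + 1))
  # BatchMode makes ssh fail instead of prompting for a password.
  check "ssh to optical-dev works" \
    ssh -o BatchMode=yes optical-dev.oliver.solutions true || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}

# Server-side rows: run on optical-dev after `sudo -v`, because `sudo -n`
# fails rather than prompts when no cached credentials exist.
run_server_prereqs() {
  local failures=0
  check "docker works without sudo" docker ps || failures=$((failures + 1))
  check "sudo can query apache2" \
    sudo -n systemctl status apache2 || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

Paste the functions into a shell, then run `run_laptop_prereqs` on your laptop and `run_server_prereqs` on the dev server; any `FAIL` line is a row to resolve before continuing.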
1 — SSH in and clone the repo
```bash
ssh optical-dev.oliver.solutions
# Everything below runs on the dev server.
sudo mkdir -p /opt/hm-aiqc
sudo chown "$(id -un):$(id -gn)" /opt/hm-aiqc   # your user and its primary group
git clone git@bitbucket.org:zlalani/hm_ai_qc_report_tool.git /opt/hm-aiqc
cd /opt/hm-aiqc
git checkout develop
git log -1 --oneline   # should print: e772095 Phase 2: deploy machinery for Dev/Prod cutover
```
If git clone fails on auth, you'll need a deploy key on the server first
(an SSH key on optical-dev whose public half is added to Bitbucket as a
read-only deploy key for this repo). Same key approach as AI QC.
2 — Create the .env file
```bash
cp deploy/.env.dev.example .env
# Generate a real Flask SECRET_KEY
python3 -c 'import secrets; print(secrets.token_urlsafe(48))'
# Paste that value into SECRET_KEY=
# Fill in real LLM API keys (falls back to vi if $EDITOR is unset)
"${EDITOR:-vi}" .env
```
Required keys to fill in (the `.env.dev.example` placeholders):

- `SECRET_KEY`: from the `python3` one-liner above
- `OPENAI_API_KEY`: copy from the sandbox, `optical-web-1:/opt/hm_ai_qc/hm_ai_qc_report_tool/.env`
- `GOOGLE_API_KEY`: same source
- `ANTHROPIC_API_KEY`: same source
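To catch a forgotten key before deploying, a sketch like the one below can help. `check_env_keys` is a hypothetical helper, and the placeholder patterns it rejects (`changeme`, a `your_` or `your-` prefix, anything in angle brackets) are guesses; adjust them to whatever `deploy/.env.dev.example` actually uses.

```bash
# check_env_keys FILE KEY...
# Prints every KEY that is absent, empty, or still looks like a placeholder.
# Returns 1 if any key fails.
check_env_keys() {
  local file=$1 key value status=0
  shift
  for key in "$@"; do
    # Take the last KEY=... line and everything after the first "=".
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    # Drop one pair of surrounding double quotes, if present.
    value=${value#\"}
    value=${value%\"}
    case $value in
      '' | *changeme* | *CHANGEME* | *your_* | *your-* | *'<'*'>'*)
        printf 'missing or placeholder: %s\n' "$key"
        status=1
        ;;
    esac
  done
  return "$status"
}
```

From `/opt/hm-aiqc`, `check_env_keys .env SECRET_KEY OPENAI_API_KEY GOOGLE_API_KEY ANTHROPIC_API_KEY` prints nothing and returns 0 when all four look filled in.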
Confirm the tenant/client IDs match the SSO plan:

```bash
grep "AZURE_" .env
# AZURE_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
# AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
```
3 — Drop in the Box config
The Box service-account JSON lives outside the repo (gitignored):
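```bash
# On the dev server:
mkdir -p /opt/hm-aiqc/config

# From your laptop (-3 routes the copy through your laptop, so the two
# servers don't need to reach each other):
scp -3 optical-web-1:/opt/hm_ai_qc/hm_ai_qc_report_tool/config/box_config.json \
    optical-dev.oliver.solutions:/opt/hm-aiqc/config/box_config.json

# Back on the dev server, lock it down:
chmod 600 /opt/hm-aiqc/config/box_config.json
```

If the host-to-host copy still fails, `scp` the file down to your laptop first, then upload it to the dev server.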
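A sketch to confirm the file landed intact, assuming GNU `stat` (standard on the Debian-family hosts that `apache2ctl` and `a2enmod` imply) and `python3` (already used in step 2) on the dev server. `check_box_config` is a hypothetical name; it only checks that the file exists, is mode 600 and parses as JSON, not that the Box credentials inside are valid.

```bash
# check_box_config FILE
# Verifies FILE exists, has mode 600 and parses as JSON.
check_box_config() {
  local file=$1 mode
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  # GNU stat syntax; BSD/macOS stat would need: stat -f '%Lp'
  mode=$(stat -c '%a' "$file")
  if [ "$mode" != "600" ]; then
    echo "mode is $mode, expected 600: $file"
    return 1
  fi
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "not valid JSON: $file"
    return 1
  fi
  echo "ok: $file"
}
```

On the dev server: `check_box_config /opt/hm-aiqc/config/box_config.json`.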
4 — Run the deploy
```bash
cd /opt/hm-aiqc
./deploy.sh dev --dry-run   # preview, no changes
./deploy.sh dev             # actual deploy, prompts y/N to confirm
```
The deploy script will:

- Save the current HEAD to `.last_deploy_rollback` (on the first run there is no earlier deploy, so it records the HEAD of the fresh clone; that's fine)
- `git fetch` + `git reset --hard origin/develop`
- `docker compose build` (the first run pulls the `python:3.11-slim` base image, so expect it to be slow, ~2–5 min)
- `docker compose up -d` (the entrypoint runs `wait_for_db.py` → `flask db upgrade` → gunicorn)
- Poll `http://127.0.0.1:5050/health` every 2 s for up to 60 s
- If `/health` returns 200 with `{"status":"ok","db":true}` → done
- If not → auto-rollback to the saved checkpoint and exit non-zero
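The health gate in the last three bullets can be approximated as below. This is not the contents of `deploy.sh`; `wait_for_health` is a hypothetical function that mirrors the described behavior (2 s interval, 60 s budget, success only when the body reports `"db":true`), and it can equally be pointed at the public URL from step 6.

```bash
# wait_for_health URL [TIMEOUT_S] [INTERVAL_S]
# Polls URL until the response body reports "db":true, giving up after
# roughly TIMEOUT_S seconds. Returns 0 when healthy, 1 otherwise.
wait_for_health() {
  local url=$1 timeout=${2:-60} interval=${3:-2}
  [ "$interval" -ge 1 ] || interval=1
  local attempts=$((timeout / interval)) i=1 body
  [ "$attempts" -ge 1 ] || attempts=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP >= 400 (e.g. a 503 from an unhealthy app);
    # --max-time stops one hung connection from eating the whole budget.
    if body=$(curl -fs --max-time 5 "$url") &&
      printf '%s' "$body" | grep -q '"db": *true'; then
      echo "healthy: $body"
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep "$interval"
    i=$((i + 1))
  done
  echo "unhealthy after ${attempts} attempt(s): $url" >&2
  return 1
}
```

On the dev server, `wait_for_health http://127.0.0.1:5050/health 60 2` approximates the script's own check, which is handy when you've fixed something by hand and want to re-check without redeploying.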
Watch the logs in another terminal if you want to see the boot:
```bash
cd /opt/hm-aiqc
docker compose logs -f web
```
5 — Place the Apache config and reload
Find the existing optical-dev.oliver.solutions virtual host:
```bash
grep -rn "optical-dev.oliver.solutions" /etc/apache2/sites-available/
```

Open that file as root (likely `/etc/apache2/sites-available/optical-dev.conf` or similar)
and paste the contents of `/opt/hm-aiqc/deploy/apache-dev.conf` inside the
`<VirtualHost *:443>` block (the HTTPS one). Save.
Verify and reload:
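```bash
sudo apache2ctl configtest   # should print "Syntax OK"
sudo systemctl reload apache2
```

If `configtest` complains about missing modules, enable them, re-test, and do a full restart (Debian's `a2enmod` asks for a restart rather than a reload):

```bash
sudo a2enmod proxy proxy_http headers rewrite
sudo apache2ctl configtest && sudo systemctl restart apache2
```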
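To see up front which of those modules are missing, a sketch that filters `apache2ctl -M` output. The module list is the four enabled above (`proxy`, `proxy_http`, `headers`, `rewrite`, which `apache2ctl -M` reports as `proxy_module` and so on); `missing_apache_modules` is a hypothetical helper name.

```bash
# missing_apache_modules
# Reads `apache2ctl -M` output on stdin and prints each required module that
# is not loaded. Returns 1 if anything is missing.
missing_apache_modules() {
  local loaded mod status=0
  loaded=$(cat)
  for mod in proxy_module proxy_http_module headers_module rewrite_module; do
    # -w: match the module name as a whole word only.
    if ! printf '%s\n' "$loaded" | grep -qw "$mod"; then
      echo "$mod"
      status=1
    fi
  done
  return "$status"
}
```

`sudo apache2ctl -M 2>/dev/null | missing_apache_modules` prints nothing and returns 0 when all four are loaded.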
6 — Smoke test the public URL
From your laptop:
```bash
curl -i https://optical-dev.oliver.solutions/hm-aiqc/health
# Expected: a 200 status line (HTTP/1.1 or HTTP/2) and {"status":"ok","db":true}
```
If `/health` returns 200, open this in a browser:

https://optical-dev.oliver.solutions/hm-aiqc/

You should be redirected to `/auth/login-page`. Click "Sign in with Microsoft" and
complete the popup with your `*.brandtech.plus` or `*.oliver.agency` work account.
After login, run through each module to confirm:
| Module | Quick test |
|---|---|
| Reporting | Tab loads, "Previous Box Reports" populates |
| HM QC | Upload one image, run, confirm score |
| Video QC | Upload one short MP4, run, confirm score |
| Video Master | Enter a known campaign name, confirm matches preview |
| Campaigns | List loads (will be empty on fresh start) |
| Usage | Tab loads, your email appears in the user filter |
The Usage tab comes last because it is the proof that `g.user.email`
flows into `usage_logs.user` correctly, which is the whole point of Phase 1.
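To double-check the Usage tab against the database directly, here is a sketch that counts `usage_logs` rows for your email. It assumes the SQLite file is `database/qc_platform.db` (the path used in the backup step below) and that `usage_logs.user` stores the email as plain text; `usage_rows_for` is a hypothetical helper. It opens the file read-only through Python's standard-library `sqlite3` module.

```bash
# usage_rows_for DB EMAIL
# Prints how many usage_logs rows have "user" equal to EMAIL.
usage_rows_for() {
  python3 - "$1" "$2" <<'PY'
import sqlite3
import sys

db_path, email = sys.argv[1], sys.argv[2]
# mode=ro: open read-only so the check can never create or modify the file.
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
(count,) = conn.execute(
    'SELECT COUNT(*) FROM usage_logs WHERE "user" = ?', (email,)
).fetchone()
conn.close()
print(count)
PY
}
```

From `/opt/hm-aiqc`, after running at least one module, `usage_rows_for database/qc_platform.db <your work email>` should print a count above 0.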
7 — If something goes wrong
Deploy script reports rollback
The script already restored the previous code state. Look at the container logs:
```bash
docker compose logs --tail=200 web
```
Most likely causes:
- Missing env var → fix `.env` and re-run `./deploy.sh dev`
- Migration error → check the `flask db upgrade` output in the logs; restore the DB if needed (see "Database backup" below)
- Box config missing → confirm `config/box_config.json` exists
/health returns 503 (db: false)
The container is up but can't reach SQLite. Check that the `./database` volume is
mounted and writable:

```bash
docker compose exec web ls -la /app/database
docker compose exec web sh -c 'touch /app/database/_test_write && rm /app/database/_test_write'
```
MSAL popup spins forever / "AADSTS50011" redirect URI mismatch
Confirm the URL in the address bar exactly matches the Entra-registered SPA URI:
- Must be `https://optical-dev.oliver.solutions/hm-aiqc/` (with the trailing slash)
- The Apache `RewriteRule` in `apache-dev.conf` adds the trailing slash if you visit `/hm-aiqc` without one; verify that the 301 fires.
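One way to verify that 301 from your laptop, as a sketch: `check_slash_redirect` is a hypothetical helper that asks `curl` for the status code and redirect target without following the redirect, and expects exactly `301` plus the same URL with a trailing slash.

```bash
# check_slash_redirect URL_WITHOUT_TRAILING_SLASH
# Expects a 301 whose target is the same URL plus "/". Does not follow redirects.
check_slash_redirect() {
  local base=$1 result
  # -o /dev/null discards the body; -w prints the status code and the
  # Location curl would have followed (empty if there was no redirect).
  result=$(curl -s -o /dev/null --max-time 10 \
    -w '%{http_code} %{redirect_url}' "$base")
  if [ "$result" = "301 ${base}/" ]; then
    echo "ok: $result"
    return 0
  fi
  echo "unexpected: $result (wanted: 301 ${base}/)" >&2
  return 1
}
```

Run it as `check_slash_redirect https://optical-dev.oliver.solutions/hm-aiqc`.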
"AADSTS70001: Application not found in tenant"
The user is signed into a different Microsoft tenant. They must use a
`*.brandtech.plus` / `*.oliver.agency` work account, not a personal Microsoft
account.
Manual rollback
```bash
cd /opt/hm-aiqc
./deploy/rollback.sh last    # back to the checkpoint
# or
./deploy/rollback.sh <sha>   # back to a specific commit
```
Database backup (before risky migrations)
```bash
cd /opt/hm-aiqc
mkdir -p database/backups   # sqlite3 will not create the directory for you
sqlite3 database/qc_platform.db ".backup database/backups/qc_platform_$(date +%F_%H%M).db"
```
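A backup is only useful if it opens. Here is a sketch that runs SQLite's `PRAGMA integrity_check` on the copy, read-only, via Python's standard-library `sqlite3` module; `verify_backup` is a hypothetical helper.

```bash
# verify_backup FILE
# Prints SQLite's integrity_check result for FILE; returns 0 only if it is "ok".
verify_backup() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  python3 - "$1" <<'PY'
import sqlite3
import sys

# Read-only, so verifying a backup can never alter it.
conn = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
result = conn.execute("PRAGMA integrity_check").fetchone()[0]
conn.close()
print(result)
sys.exit(0 if result == "ok" else 1)
PY
}
```

For the newest backup: `verify_backup "$(ls -t database/backups/*.db | head -n 1)"` should print `ok`.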
8 — Hand-off
Once Dev is green and the smoke tests pass:
- Tell the team the URL: `https://optical-dev.oliver.solutions/hm-aiqc/`
- Don't decommission the sandbox (`https://ai-sandbox.oliver.solutions/hm-ai-qc-report/`) yet; leave it running until Prod is also live and has soaked for ~1 week.
- When ready to ship Prod: tag `v3.0.0` on `main`, then repeat this runbook on `optical-prod.oliver.solutions` with `./deploy.sh prod v3.0.0` and `deploy/apache-prod.conf`. Same steps, different host.