# Dev cutover runbook — `optical-dev.oliver.solutions`

Self-contained "SSH in and paste these" instructions for the first deploy of HM AI QC to the new Dev server. Phase 3 of `deploy/DEV_PROD_MIGRATION_PLAN.md`.

**Estimated time:** 20 minutes if everything works first try.

---

## 0 — Prereqs (verify before starting)

| Check | How |
|---|---|
| Entra redirect URIs added | Confirmed 2026-05-09 — `https://optical-dev.oliver.solutions/hm-aiqc/` and `https://optical-prod.oliver.solutions/hm-aiqc/` are registered as SPA URIs on app `9079054c-9620-4757-a256-23413042f1ef`. |
| `develop` branch pushed | `git ls-remote origin develop` from your laptop shows commit `e772095` (Phase 2). |
| You have SSH access to `optical-dev.oliver.solutions` | `ssh optical-dev.oliver.solutions` lands you in. |
| You have docker permissions on the server | `docker ps` works without sudo. |
| You have sudo for Apache reload | `sudo systemctl status apache2` works. |

If anything's missing, stop and resolve before proceeding.

---

## 1 — SSH in and clone the repo

```bash
ssh optical-dev.oliver.solutions

sudo mkdir -p /opt/hm-aiqc
sudo chown $(whoami):$(whoami) /opt/hm-aiqc
git clone git@bitbucket.org:zlalani/hm_ai_qc_report_tool.git /opt/hm-aiqc
cd /opt/hm-aiqc
git checkout develop
git log -1 --oneline
# should print: e772095 Phase 2: deploy machinery for Dev/Prod cutover
```

If `git clone` fails on auth, you'll need a deploy key on the server first (an SSH key on `optical-dev` whose public half is added to Bitbucket as a read-only deploy key for this repo). Same key approach as AI QC.
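
If the clone does fail on auth, the deploy key described above could be created along these lines. This is a minimal sketch, not the team's established procedure: the helper name `make_deploy_key`, the key comment, and the choice of an ed25519 key without a passphrase are all assumptions made for illustration.

```bash
# Hypothetical helper (not part of the repo): create a deploy key pair if it
# does not exist yet, then print the public half for pasting into Bitbucket.
# Usage: make_deploy_key KEY_PATH
make_deploy_key() {
    key_path=$1
    mkdir -p "$(dirname "$key_path")"
    if [ ! -f "$key_path" ]; then
        # -N "" sets an empty passphrase so unattended git fetches work;
        # -q suppresses the key-art output.
        ssh-keygen -q -t ed25519 -N "" -C "hm-aiqc-deploy-key" -f "$key_path" || return 1
    fi
    # ssh refuses private keys that are readable by other users.
    chmod 600 "$key_path"
    cat "$key_path.pub"
}
```

For example, `make_deploy_key ~/.ssh/hm_aiqc_deploy` (a hypothetical path) prints the public key to register as the read-only deploy key. The clone can then be pointed at that key with `GIT_SSH_COMMAND='ssh -i ~/.ssh/hm_aiqc_deploy -o IdentitiesOnly=yes' git clone git@bitbucket.org:zlalani/hm_ai_qc_report_tool.git /opt/hm-aiqc`.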

---

## 2 — Create the `.env` file

```bash
cp deploy/.env.dev.example .env

# Generate a real Flask SECRET_KEY
python3 -c 'import secrets; print(secrets.token_urlsafe(48))'
# Paste that value into SECRET_KEY=

# Fill in real LLM API keys
$EDITOR .env
```

Required keys to fill in (the `.env.dev.example` placeholders):

- `SECRET_KEY` — from the python3 one-liner above
- `OPENAI_API_KEY` — copy from sandbox `optical-web-1:/opt/hm_ai_qc/hm_ai_qc_report_tool/.env`
- `GOOGLE_API_KEY` — same source
- `ANTHROPIC_API_KEY` — same source

Confirm tenant/client IDs match the SSO plan:

```bash
grep "AZURE_" .env
# AZURE_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
# AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
```

---

## 3 — Drop in the Box config

The Box service-account JSON is not committed to the repo (it is gitignored), so copy it over from the sandbox:

```bash
mkdir -p config
scp optical-web-1:/opt/hm_ai_qc/hm_ai_qc_report_tool/config/box_config.json \
    optical-dev.oliver.solutions:/opt/hm-aiqc/config/box_config.json

# On the dev server, lock it down:
chmod 600 /opt/hm-aiqc/config/box_config.json
```

If you can't copy directly between the two servers, copy the file to your laptop with `scp` first and upload it from there.

---

## 4 — Run the deploy

```bash
cd /opt/hm-aiqc
./deploy.sh dev --dry-run    # preview — no changes
./deploy.sh dev              # actual deploy, prompts y/N to confirm
```

The deploy script will:

1. Save current HEAD to `.last_deploy_rollback` (no earlier checkpoint exists on the first run, which is fine: it saves the initial clone HEAD)
2. `git fetch` + `git reset --hard origin/develop`
3. `docker compose build` (first run pulls the python:3.11-slim base image — slow, ~2–5 min)
4. `docker compose up -d` (entrypoint runs `wait_for_db.py` → `flask db upgrade` → gunicorn)
5. Poll `http://127.0.0.1:5050/health` every 2s for up to 60s
6. If `/health` returns 200 with `{"status":"ok","db":true}` → done
7. If not → auto-rollback to the saved checkpoint and exit non-zero

Watch the logs in another terminal if you want to see the boot:

```bash
docker compose logs -f web
```

---

## 5 — Place the Apache config and reload

Find the existing `optical-dev.oliver.solutions` virtual host:

```bash
grep -rn "optical-dev.oliver.solutions" /etc/apache2/sites-available/
```

Open that file (likely `/etc/apache2/sites-available/optical-dev.conf` or similar) and paste the contents of `/opt/hm-aiqc/deploy/apache-dev.conf` inside the `<VirtualHost>` block (the HTTPS one). Save.

Verify and reload:

```bash
sudo apache2ctl configtest    # should print "Syntax OK"
sudo systemctl reload apache2
```

If `configtest` complains about missing modules:

```bash
sudo a2enmod proxy proxy_http headers rewrite
sudo systemctl reload apache2
```

---

## 6 — Smoke test the public URL

From your **laptop**:

```bash
curl -i https://optical-dev.oliver.solutions/hm-aiqc/health
# Expected: HTTP/1.1 200 OK + {"status":"ok","db":true}
```

If `/health` returns 200, open in a browser:

```
https://optical-dev.oliver.solutions/hm-aiqc/
```

You should be redirected to `/auth/login-page`. Click **Sign in with Microsoft** and complete the popup with your `*.brandtech.plus` or `*.oliver.agency` work account.

After login, run through each module to confirm:

| Module | Quick test |
|---|---|
| Reporting | Tab loads, "Previous Box Reports" populates |
| HM QC | Upload one image, run, confirm score |
| Video QC | Upload one short MP4, run, confirm score |
| Video Master | Enter a known campaign name, confirm matches preview |
| Campaigns | List loads (will be empty on fresh start) |
| Usage | Tab loads, your email appears in the user filter |

The Usage tab is the last item because it's the proof that `g.user.email` flows into `usage_logs.user` correctly — the whole point of Phase 1.

---

## 7 — If something goes wrong

### Deploy script reports rollback

The script already restored the previous code state.
Look at the container logs:

```bash
docker compose logs --tail=200 web
```

Most likely causes:

- Missing env var → fix `.env` and re-run `./deploy.sh dev`
- Migration error → check `flask db upgrade` output in logs; restore DB if needed (see "Database backup" below)
- Box config missing → confirm `config/box_config.json` exists

### `/health` returns 503 (db: false)

Container is up but can't reach SQLite. Check that the `./database` volume is mounted and writable:

```bash
docker compose exec web ls -la /app/database
docker compose exec web touch /app/database/_test_write && \
  docker compose exec web rm /app/database/_test_write
```

### MSAL popup spins forever / "AADSTS50011" redirect URI mismatch

Confirm the URL in the address bar exactly matches the Entra-registered SPA URI:

- Must be `https://optical-dev.oliver.solutions/hm-aiqc/` (trailing slash)
- Apache `RewriteRule` in `apache-dev.conf` adds the trailing slash if you visit `/hm-aiqc` without one — verify that 301 fires.

### "AADSTS70001: Application not found in tenant"

The user is signed into a *different* Microsoft tenant. They must use a `*.brandtech.plus` / `*.oliver.agency` work account, not a personal Microsoft account.

### Manual rollback

```bash
cd /opt/hm-aiqc
./deploy/rollback.sh last           # back to the checkpoint
# or
./deploy/rollback.sh <commit-sha>   # back to a specific commit
```

### Database backup (before risky migrations)

```bash
cd /opt/hm-aiqc
mkdir -p database/backups    # .backup fails if the target directory is missing
sqlite3 database/qc_platform.db ".backup database/backups/qc_platform_$(date +%F_%H%M).db"
```

---

## 8 — Hand-off

Once Dev is green and the smoke tests pass:

1. Tell the team the URL: `https://optical-dev.oliver.solutions/hm-aiqc/`
2. **Don't** decommission the sandbox yet (`https://ai-sandbox.oliver.solutions/hm-ai-qc-report/`). Leave it running until Prod is also live and has soaked for about a week.
3. When ready to ship Prod: tag `v3.0.0` on `main`, then repeat this runbook on `optical-prod.oliver.solutions` with `./deploy.sh prod v3.0.0` and `deploy/apache-prod.conf`. Same steps, different host.
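
Before announcing the URL, "Dev is green" can be re-confirmed against the public endpoint with the same kind of polling that section 4 describes for the deploy script. The sketch below is an illustration only and is not taken from `deploy.sh`: the function names `fetch_health` and `wait_for_health` are hypothetical, and the body match assumes the compact JSON shown in the smoke test (`{"status":"ok","db":true}`).

```bash
# Hypothetical helper: print the response body of a health URL.
# curl -f turns HTTP errors such as 503 into a non-zero exit status.
fetch_health() {
    curl -fsS --max-time 5 "$1"
}

# Hypothetical helper: poll until the service reports healthy.
# Usage: wait_for_health URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 once the body reports status ok and db true, 1 on timeout.
wait_for_health() {
    url=$1
    timeout=${2:-60}
    interval=${3:-2}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # A failed request counts as "not healthy yet", not as a fatal error.
        body=$(fetch_health "$url" 2>/dev/null) || body=""
        case $body in
            *'"status":"ok"'*'"db":true'*)
                echo "healthy: $body"
                return 0
                ;;
        esac
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "not healthy after ${timeout}s: $url" >&2
    return 1
}
```

For example, `wait_for_health https://optical-dev.oliver.solutions/hm-aiqc/health 60 2` mirrors the 60-second window and 2-second interval from section 4. Keeping the fetch in its own function makes it easy to swap in the local `http://127.0.0.1:5050/health` endpoint or a stub when testing the loop.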