Migrate PDF processing from Redis worker to Google Cloud Run

Replace the Redis queue + Python worker daemon with a synchronous HTTP call to a Cloud Run service, eliminating Redis and simplifying the infrastructure from 4 containers (web, worker, redis, postgres) to just web + postgres (with Cloud Run handling processing).

- Add cloudrun_service.py: Flask app wrapping EnterprisePDFChecker with POST /check and GET /health endpoints, GCS image upload
- Add Dockerfile.cloudrun + requirements-cloudrun.txt for Cloud Run image
- Add cloudbuild.yaml for Cloud Build with custom Dockerfile
- Rewrite api.php: remove all Redis code, add Cloud Run OIDC auth (getCloudRunToken), synchronous processing in handleCheck(), file-based rate limiting, GCS redirect in handleImage(), DB helper updateJobInDatabase()
- Update js/upload.js: handle synchronous completed response from Cloud Run, increase poll timeout to 15 minutes
- Update js/page-viewer.js: use GCS URLs directly for page images
- Simplify docker-compose.yml and docker-compose.prod.yml: remove worker and redis services
- Remove PHP Redis extension from Dockerfile.web
- Set 900s timeouts across nginx, PHP-FPM, gunicorn, curl, and Cloud Run
- Update cleanup.py: remove result_images pattern (now on GCS), add rate_limits cleanup
- Update .env.example: replace Redis vars with Cloud Run/GCS config

Cloud Run service deployed to: https://pdf-checker-bcb6ipdqka-uc.a.run.app
GCS bucket: gs://optical-pdf-images (7-day lifecycle, public read)
GCP project: optical-414516

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 14:50:38 -06:00
commit 4080638856
parent 463b504d67
16 changed files with 722 additions and 223 deletions
--- a/.env.example
+++ b/.env.example
@@ -27,12 +27,13 @@ DB_NAME=pdf_checker
 DB_USER=pdf_checker
 DB_PASSWORD=change_me_in_production

-# Redis - used for job queue in Docker setup
-REDIS_HOST=redis
-REDIS_PORT=6379
-
-# Worker configuration
-WORKER_COUNT=2
+# Cloud Run - PDF processing service
+# Set this to your deployed Cloud Run URL (leave empty for local Python fallback)
+CLOUD_RUN_URL=https://pdf-checker-bcb6ipdqka-uc.a.run.app
+# Path to GCP service account key for authenticating to Cloud Run
+GCP_SA_KEY_PATH=./pdf-api-invoker-key.json
+# GCS bucket for page images
+GCS_BUCKET_NAME=optical-pdf-images

 # Azure AD / MSAL Authentication
 AZURE_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
--- a/.gitignore
+++ b/.gitignore
@@ -33,7 +33,13 @@ Thumbs.db

 # Docker volumes (local data)
 pg-data/
-redis-data/
+
+# GCP service account keys
+*-key.json
+*-credentials.json
+
+# Rate limit data
+rate_limits/

 # Coverage
 .coverage
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,100 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+AI-powered PDF accessibility checker that validates documents against WCAG 2.1 Level A & AA standards. Combines traditional PDF analysis (pypdf, pdfplumber) with AI models (Anthropic Claude, Google Cloud Vision) for ~95% automated WCAG coverage. Branded for "Oliver" (Montserrat font, black/#FFC407 palette).
+
+## Commands
+
+### Testing
+```bash
+source venv/bin/activate
+pytest tests/ -v                          # Run all tests (31 tests)
+pytest tests/ --cov=. --cov-report=html   # With coverage report
+pytest tests/test_checker.py -v           # Single test file
+pytest tests/ -m "not integration"        # Skip integration tests
+```
+
+### Running Locally
+```bash
+source venv/bin/activate
+php -S localhost:8000                     # Start PHP dev server
+```
+
+### Docker
+```bash
+docker-compose up                                      # Development stack
+docker-compose -f docker-compose.prod.yml up -d        # Production stack
+docker-compose exec worker pytest tests/ -v            # Tests in container
+```
+
+### CLI Usage
+```bash
+python enterprise_pdf_checker.py document.pdf --output report.json   # Full check
+python enterprise_pdf_checker.py document.pdf --quick                # Skip AI checks
+python pdf_remediation.py document.pdf --output fixed.pdf --all      # Auto-remediate
+```
+
+## Architecture
+
+### Three Interfaces
+- **Web UI** (`index.html` + `js/` + `css/`) — vanilla JS, drag-drop upload, visual inspector
+- **REST API** (`api.php`) — PHP endpoints: upload, check, status, result, remediate, download
+- **CLI** (`enterprise_pdf_checker.py`) — direct Python execution
+
+### Request Flow (Docker/Production)
+1. `api.php` receives upload, validates via `auth.php`, saves to `uploads/`
+2. Job pushed to Redis queue (`pdf:queue`) and tracked in PostgreSQL
+3. `worker.py` daemon pops jobs, runs `EnterprisePDFChecker.check_all()`
+4. Results written to `results/{job_id}.result.json`, DB updated
+5. Client polls `api.php?action=status` then fetches results
+
+### Key Source Files
+| File | Purpose |
+|------|---------|
+| `enterprise_pdf_checker.py` | Core engine — 30+ WCAG checks, AI image analysis, scoring |
+| `api.php` | REST API — file handling, job queue integration, CORS |
+| `auth.php` | Authentication — Bearer/X-API-Key, dev mode localhost bypass |
+| `worker.py` | Background daemon — Redis queue consumer, graceful shutdown |
+| `db_manager.py` | PostgreSQL ORM — jobs CRUD, audit logging |
+| `redis_queue.py` | Redis operations — job queue, status tracking, rate limiting |
+| `pdf_remediation.py` | Auto-fix — metadata, tagging, language tags |
+| `retry_helper.py` | Exponential backoff for external API calls |
+| `report_generator.py` | Result formatting and report generation |
+| `logger_config.py` | Structured logging with rotation (10MB max) |
+| `cleanup.py` | File retention cleanup (24h for uploads/results) |
+
+### Data Layer
+- **PostgreSQL** — `jobs` table (status, score, grade, result JSON), `audit_log` table. Schema in `db/init.sql`
+- **Redis** — Job queue (`pdf:queue`), status tracking (`pdf:status:*`), rate limiting (`pdf:rate:*`)
+
+### External APIs
+- **Anthropic Claude 3.5 Sonnet** — alt text validation, image classification, text-in-images
+- **Google Cloud Vision** — OCR, text detection
+- **veraPDF** (optional) — PDF/UA-1 compliance validation
+
+### Frontend Structure
+`js/app.js` (controller), `js/upload.js` (drag-drop), `js/api.js` (HTTP client), `js/results.js` (display), `js/page-viewer.js` (PDF inspector), `js/batch.js` (batch processing), `js/utils.js` (helpers)
+
+## Tech Stack
+- **Backend**: Python 3.11 (processing), PHP 8.2 (API)
+- **Frontend**: Vanilla HTML/CSS/JS
+- **Database**: PostgreSQL 16, Redis 7
+- **Infrastructure**: Docker, Nginx/Apache, PHP-FPM
+- **System deps**: Tesseract OCR, Poppler, Ghostscript
+
+## Configuration
+Environment variables via `.env` (see `.env.example`). Key settings:
+- `ANTHROPIC_API_KEY` / `GOOGLE_API_KEY` — AI API credentials
+- `DEV_MODE=true` — bypasses auth for localhost requests
+- `DB_HOST`, `DB_PORT`, `REDIS_HOST`, `REDIS_PORT` — infrastructure endpoints
+- Production uses ports 1220 (Redis) and 1221 (PostgreSQL) to avoid host conflicts
+
+## Testing
+- pytest with markers: `integration`, `slow`, `api`
+- Config in `pytest.ini`
+- Fixtures in `tests/conftest.py`
+- Sample PDFs in `Test_files/`
+- No linter currently configured
--- a/Dockerfile.cloudrun
+++ b/Dockerfile.cloudrun
@@ -0,0 +1,29 @@
+FROM python:3.11-slim
+
+# Install system dependencies for PDF processing
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    tesseract-ocr \
+    tesseract-ocr-eng \
+    poppler-utils \
+    ghostscript \
+    libgl1 \
+    libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# Install Python dependencies
+COPY requirements-cloudrun.txt .
+RUN pip install --no-cache-dir -r requirements-cloudrun.txt
+
+# Copy application code (no worker, redis_queue, or db_manager)
+COPY cloudrun_service.py .
+COPY enterprise_pdf_checker.py .
+COPY pdf_remediation.py .
+COPY logger_config.py .
+COPY retry_helper.py .
+
+# Cloud Run sets $PORT; gunicorn binds to it
+# --workers 1 --threads 1: Cloud Run concurrency=1, one request at a time
+# --timeout 900: allow up to 15 minutes for large PDFs
+CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 --timeout 900 cloudrun_service:app
--- a/Dockerfile.web
+++ b/Dockerfile.web
@@ -4,12 +4,6 @@ FROM php:8.2-fpm-alpine
 RUN apk add --no-cache nginx python3 postgresql-dev && \
    docker-php-ext-install pdo pdo_pgsql

-# Install php-redis via PECL
-RUN apk add --no-cache --virtual .build-deps $PHPIZE_DEPS && \
-    pecl install redis && \
-    docker-php-ext-enable redis && \
-    apk del .build-deps
-
 # Copy Nginx config
 COPY nginx.conf /etc/nginx/http.d/default.conf

--- a/api.php
+++ b/api.php
@@ -1,8 +1,10 @@
 <?php
 /**
 * Enterprise PDF Accessibility Checker - API Backend
- * 
- * Handles file uploads, job processing, and result retrieval
+ *
+ * Handles file uploads, sends PDFs to Cloud Run for processing,
+ * and serves results. No Redis dependency — uses Cloud Run for
+ * processing and file-based rate limiting.
 */

 // Load .env file if getenv doesn't work (Apache doesn't set env vars by default)
@@ -29,45 +31,53 @@ define('PYTHON_SCRIPT', __DIR__ . '/enterprise_pdf_checker.py');
 define('MAX_FILE_SIZE', 50 * 1024 * 1024); // 50MB
 define('ALLOWED_EXTENSIONS', ['pdf']);

-// Redis configuration
-define('REDIS_HOST', getenv('REDIS_HOST') ?: 'localhost');
-define('REDIS_PORT', intval(getenv('REDIS_PORT') ?: 6379));
-define('REDIS_QUEUE', 'pdf:queue');
-define('REDIS_STATUS_PREFIX', 'pdf:status:');
-define('REDIS_RATE_PREFIX', 'pdf:rate:');
+// Cloud Run configuration
+define('CLOUD_RUN_URL', getenv('CLOUD_RUN_URL') ?: '');
+define('CLOUD_RUN_TIMEOUT', 900); // 15 minutes
+define('GCP_SA_KEY_PATH', getenv('GCP_SA_KEY_PATH') ?: __DIR__ . '/pdf-api-invoker-key.json');
+define('RATE_LIMIT_DIR', __DIR__ . '/rate_limits');
+
+// Database configuration
+define('DB_HOST', getenv('DB_HOST') ?: 'localhost');
+define('DB_PORT', intval(getenv('DB_PORT') ?: 5432));
+define('DB_NAME', getenv('DB_NAME') ?: 'pdf_checker');
+define('DB_USER', getenv('DB_USER') ?: 'pdf_checker');
+define('DB_PASSWORD', getenv('DB_PASSWORD') ?: 'dev_password');

 // Create directories if they don't exist
 if (!is_dir(UPLOAD_DIR)) mkdir(UPLOAD_DIR, 0755, true);
 if (!is_dir(RESULTS_DIR)) mkdir(RESULTS_DIR, 0755, true);
+if (!is_dir(RATE_LIMIT_DIR)) mkdir(RATE_LIMIT_DIR, 0755, true);

 /**
- * Get Redis connection (lazy singleton)
- */
-function getRedis() {
-    static $redis = null;
-    if ($redis === null) {
-        $redis = new Redis();
-        $redis->connect(REDIS_HOST, REDIS_PORT);
-    }
-    return $redis;
-}
-
-/**
- * Check rate limit via Redis. Returns true if allowed.
+ * Check rate limit via filesystem. Returns true if allowed.
+ * Stores timestamps in JSON files per IP+action.
 */
 function checkRateLimit($action, $limit, $window) {
-    try {
-        $redis = getRedis();
-        $ip = $_SERVER['REMOTE_ADDR'] ?? 'unknown';
-        $key = REDIS_RATE_PREFIX . $ip . ':' . $action;
-        $current = $redis->incr($key);
-        if ($current === 1) {
-            $redis->expire($key, $window);
+    $ip = $_SERVER['REMOTE_ADDR'] ?? 'unknown';
+    $key = preg_replace('/[^a-zA-Z0-9_-]/', '_', $ip . '_' . $action);
+    $file = RATE_LIMIT_DIR . '/' . $key . '.json';
+
+    $now = time();
+    $timestamps = [];
+
+    if (file_exists($file)) {
+        $data = json_decode(file_get_contents($file), true);
+        if (is_array($data)) {
+            // Filter to only timestamps within the window
+            $timestamps = array_filter($data, function($ts) use ($now, $window) {
+                return ($now - $ts) < $window;
+            });
        }
-        return $current <= $limit;
-    } catch (Exception $e) {
-        return true; // Allow if Redis is down
    }
+
+    if (count($timestamps) >= $limit) {
+        return false;
+    }
+
+    $timestamps[] = $now;
+    file_put_contents($file, json_encode(array_values($timestamps)), LOCK_EX);
+    return true;
 }

 /**
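Review note: the file-based limiter in `checkRateLimit()` above is a sliding-window counter, one JSON file of timestamps per IP and action. A minimal Python sketch of the same idea follows, assuming a hypothetical `check_rate_limit` helper with an injectable `now` so the window behaviour can be exercised without sleeping. The function name, the `directory` parameter and the example IP are illustrative and not part of this commit. Like the PHP version, it does not guard the read-modify-write sequence against concurrent requests.

```python
import json
import re
import tempfile
import time
from pathlib import Path


def check_rate_limit(directory, ip, action, limit, window, now=None):
    """Return True if the request is allowed, recording it when it is."""
    now = time.time() if now is None else now
    # Keep only filename-safe characters, as the PHP preg_replace does.
    key = re.sub(r"[^a-zA-Z0-9_-]", "_", f"{ip}_{action}")
    path = Path(directory) / f"{key}.json"

    timestamps = []
    if path.exists():
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            data = []  # Treat a corrupt file as "no history".
        if isinstance(data, list):
            # Drop requests that have aged out of the window.
            timestamps = [ts for ts in data if now - ts < window]

    if len(timestamps) >= limit:
        return False  # Denied requests are not recorded.

    timestamps.append(now)
    path.write_text(json.dumps(timestamps))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for t in (1000, 1001, 1002, 1061):
            allowed = check_rate_limit(tmp, "203.0.113.7", "upload", 2, 60, now=t)
            print(t, allowed)
```

With a limit of 2 per 60 seconds, the third call inside the window is refused and a later call is accepted once both earlier timestamps have expired.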
@@ -80,6 +90,171 @@ function sanitizeJobId($job_id) {
    return $job_id;
 }

+/**
+ * Get an OIDC identity token for authenticating to Cloud Run.
+ * Uses a GCP service account key to create a self-signed JWT,
+ * then exchanges it for an identity token via Google's OAuth endpoint.
+ */
+function getCloudRunToken() {
+    static $cachedToken = null;
+    static $cachedExpiry = 0;
+
+    // Return cached token if still valid (with 5-min buffer)
+    if ($cachedToken && time() < ($cachedExpiry - 300)) {
+        return $cachedToken;
+    }
+
+    $keyPath = GCP_SA_KEY_PATH;
+    if (!file_exists($keyPath)) {
+        throw new Exception("GCP service account key not found: $keyPath");
+    }
+
+    $sa = json_decode(file_get_contents($keyPath), true);
+    if (!$sa || !isset($sa['client_email']) || !isset($sa['private_key'])) {
+        throw new Exception("Invalid service account key file");
+    }
+
+    $now = time();
+    $expiry = $now + 3600;
+
+    // Build JWT header and claims
+    $header = base64url_encode(json_encode(['alg' => 'RS256', 'typ' => 'JWT']));
+    $claims = base64url_encode(json_encode([
+        'iss' => $sa['client_email'],
+        'sub' => $sa['client_email'],
+        'aud' => 'https://oauth2.googleapis.com/token',
+        'iat' => $now,
+        'exp' => $expiry,
+        'target_audience' => CLOUD_RUN_URL,
+    ]));
+
+    // Sign with RSA-SHA256
+    $signingInput = "$header.$claims";
+    $signature = '';
+    $privateKey = openssl_pkey_get_private($sa['private_key']);
+    if (!$privateKey) {
+        throw new Exception("Failed to parse service account private key");
+    }
+    openssl_sign($signingInput, $signature, $privateKey, OPENSSL_ALGO_SHA256);
+    $jwt = $signingInput . '.' . base64url_encode($signature);
+
+    // Exchange JWT for identity token
+    $ch = curl_init('https://oauth2.googleapis.com/token');
+    curl_setopt_array($ch, [
+        CURLOPT_POST => true,
+        CURLOPT_POSTFIELDS => http_build_query([
+            'grant_type' => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
+            'assertion' => $jwt,
+        ]),
+        CURLOPT_RETURNTRANSFER => true,
+        CURLOPT_TIMEOUT => 10,
+    ]);
+    $response = curl_exec($ch);
+    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
+    curl_close($ch);
+
+    if ($httpCode !== 200) {
+        throw new Exception("Failed to get identity token: HTTP $httpCode - $response");
+    }
+
+    $tokenData = json_decode($response, true);
+    if (!isset($tokenData['id_token'])) {
+        throw new Exception("No id_token in response: $response");
+    }
+
+    $cachedToken = $tokenData['id_token'];
+    $cachedExpiry = $expiry;
+
+    return $cachedToken;
+}
+
+/**
+ * Base64url encode (no padding, URL-safe)
+ */
+function base64url_encode($data) {
+    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
+}
+
+/**
+ * Get PostgreSQL PDO connection (lazy singleton)
+ */
+function getDB() {
+    static $pdo = null;
+    if ($pdo === null) {
+        $dsn = sprintf('pgsql:host=%s;port=%d;dbname=%s', DB_HOST, DB_PORT, DB_NAME);
+        $pdo = new PDO($dsn, DB_USER, DB_PASSWORD, [
+            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
+        ]);
+    }
+    return $pdo;
+}
+
+/**
+ * Insert or update a job record in PostgreSQL
+ */
+function updateJobInDatabase($job_id, $filename, $status, $results = null) {
+    try {
+        $pdo = getDB();
+
+        $score = null;
+        $grade = null;
+        $total_issues = null;
+        $critical_count = null;
+        $error_count = null;
+        $warning_count = null;
+        $result_json = null;
+        $processing_time = null;
+
+        if ($results) {
+            $score = $results['accessibility_score'] ?? null;
+            $grade = $results['grade'] ?? null;
+            $issues = $results['issues'] ?? [];
+            $total_issues = count($issues);
+            $critical_count = count(array_filter($issues, fn($i) => ($i['severity'] ?? '') === 'CRITICAL'));
+            $error_count = count(array_filter($issues, fn($i) => ($i['severity'] ?? '') === 'ERROR'));
+            $warning_count = count(array_filter($issues, fn($i) => ($i['severity'] ?? '') === 'WARNING'));
+            $result_json = json_encode($results);
+            $processing_time = $results['stats']['processing_time'] ?? null;
+        }
+
+        $sql = "INSERT INTO jobs (job_id, filename, status, score, grade, total_issues,
+                    critical_count, error_count, warning_count, result_json, processing_time,
+                    completed_at)
+                VALUES (:job_id, :filename, :status, :score, :grade, :total_issues,
+                    :critical_count, :error_count, :warning_count, :result_json::jsonb, :processing_time,
+                    CASE WHEN :status2 = 'completed' THEN NOW() ELSE NULL END)
+                ON CONFLICT (job_id) DO UPDATE SET
+                    status = EXCLUDED.status,
+                    score = COALESCE(EXCLUDED.score, jobs.score),
+                    grade = COALESCE(EXCLUDED.grade, jobs.grade),
+                    total_issues = COALESCE(EXCLUDED.total_issues, jobs.total_issues),
+                    critical_count = COALESCE(EXCLUDED.critical_count, jobs.critical_count),
+                    error_count = COALESCE(EXCLUDED.error_count, jobs.error_count),
+                    warning_count = COALESCE(EXCLUDED.warning_count, jobs.warning_count),
+                    result_json = COALESCE(EXCLUDED.result_json, jobs.result_json),
+                    processing_time = COALESCE(EXCLUDED.processing_time, jobs.processing_time),
+                    completed_at = CASE WHEN EXCLUDED.status = 'completed' THEN NOW() ELSE jobs.completed_at END";
+
+        $stmt = $pdo->prepare($sql);
+        $stmt->execute([
+            ':job_id' => $job_id,
+            ':filename' => $filename,
+            ':status' => $status,
+            ':score' => $score,
+            ':grade' => $grade,
+            ':total_issues' => $total_issues,
+            ':critical_count' => $critical_count,
+            ':error_count' => $error_count,
+            ':warning_count' => $warning_count,
+            ':result_json' => $result_json,
+            ':processing_time' => $processing_time,
+            ':status2' => $status,
+        ]);
+    } catch (Exception $e) {
+        error_log("DB update failed for $job_id: " . $e->getMessage());
+    }
+}
+
 // CORS headers for API
 $allowed_origins = [
    'https://ai-sandbox.oliver.solutions',
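Review note: `getCloudRunToken()` above builds a self-signed JWT whose claims carry a `target_audience`, then exchanges it for an identity token. The sketch below shows only the unsigned part (header, claims, base64url encoding) in Python, using the standard library. `build_signing_input` is a hypothetical helper name and the email and URL in the usage lines are placeholders. RS256 signing and the token exchange are deliberately left out because they need a crypto library and network access.

```python
import base64
import json


def base64url_encode(data: bytes) -> str:
    """URL-safe Base64 without '=' padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_signing_input(client_email, target_audience, now, lifetime=3600):
    """Return '<header>.<claims>', the string that would be signed with RS256."""
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": client_email,
        "sub": client_email,
        "aud": "https://oauth2.googleapis.com/token",
        "iat": now,
        "exp": now + lifetime,
        # Requests an identity token whose audience is the Cloud Run service.
        "target_audience": target_audience,
    }

    def encode(obj):
        compact = json.dumps(obj, separators=(",", ":")).encode("utf-8")
        return base64url_encode(compact)

    return f"{encode(header)}.{encode(claims)}"


if __name__ == "__main__":
    print(build_signing_input(
        "invoker@example-project.iam.gserviceaccount.com",  # placeholder
        "https://example-service.a.run.app",                # placeholder
        now=1_700_000_000,
    ))
```

The final JWT would be this string plus `.` and the base64url-encoded signature, which is what the PHP code assembles in `$jwt`.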
@@ -173,18 +348,18 @@ function handleUpload() {
    if (!isset($_FILES['pdf'])) {
        error('No file uploaded');
    }
-    
+
    $file = $_FILES['pdf'];
-    
+
    // Validate file
    if ($file['error'] !== UPLOAD_ERR_OK) {
        error('Upload error: ' . $file['error']);
    }
-    
+
    if ($file['size'] > MAX_FILE_SIZE) {
        error('File too large. Max size: ' . (MAX_FILE_SIZE / 1024 / 1024) . 'MB');
    }
-    
+
    $ext = strtolower(pathinfo($file['name'], PATHINFO_EXTENSION));
    if (!in_array($ext, ALLOWED_EXTENSIONS)) {
        error('Invalid file type. Only PDF files allowed.');
@@ -200,12 +375,12 @@ function handleUpload() {
    $job_id = 'pdf_' . bin2hex(random_bytes(16));
    $filename = $job_id . '.pdf';
    $filepath = UPLOAD_DIR . '/' . $filename;
-    
+
    // Move file
    if (!move_uploaded_file($file['tmp_name'], $filepath)) {
        error('Failed to save file');
    }
-    
+
    // Create job metadata
    $job_data = [
        'job_id' => $job_id,
@@ -215,12 +390,12 @@ function handleUpload() {
        'status' => 'uploaded',
        'filepath' => $filepath
    ];
-    
+
    file_put_contents(
        RESULTS_DIR . '/' . $job_id . '.meta.json',
        json_encode($job_data, JSON_PRETTY_PRINT)
    );
-    
+
    success([
        'job_id' => $job_id,
        'filename' => $file['name'],
@@ -229,9 +404,11 @@ function handleUpload() {
 }

 /**
- * Handle PDF accessibility check — push job to Redis queue
+ * Handle PDF accessibility check — send PDF to Cloud Run synchronously
 */
 function handleCheck() {
+    set_time_limit(900); // Allow up to 15 minutes
+
    $job_id = $_POST['job_id'] ?? '';

    if (empty($job_id)) {
@@ -253,32 +430,98 @@ function handleCheck() {
    }

    $job_data = json_decode(file_get_contents($meta_file), true);
-
    $quick_mode = $_POST['quick_mode'] ?? false;

-    // Push job to Redis queue for worker processing
-    try {
-        $redis = getRedis();
-        $payload = json_encode([
-            'job_id' => $job_id,
-            'pdf_path' => $job_data['filepath'],
-            'original_filename' => $job_data['original_filename'] ?? '',
-            'options' => [
-                'quick_mode' => (bool)$quick_mode,
-            ],
-            'queued_at' => time()
-        ]);
-        $redis->lPush(REDIS_QUEUE, $payload);
+    // Update meta to processing
+    $job_data['status'] = 'processing';
+    $job_data['started_at'] = date('Y-m-d H:i:s');
+    file_put_contents($meta_file, json_encode($job_data, JSON_PRETTY_PRINT));

-        // Set initial status in Redis
-        $redis->setex(REDIS_STATUS_PREFIX . $job_id, 86400, json_encode([
-            'status' => 'queued',
-            'progress' => 0,
-            'message' => 'Waiting in queue',
-            'updated_at' => time()
-        ]));
-    } catch (Exception $e) {
-        // Fallback to direct exec if Redis is unavailable (local dev without Docker)
+    // If Cloud Run URL is configured, send to Cloud Run
+    if (!empty(CLOUD_RUN_URL)) {
+        try {
+            $token = getCloudRunToken();
+            $pdf_path = $job_data['filepath'];
+
+            if (!file_exists($pdf_path)) {
+                error('PDF file not found on server');
+            }
+
+            // Build multipart POST to Cloud Run
+            $ch = curl_init(CLOUD_RUN_URL . '/check');
+            $postFields = [
+                'pdf' => new CURLFile($pdf_path, 'application/pdf', basename($pdf_path)),
+                'job_id' => $job_id,
+                'quick_mode' => filter_var($quick_mode, FILTER_VALIDATE_BOOLEAN) ? 'true' : 'false',
+                'original_filename' => $job_data['original_filename'] ?? '',
+            ];
+
+            curl_setopt_array($ch, [
+                CURLOPT_POST => true,
+                CURLOPT_POSTFIELDS => $postFields,
+                CURLOPT_RETURNTRANSFER => true,
+                CURLOPT_TIMEOUT => CLOUD_RUN_TIMEOUT,
+                CURLOPT_HTTPHEADER => [
+                    'Authorization: Bearer ' . $token,
+                ],
+            ]);
+
+            $response = curl_exec($ch);
+            $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
+            $curlError = curl_error($ch);
+            curl_close($ch);
+
+            if ($curlError) {
+                throw new Exception("Cloud Run request failed: $curlError");
+            }
+
+            if ($httpCode !== 200) {
+                $errorBody = json_decode($response, true);
+                $errorMsg = $errorBody['error'] ?? "HTTP $httpCode";
+                throw new Exception("Cloud Run returned error: $errorMsg");
+            }
+
+            $result = json_decode($response, true);
+            if (!$result || !isset($result['success'])) {
+                throw new Exception("Invalid response from Cloud Run");
+            }
+
+            if (!$result['success']) {
+                throw new Exception($result['error'] ?? 'Unknown Cloud Run error');
+            }
+
+            $checkResult = $result['data'];
+
+            // Write result JSON to disk
+            $result_file = RESULTS_DIR . '/' . $job_id . '.result.json';
+            file_put_contents($result_file, json_encode($checkResult, JSON_PRETTY_PRINT));
+
+            // Update meta
+            $job_data['status'] = 'completed';
+            $job_data['completed_at'] = date('Y-m-d H:i:s');
+            file_put_contents($meta_file, json_encode($job_data, JSON_PRETTY_PRINT));
+
+            // Update PostgreSQL
+            updateJobInDatabase($job_id, $job_data['original_filename'] ?? '', 'completed', $checkResult);
+
+            success([
+                'job_id' => $job_id,
+                'status' => 'completed',
+                'message' => 'Check completed'
+            ]);
+
+        } catch (Exception $e) {
+            // Mark as failed
+            $job_data['status'] = 'failed';
+            $job_data['error'] = $e->getMessage();
+            file_put_contents($meta_file, json_encode($job_data, JSON_PRETTY_PRINT));
+
+            updateJobInDatabase($job_id, $job_data['original_filename'] ?? '', 'failed');
+
+            error('Processing failed: ' . $e->getMessage());
+        }
+    } else {
+        // Fallback to local exec (development without Cloud Run)
        $pdf_path = $job_data['filepath'];
        $output_path = RESULTS_DIR . '/' . $job_id . '.result.json';
        $venv_python = __DIR__ . '/venv/bin/python3';
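Review note: the Cloud Run branch of `handleCheck()` above accepts a response only if it is HTTP 200, valid JSON, has a `success` key, and that key is truthy. The same decision ladder is sketched below in Python so the failure cases are easy to see side by side. `parse_check_response` and `CloudRunError` are hypothetical names, and the `{"success": ..., "data": ..., "error": ...}` envelope is inferred from the PHP code rather than from the Cloud Run service source.

```python
import json


class CloudRunError(Exception):
    """Raised when the processing service response cannot be used."""


def parse_check_response(http_code, body):
    """Validate a /check response and return its 'data' payload."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = None  # Not JSON; handled by the checks below.

    if http_code != 200:
        message = payload.get("error") if isinstance(payload, dict) else None
        if not message:
            message = f"HTTP {http_code}"
        raise CloudRunError(f"Cloud Run returned error: {message}")

    if not isinstance(payload, dict) or "success" not in payload:
        raise CloudRunError("Invalid response from Cloud Run")

    if not payload["success"]:
        raise CloudRunError(payload.get("error") or "Unknown Cloud Run error")

    return payload.get("data")


if __name__ == "__main__":
    ok = '{"success": true, "data": {"grade": "A"}}'
    print(parse_check_response(200, ok))
    try:
        parse_check_response(500, '{"error": "out of memory"}')
    except CloudRunError as exc:
        print(exc)
```

In the commit, any of these failures marks the job as `failed` in both the meta file and PostgreSQL before the error is returned to the client.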
@@ -312,22 +555,17 @@ function handleCheck() {
        $error_log = RESULTS_DIR . '/' . $job_id . '.error.log';
        $cmd .= ' > ' . escapeshellarg($error_log) . ' 2>&1 &';
        exec($cmd, $output, $return_code);
+
+        success([
+            'job_id' => $job_id,
+            'status' => 'processing',
+            'message' => 'Check started (local mode)'
+        ]);
    }
-
-    // Update meta file
-    $job_data['status'] = 'queued';
-    $job_data['started_at'] = date('Y-m-d H:i:s');
-    file_put_contents($meta_file, json_encode($job_data, JSON_PRETTY_PRINT));
-
-    success([
-        'job_id' => $job_id,
-        'status' => 'queued',
-        'message' => 'Check queued for processing'
-    ]);
 }

 /**
- * Check job status — reads from Redis (real-time) with file fallback
+ * Check job status — pure file-based
 */
 function handleStatus() {
    $job_id = $_GET['job_id'] ?? '';
@@ -347,30 +585,15 @@ function handleStatus() {

    $job_data = json_decode(file_get_contents($meta_file), true);

-    // Try Redis first for real-time progress
-    try {
-        $redis = getRedis();
-        $redis_status = $redis->get(REDIS_STATUS_PREFIX . $job_id);
-        if ($redis_status) {
-            $status_data = json_decode($redis_status, true);
-            $job_data['status'] = $status_data['status'];
-            $job_data['progress'] = $status_data['progress'] ?? 0;
-            $job_data['status_message'] = $status_data['message'] ?? '';
-        }
-    } catch (Exception $e) {
-        // Redis unavailable — fall through to file-based check
-    }
-
-    // File-based fallback: check if result exists
+    // Check if result file exists (definitive completion signal)
    if (file_exists($result_file)) {
        $job_data['status'] = 'completed';
-        $job_data['completed_at'] = date('Y-m-d H:i:s', filemtime($result_file));
-        file_put_contents($meta_file, json_encode($job_data, JSON_PRETTY_PRINT));
-    } else if (file_exists($error_log) && $job_data['status'] === 'processing') {
+        $job_data['completed_at'] = $job_data['completed_at'] ?? date('Y-m-d H:i:s', filemtime($result_file));
+    } else if (file_exists($error_log) && in_array($job_data['status'], ['processing', 'queued'])) {
        $error_content = file_get_contents($error_log);
        if (!empty($error_content)) {
            $started = strtotime($job_data['started_at'] ?? 'now');
-            if (time() - $started > 300) {
+            if (time() - $started > 900) {
                $job_data['status'] = 'failed';
                $job_data['error'] = 'Process timeout or error';
                $job_data['error_log'] = substr($error_content, -1000);
@@ -391,15 +614,15 @@ function handleResult() {
        error('Job ID required');
    }
    $job_id = sanitizeJobId($job_id);
-    
+
    $result_file = RESULTS_DIR . '/' . $job_id . '.result.json';
-    
+
    if (!file_exists($result_file)) {
        error('Results not found. Check may still be processing.');
    }
-    
+
    $result = json_decode(file_get_contents($result_file), true);
-    
+
    success($result);
 }

@@ -408,26 +631,26 @@ function handleResult() {
 */
 function handleList() {
    $jobs = [];
-    
+
    $files = glob(RESULTS_DIR . '/*.meta.json');
-    
+
    foreach ($files as $file) {
        $job_data = json_decode(file_get_contents($file), true);
-        
+
        // Check if completed
        $result_file = str_replace('.meta.json', '.result.json', $file);
        if (file_exists($result_file)) {
            $job_data['status'] = 'completed';
        }
-        
+
        $jobs[] = $job_data;
    }
-    
+
    // Sort by upload time (newest first)
    usort($jobs, function($a, $b) {
        return strtotime($b['uploaded_at']) - strtotime($a['uploaded_at']);
    });
-    
+
    success(['jobs' => $jobs]);
 }

@@ -441,20 +664,20 @@ function handleDelete() {
        error('Job ID required');
    }
    $job_id = sanitizeJobId($job_id);
-    
+
    $meta_file = RESULTS_DIR . '/' . $job_id . '.meta.json';
-    
+
    if (!file_exists($meta_file)) {
        error('Job not found');
    }
-    
+
    $job_data = json_decode(file_get_contents($meta_file), true);
-    
+
    // Delete files
    @unlink($job_data['filepath']);
    @unlink($meta_file);
    @unlink(RESULTS_DIR . '/' . $job_id . '.result.json');
-    
+
    success(['message' => 'Job deleted']);
 }

@@ -484,6 +707,7 @@ function handleDebug() {
        'meta_exists' => file_exists($meta_file),
        'result_exists' => file_exists($result_file),
        'error_log_exists' => file_exists($error_log),
+        'cloud_run_url' => CLOUD_RUN_URL ?: '(not configured — local mode)',
        'files' => []
    ];

@@ -508,7 +732,7 @@ function handleDebug() {
 }

 /**
- * Serve page images
+ * Serve page images — redirect to GCS URL or serve local file
 */
 function handleImage() {
    $job_id = $_GET['job_id'] ?? '';
@@ -518,10 +742,28 @@ function handleImage() {
        error('Job ID and page number required');
    }
    $job_id = sanitizeJobId($job_id);
+    $page_num = intval($page_num);

-    // Find the image file
+    // Check result JSON for GCS URLs
+    $result_file = RESULTS_DIR . '/' . $job_id . '.result.json';
+    if (file_exists($result_file)) {
+        $result = json_decode(file_get_contents($result_file), true);
+        $page_images = $result['page_images'] ?? [];
+
+        // Check if the page image value is a URL (GCS)
+        $image_value = $page_images[$page_num] ?? $page_images[strval($page_num)] ?? null;
+        if ($image_value && (strpos($image_value, 'http://') === 0 || strpos($image_value, 'https://') === 0)) {
+            // Redirect to GCS URL
+            header('HTTP/1.1 302 Found');
+            header('Location: ' . $image_value);
+            header('Cache-Control: public, max-age=86400');
+            exit;
+        }
+    }
+
+    // Fallback: serve local image file
    $images_dir = RESULTS_DIR . '/' . $job_id . '.result_images';
-    $image_file = $images_dir . '/page_' . intval($page_num) . '.png';
+    $image_file = $images_dir . '/page_' . $page_num . '.png';

    if (!file_exists($image_file)) {
        http_response_code(404);
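Review note: `handleImage()` above redirects only when the stored `page_images` entry for the page is an absolute `http://` or `https://` URL, and otherwise falls back to the local PNG. A small Python sketch of that lookup follows. `resolve_page_image` is a hypothetical helper, and the assumption that `page_images` may arrive either as a JSON object keyed by page number or as a JSON array is mine, since the result schema is not shown in this diff.

```python
def resolve_page_image(page_images, page_num):
    """Return a remote URL for the page, or None to fall back to a local file."""
    value = None
    if isinstance(page_images, dict):
        # Decoded JSON objects have string keys, but accept int keys as well.
        value = page_images.get(page_num, page_images.get(str(page_num)))
    elif isinstance(page_images, list) and 0 <= page_num < len(page_images):
        value = page_images[page_num]

    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return value
    return None


if __name__ == "__main__":
    images = {
        "1": "https://storage.googleapis.com/example-bucket/job/page_1.png",
        "2": "results/job.result_images/page_2.png",
    }
    print(resolve_page_image(images, 1))  # remote URL, would be a 302 redirect
    print(resolve_page_image(images, 2))  # None, serve the local file instead
```

The bucket name in the usage lines is a placeholder. Only the URL-prefix test and the int/string key fallback mirror the PHP code.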
@@ -657,7 +899,6 @@ function handleStats() {
        'completed' => 0,
        'failed' => 0,
        'processing' => 0,
-        'queue_length' => 0
    ];

    // Count jobs from meta files
@@ -675,14 +916,6 @@ function handleStats() {
        }
    }

-    // Get queue length from Redis
-    try {
-        $redis = getRedis();
-        $stats['queue_length'] = $redis->lLen(REDIS_QUEUE);
-    } catch (Exception $e) {
-        // Redis unavailable
-    }
-
    success($stats);
 }

--- a/cleanup.py
+++ b/cleanup.py
@@ -2,8 +2,9 @@
 """
 PDF Accessibility Checker — File Cleanup

-Deletes uploaded PDFs, result JSON files, result images, and error logs
-older than RETENTION_HOURS (default 24h).
+Deletes uploaded PDFs, result JSON files, job metadata files, error logs, and
+rate limit files older than RETENTION_HOURS (default 24h). Page images are
+stored on GCS with a 7-day lifecycle policy.

 Usage:
  python cleanup.py              # dry-run (show what would be deleted)
@ -28,6 +29,7 @@ logger = logging.getLogger('cleanup')

 UPLOADS_DIR = Path(os.getenv('UPLOADS_DIR', '/opt/pdf-accessibility/uploads'))
 RESULTS_DIR = Path(os.getenv('RESULTS_DIR', '/opt/pdf-accessibility/results'))
+RATE_LIMIT_DIR = Path(os.getenv('RATE_LIMIT_DIR', '/opt/pdf-accessibility/rate_limits'))
 RETENTION_HOURS = int(os.getenv('RETENTION_HOURS', '24'))


@ -109,8 +111,13 @@ def main():
    total_deleted += d
    total_freed += f

-    # Clean results (JSON, error logs, image directories)
-    d, f = cleanup_directory(RESULTS_DIR, ['*.result.json', '*.error.log', '*.result_images'], dry_run)
+    # Clean results (JSON, error logs, job meta files; page images are on GCS with 7-day lifecycle)
+    d, f = cleanup_directory(RESULTS_DIR, ['*.result.json', '*.error.log', '*.meta.json'], dry_run)
+    total_deleted += d
+    total_freed += f
+
+    # Clean rate limit files
+    d, f = cleanup_directory(RATE_LIMIT_DIR, ['*.json'], dry_run)
    total_deleted += d
    total_freed += f
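
The body of `cleanup_directory` is not part of this diff. The sketch below shows one way an age-based, pattern-driven cleanup with a dry-run mode could work, under the assumption that it returns a `(files_deleted, bytes_freed)` pair as the call sites above suggest. The name `cleanup_directory_sketch` and the `retention_hours` parameter are hypothetical.

```python
import time
from pathlib import Path
from typing import Iterable, Tuple


def cleanup_directory_sketch(directory: Path, patterns: Iterable[str],
                             dry_run: bool, retention_hours: int = 24) -> Tuple[int, int]:
    """Delete files matching patterns that are older than retention_hours.

    Returns (files_deleted, bytes_freed). In dry-run mode nothing is removed,
    but the counts report what would be.
    """
    if not directory.is_dir():
        return 0, 0
    cutoff = time.time() - retention_hours * 3600
    deleted = freed = 0
    for pattern in patterns:
        for path in directory.glob(pattern):
            if not path.is_file():
                continue
            stat = path.stat()
            if stat.st_mtime >= cutoff:
                continue  # still within the retention window
            if not dry_run:
                path.unlink()
            deleted += 1
            freed += stat.st_size
    return deleted, freed
```

Returning early for a missing directory matters for `RATE_LIMIT_DIR`, which may not exist until the first rate-limited request creates it.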

--- a/cloudbuild.yaml
+++ b/cloudbuild.yaml
@ -0,0 +1,14 @@
+steps:
+  - name: 'gcr.io/cloud-builders/docker'
+    args:
+      - 'build'
+      - '-t'
+      - 'us-central1-docker.pkg.dev/optical-414516/pdf-accessibility/checker:latest'
+      - '-f'
+      - 'Dockerfile.cloudrun'
+      - '.'
+
+images:
+  - 'us-central1-docker.pkg.dev/optical-414516/pdf-accessibility/checker:latest'
+
+timeout: '600s'
--- a/cloudrun_service.py
+++ b/cloudrun_service.py
@ -0,0 +1,136 @@
+#!/usr/bin/env python3
+"""
+PDF Accessibility Checker — Cloud Run HTTP Service
+
+Flask app wrapping EnterprisePDFChecker for serverless execution.
+Receives PDF via multipart POST, runs checks, uploads page images to GCS,
+returns full result JSON.
+"""
+
+import os
+import json
+import tempfile
+import logging
+from pathlib import Path
+
+from flask import Flask, request, jsonify
+from google.cloud import storage
+
+from enterprise_pdf_checker import EnterprisePDFChecker
+
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s [cloudrun] %(levelname)s: %(message)s'
+)
+logger = logging.getLogger('cloudrun')
+
+app = Flask(__name__)
+
+GCS_BUCKET_NAME = os.getenv('GCS_BUCKET_NAME', 'optical-pdf-images')
+
+
+def upload_images_to_gcs(images_dir: Path, job_id: str) -> dict:
+    """Upload page images to GCS and return {page_num: public_url} mapping."""
+    client = storage.Client()
+    bucket = client.bucket(GCS_BUCKET_NAME)
+    page_images = {}
+
+    for image_file in sorted(images_dir.glob('page_*.png')):
+        # Extract page number from filename (page_1.png -> 1)
+        page_num = int(image_file.stem.split('_')[1])
+        blob_name = f"{job_id}/{image_file.name}"
+        blob = bucket.blob(blob_name)
+        blob.upload_from_filename(str(image_file), content_type='image/png')
+        # Bucket has uniform bucket-level access with allUsers objectViewer,
+        # so objects are public by default — no need for blob.make_public()
+        public_url = f"https://storage.googleapis.com/{GCS_BUCKET_NAME}/{blob_name}"
+        page_images[page_num] = public_url
+        logger.info("Uploaded %s -> %s", image_file.name, public_url)
+
+    return page_images
+
+
+@app.route('/check', methods=['POST'])
+def check_pdf():
+    """Accept multipart PDF upload, run accessibility checks, return results."""
+    pdf_file = request.files.get('pdf')
+    if not pdf_file:
+        return jsonify({'success': False, 'error': 'No PDF file provided'}), 400
+
+    job_id = request.form.get('job_id', 'unknown')
+    quick_mode = request.form.get('quick_mode', 'false').lower() in ('true', '1', 'yes')
+    original_filename = request.form.get('original_filename', pdf_file.filename or 'document.pdf')
+
+    logger.info("Received job %s: %s (quick=%s)", job_id, original_filename, quick_mode)
+
+    tmp_pdf = None
+    images_dir = None
+
+    try:
+        # Save uploaded PDF to temp file
+        tmp_pdf = tempfile.NamedTemporaryFile(suffix='.pdf', delete=False)
+        pdf_file.save(tmp_pdf)
+        tmp_pdf.close()
+
+        # Run accessibility checks
+        config = {
+            'anthropic_api_key': os.getenv('ANTHROPIC_API_KEY'),
+            'google_api_key': os.getenv('GOOGLE_API_KEY'),
+        }
+
+        checker = EnterprisePDFChecker(tmp_pdf.name, config, quick_mode=quick_mode)
+        checker.check_all()
+
+        # Generate page images to a temp directory
+        images_dir = tempfile.mkdtemp(prefix='pdf_images_')
+        images_path = Path(images_dir)
+        checker._generate_page_images(images_path)
+
+        # Get results before uploading images (page_images has local filenames)
+        results = checker.to_dict()
+
+        # Upload images to GCS and replace local filenames with public URLs
+        if checker.page_images:
+            gcs_urls = upload_images_to_gcs(images_path, job_id)
+            results['page_images'] = gcs_urls
+
+        # Add grade based on score
+        score = results.get('accessibility_score', 0)
+        if score >= 90:
+            results['grade'] = 'A'
+        elif score >= 80:
+            results['grade'] = 'B'
+        elif score >= 70:
+            results['grade'] = 'C'
+        elif score >= 60:
+            results['grade'] = 'D'
+        else:
+            results['grade'] = 'F'
+
+        logger.info("Job %s completed: score=%s grade=%s issues=%s",
+                     job_id, score,
+                     results['grade'], results.get('total_issues'))
+
+        return jsonify({'success': True, 'data': results})
+
+    except Exception as e:
+        logger.error("Job %s failed: %s", job_id, str(e), exc_info=True)
+        return jsonify({'success': False, 'error': str(e)}), 500
+
+    finally:
+        # Clean up temp files
+        if tmp_pdf and os.path.exists(tmp_pdf.name):
+            os.unlink(tmp_pdf.name)
+        if images_dir and os.path.exists(images_dir):
+            import shutil
+            shutil.rmtree(images_dir, ignore_errors=True)
+
+
+@app.route('/health', methods=['GET'])
+def health():
+    return jsonify({'status': 'ok'})
+
+
+if __name__ == '__main__':
+    port = int(os.getenv('PORT', 8080))
+    app.run(host='0.0.0.0', port=port, debug=False)
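
The grade thresholds in check_pdf() could be factored into a small helper, which would make the boundaries easy to unit-test. A minimal sketch, assuming scores are on a 0-100 scale; `score_to_grade` is a hypothetical name that does not appear in the commit.

```python
def score_to_grade(score: float) -> str:
    """Map a 0-100 accessibility score to a letter grade (hypothetical helper)."""
    # Thresholds copied from the if/elif chain in check_pdf(), highest first.
    for threshold, grade in ((90, 'A'), (80, 'B'), (70, 'C'), (60, 'D')):
        if score >= threshold:
            return grade
    return 'F'
```

With this in place, the handler would reduce to `results['grade'] = score_to_grade(score)`.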
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@ -1,50 +1,9 @@
-# Production Docker Compose — worker + Redis + PostgreSQL
+# Production Docker Compose — PostgreSQL only
 # Apache/Nginx on host serves PHP + frontend files natively
-# Redis on 1220, PostgreSQL on 1221 to avoid host conflicts
+# PDF processing handled by Cloud Run (no local worker)
+# PostgreSQL on 1221 to avoid host conflicts

 services:
-  worker:
-    build:
-      context: .
-      dockerfile: Dockerfile.worker
-    volumes:
-      - ${WEB_DIR:-/opt/pdf-accessibility}/uploads:${WEB_DIR:-/opt/pdf-accessibility}/uploads
-      - ${WEB_DIR:-/opt/pdf-accessibility}/results:${WEB_DIR:-/opt/pdf-accessibility}/results
-      - ./logs:/app/logs
-    depends_on:
-      redis:
-        condition: service_healthy
-      postgres:
-        condition: service_healthy
-    environment:
-      - REDIS_HOST=redis
-      - REDIS_PORT=6379
-      - DB_HOST=postgres
-      - DB_PORT=5432
-      - DB_NAME=${DB_NAME:-pdf_checker}
-      - DB_USER=${DB_USER:-pdf_checker}
-      - DB_PASSWORD=${DB_PASSWORD:-dev_password}
-      - RESULTS_DIR=${WEB_DIR:-/opt/pdf-accessibility}/results
-      - UPLOADS_DIR=${WEB_DIR:-/opt/pdf-accessibility}/uploads
-      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
-      - GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
-    deploy:
-      replicas: ${WORKER_COUNT:-2}
-    restart: unless-stopped
-
-  redis:
-    image: redis:7-alpine
-    ports:
-      - "127.0.0.1:1220:6379"
-    volumes:
-      - redis-data:/data
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 10s
-      timeout: 3s
-      retries: 3
-    restart: unless-stopped
-
  postgres:
    image: postgres:16-alpine
    ports:
@ -64,5 +23,4 @@ services:
    restart: unless-stopped

 volumes:
-  redis-data:
  pg-data:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -9,42 +9,11 @@ services:
      - pdf-uploads:/app/uploads
      - pdf-results:/app/results
    depends_on:
-      redis:
-        condition: service_healthy
      postgres:
        condition: service_healthy
    env_file: .env
    restart: unless-stopped

-  worker:
-    build:
-      context: .
-      dockerfile: Dockerfile.worker
-    volumes:
-      - pdf-uploads:/app/uploads
-      - pdf-results:/app/results
-      - pdf-logs:/app/logs
-    depends_on:
-      redis:
-        condition: service_healthy
-      postgres:
-        condition: service_healthy
-    env_file: .env
-    deploy:
-      replicas: ${WORKER_COUNT:-2}
-    restart: unless-stopped
-
-  redis:
-    image: redis:7-alpine
-    volumes:
-      - redis-data:/data
-    healthcheck:
-      test: ["CMD", "redis-cli", "ping"]
-      interval: 10s
-      timeout: 3s
-      retries: 3
-    restart: unless-stopped
-
  postgres:
    image: postgres:16-alpine
    volumes:
@ -64,6 +33,4 @@ services:
 volumes:
  pdf-uploads:
  pdf-results:
-  pdf-logs:
-  redis-data:
  pg-data:
--- a/docker-entrypoint-web.sh
+++ b/docker-entrypoint-web.sh
@ -5,6 +5,9 @@ set -e
 # By default PHP-FPM clears the environment; this disables that behavior
 echo 'clear_env = no' >> /usr/local/etc/php-fpm.d/www.conf

+# 15-minute timeout for Cloud Run PDF processing
+echo 'request_terminate_timeout = 900' >> /usr/local/etc/php-fpm.d/www.conf
+
 # Start PHP-FPM in background
 php-fpm -D

--- a/js/page-viewer.js
+++ b/js/page-viewer.js
@ -47,7 +47,13 @@ function loadVisualPage(pageNum) {

    const img = document.getElementById('pageImage');
    img.onload = () => drawMarkers(pageNum);
-    img.src = `api.php?action=image&job_id=${currentJobId}&page=${pageNum}`;
+    // Use GCS URL directly if available, otherwise fall back to api.php
+    const imageUrl = currentPageData?.page_images?.[pageNum];
+    if (imageUrl && (imageUrl.startsWith('http://') || imageUrl.startsWith('https://'))) {
+        img.src = imageUrl;
+    } else {
+        img.src = `api.php?action=image&job_id=${currentJobId}&page=${pageNum}`;
+    }
 }

 function drawMarkers(pageNum) {
--- a/js/upload.js
+++ b/js/upload.js
@ -78,13 +78,21 @@ async function beginCheck() {
    if (quickMode) addLog('Quick mode enabled — skipping expensive checks', 'info');

    try {
-        updateProgress(30, 'Starting analysis...');
+        updateProgress(30, 'Analyzing PDF (this may take a few minutes)...');
        const result = await startCheck(currentJobId, quickMode);

        if (result.success) {
-            updateProgress(35, 'Analysis queued');
-            addLog('Job queued for processing', 'success');
-            pollJobStatus();
+            if (result.data && result.data.status === 'completed') {
+                // Synchronous Cloud Run response — results are ready
+                updateProgress(98, 'Loading results...');
+                addLog('Analysis complete!', 'success');
+                loadResults();
+            } else {
+                // Async/local mode fallback — poll for status
+                updateProgress(35, 'Analysis started');
+                addLog('Job processing...', 'success');
+                pollJobStatus();
+            }
        } else {
            addLog('Check failed: ' + result.error, 'error');
            alert('Check failed: ' + result.error);
@ -142,9 +150,9 @@ async function pollJobStatus() {
                    if (data.error_log) addLog('Error: ' + data.error_log.substring(0, 500), 'error');
                    document.getElementById('progressContainer').style.display = 'none';
                    alert('Analysis failed. Check the error log for details.');
-                } else if (pollCount > 150) {
+                } else if (pollCount > 450) {
                    clearInterval(pollInterval);
-                    addLog('Analysis timed out after 5 minutes', 'error');
+                    addLog('Analysis timed out after 15 minutes', 'error');
                    addLog('Try using Quick Mode for faster results', 'info');
                    document.getElementById('progressContainer').style.display = 'none';
                }
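
The new poll limit matches the 900s server-side timeouts only if the client polls every 2 seconds. That interval is not visible in this hunk. It is inferred here from the old values (150 polls described as 5 minutes) and should be treated as an assumption. The arithmetic, as a small sketch with hypothetical names:

```python
def poll_budget_seconds(max_polls: int, interval_seconds: float) -> float:
    """Total time the client keeps polling before it gives up."""
    return max_polls * interval_seconds


# Assumed 2 s interval, inferred from the old limit: 150 polls ~ 5 minutes.
POLL_INTERVAL_SECONDS = 2

old_budget = poll_budget_seconds(150, POLL_INTERVAL_SECONDS)  # previous limit
new_budget = poll_budget_seconds(450, POLL_INTERVAL_SECONDS)  # limit after this change
```

If the interval in pollJobStatus() differs from 2 seconds, the 450 limit would need adjusting to stay aligned with the nginx, PHP-FPM, and Cloud Run timeouts.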
--- a/nginx.conf
+++ b/nginx.conf
@ -17,6 +17,10 @@ server {
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
+
+        # 15-minute timeout for Cloud Run PDF processing
+        fastcgi_read_timeout 900s;
+        fastcgi_send_timeout 900s;
    }

    # Serve page images from results
--- a/requirements-cloudrun.txt
+++ b/requirements-cloudrun.txt
@ -0,0 +1,33 @@
+# Cloud Run PDF Accessibility Checker - Python Dependencies
+
+# Core PDF processing
+pypdf>=4.0.0
+pdfplumber>=0.11.0
+
+# Image processing
+Pillow>=10.0.0
+pdf2image>=1.16.0
+
+# OCR
+pytesseract>=0.3.10
+
+# Scientific computing
+numpy>=1.24.0
+
+# NLP and readability
+textblob>=0.17.1
+
+# Google Cloud APIs
+google-cloud-vision>=3.4.0
+google-cloud-documentai>=2.20.0
+
+# Anthropic Claude API
+anthropic>=0.18.0
+
+# Additional utilities
+python-dotenv>=1.0.0
+
+# Cloud Run specific
+flask>=3.0.0
+gunicorn>=21.2.0
+google-cloud-storage>=2.14.0