Replace the Redis queue + Python worker daemon with a synchronous HTTP call to a Cloud Run service, eliminating Redis and simplifying the infrastructure from 4 containers (web, worker, redis, postgres) to just web + postgres (with Cloud Run handling processing).

- Add cloudrun_service.py: Flask app wrapping EnterprisePDFChecker with POST /check and GET /health endpoints, GCS image upload
- Add Dockerfile.cloudrun + requirements-cloudrun.txt for Cloud Run image
- Add cloudbuild.yaml for Cloud Build with custom Dockerfile
- Rewrite api.php: remove all Redis code, add Cloud Run OIDC auth (getCloudRunToken), synchronous processing in handleCheck(), file-based rate limiting, GCS redirect in handleImage(), DB helper updateJobInDatabase()
- Update js/upload.js: handle synchronous completed response from Cloud Run, increase poll timeout to 15 minutes
- Update js/page-viewer.js: use GCS URLs directly for page images
- Simplify docker-compose.yml and docker-compose.prod.yml: remove worker and redis services
- Remove PHP Redis extension from Dockerfile.web
- Set 900s timeouts across nginx, PHP-FPM, gunicorn, curl, and Cloud Run
- Update cleanup.py: remove result_images pattern (now on GCS), add rate_limits cleanup
- Update .env.example: replace Redis vars with Cloud Run/GCS config

Cloud Run service deployed to: https://pdf-checker-bcb6ipdqka-uc.a.run.app
GCS bucket: gs://optical-pdf-images (7-day lifecycle, public read)
GCP project: optical-414516

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
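
The commit mentions file-based rate limiting in api.php but the PHP code is not shown here. As an illustration only, here is a minimal fixed-window sketch of the general idea in Python. The function name `allow_request`, the one-JSON-file-per-client layout, and the default limits are all assumptions made for this example, not details taken from api.php.

```python
import hashlib
import json
import os
import tempfile
import time


def allow_request(client_id, state_dir, limit=10, window_s=60, now=None):
    """Return True if client_id has made fewer than `limit` requests
    in the current fixed window of `window_s` seconds.

    Hypothetical helper: names, file layout, and defaults are assumptions.
    """
    now = time.time() if now is None else now
    os.makedirs(state_dir, exist_ok=True)

    # Hash the client id so arbitrary input cannot escape state_dir.
    name = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    path = os.path.join(state_dir, name + ".json")

    try:
        with open(path) as f:
            state = json.load(f)
    except (FileNotFoundError, ValueError):
        # No file yet, or a corrupt one: start a fresh window.
        state = {"window_start": now, "count": 0}

    # Reset the counter once the previous window has expired.
    if now - state["window_start"] >= window_s:
        state = {"window_start": now, "count": 0}

    allowed = state["count"] < limit
    if allowed:
        state["count"] += 1

    # Write to a temp file and rename it so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=state_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)
    return allowed
```

This sketch does not lock the file between read and write, so two concurrent requests from the same client could both be counted as one. A real implementation would likely need an exclusive file lock around the read-modify-write step. Stale state files also accumulate on disk, which is presumably why cleanup.py gains a rate_limits cleanup in this commit.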
FROM python:3.11-slim

# Install system dependencies for PDF processing
RUN apt-get update && apt-get install -y --no-install-recommends \
    tesseract-ocr \
    tesseract-ocr-eng \
    poppler-utils \
    ghostscript \
    libgl1 \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies
COPY requirements-cloudrun.txt .
RUN pip install --no-cache-dir -r requirements-cloudrun.txt

# Copy application code (no worker, redis_queue, or db_manager)
COPY cloudrun_service.py .
COPY enterprise_pdf_checker.py .
COPY pdf_remediation.py .
COPY logger_config.py .
COPY retry_helper.py .

# Cloud Run sets $PORT; gunicorn binds to it
# --workers 1 --threads 1: Cloud Run concurrency=1, one request at a time
# --timeout 900: allow up to 15 minutes for large PDFs
CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 --timeout 900 cloudrun_service:app
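
The CMD line above tells gunicorn to load a WSGI callable named `app` from the module `cloudrun_service`. The real service is described in the commit as a Flask app wrapping EnterprisePDFChecker. The stdlib-only sketch below is not that app. It only shows the shape gunicorn expects, with the two routes the commit names (GET /health and POST /check). The JSON response fields are assumptions, and the checker call is replaced by a placeholder.

```python
import json


def app(environ, start_response):
    """Minimal WSGI callable that gunicorn can load as `module:app`.

    Illustrative stand-in only: the response fields are hypothetical.
    """
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/health":
        status, payload = "200 OK", {"status": "ok"}
    elif method == "POST" and path == "/check":
        # Read exactly the number of bytes the client declared.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        pdf_bytes = environ["wsgi.input"].read(length)
        # Placeholder: the real service would run EnterprisePDFChecker here.
        status = "200 OK"
        payload = {"status": "completed", "bytes_received": len(pdf_bytes)}
    else:
        status, payload = "404 Not Found", {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

Saved as a module, a callable like this could be served with the same gunicorn flags as in the CMD line. With `--workers 1 --threads 1`, gunicorn handles one request at a time, which matches the Cloud Run concurrency=1 setting noted in the Dockerfile comments.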