Replace the Redis queue + Python worker daemon with a synchronous HTTP call to a Cloud Run service, eliminating Redis and simplifying the infrastructure from 4 containers (web, worker, redis, postgres) to just web + postgres (with Cloud Run handling processing).

- Add cloudrun_service.py: Flask app wrapping EnterprisePDFChecker with POST /check and GET /health endpoints, GCS image upload
- Add Dockerfile.cloudrun + requirements-cloudrun.txt for Cloud Run image
- Add cloudbuild.yaml for Cloud Build with custom Dockerfile
- Rewrite api.php: remove all Redis code, add Cloud Run OIDC auth (getCloudRunToken), synchronous processing in handleCheck(), file-based rate limiting, GCS redirect in handleImage(), DB helper updateJobInDatabase()
- Update js/upload.js: handle synchronous completed response from Cloud Run, increase poll timeout to 15 minutes
- Update js/page-viewer.js: use GCS URLs directly for page images
- Simplify docker-compose.yml and docker-compose.prod.yml: remove worker and redis services
- Remove PHP Redis extension from Dockerfile.web
- Set 900s timeouts across nginx, PHP-FPM, gunicorn, curl, and Cloud Run
- Update cleanup.py: remove result_images pattern (now on GCS), add rate_limits cleanup
- Update .env.example: replace Redis vars with Cloud Run/GCS config

Cloud Run service deployed to: https://pdf-checker-bcb6ipdqka-uc.a.run.app
GCS bucket: gs://optical-pdf-images (7-day lifecycle, public read)
GCP project: optical-414516

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI-powered PDF accessibility checker that validates documents against WCAG 2.1 Level A & AA standards. Combines traditional PDF analysis (pypdf, pdfplumber) with AI models (Anthropic Claude, Google Cloud Vision) for ~95% automated WCAG coverage. Branded for "Oliver" (Montserrat font, black/#FFC407 palette).

## Commands

### Testing

```bash
source venv/bin/activate
pytest tests/ -v                          # Run all tests (31 tests)
pytest tests/ --cov=. --cov-report=html   # With coverage report
pytest tests/test_checker.py -v           # Single test file
pytest tests/ -m "not integration"        # Skip integration tests
```

### Running Locally

```bash
source venv/bin/activate
php -S localhost:8000   # Start PHP dev server
```

### Docker

```bash
docker-compose up                                  # Development stack
docker-compose -f docker-compose.prod.yml up -d    # Production stack
docker-compose exec worker pytest tests/ -v        # Tests in container
```

### CLI Usage

```bash
python enterprise_pdf_checker.py document.pdf --output report.json   # Full check
python enterprise_pdf_checker.py document.pdf --quick                # Skip AI checks
python pdf_remediation.py document.pdf --output fixed.pdf --all      # Auto-remediate
```

## Architecture

### Three Interfaces

- **Web UI** (`index.html` + `js/` + `css/`) — vanilla JS, drag-drop upload, visual inspector
- **REST API** (`api.php`) — PHP endpoints: upload, check, status, result, remediate, download
- **CLI** (`enterprise_pdf_checker.py`) — direct Python execution

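A script can drive the REST API directly instead of going through the Web UI. The hypothetical Python client below shows the polling half of that workflow. Only `api.php`, `action=status`, and the `X-API-Key` header come from this file; the `job_id` query parameter, the `status` response field with `completed`/`failed` values, and the 900-second default timeout are assumptions. The HTTP fetch is passed in as a callable so the polling loop can be exercised without a server.

```python
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request
from typing import Callable

# Assumed terminal status values; check api.php for the real ones.
TERMINAL_STATUSES = {"completed", "failed"}


def status_url(base_url: str, job_id: str) -> str:
    """Build the status URL. `action=status` is documented; `job_id` is assumed."""
    query = urllib.parse.urlencode({"action": "status", "job_id": job_id})
    return f"{base_url.rstrip('/')}/api.php?{query}"


def fetch_json(url: str, api_key: str) -> dict:
    """GET a JSON document, sending the key in the X-API-Key header."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_status(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call `fetch` until it reports a terminal status or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        payload = fetch()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)
```

Against the dev server this might be called as `poll_status(lambda: fetch_json(status_url("http://localhost:8000", job_id), api_key))`. Injecting `sleep` and `clock` keeps the timeout logic deterministic in tests.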
### Request Flow (Docker/Production)

1. `api.php` receives the upload, authenticates the request via `auth.php`, and saves the file to `uploads/`
2. The job is pushed to the Redis queue (`pdf:queue`) and tracked in PostgreSQL
3. The `worker.py` daemon pops jobs and runs `EnterprisePDFChecker.check_all()`
4. Results are written to `results/{job_id}.result.json` and the DB is updated
5. The client polls `api.php?action=status`, then fetches the results

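The queue hand-off in steps 2 to 4 can be sketched in-process. This is not the code in `worker.py` or `redis_queue.py`: a `deque` stands in for the Redis list `pdf:queue`, a dict stands in for the `pdf:status:*` keys, the status strings are assumed, and `run_check` is a hypothetical placeholder for `EnterprisePDFChecker.check_all()`. The PostgreSQL update is omitted.

```python
from __future__ import annotations

import json
from collections import deque
from pathlib import Path

# In-memory stand-ins (assumptions) for the Redis structures named above.
job_queue: deque[str] = deque()   # plays the role of the list "pdf:queue"
job_status: dict[str, str] = {}   # plays the role of "pdf:status:<job_id>"


def enqueue(job_id: str) -> None:
    """Step 2: queue the job and record its initial status."""
    job_queue.append(job_id)
    job_status[job_id] = "queued"


def run_check(job_id: str) -> dict:
    """Hypothetical placeholder for EnterprisePDFChecker.check_all()."""
    return {"job_id": job_id, "score": 100}


def work_once(results_dir: Path) -> str | None:
    """Steps 3-4: take the oldest job, run it, write {job_id}.result.json."""
    if not job_queue:
        return None
    job_id = job_queue.popleft()  # FIFO: oldest job first
    job_status[job_id] = "processing"
    try:
        result = run_check(job_id)
    except Exception:
        job_status[job_id] = "failed"
        raise
    (results_dir / f"{job_id}.result.json").write_text(json.dumps(result))
    job_status[job_id] = "completed"
    return job_id
```

A long-running daemon would call `work_once` in a loop, blocking on the queue rather than returning `None`, and would stop between jobs on a shutdown signal, which is the graceful-shutdown behaviour attributed to `worker.py`.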
### Key Source Files

| File | Purpose |
|------|---------|
| `enterprise_pdf_checker.py` | Core engine — 30+ WCAG checks, AI image analysis, scoring |
| `api.php` | REST API — file handling, job queue integration, CORS |
| `auth.php` | Authentication — Bearer/X-API-Key, dev mode localhost bypass |
| `worker.py` | Background daemon — Redis queue consumer, graceful shutdown |
| `db_manager.py` | PostgreSQL ORM — jobs CRUD, audit logging |
| `redis_queue.py` | Redis operations — job queue, status tracking, rate limiting |
| `pdf_remediation.py` | Auto-fix — metadata, tagging, language tags |
| `retry_helper.py` | Exponential backoff for external API calls |
| `report_generator.py` | Result formatting and report generation |
| `logger_config.py` | Structured logging with rotation (10MB max) |
| `cleanup.py` | File retention cleanup (24h for uploads/results) |

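The 24-hour retention rule for `cleanup.py` amounts to deleting files whose age exceeds a cutoff. A minimal sketch, assuming age is judged by modification time and that `uploads/` and `results/` are flat directories; `purge_old_files` and its parameters are hypothetical, not the names used in `cleanup.py`.

```python
from __future__ import annotations

import time
from pathlib import Path

RETENTION_SECONDS = 24 * 60 * 60  # 24h retention for uploads/results


def purge_old_files(directory: Path, max_age: float = RETENTION_SECONDS,
                    now: float | None = None) -> list[Path]:
    """Delete regular files in `directory` older than `max_age` seconds."""
    now = time.time() if now is None else now
    removed: list[Path] = []
    if not directory.is_dir():
        return removed
    for path in directory.iterdir():
        try:
            # Only plain files are considered; subdirectories are left alone.
            if path.is_file() and now - path.stat().st_mtime > max_age:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            continue  # removed concurrently (e.g. by another cleanup run)
    return removed
```

Passing `now` explicitly makes the cutoff testable without touching file timestamps; a scheduled job would call it once per directory, e.g. `purge_old_files(Path("uploads"))`.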
### Data Layer

- **PostgreSQL** — `jobs` table (status, score, grade, result JSON), `audit_log` table. Schema in `db/init.sql`
- **Redis** — Job queue (`pdf:queue`), status tracking (`pdf:status:*`), rate limiting (`pdf:rate:*`)

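One common way to implement rate limiting on keys like `pdf:rate:*` is a fixed-window counter: increment a per-client key and let it expire when the window ends (in Redis, `INCR` followed by `EXPIRE`). The sketch below keeps the counters in a dict instead of Redis; the class name, limit, window size, and key layout are assumptions, not what `redis_queue.py` does.

```python
from __future__ import annotations

import time
from typing import Callable


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window.

    In-memory stand-in for Redis: a deployment might INCR a key such as
    pdf:rate:<client>:<window> (hypothetical layout) and EXPIRE it.
    """

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client_id -> (window index, requests counted in that window)
        self._counters: dict[str, tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        start, count = self._counters.get(client_id, (window, 0))
        if start != window:  # a new window began: reset, like an expired key
            count = 0
        count += 1
        self._counters[client_id] = (window, count)
        return count <= self.limit
```

Fixed windows are cheap but allow a burst of up to twice the limit across a window boundary; a sliding window trades extra bookkeeping for smoother limiting.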
### External APIs

- **Anthropic Claude 3.5 Sonnet** — alt text validation, image classification, text-in-images
- **Google Cloud Vision** — OCR, text detection
- **veraPDF** (optional) — PDF/UA-1 compliance validation

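Calls to these external services are wrapped by `retry_helper.py`, which provides exponential backoff. A hedged sketch of that technique as a decorator: the delay ceiling doubles after each failure (capped at `max_delay`) and the actual wait is drawn uniformly below it ("full jitter"). The decorator name, its parameters, and the jitter strategy are assumptions, not the helper's actual API.

```python
from __future__ import annotations

import random
import time
from functools import wraps
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry the wrapped call up to `max_attempts` times (must be >= 1)."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Ceiling grows base_delay, 2x, 4x, ... capped at max_delay;
                    # the wait is a random point below it (full jitter).
                    ceiling = min(max_delay, base_delay * 2 ** attempt)
                    sleep(random.uniform(0, ceiling))
            raise ValueError("max_attempts must be at least 1")

        return wrapper

    return decorator
```

Passing only transient error types in `retry_on` (timeouts, connection errors, rate-limit responses) avoids retrying requests that will never succeed, such as malformed input.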
### Frontend Structure

- `js/app.js` (controller)
- `js/upload.js` (drag-drop)
- `js/api.js` (HTTP client)
- `js/results.js` (display)
- `js/page-viewer.js` (PDF inspector)
- `js/batch.js` (batch processing)
- `js/utils.js` (helpers)

## Tech Stack

- **Backend**: Python 3.11 (processing), PHP 8.2 (API)
- **Frontend**: Vanilla HTML/CSS/JS
- **Database**: PostgreSQL 16, Redis 7
- **Infrastructure**: Docker, Nginx/Apache, PHP-FPM
- **System deps**: Tesseract OCR, Poppler, Ghostscript

## Configuration

Environment variables via `.env` (see `.env.example`). Key settings:

- `ANTHROPIC_API_KEY` / `GOOGLE_API_KEY` — AI API credentials
- `DEV_MODE=true` — bypasses auth for localhost requests
- `DB_HOST`, `DB_PORT`, `REDIS_HOST`, `REDIS_PORT` — infrastructure endpoints
- Production uses ports 1220 (Redis) and 1221 (PostgreSQL) to avoid host conflicts

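`auth.php` accepts a Bearer token or an `X-API-Key` header, and with `DEV_MODE=true` it lets localhost requests through. The decision order is sketched below in Python (the real file is PHP); it illustrates the idea under assumptions rather than porting `auth.php`. The header precedence, the set of loopback addresses, and how valid keys are supplied are all assumed, and the function names are hypothetical.

```python
from __future__ import annotations

import hmac
from typing import Iterable, Mapping

# Assumed loopback addresses treated as "localhost" in dev mode.
LOCALHOST_ADDRS = {"127.0.0.1", "::1"}


def extract_api_key(headers: Mapping[str, str]) -> str | None:
    """Return the key from `Authorization: Bearer <key>` or `X-API-Key: <key>`."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}
    authorization = lowered.get("authorization", "")
    if authorization.lower().startswith("bearer "):
        return authorization[len("bearer "):].strip() or None
    return lowered.get("x-api-key", "").strip() or None


def is_authorized(headers: Mapping[str, str], remote_addr: str,
                  valid_keys: Iterable[str], dev_mode: bool) -> bool:
    """Dev-mode localhost bypass first, then the API-key check."""
    if dev_mode and remote_addr in LOCALHOST_ADDRS:
        return True
    key = extract_api_key(headers)
    if key is None:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return any(hmac.compare_digest(key.encode(), valid.encode())
               for valid in valid_keys)
```

Checking the bypass only against the socket's remote address (not a client-supplied header such as `X-Forwarded-For`) keeps dev mode from being unlocked remotely; behind a proxy every request may appear local, so dev mode should stay off there.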
## Testing

- pytest with markers: `integration`, `slow`, `api`
- Config in `pytest.ini`
- Fixtures in `tests/conftest.py`
- Sample PDFs in `Test_files/`
- No linter currently configured

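The `integration`, `slow`, and `api` markers are what `-m` expressions such as `"not integration"` select on. They are configured in `pytest.ini`; as an alternative, the same registration can be done from a `conftest.py` hook via pytest's `config.addinivalue_line`. This is a sketch, and the marker descriptions are assumptions.

```python
# Hypothetical conftest.py addition: registers the project's markers so that
# `pytest --strict-markers` accepts them. pytest calls this hook at startup.
MARKERS = {
    "integration": "exercises external services or the full pipeline",
    "slow": "long-running test",
    "api": "exercises the REST API",
}


def pytest_configure(config) -> None:
    """Register each marker with a short description (listed by `pytest --markers`)."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Tests then opt in with `@pytest.mark.integration`, and `pytest tests/ -m "not integration"` deselects them.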