# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI-powered PDF accessibility checker that validates documents against WCAG 2.1 Level A & AA standards. Combines traditional PDF analysis (pypdf, pdfplumber) with AI models (Anthropic Claude, Google Cloud Vision) for ~95% automated WCAG coverage. Branded for "Oliver" (Montserrat font, black/#FFC407 palette).

## Commands

### Testing

```bash
source venv/bin/activate
pytest tests/ -v                          # Run all tests (31 tests)
pytest tests/ --cov=. --cov-report=html   # With coverage report
pytest tests/test_checker.py -v           # Single test file
pytest tests/ -m "not integration"        # Skip integration tests
```

### Running Locally

```bash
source venv/bin/activate
php -S localhost:8000                     # Start PHP dev server
```

### Docker

```bash
docker-compose up                                  # Development stack
docker-compose -f docker-compose.prod.yml up -d    # Production stack
docker-compose exec worker pytest tests/ -v        # Tests in container
```

### CLI Usage

```bash
python enterprise_pdf_checker.py document.pdf --output report.json   # Full check
python enterprise_pdf_checker.py document.pdf --quick                # Skip AI checks
python pdf_remediation.py document.pdf --output fixed.pdf --all      # Auto-remediate
```

## Architecture

### Three Interfaces

- **Web UI** (`index.html` + `js/` + `css/`): vanilla JS, drag-drop upload, visual inspector
- **REST API** (`api.php`): PHP endpoints for upload, check, status, result, remediate, download
- **CLI** (`enterprise_pdf_checker.py`): direct Python execution

### Request Flow (Docker/Production)

1. `api.php` receives upload, validates via `auth.php`, saves to `uploads/`
2. Job pushed to Redis queue (`pdf:queue`) and tracked in PostgreSQL
3. `worker.py` daemon pops jobs, runs `EnterprisePDFChecker.check_all()`
4. Results written to `results/{job_id}.result.json`, DB updated
5. Client polls `api.php?action=status` then fetches results

### Key Source Files

| File | Purpose |
|------|---------|
| `enterprise_pdf_checker.py` | Core engine: 30+ WCAG checks, AI image analysis, scoring |
| `api.php` | REST API: file handling, job queue integration, CORS |
| `auth.php` | Authentication: Bearer/X-API-Key, dev mode localhost bypass |
| `worker.py` | Background daemon: Redis queue consumer, graceful shutdown |
| `db_manager.py` | PostgreSQL ORM: jobs CRUD, audit logging |
| `redis_queue.py` | Redis operations: job queue, status tracking, rate limiting |
| `pdf_remediation.py` | Auto-fix: metadata, tagging, language tags |
| `retry_helper.py` | Exponential backoff for external API calls |
| `report_generator.py` | Result formatting and report generation |
| `logger_config.py` | Structured logging with rotation (10MB max) |
| `cleanup.py` | File retention cleanup (24h for uploads/results) |

### Data Layer

- **PostgreSQL**: `jobs` table (status, score, grade, result JSON), `audit_log` table. Schema in `db/init.sql`
- **Redis**: job queue (`pdf:queue`), status tracking (`pdf:status:*`), rate limiting (`pdf:rate:*`)

### External APIs

- **Anthropic Claude 3.5 Sonnet**: alt text validation, image classification, text-in-images
- **Google Cloud Vision**: OCR, text detection
- **veraPDF** (optional): PDF/UA-1 compliance validation

### Frontend Structure

`js/app.js` (controller), `js/upload.js` (drag-drop), `js/api.js` (HTTP client), `js/results.js` (display), `js/page-viewer.js` (PDF inspector), `js/batch.js` (batch processing), `js/utils.js` (helpers)

## Tech Stack

- **Backend**: Python 3.11 (processing), PHP 8.2 (API)
- **Frontend**: Vanilla HTML/CSS/JS
- **Database**: PostgreSQL 16, Redis 7
- **Infrastructure**: Docker, Nginx/Apache, PHP-FPM
- **System deps**: Tesseract OCR, Poppler, Ghostscript

## Configuration

Environment variables via `.env` (see `.env.example`). Key settings:

- `ANTHROPIC_API_KEY` / `GOOGLE_API_KEY`: AI API credentials
- `DEV_MODE=true`: bypasses auth for localhost requests
- `DB_HOST`, `DB_PORT`, `REDIS_HOST`, `REDIS_PORT`: infrastructure endpoints
- Production uses ports 1220 (Redis) and 1221 (PostgreSQL) to avoid host conflicts

## Testing

- pytest with markers: `integration`, `slow`, `api`
- Config in `pytest.ini`
- Fixtures in `tests/conftest.py`
- Sample PDFs in `Test_files/`
- No linter currently configured
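
For orientation when touching code that calls the external AI services: the Key Source Files table describes `retry_helper.py` as providing exponential backoff for external API calls. The sketch below illustrates that general technique only and is not the repository's implementation. The function name `retry_with_backoff`, its parameters, and the default delays are all assumptions made for illustration; check `retry_helper.py` for the real interface before relying on any of them.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative and not taken from the repository.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Last attempt failed: propagate the error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles on each retry, capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Optional random jitter spreads out retries from parallel workers.
            delay += random.uniform(0.0, jitter)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that a test can record the requested delays instead of actually waiting, which keeps such tests out of the `slow` marker category.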