18 commits

Author SHA1 Message Date
Vadym Samoilenko
dafef834d2 Fix CP14 heading detection via RoleMap + add manual pass support
- enterprise_pdf_checker.py: resolve custom tag names through PDF RoleMap
  in _check_headings so PDFs using /Heading1-style tags (mapped to /H1)
  are correctly detected; add depth guard to walk_tree
- js/results.js: add CP14 (Heading Structure) to CP_TO_CHECK; relax
  H-type restriction so M-type CPs with a linked check also get
  Mark as Passed / Undo buttons
- api.php: add 'Heading Structure' => ['14'] to $check_to_cp for
  server-side recalculate score with heading override
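
The RoleMap lookup described in the first bullet can be sketched roughly as follows. This is an illustration only, not the code from enterprise_pdf_checker.py: the role map is modeled as a plain dict, and `resolve_role`, `heading_level`, and `MAX_HOPS` are hypothetical names introduced here.

```python
import re

# Assumed cap on RoleMap hops so that a cyclic or very long map cannot loop forever.
MAX_HOPS = 16

# Standard heading tags: /H (generic) and /H1 through /H6.
HEADING_RE = re.compile(r"^/H([1-6])?$")


def resolve_role(tag, role_map, max_hops=MAX_HOPS):
    """Follow RoleMap entries until the tag no longer maps to another tag."""
    seen = set()
    for _ in range(max_hops):
        if tag in seen or tag not in role_map:
            break
        seen.add(tag)
        tag = role_map[tag]
    return tag


def heading_level(tag, role_map):
    """Return 1-6 for /H1../H6, 0 for a generic /H, None for non-headings."""
    match = HEADING_RE.match(resolve_role(tag, role_map))
    if match is None:
        return None
    return int(match.group(1)) if match.group(1) else 0
```

With a map such as {"/Heading1": "/H1"}, a custom /Heading1 tag resolves to level 1, while a cyclic map simply stops resolving instead of recursing without bound.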

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 13:37:19 +00:00
Vadym Samoilenko
2a8db06f0d Fix: load adjusted score when reopening document from history
handleResult() now overlays accessibility_score/wcag_compliance from
.adjusted.json (if it exists) while keeping the original severity_counts
as the recalculation baseline — prevents double-subtraction.

displayResults() auto-calls applyScoreRecalc() on load when the result
was previously adjusted, restoring the (Adjusted) label and WCAG badges
without triggering another server save.
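
The overlay step can be pictured with a small sketch. The real logic lives in handleResult() in api.php; the Python below only mirrors the idea, and `overlay_adjusted` and the `was_adjusted` flag are hypothetical names.

```python
def overlay_adjusted(result, adjusted):
    """Return a copy of the original result with adjusted score fields overlaid.

    `result` is the parsed .result.json; `adjusted` is the parsed
    .adjusted.json, or None when no adjusted file exists.
    """
    merged = dict(result)
    if adjusted is None:
        return merged
    for key in ("accessibility_score", "wcag_compliance"):
        if key in adjusted:
            merged[key] = adjusted[key]
    # severity_counts is deliberately NOT overlaid: it stays at the original
    # values so a later recalculation starts from the unadjusted baseline.
    merged["was_adjusted"] = True  # hypothetical flag for the (Adjusted) label
    return merged
```

Keeping the original severity_counts is what avoids the double-subtraction: dismissed issues are subtracted once from the baseline rather than again from already adjusted counts.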

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 11:04:07 +00:00
Vadym Samoilenko
a5784feda6 Address client feedback: WCAG badges, table grouping, retention, history UX, AI prompt ethics
- Issue 1: Recompute WCAG A/AA compliance badges after dismissing issues (JS +
  backend); exported reports now reflect updated pass/fail status
- Issue 2: Group document-wide table issues into collapsible cards with
  Dismiss All button; reduces noise for multi-table documents
- Issue 3: Split cleanup retention — uploads deleted after 24h, result/meta
  JSONs retained 30 days (RESULTS_RETENTION_HOURS env var, default 720h)
- Issue 4A: Library shows adjusted score when available (.adjusted.json preferred)
- Issue 4B: History page groups documents by retention countdown (red/yellow/green
  sections); adds 30-day retention banner
- Issue 5+6: AI prompt updated — describe people by role/action not appearance,
  use specific brand names; flags images with people for human review
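
The split retention rule from Issue 3 could look roughly like this. It is a sketch under assumptions: only the RESULTS_RETENTION_HOURS variable and the 24h / 720h values come from the commit message; `retention_hours`, `is_expired`, and the "upload" / "result" kind labels are hypothetical.

```python
import os

UPLOAD_RETENTION_HOURS = 24  # uploads are deleted after 24h


def retention_hours(kind, env=os.environ):
    """Return the retention window in hours for a file kind."""
    if kind == "upload":
        return UPLOAD_RETENTION_HOURS
    # Result and meta JSONs default to 720h (30 days) unless overridden.
    return int(env.get("RESULTS_RETENTION_HOURS", "720"))


def is_expired(kind, modified_at, now, env=os.environ):
    """True when a file last modified at `modified_at` (epoch seconds) is past retention."""
    age_hours = (now - modified_at) / 3600
    return age_hours > retention_hours(kind, env)
```

A cleanup job would call is_expired() with each file's modification time and the current time, deleting only the files for which it returns True.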

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 10:51:10 +00:00
Vadym Samoilenko
79aaf050bf PDF report reflects adjusted score + manual pass for Matterhorn H-type CPs
- api.php: add save_adjusted_result action that merges dismissed issues,
  check overrides and recalculated score into {job_id}.adjusted.json;
  handleExport() now prefers .adjusted.json over .result.json
- js/results.js: displayMatterhorn() shows Mark as Passed / Undo buttons
  for H-type CPs (CP04, CP13) linked to overridden checks; overrideCheck /
  unoverrideCheck refresh Matterhorn table and recompute overall banner
- js/batch.js: exportReport() saves adjusted result before opening export
  URL, using pre-opened window to avoid popup blockers
- report_generator.py: filter dismissed issues, show (Adjusted) badge,
  Manual Pass in checks and Matterhorn tables; switch generate_html() to
  Montserrat + Oliver branding (#1a1a1a header, #FFC407 skip-link)
- css/styles.css: fix dark-mode log-header from blue-ish #252840 to #242424

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 16:28:53 +00:00
Vadym Samoilenko
1126c8a700 Fix history: correct score field name + add delete button
- api.php: read accessibility_score (not score) from result.json
- api.php: handleDelete() also removes .dismissed.json, .overrides.json, .error.log
- js/app.js: add Delete button to each history row with confirm dialog
- css/styles.css: red hover style for delete button

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 15:13:30 +00:00
Vadym Samoilenko
0443cb450a history: show legacy jobs (no user_id) to authenticated users
Jobs created before user isolation was added have null user_id.
Previously they were hidden from authenticated users. Now authenticated
users see their own jobs + all legacy jobs (no user_id). Jobs belonging
to a different authenticated user are still excluded.
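
The visibility rule amounts to a simple filter. A minimal sketch, assuming each job is a dict parsed from its .meta.json; `visible_jobs` is a hypothetical name and the actual filter is implemented in api.php.

```python
def visible_jobs(jobs, current_user_id):
    """Keep the user's own jobs plus legacy jobs that have no user_id."""
    return [
        job for job in jobs
        if job.get("user_id") in (None, current_user_id)
    ]
```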

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 15:10:15 +00:00
Vadym Samoilenko
e60639c58d Add SSO user isolation and document history dashboard
- api.php: extractUserFromToken() decodes Azure AD JWT payload (oid/name/email)
- Upload: stores user_id, user_name, user_email in job .meta.json
- handleList(): filters jobs by authenticated user's oid — full user isolation
  (jobs without user_id are excluded for authenticated users to prevent leakage);
  enriches each entry with score, grade, critical/error counts from result JSON
- index.html: "My Documents" history section, shown after login
- js/app.js: showAuthenticatedUI() triggers loadHistory(); full renderHistory()
  renders sortable table with score, grade, severity badges, and Open/HTML/PDF/JSON
  action buttons; openHistoryJob() loads any past result into the results panel
- js/results.js: calls loadHistory() after displayResults() so table refreshes
  immediately after a new check completes
- css/styles.css: history table styles with colour-coded score/grade/severity badges
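
Decoding the JWT payload for the oid/name/email claims can be sketched as below. This is a Python illustration of the idea, not the PHP in extractUserFromToken(); the function names are hypothetical, and the sketch only decodes the payload without verifying the token signature, which a real deployment would have to handle separately.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the middle (payload) segment of a JWT without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def extract_user(token):
    """Map the claims named in the commit message onto job metadata fields."""
    claims = decode_jwt_payload(token)
    return {
        "user_id": claims.get("oid"),
        "user_name": claims.get("name"),
        "user_email": claims.get("email"),
    }
```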

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 14:58:18 +00:00
Vadym Samoilenko
8b70da3584 Add "Mark as Passed" check overrides and "Recalculate Score" feature
- api.php: override_check / unoverride_check endpoints write per-job
  .overrides.json; handleResult() injects overridden_checks on reload
- js/results.js: score breakdown rows show "Mark as Passed" / "Undo"
  buttons; recalculateScore() adjusts penalty for dismissed issues and
  base score for manual overrides without mutating original data
- index.html: score display gains hidden (Adjusted) badge and
  Recalculate Score button, revealed after first check
- css/styles.css: btn-mark-passed, btn-unoverride, check-manual-pass,
  btn-recheck, score-adjusted-label styles
- js/utils.js: escapeAttr() helper for safe inline onclick values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 14:01:52 +00:00
Vadym Samoilenko
c4ffb94351 Merge Cloud Run migration; resolve handleResult() conflict
Keep dismissed_indices injection in handleResult() from our QA
fixes alongside the Cloud Run rewrite from origin/master.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:08:04 +00:00
Vadym Samoilenko
ac8aedf4a3 Implement QA report fixes: scoring, Matterhorn, dismiss, PDF report, UX
Part 1 — CSS/Contrast/Accessibility:
- Raise --text-muted contrast to WCAG AA (#696969 light, #9a9a9a dark)
- Add body font-size: 16px baseline
- Enlarge #themeToggle to 15px / 10px 20px padding

Part 2 — Start Button (user-controlled analysis):
- Upload no longer auto-starts check; shows ready state with filename/size
- New showReadyState() / removeFile() functions in upload.js
- beginCheck() now shows progress + hides ready state on click
- Add prominent "Check Another PDF" button at bottom of results

Part 3 — Scoring recalibration:
- Replace deduction formula with check-pass ratio + soft penalty (cap 20)
- Fix run_check() to only examine issues added by the current check
- Add score_breakdown (per-check table) to JSON output + results UI
- Downgrade readability ERROR → WARNING (advisory, not hard failure)
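
One way to read "check-pass ratio + soft penalty (cap 20)" is sketched below. The exact formula and weights are not given in this message, so everything here is an assumption: the per-severity weights, the `compute_score` name, and the clamping to 0 are hypothetical.

```python
# Hypothetical per-issue penalty weights; the real values are not stated.
PENALTY_WEIGHTS = {"critical": 5.0, "error": 2.0, "warning": 0.5}
PENALTY_CAP = 20.0  # the cap named in the commit message


def compute_score(passed_checks, total_checks, severity_counts):
    """Base score from the pass ratio, minus a capped penalty for open issues."""
    if total_checks == 0:
        return 0.0
    base = 100.0 * passed_checks / total_checks
    raw_penalty = sum(
        PENALTY_WEIGHTS.get(severity, 0.0) * count
        for severity, count in severity_counts.items()
    )
    penalty = min(PENALTY_CAP, raw_penalty)
    return max(0.0, base - penalty)
```

Under these assumed weights, a document passing 10 of 20 checks with 10 critical issues scores 50 minus the capped 20, so the penalty can lower a score but cannot dominate it.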

Part 4 — Auto-fix debugging:
- Remediation failure now returns up to 2000 chars of log (was 500)
- pdf_remediation.py: stderr output, sys.exit(0/1), output dir creation

Part 5 — Error location: View on Page button on each issue card

Part 6 — Matterhorn Protocol PDF/UA-1:
- _build_matterhorn_summary() maps 19 checks → 31 checkpoints
- Matterhorn card in index.html with grouped PASS/FAIL/Not-tested table
- Correct M/H badges per checkpoint

Part 7 — Dismiss / False Positive:
- dismissed_issues table in db/init.sql + dismiss/undismiss in db_manager.py
- api.php: dismiss/undismiss endpoints (file-backed), dismissed_indices
  injected into both handleStatus and handleResult responses
- results.js: dismissIssue/undismissIssue with visual strikethrough
- CSS: .dismissed, .btn-dismiss, .btn-undismiss styles

Part 8 — PDF Report (WeasyPrint):
- generate_pdf() in report_generator.py: PAC-style A4, Oliver branding
- api.php handleExport() supports format=pdf
- index.html: "PDF Report" download button in results header
- requirements.txt: weasyprint>=60.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:06:32 +00:00
michael
4080638856 Migrate PDF processing from Redis worker to Google Cloud Run
Replace the Redis queue + Python worker daemon with a synchronous HTTP
call to a Cloud Run service, eliminating Redis and simplifying the
infrastructure from 4 containers (web, worker, redis, postgres) to just
web + postgres (with Cloud Run handling processing).

- Add cloudrun_service.py: Flask app wrapping EnterprisePDFChecker with
  POST /check and GET /health endpoints, GCS image upload
- Add Dockerfile.cloudrun + requirements-cloudrun.txt for Cloud Run image
- Add cloudbuild.yaml for Cloud Build with custom Dockerfile
- Rewrite api.php: remove all Redis code, add Cloud Run OIDC auth
  (getCloudRunToken), synchronous processing in handleCheck(), file-based
  rate limiting, GCS redirect in handleImage(), DB helper updateJobInDatabase()
- Update js/upload.js: handle synchronous completed response from Cloud Run,
  increase poll timeout to 15 minutes
- Update js/page-viewer.js: use GCS URLs directly for page images
- Simplify docker-compose.yml and docker-compose.prod.yml: remove worker
  and redis services
- Remove PHP Redis extension from Dockerfile.web
- Set 900s timeouts across nginx, PHP-FPM, gunicorn, curl, and Cloud Run
- Update cleanup.py: remove result_images pattern (now on GCS), add
  rate_limits cleanup
- Update .env.example: replace Redis vars with Cloud Run/GCS config

Cloud Run service deployed to:
  https://pdf-checker-bcb6ipdqka-uc.a.run.app
GCS bucket: gs://optical-pdf-images (7-day lifecycle, public read)
GCP project: optical-414516

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 14:50:38 -06:00
Vadym Samoilenko
112719b2c5 Add Docker stack, frontend redesign, and visual page inspector fix
- Redesigned frontend with Outfit/Figtree typography, coral accent palette,
  noise texture, glassmorphism header, and staggered animations
- Split monolithic index.html into modular JS (app, api, upload, batch,
  results, page-viewer, utils) and extracted CSS
- Fixed worker.py to generate page images for Visual Page Inspector
- Added Docker Compose stack (web, worker, redis, postgres)
- Added batch upload, HTML report export, rate limiting, and Redis queue
- Extended test suite with checker, remediation, worker, and DB tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 18:12:44 +00:00
Vadym Samoilenko
ac00b1af43 Fix venv path to use relative directory reference
- Change hardcoded venv path to __DIR__ . '/venv/bin/python3'
- Makes the application portable across different installations
- Ensures Python dependencies from venv are used correctly

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-25 13:47:18 +00:00
Vadym Samoilenko
0e24602096 Add production readiness: authentication, logging, retry logic, and test suite
Phase 1: Critical bug fixes
- Fix missing os/sys imports in pdf_remediation.py (line 427 crash)
- Install Python dependencies (venv with 11 packages)
- Create runtime directories (uploads, results, .cache)
- Configure environment (.env from .env.example)

Phase 2: Production features
- Add authentication module (auth.php) with API key support
- Integrate auth into api.php with CORS headers update
- Add structured logging framework (logger_config.py) with rotation
- Add retry helper (retry_helper.py) with exponential backoff
- Apply retry decorators to AI API calls (Claude and Google Vision)
- Create comprehensive test suite (31 tests, 34% coverage)
  * Unit tests for checker and remediation
  * Integration tests for API and authentication
  * pytest configuration with coverage reporting
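
A retry decorator with exponential backoff, in the spirit of retry_helper.py, might look like the sketch below. The actual helper's signature is not shown here, so the `retry` name and all parameters (including the injectable `sleep`) are assumptions.

```python
import functools
import time


def retry(max_attempts=3, base_delay=1.0, factor=2.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call, multiplying the wait by `factor` after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

Applied to an AI API call, `@retry(max_attempts=4, exceptions=(ConnectionError,))` would wait 1s, 2s, then 4s between attempts before giving up.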

Documentation:
- Add requirements specifications (BRS, FRS, SAD) to docs_req/
- Add PDF-UA-1 technical background
- Add sample accessibility report

All tests passing (31/31). Ready for production deployment.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-25 13:26:02 +00:00
DJP
f93fa977ae Implement auto-fix functionality with download
FEATURE COMPLETE: One-Click Auto-Remediation

API Endpoints:
POST api.php?action=remediate
  - Takes job_id
  - Runs Python remediation script
  - Applies all auto-fixable issues
  - Returns download URL

GET api.php?action=download&job_id=X&type=remediated
  - Downloads fixed PDF
  - Filename: original_name_fixed.pdf

Auto-Fixes Applied:
- Add missing document title (from filename)
- Add missing author (Unknown Author)
- Add missing subject/description
- Set document language (en-US or detected)
- Add navigation bookmarks (auto-generated)
- Mark as tagged (if structure exists)

Web Interface Flow:
1. User uploads PDF → analysis runs
2. If fixable issues found → "🔧 Auto-Fix Available" card appears
3. Shows what will be fixed with suggestions
4. User clicks "Apply Automatic Fixes"
5. API processes in background (1-2 seconds)
6. Success message with "📥 Download Fixed PDF" button
7. User downloads remediated PDF instantly

JavaScript Updates:
- applyFixes() now actually calls API
- Shows loading state during processing
- Displays success/error messages
- Download link with proper filename
- Button disabled after fix applied

PHP Updates:
- handleRemediate() - runs remediation script
- handleDownload() - serves original or remediated PDF
- Error logging to .remediation.log files
- Stores remediated PDF path in job metadata

Python Updates:
- Fixed --all flag logic
- Accepts custom metadata values
- Skips veraPDF validation when run from web (stdout check)
- Better error handling
- Preserves existing metadata

User Experience:
Before:
- See 5 issues
- Manually fix each in Adobe Acrobat (20 minutes)

After:
- See 5 issues, 3 are auto-fixable
- Click button (2 seconds)
- Download fixed PDF
- Only 2 issues left to fix manually (5 minutes)

Value: 3 of 5 issues (60%) fixed automatically; manual fix time drops from about 20 minutes to about 5.

Files Modified:
- api.php - Added remediate + download endpoints
- index.html - Working applyFixes() function
- pdf_remediation.py - Improved CLI handling

Test Files Created:
- test_auto_fixed.pdf - Example of remediated PDF
- test_fixed.pdf - Another test

Ready to use in production!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 10:17:51 -04:00
DJP
3a81d2623d Fix poppler PATH for MAMP environments
Issue: Page images weren't being generated in web interface
Cause: MAMP/PHP doesn't include /opt/homebrew/bin in PATH
Fix: Add poppler paths before executing Python script

Now page images will generate correctly and Visual Page Inspector will appear!

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-10-20 16:28:02 -04:00
DJP
59efe72607 Add Visual Page Inspector with interactive issue markers
Frontend Features:
NEW: Visual Page Inspector component
- Display PDF pages as images with zoom controls
- SVG overlay system for precise issue highlighting
- Color-coded markers by severity (red/orange/yellow/blue)
- Numbered badges on each issue for easy reference
- Interactive hover tooltips with issue details
- Click-through to see exact locations on page

User Experience:
📄 Page selector sidebar shows all pages
- Color-coded badges indicate issue severity per page
- Click any page to view it
- Pages with no issues show in green

🔍 Zoom Controls:
- Zoom in/out buttons (50% - 300%)
- Reset to 100%
- Markers scale with zoom level

🎯 Interactive Markers:
- Dashed rectangles highlight issue locations
- Hover to see full issue description + fix recommendation
- Semi-transparent overlays don't obscure content
- Numbered circles for easy cross-reference

Backend Support:
- API endpoint: api.php?action=image&job_id=X&page=Y
- Serves PNG images with proper caching headers
- Coordinate system conversion (PDF → screen coords)

How It Works:
1. Python generates page images at 100 DPI
2. Issues with coordinates get visual markers
3. SVG overlays drawn at correct positions
4. Tooltips show on hover
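
The PDF-to-screen conversion behind the markers can be sketched as follows. It assumes issue coordinates are in PDF points (72 per inch) with the origin at the bottom-left of the page, which may not match how the checker actually stores them; `pdf_rect_to_pixels` is a hypothetical name.

```python
def pdf_rect_to_pixels(rect, page_height_pt, dpi=100, zoom=1.0):
    """Convert (x0, y0, x1, y1) in PDF points to (left, top, width, height) in pixels.

    PDF space has its origin at the bottom-left; image/SVG space has it at the
    top-left, so the y axis is flipped against the page height.
    """
    scale = dpi / 72.0 * zoom  # 72 PDF points per inch
    x0, y0, x1, y1 = rect
    left = min(x0, x1) * scale
    top = (page_height_pt - max(y0, y1)) * scale
    width = abs(x1 - x0) * scale
    height = abs(y1 - y0) * scale
    return left, top, width, height
```

Multiplying by `zoom` in the same step is one way to keep markers aligned with the page image as the zoom level changes.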

Perfect for:
- Seeing exactly where image/contrast issues are
- Visual verification of accessibility problems
- Training teams on what to fix
- Before/after comparisons

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 16:01:52 -04:00
DJP
bf83a409bb Initial commit: Enterprise PDF Accessibility Checker
- Complete WCAG 2.1 accessibility checking system
- AI-powered analysis with Claude 4.5 and Google Vision
- Web interface with drag-and-drop upload
- REST API backend (PHP)
- Python checker with parallel processing
- Quick mode for fast scans (~10 seconds)
- Full mode with AI analysis (~2 minutes)
- .env file support for API keys
- Error logging and debugging tools
- Comprehensive documentation

Performance improvements:
- Parallel image processing (3x faster)
- Smart API timeouts (10s)
- Reduced DPI for faster conversions
- Real-time progress updates

🤖 Generated with Claude Code
2025-10-20 15:50:56 -04:00