- enterprise_pdf_checker.py: resolve custom tag names through PDF RoleMap
in _check_headings so PDFs using /Heading1-style tags (mapped to /H1)
are correctly detected; add depth guard to walk_tree
- js/results.js: add CP14 (Heading Structure) to CP_TO_CHECK; relax
H-type restriction so M-type CPs with a linked check also get
Mark as Passed / Undo buttons
- api.php: add 'Heading Structure' => ['14'] to $check_to_cp so that
server-side score recalculation honors the heading override
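
The RoleMap lookup and depth guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in enterprise_pdf_checker.py: the dict-based tree shape, the names resolve_role and collect_headings, and the MAX_DEPTH value are all assumptions made for the example.

```python
from typing import Dict, List, Mapping, Optional

STANDARD_HEADINGS = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}
MAX_DEPTH = 50  # hypothetical recursion limit for the depth guard


def resolve_role(tag: str, role_map: Mapping[str, str]) -> str:
    """Follow RoleMap entries until an unmapped tag is reached.

    Keys and values are compared without the leading slash. A `seen`
    set stops the loop if the RoleMap contains a cycle.
    """
    normalized = {k.lstrip("/"): v.lstrip("/") for k, v in role_map.items()}
    current = tag.lstrip("/")
    seen = set()
    while current in normalized and current not in seen:
        seen.add(current)
        current = normalized[current]
    return current


def collect_headings(
    node: Dict,
    role_map: Mapping[str, str],
    depth: int = 0,
    found: Optional[List[str]] = None,
) -> List[str]:
    """Walk a structure tree and return resolved heading tags in order."""
    if found is None:
        found = []
    if depth > MAX_DEPTH:  # depth guard: give up on pathologically deep trees
        return found
    tag = resolve_role(node.get("tag", ""), role_map)
    if tag in STANDARD_HEADINGS:
        found.append(tag)
    for kid in node.get("kids", []):
        collect_headings(kid, role_map, depth + 1, found)
    return found
```

With a RoleMap of {"/Heading1": "/H1"}, a node tagged /Heading1 is reported as H1, which is the case the fix targets.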
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
handleResult() now overlays accessibility_score/wcag_compliance from
.adjusted.json (if it exists) while keeping the original severity_counts
as the recalculation baseline — prevents double-subtraction.
displayResults() auto-calls applyScoreRecalc() on load when the result
was previously adjusted, restoring the (Adjusted) label and WCAG badges
without triggering another server save.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
recalculateScore() was only updating the DOM. It never called
save_adjusted_result, so .adjusted.json was never written and the
library always showed the original score. It now saves automatically
after each recalculation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Issue 1: Recompute WCAG A/AA compliance badges after dismissing issues (JS +
backend); exported reports now reflect updated pass/fail status
- Issue 2: Group document-wide table issues into collapsible cards with
Dismiss All button; reduces noise for multi-table documents
- Issue 3: Split cleanup retention — uploads deleted after 24h, result/meta
JSONs retained 30 days (RESULTS_RETENTION_HOURS env var, default 720h)
- Issue 4A: Library shows adjusted score when available (.adjusted.json preferred)
- Issue 4B: History page groups documents by retention countdown (red/yellow/green
sections); adds 30-day retention banner
- Issue 5+6: AI prompt updated — describe people by role/action not appearance,
use specific brand names; flags images with people for human review
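
The split retention from Issue 3 could look roughly like the sketch below. Only the 24h upload window, the RESULTS_RETENTION_HOURS variable, and its 720h default come from this commit. The function names, the glob patterns, and the use of file mtime as the age are assumptions for illustration.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

UPLOAD_RETENTION_HOURS = 24
# Default of 720h = 30 days, overridable through the environment.
RESULTS_RETENTION_HOURS = int(os.environ.get("RESULTS_RETENTION_HOURS", "720"))


def is_expired(path: Path, max_age_hours: float, now: Optional[float] = None) -> bool:
    """True when the file's mtime is older than max_age_hours."""
    now = time.time() if now is None else now
    return now - path.stat().st_mtime > max_age_hours * 3600


def cleanup(
    upload_dir: Path,
    results_dir: Path,
    upload_hours: float = UPLOAD_RETENTION_HOURS,
    results_hours: float = RESULTS_RETENTION_HOURS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete expired uploads and result/meta JSONs; return what was removed."""
    removed = []
    rules = (
        (upload_dir, "*.pdf", upload_hours),     # assumed upload pattern
        (results_dir, "*.json", results_hours),  # assumed result/meta pattern
    )
    for directory, pattern, hours in rules:
        for path in sorted(directory.glob(pattern)):
            if is_expired(path, hours, now):
                path.unlink()
                removed.append(path)
    return removed
```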
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
app-history.js was redefining toggleDarkMode() and loadTheme() with a
broken implementation (body.classList + wrong localStorage key 'darkMode')
that overrode the correct versions in utils.js. Removed duplicates so
utils.js handles both pages consistently via data-theme on :root.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- api.php: add save_adjusted_result action that merges dismissed issues,
check overrides and recalculated score into {job_id}.adjusted.json;
handleExport() now prefers .adjusted.json over .result.json
- js/results.js: displayMatterhorn() shows Mark as Passed / Undo buttons
for H-type CPs (CP04, CP13) linked to overridden checks; overrideCheck /
unoverrideCheck refresh Matterhorn table and recompute overall banner
- js/batch.js: exportReport() saves adjusted result before opening export
URL, using pre-opened window to avoid popup blockers
- report_generator.py: filter dismissed issues, show (Adjusted) badge,
Manual Pass in checks and Matterhorn tables; switch generate_html() to
Montserrat + Oliver branding (#1a1a1a header, #FFC407 skip-link)
- css/styles.css: fix dark-mode log-header from blue-ish #252840 to #242424
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- history.html: standalone page with My Documents table + auth
- js/history.js: renderHistory, loadHistory, deleteHistoryJob logic
- js/app-history.js: MSAL auth init for history.html
- index.html: remove history section, add 'My Documents' link in header
- js/app.js: show historyLink after auth, open job from ?job_id= URL param
- deploy.sh: include history.html in deploy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- api.php: read accessibility_score (not score) from result.json
- api.php: handleDelete() also removes .dismissed.json, .overrides.json, .error.log
- js/app.js: add Delete button to each history row with confirm dialog
- css/styles.css: red hover style for delete button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Jobs created before user isolation was added have null user_id.
Previously they were hidden from authenticated users. Now authenticated
users see their own jobs + all legacy jobs (no user_id). Jobs belonging
to a different authenticated user are still excluded.
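
The visibility rule can be expressed as a single filter. This is a sketch of the logic only (the real filtering lives in api.php); the job dict shape and the name visible_jobs are hypothetical.

```python
from typing import Dict, List


def visible_jobs(jobs: List[Dict], current_user_id: str) -> List[Dict]:
    """Return the caller's own jobs plus legacy jobs that have no user_id.

    Jobs owned by any other authenticated user are excluded.
    """
    return [job for job in jobs if job.get("user_id") in (None, current_user_id)]
```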
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alcdn.msauth.net/browser/2.38.3/js/msal-browser.min.js returns 404.
Using cdn.jsdelivr.net (npm mirror) with @azure/msal-browser@2 instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When deploy.sh runs via sudo, git tried to use root's SSH key, which
doesn't have Bitbucket access. The script now detects the repo owner and
runs git commands as that user so that user's SSH key is used.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the git fetch/reset block was commented out and the script only
deployed whatever was already in the repo dir. Uncommented it and added
git config core.fileMode false to prevent permission-drift merge conflicts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- api.php: extractUserFromToken() decodes Azure AD JWT payload (oid/name/email)
- Upload: stores user_id, user_name, user_email in job .meta.json
- handleList(): filters jobs by authenticated user's oid — full user isolation
(jobs without user_id are excluded for authenticated users to prevent leakage);
enriches each entry with score, grade, critical/error counts from result JSON
- index.html: "My Documents" history section, shown after login
- js/app.js: showAuthenticatedUI() triggers loadHistory(); full renderHistory()
renders sortable table with score, grade, severity badges, and Open/HTML/PDF/JSON
action buttons; openHistoryJob() loads any past result into the results panel
- js/results.js: calls loadHistory() after displayResults() so table refreshes
immediately after a new check completes
- css/styles.css: history table styles with colour-coded score/grade/severity badges
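
For reference, decoding the oid/name/email claims from a JWT payload works roughly as in the sketch below. It mirrors what extractUserFromToken() is described as doing but is not the PHP code; the function names are hypothetical, and the sketch only decodes the payload without verifying the token signature.

```python
import base64
import json
from typing import Dict, Optional


def decode_jwt_payload(token: str) -> Dict:
    """Decode the middle (payload) segment of a JWT. No signature check."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def extract_user(token: str) -> Dict[str, Optional[str]]:
    """Map the claims to the fields stored in the job .meta.json."""
    claims = decode_jwt_payload(token)
    return {
        "user_id": claims.get("oid"),
        "user_name": claims.get("name"),
        "user_email": claims.get("email"),
    }
```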
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import langdetect with graceful fallback if not installed
- _check_language(): detect actual document language via langdetect on first
3 pages of text; store in self._detected_lang; warn when declared /Lang tag
doesn't match detected language; suggest correct BCP-47 tag when missing
- _check_readability(): skip Flesch Reading Ease / Flesch-Kincaid (English-only
formulas) for non-English documents; long-sentence check remains language-agnostic
- _check_links(): extend unclear-link patterns to Ukrainian, Russian, German,
French, Spanish, and Polish
- requirements-cloudrun.txt: add langdetect>=1.0.9
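
A minimal sketch of the graceful-fallback import and the declared-vs-detected comparison, assuming the langdetect package's detect() function. The helper names, the 20-character minimum, and the primary-subtag comparison are assumptions for illustration, not the actual _check_language() code.

```python
from typing import Optional

try:
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException
    HAS_LANGDETECT = True
except ImportError:  # langdetect is optional; checks degrade gracefully
    HAS_LANGDETECT = False


def detect_language(text: str) -> Optional[str]:
    """Return a language code such as 'en', or None if detection is unavailable."""
    if not HAS_LANGDETECT or len(text.strip()) < 20:  # assumed minimum length
        return None
    try:
        return detect(text)
    except LangDetectException:
        return None


def language_mismatch(declared: Optional[str], detected: Optional[str]) -> bool:
    """Compare primary subtags only, so 'en-US' matches a detected 'en'."""
    if not declared or not detected:
        return False
    return declared.split("-")[0].lower() != detected.split("-")[0].lower()
```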
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
.score-display had overflow:hidden which cut off the right half of the
btn-recheck button. Changed to overflow:visible — decorative ::before
and ::after pseudo-elements use position:absolute;inset:0 so they remain
visually contained within the border-radius.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pdf_remediation.py:
- _suggest_title() now detects temp filenames (tmp + random chars) and
extracts the first line of content instead of using the useless
temp name (e.g. "Tmp9H15Ocsl" → actual document text)
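
The temp-name heuristic might be sketched like this. The regular expression, the fallback string, and the max_len limit are assumptions; only the idea (tmp prefix plus random characters, then fall back to the first line of content) comes from this commit.

```python
import re
from pathlib import Path

# Assumed pattern: "tmp" followed by at least six random-looking characters,
# case-insensitive so that "Tmp9H15Ocsl" also matches.
TEMP_NAME = re.compile(r"^tmp[a-z0-9_]{6,}$", re.IGNORECASE)


def suggest_title(filename: str, first_page_text: str, max_len: int = 80) -> str:
    """Prefer the filename, unless it looks like a temp name."""
    stem = Path(filename).stem
    if not TEMP_NAME.match(stem):
        return stem.replace("_", " ").replace("-", " ").strip().title()
    for line in first_page_text.splitlines():
        line = line.strip()
        if line:  # first non-empty line of extracted content
            return line[:max_len]
    return "Untitled document"  # hypothetical last-resort fallback
```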
report_generator.py — HTML report:
- Add skip-to-main-content link (WCAG 2.4.1)
- Wrap content in <main id="main-content"> landmark
- Proper <header>/<footer> semantic elements
- <section> + aria-labelledby on each card
- Tables: <caption>, scope="col" on all <th> (WCAG 1.3.1)
- Severity badges: aria-label="Severity: X", class-based color
(not inline style) so not color-only (WCAG 1.4.1)
- Score ring: role="img" + aria-label with numeric value + grade
- Stats grid: role="group" + aria-label
- Improved contrast: stat labels #475569 not #64748b
- @media (prefers-reduced-motion) block
- Links on WCAG criterion column
report_generator.py — PDF report HTML:
- Add <title> and <meta name="description"> to <head>
- <header role="banner">, <main>, <footer> semantic elements
- Matterhorn/issues tables: <caption>, scope="col" on <th>
- Score block: role="img" + aria-label
- Stats: role="group" + aria-label
- "Not tested" text instead of "—" in status cells
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each issue card's WCAG criterion (e.g. "1.4.3") is now a link to the
WAI Understanding page at w3.org. Comma-separated multi-criteria and
PDF/UA are handled separately. Links open in a new tab.
- js/utils.js: WCAG_SLUGS map + wcagCriterionLinks() helper
- js/results.js: issue-meta now calls wcagCriterionLinks()
- css/styles.css: .wcag-link style (dotted underline, hover accent)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaced full-page text scan with annotation-based extraction — now only
checks the text inside actual URI hyperlink bounding boxes, eliminating
false positives from vague words in body prose.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Page viewer:
- loadVisualPage() now accepts highlightNum; highlights marker after image onload
(was using fixed 300ms timeout which fired before GCS image finished loading)
- viewOnPage() passes markerNum directly to loadVisualPage()
Image analysis:
- Quality concerns downgraded WARNING → INFO (advisory, not WCAG violations)
- Cap at 2 concerns per image (was unlimited)
- Google Vision label detections removed — not actionable accessibility issues
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- run_check() wraps each check in a ThreadPoolExecutor future with a timeout
- Heavy checks: Image 180s, OCR 180s, Color Contrast 120s, veraPDF 120s
- Default per-check timeout: 90s
- Timed-out checks emit WARNING instead of hanging the whole request
- Cloud Run service timeout raised to 3600s (gcloud run services update)
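
The per-check timeout can be sketched as below. The timeout values are the ones listed above; the function signature and the returned dict shape are assumptions. Note that a timed-out worker thread keeps running in the background, since Python threads cannot be forcibly stopped.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Mapping

CHECK_TIMEOUTS = {"Image": 180, "OCR": 180, "Color Contrast": 120, "veraPDF": 120}
DEFAULT_TIMEOUT = 90  # seconds


def run_check(
    name: str,
    func: Callable[[], object],
    timeouts: Mapping[str, float] = CHECK_TIMEOUTS,
    default: float = DEFAULT_TIMEOUT,
) -> Dict:
    """Run one check in a worker thread and give up after its timeout."""
    timeout = timeouts.get(name, default)
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return {"check": name, "severity": None, "result": future.result(timeout=timeout)}
    except FutureTimeout:
        # Emit a WARNING instead of letting one slow check hang the request.
        return {
            "check": name,
            "severity": "WARNING",
            "message": f"{name} timed out after {timeout}s",
        }
    finally:
        executor.shutdown(wait=False)  # do not block on the abandoned thread
```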
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cloud Run processes PDFs synchronously (2-5 min). The await startCheck()
call does not resolve until processing finishes, so progress never
advanced past 30%. Add a setInterval timer before the await that advances
through realistic stages every 18s, covering the full processing window.
The timer is cleared on completion/error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep dismissed_indices injection in handleResult() from our QA
fixes alongside the Cloud Run rewrite from origin/master.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Part 1 — CSS/Contrast/Accessibility:
- Raise --text-muted contrast to WCAG AA (#696969 light, #9a9a9a dark)
- Add body font-size: 16px baseline
- Enlarge #themeToggle to 15px / 10px 20px padding
Part 2 — Start Button (user-controlled analysis):
- Upload no longer auto-starts check; shows ready state with filename/size
- New showReadyState() / removeFile() functions in upload.js
- beginCheck() now shows progress + hides ready state on click
- Add prominent "Check Another PDF" button at bottom of results
Part 3 — Scoring recalibration:
- Replace deduction formula with check-pass ratio + soft penalty (cap 20)
- Fix run_check() to only examine issues added by the current check
- Add score_breakdown (per-check table) to JSON output + results UI
- Downgrade readability ERROR → WARNING (advisory, not hard failure)
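
One possible reading of "check-pass ratio + soft penalty (cap 20)" is sketched below. Only the pass ratio and the cap of 20 come from this commit; the severity weights, the rounding, and the function name are hypothetical and may differ from the real formula.

```python
from typing import Mapping, Optional


def accessibility_score(
    passed_checks: int,
    total_checks: int,
    severity_counts: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
    penalty_cap: float = 20,
) -> int:
    """Base score from the pass ratio, minus a capped severity penalty."""
    if weights is None:
        # Hypothetical weights, chosen only to make the example concrete.
        weights = {"critical": 5, "error": 2, "warning": 0.5}
    base = 100 * passed_checks / total_checks if total_checks else 0
    raw_penalty = sum(weights.get(sev, 0) * n for sev, n in severity_counts.items())
    penalty = min(penalty_cap, raw_penalty)  # soft penalty never exceeds the cap
    return max(0, round(base - penalty))
```

Because the penalty is capped, a document with many issues in a single failing check cannot be pushed to zero by issue count alone, unlike a pure deduction formula.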
Part 4 — Auto-fix debugging:
- Remediation failure now returns up to 2000 chars of log (was 500)
- pdf_remediation.py: stderr output, sys.exit(0/1), output dir creation
Part 5 — Error location: View on Page button on each issue card
Part 6 — Matterhorn Protocol PDF/UA-1:
- _build_matterhorn_summary() maps 19 checks → 31 checkpoints
- Matterhorn card in index.html with grouped PASS/FAIL/Not-tested table
- Correct M/H badges per checkpoint
Part 7 — Dismiss / False Positive:
- dismissed_issues table in db/init.sql + dismiss/undismiss in db_manager.py
- api.php: dismiss/undismiss endpoints (file-backed), dismissed_indices
injected into both handleStatus and handleResult responses
- results.js: dismissIssue/undismissIssue with visual strikethrough
- CSS: .dismissed, .btn-dismiss, .btn-undismiss styles
Part 8 — PDF Report (WeasyPrint):
- generate_pdf() in report_generator.py: PAC-style A4, Oliver branding
- api.php handleExport() supports format=pdf
- index.html: "PDF Report" download button in results header
- requirements.txt: weasyprint>=60.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove stale Redis/worker references, add Cloud Run and rate_limits
config. Comment out git pull section for manual control.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the Redis queue + Python worker daemon with a synchronous HTTP
call to a Cloud Run service, eliminating Redis and simplifying the
infrastructure from 4 containers (web, worker, redis, postgres) to just
web + postgres (with Cloud Run handling processing).
- Add cloudrun_service.py: Flask app wrapping EnterprisePDFChecker with
POST /check and GET /health endpoints, GCS image upload
- Add Dockerfile.cloudrun + requirements-cloudrun.txt for Cloud Run image
- Add cloudbuild.yaml for Cloud Build with custom Dockerfile
- Rewrite api.php: remove all Redis code, add Cloud Run OIDC auth
(getCloudRunToken), synchronous processing in handleCheck(), file-based
rate limiting, GCS redirect in handleImage(), DB helper updateJobInDatabase()
- Update js/upload.js: handle synchronous completed response from Cloud Run,
increase poll timeout to 15 minutes
- Update js/page-viewer.js: use GCS URLs directly for page images
- Simplify docker-compose.yml and docker-compose.prod.yml: remove worker
and redis services
- Remove PHP Redis extension from Dockerfile.web
- Set 900s timeouts across nginx, PHP-FPM, gunicorn, curl, and Cloud Run
- Update cleanup.py: remove result_images pattern (now on GCS), add
rate_limits cleanup
- Update .env.example: replace Redis vars with Cloud Run/GCS config
Cloud Run service deployed to:
https://pdf-checker-bcb6ipdqka-uc.a.run.app
GCS bucket: gs://optical-pdf-images (7-day lifecycle, public read)
GCP project: optical-414516
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Font: Outfit/Figtree → Montserrat
- Accent: coral #e8553d → Oliver yellow #FFC407 with black text
- Dark mode: neutral blacks instead of blue-tinted navy
- Fix score display, stat cards, and log entries contrast in dark mode
- Replace hardcoded light-mode colors in JS with CSS variables
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mount uploads/results at the same absolute path as host so
pdf_path stored by api.php matches inside the container.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server already has Redis and PostgreSQL running. Worker uses
network_mode: host to connect directly, no port conflicts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New Features Documented:
- API authentication with key-based access control
- Structured logging framework with rotation
- Automatic retry logic for API resilience
- Comprehensive test suite (31 tests, 34% coverage)
- veraPDF integration for PDF/UA validation
- Virtual environment setup instructions
Updated Sections:
- Core capabilities list with new features
- File structure with new modules
- Installation guide with venv approach
- Testing section with pytest instructions
- Security section with authentication details
- Production features comprehensive section
- Status table with completed features
- Quick start checklist with all steps
Status: 95% production-ready, all critical fixes complete.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
- Change hardcoded venv path to __DIR__ . '/venv/bin/python3'
- Makes the application portable across different installations
- Ensures Python dependencies from venv are used correctly
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
- Add isDevelopmentMode() function to check for localhost
- Allow localhost requests without API key in dev mode
- Enables web interface to work without auth configuration
- Production deployments still require API keys
This allows the web UI to function on localhost:8000 without
requiring developers to configure API keys for local testing.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>