- Import langdetect with graceful fallback if not installed
- _check_language(): detect actual document language via langdetect on first 3 pages of text; store in self._detected_lang; warn when declared /Lang tag doesn't match detected language; suggest correct BCP-47 tag when missing
- _check_readability(): skip Flesch Reading Ease / Flesch-Kincaid (English-only formulas) for non-English documents; long-sentence check remains language-agnostic
- _check_links(): extend unclear-link patterns to Ukrainian, Russian, German, French, Spanish, and Polish
- requirements-cloudrun.txt: add langdetect>=1.0.9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
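The /Lang comparison and the readability gating described above can be sketched roughly as follows. This is an illustration, not the code from this commit: the helper names `primary_subtag`, `language_warning`, and `should_run_flesch` are hypothetical, the comparison uses only the primary language subtag, and running the Flesch formulas when no language was detected is an assumption.

```python
from typing import Optional


def primary_subtag(tag: Optional[str]) -> Optional[str]:
    """Return the lowercase primary subtag of a language tag ('en-US' -> 'en')."""
    if not tag or not tag.strip():
        return None
    return tag.strip().replace("_", "-").split("-")[0].lower()


def language_warning(declared: Optional[str], detected: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return a warning string, or None if nothing to report."""
    declared_primary = primary_subtag(declared)
    detected_primary = primary_subtag(detected)
    if detected_primary is None:
        # Detection unavailable or failed: nothing to compare against.
        return None
    if declared_primary is None:
        return f"Missing /Lang tag; detected text suggests '{detected_primary}'"
    if declared_primary != detected_primary:
        return (
            f"Declared /Lang '{declared}' does not match "
            f"detected language '{detected_primary}'"
        )
    return None


def should_run_flesch(detected: Optional[str]) -> bool:
    """Flesch formulas are English-only; assume English when detection is unknown."""
    return primary_subtag(detected) in (None, "en")


if __name__ == "__main__":
    print(language_warning("en-US", "uk"))
    print(language_warning(None, "de"))
    print(should_run_flesch("pl"))
```

Comparing primary subtags only means a document tagged `en-GB` is not flagged when the detector reports `en`; a stricter check would need region-aware detection, which langdetect does not provide for most languages.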
# Cloud Run PDF Accessibility Checker - Python Dependencies

# Core PDF processing
pypdf>=4.0.0
pdfplumber>=0.11.0

# Image processing
Pillow>=10.0.0
pdf2image>=1.16.0

# OCR
pytesseract>=0.3.10

# Scientific computing
numpy>=1.24.0

# NLP and readability
textblob>=0.17.1

# Google Cloud APIs
google-cloud-vision>=3.4.0
google-cloud-documentai>=2.20.0

# Anthropic Claude API
anthropic>=0.18.0

# Additional utilities
python-dotenv>=1.0.0

# Cloud Run specific
flask>=3.0.0
gunicorn>=21.2.0
google-cloud-storage>=2.14.0
langdetect>=1.0.9
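Because langdetect is imported with a graceful fallback, the checker should still run in environments where this requirement has not been installed. A minimal sketch of that pattern follows; the names `LANGDETECT_AVAILABLE` and `detect_language` are hypothetical and are not taken from the checker itself.

```python
from typing import Optional

try:
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic by default; a fixed seed makes results repeatable.
    DetectorFactory.seed = 0
    LANGDETECT_AVAILABLE = True
except ImportError:
    # Fallback: keep the names defined so the rest of the module still imports.
    LangDetectException = Exception
    LANGDETECT_AVAILABLE = False


def detect_language(text: str) -> Optional[str]:
    """Return a language code such as 'en' or 'uk', or None if detection is unavailable."""
    if not LANGDETECT_AVAILABLE or not text or not text.strip():
        return None
    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (for example digits only).
        return None


if __name__ == "__main__":
    print(detect_language("This document describes accessible PDF authoring."))
```

Returning `None` rather than raising lets callers treat "library missing" and "could not detect" the same way and simply skip the language-dependent checks.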