Third-Party Tool Integration Options
Executive Summary
Instead of building screen reader and keyboard testing from scratch, here are the best tools to integrate, ranked by value, cost, and ease of integration.
🏆 Top Recommendations (Best ROI)
1. veraPDF - FREE ✅ BEST OPTION
What it is: Open-source PDF/UA validation engine
License: GPL/MPL (free for commercial use)
Language: Java (has CLI)
What it adds to our tool:
- ✅ Complete PDF/UA (ISO 14289) validation
- ✅ Structure tree validation (headings, reading order)
- ✅ Tag hierarchy checking
- ✅ Accessibility tree inspection
- ✅ Tagged-content checks (whether the reading order is logical still needs human review)
- ✅ Semantic structure validation
- ✅ FREE - no API costs!
Integration method:
```python
import json
import subprocess

# Call veraPDF CLI from Python
result = subprocess.run([
    'verapdf',
    '--flavour', 'ua1',  # PDF/UA standard
    '--format', 'json',
    pdf_file
], capture_output=True, text=True)
validation_results = json.loads(result.stdout)
```
What we get (a simplified summary after parsing; veraPDF's raw JSON report is more detailed):
```json
{
  "compliant": false,
  "errors": [
    "Figure element missing alt text on page 3",
    "Heading hierarchy skip: H1 to H3 without H2",
    "Table missing TH elements for headers",
    "Content on page 4 is neither tagged nor marked as an artifact"
  ]
}
```
Effort to integrate: 1-2 days
Cost: $0 (open source)
Value: ⭐⭐⭐⭐⭐ (Adds 30-40% more coverage)
Website: https://verapdf.org/
GitHub: https://github.com/veraPDF/veraPDF-library
2. PAC (PDF Accessibility Checker) - FREE ⚠️ GOOD BUT LIMITED
What it is: Free PDF/UA checker by the PDF/UA Foundation
License: Free (closed source)
Platform: Windows only (no CLI, has GUI)
What it adds:
- ✅ PDF/UA validation
- ✅ Screen reader preview mode
- ✅ Tag structure viewer
- ✅ Reading order checker
- ⚠️ Windows only
- ⚠️ No API/CLI (GUI only)
Integration challenges:
- ❌ No command-line interface
- ❌ No API
- ❌ Must automate GUI (fragile)
- ❌ Windows-only (you're on Mac)
Effort to integrate: 1-2 weeks (GUI automation)
Cost: $0
Value: ⭐⭐ (Not worth automation effort)
Recommendation: Use manually, don't integrate
Website: https://pdfua.foundation/en/pdf-accessibility-checker-pac
3. PDFix SDK - COMMERCIAL 💰 POWERFUL BUT EXPENSIVE
What it is: Commercial SDK for PDF accessibility and remediation
License: Commercial ($)
Language: C++ with Python bindings
What it adds:
- ✅ Full structure tree parsing
- ✅ Reading order detection
- ✅ Auto-tagging capabilities
- ✅ Tag editing/remediation
- ✅ Accessibility API
- ✅ Cross-platform (Mac, Windows, Linux)
Pricing:
- Startup: $499/month
- Professional: $999/month
- Enterprise: $2,499/month
Integration method:
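```python
# Illustrative only: module and method names are not verified;
# check them against the PDFix SDK's Python documentation.
import pdfix

# Initialize
pdfix_lib = pdfix.GetPdfix()
doc = pdfix_lib.OpenDoc(pdf_path)

# Get accessibility (structure) tree
struct_tree = doc.GetStructTree()
for element in struct_tree.GetChildren():
    print(f"{element.GetType()}: {element.GetTitle()}")
```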
Effort to integrate: 3-5 days
Cost: $500-2,500/month
Value: ⭐⭐⭐⭐ (Very powerful but expensive)
Website: https://pdfix.net/
4. axe-core (Deque Systems) - FREE/COMMERCIAL ❌ NOT FOR PDFs
What it is: Leading web accessibility testing library
License: MPL 2.0 (free) + commercial support
Why it doesn't work:
- ❌ Designed for HTML/web, not PDFs
- ❌ Can't parse PDF structure
- ❌ Can't test PDF-specific issues
Recommendation: Great for web apps, not applicable here
5. Adobe Acrobat Pro SDK - COMMERCIAL 💰 POSSIBLE BUT COMPLEX
What it is: Adobe's official PDF SDK
License: Commercial (complex licensing)
Language: C++ (with COM interfaces)
What it could add:
- ✅ Full accessibility checking
- ✅ Tag tree manipulation
- ✅ Reading order validation
- ✅ Industry standard (Adobe is the authority)
Problems:
- 💰 Expensive licensing (~$10K+ setup)
- 🔧 Complex integration (C++ COM interfaces)
- 📚 Steep learning curve
- ⚠️ Requires Acrobat Pro installation
- 🐌 Slow (launches full Acrobat)
Effort to integrate: 4-6 weeks
Cost: $10K+ license + dev time
Value: ⭐⭐⭐ (Powerful but overkill)
Recommendation: Only for enterprise clients with budget
6. NVDA API Integration - FREE ⚠️ WINDOWS ONLY
What it is: Open-source screen reader, written in Python, with a controller client API for external programs
License: GPL (free)
Platform: Windows only
What it could do:
- ✅ Actually run NVDA programmatically
- ✅ Capture screen reader output
- ✅ Test real SR behavior
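Integration approach (pseudocode):
```python
# Pseudocode (Windows only): the `nvdaController` module and
# `getLastSpokenText()` are hypothetical. NVDA's controller client can
# send text to be spoken, but reading spoken output back would need an
# NVDA add-on or a speech log.
import nvdaController
nvdaController.speakText("Test")
output = nvdaController.getLastSpokenText()
```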
Problems:
- ❌ Windows only (you're on Mac)
- ❌ Requires NVDA installed on server
- ❌ GUI automation (flaky)
- ❌ Slow (1-2 minutes per PDF)
- ❌ Can't run headless
Effort to integrate: 2-3 weeks
Cost: $0
Value: ⭐⭐ (Platform limited)
Recommendation: Not worth it for Mac-based system
📊 Comparison Matrix
| Tool | Cost | Effort | Value | Platform | API | Our Use Case |
|---|---|---|---|---|---|---|
| veraPDF | $0 | 2 days | ⭐⭐⭐⭐⭐ | All | CLI ✅ | BEST - Add structure validation |
| PAC | $0 | 2 weeks | ⭐⭐ | Windows | No ❌ | Skip - manual only |
| PDFix SDK | $500-2.5K/mo | 5 days | ⭐⭐⭐⭐ | All | Yes ✅ | Good if budget allows |
| Acrobat SDK | $10K+ | 6 weeks | ⭐⭐⭐ | All | COM | Overkill |
| NVDA API | $0 | 3 weeks | ⭐⭐ | Windows | Limited | Skip - wrong platform |
| axe-core | $0 | N/A | N/A | Web | N/A | Not for PDFs |
🎯 My Strong Recommendation: veraPDF
Why veraPDF is Perfect:
1. It's FREE and Open Source
- No licensing costs
- Active community
- Well-maintained
- Industry standard for PDF/UA
2. Excellent Coverage
- ✅ Structure tree validation
- ✅ Heading hierarchy checking
- ✅ Tagged-content checks (not whether the reading order is logical)
- ✅ Tag structure correctness
- ✅ Table header validation
- ✅ Alt text presence (not quality)
- ✅ Form field labels
3. Easy Integration
- Simple CLI interface
- JSON output (parse easily)
- Works on Mac, Windows, Linux
- No GUI needed (headless)
- Fast (a few seconds per typical PDF)
4. Fills Our Gaps
- Our tool checks: Images (AI), Contrast, Readability, OCR
- veraPDF checks: Structure, Tags, PDF/UA compliance
Together = roughly 55-60% estimated WCAG coverage!
🚀 Integration Plan: veraPDF
Step 1: Install veraPDF (5 minutes)
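```shell
# Mac (Homebrew, if a verapdf formula is available for your setup)
brew install verapdf
# Or download the installer from the website (curl ships with macOS; wget does not)
curl -LO https://software.verapdf.org/releases/verapdf-installer.zip
unzip verapdf-installer.zip
cd verapdf-greenfield-*   # extracted folder name includes the version number
./verapdf-install
```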
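Once installed, the web app should fail fast if the CLI isn't on `PATH` (on a fresh server, for example) rather than erroring on the first upload. A minimal standard-library sketch; the function names `verapdf_available` and `require_verapdf` are our own, not part of veraPDF:

```python
import shutil
from typing import Optional


def verapdf_available(executable: str = "verapdf") -> Optional[str]:
    """Return the full path to the veraPDF CLI, or None if it isn't on PATH."""
    return shutil.which(executable)


def require_verapdf(executable: str = "verapdf") -> str:
    """Raise a clear error at startup instead of failing on the first upload."""
    path = verapdf_available(executable)
    if path is None:
        raise RuntimeError(
            f"'{executable}' not found on PATH; install veraPDF or add its "
            "install directory to PATH"
        )
    return path
```

Calling `require_verapdf()` once when the app starts surfaces a missing install immediately.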
Step 2: Test It (5 minutes)
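```shell
# Run validation
verapdf --flavour ua1 --format json test.pdf > validation.json
# Check output; the "compliant" key's position varies between veraPDF
# versions, so search the whole report for it
jq '.. | objects | select(has("compliant")) | .compliant' validation.json
```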
Step 3: Integrate into Python (2 hours)
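```python
import json
import subprocess
from typing import Any, Dict, Iterator


def _walk(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict nested anywhere in the parsed JSON report."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def run_verapdf_validation(pdf_path: str) -> Dict:
    """Run veraPDF validation and parse results.

    Assumption: the key names below ('compliant', 'status', 'clause',
    'description', 'page') must be checked against the JSON your veraPDF
    version emits; nesting differs between releases, so the report is
    searched recursively instead of by a fixed path.
    """
    result = subprocess.run([
        'verapdf',
        '--flavour', 'ua1',  # PDF/UA-1 standard
        '--format', 'json',
        pdf_path
    ], capture_output=True, text=True, timeout=30)
    # Parse stdout regardless of the exit code, which may be non-zero
    # for non-compliant files
    data = json.loads(result.stdout)
    nodes = list(_walk(data))

    # Compliant only if at least one result reports it and none say otherwise
    flags = [bool(n['compliant']) for n in nodes if 'compliant' in n]
    is_compliant = bool(flags) and all(flags)

    # Parse validation results: failed rule summaries carry a clause number
    validation_errors = []
    for node in nodes:
        if str(node.get('status', '')).lower() == 'failed' and 'clause' in node:
            validation_errors.append({
                'clause': node['clause'],
                'description': node.get('description', ''),
                'page': node.get('page', None)
            })

    return {
        'compliant': is_compliant,
        'errors': validation_errors,
        'total_errors': len(validation_errors)
    }
```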
Step 4: Add to Web Interface (4 hours)
```javascript
// Add new section to results
if (data.verapdf_results) {
  html += `
    <div class="card">
      <h2>📋 PDF/UA Validation (veraPDF)</h2>
      <div>
        Compliance: ${data.verapdf_results.compliant ? '✅ PASS' : '❌ FAIL'}
      </div>
      <div>
        ${data.verapdf_results.errors.map(error => `
          <div class="issue ERROR">
            ${error.description}
            <div>Clause: ${error.clause}</div>
          </div>
        `).join('')}
      </div>
    </div>
  `;
}
```
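One caution on the snippet above: it interpolates veraPDF's rule descriptions straight into HTML, and those strings may contain characters such as `<` or `&` (structure tag names, for instance). If the card is built server-side instead, here is a minimal Python sketch that escapes every value with the standard library's `html.escape`; the function name `render_verapdf_card` is hypothetical and the markup mirrors the JavaScript version:

```python
from html import escape
from typing import Any, Dict


def render_verapdf_card(results: Dict[str, Any]) -> str:
    """Build the PDF/UA results card, escaping every value taken from veraPDF."""
    status = "✅ PASS" if results.get("compliant") else "❌ FAIL"
    issues = "".join(
        '<div class="issue ERROR">'
        f"{escape(str(err.get('description', '')))}"
        f"<div>Clause: {escape(str(err.get('clause', '')))}</div>"
        "</div>"
        for err in results.get("errors", [])
    )
    return (
        '<div class="card">'
        "<h2>📋 PDF/UA Validation (veraPDF)</h2>"
        f"<div>Compliance: {status}</div>"
        f"<div>{issues}</div>"
        "</div>"
    )
```

The same idea applies client-side: escape `error.description` and `error.clause` before inserting them into the template string.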
Step 5: Update Scoring (1 hour)
```python
# Add veraPDF errors to scoring: each PDF/UA error = -5 points,
# clamped so the score never goes negative
score = max(0, score - verapdf_error_count * 5)
```
Total integration time: 1 day
Cost: $0
Value added: +30-40% more issues detected!
📋 What veraPDF Catches That We Don't
Structure Issues:
- ✅ Heading hierarchy skips (H1 → H3 without H2)
- ✅ Missing alt text in structure tree (we suggest, it validates)
- ✅ Table headers not properly marked
- ✅ List structure incorrect
- ✅ Reading order undefined
- ✅ Required tags missing
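The heading-skip rule is easy to state in code. This is a sketch for illustration only (veraPDF applies its own PDF/UA rule; this is not its implementation), assuming we already have the heading tags `H1`-`H6` in structure-tree order. Going back up (H3 → H1) is allowed; going down must be one level at a time:

```python
import re
from typing import List, Tuple


def find_heading_skips(tags: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, previous_heading, heading) wherever a heading jumps
    down more than one level. A document starting below H1 is also reported."""
    skips = []
    previous_level = 0          # "level" before the first heading
    previous_tag = "(start)"
    for i, tag in enumerate(tags):
        match = re.fullmatch(r"H([1-6])", tag)
        if not match:
            continue            # ignore non-heading tags such as P or Figure
        level = int(match.group(1))
        if level > previous_level + 1:
            skips.append((i, previous_tag, tag))
        previous_level, previous_tag = level, tag
    return skips


print(find_heading_skips(["H1", "P", "H3", "H2", "H1", "H2"]))
# [(2, 'H1', 'H3')]
```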
Technical Issues:
- ✅ PDF/UA compliance violations
- ✅ Incorrect tag nesting
- ✅ Missing role mappings
- ✅ Artifact tagging errors
- ✅ Structure tree corruption
Form Issues:
- ✅ Form fields missing TU (tooltip): we check this too, but veraPDF is more thorough
- ✅ Form field role errors
- ✅ Form not in tab order
💰 Alternative: Commercial Options (If Budget Exists)
PDFix SDK - $499/month (Best Commercial Option)
When to use:
- Need auto-remediation (fix issues automatically)
- Want to tag untagged PDFs
- Need structure tree editing
- Have budget for enterprise solution
What you get:
- Everything veraPDF has
- PLUS: Auto-tagging
- PLUS: Remediation tools
- PLUS: Structure editing API
- PLUS: Commercial support
ROI Calculation:
Cost: $500/month = $6K/year
Benefit: Auto-tag PDFs (saves 30 min per PDF @ $50/hr = $25/PDF)
Break-even: 240 PDFs/year (20/month)
If processing >20 PDFs/month → worth it
If processing <20 PDFs/month → use veraPDF free
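The break-even arithmetic above can be captured as a small helper, so the numbers are easy to re-run when pricing or volume changes. The defaults are the figures quoted in this section:

```python
def break_even_pdfs_per_month(monthly_cost: float = 500.0,
                              minutes_saved_per_pdf: float = 30.0,
                              hourly_rate: float = 50.0) -> float:
    """PDFs per month at which time saved by auto-tagging pays for the license."""
    saving_per_pdf = minutes_saved_per_pdf / 60.0 * hourly_rate  # $25 with the defaults
    return monthly_cost / saving_per_pdf


print(break_even_pdfs_per_month())       # 20.0 PDFs/month (240/year)
print(break_even_pdfs_per_month(999.0))  # 39.96, i.e. ~40 PDFs/month at the Professional tier
```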
CommonLook PDF - $1,295/year
What it is: Desktop PDF remediation software with API
Platform: Windows only
What it adds:
- ✅ Visual tag editor
- ✅ Reading order tool
- ✅ Auto-tagging
- ✅ Batch processing
- ⚠️ GUI-based (harder to integrate)
- ⚠️ Windows only
Integration: Medium (2-3 weeks via GUI automation)
Value: ⭐⭐⭐ (Good for manual workflow, not automated)
Website: https://commonlook.com/
Adobe Acrobat Pro DC - $239.88/year
What it is: Industry-standard PDF editor
API: Limited (PDF Services API available)
What it adds:
- ✅ Full accessibility checker
- ✅ Reading order tool
- ✅ Tag editor
- ✅ Most trusted solution
- ⚠️ Expensive at scale
- ⚠️ GUI-based
- ⚠️ Slow to automate
Integration: Complex (GUI automation or paid API)
Cost: $20/month + API costs
Value: ⭐⭐⭐ (Great manually, hard to automate)
🔧 For Keyboard/Focus Testing
No Good Automated Options Exist
Why:
- Keyboard behavior is interactive (requires PDF reader)
- Each PDF reader handles keyboard differently
- Must test in actual application
- Automation is brittle and slow
Best approach:
- ✅ Check tab order programmatically (we can build this - 1 day)
- ✅ Validate focus indicators exist (check PDF structure)
- ❌ Manual testing for actual keyboard navigation (15 minutes per PDF)
Recommendation: Document keyboard test procedure, don't automate
📊 Integration Priority Ranking
Tier 1: Integrate NOW (High Value, Low Cost)
1. veraPDF - FREE ⭐⭐⭐⭐⭐
- Time: 1 day integration
- Cost: $0
- Value: +30-40% coverage (estimated)
- Status: STRONGLY RECOMMEND
2. Build Tab Order Validator ⭐⭐⭐⭐
- Time: 1 day
- Cost: $0
- Value: Catches common form issues
- Status: RECOMMEND
Tier 2: Consider if Budget Allows
3. PDFix SDK - $499/month ⭐⭐⭐⭐
- When: Processing >20 PDFs/month
- Why: Auto-remediation saves time
- ROI: Positive if volume is high
Tier 3: Skip (Not Worth It)
4. PAC - Free but no API
- Use manually for verification
- Don't integrate (GUI automation not worth it)
5. Adobe Acrobat SDK - Too expensive/complex
- $10K+ setup
- 6+ weeks integration
- Use Acrobat manually instead
6. NVDA/JAWS APIs - Platform specific
- Won't work on Mac
- Slow and brittle
- Manual testing better
🎯 My Recommended Integration Stack
Phase 1: Add veraPDF (Week 1)
What we build:
```python
def enhanced_check(pdf_path):
    # Our existing checks (assumed to return a dict with an 'issues' list)
    our_results = run_our_checks(pdf_path)
    # Add veraPDF validation
    verapdf_results = run_verapdf_validation(pdf_path)
    # Merge results
    combined_score = calculate_combined_score(our_results, verapdf_results)
    return {
        'our_checks': our_results,
        'structure_validation': verapdf_results,
        'combined_score': combined_score,
        # Both results are dicts, so use key access rather than attributes
        'total_issues': our_results['issues'] + verapdf_results['errors']
    }
```
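`calculate_combined_score` isn't defined above. A minimal sketch that applies the Step 5 rule (-5 points per PDF/UA error), assuming, hypothetically, that `our_results['score']` holds our existing 0-100 score:

```python
from typing import Any, Dict


def calculate_combined_score(our_results: Dict[str, Any],
                             verapdf_results: Dict[str, Any],
                             penalty_per_error: int = 5) -> int:
    """Our score minus a fixed penalty per failed veraPDF rule, kept in 0-100."""
    # Hypothetical: assumes our_results["score"] exists and is on a 0-100 scale
    score = our_results["score"] - penalty_per_error * verapdf_results["total_errors"]
    return max(0, min(100, score))
```

Because the veraPDF parser collects one entry per failed rule rather than per failing object, a single problem repeated on every page costs 5 points once instead of zeroing the score.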
New web interface section:
```text
╔═══════════════════════════════════════════╗
║ PDF/UA Structure Validation (veraPDF)     ║
╠═══════════════════════════════════════════╣
║ ❌ Not PDF/UA-1 compliant                 ║
║                                           ║
║ Structure Issues Found: 5                 ║
║ ├─ ❌ Heading skip: H1 → H3 on page 2     ║
║ ├─ ❌ Table missing headers on page 5     ║
║ ├─ ⚠️ Figure #3 missing alt text          ║
║ ├─ ⚠️ Reading order not set (page 8)      ║
║ └─ ℹ️ List not marked as <L> element      ║
╚═══════════════════════════════════════════╝
```
Benefits:
- Free
- Fast (a few seconds per typical PDF)
- Catches structure issues we miss
- Industry-standard validation
- Easy to integrate
Phase 2: Build Tab Order Validator (Week 2)
What we build:
```python
def check_tab_order(pdf):
    """Validate form field tab order.

    extract_form_fields, group_by_page and sort_by_visual_position are
    helpers still to be written; each field has .x, .y, .name, .tab_index.
    """
    fields = extract_form_fields(pdf)
    issues = []
    for page_num, page_fields in group_by_page(fields):
        # Get visual positions
        positions = [(f.x, f.y, f.name) for f in page_fields]
        # Get tab order
        tab_order = [f.tab_index for f in page_fields]
        # Check for issues (0 is a valid index, so test for None explicitly)
        if any(t is None for t in tab_order):
            issues.append(f"Page {page_num}: Some fields missing tab order")
            continue
        # Check if tab order matches visual order (top-to-bottom, left-to-right);
        # compare field names so both sides hold the same kind of value
        expected_order = sort_by_visual_position(positions)  # -> list of names
        actual_order = [f.name for f in sorted(page_fields, key=lambda f: f.tab_index)]
        if expected_order != actual_order:
            issues.append(f"Page {page_num}: Tab order doesn't match visual layout")
    return issues
```
Value: Catches common form accessibility issues
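Most of the subtlety sits in `sort_by_visual_position`: PDF page coordinates put y = 0 at the bottom, so "top-to-bottom" means descending y, and fields on the same visual row rarely share an exact y. A self-contained sketch of that comparison, assuming each field has already been reduced to a name, the lower-left corner of its widget rectangle, and its position in the page's tab sequence. The `Field` type and the 5-point row tolerance are our own choices, not from any library:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    name: str
    x: float        # left edge of the widget rectangle, in PDF points
    y: float        # bottom edge; PDF y grows upward from the page bottom
    tab_index: int  # position in the page's tab sequence (0-based)


def visual_order(fields: List[Field], row_tolerance: float = 5.0) -> List[str]:
    """Names sorted top-to-bottom, then left-to-right within a row."""
    ordered = sorted(fields, key=lambda f: -f.y)  # highest on the page first
    rows: List[List[Field]] = []
    for field in ordered:
        # Same row if within tolerance of the row's first (highest) field
        if rows and abs(rows[-1][0].y - field.y) <= row_tolerance:
            rows[-1].append(field)
        else:
            rows.append([field])
    return [f.name for row in rows for f in sorted(row, key=lambda f: f.x)]


def tab_order_matches(fields: List[Field]) -> bool:
    """True when the tab sequence visits fields in visual reading order."""
    tab_order = [f.name for f in sorted(fields, key=lambda f: f.tab_index)]
    return tab_order == visual_order(fields)
```

For a form with first and last name side by side (y = 700 and 702) above an email field (y = 650), `visual_order` gives `["first_name", "last_name", "email"]` despite the 2-point misalignment.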
💡 What This Achieves
Coverage After Integration:
| Check Type | Before | After veraPDF | After Tab Order |
|---|---|---|---|
| Our Checks | 24% | 24% | 24% |
| Structure (veraPDF) | 0% | +30% | +30% |
| Tab Order | 0% | 0% | +5% |
| TOTAL COVERAGE | 24% | 54% | 59% |
What Still Requires Manual:
- ❌ Alt text quality (is it accurate?)
- ❌ Content clarity (is text understandable?)
- ❌ Actual keyboard testing (does Tab work?)
- ❌ Screen reader testing (does it sound right?)
- ❌ Subjective judgment (is this appropriate?)
= 41% still requires human review
💰 Cost Analysis
Option A: veraPDF Only (FREE)
- Integration time: 1-2 days
- Ongoing cost: $0
- Coverage: 24% → 54% (+30%)
- ROI: EXCELLENT
Option B: veraPDF + Tab Order (FREE)
- Integration time: 2-3 days
- Ongoing cost: $0
- Coverage: 24% → 59% (+35%)
- ROI: EXCELLENT
Option C: veraPDF + PDFix SDK ($500/mo)
- Integration time: 1 week
- Ongoing cost: $6K/year
- Coverage: 24% → 65% (+41%)
- ROI: Good if processing >20 PDFs/month
Option D: Build Screen Reader Simulator (FREE)
- Development time: 3-4 days
- Ongoing cost: $0
- Coverage: 24% → 35% (+11% - reading order preview)
- ROI: Good for UX, medium for coverage
🏆 Final Recommendation
Implement This Week:
1. Integrate veraPDF (1-2 days) - FREE ✅
- Adds structure tree validation
- PDF/UA compliance checking
- Heading hierarchy validation
- Tagged-content checks (logical reading order still needs a human)
- No brainer - do this!
2. Build Tab Order Validator (1 day) - FREE ✅
- Check form field tab indices
- Detect illogical tab sequences
- Quick win for form-heavy PDFs
- Worth building
Consider Later:
3. Build Screen Reader Simulator (3-4 days) - FREE 🤔
- Shows what SR would announce
- Great UX feature
- Educational value
- Nice to have, not critical
4. PDFix SDK ($500/month) - PAID 💰
- Only if processing >20 PDFs/month
- Only if need auto-remediation
- Not needed yet
Don't Bother:
5. PAC Integration - Too hard to automate (GUI only)
6. Acrobat SDK - Too expensive and complex
7. NVDA API - Wrong platform (Windows only)
🎯 Action Plan
This Week:
- ✅ Integrate veraPDF (I can do this in 1-2 days)
- ✅ Build tab order validator (I can do this in 1 day)
Result:
- Coverage: 24% → 59% (+35%)
- Cost: $0
- Time: 3 days
- Huge value add!
Next Month:
3. 🤔 Consider building Screen Reader Simulator (optional)
4. 🤔 Evaluate PDFix SDK if volume increases
❓ What Should I Do?
Recommended approach:
Option A: Integrate veraPDF NOW ✅
- I can integrate it in 1-2 days
- FREE
- Massive coverage boost (+30%)
- Industry-standard validation
Option B: Wait and evaluate
- Keep tool as-is
- Use PAC/Acrobat manually for structure checks
Option C: Build Screen Reader Simulator
- 3-4 days development
- Great UX feature
- Medium coverage improvement
🚀 My Suggestion:
Let me integrate veraPDF this week!
It will add:
- ✅ Structure tree validation
- ✅ Heading hierarchy checking
- ✅ Tagged-content checks
- ✅ PDF/UA compliance
- ✅ Tag structure validation
- ✅ 30% more coverage
- ✅ $0 cost
Then we'll have ~60% estimated WCAG coverage, a strong automated baseline!
Want me to integrate veraPDF? It's the best bang-for-buck improvement we can make! 🎯