- docling removed: PDF now parsed by PyMuPDF (fitz), PPTX by python-pptx
- layoutparser removed: already optional with graceful fallback (returns [])
- torch/pytorch index removed: no longer needed by any dependency
- pymupdf added: ~20MB wheel, no ML deps, faster than docling for text extraction
- All existing DOCX parsing kept (python-docx, already working)
- extract_text_from_image_via_vision() unchanged (Gemini API)
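
For reference, the "optional with graceful fallback" behaviour noted for layoutparser can be sketched roughly as below. This is a minimal illustration and not the project's actual code: `load_optional`, `detect_layout`, and the `run_detector` callback are hypothetical names, and the real detection call is left to the caller.

```python
import importlib


def load_optional(module_name):
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def detect_layout(image, run_detector, module_name="layoutparser"):
    """Return layout blocks, or [] when the optional dependency is missing.

    `run_detector` is a hypothetical callback that receives the imported
    module and the image, and returns a list of detected blocks.
    """
    module = load_optional(module_name)
    if module is None:
        # Dependency not installed: degrade gracefully instead of raising.
        return []
    return run_detector(module, image)
```

With this shape, callers never need to check whether the dependency is installed; they just receive an empty list when it is absent.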

Result: api/worker Docker image is ~3-4 GB smaller, with no NVIDIA libraries on the CPU server

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>