Enhanced metadata_analyzer.py with production-ready capabilities:
- Token counting with tiktoken for accurate OpenAI usage tracking
- Exponential backoff retry logic with tenacity library
- Intelligent content truncation based on token limits (not characters)
- Configurable timeout and max retries from Config
- Graceful fallback when tiktoken/tenacity unavailable
- Enhanced error reporting with _ai_error and _tokens_used metadata
Integrated AI generation in web interface:
- AI analyzer lazy initialization in web_app.py
- Real content extraction and AI analysis in upload endpoint
- Error handling for insufficient content or API failures
- Token usage logging for monitoring and optimization
UI improvements for AI experience:
- Special loading message for AI processing (10-30s per file)
- Display token usage for AI-generated metadata
- Show AI errors prominently with helpful messages
- Filter internal metadata fields (_tokens_used, _ai_error) from forms
Dependencies leveraged:
- tiktoken: Proper OpenAI token counting (10x more accurate)
- tenacity: Exponential backoff retry (3 attempts, 2-10s delays)
- openai: Production timeout support (30s default)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added ExifTool integration to support 300+ file formats with improved
performance and unified API for metadata operations.
Changes:
- Added PyExifTool>=0.5.6 to requirements.txt
- Created comprehensive ExifTool setup guide (docs/EXIFTOOL_SETUP.md)
- Created ExifToolExtractor for reading metadata from images/video/PDF
- Created ExifToolUpdater for writing metadata to images/video/PDF
- Updated README with ExifTool installation instructions
ExifTool Benefits:
- Unified API for images, videos, PDFs (vs 5+ separate libraries)
- Support for 300+ formats (HEIC, RAW, MKV, and more)
- 10-60x faster batch operations with stay_open mode
- Better PDF metadata writing (current pypdf is read-only)
- Battle-tested tool with 20+ years of development
Architecture:
- Hybrid approach: ExifTool for images/video/PDF, Python libs for Office
- Graceful fallback if ExifTool not installed
- Automatic detection on startup with helpful messages
- Tag mapping from ExifTool tags to standard fields (title/subject/keywords)
Implementation follows existing extractor/updater patterns for consistency.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive startup checks in web_app.py:
- Check for Excel file existence with helpful error message
- Validate OpenAI API key availability (optional)
- Check ExifTool installation (optional)
- Display available metadata sources based on configuration
- Updated branding in startup messages
Benefits:
- Users see clear error messages for missing dependencies
- Easy troubleshooting of configuration issues
- Graceful degradation when optional features unavailable
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added tiktoken>=0.5.0 for proper token counting
- Added tenacity>=8.2.0 for retry logic with exponential backoff
- Added openai>=1.0.0 for AI metadata generation
- Updated Flask version specification
These dependencies enable:
- Accurate token usage tracking for OpenAI API calls
- Automatic retry on API failures with smart backoff
- Production-ready AI integration
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Updated application name to "Oliver Metadata Tool"
- Updated version to 3.0.0
- Added App Info constants to config.py (APP_NAME, APP_VERSION, APP_DESCRIPTION)
- Updated web interface (title, header, footer)
- Updated README with new branding and description
- Added AI configuration settings to config.py
- Added ExifTool check method to config.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added Flask web interface for batch metadata processing
- Added Excel-based metadata lookup (Celum ID mapping)
- Dual-sheet support: DSB (primary) and Medsurg (fallback)
- Unicode/hieroglyph support for CGA region (Chinese, Japanese, Korean)
- Multi-format support: PDF, images, Office docs, video
- OCR with multi-language support (Tesseract)
- Filename matching without extension (case-insensitive)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>