solventum-image-metadata/docs/EXIFTOOL_SETUP.md
SamoilenkoVadym ae19179752 Phase 1.4: ExifTool integration for enhanced metadata support
Added ExifTool integration to support 300+ file formats with improved
performance and unified API for metadata operations.

Changes:
- Added PyExifTool>=0.5.6 to requirements.txt
- Created comprehensive ExifTool setup guide (docs/EXIFTOOL_SETUP.md)
- Created ExifToolExtractor for reading metadata from images/video/PDF
- Created ExifToolUpdater for writing metadata to images/video/PDF
- Updated README with ExifTool installation instructions

ExifTool Benefits:
- Unified API for images, videos, PDFs (vs 5+ separate libraries)
- Support for 300+ formats (HEIC, RAW, MKV, and more)
- 10-60x faster batch operations with stay_open mode
- Better PDF metadata writing (current pypdf is read-only)
- Battle-tested tool with 20+ years of development

Architecture:
- Hybrid approach: ExifTool for images/video/PDF, Python libs for Office
- Graceful fallback if ExifTool not installed
- Automatic detection on startup with helpful messages
- Tag mapping from ExifTool tags to standard fields (title/subject/keywords)

Implementation follows existing extractor/updater patterns for consistency.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:26:01 +00:00


# ExifTool Setup Guide
ExifTool is a powerful command-line application for reading, writing, and editing metadata in a wide variety of files. Oliver Metadata Tool uses ExifTool to provide enhanced metadata support for 300+ file formats.
## Why ExifTool?
- **Unified API**: Single tool handles images, videos, PDFs, and more
- **300+ formats**: Support for virtually all media file types
- **Better performance**: Batch operations run 10-60x faster than spawning a new ExifTool process per file
- **Battle-tested**: 20+ years of development and widespread use
- **PDF writing support**: Can write PDF metadata (the project's current pypdf integration is read-only)
## Installation
### macOS
```bash
brew install exiftool
```
Verify installation:
```bash
exiftool -ver
# Should show version 12.15 or higher
```
### Linux (Ubuntu/Debian)
```bash
sudo apt-get update
sudo apt-get install libimage-exiftool-perl
```
Verify installation:
```bash
exiftool -ver
```
### Linux (Fedora/RHEL/CentOS)
```bash
sudo yum install perl-Image-ExifTool
```
### Windows
**Option 1: Chocolatey**
```powershell
choco install exiftool
```
**Option 2: Manual installation**
1. Download from https://exiftool.org/
2. Extract the `.zip` file
3. Rename `exiftool(-k).exe` to `exiftool.exe`
4. Add the directory to your PATH
Verify installation:
```powershell
exiftool -ver
```
## Verification
After installation, verify ExifTool is accessible:
```bash
# Check version
exiftool -ver
# Check location
which exiftool # macOS/Linux
where exiftool # Windows
# Test with a file
exiftool your-image.jpg
```
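
The same checks can be scripted from Python, which may be handy in a setup script or CI job. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `exiftool_version` are illustrative and not part of Oliver Metadata Tool, and the minimum version is taken from the macOS section of this guide.

```python
import shutil
import subprocess

# Minimum version mentioned in the macOS section of this guide.
MIN_VERSION = (12, 15)


def parse_version(text):
    """Turn ExifTool's 'MAJOR.MINOR' version string into a tuple of ints."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def exiftool_version(executable="exiftool"):
    """Return the installed version as a tuple, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Equivalent to running `exiftool -ver` in a terminal.
    result = subprocess.run(
        [path, "-ver"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = exiftool_version()
    if version is None:
        print("ExifTool not found on PATH")
    elif version < MIN_VERSION:
        print(f"ExifTool {version} is older than {MIN_VERSION}")
    else:
        print(f"ExifTool {version} OK")
```

Comparing versions as integer tuples avoids the pitfalls of comparing the raw strings.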
## What Oliver Metadata Tool Uses ExifTool For
### Supported Operations
1. **Images (JPEG, PNG, GIF, TIFF, HEIC, RAW formats)**
   - Read/write Title, Description, Keywords
   - Access EXIF, IPTC, XMP metadata
   - Support for camera metadata
2. **Videos (MP4, MOV, AVI, MKV)**
   - Read/write Title, Description, Keywords
   - QuickTime metadata support
   - Unified API across formats
3. **PDFs**
   - Read/write PDF metadata fields
   - Better than pypdf for metadata writing
   - Preserves document structure
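
Reading Title, Description, and Keywords across formats means mapping several group-prefixed ExifTool tags onto one standard field. The sketch below shows one way this could look; the priority lists in `FIELD_TAGS` and the function name `to_standard_fields` are assumptions for illustration, and the project's actual mapping may differ.

```python
# Hypothetical priority lists: the first tag present in a record wins.
FIELD_TAGS = {
    "title": ["XMP:Title", "IPTC:ObjectName", "PDF:Title", "QuickTime:Title"],
    "description": [
        "XMP:Description",
        "IPTC:Caption-Abstract",
        "EXIF:ImageDescription",
        "PDF:Subject",
    ],
    "keywords": ["XMP:Subject", "IPTC:Keywords", "PDF:Keywords"],
}


def to_standard_fields(record):
    """Map one ExifTool record (a dict of 'Group:Tag' keys) to standard fields."""
    fields = {}
    for field, candidates in FIELD_TAGS.items():
        for tag in candidates:
            value = record.get(tag)
            if value in (None, "", []):
                continue  # tag missing or empty, try the next candidate
            if field == "keywords" and isinstance(value, str):
                # A single keyword comes back as a plain string; normalize to a list.
                value = [value]
            fields[field] = value
            break
    return fields


record = {
    "SourceFile": "a.jpg",
    "IPTC:ObjectName": "Sunset",
    "EXIF:ImageDescription": "Beach at dusk",
    "XMP:Subject": "sea",
}
print(to_standard_fields(record))
```

Fields with no matching tag are simply omitted, so callers can tell "not set" apart from an empty value.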
### Format Coverage
ExifTool provides support for these additional formats beyond Python libraries:
- **Images**: HEIC, CR2, NEF, ARW, DNG (RAW formats)
- **Video**: MKV, WebM, FLV, WMV (extended video formats)
- **Audio**: MP3, FLAC, WAV, OGG (audio files)
- **Documents**: EPUB, MOBI (ebook formats)
- **3D/CAD**: STL, DWG, DXF
- And 250+ more formats
## PyExifTool Python Wrapper
Oliver Metadata Tool uses the PyExifTool library to interact with ExifTool from Python:
```python
from exiftool import ExifToolHelper

# Read metadata
with ExifToolHelper() as et:
    metadata = et.get_metadata(["image.jpg"])
    print(metadata[0])

# Write metadata
with ExifToolHelper() as et:
    et.set_tags(
        ["image.jpg"],
        tags={"EXIF:ImageDescription": "New Title"},
        params=["-overwrite_original"]
    )
```
### Batch Mode Performance
PyExifTool uses ExifTool's `-stay_open` mode, which keeps one ExifTool process running for multiple operations:
- **Single file operations**: ~50-100ms overhead
- **Batch operations (100 files)**: 10-60x faster than spawning new processes
- **Memory efficient**: One process handles all operations
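
To illustrate why one process can serve many operations, the sketch below encodes requests the way the ExifTool documentation describes for a process started with `exiftool -stay_open True -@ -`: one argument per line, terminated by `-execute`. PyExifTool handles this internally; the function names here are hypothetical and this is not PyExifTool's actual implementation.

```python
# Sent once at the end to let the long-running process exit.
SHUTDOWN = b"-stay_open\nFalse\n"


def stay_open_request(args, seq=None):
    """Encode one command for an ExifTool process reading arguments from stdin.

    Arguments go one per line; `-execute` (optionally with a number) tells
    ExifTool to run them and then wait for the next command.
    """
    terminator = "-execute" if seq is None else f"-execute{seq}"
    lines = [*args, terminator]
    return ("\n".join(lines) + "\n").encode("utf-8")


def ready_marker(seq=None):
    """Marker ExifTool prints on stdout once the command has finished."""
    return "{ready}" if seq is None else f"{{ready{seq}}}"


print(stay_open_request(["-json", "image.jpg"], seq=1))
print(ready_marker(1))
```

Because the process stays alive between requests, the interpreter startup cost is paid once rather than once per file.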
## Troubleshooting
### ExifTool not found
**Error:** `ExifTool not found` or `exiftool command not available`
**Solution:**
1. Install ExifTool using the instructions above
2. Restart your terminal/command prompt
3. Verify with `exiftool -ver`
4. If still not found, check your PATH environment variable
### Permission denied
**Error:** `Permission denied when executing exiftool`
**Solution (macOS/Linux):**
```bash
chmod +x /path/to/exiftool
```
### PyExifTool import error
**Error:** `ModuleNotFoundError: No module named 'exiftool'`
**Solution:**
```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "PyExifTool>=0.5.6"
```
### Encoding issues with Unicode filenames
ExifTool handles Unicode filenames natively. If you encounter issues:
1. Ensure your terminal supports UTF-8
2. Use the PyExifTool wrapper (handles encoding automatically)
3. Check that the file system supports Unicode filenames
## Performance Tips
### Use batch mode for multiple files
```python
from exiftool import ExifToolHelper

# Good: Process multiple files in one batch
with ExifToolHelper() as et:
    et.set_tags(
        ["file1.jpg", "file2.jpg", "file3.jpg"],
        tags={"EXIF:ImageDescription": "Title"},
        params=["-overwrite_original"]
    )

# Avoid: Processing files one at a time (starts a new ExifTool process per file)
for file in files:
    with ExifToolHelper() as et:
        et.set_tags([file], tags={...})
```
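
For very large file lists, it can help to keep a single `ExifToolHelper` open and feed it files in chunks, so that an error affects only one chunk and progress can be reported between chunks. This is a minimal sketch; `chunked`, `tag_in_batches`, and the default chunk size of 100 are illustrative choices rather than project code.

```python
from itertools import islice


def chunked(paths, size):
    """Yield successive lists of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(paths)
    while batch := list(islice(iterator, size)):
        yield batch


def tag_in_batches(paths, tags, size=100):
    """Write `tags` to every path using one ExifTool process for the whole run."""
    # Imported lazily: requires PyExifTool and an ExifTool installation.
    from exiftool import ExifToolHelper

    with ExifToolHelper() as et:
        for batch in chunked(paths, size):
            et.set_tags(batch, tags=tags, params=["-overwrite_original"])


print(list(chunked(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2)))
```

The `with` block wraps the loop, not the other way around, which is what keeps the process count at one.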
### Use specific tag names
```python
# Good: Specific tag queries
et.get_tags(["image.jpg"], tags=["EXIF:ImageDescription", "XMP:Title"])
# Slower: Extract all tags
et.get_metadata(["image.jpg"]) # Returns 100+ tags
```
### Skip unnecessary tags with -fast
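For read-only operations where you only need basic metadata, `-fast` stops ExifTool from scanning to the end of some files (for example, JPEG trailers) in search of additional metadata: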
```python
et.execute("-fast", "-json", "image.jpg")
```
## Integration with Oliver Metadata Tool
Oliver Metadata Tool automatically detects ExifTool and uses it when available:
1. **On startup**: Checks for ExifTool installation
2. **Hybrid approach**: Uses ExifTool for images/video/PDF, Python libraries for Office docs
3. **Graceful fallback**: Falls back to pure Python if ExifTool unavailable
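
The three steps above can be sketched as a small dispatch function. The extension sets, the backend labels, and the names `detect_exiftool` and `choose_backend` are assumptions made for illustration; the tool's real detection logic lives in its own configuration code.

```python
import shutil
from pathlib import Path

# Hypothetical extension sets, based on the formats listed in this guide.
EXIFTOOL_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".tiff", ".heic",
    ".mp4", ".mov", ".avi", ".mkv", ".pdf",
}
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def detect_exiftool():
    """Startup check: is an `exiftool` executable on PATH?"""
    return shutil.which("exiftool") is not None


def choose_backend(path, exiftool_available):
    """Return 'exiftool' or 'python' for a given file."""
    suffix = Path(path).suffix.lower()
    if suffix in OFFICE_EXTENSIONS:
        return "python"  # Office documents always use Python libraries
    if suffix in EXIFTOOL_EXTENSIONS and exiftool_available:
        return "exiftool"
    return "python"  # graceful fallback when ExifTool is unavailable


available = detect_exiftool()
for name in ["photo.HEIC", "clip.mov", "report.docx"]:
    print(name, "->", choose_backend(name, available))
```

Passing the availability flag in as an argument, rather than checking inside `choose_backend`, keeps the detection to a single startup check.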
### Check ExifTool status
```python
from src.config import Config

if Config.check_exiftool():
    print("ExifTool available")
else:
    print("Using Python libraries")
```
## References
- [ExifTool Official Website](https://exiftool.org/)
- [ExifTool Documentation](https://exiftool.org/exiftool_pod.html)
- [PyExifTool GitHub](https://github.com/sylikc/pyexiftool)
- [PyExifTool Documentation](https://sylikc.github.io/pyexiftool/)
- [Supported File Types](https://exiftool.org/#supported)
- [Tag Names Reference](https://exiftool.org/TagNames/)
## License
ExifTool is free software licensed under the Perl Artistic License or GPL version 1 or later.