# Oliver Metadata Tool

Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface.

## Features

- **Excel-based metadata lookup**: Reads metadata from "Celum ID to Adobe Asset Path Mapping Spreadsheet"
- **Multi-format support**: PDF, images (JPG, PNG, etc.), Office documents (Word, Excel, PowerPoint), video files
- **Unicode support**: Full support for Chinese, Japanese, Korean characters (CGA region)
- **OCR capabilities**: Multi-language text extraction with Tesseract
- **Web interface**: Flask-based UI for easy batch processing
- **Dual-sheet Excel lookup**: Primary lookup from DSB sheet, fallback to Medsurg sheet

## Requirements

- Python 3.8+
- Tesseract OCR (for image text extraction)
- Poppler (for PDF processing)
- **ExifTool 12.15+** (recommended - enables 300+ file formats and improved performance)

## Installation

1. Install system dependencies:

   ```bash
   # macOS
   brew install tesseract tesseract-lang poppler exiftool

   # Linux (Ubuntu/Debian)
   sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils libimage-exiftool-perl
   ```

   **Note:** ExifTool is optional but highly recommended. It provides:
   - Support for 300+ file formats
   - 10-60x faster batch operations
   - Better PDF metadata writing
   - See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions

2. Create virtual environment and install Python packages:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. Set up environment variables (create `.env` file):

   ```
   UPLOAD_FOLDER=uploads
   OUTPUT_FOLDER=output
   TESSERACT_PATH=/opt/homebrew/bin/tesseract
   OCR_LANGUAGES=eng+chi_sim+chi_tra+jpn+kor
   ```

## Usage

### Web Interface

```bash
python web_app.py
```

Open browser at `http://localhost:5001`

### GUI Application

```bash
python run_gui.py
```

## Excel Data Structure

The tool reads metadata from Excel file with two sheets:

### Sheet 1: DSB Celum ID to Path mapping (Primary)

- Column B: Celum ID
- Column E: Title
- Column F: External Description/Alt Text

### Sheet 2: Medsurg Metadata Cheat (Fallback)

- Column: Solventum DAM Asset Path (contains filename)
- Metadata columns for Title and Description

Lookup is performed by filename (without extension), case-insensitive.

## Architecture

- `web_app.py` - Flask web application
- `run_gui.py` - GUI launcher
- `src/` - Core modules
  - `extractors/` - Content extraction for different file types
  - `updaters/` - Metadata update for different file types
  - `excel_metadata_lookup.py` - Excel-based metadata lookup
  - `main.py` - Core processing logic
  - `config.py` - Configuration management

## License

Proprietary - Solventum
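
## Example: Lookup Sketch

The dual-sheet lookup described under "Excel Data Structure" can be illustrated with a minimal sketch. It is not the code in `excel_metadata_lookup.py`; it only shows the stated rule: match on the filename without extension, case-insensitively, trying the DSB sheet first and the Medsurg sheet second. Rows are plain in-memory tuples instead of a real workbook. The function names (`normalize_key`, `build_index`, `lookup`), the sample rows, the Medsurg column positions, and the assumption that the DSB key is the Celum ID in column B are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional, Sequence

Row = Sequence[Optional[str]]
Index = Dict[str, Dict[str, str]]


def normalize_key(name: str) -> str:
    """Drop any directory part and the extension, then lowercase."""
    return PurePosixPath(name.strip()).stem.lower()


def build_index(rows: Iterable[Row], key_col: int,
                title_col: int, desc_col: int) -> Index:
    """Map normalized keys to title/description, keeping the first match."""
    index: Index = {}
    for row in rows:
        # Skip rows that are too short or have an empty key cell.
        if len(row) <= max(key_col, title_col, desc_col) or not row[key_col]:
            continue
        key = normalize_key(str(row[key_col]))
        index.setdefault(key, {
            "title": row[title_col] or "",
            "description": row[desc_col] or "",
        })
    return index


def lookup(filename: str, primary: Index,
           fallback: Index) -> Optional[Dict[str, str]]:
    """Try the primary (DSB) index first, then the fallback (Medsurg)."""
    key = normalize_key(filename)
    if key in primary:
        return primary[key]
    return fallback.get(key)


# Hypothetical sample rows. DSB: B=Celum ID, E=Title, F=Description
# (zero-based indices 1, 4, 5). Medsurg column order is an assumption.
dsb_rows = [
    (None, "ABC123", None, None, "Surgical Drape", "Blue drape on a tray"),
]
medsurg_rows = [
    ("/content/dam/medsurg/Wound_Care.pdf", "Wound Care Guide",
     "Overview of wound care products"),
]

primary = build_index(dsb_rows, key_col=1, title_col=4, desc_col=5)
fallback = build_index(medsurg_rows, key_col=0, title_col=1, desc_col=2)

hit_primary = lookup("abc123.PDF", primary, fallback)
hit_fallback = lookup("wound_care.jpg", primary, fallback)
miss = lookup("missing.png", primary, fallback)
```

Building each index once and normalizing keys at build time keeps every per-file lookup a single dictionary access, which suits batch processing.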