SamoilenkoVadym fa2b4da2f7	2026-01-25 15:34:05 +00:00

Phase 2.1 & 2.2: Manual metadata editing and multiple sources

Implemented manual metadata editing UI:
- Added editable input fields for title (200 chars), subject (300 chars), keywords (500 chars)
- Character counters with warning/danger indicators at 90%/100%
- Real-time validation with visual feedback
- Save and Reset buttons for each file
- Individual file metadata updates via /update-manual endpoint

Implemented multiple metadata sources:
- Added metadata source selector dropdown (Excel, Manual, AI, Import)
- Modified /upload endpoint to handle different metadata sources
- Excel lookup: existing functionality (fastest)
- Manual entry: empty fields for user input
- AI generation: placeholder for Phase 2.3
- Import: placeholder for Phase 2.4

Technical improvements:
- Session-based metadata storage for persistence
- Graceful success/error feedback with visual indicators
- Sanitized metadata input with length limits
- Backup creation before updates

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

docs	Phase 1.4: ExifTool integration for enhanced metadata support	2026-01-25 15:26:01 +00:00
src	Phase 1.4: ExifTool integration for enhanced metadata support	2026-01-25 15:26:01 +00:00
templates	Phase 2.1 & 2.2: Manual metadata editing and multiple sources	2026-01-25 15:34:05 +00:00
.gitignore	Initial commit: Universal metadata tool with Excel-based lookup	2026-01-25 14:23:42 +00:00
README.md	Phase 1.4: ExifTool integration for enhanced metadata support	2026-01-25 15:26:01 +00:00
requirements.txt	Phase 1.4: ExifTool integration for enhanced metadata support	2026-01-25 15:26:01 +00:00
run_gui.py	Initial commit: Universal metadata tool with Excel-based lookup	2026-01-25 14:23:42 +00:00
web_app.py	Phase 2.1 & 2.2: Manual metadata editing and multiple sources	2026-01-25 15:34:05 +00:00
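
The latest commit message mentions per-field length limits (title 200, subject 300, keywords 500) and counters that switch to a warning at 90% and to danger at 100%. A minimal sketch of that rule in Python follows. The names FIELD_LIMITS, sanitize_field, and counter_state are hypothetical, and the choice to collapse whitespace and truncate is an assumption; the actual code in web_app.py may differ.

```python
# Hypothetical helpers: only the limits (200/300/500) and the 90%/100%
# thresholds come from the commit message; everything else is assumed.
FIELD_LIMITS = {"title": 200, "subject": 300, "keywords": 500}


def sanitize_field(name, value):
    """Collapse whitespace runs and truncate to the field's limit (assumed policy)."""
    cleaned = " ".join(value.split())
    return cleaned[:FIELD_LIMITS[name]]


def counter_state(name, value):
    """Return 'ok', 'warning' (>= 90% of limit), or 'danger' (>= 100% of limit)."""
    limit = FIELD_LIMITS[name]
    length = len(value)
    if length >= limit:
        return "danger"
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    if length * 10 >= limit * 9:
        return "warning"
    return "ok"
```

For example, a 180-character title sits exactly at the 90% threshold and would be flagged as a warning under this rule.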

README.md

Oliver Metadata Tool

Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface.

Features

Excel-based metadata lookup: Reads metadata from "Celum ID to Adobe Asset Path Mapping Spreadsheet"
Multi-format support: PDF, images (JPG, PNG, etc.), Office documents (Word, Excel, PowerPoint), video files
Unicode support: Full support for Chinese, Japanese, and Korean characters (CGA region)
OCR capabilities: Multi-language text extraction with Tesseract
Web interface: Flask-based UI for easy batch processing
Dual-sheet Excel lookup: Primary lookup from DSB sheet, fallback to Medsurg sheet

Requirements

Python 3.8+
Tesseract OCR (for image text extraction)
Poppler (for PDF processing)
ExifTool 12.15+ (optional but recommended; enables 300+ file formats and improved performance)

Installation

Install system dependencies:

# macOS
brew install tesseract tesseract-lang poppler exiftool

# Linux (Ubuntu/Debian)
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils libimage-exiftool-perl

Note: ExifTool is optional but highly recommended. It provides:

Support for 300+ file formats
10-60x faster batch operations
Better PDF metadata writing

See docs/EXIFTOOL_SETUP.md for detailed setup instructions.
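
Because ExifTool is optional, it can help to check at startup whether a suitable version is installed. The sketch below is one possible check, not the project's actual detection logic: it looks for the executable on PATH and compares the output of `exiftool -ver` against the 12.15 minimum listed under Requirements. The function names are hypothetical.

```python
import shutil
import subprocess

MIN_EXIFTOOL = (12, 15)  # minimum version from the Requirements section


def parse_version(text):
    """Turn a version string such as '12.76' into a tuple like (12, 76)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def meets_minimum(text, minimum=MIN_EXIFTOOL):
    """True if the reported version is at least the required minimum."""
    return parse_version(text) >= minimum


def exiftool_available():
    """True if exiftool is on PATH and reports a new enough version."""
    path = shutil.which("exiftool")
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "-ver"], capture_output=True, text=True, check=True, timeout=10
        )
        return meets_minimum(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        # Missing binary, failed call, or unparseable output: treat as unavailable.
        return False
```

When the check fails, a caller could fall back to the non-ExifTool code path.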

Create virtual environment and install Python packages:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Set up environment variables (create .env file):

UPLOAD_FOLDER=uploads
OUTPUT_FOLDER=output
TESSERACT_PATH=/opt/homebrew/bin/tesseract
OCR_LANGUAGES=eng+chi_sim+chi_tra+jpn+kor
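
As an illustration of how these variables might be consumed, here is a small sketch that reads them with fallbacks. It assumes the .env values are already present in the process environment (for example through a loader such as python-dotenv; whether this project uses one is not stated here). The function name load_settings and the default values are hypothetical and are not taken from src/config.py.

```python
import os


def load_settings(env=None):
    """Read the tool's settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    languages = env.get("OCR_LANGUAGES", "eng")  # assumed default
    return {
        "upload_folder": env.get("UPLOAD_FOLDER", "uploads"),
        "output_folder": env.get("OUTPUT_FOLDER", "output"),
        # Fall back to whatever 'tesseract' resolves to on PATH (assumed default).
        "tesseract_path": env.get("TESSERACT_PATH", "tesseract"),
        # Tesseract joins language codes with '+', e.g. 'eng+jpn'.
        "ocr_languages": [code for code in languages.split("+") if code],
    }
```

Note that the TESSERACT_PATH shown above is the usual Homebrew location on Apple Silicon Macs; on other systems the path will differ.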

Usage

Web Interface

python web_app.py

Then open a browser at http://localhost:5001

GUI Application

python run_gui.py

Excel Data Structure

The tool reads metadata from an Excel file with two sheets:

Sheet 1: DSB Celum ID to Path mapping (Primary)

Column B: Celum ID
Column E: Title
Column F: External Description/Alt Text

Sheet 2: Medsurg Metadata Cheat (Fallback)

Column: Solventum DAM Asset Path (contains filename)
Metadata columns for Title and Description

Lookup is performed by filename (without extension), case-insensitive.
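
The lookup rule above can be sketched as follows. This is a simplified illustration using in-memory rows instead of a real spreadsheet; it is not the code in src/excel_metadata_lookup.py. The row field names (celum_id, asset_path, title, description), the sample data, and the assumption that the filename stem matches the Celum ID in the primary sheet are all hypothetical.

```python
from pathlib import PurePosixPath


def lookup_key(name_or_path):
    """Filename without directory or extension, lower-cased."""
    return PurePosixPath(name_or_path).stem.lower()


def find_metadata(filename, primary, fallback):
    """Try the primary (DSB) index first, then the fallback (Medsurg) index."""
    key = lookup_key(filename)
    return primary.get(key) or fallback.get(key)


# Made-up sample rows standing in for the two sheets.
dsb_rows = [
    {"celum_id": "ABC123", "title": "Pump brochure", "description": "Front view"},
]
medsurg_rows = [
    {"asset_path": "/content/dam/medsurg/Wound_Care.PDF",
     "title": "Wound care", "description": "Overview"},
]

# Primary index keyed by Celum ID (assumed to equal the filename stem).
primary = {str(row["celum_id"]).strip().lower(): row for row in dsb_rows}
# Fallback index keyed by the filename embedded in the asset path.
fallback = {lookup_key(row["asset_path"]): row for row in medsurg_rows}
```

With these indexes, find_metadata("abc123.jpg", primary, fallback) resolves through the primary sheet, while find_metadata("wound_care.pdf", primary, fallback) only matches in the fallback sheet.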

Architecture

web_app.py - Flask web application
run_gui.py - GUI launcher
src/ - Core modules
- extractors/ - Content extraction for different file types
- updaters/ - Metadata update for different file types
- excel_metadata_lookup.py - Excel-based metadata lookup
- main.py - Core processing logic
- config.py - Configuration management

License

Proprietary - Solventum