Oliver Metadata Tool v3.1 Enterprise Edition

A metadata creation and management tool for 300+ file formats. Create, import, and manage metadata from multiple sources through a web interface with user authentication and AI-powered metadata generation.

Developer: Vadym Samoilenko
License: Corporate License - Oliver Marketing
Version: 3.1 (Enterprise Edition)


Features

Multiple Metadata Sources

  • 📂 File Import: Import metadata from CSV, Excel, or JSON with smart column mapping and sheet selection
  • 🤖 AI Generation: OpenAI-powered intelligent metadata generation
  • ✏️ Manual Entry: Direct editing with real-time validation
  • 📋 Templates: Reusable metadata templates with variables

Enterprise Features

  • 🔐 Authentication: Local user authentication + Microsoft SSO support
  • 👥 User Management: SQLite database for users and sessions
  • 📊 Audit Logging: Track all user actions and metadata changes
  • 🔍 AI Usage Tracking: Monitor OpenAI token usage and costs

File Support

  • 300+ File Formats via ExifTool integration
  • PDF Files: Full metadata support (title, subject, keywords, author, copyright)
  • Images: JPEG, PNG, GIF, HEIC, TIFF, RAW formats
  • Office Documents: Word, Excel, PowerPoint
  • Video Files: MP4, MOV, AVI, MKV
  • Unicode Support: Full support for Chinese, Japanese, and Korean characters

Advanced Capabilities

  • Smart Field Mapping: Auto-detect columns with fuzzy matching
  • Batch Processing: Process multiple files with selective updates
  • Custom Metadata Fields: Add unlimited custom fields
  • CSV Export: Export metadata and processing results
  • Template Variables: {filename}, {date}, {user}, custom variables
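
As a rough illustration of how template variables such as {filename}, {date}, and {user} could be expanded, here is a minimal sketch. The function name `render_template`, the choice to drop the file extension, and the ISO date format are assumptions for illustration, not the behavior of `template_manager.py`.

```python
import os
from datetime import date
from typing import Dict, Optional


class _KeepUnknown(dict):
    """Leave placeholders that have no value untouched instead of raising."""

    def __missing__(self, key):
        return "{" + key + "}"


def render_template(template: Dict[str, str], file_path: str, user: str,
                    extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Fill {filename}, {date}, {user} and any custom variables in each field."""
    variables = _KeepUnknown(
        # Assumption: {filename} means the base name without its extension.
        filename=os.path.splitext(os.path.basename(file_path))[0],
        # Assumption: {date} is today's date in ISO format (YYYY-MM-DD).
        date=date.today().isoformat(),
        user=user,
    )
    variables.update(extra or {})
    return {field: text.format_map(variables) for field, text in template.items()}


if __name__ == "__main__":
    template = {"title": "{filename} ({date})", "author": "{user}", "project": "{campaign}"}
    print(render_template(template, "/uploads/brochure.pdf", "tester", {"campaign": "Spring"}))
```

Unknown placeholders are kept verbatim in this sketch so that a typo in a variable name stays visible in the preview rather than failing the whole batch.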

Requirements

System Dependencies

  • Python 3.8+
  • ExifTool 12.15+ (required for 300+ format support)
  • Tesseract OCR (optional - for image text extraction)
  • Poppler (optional - for PDF content extraction)

Python Dependencies

All listed in requirements.txt:

  • Flask 2.3.0+ (Web framework)
  • pandas, openpyxl (Excel/CSV processing)
  • PyExifTool 0.5.6+ (Metadata operations)
  • openai 1.0.0+ (AI generation)
  • tiktoken 0.5.0+ (Token counting)
  • tenacity 8.2.0+ (Retry logic)
  • msal (Microsoft SSO - optional)

Installation

1. Install System Dependencies

macOS:

brew install exiftool tesseract tesseract-lang poppler

Linux (Ubuntu/Debian):

sudo apt-get install libimage-exiftool-perl tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils

Windows:

# Install ExifTool from https://exiftool.org/ or, with Chocolatey:
choco install exiftool tesseract

Verify ExifTool Installation:

exiftool -ver
# Should show version 12.15 or higher

See docs/EXIFTOOL_SETUP.md for detailed setup instructions.
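
The same version check can be scripted. The sketch below only relies on `exiftool -ver` printing a version such as `12.76`; the helper names `parse_version` and `exiftool_ok` are hypothetical and not part of the tool.

```python
import shutil
import subprocess

MIN_EXIFTOOL = (12, 15)  # minimum version stated in the requirements


def parse_version(text: str) -> tuple:
    """Turn '12.76' into (12, 76) so versions compare numerically."""
    major, _, minor = text.strip().partition(".")
    return (int(major), int(minor or 0))


def exiftool_ok() -> bool:
    """Return True if exiftool is on PATH and reports at least MIN_EXIFTOOL."""
    if shutil.which("exiftool") is None:
        return False
    result = subprocess.run(
        ["exiftool", "-ver"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout) >= MIN_EXIFTOOL


if __name__ == "__main__":
    print("ExifTool OK" if exiftool_ok() else "ExifTool missing or too old")
```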

2. Create Virtual Environment

python3 -m venv venv_local
source venv_local/bin/activate  # On Windows: venv_local\Scripts\activate

3. Install Python Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file in the project root:

# Required: OpenAI API Key (for AI metadata generation)
OPENAI_API_KEY=your-openai-api-key-here

# Optional: Microsoft SSO (for enterprise authentication)
# AZURE_CLIENT_ID=your-azure-client-id
# AZURE_CLIENT_SECRET=your-azure-client-secret
# AZURE_TENANT_ID=your-azure-tenant-id
# REDIRECT_URI=http://localhost:5001/auth/callback

# Optional: Flask secret key (auto-generated if not set)
# SECRET_KEY=your-secret-key-here

# Optional: AI settings (defaults shown)
# AI_MODEL=gpt-4o-mini
# MAX_TOKENS=500
# TEMPERATURE=0.5
# API_TIMEOUT=30
# API_MAX_RETRIES=3
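
For orientation, the optional AI settings could be read with their documented defaults roughly as follows. This is a sketch, not the contents of `src/config.py`: the function name `load_ai_settings` is hypothetical, and it assumes the variables from `.env` have already been loaded into the process environment.

```python
import os


def load_ai_settings(env=os.environ) -> dict:
    """Read the AI settings, falling back to the defaults listed in .env."""
    return {
        "model": env.get("AI_MODEL", "gpt-4o-mini"),
        "max_tokens": int(env.get("MAX_TOKENS", "500")),
        "temperature": float(env.get("TEMPERATURE", "0.5")),
        "timeout": int(env.get("API_TIMEOUT", "30")),          # seconds
        "max_retries": int(env.get("API_MAX_RETRIES", "3")),
    }


if __name__ == "__main__":
    print(load_ai_settings())
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.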

5. Initialize Database

The database is created automatically on first run. To initialize it manually:

python -c "from src.database import Database; db = Database(); print('Database initialized')"

Quick Start with Docker

# Build and start
docker-compose up -d

# Or use the helper script
./docker-run.sh build
./docker-run.sh start

# Access at http://localhost:5001

Benefits:

  • No manual dependency installation
  • Consistent environment across systems
  • Persistent data storage via volumes
  • Easy updates and rollbacks
  • Production-ready configuration

See DOCKER.md for complete Docker deployment guide.


Usage

Starting the Web Application

Local Development:

python web_app.py

Docker:

docker-compose up -d

The application will:

  1. Check for ExifTool availability
  2. Initialize SQLite database (users, sessions, audit_log)
  3. Start Flask server on http://localhost:5001
  4. 🌐 Open browser automatically (local mode only)

Login

Test Account:

  • Username: tester
  • Password: oliveradmin

Microsoft SSO (if configured):

  • Click "Sign in with Microsoft" button
  • Authenticate via Azure AD
  • Users auto-created on first login

Using Metadata Sources

1. Import from File

  1. Select "Import from File (CSV/Excel/JSON)" from the metadata source dropdown (this is the default source)
  2. Click "Choose File" and select your metadata file
  3. Configure the mapping in the modal:
    • For Excel files: Select the sheet name
    • Map columns: Filename (required), Title, Description, Keywords
    • Auto-detection suggests the best matches
    • Preview the first 3 rows
  4. Confirm the mapping
  5. Upload the files to process; the tool matches each file to an imported row by filename
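
The auto-detection step can be pictured as fuzzy matching between the expected field names and the column headers of the import file. The sketch below uses the standard library's `difflib`; the function name `suggest_mapping`, the normalization rules, and the 0.6 cutoff are assumptions, and `field_mapper.py` may work differently.

```python
import difflib

TARGET_FIELDS = ["filename", "title", "description", "keywords"]


def suggest_mapping(columns, cutoff=0.6):
    """Suggest the closest column header for each target field, or None."""
    # Normalize headers so "File_Name" and "file name" compare alike.
    normalized = {c.strip().lower().replace("_", " "): c for c in columns}
    mapping = {}
    for field in TARGET_FIELDS:
        match = difflib.get_close_matches(field, list(normalized), n=1, cutoff=cutoff)
        mapping[field] = normalized[match[0]] if match else None
    return mapping


if __name__ == "__main__":
    headers = ["File Name", "Titel", "Description", "Key words", "Price"]
    for field, column in suggest_mapping(headers).items():
        print(f"{field:12} <- {column}")
```

Fields that come back as `None` are the ones a user would still need to map by hand in the modal.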

2. AI Generation

  1. Select "AI Generation" from metadata source dropdown
  2. Upload files
  3. AI generates metadata (10-30 seconds per file)
  4. Review and edit generated metadata
  5. Save changes
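
AI requests can fail transiently, which is why the configuration exposes API_MAX_RETRIES. The tool lists tenacity for its retry logic; the sketch below is a standard-library stand-in that shows the general idea of retrying with exponential backoff. The name `call_with_retries` and the delay values are assumptions.

```python
import time


def call_with_retries(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then retry.

    Up to max_retries retries are made after the first attempt. The last
    exception is re-raised if every attempt fails.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request():
        # Stand-in for an API call that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("temporary failure")
        return {"title": "Generated title"}

    print(call_with_retries(flaky_request, base_delay=0.1))
```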

3. Manual Entry

  1. Select "Manual Entry"
  2. Upload files
  3. Fill in metadata fields manually
  4. Save changes

4. Templates

  1. Create template with variables
  2. Select template from dropdown
  3. Apply to selected files
  4. Review and save

Batch Operations

  1. Upload multiple files
  2. Use checkboxes to select files
  3. Or use the "Select All" / "Deselect All" buttons to toggle every file
  4. Edit metadata individually
  5. Click "Update Selected Files" to save all at once
  6. Export results to CSV
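
A CSV export of processing results might look like the sketch below. The column set in `FIELDS`, the `status` column, and the "; " separator for keywords are assumptions chosen for illustration.

```python
import csv
import io

# Assumed column set; the real export may include other fields.
FIELDS = ["filename", "title", "description", "keywords", "status"]


def export_results(rows):
    """Serialize result dicts to CSV text; keyword lists become '; '-joined cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        if isinstance(row.get("keywords"), (list, tuple)):
            row["keywords"] = "; ".join(row["keywords"])
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    results = [
        {"filename": "照片.jpg", "title": "Harbour, dawn",
         "keywords": ["sea", "boat"], "status": "updated"},
    ]
    print(export_results(results))
```

When writing the text to a file, opening it with `newline=""` avoids doubled line endings on Windows.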

Configuration

Database Schema

Users Table:

  • id, username, password_hash, email, full_name
  • auth_method (local/sso)
  • created_at, last_login, is_active

Sessions Table:

  • session_id, user_id, created_at, expires_at
  • ip_address, user_agent

Audit Log Table:

  • id, user_id, action, details, timestamp
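
To make the three tables concrete, here is one way they could be declared with `sqlite3`. Only the column names come from the lists above; the column types, constraints, defaults, and the function name `init_db` are assumptions, and the schema in `src/database.py` may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    username      TEXT UNIQUE NOT NULL,
    password_hash TEXT,                       -- may be empty for SSO users
    email         TEXT,
    full_name     TEXT,
    auth_method   TEXT NOT NULL DEFAULT 'local'
                  CHECK (auth_method IN ('local', 'sso')),
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    last_login    TEXT,
    is_active     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    expires_at TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT
);
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id   INTEGER REFERENCES users(id),
    action    TEXT NOT NULL,
    details   TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Create the tables if they do not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = init_db()
    tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    print([name for (name,) in tables])
```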

AI Usage Tracking

Every AI metadata generation is logged with:

  • User ID
  • Timestamp
  • Tokens used (prompt + completion)
  • Cost estimate (based on gpt-4o-mini pricing)

View logs in database:

SELECT * FROM audit_log WHERE action = 'ai_generation' ORDER BY timestamp DESC;
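
The cost estimate is a simple function of prompt and completion tokens. The sketch below takes the per-million-token prices as parameters rather than hard-coding them, since pricing changes over time; the function name `estimate_cost` and the example prices are placeholders, not actual gpt-4o-mini rates.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Return the estimated cost in the currency of the given prices.

    Prices are expressed per one million tokens, with separate rates for
    prompt (input) and completion (output) tokens.
    """
    total = prompt_tokens * input_price_per_1m + completion_tokens * output_price_per_1m
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices for illustration only; look up the current rates.
    cost = estimate_cost(prompt_tokens=1200, completion_tokens=300,
                         input_price_per_1m=1.0, output_price_per_1m=2.0)
    print(f"Estimated cost: {cost:.6f}")
```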

User Management

Create New User:

from src.database import Database
db = Database()
db.create_user(
    username='newuser',
    password='password123',
    email='user@example.com',
    full_name='New User',
    auth_method='local'
)

List All Users:

users = db.get_all_users()
for user in users:
    print(f"{user['username']} - Last login: {user['last_login']}")

Architecture

File Structure

oliver-metadata-tool/
├── web_app.py              # Flask web application (main entry point)
├── requirements.txt        # Python dependencies
├── .env                    # Environment configuration
├── oliver_metadata.db      # SQLite database (auto-created)
├── src/
│   ├── config.py           # Configuration management
│   ├── database.py         # Database operations
│   ├── auth.py             # Authentication logic
│   ├── metadata_analyzer.py    # AI metadata generation
│   ├── metadata_importer.py    # Import from files
│   ├── template_manager.py     # Template system
│   ├── field_mapper.py         # Column mapping
│   ├── excel_metadata_lookup.py # Excel lookup
│   ├── extractors/
│   │   ├── pdf_extractor.py
│   │   ├── image_extractor.py
│   │   ├── office_extractor.py
│   │   ├── video_extractor.py
│   │   └── exiftool_extractor.py
│   └── updaters/
│       ├── pdf_updater.py
│       ├── image_updater.py
│       ├── office_updater.py
│       ├── video_updater.py
│       └── exiftool_updater.py
├── templates/
│   ├── index.html          # Main UI
│   └── login.html          # Login page
└── docs/
    └── EXIFTOOL_SETUP.md   # ExifTool setup guide

Technology Stack

  • Backend: Flask (Python)
  • Database: SQLite
  • Frontend: HTML5, CSS3, JavaScript (Vanilla)
  • Design: Montserrat font, Dark & Gold theme
  • Authentication: Flask-Session, werkzeug.security, MSAL
  • AI: OpenAI API (gpt-4o-mini)
  • Metadata: PyExifTool, pypdf, python-docx, openpyxl

API Endpoints

Authentication

  • GET /login - Login page
  • POST /login - Authenticate user
  • GET /logout - Destroy session
  • GET /login/microsoft - Microsoft SSO redirect
  • GET /auth/callback - SSO callback

File Operations

  • POST /upload - Upload files and generate metadata
  • POST /update-manual - Update file metadata manually
  • GET /download/<filename> - Download processed file

Metadata Sources

  • POST /upload-excel - Upload Excel file for mapping
  • POST /preview-excel-sheet - Preview Excel sheet structure
  • POST /configure-excel-mapping - Configure Excel column mapping
  • POST /import-metadata - Upload import file for mapping
  • POST /configure-import-mapping - Configure import column mapping

Templates

  • GET /templates/list - List all templates
  • POST /templates/save - Save new template
  • POST /templates/load - Load template by name
  • DELETE /templates/delete - Delete template
  • POST /templates/apply - Apply template to files
  • POST /templates/preview - Preview template output

Security & Privacy

Authentication

  • Passwords hashed with werkzeug.security (pbkdf2:sha256)
  • Session tokens: 32-byte cryptographically secure random strings
  • Sessions expire after 24 hours
  • Microsoft SSO via OAuth2 + Azure AD
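
The sketch below illustrates the same ideas with the standard library only: PBKDF2-SHA256 password hashing, a 32-byte random session token, and a 24-hour expiry. It is a stand-in for what `werkzeug.security` does, not the tool's code; the iteration count, salt length, and function names are assumptions.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

ITERATIONS = 200_000  # assumption; choose a value suited to your hardware


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(hash_password(password, salt)[1], digest)


def new_session(ttl_hours: int = 24):
    """Return a token built from 32 random bytes and its expiry time (UTC)."""
    token = secrets.token_hex(32)  # 32 bytes rendered as 64 hex characters
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    return token, expires_at


if __name__ == "__main__":
    salt, digest = hash_password("correct horse")
    print(verify_password("correct horse", salt, digest))
    print(new_session())
```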

Data Protection

  • All credentials stored in .env (excluded from git)
  • Database file excluded from git
  • API keys never logged or exposed to frontend
  • Audit trail for all user actions

Production Recommendations

  1. HTTPS: Use SSL/TLS certificates in production
  2. Database: Migrate to PostgreSQL for better concurrency
  3. Rate Limiting: Add rate limits to prevent abuse
  4. CSRF Protection: Enable Flask-WTF for form security
  5. Error Tracking: Integrate Sentry or similar service
  6. Backups: Regular database backups
  7. Monitoring: Track AI token usage for cost management

Troubleshooting

Common Issues

ExifTool not found:

# Verify installation
exiftool -ver

# macOS: Reinstall with Homebrew
brew reinstall exiftool

# Linux: Reinstall with apt
sudo apt-get install --reinstall libimage-exiftool-perl

Database locked error:

# Stop all instances
lsof -ti:5001 | xargs kill -9

# Restart application
python web_app.py

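OpenAI API errors:

  • Check that OPENAI_API_KEY is set in your .env file
  • If requests time out, raise API_TIMEOUT or API_MAX_RETRIES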

Import failed - column not found:

  • Use the mapping modal to manually select columns
  • Check that your file has headers in the first row
  • Verify file encoding is UTF-8
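
A quick way to check the last two points before importing is sketched below. The function names are hypothetical, and the check simply tries to decode the file as UTF-8 and reads the first CSV row.

```python
import csv
import io


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_headers(data: bytes) -> list:
    """Return the first CSV row; 'utf-8-sig' drops a byte order mark if present."""
    text = data.decode("utf-8-sig")
    return next(csv.reader(io.StringIO(text)), [])


if __name__ == "__main__":
    sample = "filename,title\n照片.jpg,港口\n".encode("utf-8")
    print(is_utf8(sample), read_headers(sample))
```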

Development

Running Tests

# Unit tests (if implemented)
pytest tests/

# Import smoke test (checks that the core modules load)
python -c "from src.database import Database; from src.config import Config; print('✅ All imports successful')"

Git Workflow

# Check status
git status

# Add changes
git add .

# Commit with message
git commit -m "Your commit message"

# Push to remote
git push origin main

License & Credits

License: Corporate License - Oliver Marketing
All rights reserved. Unauthorized copying, distribution, or modification is prohibited.

Developer: Vadym Samoilenko
Company: Oliver Marketing
Version: 3.1 Enterprise Edition
Release Date: January 2026

Third-Party Software:

  • ExifTool by Phil Harvey (Perl Artistic License)
  • Flask by Pallets (BSD License)
  • OpenAI API (Commercial License)
  • PyExifTool (LGPL License)

Support

For issues, questions, or feature requests:

  • Internal Support: Contact IT department
  • Developer: Vadym Samoilenko
  • Documentation: See docs/ folder

Changelog

v3.1 (January 2026) - Enterprise Edition

  • User authentication (local + Microsoft SSO)
  • SQLite database with audit logging
  • Unified import from file (CSV/Excel/JSON) with smart column mapping
  • Excel sheet selection and preview
  • Custom metadata fields support
  • AI usage tracking and cost monitoring
  • Dark & Gold UI redesign
  • Template variables and preview
  • Batch selection and CSV export
  • Consolidated metadata sources (removed redundant Excel Lookup)

v3.0 (January 2026)

  • ExifTool integration (300+ formats)
  • Multiple metadata sources (Import, AI, Manual)
  • Field mapping with fuzzy matching
  • Metadata templates system
  • Rebranded to Oliver Metadata Tool

v2.x (Prior)

  • Basic Excel lookup functionality
  • Multi-format file support
  • Web interface