SamoilenkoVadym e9784d7da8 Phase 4 Complete: Authentication, Database, and Microsoft SSO
This commit implements a complete authentication system with local users,
session management, and Microsoft SSO support for enterprise environments.

New Files Created:
- src/database.py: SQLite database management with users, sessions, audit_log
- src/auth.py: Authentication module with login, SSO, and session management
- templates/login.html: Modern login page with SSO button

Database Schema:
- users table: username, password_hash, email, full_name, auth_method
- sessions table: session management with expiration
- audit_log table: user activity tracking
- Indexes for performance optimization
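
As a rough illustration of the schema listed above, the sketch below creates the three tables and two indexes in an in-memory SQLite database. Only the `users` column names are taken from this message; the key columns, the `sessions` and `audit_log` column names, the index names, and the `init_db` helper are assumptions and may differ from what src/database.py actually defines.

```python
import sqlite3

# Hypothetical schema: only the users column names come from the commit message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate key
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT,                    -- assumed NULL for SSO-only users
    email TEXT,
    full_name TEXT,
    auth_method TEXT NOT NULL DEFAULT 'local'
);
CREATE TABLE IF NOT EXISTS sessions (
    token TEXT PRIMARY KEY,                -- assumed column names
    user_id INTEGER NOT NULL REFERENCES users(id),
    expires_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER REFERENCES users(id),
    action TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_sessions_user ON sessions(user_id);
CREATE INDEX IF NOT EXISTS idx_audit_user ON audit_log(user_id);
"""


def init_db(path=":memory:"):
    """Open the database and create any missing tables and indexes."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```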

Authentication Features:
- Local authentication with test user (tester/oliveradmin)
- Password hashing with Werkzeug
- Session management with 24-hour expiration
- @login_required decorator for route protection
- Automatic session cleanup
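
The session behaviour described above (random tokens, 24-hour expiration, automatic cleanup) could look roughly like the following. This is a standard-library sketch only: a plain dict stands in for the sessions table, and the function names are hypothetical rather than taken from src/auth.py.

```python
import secrets
from datetime import datetime, timedelta, timezone

SESSION_LIFETIME = timedelta(hours=24)  # the 24-hour expiration named above


def create_session(store, username, now=None):
    """Store a new session and return its random token."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)
    store[token] = {"username": username, "expires_at": now + SESSION_LIFETIME}
    return token


def get_session_user(store, token, now=None):
    """Return the username for a live session, or None if missing or expired."""
    now = now or datetime.now(timezone.utc)
    session = store.get(token)
    if session is None:
        return None
    if session["expires_at"] <= now:
        del store[token]  # drop the expired session on access
        return None
    return session["username"]


def cleanup_sessions(store, now=None):
    """Delete every expired session and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    expired = [tok for tok, s in store.items() if s["expires_at"] <= now]
    for tok in expired:
        del store[tok]
    return len(expired)
```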

Microsoft SSO Integration:
- MSAL library integration for Azure AD
- OAuth2 authorization code flow
- Microsoft Graph API user info retrieval
- Automatic user creation/update from SSO
- CSRF protection with state parameter
- Graceful fallback when SSO not configured
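
For the CSRF protection item, the usual pattern in an OAuth2 authorization code flow is to generate a random `state` value before redirecting and to compare it with the value echoed back on the callback. The sketch below shows that pattern with the standard library only; the `oauth_state` session key and both function names are assumptions, and `session` can be any dict-like object.

```python
import hmac
import secrets


def new_oauth_state(session):
    """Generate a random state value and remember it in the user's session."""
    state = secrets.token_urlsafe(16)
    session["oauth_state"] = state
    return state


def verify_oauth_state(session, returned_state):
    """Check the state echoed back on the redirect; each state is single-use."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned_state:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), returned_state.encode())
```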

Security Improvements:
- All routes protected with @login_required
- Session-based authentication with database storage
- IP address and user agent logging
- Audit trail for user actions
- Secure session token generation

Configuration:
- Environment variables for Azure AD (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
- SECRET_KEY for Flask session encryption
- Optional MSAL dependency (SSO works only if configured)
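
One way to decide at startup whether SSO should be offered is to check that all three Azure AD variables are set and that the optional `msal` package can be imported. The helper names below are hypothetical; only the environment variable names come from this message.

```python
import importlib.util
import os

AZURE_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def sso_configured(env=None):
    """Return True only when every Azure AD variable is set and non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name, "").strip() for name in AZURE_VARS)


def msal_available():
    """Return True if the optional msal package is installed."""
    return importlib.util.find_spec("msal") is not None


def sso_enabled(env=None):
    """SSO is offered only when it is both configured and installable."""
    return sso_configured(env) and msal_available()
```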

Dependencies Added:
- Werkzeug>=3.0.0 for password hashing
- msal>=1.20.0 for Microsoft SSO (optional)

Test Credentials:
- Username: tester
- Password: oliveradmin

Phase 4 Status: Complete
Next Phase: Phase 5 (Modern UI Overhaul) for v3.1 release

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:57:47 +00:00
docs Phase 1.4: ExifTool integration for enhanced metadata support 2026-01-25 15:26:01 +00:00
src Phase 4 Complete: Authentication, Database, and Microsoft SSO 2026-01-25 15:57:47 +00:00
templates Phase 4 Complete: Authentication, Database, and Microsoft SSO 2026-01-25 15:57:47 +00:00
.gitignore Initial commit: Universal metadata tool with Excel-based lookup 2026-01-25 14:23:42 +00:00
README.md Phase 1.4: ExifTool integration for enhanced metadata support 2026-01-25 15:26:01 +00:00
requirements.txt Phase 4 Complete: Authentication, Database, and Microsoft SSO 2026-01-25 15:57:47 +00:00
run_gui.py Initial commit: Universal metadata tool with Excel-based lookup 2026-01-25 14:23:42 +00:00
web_app.py Phase 4 Complete: Authentication, Database, and Microsoft SSO 2026-01-25 15:57:47 +00:00

Oliver Metadata Tool

Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface.

Features

  • Excel-based metadata lookup: Reads metadata from "Celum ID to Adobe Asset Path Mapping Spreadsheet"
  • Multi-format support: PDF, images (JPG, PNG, etc.), Office documents (Word, Excel, PowerPoint), video files
  • Unicode support: Full support for Chinese, Japanese, Korean characters (CGA region)
  • OCR capabilities: Multi-language text extraction with Tesseract
  • Web interface: Flask-based UI for easy batch processing
  • Dual-sheet Excel lookup: Primary lookup from DSB sheet, fallback to Medsurg sheet

Requirements

  • Python 3.8+
  • Tesseract OCR (for image text extraction)
  • Poppler (for PDF processing)
  • ExifTool 12.15+ (recommended - enables 300+ file formats and improved performance)
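
To check the ExifTool requirement programmatically, one option is to run `exiftool -ver`, which prints the version number, and compare it against 12.15. The sketch below is an assumption about how such a check could be written, not the project's own code; it compares versions as integer tuples, which relies on ExifTool's two-digit minor numbering.

```python
import shutil
import subprocess

MIN_EXIFTOOL = (12, 15)


def parse_version(text):
    """Turn a version string such as '12.15' into a comparable tuple (12, 15)."""
    return tuple(int(part) for part in text.strip().split("."))


def exiftool_version(binary="exiftool"):
    """Return the installed ExifTool version as a tuple, or None if unavailable."""
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-ver"], capture_output=True, text=True, check=True, timeout=10
        )
        return parse_version(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return None


def exiftool_ok(version):
    """True when a version was found and it meets the recommended minimum."""
    return version is not None and version >= MIN_EXIFTOOL
```

Because ExifTool is optional, a caller would typically fall back to the built-in updaters when `exiftool_ok(exiftool_version())` is false rather than abort.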

Installation

  1. Install system dependencies:
# macOS
brew install tesseract tesseract-lang poppler exiftool

# Linux (Ubuntu/Debian)
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils libimage-exiftool-perl

Note: ExifTool is optional but highly recommended. It provides:

  • Support for 300+ file formats
  • 10-60x faster batch operations
  • Better PDF metadata writing

See docs/EXIFTOOL_SETUP.md for detailed setup instructions.
  2. Create a virtual environment and install the Python packages:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
  3. Set up environment variables (create a .env file):
UPLOAD_FOLDER=uploads
OUTPUT_FOLDER=output
TESSERACT_PATH=/opt/homebrew/bin/tesseract
OCR_LANGUAGES=eng+chi_sim+chi_tra+jpn+kor
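
A minimal sketch of how these settings might be read in Python follows. It assumes the variables are already present in the process environment (for example, exported in the shell or loaded from the .env file by whatever loader the project uses); the `load_config` name and the default values are illustrative assumptions, not the contents of src/config.py.

```python
import os


def load_config(env=None):
    """Read the tool's settings from environment variables, with fallbacks."""
    env = os.environ if env is None else env
    return {
        "upload_folder": env.get("UPLOAD_FOLDER", "uploads"),
        "output_folder": env.get("OUTPUT_FOLDER", "output"),
        # None means "not set"; a caller could then rely on tesseract being on PATH.
        "tesseract_path": env.get("TESSERACT_PATH"),
        # Tesseract joins language codes with '+', so split them into a list.
        "ocr_languages": env.get("OCR_LANGUAGES", "eng").split("+"),
    }
```

With the example values above, `load_config()["ocr_languages"]` would be a five-element list starting with `"eng"`.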

Usage

Web Interface

python web_app.py

Then open a browser at http://localhost:5001

GUI Application

python run_gui.py

Excel Data Structure

The tool reads metadata from an Excel file with two sheets:

Sheet 1: DSB Celum ID to Path mapping (Primary)

  • Column B: Celum ID
  • Column E: Title
  • Column F: External Description/Alt Text

Sheet 2: Medsurg Metadata Cheat (Fallback)

  • Column: Solventum DAM Asset Path (contains filename)
  • Metadata columns for Title and Description

Lookup is performed by filename (without extension), case-insensitive.
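
The lookup rule can be sketched as follows: normalize the filename to a lowercase stem, try the primary (DSB) index, then fall back to the Medsurg index. This is an illustration under stated assumptions, not the code in excel_metadata_lookup.py: both indexes are plain dicts that are assumed to be keyed by already-normalized names, and the function names and sample rows are hypothetical.

```python
from pathlib import PurePosixPath


def lookup_key(filename):
    """Drop directories and the extension, then lowercase the result."""
    return PurePosixPath(filename.replace("\\", "/")).stem.lower()


def find_metadata(filename, dsb_index, medsurg_index):
    """Return the DSB row if present, otherwise the Medsurg row, else None."""
    key = lookup_key(filename)
    if key in dsb_index:
        return dsb_index[key]
    return medsurg_index.get(key)


# Sample data, invented for illustration only.
dsb_index = {"12345": {"title": "Primary title"}}
medsurg_index = {lookup_key("assets/Brochure_X.PDF"): {"title": "Fallback title"}}

print(find_metadata("12345.jpg", dsb_index, medsurg_index))
print(find_metadata("brochure_x.png", dsb_index, medsurg_index))
```

Only the last extension is removed, so a name such as `Report.Final.PDF` normalizes to `report.final`.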

Architecture

  • web_app.py - Flask web application
  • run_gui.py - GUI launcher
  • src/ - Core modules
    • extractors/ - Content extraction for different file types
    • updaters/ - Metadata update for different file types
    • excel_metadata_lookup.py - Excel-based metadata lookup
    • main.py - Core processing logic
    • config.py - Configuration management
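
One common way to organize per-file-type modules such as extractors/ and updaters/ is a registry keyed by file extension. The sketch below illustrates that idea only; the registry, the decorator, and the placeholder extractor functions are all hypothetical and are not taken from the src/ modules.

```python
from pathlib import Path

EXTRACTORS = {}  # maps a lowercase extension such as ".pdf" to a handler


def register(*extensions):
    """Decorator that registers a handler for one or more file extensions."""
    def decorator(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return decorator


@register(".pdf")
def extract_pdf(path):
    return f"pdf:{path.name}"  # placeholder for real PDF extraction


@register(".jpg", ".jpeg", ".png")
def extract_image(path):
    return f"image:{path.name}"  # placeholder for OCR-based extraction


def extract(path):
    """Dispatch to the handler registered for the file's extension."""
    path = Path(path)
    handler = EXTRACTORS.get(path.suffix.lower())
    if handler is None:
        raise ValueError(f"Unsupported file type: {path.suffix or path.name}")
    return handler(path)
```

Under this design, supporting a new format means adding one decorated function, without touching the dispatch logic.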

License

Proprietary - Solventum