Initial commit: Oliver Metadata Tool (FastAPI)

Complete Flask → FastAPI migration with:
- FastAPI app with session auth, Azure AD SSO, rate limiting
- SQLite-backed session store (survives restarts)
- Bulk AI metadata generation with SSE progress
- Admin panel (user management, audit log, AI usage)
- Subpath deployment support (ROOT_PATH config)
- Docker + deploy.sh for production deployment
- Test suite (auth, upload, templates, imports, admin, sessions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 21:23:42 +00:00 · commit 3deaa5ef40
82 changed files with 15590 additions and 0 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,29 @@
+# Solventum Image Metadata Tool — Environment Configuration
+# Copy this file to .env and fill in your secrets:
+#   cp .env.example .env
+
+# === Required ===
+# Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
+SECRET_KEY=CHANGE_ME_GENERATE_A_RANDOM_KEY
+DOCKER_MODE=true
+# Subpath prefix (must match Apache reverse proxy config, no trailing slash)
+ROOT_PATH=/solventum-image-metadata
+
+# === Azure AD / SSO ===
+AZURE_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
+AZURE_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
+AZURE_CLIENT_SECRET=YOUR_AZURE_CLIENT_SECRET_HERE
+# Must match Azure AD App Registration > Authentication > Redirect URIs exactly
+REDIRECT_URI=https://ai-sandbox.oliver.solutions/solventum-image-metadata/auth/callback
+
+# === OpenAI (optional — for AI metadata generation) ===
+OPENAI_API_KEY=
+
+# === Admin ===
+# This email will be auto-created as admin on first startup (SSO login)
+SUPERADMIN_EMAIL=vadymsamoilenko@oliver.agency
+
+# === Options ===
+ENABLE_TEST_USER=false
+HTTPS_ONLY=true
+DEBUG=false
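
Review note on `.env.example`: the file relies on the operator replacing the placeholder secrets and keeping `ROOT_PATH` free of a trailing slash. A minimal sketch of a pre-deploy check follows, assuming a plain `KEY=VALUE` file. The helper names `parse_env` and `check_env` are hypothetical and not part of this commit, and the leading-slash rule for `ROOT_PATH` is an assumption rather than something the file states.

```python
from typing import Dict, List

# Placeholder values copied from .env.example; leaving them in place is a mistake.
PLACEHOLDERS = {"CHANGE_ME_GENERATE_A_RANDOM_KEY", "YOUR_AZURE_CLIENT_SECRET_HERE"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    for key, value in values.items():
        if value in PLACEHOLDERS:
            problems.append(f"{key} still holds the example placeholder")
    root_path = values.get("ROOT_PATH", "")
    # Assumption: a subpath prefix starts with "/" and has no trailing slash.
    if root_path and (not root_path.startswith("/") or root_path.endswith("/")):
        problems.append("ROOT_PATH should start with '/' and have no trailing slash")
    return problems
```

Running `check_env(parse_env(open(".env").read()))` before starting the container would surface both kinds of mistake in one pass.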
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,105 @@
+# These are some examples of commonly ignored file patterns.
+# You should customize this list as applicable to your project.
+# Learn more about .gitignore:
+#     https://www.atlassian.com/git/tutorials/saving-changes/gitignore
+
+# Node artifact files
+node_modules/
+dist/
+
+# Compiled Java class files
+*.class
+
+# Compiled Python bytecode
+*.py[cod]
+
+# Log files
+*.log
+
+# Package files
+*.jar
+
+# Maven
+target/
+dist/
+
+# JetBrains IDE
+.idea/
+
+# Unit test reports
+TEST*.xml
+
+# Generated by MacOS
+.DS_Store
+
+# Generated by Windows
+Thumbs.db
+
+# Applications
+*.app
+*.exe
+*.war
+
+# Large media files
+*.mp4
+*.tiff
+*.avi
+*.flv
+*.mov
+*.wmv
+
+# Python virtual environments
+venv/
+venv_new/
+venv_local/
+env/
+ENV/
+.venv/
+
+# Python cache
+__pycache__/
+*.pyc
+
+# Environment variables
+.env
+.env.local
+
+# Excel files with data
+*.xlsx
+*.xls
+
+# Uploads and output directories
+uploads/
+output/
+Files/
+
+# IDE
+.vscode/
+.claude/
+CLAUDE.md
+
+# Database files
+*.db
+*.sqlite
+*.sqlite3
+
+# Server files
+server.pid
+server.log
+nohup.out
+
+# Test files
+test_*.csv
+test_*.xlsx
+test_*.json
+TEST_REPORT.md
+
+# Docker
+.dockerignore
+docker-compose.override.yml
+
+# Backup files
+*.tar.gz
+*.zip
+backup-*/
+
--- a/DOCKER.md
+++ b/DOCKER.md
@@ -0,0 +1,385 @@
+# Docker Deployment Guide
+
+Complete guide for deploying Oliver Metadata Tool using Docker.
+
+## Prerequisites
+
+- Docker 20.10+ installed
+- Docker Compose 2.0+ installed
+- 2GB+ available disk space
+- Network access for pulling base images
+
+## Quick Start
+
+### 1. Build and Start
+
+```bash
+# Using docker-compose directly
+docker-compose up -d
+
+# Or using the helper script
+./docker-run.sh build
+./docker-run.sh start
+```
+
+### 2. Access Application
+
+Open browser at: **http://localhost:5001**
+
+Default credentials:
+- Username: `tester`
+- Password: `oliveradmin`
+
+### 3. View Logs
+
+```bash
+# Using docker-compose
+docker-compose logs -f
+
+# Or using the helper script
+./docker-run.sh logs
+```
+
+## Configuration
+
+### Environment Variables
+
+Create `.env` file in project root (optional):
+
+```env
+# Required for AI metadata generation
+OPENAI_API_KEY=your-openai-api-key-here
+
+# Optional: AI Configuration
+AI_MODEL=gpt-4o-mini
+MAX_TOKENS=500
+TEMPERATURE=0.5
+
+# Optional: Microsoft SSO
+AZURE_CLIENT_ID=your-azure-client-id
+AZURE_CLIENT_SECRET=your-azure-client-secret
+AZURE_TENANT_ID=your-azure-tenant-id
+REDIRECT_URI=http://localhost:5001/auth/callback
+
+# Optional: application secret key
+SECRET_KEY=your-secret-key-here
+```
+
+### Docker Compose Configuration
+
+The `docker-compose.yml` file includes:
+
+- **Port mapping**: `5001:5001`
+- **Persistent volumes**:
+  - `uploads:/app/uploads` - Temporary file uploads
+  - `database:/app/data` - SQLite database
+  - `output:/app/output` - Processed files, backups, reports
+- **Auto-restart**: Container restarts unless explicitly stopped
+- **Health checks**: Every 30 seconds
+
+## Management Commands
+
+### Using docker-run.sh Script
+
+```bash
+# Build image
+./docker-run.sh build
+
+# Start application
+./docker-run.sh start
+
+# Stop application
+./docker-run.sh stop
+
+# Restart application
+./docker-run.sh restart
+
+# View logs
+./docker-run.sh logs
+
+# Show status
+./docker-run.sh status
+
+# Clean up (removes data!)
+./docker-run.sh clean
+```
+
+### Using Docker Compose Directly
+
+```bash
+# Build image
+docker-compose build
+
+# Start in background
+docker-compose up -d
+
+# Start with logs
+docker-compose up
+
+# Stop
+docker-compose down
+
+# Restart
+docker-compose restart
+
+# View logs
+docker-compose logs -f
+
+# Check status
+docker-compose ps
+
+# Remove containers and volumes (deletes data!)
+docker-compose down -v
+```
+
+## Data Persistence
+
+### Volumes
+
+Three Docker volumes persist data between container restarts:
+
+1. **uploads** - `/app/uploads`
+   - Temporary file uploads during processing
+   - Cleared when files are downloaded
+
+2. **database** - `/app/data`
+   - SQLite database (`oliver_metadata.db`)
+   - User accounts, sessions, audit logs
+
+3. **output** - `/app/output`
+   - Processed files
+   - Backups
+   - Reports
+   - Templates
+
+### Backup Data
+
+```bash
+# Backup database
+docker-compose exec oliver-metadata tar -czf /tmp/backup.tar.gz /app/data
+docker cp oliver-metadata-tool:/tmp/backup.tar.gz ./backup-$(date +%Y%m%d).tar.gz
+
+# Or backup entire volumes
+docker run --rm -v oliver-metadata_database:/data -v "$(pwd)":/backup alpine tar -czf /backup/database-backup.tar.gz -C /data .
+```
+
+### Restore Data
+
+```bash
+# Stop container
+docker-compose down
+
+# Remove old volume
+docker volume rm oliver-metadata_database
+
+# Recreate volume and restore
+docker run --rm -v oliver-metadata_database:/data -v "$(pwd)":/backup alpine tar -xzf /backup/database-backup.tar.gz -C /data
+
+# Start container
+docker-compose up -d
+```
+
+## Troubleshooting
+
+### Container won't start
+
+```bash
+# Check logs
+docker-compose logs
+
+# Check if port is in use
+lsof -i :5001
+
+# Rebuild image
+docker-compose build --no-cache
+```
+
+### Permission issues
+
+```bash
+# Check volume permissions
+docker-compose exec oliver-metadata ls -la /app/uploads /app/data /app/output
+
+# Fix permissions (if needed)
+docker-compose exec oliver-metadata chown -R root:root /app/uploads /app/data /app/output
+```
+
+### Database locked errors
+
+```bash
+# Stop container
+docker-compose down
+
+# Start with fresh database (deletes all users, sessions, and audit logs; back up first)
+docker volume rm oliver-metadata_database
+docker-compose up -d
+```
+
+### ExifTool not found
+
+ExifTool is installed in the Docker image. Verify:
+
+```bash
+docker-compose exec oliver-metadata exiftool -ver
+```
+
+Should output version 12.15+
+
+### Memory issues
+
+Increase Docker memory allocation:
+- Docker Desktop → Settings → Resources → Memory
+- Recommended: 2GB minimum, 4GB+ for large batches
+
+## Production Deployment
+
+### Security Recommendations
+
+1. **Change default credentials**
+   - Create new users via web interface
+   - Disable or remove test account
+
+2. **Use environment variables**
+   - Never commit `.env` to git
+   - Use secrets management (Docker secrets, Kubernetes secrets)
+
+3. **Enable HTTPS**
+   - Use reverse proxy (nginx, Traefik, Caddy)
+   - Terminate SSL at proxy level
+
+4. **Set custom secret key**
+   ```env
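+   # .env files do not run shell commands; generate the value first with
+   #   openssl rand -hex 32
+   # and paste the output here
+   SECRET_KEY=paste-generated-hex-value-here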
+   ```
+
+5. **Limit file upload size**
+   - Default: 500MB
+   - Adjust via nginx/proxy if needed
+
+### Reverse Proxy Example (nginx)
+
+```nginx
+server {
+    listen 80;
+    server_name metadata.example.com;
+
+    location / {
+        proxy_pass http://localhost:5001;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
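+        # Allow large uploads (nginx rejects request bodies over 1 MB by default);
+        # 500M matches the application's default upload limit
+        client_max_body_size 500M;
+
+        # Increase timeouts for large file uploads
+        proxy_read_timeout 300;
+        proxy_connect_timeout 300;
+        proxy_send_timeout 300;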
+    }
+}
+```
+
+### Resource Limits
+
+Add to `docker-compose.yml`:
+
+```yaml
+services:
+  oliver-metadata:
+    # ... existing config ...
+    deploy:
+      resources:
+        limits:
+          cpus: '2.0'
+          memory: 4G
+        reservations:
+          cpus: '1.0'
+          memory: 2G
+```
+
+## System Requirements
+
+### Container Resources
+
+- **CPU**: 1-2 cores (AI generation can use more)
+- **Memory**: 2GB minimum, 4GB recommended
+- **Disk**: 5GB+ (depends on file volume)
+
+### Host Requirements
+
+- **OS**: Linux, macOS, Windows with WSL2
+- **Docker**: 20.10+
+- **Architecture**: x86_64/amd64 (ARM64 may work but untested)
+
+## Updates
+
+### Update to latest version
+
+```bash
+# Pull latest code
+git pull origin main
+
+# Rebuild image
+docker-compose build
+
+# Restart containers
+docker-compose up -d
+```
+
+### Update Python dependencies
+
+```bash
+# Rebuild without cache
+docker-compose build --no-cache
+
+# Restart
+docker-compose up -d
+```
+
+## Monitoring
+
+### Health Checks
+
+Built-in health check runs every 30 seconds:
+
+```bash
+# Check health status
+docker ps
+
+# View health check logs
+docker inspect oliver-metadata-tool | jq '.[0].State.Health'
+```
+
+### Resource Usage
+
+```bash
+# Real-time stats
+docker stats oliver-metadata-tool
+
+# Container info
+docker inspect oliver-metadata-tool
+```
+
+## Support
+
+For issues or questions:
+1. Check logs: `docker-compose logs -f`
+2. Verify configuration: `docker-compose config`
+3. Test connection: `curl http://localhost:5001/login`
+4. Open GitHub issue with logs and configuration
+
+## FAQ
+
+**Q: Can I change the port?**
+A: Yes, edit `docker-compose.yml` port mapping: `"8080:5001"`
+
+**Q: Does this work on ARM (Apple Silicon)?**
+A: Should work but untested. Try building with `--platform linux/arm64`
+
+**Q: How do I use my own database?**
+A: Mount external database file as volume: `./my-db.db:/app/data/oliver_metadata.db`
+
+**Q: Can I run multiple instances?**
+A: Yes, change port mapping and container name in docker-compose.yml for each instance
+
+**Q: Does it support S3 storage?**
+A: Not yet, but you can mount S3 as volume using FUSE/s3fs
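
Review note on `DOCKER.md`: the "Backup Data" steps could also be scripted. The sketch below archives a data directory into a dated `backup-YYYYMMDD.tar.gz`, mirroring the `tar -czf ... -C /data .` command in the guide. It assumes the directory is reachable from where the script runs (for example through a bind mount, which the guide does not configure), and `backup_directory` is a hypothetical helper, not part of this commit. Copying a live SQLite file can yield an inconsistent snapshot, so it is probably safest to run it while the container is stopped.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def backup_directory(data_dir: Path, dest_dir: Path, today: Optional[date] = None) -> Path:
    """Archive data_dir into dest_dir/backup-YYYYMMDD.tar.gz and return the archive path."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores members relative to data_dir, like `tar -C /data .`
        tar.add(data_dir, arcname=".")
    return archive
```

The resulting file name matches the `backup-*` and `*.tar.gz` patterns already listed in `.gitignore`, so archives created in the project directory would not be committed by accident.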
--- a/64
+++ b/64
@@ -0,0 +1,64 @@
+# Oliver Metadata Tool - Docker Image
+# Single-stage build on the slim Python base image
+
+FROM python:3.11-slim AS base
+
+# Set working directory
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    # ExifTool - critical for metadata operations (300+ formats)
+    libimage-exiftool-perl \
+    # Tesseract OCR with CJK language support
+    tesseract-ocr \
+    tesseract-ocr-eng \
+    tesseract-ocr-chi-sim \
+    tesseract-ocr-chi-tra \
+    tesseract-ocr-jpn \
+    tesseract-ocr-kor \
+    # Poppler for PDF to image conversion
+    poppler-utils \
+    # FFmpeg for video processing
+    ffmpeg \
+    # curl for health check
+    curl \
+    # Build dependencies
+    gcc \
+    && rm -rf /var/lib/apt/lists/*
+
+# Verify ExifTool installation
+RUN exiftool -ver
+
+# Copy requirements first for better layer caching
+COPY requirements.txt .
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY . .
+
+# Create necessary directories
+RUN mkdir -p /app/uploads /app/output /app/data /app/templates_saved
+
+# Set environment variables
+ENV PYTHONUNBUFFERED=1
+ENV DOCKER_MODE=true
+
+# Expose port
+EXPOSE 5001
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
+    CMD curl -sf http://localhost:5001/login || exit 1
+
+# Run application with gunicorn + uvicorn workers
+CMD ["gunicorn", "app.main:app", \
+     "--worker-class", "uvicorn.workers.UvicornWorker", \
+     "--workers", "2", \
+     "--bind", "0.0.0.0:5001", \
+     "--timeout", "120", \
+     "--graceful-timeout", "30", \
+     "--access-logfile", "-", \
+     "--error-logfile", "-"]
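
Review note on the health check: the `HEALTHCHECK` above treats the container as healthy when `curl -sf` gets a non-error response from `/login`. A rough Python equivalent is sketched below for use in tests or deploy scripts. `is_healthy` and its injectable `opener` parameter are hypothetical names, and unlike `curl` without `-L`, `urllib` follows redirects, so the two probes can differ when `/login` redirects.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://localhost:5001/login", timeout: float = 10.0,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Return True when the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx (HTTPError), refused connections, and timeouts.
        return False
```

Passing a fake `opener` keeps the function testable without a running container.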
--- a/README.md
+++ b/README.md
@@ -0,0 +1,515 @@
+# Oliver Metadata Tool v3.1 Enterprise Edition
+
+Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface, user authentication, and AI-powered metadata generation.
+
+**Developer:** Vadym Samoilenko
+**License:** Corporate License - Oliver Marketing
+**Version:** 3.1 (Enterprise Edition)
+
+---
+
+## Features
+
+### Multiple Metadata Sources
+- **📂 File Import**: Import metadata from CSV, Excel, or JSON with smart column mapping and sheet selection
+- **🤖 AI Generation**: OpenAI-powered intelligent metadata generation
+- **✏️ Manual Entry**: Direct editing with real-time validation
+- **📋 Templates**: Reusable metadata templates with variables
+
+### Enterprise Features
+- **🔐 Authentication**: Local user authentication + Microsoft SSO support
+- **👥 User Management**: SQLite database for users and sessions
+- **📊 Audit Logging**: Track all user actions and metadata changes
+- **🔍 AI Usage Tracking**: Monitor OpenAI token usage and costs
+
+### File Support
+- **300+ File Formats** via ExifTool integration
+- **PDF Files**: Full metadata support (title, subject, keywords, author, copyright)
+- **Images**: JPEG, PNG, GIF, HEIC, TIFF, RAW formats
+- **Office Documents**: Word, Excel, PowerPoint
+- **Video Files**: MP4, MOV, AVI, MKV
+- **Unicode Support**: Full support for Chinese, Japanese, Korean characters
+
+### Advanced Capabilities
+- **Smart Field Mapping**: Auto-detect columns with fuzzy matching
+- **Batch Processing**: Process multiple files with selective updates
+- **Custom Metadata Fields**: Add unlimited custom fields
+- **CSV Export**: Export metadata and processing results
+- **Template Variables**: {filename}, {date}, {user}, custom variables
+
+---
+
+## Requirements
+
+### System Dependencies
+- **Python 3.8+**
+- **ExifTool 12.15+** (required for 300+ format support)
+- **Tesseract OCR** (optional - for image text extraction)
+- **Poppler** (optional - for PDF content extraction)
+
+### Python Dependencies
+All listed in `requirements.txt`:
+- Flask 2.3.0+ (Web framework)
+- pandas, openpyxl (Excel/CSV processing)
+- PyExifTool 0.5.6+ (Metadata operations)
+- openai 1.0.0+ (AI generation)
+- tiktoken 0.5.0+ (Token counting)
+- tenacity 8.2.0+ (Retry logic)
+- msal (Microsoft SSO - optional)
+
+---
+
+## Installation
+
+### 1. Install System Dependencies
+
+**macOS:**
+```bash
+brew install exiftool tesseract tesseract-lang poppler
+```
+
+**Linux (Ubuntu/Debian):**
+```bash
+sudo apt-get install libimage-exiftool-perl tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils
+```
+
+**Windows:**
+```bash
+# Install ExifTool from: https://exiftool.org/
+choco install exiftool tesseract
+```
+
+**Verify ExifTool Installation:**
+```bash
+exiftool -ver
+# Should show version 12.15 or higher
+```
+
+See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions.
+
+### 2. Create Virtual Environment
+
+```bash
+python3 -m venv venv_local
+source venv_local/bin/activate  # On Windows: venv_local\Scripts\activate
+```
+
+### 3. Install Python Dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 4. Configure Environment Variables
+
+Create a `.env` file in the project root:
+
+```env
+# Required: OpenAI API Key (for AI metadata generation)
+OPENAI_API_KEY=your-openai-api-key-here
+
+# Optional: Microsoft SSO (for enterprise authentication)
+# AZURE_CLIENT_ID=your-azure-client-id
+# AZURE_CLIENT_SECRET=your-azure-client-secret
+# AZURE_TENANT_ID=your-azure-tenant-id
+# REDIRECT_URI=http://localhost:5001/auth/callback
+
+# Optional: application secret key (auto-generated if not set)
+# SECRET_KEY=your-secret-key-here
+
+# Optional: AI settings (defaults shown)
+# AI_MODEL=gpt-4o-mini
+# MAX_TOKENS=500
+# TEMPERATURE=0.5
+# API_TIMEOUT=30
+# API_MAX_RETRIES=3
+```
+
+### 5. Initialize Database
+
+The database will be created automatically on first run. To manually initialize:
+
+```bash
+python -c "from src.database import Database; db = Database(); print('Database initialized')"
+```
+
+---
+
+## Docker Deployment (Recommended)
+
+### Quick Start with Docker
+
+```bash
+# Build and start
+docker-compose up -d
+
+# Or use the helper script
+./docker-run.sh build
+./docker-run.sh start
+
+# Access at http://localhost:5001
+```
+
+**Benefits:**
+- ✅ No manual dependency installation
+- ✅ Consistent environment across systems
+- ✅ Persistent data storage via volumes
+- ✅ Easy updates and rollbacks
+- ✅ Production-ready configuration
+
+**See [DOCKER.md](DOCKER.md) for complete Docker deployment guide.**
+
+---
+
+## Usage
+
+### Starting the Web Application
+
+**Local Development:**
+```bash
+python web_app.py
+```
+
+**Docker:**
+```bash
+docker-compose up -d
+```
+
+The application will:
+1. ✅ Check for ExifTool availability
+2. ✅ Initialize SQLite database (users, sessions, audit_log)
+3. ✅ Start Flask server on http://localhost:5001
+4. 🌐 Open browser automatically (local mode only)
+
+### Login
+
+**Test Account:**
+- Username: `tester`
+- Password: `oliveradmin`
+
+**Microsoft SSO** (if configured):
+- Click "Sign in with Microsoft" button
+- Authenticate via Azure AD
+- Users auto-created on first login
+
+### Using Metadata Sources
+
+#### 1. Import from File
+1. Select "Import from File (CSV/Excel/JSON)" from metadata source dropdown (default)
+2. Click "Choose File" and select your metadata file
+3. Configure mapping modal:
+   - For Excel files: Select sheet name
+   - Map columns: Filename (required), Title, Description, Keywords
+   - Auto-detection suggests best matches
+   - Preview first 3 rows
+4. Confirm mapping
+5. Upload files to process - tool matches files by filename
+
+#### 2. AI Generation
+1. Select "AI Generation" from metadata source dropdown
+2. Upload files
+3. AI generates metadata (10-30 seconds per file)
+4. Review and edit generated metadata
+5. Save changes
+
+#### 3. Manual Entry
+1. Select "Manual Entry"
+2. Upload files
+3. Fill in metadata fields manually
+4. Save changes
+
+#### 4. Templates
+1. Create template with variables
+2. Select template from dropdown
+3. Apply to selected files
+4. Review and save
+
+### Batch Operations
+
+1. Upload multiple files
+2. Use checkboxes to select files
+3. "Select All" / "Deselect All" buttons
+4. Edit metadata individually
+5. Click "Update Selected Files" to save all at once
+6. Export results to CSV
+
+---
+
+## Configuration
+
+### Database Schema
+
+**Users Table:**
+- id, username, password_hash, email, full_name
+- auth_method (local/sso)
+- created_at, last_login, is_active
+
+**Sessions Table:**
+- session_id, user_id, created_at, expires_at
+- ip_address, user_agent
+
+**Audit Log Table:**
+- id, user_id, action, details, timestamp
+
+### AI Usage Tracking
+
+Every AI metadata generation is logged with:
+- User ID
+- Timestamp
+- Tokens used (prompt + completion)
+- Cost estimate (based on gpt-4o-mini pricing)
+
+View logs in database:
+```sql
+SELECT * FROM audit_log WHERE action = 'ai_generation' ORDER BY timestamp DESC;
+```
+
+### User Management
+
+**Create New User:**
+```python
+from src.database import Database
+db = Database()
+db.create_user(
+    username='newuser',
+    password='password123',
+    email='user@example.com',
+    full_name='New User',
+    auth_method='local'
+)
+```
+
+**List All Users:**
+```python
+users = db.get_all_users()
+for user in users:
+    print(f"{user['username']} - Last login: {user['last_login']}")
+```
+
+---
+
+## Architecture
+
+### File Structure
+
+```
+oliver-metadata-tool/
+├── web_app.py              # Flask web application (main entry point)
+├── requirements.txt        # Python dependencies
+├── .env                    # Environment configuration
+├── oliver_metadata.db      # SQLite database (auto-created)
+├── src/
+│   ├── config.py           # Configuration management
+│   ├── database.py         # Database operations
+│   ├── auth.py             # Authentication logic
+│   ├── metadata_analyzer.py    # AI metadata generation
+│   ├── metadata_importer.py    # Import from files
+│   ├── template_manager.py     # Template system
+│   ├── field_mapper.py         # Column mapping
+│   ├── excel_metadata_lookup.py # Excel lookup
+│   ├── extractors/
+│   │   ├── pdf_extractor.py
+│   │   ├── image_extractor.py
+│   │   ├── office_extractor.py
+│   │   ├── video_extractor.py
+│   │   └── exiftool_extractor.py
+│   └── updaters/
+│       ├── pdf_updater.py
+│       ├── image_updater.py
+│       ├── office_updater.py
+│       ├── video_updater.py
+│       └── exiftool_updater.py
+├── templates/
+│   ├── index.html          # Main UI
+│   └── login.html          # Login page
+└── docs/
+    └── EXIFTOOL_SETUP.md   # ExifTool setup guide
+```
+
+### Technology Stack
+
+- **Backend:** Flask (Python)
+- **Database:** SQLite
+- **Frontend:** HTML5, CSS3, JavaScript (Vanilla)
+- **Design:** Montserrat font, Dark & Gold theme
+- **Authentication:** Flask-Session, werkzeug.security, MSAL
+- **AI:** OpenAI API (gpt-4o-mini)
+- **Metadata:** PyExifTool, pypdf, python-docx, openpyxl
+
+---
+
+## API Endpoints
+
+### Authentication
+- `GET /login` - Login page
+- `POST /login` - Authenticate user
+- `GET /logout` - Destroy session
+- `GET /login/microsoft` - Microsoft SSO redirect
+- `GET /auth/callback` - SSO callback
+
+### File Operations
+- `POST /upload` - Upload files and generate metadata
+- `POST /update-manual` - Update file metadata manually
+- `GET /download/<filename>` - Download processed file
+
+### Metadata Sources
+- `POST /upload-excel` - Upload Excel file for mapping
+- `POST /preview-excel-sheet` - Preview Excel sheet structure
+- `POST /configure-excel-mapping` - Configure Excel column mapping
+- `POST /import-metadata` - Upload import file for mapping
+- `POST /configure-import-mapping` - Configure import column mapping
+
+### Templates
+- `GET /templates/list` - List all templates
+- `POST /templates/save` - Save new template
+- `POST /templates/load` - Load template by name
+- `DELETE /templates/delete` - Delete template
+- `POST /templates/apply` - Apply template to files
+- `POST /templates/preview` - Preview template output
+
+---
+
+## Security & Privacy
+
+### Authentication
+- Passwords hashed with werkzeug.security (pbkdf2:sha256)
+- Session tokens: 32-byte cryptographically secure random strings
+- Sessions expire after 24 hours
+- Microsoft SSO via OAuth2 + Azure AD
+
+### Data Protection
+- All credentials stored in `.env` (excluded from git)
+- Database file excluded from git
+- API keys never logged or exposed to frontend
+- Audit trail for all user actions
+
+### Production Recommendations
+1. **HTTPS:** Use SSL/TLS certificates in production
+2. **Database:** Migrate to PostgreSQL for better concurrency
+3. **Rate Limiting:** Add rate limits to prevent abuse
+4. **CSRF Protection:** Enable Flask-WTF for form security
+5. **Error Tracking:** Integrate Sentry or similar service
+6. **Backups:** Regular database backups
+7. **Monitoring:** Track AI token usage for cost management
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**ExifTool not found:**
+```bash
+# Verify installation
+exiftool -ver
+
+# macOS: Reinstall with Homebrew
+brew reinstall exiftool
+
+# Linux: Reinstall with apt
+sudo apt-get install --reinstall libimage-exiftool-perl
+```
+
+**Database locked error:**
+```bash
+# Stop all instances
+lsof -ti:5001 | xargs kill -9
+
+# Restart application
+python web_app.py
+```
+
+**OpenAI API errors:**
+- Check API key in `.env` file
+- Verify API key is valid at https://platform.openai.com/api-keys
+- Check token usage limits on OpenAI dashboard
+
+**Import failed - column not found:**
+- Use the mapping modal to manually select columns
+- Check that your file has headers in the first row
+- Verify file encoding is UTF-8
+
+---
+
+## Development
+
+### Running Tests
+
+```bash
+# Unit tests (if implemented)
+pytest tests/
+
+# Manual integration test
+python -c "from src.database import Database; from src.config import Config; print('✅ All imports successful')"
+```
+
+### Git Workflow
+
+```bash
+# Check status
+git status
+
+# Add changes
+git add .
+
+# Commit with message
+git commit -m "Your commit message"
+
+# Push to remote
+git push origin main
+```
+
+---
+
+## License & Credits
+
+**License:** Corporate License - Oliver Marketing
+All rights reserved. Unauthorized copying, distribution, or modification is prohibited.
+
+**Developer:** Vadym Samoilenko
+**Company:** Oliver Marketing
+**Version:** 3.1 Enterprise Edition
+**Release Date:** January 2026
+
+**Third-Party Software:**
+- ExifTool by Phil Harvey (Perl Artistic License)
+- Flask by Pallets (BSD License)
+- OpenAI API (Commercial License)
+- PyExifTool (LGPL License)
+
+---
+
+## Support
+
+For issues, questions, or feature requests:
+- **Internal Support:** Contact IT department
+- **Developer:** Vadym Samoilenko
+- **Documentation:** See `docs/` folder
+
+---
+
+## Changelog
+
+### v3.1 (January 2026) - Enterprise Edition
+- ✅ User authentication (local + Microsoft SSO)
+- ✅ SQLite database with audit logging
+- ✅ Unified import from file (CSV/Excel/JSON) with smart column mapping
+- ✅ Excel sheet selection and preview
+- ✅ Custom metadata fields support
+- ✅ AI usage tracking and cost monitoring
+- ✅ Dark & Gold UI redesign
+- ✅ Template variables and preview
+- ✅ Batch selection and CSV export
+- ✅ Consolidated metadata sources (removed redundant Excel Lookup)
+
+### v3.0 (January 2026)
+- ✅ ExifTool integration (300+ formats)
+- ✅ Multiple metadata sources (Import, AI, Manual)
+- ✅ Field mapping with fuzzy matching
+- ✅ Metadata templates system
+- ✅ Rebranded to Oliver Metadata Tool
+
+### v2.x (Prior)
+- Basic Excel lookup functionality
+- Multi-format file support
+- Web interface
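
Review note on the README: "Smart Field Mapping" is described as auto-detecting columns with fuzzy matching, but the algorithm is not shown in this part of the diff. The sketch below illustrates one way such a suggestion step could work, using the standard library's `difflib` as a stand-in. `suggest_mapping`, `TARGET_FIELDS`, and the 0.6 cutoff are assumptions for illustration and may not match what `field_mapper.py` does.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional

# Target fields taken from the README's mapping modal description.
TARGET_FIELDS = ["filename", "title", "description", "keywords"]


def suggest_mapping(columns: List[str], cutoff: float = 0.6) -> Dict[str, Optional[str]]:
    """Suggest, for each target field, the most similar column header (or None)."""
    # Normalize headers for comparison but remember the original spelling.
    normalized = {col.strip().lower().replace("_", " "): col for col in columns}
    mapping: Dict[str, Optional[str]] = {}
    for field in TARGET_FIELDS:
        match = get_close_matches(field, list(normalized), n=1, cutoff=cutoff)
        mapping[field] = normalized[match[0]] if match else None
    return mapping
```

A field mapped to `None` would be left for the user to pick manually in the mapping modal.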
--- a/app/__init__.py
+++ b/app/__init__.py
--- a/app/config.py
+++ b/app/config.py
@@ -0,0 +1,101 @@
+"""Application settings via pydantic-settings."""
+
+import secrets
+import os
+from pathlib import Path
+from pydantic_settings import BaseSettings
+
+
+class Settings(BaseSettings):
+    """Application settings loaded from environment variables and .env file."""
+
+    # App
+    APP_NAME: str = "Oliver Metadata Tool"
+    APP_VERSION: str = "4.0.0"
+    DEBUG: bool = False
+    DOCKER_MODE: bool = False
+    ROOT_PATH: str = ""  # Subpath prefix, e.g. "/solventum-image-metadata"
+
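+    # Security
+    # NOTE: this fallback is generated at import time, so each gunicorn worker
+    # process may end up with a different key (and every restart with a new one).
+    # Set SECRET_KEY explicitly in production.
+    SECRET_KEY: str = secrets.token_hex(32)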
+    HTTPS_ONLY: bool = False
+    ENABLE_TEST_USER: bool = False
+
+    # Paths
+    UPLOAD_FOLDER: str = ""
+    DB_PATH: str = ""
+    SESSION_DB_PATH: str = ""
+    TEMPLATES_DIR: str = ""
+
+    # OpenAI
+    OPENAI_API_KEY: str = ""
+    AI_MODEL: str = "gpt-4o-mini"
+    MAX_TOKENS: int = 500
+    TEMPERATURE: float = 0.5
+    MAX_TEXT_LENGTH: int = 4000
+    API_TIMEOUT: int = 30
+    API_MAX_RETRIES: int = 3
+
+    # Azure SSO
+    AZURE_CLIENT_ID: str = ""
+    AZURE_CLIENT_SECRET: str = ""
+    AZURE_TENANT_ID: str = ""
+    REDIRECT_URI: str = "http://localhost:5001/auth/callback"
+
+    # OCR
+    OCR_LANGUAGES: str = "eng+chi_sim+chi_tra+jpn+kor"
+    TESSERACT_PATH: str = ""
+    FFMPEG_PATH: str = ""
+
+    # Limits
+    MAX_UPLOAD_SIZE_MB: int = 500
+    SESSION_EXPIRE_HOURS: int = 24
+    FILE_CLEANUP_HOURS: int = 24
+
+    # Superadmin
+    SUPERADMIN_EMAIL: str = "vadymsamoilenko@oliver.agency"
+
+    model_config = {
+        "env_file": ".env",
+        "env_file_encoding": "utf-8",
+        "extra": "ignore",
+    }
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        project_root = Path(__file__).parent.parent
+
+        if self.DOCKER_MODE:
+            if not self.UPLOAD_FOLDER:
+                self.UPLOAD_FOLDER = "/app/uploads"
+            if not self.DB_PATH:
+                self.DB_PATH = "/app/data/oliver_metadata.db"
+            if not self.SESSION_DB_PATH:
+                self.SESSION_DB_PATH = "/app/data/oliver_sessions.db"
+        else:
+            if not self.UPLOAD_FOLDER:
+                self.UPLOAD_FOLDER = str(project_root / "uploads")
+            if not self.DB_PATH:
+                self.DB_PATH = str(project_root / "oliver_metadata.db")
+            if not self.SESSION_DB_PATH:
+                self.SESSION_DB_PATH = str(project_root / "oliver_sessions.db")
+
+        if not self.TEMPLATES_DIR:
+            self.TEMPLATES_DIR = str(project_root / "templates")
+
+        # Ensure upload directory exists
+        Path(self.UPLOAD_FOLDER).mkdir(parents=True, exist_ok=True)
+
+        # Ensure data directory exists (for Docker)
+        Path(self.DB_PATH).parent.mkdir(parents=True, exist_ok=True)
+
+
+_settings = None
+
+
+def get_settings() -> Settings:
+    """Get cached settings instance."""
+    global _settings
+    if _settings is None:
+        _settings = Settings()
+    return _settings
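
Review note on `app/config.py`: the path defaults depend on `DOCKER_MODE`, and `get_settings` caches a single instance in a module global. The sketch below restates both ideas with the standard library only, so the branch logic can be read and tested in isolation. `PathDefaults`, `resolve_paths`, and `get_path_defaults` are hypothetical stand-ins, not the real `Settings` class, and the sketch ignores explicit overrides from the environment.

```python
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path


@dataclass(frozen=True)
class PathDefaults:
    """Stand-in for the path fields of Settings (hypothetical)."""
    upload_folder: str
    db_path: str
    session_db_path: str


def resolve_paths(docker_mode: bool, project_root: Path) -> PathDefaults:
    """Fixed /app paths in Docker mode, project-relative paths otherwise."""
    if docker_mode:
        return PathDefaults("/app/uploads", "/app/data/oliver_metadata.db",
                            "/app/data/oliver_sessions.db")
    return PathDefaults(
        str(project_root / "uploads"),
        str(project_root / "oliver_metadata.db"),
        str(project_root / "oliver_sessions.db"),
    )


@lru_cache(maxsize=1)
def get_path_defaults(docker_mode: bool = False) -> PathDefaults:
    """Cached accessor; repeated identical calls return the same object."""
    return resolve_paths(docker_mode, Path.cwd())
```

`functools.lru_cache` gives the same "build once, reuse" behaviour as the `global _settings` pattern and adds `get_path_defaults.cache_clear()` for tests that need a fresh instance.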
--- a/app/dependencies.py
+++ b/app/dependencies.py
@@ -0,0 +1,107 @@
+"""FastAPI dependency injection providers."""
+
+import logging
+from typing import Optional, Dict
+from fastapi import Depends, Request, HTTPException, status
+
+from .config import Settings, get_settings
+from .session.store import SessionStore
+from .services.auth_service import AuthService
+
+logger = logging.getLogger(__name__)
+
+# Singletons (initialized once via lifespan)
+_database = None
+_session_store = None
+_auth_service = None
+
+
+def init_dependencies(settings: Settings):
+    """Initialize singleton dependencies. Called once from app lifespan."""
+    global _database, _session_store, _auth_service
+
+    from src.database import Database
+
+    _database = Database(db_path=settings.DB_PATH)
+    _session_store = SessionStore(db_path=settings.SESSION_DB_PATH)
+    _auth_service = AuthService(database=_database)
+
+    logger.info("Dependencies initialized")
+
+
+def get_database():
+    """Get Database instance."""
+    if _database is None:
+        raise RuntimeError("Database not initialized")
+    return _database
+
+
+def get_session_store() -> SessionStore:
+    """Get SessionStore instance."""
+    if _session_store is None:
+        raise RuntimeError("SessionStore not initialized")
+    return _session_store
+
+
+def get_auth_service() -> AuthService:
+    """Get AuthService instance."""
+    if _auth_service is None:
+        raise RuntimeError("AuthService not initialized")
+    return _auth_service
+
+
+async def get_current_user(request: Request) -> Dict:
+    """FastAPI dependency: require authenticated user.
+
+    Replaces Flask's @login_required decorator.
+    Checks session cookie against database, returns user dict or raises 401.
+    """
+    session_id = request.session.get("session_id")
+    if not session_id:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Not authenticated",
+        )
+
+    auth = get_auth_service()
+    db_session = auth.validate_session(session_id)
+    if not db_session:
+        # Session expired or invalid — clear it
+        request.session.clear()
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Session expired",
+        )
+
+    user_id = db_session["user_id"]
+    user = auth.get_user_by_id(user_id)
+    if not user:
+        request.session.clear()
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="User not found",
+        )
+
+    return user
+
+
+async def get_current_user_optional(request: Request) -> Optional[Dict]:
+    """Same as get_current_user but returns None instead of raising."""
+    try:
+        return await get_current_user(request)
+    except HTTPException:
+        return None
+
+
+async def get_current_admin(request: Request) -> Dict:
+    """FastAPI dependency: require authenticated admin user.
+
+    Raises 403 if user is not an admin.
+    """
+    user = await get_current_user(request)
+    if user.get("role") != "admin":
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="Admin access required",
+        )
+    return user
--- a/app/main.py
+++ b/app/main.py
@ -0,0 +1,126 @@
+"""FastAPI application factory with lifespan management."""
+
+import logging
+from contextlib import asynccontextmanager
+from pathlib import Path
+
+from fastapi import FastAPI, Request, Depends
+from fastapi.exceptions import HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import HTMLResponse, RedirectResponse
+from fastapi.staticfiles import StaticFiles
+from fastapi.templating import Jinja2Templates
+from slowapi import _rate_limit_exceeded_handler
+from slowapi.errors import RateLimitExceeded
+from starlette.middleware.sessions import SessionMiddleware
+
+from .config import get_settings
+from .dependencies import init_dependencies, get_current_user
+from .security import limiter
+
+logger = logging.getLogger(__name__)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    """Startup/shutdown lifecycle."""
+    settings = get_settings()
+    init_dependencies(settings)
+    logger.info(f"{settings.APP_NAME} v{settings.APP_VERSION} starting")
+    yield
+    logger.info("Shutting down")
+
+
+def create_app() -> FastAPI:
+    settings = get_settings()
+
+    app = FastAPI(
+        title=settings.APP_NAME,
+        version=settings.APP_VERSION,
+        root_path=settings.ROOT_PATH,
+        docs_url="/docs" if settings.DEBUG else None,
+        redoc_url=None,
+        lifespan=lifespan,
+    )
+
+    app.state.limiter = limiter
+    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+
+    # CORS: allow only the app's own origin (scheme://host) unless DEBUG is set
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["/".join(settings.REDIRECT_URI.split("/", 3)[:3])] if not settings.DEBUG else ["*"],
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+
+    # Session middleware (cookie-based)
+    app.add_middleware(
+        SessionMiddleware,
+        secret_key=settings.SECRET_KEY,
+        session_cookie="oliver_session",
+        max_age=settings.SESSION_EXPIRE_HOURS * 3600,
+        same_site="lax",
+        https_only=settings.HTTPS_ONLY,
+    )
+
+    # Static files
+    project_root = Path(__file__).parent.parent
+    static_dir = project_root / "static"
+    if static_dir.exists():
+        app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")
+
+    # Templates
+    templates = Jinja2Templates(directory=settings.TEMPLATES_DIR)
+
+    # Register routers
+    from .routers import auth as auth_router
+    from .routers import upload as upload_router
+    from .routers import metadata as metadata_router
+    from .routers import templates as templates_router
+    from .routers import imports as imports_router
+    from .routers import downloads as downloads_router
+    from .routers import sse as sse_router
+    from .routers import admin as admin_router
+
+    auth_router.set_templates(templates)
+    admin_router.set_templates(templates)
+    app.include_router(auth_router.router)
+    app.include_router(upload_router.router)
+    app.include_router(metadata_router.router)
+    app.include_router(templates_router.router)
+    app.include_router(imports_router.router)
+    app.include_router(downloads_router.router)
+    app.include_router(sse_router.router)
+    app.include_router(admin_router.router)
+
+    # Main page
+    @app.get("/", response_class=HTMLResponse)
+    async def index(request: Request, user=Depends(get_current_user)):
+        return templates.TemplateResponse(
+            "index.html",
+            {
+                "request": request,
+                "username": user["username"],
+                "docker_mode": settings.DOCKER_MODE,
+            },
+        )
+
+    # Redirect unauthenticated users to login
+    @app.exception_handler(HTTPException)
+    async def http_exception_handler(request: Request, exc: HTTPException):
+        if exc.status_code == 401:
+            root = request.scope.get("root_path", "")
+            return RedirectResponse(url=f"{root}/login?next={request.url.path}", status_code=302)
+        # Return all other HTTP exceptions as JSON
+        from fastapi.responses import JSONResponse
+        return JSONResponse(
+            status_code=exc.status_code,
+            content={"detail": exc.detail},
+        )
+
+    return app
+
+
+app = create_app()
--- a/app/models/__init__.py
+++ b/app/models/__init__.py
--- a/app/models/requests.py
+++ b/app/models/requests.py
@ -0,0 +1,67 @@
+"""Pydantic request models with validation."""
+
+from typing import Optional, Dict, List
+from pydantic import BaseModel, Field
+
+
+class UpdateMetadataRequest(BaseModel):
+    """Request to update file metadata from session."""
+    session_id: str
+    file_index: int
+    filepath: Optional[str] = None  # Deprecated: resolved from session
+    output_dir: Optional[str] = ""
+
+
+class UpdateManualMetadataRequest(BaseModel):
+    """Request to update file with manually entered metadata."""
+    session_id: str
+    file_index: int
+    title: str = Field(default="", max_length=200)
+    subject: str = Field(default="", max_length=300)
+    keywords: str = Field(default="", max_length=500)
+    author: str = Field(default="", max_length=100)
+    copyright: str = Field(default="", max_length=150)
+    comments: str = Field(default="", max_length=500)
+    custom_fields: Optional[Dict[str, str]] = None
+
+
+class ExcelSheetPreviewRequest(BaseModel):
+    """Request to preview a specific Excel sheet."""
+    excel_session_id: str
+    sheet_name: str
+
+
+class ExcelMappingRequest(BaseModel):
+    """Request to configure Excel column mapping."""
+    excel_session_id: str
+    sheet_name: str
+    column_mapping: Dict[str, str]  # {filename: 'col', title: 'col', ...}
+
+
+class ImportMappingRequest(BaseModel):
+    """Request to configure import column mapping."""
+    import_session_id: str
+    column_mapping: Dict[str, str]
+
+
+class TemplateApplyRequest(BaseModel):
+    """Request to apply a template to files."""
+    template_name: str
+    session_id: str
+    file_indices: List[int]
+    custom_vars: Optional[Dict[str, str]] = None
+
+
+class TemplatePreviewRequest(BaseModel):
+    """Request to preview template output."""
+    title: str = ""
+    subject: str = ""
+    keywords: str = ""
+    sample_filename: str = "example.pdf"
+    custom_vars: Optional[Dict[str, str]] = None
+
+
+class DownloadSelectedRequest(BaseModel):
+    """Request to download selected files as ZIP."""
+    session_id: str
+    file_indices: List[int]
--- a/app/models/responses.py
+++ b/app/models/responses.py
@ -0,0 +1,70 @@
+"""Pydantic response models."""
+
+from typing import Optional, Dict, List, Any
+from pydantic import BaseModel
+
+
+class FileResult(BaseModel):
+    """Result for a single processed file."""
+    success: bool = True
+    filename: str
+    file_type: Optional[str] = None
+    current_metadata: Optional[Dict[str, str]] = None
+    suggested_metadata: Optional[Dict[str, str]] = None
+    metadata_source: Optional[str] = None
+    excel_found: bool = False
+    error: Optional[str] = None
+
+
+class UploadResponse(BaseModel):
+    """Response from file upload endpoint."""
+    success: bool
+    session_id: Optional[str] = None
+    files: List[FileResult] = []
+    error: Optional[str] = None
+
+
+class UpdateResponse(BaseModel):
+    """Response from metadata update endpoint."""
+    success: bool = True
+    message: str = ""
+    verified: bool = False
+    metadata: Optional[Dict[str, str]] = None
+    error: Optional[str] = None
+
+
+class ExcelUploadResponse(BaseModel):
+    """Response from Excel file upload."""
+    success: bool
+    excel_session_id: Optional[str] = None
+    filename: Optional[str] = None
+    sheets: Optional[List[str]] = None
+    preview: Optional[Dict[str, Any]] = None
+    message: Optional[str] = None
+    error: Optional[str] = None
+
+
+class ImportUploadResponse(BaseModel):
+    """Response from import file upload."""
+    success: bool
+    import_session_id: Optional[str] = None
+    filename: Optional[str] = None
+    columns: Optional[List[str]] = None
+    sample_data: Optional[List[Dict[str, Any]]] = None
+    message: Optional[str] = None
+    error: Optional[str] = None
+
+
+class MappingConfigResponse(BaseModel):
+    """Response from mapping configuration."""
+    success: bool
+    excel_session_id: Optional[str] = None
+    import_session_id: Optional[str] = None
+    stats: Optional[Dict[str, int]] = None
+    message: Optional[str] = None
+    error: Optional[str] = None
+
+
+class ErrorResponse(BaseModel):
+    """Standard error response."""
+    error: str
--- a/app/routers/__init__.py
+++ b/app/routers/__init__.py
--- a/app/routers/admin.py
+++ b/app/routers/admin.py
@ -0,0 +1,126 @@
+"""Admin router: user management, audit log, AI usage stats."""
+
+import logging
+from typing import Dict
+
+from fastapi import APIRouter, Request, Depends
+from fastapi.responses import HTMLResponse, JSONResponse
+from fastapi.templating import Jinja2Templates
+
+from ..config import get_settings
+from ..dependencies import get_current_admin, get_database
+from ..services.admin_service import AdminService
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/admin", tags=["admin"])
+
+_templates: Jinja2Templates = None
+_admin_service: AdminService = None
+
+
+def set_templates(templates: Jinja2Templates):
+    global _templates
+    _templates = templates
+
+
+def _get_admin_service() -> AdminService:
+    global _admin_service
+    if _admin_service is None:
+        _admin_service = AdminService(database=get_database())
+    return _admin_service
+
+
+@router.get("", response_class=HTMLResponse)
+async def admin_dashboard(request: Request, user: Dict = Depends(get_current_admin)):
+    """Admin dashboard page."""
+    svc = _get_admin_service()
+    stats = svc.get_dashboard_stats()
+    return _templates.TemplateResponse(
+        "admin.html",
+        {
+            "request": request,
+            "username": user["username"],
+            "stats": stats,
+        },
+    )
+
+
+@router.get("/users")
+async def list_users(
+    include_inactive: bool = False,
+    user: Dict = Depends(get_current_admin),
+):
+    """List all users."""
+    svc = _get_admin_service()
+    users = svc.list_users(include_inactive=include_inactive)
+    return {"success": True, "users": users}
+
+
+@router.post("/users")
+async def create_user(
+    request: Request,
+    user: Dict = Depends(get_current_admin),
+):
+    """Create a new user."""
+    try:
+        data = await request.json()
+        svc = _get_admin_service()
+        user_id = svc.create_user(
+            username=data.get("username", "").strip(),
+            email=data.get("email", "").strip(),
+            full_name=data.get("full_name", "").strip(),
+            role=data.get("role", "user"),
+            password=data.get("password"),
+            auth_method=data.get("auth_method", "local"),
+        )
+        if user_id:
+            db = get_database()
+            db.log_action(user["id"], "admin_create_user", f"Created user {data.get('username')} (ID: {user_id})")
+            return {"success": True, "user_id": user_id}
+        return JSONResponse({"error": "Failed to create user (username may already exist)"}, status_code=400)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.put("/users/{user_id}")
+async def update_user(
+    user_id: int,
+    request: Request,
+    admin: Dict = Depends(get_current_admin),
+):
+    """Update user (role, is_active, full_name, email)."""
+    try:
+        data = await request.json()
+        svc = _get_admin_service()
+        success = svc.update_user(user_id, data)
+        if success:
+            db = get_database()
+            db.log_action(admin["id"], "admin_update_user", f"Updated user {user_id}: {data}")
+            return {"success": True}
+        return JSONResponse({"error": "No changes applied"}, status_code=400)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.get("/audit")
+async def get_audit_log(
+    user_id: int = None,
+    action: str = None,
+    limit: int = 100,
+    offset: int = 0,
+    admin: Dict = Depends(get_current_admin),
+):
+    """Get audit log with optional filters."""
+    svc = _get_admin_service()
+    entries = svc.get_audit_log(user_id=user_id, action=action, limit=limit, offset=offset)
+    return {"success": True, "entries": entries, "count": len(entries)}
+
+
+@router.get("/ai-usage")
+async def get_ai_usage(admin: Dict = Depends(get_current_admin)):
+    """Get AI usage statistics."""
+    svc = _get_admin_service()
+    stats = svc.get_ai_usage_stats()
+    by_user = svc.get_ai_usage_by_user()
+    return {"success": True, "stats": stats, "by_user": by_user}
--- a/app/routers/auth.py
+++ b/app/routers/auth.py
@ -0,0 +1,251 @@
+"""Authentication router: login, logout, Microsoft SSO."""
+
+import secrets
+import logging
+from typing import Dict
+from fastapi import APIRouter, Request, Depends, Form
+from fastapi.responses import HTMLResponse, RedirectResponse
+from fastapi.templating import Jinja2Templates
+
+from ..config import get_settings, Settings
+from ..dependencies import get_auth_service, get_current_user_optional
+from ..security import limiter
+from ..services.auth_service import AuthService
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["auth"])
+
+# Templates are set from main.py after mounting
+_templates: Jinja2Templates = None
+
+
+def set_templates(templates: Jinja2Templates):
+    global _templates
+    _templates = templates
+
+
+@router.get("/login", response_class=HTMLResponse)
+async def login_page(
+    request: Request,
+    error: str = None,
+    info: str = None,
+    settings: Settings = Depends(get_settings),
+    auth: AuthService = Depends(get_auth_service),
+):
+    """Render login page."""
+    # If already logged in, redirect to index
+    user = await get_current_user_optional(request)
+    if user:
+        root = request.scope.get("root_path", "")
+        return RedirectResponse(url=f"{root}/", status_code=302)
+
+    return _templates.TemplateResponse(
+        "login.html",
+        {
+            "request": request,
+            "error": error,
+            "info": info,
+            "sso_enabled": auth.sso_enabled,
+            "enable_test_user": settings.ENABLE_TEST_USER,
+            "app_version": settings.APP_VERSION,
+        },
+    )
+
+
+@router.post("/login")
+@limiter.limit("5/minute")
+async def login_submit(
+    request: Request,
+    username: str = Form(...),
+    password: str = Form(...),
+    settings: Settings = Depends(get_settings),
+    auth: AuthService = Depends(get_auth_service),
+):
+    """Process login form. Rate limited to 5 attempts per minute."""
+    username = username.strip()
+    if not username or not password:
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": "Please enter both username and password",
+                "sso_enabled": auth.sso_enabled,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    result = auth.authenticate_user(username, password)
+
+    if not result["success"]:
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": result.get("error"),
+                "sso_enabled": auth.sso_enabled,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    user = result["user"]
+    session_id = auth.create_session(
+        user=user,
+        ip_address=request.client.host if request.client else None,
+        user_agent=request.headers.get("user-agent"),
+    )
+
+    if not session_id:
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": "Failed to create session",
+                "sso_enabled": auth.sso_enabled,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    # Set session data
+    request.session["user_id"] = user["id"]
+    request.session["username"] = user["username"]
+    request.session["session_id"] = session_id
+
+    root = request.scope.get("root_path", "")
+    next_url = request.query_params.get("next", "/")
+    # Allow only local paths (open-redirect guard), then prefix with root_path if missing
+    if not next_url.startswith("/") or next_url.startswith(("//", "/\\")):
+        next_url = "/"
+    return RedirectResponse(url=next_url if next_url.startswith(root) else f"{root}{next_url}", status_code=302)
+
+
+@router.get("/logout")
+async def logout(
+    request: Request,
+    auth: AuthService = Depends(get_auth_service),
+):
+    """Logout and destroy session."""
+    user_id = request.session.get("user_id")
+    session_id = request.session.get("session_id")
+
+    if session_id:
+        auth.destroy_session(session_id, user_id)
+
+    request.session.clear()
+    root = request.scope.get("root_path", "")
+    return RedirectResponse(url=f"{root}/login", status_code=302)
+
+
+@router.get("/login/microsoft")
+async def login_microsoft(
+    request: Request,
+    settings: Settings = Depends(get_settings),
+    auth: AuthService = Depends(get_auth_service),
+):
+    """Redirect to Microsoft SSO."""
+    if not auth.sso_enabled:
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": "Microsoft SSO not configured",
+                "sso_enabled": False,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    state = secrets.token_urlsafe(16)
+    request.session["oauth_state"] = state
+
+    auth_url = auth.sso.get_auth_url(state=state)
+    if auth_url:
+        return RedirectResponse(url=auth_url, status_code=302)
+
+    return _templates.TemplateResponse(
+        "login.html",
+        {
+            "request": request,
+            "error": "Failed to generate SSO URL",
+            "sso_enabled": auth.sso_enabled,
+            "enable_test_user": settings.ENABLE_TEST_USER,
+            "app_version": settings.APP_VERSION,
+        },
+    )
+
+
+@router.get("/auth/callback")
+async def auth_callback(
+    request: Request,
+    state: str = None,
+    code: str = None,
+    error_description: str = None,
+    settings: Settings = Depends(get_settings),
+    auth: AuthService = Depends(get_auth_service),
+):
+    """Handle Microsoft SSO callback."""
+    from ..dependencies import get_database
+
+    # Verify state: it must be present and match the session value (None must never equal None)
+    if not state or state != request.session.get("oauth_state"):
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": "Invalid state parameter",
+                "sso_enabled": auth.sso_enabled,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    if not code:
+        error_msg = error_description or "No authorization code"
+        return _templates.TemplateResponse(
+            "login.html",
+            {
+                "request": request,
+                "error": f"SSO failed: {error_msg}",
+                "sso_enabled": auth.sso_enabled,
+                "enable_test_user": settings.ENABLE_TEST_USER,
+                "app_version": settings.APP_VERSION,
+            },
+        )
+
+    # Exchange code for token
+    result = auth.sso.acquire_token(code)
+
+    if result and "access_token" in result:
+        user_info = auth.sso.get_user_info(result["access_token"])
+
+        if user_info:
+            db = get_database()
+            user = auth.sso.create_or_update_user(user_info, db)
+
+            if user:
+                session_id = auth.create_session(
+                    user=user,
+                    ip_address=request.client.host if request.client else None,
+                    user_agent=request.headers.get("user-agent"),
+                )
+
+                if session_id:
+                    request.session["user_id"] = user["id"]
+                    request.session["username"] = user["username"]
+                    request.session["session_id"] = session_id
+                    root = request.scope.get("root_path", "")
+                    return RedirectResponse(url=f"{root}/", status_code=302)
+
+    return _templates.TemplateResponse(
+        "login.html",
+        {
+            "request": request,
+            "error": "SSO authentication failed",
+            "sso_enabled": auth.sso_enabled,
+            "enable_test_user": settings.ENABLE_TEST_USER,
+            "app_version": settings.APP_VERSION,
+        },
+    )
--- a/app/routers/downloads.py
+++ b/app/routers/downloads.py
@ -0,0 +1,116 @@
+"""Download router: single file, ZIP batch, session cleanup."""
+
+import os
+import io
+import zipfile
+import logging
+from pathlib import Path
+from typing import Dict
+from datetime import datetime
+
+from fastapi import APIRouter, Request, Depends, BackgroundTasks
+from fastapi.responses import FileResponse, StreamingResponse, JSONResponse
+
+from ..dependencies import get_current_user, get_session_store
+from ..services.file_service import safe_filename
+from ..session.store import SessionStore
+from ..config import get_settings
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["downloads"])
+
+
+@router.get("/download/{filename}")
+async def download_file(
+    filename: str,
+    user: Dict = Depends(get_current_user),
+):
+    """Download a single processed file."""
+    settings = get_settings()
+    filepath = os.path.join(settings.UPLOAD_FOLDER, str(user["id"]), safe_filename(filename))
+
+    # Also check the root upload folder for backward compatibility (regular files only)
+    if not os.path.isfile(filepath):
+        filepath = os.path.join(settings.UPLOAD_FOLDER, safe_filename(filename))
+
+    if os.path.isfile(filepath):
+        return FileResponse(filepath, filename=filename, media_type="application/octet-stream")
+
+    return JSONResponse({"error": "File not found"}, status_code=404)
+
+
+@router.post("/download-selected")
+async def download_selected_files(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Download selected files from session as ZIP archive."""
+    try:
+        data = await request.json()
+        session_id = data.get("session_id")
+        file_indices = data.get("file_indices", [])
+
+        session_data = store.get_file_session(session_id)
+        if not session_data:
+            return JSONResponse({"error": "Session not found"}, status_code=404)
+
+        if not file_indices:
+            return JSONResponse({"error": "No files selected"}, status_code=400)
+
+        files = session_data.get("files", [])
+        if not files:
+            return JSONResponse({"error": "No files in session"}, status_code=404)
+
+        # Create in-memory ZIP
+        zip_buffer = io.BytesIO()
+        with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zf:
+            for index in file_indices:
+                if 0 <= index < len(files):
+                    file_info = files[index]
+                    filepath = file_info.get("filepath", "")
+                    filename = file_info.get("filename", "")
+
+                    if filepath and os.path.exists(filepath):
+                        zf.write(filepath, filename)
+
+        zip_buffer.seek(0)
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        zip_filename = f"oliver_metadata_files_{timestamp}.zip"
+
+        return StreamingResponse(
+            zip_buffer,
+            media_type="application/zip",
+            headers={"Content-Disposition": f'attachment; filename="{zip_filename}"'},
+        )
+
+    except Exception as e:
+        logger.error(f"Download error: {e}", exc_info=True)
+        return JSONResponse({"error": f"Error creating ZIP archive: {e}"}, status_code=500)
+
+
+@router.post("/cleanup-session/{session_id}")
+async def cleanup_session(
+    session_id: str,
+    background_tasks: BackgroundTasks,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Clean up session files."""
+    try:
+        session_data = store.get_file_session(session_id)
+        if session_data:
+            # Delete uploaded files in background
+            files = session_data.get("files", [])
+            for file_info in files:
+                filepath = file_info.get("filepath", "")
+                if filepath and os.path.exists(filepath):
+                    background_tasks.add_task(os.remove, filepath)
+
+            store.delete_file_session(session_id)
+
+        return {"success": True, "message": "Session cleaned up successfully"}
+    except Exception as e:
+        logger.error(f"Cleanup error: {e}", exc_info=True)
+        return JSONResponse({"error": str(e)}, status_code=500)
--- a/app/routers/imports.py
+++ b/app/routers/imports.py
@ -0,0 +1,201 @@
+"""Import router: import metadata from CSV/Excel/JSON files."""
+
+import logging
+from pathlib import Path
+from typing import Dict
+
+from fastapi import APIRouter, Request, UploadFile, File, Depends
+from fastapi.responses import JSONResponse
+
+from ..dependencies import get_current_user, get_session_store
+from ..services.file_service import FileService, safe_filename
+from ..session.store import SessionStore
+from ..config import get_settings
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["imports"])
+
+_file_service = None
+
+
+def _get_file_service() -> FileService:
+    global _file_service
+    if _file_service is None:
+        settings = get_settings()
+        _file_service = FileService(
+            upload_folder=settings.UPLOAD_FOLDER,
+            max_size_mb=settings.MAX_UPLOAD_SIZE_MB,
+        )
+    return _file_service
+
+
+@router.post("/import-metadata")
+async def import_metadata(
+    request: Request,
+    import_file: UploadFile = File(...),
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Upload import file and preview structure for mapping."""
+    try:
+        import pandas as pd
+
+        file_svc = _get_file_service()
+        filepath = await file_svc.save_upload(import_file, user["id"])
+        file_ext = Path(filepath).suffix.lower()
+
+        if file_ext == ".csv":
+            df = pd.read_csv(filepath, nrows=5, encoding="utf-8")
+        elif file_ext in [".xlsx", ".xls"]:
+            df = pd.read_excel(filepath, nrows=5)
+        elif file_ext == ".json":
+            import json
+            with open(filepath, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            if isinstance(data, list):
+                df = pd.DataFrame(data[:5])
+            elif isinstance(data, dict):
+                df = pd.DataFrame([data])
+            else:
+                return JSONResponse({"error": "Invalid JSON format"}, status_code=400)
+        else:
+            return JSONResponse({"error": f"Unsupported file format: {file_ext}"}, status_code=400)
+
+        columns = df.columns.tolist()
+        sample_data = df.fillna("").to_dict("records")
+
+        import_session_id = store.create_import_session(
+            user_id=user["id"],
+            session_type="import",
+            file_info={"path": filepath, "filename": Path(filepath).name, "file_type": file_ext},
+        )
+
+        return {
+            "success": True,
+            "import_session_id": import_session_id,
+            "filename": Path(filepath).name,
+            "columns": columns,
+            "sample_data": sample_data,
+            "message": "Import file uploaded. Please configure column mapping.",
+        }
+
+    except Exception as e:
+        logger.error(f"Import upload failed: {e}")
+        return JSONResponse({"error": f"Import upload failed: {e}"}, status_code=500)
+
+
+@router.post("/configure-import-mapping")
+async def configure_import_mapping(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Configure import column mapping and load metadata."""
+    try:
+        import pandas as pd
+        import json
+
+        data = await request.json()
+        import_session_id = data.get("import_session_id")
+        column_mapping = data.get("column_mapping", {})
+
+        session_data = store.get_import_session(import_session_id)
+        if not session_data:
+            return JSONResponse({"error": "Invalid session ID"}, status_code=400)
+
+        import_path = session_data["file_info"].get("path", "")
+        file_ext = session_data["file_info"].get("file_type", "")
+
+        if file_ext == ".csv":
+            df = pd.read_csv(import_path, encoding="utf-8")
+        elif file_ext in [".xlsx", ".xls"]:
+            df = pd.read_excel(import_path)
+        elif file_ext == ".json":
+            with open(import_path, "r", encoding="utf-8") as f:
+                json_data = json.load(f)
+            df = pd.DataFrame(json_data if isinstance(json_data, list) else [json_data])
+        else:
+            return JSONResponse({"error": "Unsupported file type"}, status_code=400)
+
+        filename_col = column_mapping.get("filename")
+        title_col = column_mapping.get("title")
+        subject_col = column_mapping.get("subject")
+        keywords_col = column_mapping.get("keywords")
+
+        if not filename_col:
+            return JSONResponse({"error": "Filename column is required"}, status_code=400)
+
+        metadata_map = {}
+        for _, row in df.iterrows():
+            fname = row.get(filename_col)
+            if pd.notna(fname) and str(fname).strip():
+                stem = Path(str(fname).strip()).stem.lower()
+                metadata_map[stem] = {
+                    "title": str(row.get(title_col, "")).strip() if title_col and pd.notna(row.get(title_col)) else "",
+                    "subject": str(row.get(subject_col, "")).strip() if subject_col and pd.notna(row.get(subject_col)) else "",
+                    "keywords": str(row.get(keywords_col, "")).strip() if keywords_col and pd.notna(row.get(keywords_col)) else "",
+                    "original_filename": str(fname).strip(),
+                }
+
+        store.update_import_session(import_session_id, metadata_map=metadata_map)
+
+        stats = {
+            "total_records": len(metadata_map),
+            "with_title": sum(1 for v in metadata_map.values() if v.get("title")),
+            "with_subject": sum(1 for v in metadata_map.values() if v.get("subject")),
+            "with_keywords": sum(1 for v in metadata_map.values() if v.get("keywords")),
+        }
+
+        return {
+            "success": True,
+            "import_session_id": import_session_id,
+            "stats": stats,
+            "message": f"Configured mapping for {stats['total_records']} records",
+        }
+
+    except Exception as e:
+        logger.error(f"Import configuration failed: {e}")
+        return JSONResponse({"error": f"Import configuration failed: {e}"}, status_code=500)
+
+
+@router.post("/preview-import")
+async def preview_import(
+    request: Request,
+    import_file: UploadFile = File(...),
+    user: Dict = Depends(get_current_user),
+):
+    """Preview file structure and suggest field mappings."""
+    try:
+        file_svc = _get_file_service()
+        filepath = await file_svc.save_upload(import_file, user["id"])
+
+        from src.metadata_importer import MetadataImporter
+        importer = MetadataImporter()
+        try:
+            columns, sample_rows, suggestions = importer.preview_file_structure(filepath)
+        finally:
+            # Always remove the temp upload, even if parsing the file fails
+            file_svc.delete_file(filepath)
+
+        formatted_suggestions = {}
+        for source_field, suggestion_data in suggestions.items():
+            formatted_suggestions[source_field] = {
+                "best_match": suggestion_data["best_match"],
+                "confidence": round(suggestion_data["confidence"], 2),
+                "alternatives": [
+                    {"field": alt["field"], "confidence": round(alt["confidence"], 2)}
+                    for alt in suggestion_data.get("alternatives", [])
+                ],
+            }
+
+        return {
+            "success": True,
+            "columns": columns,
+            "sample_rows": sample_rows[:5],
+            "suggestions": formatted_suggestions,
+            "filename": Path(filepath).name,
+        }
+
+    except Exception as e:
+        logger.error(f"Preview failed: {e}")
+        return JSONResponse({"error": f"Preview failed: {e}"}, status_code=500)
--- a/app/routers/metadata.py
+++ b/app/routers/metadata.py
@ -0,0 +1,179 @@
+"""Metadata router: update, manual update, stats."""
+
+import os
+import shutil
+import logging
+from typing import Dict
+
+from fastapi import APIRouter, Request, Depends
+from fastapi.responses import JSONResponse
+
+from ..dependencies import get_current_user, get_session_store
+from ..services import metadata_service
+from ..services.file_service import FileService
+from ..session.store import SessionStore
+from ..config import get_settings
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["metadata"])
+
+
+@router.post("/update")
+async def update_metadata(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Update file metadata using suggested metadata from session."""
+    data = await request.json()
+    session_id = data.get("session_id")
+    file_index = data.get("file_index")
+
+    if not session_id:
+        return JSONResponse({"error": "Invalid or expired session"}, status_code=400)
+
+    session_data = store.get_file_session(session_id)
+    if not session_data:
+        return JSONResponse({"error": "Invalid or expired session"}, status_code=400)
+
+    files = session_data.get("files", [])
+    if not isinstance(file_index, int) or not 0 <= file_index < len(files):
+        return JSONResponse({"error": "Invalid file index"}, status_code=400)
+
+    try:
+        file_info = files[file_index]
+        filepath = file_info.get("filepath")
+
+        if not filepath or not os.path.exists(filepath):
+            return JSONResponse({"error": "File not found"}, status_code=404)
+
+        new_metadata = file_info.get("suggested_metadata", {})
+        if not new_metadata or not new_metadata.get("title"):
+            return JSONResponse({"error": "No metadata available for this file"}, status_code=400)
+
+        from src.file_detector import FileDetector, FileType
+
+        file_type = FileDetector.detect_file_type(filepath)
+        if file_type == FileType.UNSUPPORTED:
+            return JSONResponse({"error": "Unsupported file type"}, status_code=400)
+
+        settings = get_settings()
+
+        # Update metadata in-place
+        success = metadata_service.update_file_metadata(
+            filepath, file_type, new_metadata, backup=False
+        )
+        if not success:
+            return JSONResponse({"error": "Failed to update metadata"}, status_code=500)
+
+        verified = metadata_service.verify_file_metadata(filepath, file_type, new_metadata)
+
+        return {
+            "success": True,
+            "message": "Metadata updated successfully",
+            "verified": verified,
+            "metadata": new_metadata,
+        }
+
+    except Exception as e:
+        logger.error(f"Update error: {e}")
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.post("/update-manual")
+async def update_manual_metadata(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Update file with manually entered metadata."""
+    data = await request.json()
+    session_id = data.get("session_id")
+    file_index = data.get("file_index")
+
+    custom_metadata = {
+        "title": str(data.get("title", "")).strip()[:200],
+        "subject": str(data.get("subject", "")).strip()[:300],
+        "keywords": str(data.get("keywords", "")).strip()[:500],
+        "author": str(data.get("author", "")).strip()[:100],
+        "copyright": str(data.get("copyright", "")).strip()[:150],
+        "comments": str(data.get("comments", "")).strip()[:500],
+    }
+
+    # Handle custom fields
+    custom_fields = data.get("custom_fields", {})
+    if custom_fields and isinstance(custom_fields, dict):
+        for field_name, field_value in custom_fields.items():
+            safe_name = str(field_name).strip()[:50]
+            safe_value = str(field_value).strip()[:200]
+            if safe_name and safe_value:
+                custom_metadata[safe_name] = safe_value
+
+    if not session_id:
+        return JSONResponse({"error": "Invalid or expired session"}, status_code=400)
+
+    session_data = store.get_file_session(session_id)
+    if not session_data:
+        return JSONResponse({"error": "Invalid or expired session"}, status_code=400)
+
+    files = session_data.get("files", [])
+    if not isinstance(file_index, int) or not 0 <= file_index < len(files):
+        return JSONResponse({"error": "Invalid file index"}, status_code=400)
+
+    try:
+        file_info = files[file_index]
+        filepath = file_info.get("filepath")
+
+        if not filepath or not os.path.exists(filepath):
+            return JSONResponse({"error": "File not found"}, status_code=404)
+
+        from src.file_detector import FileDetector, FileType
+
+        file_type = FileDetector.detect_file_type(filepath)
+        if file_type == FileType.UNSUPPORTED:
+            return JSONResponse({"error": "Unsupported file type"}, status_code=400)
+
+        success = metadata_service.update_file_metadata(
+            filepath, file_type, custom_metadata, backup=True
+        )
+        if not success:
+            return JSONResponse({"error": "Failed to update metadata"}, status_code=500)
+
+        # Update session with new metadata
+        store.update_file_in_session(
+            session_id, file_index, {"suggested_metadata": custom_metadata}
+        )
+
+        verified = metadata_service.verify_file_metadata(filepath, file_type, custom_metadata)
+
+        return {
+            "status": "success",
+            "message": "Metadata updated successfully",
+            "verified": verified,
+            "metadata": custom_metadata,
+        }
+
+    except Exception as e:
+        logger.error(f"Manual update error: {e}")
+        return JSONResponse({"error": f"Error updating metadata: {e}"}, status_code=500)
+
+
+@router.get("/stats")
+async def get_stats(
+    user: Dict = Depends(get_current_user),
+):
+    """Get metadata statistics."""
+    try:
+        from src.excel_metadata_lookup import ExcelMetadataLookup
+        from pathlib import Path
+
+        excel_path = Path(__file__).parent.parent.parent / "Celum ID to Adobe Asset Path Mapping Spreadsheet (1).xlsx"
+        if excel_path.exists():
+            lookup = ExcelMetadataLookup(str(excel_path))
+            stats = lookup.get_stats()
+            return {"success": True, "stats": stats}
+        else:
+            return {"success": True, "stats": {"message": "No default Excel file configured"}}
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
--- a/app/routers/sse.py
+++ b/app/routers/sse.py
@ -0,0 +1,67 @@
+"""SSE router: Server-Sent Events for realtime AI progress."""
+
+import asyncio
+import json
+import logging
+from typing import Dict
+
+from fastapi import APIRouter, Request, Depends
+from fastapi.responses import StreamingResponse
+
+from ..dependencies import get_current_user
+from ..services.ai_service import get_progress_queue, remove_progress_queue
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["sse"])
+
+
+@router.get("/events/ai-progress/{session_id}")
+async def ai_progress_stream(
+    session_id: str,
+    request: Request,
+    user: Dict = Depends(get_current_user),
+):
+    """Stream AI processing progress events via SSE.
+
+    Events:
+        - processing: {file_index, filename, current, total}
+        - file_complete: {file_index, filename, metadata}
+        - error: {file_index, filename, error}
+        - done: {total_processed, total_errors}
+    """
+
+    async def event_generator():
+        queue = get_progress_queue(session_id)
+        try:
+            while True:
+                # Check if client disconnected
+                if await request.is_disconnected():
+                    break
+
+                try:
+                    event = await asyncio.wait_for(queue.get(), timeout=30.0)
+                except asyncio.TimeoutError:
+                    # Send keepalive
+                    yield ": keepalive\n\n"
+                    continue
+
+                event_type = event.get("type", "message")
+                data = json.dumps(event)
+                yield f"event: {event_type}\ndata: {data}\n\n"
+
+                # Stop after 'done' event
+                if event_type == "done":
+                    break
+        finally:
+            remove_progress_queue(session_id)
+
+    return StreamingResponse(
+        event_generator(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        },
+    )
--- a/app/routers/templates.py
+++ b/app/routers/templates.py
@ -0,0 +1,182 @@
+"""Template management router: list, save, load, delete, apply, preview."""
+
+import logging
+from typing import Dict
+
+from fastapi import APIRouter, Request, Depends
+from fastapi.responses import JSONResponse
+
+from ..dependencies import get_current_user, get_session_store
+from ..session.store import SessionStore
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/templates", tags=["templates"])
+
+# Lazy-initialized template manager
+_template_manager = None
+
+
+def _get_template_manager():
+    global _template_manager
+    if _template_manager is None:
+        from src.template_manager import TemplateManager
+        _template_manager = TemplateManager()
+    return _template_manager
+
+
+@router.get("/list")
+async def list_templates(user: Dict = Depends(get_current_user)):
+    """List all available templates."""
+    try:
+        tm = _get_template_manager()
+        templates = tm.list_templates()
+        return {"success": True, "templates": templates}
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.post("/save")
+async def save_template(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+):
+    """Save a new template."""
+    try:
+        data = await request.json()
+        name = str(data.get("name") or "").strip()
+        if not name:
+            return JSONResponse({"error": "Template name is required"}, status_code=400)
+
+        tm = _get_template_manager()
+        template = tm.create_template(
+            name=name,
+            title_template=data.get("title", ""),
+            subject_template=data.get("subject", ""),
+            keywords_template=data.get("keywords", ""),
+            description=data.get("description", ""),
+        )
+        success = tm.save_template(template)
+
+        if success:
+            return {"success": True, "message": f'Template "{name}" saved successfully', "template": template}
+        return JSONResponse({"error": "Failed to save template"}, status_code=500)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.get("/load/{name}")
+async def load_template(name: str, user: Dict = Depends(get_current_user)):
+    """Load a template by name."""
+    try:
+        tm = _get_template_manager()
+        template = tm.load_template(name)
+        if template:
+            return {"success": True, "template": template}
+        return JSONResponse({"error": f'Template "{name}" not found'}, status_code=404)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.delete("/delete/{name}")
+async def delete_template(name: str, user: Dict = Depends(get_current_user)):
+    """Delete a template."""
+    try:
+        tm = _get_template_manager()
+        success = tm.delete_template(name)
+        if success:
+            return {"success": True, "message": f'Template "{name}" deleted successfully'}
+        return JSONResponse({"error": f'Template "{name}" not found'}, status_code=404)
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.post("/apply")
+async def apply_template(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Apply a template to generate metadata for files."""
+    try:
+        data = await request.json()
+        template_name = str(data.get("template_name") or "").strip()
+        file_indices = data.get("file_indices", [])
+        session_id = data.get("session_id")
+        custom_vars = data.get("custom_vars", {})
+
+        if not template_name:
+            return JSONResponse({"error": "Template name is required"}, status_code=400)
+
+        session_data = store.get_file_session(session_id)
+        if not session_data:
+            return JSONResponse({"error": "Invalid or expired session"}, status_code=400)
+
+        tm = _get_template_manager()
+        template = tm.load_template(template_name)
+        if not template:
+            return JSONResponse({"error": f'Template "{template_name}" not found'}, status_code=404)
+
+        files = session_data.get("files", [])
+        results = []
+
+        for file_index in file_indices:
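+            # Skip non-integer and out-of-range indices (negative values would wrap around)
+            if not isinstance(file_index, int) or not 0 <= file_index < len(files):
+                continue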
+            file_info = files[file_index]
+            filename = file_info.get("filename", "unknown")
+
+            metadata = tm.apply_template(
+                template=template,
+                filename=filename,
+                user="web_user",
+                custom_vars=custom_vars,
+            )
+
+            # Update session
+            store.update_file_in_session(session_id, file_index, {"suggested_metadata": metadata})
+
+            results.append({
+                "file_index": file_index,
+                "filename": filename,
+                "metadata": metadata,
+            })
+
+        return {
+            "success": True,
+            "message": f"Template applied to {len(results)} file(s)",
+            "results": results,
+        }
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
+
+
+@router.post("/preview")
+async def preview_template(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+):
+    """Preview template output with sample data."""
+    try:
+        data = await request.json()
+        template = {
+            "name": "preview",
+            "title": data.get("title", ""),
+            "subject": data.get("subject", ""),
+            "keywords": data.get("keywords", ""),
+        }
+        sample_filename = data.get("sample_filename", "example.pdf")
+        custom_vars = data.get("custom_vars", {})
+
+        tm = _get_template_manager()
+        preview = tm.preview_template(
+            template=template,
+            sample_filename=sample_filename,
+            user="web_user",
+            custom_vars=custom_vars,
+        )
+        available_vars = tm.get_available_variables()
+
+        return {"success": True, "preview": preview, "available_variables": available_vars}
+    except Exception as e:
+        return JSONResponse({"error": str(e)}, status_code=500)
--- a/app/routers/upload.py
+++ b/app/routers/upload.py
@ -0,0 +1,302 @@
+"""Upload router: file upload, Excel upload, mapping configuration."""
+
+import secrets
+import logging
+from pathlib import Path
+from typing import Dict, List
+
+from fastapi import APIRouter, Request, Depends, UploadFile, File, Form
+from fastapi.responses import JSONResponse
+
+from ..dependencies import get_current_user, get_session_store
+from ..security import limiter
+from ..services.file_service import FileService, safe_filename
+from ..services import metadata_service
+from ..session.store import SessionStore
+from ..config import get_settings, Settings
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(tags=["upload"])
+
+# Lazy-initialized file service
+_file_service = None
+
+
+def _get_file_service() -> FileService:
+    global _file_service
+    if _file_service is None:
+        settings = get_settings()
+        _file_service = FileService(
+            upload_folder=settings.UPLOAD_FOLDER,
+            max_size_mb=settings.MAX_UPLOAD_SIZE_MB,
+        )
+    return _file_service
+
+
+@router.post("/upload")
+@limiter.limit("10/minute")
+async def upload_files(
+    request: Request,
+    files: List[UploadFile] = File(...),
+    metadata_source: str = Form("manual"),
+    import_session_id: str = Form(""),
+    excel_session_id: str = Form(""),
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Handle multiple file uploads with metadata source selection."""
+    if not files or (len(files) == 1 and not files[0].filename):
+        return JSONResponse({"error": "No files provided"}, status_code=400)
+
+    file_svc = _get_file_service()
+    user_id = user["id"]
+
+    # Resolve lookup / import_map based on source
+    lookup = None
+    import_map = None
+
+    if metadata_source == "excel":
+        if excel_session_id:
+            session_data = store.get_import_session(excel_session_id)
+            if session_data and "metadata_map" in session_data:
+                # Wrap metadata_map as a lookup-like object
+                lookup = _ExcelLookupAdapter(session_data["metadata_map"])
+        if not lookup:
+            return JSONResponse(
+                {"error": "Please upload an Excel file first using the Upload Excel File button"},
+                status_code=400,
+            )
+
+    elif metadata_source == "import":
+        if import_session_id:
+            session_data = store.get_import_session(import_session_id)
+            if session_data and "metadata_map" in session_data:
+                import_map = session_data["metadata_map"]
+        if not import_map:
+            return JSONResponse(
+                {"error": "Please import a metadata file first using the Import button"},
+                status_code=400,
+            )
+
+    # Create file session
+    session_id = store.create_file_session(
+        user_id=user_id,
+        metadata_source=metadata_source,
+        import_session_id=import_session_id,
+    )
+
+    results = []
+    ai_pending = []  # Files needing background AI processing
+
+    for upload_file in files:
+        try:
+            filepath = await file_svc.save_upload(upload_file, user_id)
+            filename = Path(filepath).name
+
+            if metadata_source == "ai":
+                # For AI source: save files first, process AI in background
+                file_type = metadata_service.detect_file(filepath)
+                old_metadata = metadata_service.extract_metadata(filepath, file_type)
+                file_result = {
+                    "success": True,
+                    "filename": filename,
+                    "file_type": file_type.value,
+                    "current_metadata": old_metadata,
+                    "suggested_metadata": {"title": "", "subject": "AI processing...", "keywords": ""},
+                    "filepath": filepath,
+                    "metadata_source": "ai",
+                    "ai_status": "pending",
+                }
+                store.add_file_to_session(session_id, file_result)
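+                ai_pending.append({
+                    # Index within the session's file list: failed uploads appear in
+                    # `results` but are never added to the session, so skip them here
+                    "file_index": sum(1 for r in results if "error" not in r),
+                    "filepath": filepath,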
+                    "filename": filename,
+                    "file_type": file_type,
+                })
+                results.append(file_result)
+            else:
+                file_result = await metadata_service.process_uploaded_file(
+                    filepath=filepath,
+                    filename=filename,
+                    metadata_source=metadata_source,
+                    lookup=lookup,
+                    import_map=import_map,
+                )
+                store.add_file_to_session(session_id, file_result)
+                results.append(file_result)
+
+        except ValueError as e:
+            results.append({"filename": upload_file.filename, "error": str(e)})
+        except Exception as e:
+            logger.error(f"Upload error for {upload_file.filename}: {e}")
+            results.append({"filename": upload_file.filename, "error": str(e)})
+
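+    # Start background AI processing if needed
+    if ai_pending:
+        import asyncio
+        from ..services.ai_service import process_bulk_ai
+        # Keep a strong reference so the task is not garbage-collected mid-run
+        # ("ai_tasks" is an ad-hoc attribute on app.state introduced here)
+        task = asyncio.create_task(process_bulk_ai(session_id, ai_pending, store, user_id))
+        ai_tasks = getattr(request.app.state, "ai_tasks", None)
+        if ai_tasks is None:
+            ai_tasks = request.app.state.ai_tasks = set()
+        ai_tasks.add(task)
+        task.add_done_callback(ai_tasks.discard)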
+
+    # Strip server paths from client response
+    safe_results = [{k: v for k, v in r.items() if k != "filepath"} for r in results]
+
+    return {"success": True, "session_id": session_id, "files": safe_results, "ai_processing": bool(ai_pending)}
+
+
+@router.post("/upload-excel")
+async def upload_excel(
+    request: Request,
+    excel_file: UploadFile = File(...),
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Upload Excel file for metadata lookup — returns sheet structure for mapping."""
+    try:
+        import pandas as pd
+
+        file_svc = _get_file_service()
+        filepath = await file_svc.save_upload(excel_file, user["id"])
+
+        # Use a context manager so the workbook handle is closed after previewing
+        with pd.ExcelFile(filepath) as excel:
+            sheet_names = excel.sheet_names
+
+            preview_data = {}
+            for sheet_name in sheet_names[:5]:
+                df = pd.read_excel(excel, sheet_name=sheet_name, nrows=5)
+                preview_data[sheet_name] = {
+                    "columns": df.columns.tolist(),
+                    "sample_data": df.head(3).fillna("").to_dict("records"),
+                }
+
+        # Store as import session with file info
+        excel_session_id = store.create_import_session(
+            user_id=user["id"],
+            session_type="excel",
+            file_info={
+                "path": filepath,
+                "filename": Path(filepath).name,
+                "sheet_names": sheet_names,
+            },
+        )
+
+        return {
+            "success": True,
+            "excel_session_id": excel_session_id,
+            "filename": Path(filepath).name,
+            "sheets": sheet_names,
+            "preview": preview_data,
+            "message": "Excel file uploaded. Please configure column mapping.",
+        }
+
+    except Exception as e:
+        logger.error(f"Excel upload failed: {e}")
+        return JSONResponse({"error": f"Excel upload failed: {e}"}, status_code=500)
+
+
+@router.post("/preview-excel-sheet")
+async def preview_excel_sheet(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Preview a specific sheet from uploaded Excel file."""
+    try:
+        import pandas as pd
+
+        data = await request.json()
+        excel_session_id = data.get("excel_session_id")
+        sheet_name = data.get("sheet_name")
+
+        session_data = store.get_import_session(excel_session_id)
+        if not session_data:
+            return JSONResponse({"error": "Invalid session ID"}, status_code=400)
+
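+        # A missing sheet_name would make read_excel return a dict of all sheets
+        if sheet_name not in session_data["file_info"].get("sheet_names", []):
+            return JSONResponse({"error": "Invalid sheet name"}, status_code=400)
+
+        excel_path = session_data["file_info"].get("path", "")
+        df = pd.read_excel(excel_path, sheet_name=sheet_name, nrows=10)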
+
+        return {
+            "success": True,
+            "columns": df.columns.tolist(),
+            "sample_data": df.head(5).fillna("").to_dict("records"),
+        }
+
+    except Exception as e:
+        logger.error(f"Sheet preview failed: {e}")
+        return JSONResponse({"error": f"Sheet preview failed: {e}"}, status_code=500)
+
+
+@router.post("/configure-excel-mapping")
+async def configure_excel_mapping(
+    request: Request,
+    user: Dict = Depends(get_current_user),
+    store: SessionStore = Depends(get_session_store),
+):
+    """Configure Excel column mapping and load metadata into session."""
+    try:
+        import pandas as pd
+
+        data = await request.json()
+        excel_session_id = data.get("excel_session_id")
+        sheet_name = data.get("sheet_name")
+        column_mapping = data.get("column_mapping", {})
+
+        session_data = store.get_import_session(excel_session_id)
+        if not session_data:
+            return JSONResponse({"error": "Invalid session ID"}, status_code=400)
+
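+        # A missing sheet_name would make read_excel return a dict of all sheets
+        if sheet_name not in session_data["file_info"].get("sheet_names", []):
+            return JSONResponse({"error": "Invalid sheet name"}, status_code=400)
+
+        excel_path = session_data["file_info"].get("path", "")
+        df = pd.read_excel(excel_path, sheet_name=sheet_name)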
+
+        filename_col = column_mapping.get("filename")
+        title_col = column_mapping.get("title")
+        description_col = column_mapping.get("description")
+        keywords_col = column_mapping.get("keywords")
+
+        if not filename_col:
+            return JSONResponse({"error": "Filename column is required"}, status_code=400)
+
+        metadata_map = {}
+        for _, row in df.iterrows():
+            fname = row.get(filename_col)
+            if pd.notna(fname) and str(fname).strip():
+                stem = Path(str(fname).strip()).stem.lower()
+                metadata_map[stem] = {
+                    "title": str(row.get(title_col, "")).strip() if title_col and pd.notna(row.get(title_col)) else "",
+                    "description": str(row.get(description_col, "")).strip() if description_col and pd.notna(row.get(description_col)) else "",
+                    "keywords": str(row.get(keywords_col, "")).strip() if keywords_col and pd.notna(row.get(keywords_col)) else "",
+                    "original_filename": str(fname).strip(),
+                }
+
+        # Store the built metadata_map in the session
+        store.update_import_session(excel_session_id, metadata_map=metadata_map)
+
+        stats = {
+            "total_records": len(metadata_map),
+            "with_title": sum(1 for v in metadata_map.values() if v.get("title")),
+            "with_description": sum(1 for v in metadata_map.values() if v.get("description")),
+            "with_keywords": sum(1 for v in metadata_map.values() if v.get("keywords")),
+        }
+
+        return {
+            "success": True,
+            "excel_session_id": excel_session_id,
+            "stats": stats,
+            "message": f"Configured mapping for {stats['total_records']} records from sheet \"{sheet_name}\"",
+        }
+
+    except Exception as e:
+        logger.error(f"Excel configuration failed: {e}")
+        return JSONResponse({"error": f"Excel configuration failed: {e}"}, status_code=500)
+
+
+class _ExcelLookupAdapter:
+    """Wraps a metadata_map dict to behave like ExcelMetadataLookup."""
+
+    def __init__(self, metadata_map: dict):
+        self.metadata_map = metadata_map
+
+    def lookup_by_filename(self, filename: str):
+        stem = Path(filename).stem.lower()
+        return self.metadata_map.get(stem)
--- a/app/security.py
+++ b/app/security.py
@ -0,0 +1,7 @@
+"""Security utilities: rate limiter, audit helper."""
+
+from slowapi import Limiter
+from slowapi.util import get_remote_address
+
+# Shared rate limiter instance
+limiter = Limiter(key_func=get_remote_address)
--- a/app/services/__init__.py
+++ b/app/services/__init__.py
--- a/app/services/admin_service.py
+++ b/app/services/admin_service.py
@ -0,0 +1,108 @@
+"""Admin service: user management, audit log, AI usage stats."""
+
+import logging
+from typing import Dict, List, Optional
+from datetime import datetime
+
+logger = logging.getLogger(__name__)
+
+
+class AdminService:
+    """Business logic for admin operations."""
+
+    def __init__(self, database):
+        self.db = database
+
+    # --- User Management ---
+
+    def list_users(self, include_inactive: bool = False) -> List[Dict]:
+        """Get all users with sanitized output (no password hashes)."""
+        users = self.db.get_all_users(include_inactive=include_inactive)
+        for user in users:
+            user.pop("password_hash", None)
+        return users
+
+    def get_user(self, user_id: int) -> Optional[Dict]:
+        """Get single user by ID."""
+        user = self.db.get_user_by_id(user_id)
+        if user:
+            user.pop("password_hash", None)
+        return user
+
+    def create_user(
+        self,
+        username: str,
+        email: str = "",
+        full_name: str = "",
+        role: str = "user",
+        password: Optional[str] = None,
+        auth_method: str = "local",
+    ) -> Optional[int]:
+        """Create a new user."""
+        password_hash = None
+        if password:
+            from werkzeug.security import generate_password_hash
+            password_hash = generate_password_hash(password)
+
+        return self.db.create_user(
+            username=username,
+            password_hash=password_hash,
+            email=email,
+            full_name=full_name,
+            auth_method=auth_method,
+            role=role,
+        )
+
+    def update_user(self, user_id: int, updates: Dict) -> bool:
+        """Update user fields (role, is_active, full_name, email)."""
+        allowed_fields = {"role", "is_active", "full_name", "email"}
+        filtered = {k: v for k, v in updates.items() if k in allowed_fields}
+        if not filtered:
+            return False
+        return self.db.update_user(user_id, filtered)
+
+    def deactivate_user(self, user_id: int) -> bool:
+        """Deactivate a user account."""
+        return self.db.update_user(user_id, {"is_active": 0})
+
+    def activate_user(self, user_id: int) -> bool:
+        """Reactivate a user account."""
+        return self.db.update_user(user_id, {"is_active": 1})
+
+    # --- Audit Log ---
+
+    def get_audit_log(
+        self,
+        user_id: Optional[int] = None,
+        action: Optional[str] = None,
+        limit: int = 100,
+        offset: int = 0,
+    ) -> List[Dict]:
+        """Get audit log with optional filters."""
+        return self.db.get_audit_log(
+            user_id=user_id,
+            action=action,
+            limit=limit,
+            offset=offset,
+        )
+
+    # --- AI Usage Stats ---
+
+    def get_ai_usage_stats(self) -> Dict:
+        """Get aggregate AI usage statistics."""
+        return self.db.get_ai_usage_stats()
+
+    def get_ai_usage_by_user(self, limit: int = 50) -> List[Dict]:
+        """Get AI usage broken down by user."""
+        return self.db.get_ai_usage_by_user(limit=limit)
+
+    # --- Dashboard Stats ---
+
+    def get_dashboard_stats(self) -> Dict:
+        """Get combined statistics for admin dashboard."""
+        db_stats = self.db.get_stats()
+        ai_stats = self.db.get_ai_usage_stats()
+        return {
+            **db_stats,
+            "ai_usage": ai_stats,
+        }
--- a/app/services/ai_service.py
+++ b/app/services/ai_service.py
@ -0,0 +1,189 @@
+"""Async wrapper around MetadataAnalyzer for non-blocking AI generation."""
+
+import asyncio
+import logging
+from typing import Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+# Lazy-initialized singleton
+_analyzer = None
+
+# Progress queues per session (for SSE streaming)
+_progress_queues: Dict[str, asyncio.Queue] = {}
+
+
+def _get_analyzer():
+    """Lazy-initialize MetadataAnalyzer."""
+    global _analyzer
+    if _analyzer is None:
+        from app.config import get_settings
+        settings = get_settings()
+        if settings.OPENAI_API_KEY:
+            try:
+                from src.metadata_analyzer import MetadataAnalyzer
+                _analyzer = MetadataAnalyzer()
+                logger.info("MetadataAnalyzer initialized")
+            except Exception as e:
+                logger.error(f"Failed to initialize MetadataAnalyzer: {e}")
+    return _analyzer
+
+
+def get_progress_queue(session_id: str) -> asyncio.Queue:
+    """Get or create a progress queue for a session."""
+    if session_id not in _progress_queues:
+        _progress_queues[session_id] = asyncio.Queue()
+    return _progress_queues[session_id]
+
+
+def remove_progress_queue(session_id: str):
+    """Remove a progress queue when SSE connection closes."""
+    _progress_queues.pop(session_id, None)
+
+
+async def generate_metadata_async(
+    content: str,
+    filename: str,
+    file_type,
+) -> Dict[str, str]:
+    """Run AI metadata generation in a thread pool (non-blocking).
+
+    Args:
+        content: Extracted text content from the file.
+        filename: Original filename.
+        file_type: FileType enum value.
+
+    Returns:
+        Dict with 'title', 'subject', 'keywords', plus internal fields prefixed with '_' (e.g. '_tokens_used', '_ai_error').
+    """
+    analyzer = _get_analyzer()
+    if not analyzer:
+        return {
+            "title": "",
+            "subject": "AI generation not available (OpenAI API key not configured)",
+            "keywords": "",
+            "_ai_error": "OpenAI API key not configured",
+        }
+
+    if not content or len(content.strip()) < 10:
+        from pathlib import Path
+        return {
+            "title": Path(filename).stem,
+            "subject": "Insufficient content for AI analysis",
+            "keywords": "",
+            "_ai_error": "Not enough text content extracted",
+        }
+
+    loop = asyncio.get_running_loop()
+    try:
+        result = await loop.run_in_executor(
+            None, analyzer.analyze_content, content, filename, file_type
+        )
+        if "_tokens_used" in result:
+            logger.info(f"AI tokens used for {filename}: {result['_tokens_used']}")
+        return result
+    except Exception as e:
+        logger.error(f"AI generation failed for {filename}: {e}")
+        from pathlib import Path
+        return {
+            "title": Path(filename).stem,
+            "subject": f"AI generation error: {e}",
+            "keywords": "",
+            "_ai_error": str(e),
+        }
+
+
+async def process_bulk_ai(
+    session_id: str,
+    files_data: list,
+    store,
+    user_id: int,
+):
+    """Process multiple files with AI in background, sending progress via SSE.
+
+    Args:
+        session_id: File session ID.
+        files_data: List of dicts with {file_index, filepath, filename, file_type}.
+        store: SessionStore instance.
+        user_id: User ID for AI usage logging.
+    """
+    from .metadata_service import extract_content
+
+    queue = get_progress_queue(session_id)
+    total = len(files_data)
+    processed = 0
+    errors = 0
+
+    for i, file_info in enumerate(files_data):
+        file_index = file_info["file_index"]
+        filename = file_info["filename"]
+        filepath = file_info["filepath"]
+        file_type = file_info["file_type"]
+
+        # Send 'processing' event
+        await queue.put({
+            "type": "processing",
+            "file_index": file_index,
+            "filename": filename,
+            "current": i + 1,
+            "total": total,
+        })
+
+        try:
+            content = await asyncio.to_thread(extract_content, filepath, file_type)  # keep the event loop free
+            metadata = await generate_metadata_async(content, filename, file_type)
+
+            # Update session with result
+            store.update_file_in_session(session_id, file_index, {
+                "suggested_metadata": metadata,
+                "ai_status": "complete",
+            })
+
+            # Log AI usage
+            tokens_used = metadata.get("_tokens_used", 0)
+            if tokens_used and user_id:
+                try:
+                    from app.dependencies import get_database
+                    db = get_database()
+                    db.log_ai_usage(
+                        user_id=user_id,
+                        filename=filename,
+                        tokens_total=tokens_used,
+                        model=metadata.get("_model", ""),
+                    )
+                except Exception as log_err:
+                    logger.warning(f"Failed to log AI usage for {filename}: {log_err}")
+
+            # Send 'file_complete' event
+            await queue.put({
+                "type": "file_complete",
+                "file_index": file_index,
+                "filename": filename,
+                "metadata": {
+                    "title": metadata.get("title", ""),
+                    "subject": metadata.get("subject", ""),
+                    "keywords": metadata.get("keywords", ""),
+                },
+            })
+            processed += 1
+
+        except Exception as e:
+            logger.error(f"Bulk AI error for {filename}: {e}")
+            errors += 1
+            store.update_file_in_session(session_id, file_index, {
+                "ai_status": "error",
+                "ai_error": str(e),
+            })
+            await queue.put({
+                "type": "error",
+                "file_index": file_index,
+                "filename": filename,
+                "error": str(e),
+            })
+
+    # Send 'done' event
+    await queue.put({
+        "type": "done",
+        "total_processed": processed,
+        "total_errors": errors,
+    })
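Note: the progress queue in ai_service.py is meant to be drained by an SSE endpoint that is not shown in this file. The following is a minimal, self-contained sketch of that producer/consumer pattern, not the project's actual route. The names `produce`, `sse_frames`, and `collect` are hypothetical, and the producer only imitates the event shapes that `process_bulk_ai` emits.

```python
import asyncio
import json


async def produce(queue: asyncio.Queue, filenames: list) -> None:
    """Stand-in for process_bulk_ai: emit per-file events, then 'done'."""
    total = len(filenames)
    for i, name in enumerate(filenames):
        await queue.put({"type": "processing", "filename": name,
                         "current": i + 1, "total": total})
        await queue.put({"type": "file_complete", "filename": name})
    await queue.put({"type": "done", "total_processed": total,
                     "total_errors": 0})


async def sse_frames(queue: asyncio.Queue):
    """Yield SSE-formatted frames until the 'done' event has been forwarded."""
    while True:
        event = await queue.get()
        yield f"data: {json.dumps(event)}\n\n"
        if event["type"] == "done":
            break


async def collect(filenames: list) -> list:
    """Run producer and consumer together and gather the emitted frames."""
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(produce(queue, filenames))
    frames = [frame async for frame in sse_frames(queue)]
    await producer
    return frames


frames = asyncio.run(collect(["a.pdf", "b.png"]))
```

The consumer stops right after forwarding the 'done' event. In a real endpoint that would presumably be the point at which `remove_progress_queue(session_id)` is called.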
--- a/app/services/auth_service.py
+++ b/app/services/auth_service.py
@ -0,0 +1,207 @@
+"""Framework-agnostic authentication service."""
+
+import os
+import secrets
+import logging
+from typing import Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+
+class AuthService:
+    """Authentication logic extracted from src/auth.py, without Flask dependencies."""
+
+    def __init__(self, database):
+        self.db = database
+        self._sso = None
+
+    def authenticate_user(self, username: str, password: str) -> Dict:
+        """Authenticate user with username and password.
+
+        Returns dict with 'success' bool and either 'user' dict or 'error' message.
+        """
+        try:
+            from werkzeug.security import check_password_hash
+
+            user = self.db.get_user_by_username(username)
+            if user and user.get("password_hash"):
+                if check_password_hash(user["password_hash"], password):
+                    logger.info(f"User '{username}' authenticated successfully")
+                    return {"success": True, "user": user}
+
+            logger.warning(f"Authentication failed for user '{username}'")
+            return {"success": False, "error": "Invalid username or password"}
+
+        except ImportError:
+            logger.error("werkzeug not available - cannot verify passwords")
+            return {"success": False, "error": "Authentication system not available"}
+        except Exception as e:
+            logger.error(f"Authentication error: {e}")
+            return {"success": False, "error": "Authentication error occurred"}
+
+    def create_session(
+        self,
+        user: Dict,
+        ip_address: Optional[str] = None,
+        user_agent: Optional[str] = None,
+    ) -> Optional[str]:
+        """Create a new auth session for an authenticated user."""
+        session_id = secrets.token_urlsafe(32)
+        user_id = user["id"]
+
+        success = self.db.create_session(
+            user_id=user_id,
+            session_id=session_id,
+            expires_in_hours=24,
+            ip_address=ip_address,
+            user_agent=user_agent,
+        )
+
+        if success:
+            self.db.update_last_login(user_id)
+            self.db.log_action(user_id, "login", f"IP: {ip_address}")
+            logger.info(f"Created session for user {user['username']} (ID: {user_id})")
+            return session_id
+
+        logger.error(f"Failed to create session for user {user_id}")
+        return None
+
+    def destroy_session(self, session_id: str, user_id: Optional[int] = None):
+        """Destroy an auth session (logout)."""
+        self.db.delete_session(session_id)
+        if user_id:
+            self.db.log_action(user_id, "logout", f"Session: {session_id}")
+            logger.info(f"User {user_id} logged out")
+
+    def validate_session(self, session_id: str) -> Optional[Dict]:
+        """Validate a session and return session data if valid."""
+        return self.db.get_session(session_id)
+
+    def get_user_by_id(self, user_id: int) -> Optional[Dict]:
+        """Get user by ID."""
+        return self.db.get_user_by_id(user_id)
+
+    def cleanup_expired_sessions(self):
+        """Clean up expired auth sessions."""
+        self.db.cleanup_expired_sessions()
+
+    # --- Microsoft SSO ---
+
+    @property
+    def sso(self):
+        """Lazy-initialize Microsoft SSO."""
+        if self._sso is None:
+            self._sso = MicrosoftSSO()
+        return self._sso
+
+    @property
+    def sso_enabled(self) -> bool:
+        return self.sso.enabled
+
+
+class MicrosoftSSO:
+    """Microsoft SSO authentication handler using MSAL."""
+
+    def __init__(self):
+        self.client_id = os.getenv("AZURE_CLIENT_ID")
+        self.client_secret = os.getenv("AZURE_CLIENT_SECRET")
+        self.tenant_id = os.getenv("AZURE_TENANT_ID")
+        self.redirect_uri = os.getenv("REDIRECT_URI", "http://localhost:5001/auth/callback")
+
+        if not all([self.client_id, self.client_secret, self.tenant_id]):
+            self.enabled = False
+            logger.warning("Microsoft SSO not configured (missing Azure credentials)")
+            return
+
+        try:
+            import msal
+
+            self.authority = f"https://login.microsoftonline.com/{self.tenant_id}"
+            self.app = msal.ConfidentialClientApplication(
+                self.client_id,
+                authority=self.authority,
+                client_credential=self.client_secret,
+            )
+            self.enabled = True
+            logger.info("Microsoft SSO initialized successfully")
+        except ImportError:
+            self.enabled = False
+            logger.warning("Microsoft SSO not available (msal library not installed)")
+        except Exception as e:
+            self.enabled = False
+            logger.error(f"Failed to initialize Microsoft SSO: {e}")
+
+    def get_auth_url(self, state: Optional[str] = None) -> Optional[str]:
+        if not self.enabled:
+            return None
+        try:
+            return self.app.get_authorization_request_url(
+                scopes=["User.Read"],
+                state=state,
+                redirect_uri=self.redirect_uri,
+            )
+        except Exception as e:
+            logger.error(f"Error generating auth URL: {e}")
+            return None
+
+    def acquire_token(self, auth_code: str) -> Optional[Dict]:
+        if not self.enabled:
+            return None
+        try:
+            return self.app.acquire_token_by_authorization_code(
+                auth_code,
+                scopes=["User.Read"],
+                redirect_uri=self.redirect_uri,
+            )
+        except Exception as e:
+            logger.error(f"Error acquiring token: {e}")
+            return None
+
+    def get_user_info(self, access_token: str) -> Optional[Dict]:
+        if not self.enabled:
+            return None
+        try:
+            import requests
+
+            headers = {"Authorization": f"Bearer {access_token}"}
+            response = requests.get(
+                "https://graph.microsoft.com/v1.0/me",
+                headers=headers,
+                timeout=10,
+            )
+            if response.status_code == 200:
+                return response.json()
+            logger.error(f"Graph API error: {response.status_code}")
+            return None
+        except Exception as e:
+            logger.error(f"Error fetching user info: {e}")
+            return None
+
+    def create_or_update_user(self, user_info: Dict, database) -> Optional[Dict]:
+        """Create or update user from SSO login."""
+        try:
+            email = user_info.get("mail") or user_info.get("userPrincipalName")
+            username = email.split("@")[0] if email else user_info.get("displayName", "unknown")
+            full_name = user_info.get("displayName")
+
+            user = database.get_user_by_username(username)
+            if not user:
+                user_id = database.create_user(
+                    username=username,
+                    email=email,
+                    full_name=full_name,
+                    auth_method="sso",
+                )
+                if user_id:
+                    user = database.get_user_by_id(user_id)
+                    logger.info(f"Created new SSO user: {username}")
+                else:
+                    logger.error(f"Failed to create SSO user: {username}")
+                    return None
+            else:
+                logger.info(f"Existing SSO user logged in: {username}")
+
+            return user
+        except Exception as e:
+            logger.error(f"Error creating/updating SSO user: {e}")
+            return None
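Note: `get_auth_url` accepts an optional `state` value, but auth_service.py does not show how callers create or check it. Below is a minimal sketch of one common approach, assuming the caller keeps the expected value server-side between the redirect and the callback. `new_state` and `state_matches` are hypothetical helpers, not part of this commit.

```python
import secrets
from typing import Optional


def new_state() -> str:
    """Generate an unguessable value to pass as the OAuth 'state' parameter."""
    return secrets.token_urlsafe(16)


def state_matches(expected: Optional[str], received: Optional[str]) -> bool:
    """Compare the stored state with the one returned on the callback.

    Both values are encoded to bytes so that compare_digest also accepts
    non-ASCII input, and the comparison runs in constant time.
    """
    if not expected or not received:
        return False
    return secrets.compare_digest(expected.encode("utf-8"),
                                  received.encode("utf-8"))
```

A callback handler would likely reject the login when `state_matches` returns False, before exchanging the authorization code via `acquire_token`.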
--- a/app/services/file_service.py
+++ b/app/services/file_service.py
@ -0,0 +1,99 @@
+"""File handling: upload, naming, cleanup."""
+
+import os
+import shutil
+import unicodedata
+import logging
+from pathlib import Path
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+
+def safe_filename(filename: str) -> str:
+    """Sanitize filename while preserving Unicode characters (CJK, etc.)."""
+    filename = unicodedata.normalize("NFC", filename)
+    filename = filename.replace("/", "_").replace("\\", "_").replace("\x00", "")
+    filename = filename.strip(". ")
+    if not filename:
+        filename = "unnamed_file"
+    return filename
+
+
+class FileService:
+    """Handles file uploads, per-user storage, and cleanup."""
+
+    def __init__(self, upload_folder: str, max_size_mb: int = 500):
+        self.upload_folder = Path(upload_folder)
+        self.upload_folder.mkdir(parents=True, exist_ok=True)
+        self.max_size_bytes = max_size_mb * 1024 * 1024
+
+    async def save_upload(self, upload_file, user_id: int) -> str:
+        """Save an uploaded file to disk using streaming.
+
+        Returns the path to the saved file.
+        """
+        filename = safe_filename(upload_file.filename or "unnamed")
+        user_dir = self.upload_folder / str(user_id)
+        user_dir.mkdir(parents=True, exist_ok=True)
+
+        filepath = user_dir / filename
+        # Handle name collisions
+        if filepath.exists():
+            stem = filepath.stem
+            suffix = filepath.suffix
+            counter = 1
+            while filepath.exists():
+                filepath = user_dir / f"{stem}_{counter}{suffix}"
+                counter += 1
+
+        # Stream to disk (handles large files without loading into memory)
+        with open(filepath, "wb") as f:
+            shutil.copyfileobj(upload_file.file, f)
+
+        size = filepath.stat().st_size
+        if size > self.max_size_bytes:
+            filepath.unlink()
+            raise ValueError(f"File exceeds {self.max_size_bytes // (1024*1024)}MB limit")
+
+        logger.info(f"Saved upload: {filepath.name} ({size} bytes) for user {user_id}")
+        return str(filepath)
+
+    def delete_file(self, filepath: str):
+        """Delete a file from disk."""
+        try:
+            path = Path(filepath)
+            if path.exists() and path.is_file():
+                path.unlink()
+                logger.info(f"Deleted file: {filepath}")
+        except Exception as e:
+            logger.warning(f"Failed to delete {filepath}: {e}")
+
+    def cleanup_user_files(self, user_id: int):
+        """Delete all files for a user."""
+        user_dir = self.upload_folder / str(user_id)
+        if user_dir.exists():
+            shutil.rmtree(user_dir, ignore_errors=True)
+            logger.info(f"Cleaned up files for user {user_id}")
+
+    def get_filepath(self, filename: str, user_id: Optional[int] = None) -> Optional[str]:
+        """Resolve filepath from filename. Checks user dir first, then root."""
+        if user_id:
+            user_path = self.upload_folder / str(user_id) / safe_filename(filename)
+            if user_path.exists():
+                return str(user_path)
+
+        root_path = self.upload_folder / safe_filename(filename)
+        if root_path.exists():
+            return str(root_path)
+
+        return None
+
+    def validate_filepath(self, filepath: str) -> bool:
+        """Validate that filepath is within upload folder (prevent traversal)."""
+        try:
+            resolved = Path(filepath).resolve()
+            upload_resolved = self.upload_folder.resolve()
+            return resolved == upload_resolved or upload_resolved in resolved.parents
+        except Exception:
+            return False
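Note: a containment check such as `validate_filepath` should compare path components, because a plain string-prefix test also accepts sibling directories whose names merely start with the upload folder's name. The sketch below illustrates the difference with pathlib only. `is_within` and the `/srv/uploads` location are hypothetical and chosen purely for illustration.

```python
from pathlib import Path


def is_within(base: Path, candidate: Path) -> bool:
    """True if candidate resolves to base itself or to a path beneath it."""
    base = base.resolve()
    candidate = candidate.resolve()
    return candidate == base or base in candidate.parents


upload_root = Path("/srv/uploads")  # hypothetical upload folder

# A file in a per-user subdirectory is accepted.
inside = is_within(upload_root, Path("/srv/uploads/42/report.pdf"))

# A sibling directory sharing the name prefix is rejected.
sibling = is_within(upload_root, Path("/srv/uploads_evil/report.pdf"))

# '..' segments are normalized by resolve() before the comparison.
escaped = is_within(upload_root, Path("/srv/uploads/42/../../etc/passwd"))

# The string-prefix test wrongly accepts the sibling directory.
prefix_match = str(Path("/srv/uploads_evil/x")).startswith(str(upload_root))
```

On Python 3.9 and later, `Path.is_relative_to` offers an equivalent component-wise check.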
--- a/app/services/metadata_service.py
+++ b/app/services/metadata_service.py
@ -0,0 +1,186 @@
+"""Metadata processing orchestration: upload → detect → extract → generate."""
+
+import logging
+from pathlib import Path
+from typing import Dict, Optional
+
+from src.file_detector import FileDetector, FileType
+from src.extractors.pdf_extractor import PDFExtractor
+from src.extractors.image_extractor import ImageExtractor
+from src.extractors.office_extractor import OfficeExtractor
+from src.extractors.video_extractor import VideoExtractor
+from src.updaters.pdf_updater import PDFUpdater
+from src.updaters.image_updater import ImageUpdater
+from src.updaters.office_updater import OfficeUpdater
+from src.updaters.video_updater import VideoUpdater
+
+logger = logging.getLogger(__name__)
+
+# Extractor/updater instances (stateless, safe to share)
+EXTRACTORS = {
+    FileType.PDF: PDFExtractor(),
+    FileType.IMAGE: ImageExtractor(),
+    FileType.OFFICE_DOC: OfficeExtractor(),
+    FileType.OFFICE_SHEET: OfficeExtractor(),
+    FileType.OFFICE_PRESENTATION: OfficeExtractor(),
+    FileType.VIDEO: VideoExtractor(),
+}
+
+UPDATERS = {
+    FileType.PDF: PDFUpdater(),
+    FileType.IMAGE: ImageUpdater(),
+    FileType.OFFICE_DOC: OfficeUpdater(),
+    FileType.OFFICE_SHEET: OfficeUpdater(),
+    FileType.OFFICE_PRESENTATION: OfficeUpdater(),
+    FileType.VIDEO: VideoUpdater(),
+}
+
+
+def detect_file(filepath: str) -> FileType:
+    """Detect the type of a file."""
+    return FileDetector.detect_file_type(filepath)
+
+
+def extract_metadata(filepath: str, file_type: FileType) -> Dict[str, str]:
+    """Read current metadata from file."""
+    extractor = EXTRACTORS.get(file_type)
+    if not extractor:
+        return {}
+    try:
+        return extractor.read_metadata(filepath)
+    except Exception as e:
+        logger.error(f"Failed to extract metadata from {filepath}: {e}")
+        return {}
+
+
+def extract_content(filepath: str, file_type: FileType) -> str:
+    """Extract text content for AI analysis."""
+    extractor = EXTRACTORS.get(file_type)
+    if not extractor:
+        return ""
+    try:
+        return extractor.extract_content(filepath)
+    except Exception as e:
+        logger.error(f"Failed to extract content from {filepath}: {e}")
+        return ""
+
+
+def update_file_metadata(
+    filepath: str,
+    file_type: FileType,
+    metadata: Dict[str, str],
+    backup: bool = False,
+) -> bool:
+    """Write metadata to file. Returns True on success."""
+    updater = UPDATERS.get(file_type)
+    if not updater:
+        logger.error(f"No updater for file type: {file_type}")
+        return False
+    try:
+        return updater.update_metadata(filepath, metadata, backup=backup)
+    except Exception as e:
+        logger.error(f"Failed to update metadata for {filepath}: {e}")
+        return False
+
+
+def verify_file_metadata(
+    filepath: str,
+    file_type: FileType,
+    metadata: Dict[str, str],
+) -> bool:
+    """Verify metadata was written correctly."""
+    updater = UPDATERS.get(file_type)
+    if not updater:
+        return False
+    try:
+        return updater.verify_metadata(filepath, metadata)
+    except Exception as e:
+        logger.error(f"Failed to verify metadata for {filepath}: {e}")
+        return False
+
+
+async def process_uploaded_file(
+    filepath: str,
+    filename: str,
+    metadata_source: str,
+    lookup=None,
+    import_map=None,
+) -> Dict:
+    """Process a single uploaded file through the full pipeline.
+
+    Args:
+        filepath: Path to uploaded file on disk.
+        filename: Original filename.
+        metadata_source: One of 'excel', 'ai', 'manual', 'import'.
+        lookup: Excel lookup instance (for excel source).
+        import_map: Metadata map dict (for import source).
+
+    Returns:
+        Dict with file processing results.
+    """
+    file_type = detect_file(filepath)
+
+    if file_type == FileType.UNSUPPORTED:
+        return {"success": False, "filename": filename, "error": "Unsupported file type"}
+
+    # Read current metadata
+    old_metadata = extract_metadata(filepath, file_type)
+
+    # Generate new metadata based on source
+    excel_found = False
+    new_metadata = {"title": "", "subject": "", "keywords": ""}
+
+    if metadata_source == "excel" and lookup:
+        excel_data = lookup.lookup_by_filename(filename)
+        if excel_data:
+            new_metadata = {
+                "title": excel_data.get("title", ""),
+                "subject": excel_data.get("description", ""),
+                "keywords": "",
+            }
+            excel_found = True
+        else:
+            new_metadata = {
+                "title": Path(filename).stem,
+                "subject": f"No metadata found in Excel for {filename}",
+                "keywords": "",
+            }
+
+    elif metadata_source == "manual":
+        new_metadata = {
+            "title": Path(filename).stem,
+            "subject": "",
+            "keywords": "",
+        }
+
+    elif metadata_source == "ai":
+        from .ai_service import generate_metadata_async
+
+        content = extract_content(filepath, file_type)
+        new_metadata = await generate_metadata_async(content, filename, file_type)
+
+    elif metadata_source == "import" and import_map:
+        from src.metadata_importer import MetadataImporter
+
+        importer = MetadataImporter()
+        imported = importer.get_metadata_for_file(import_map, filename)
+        if imported:
+            new_metadata = imported
+            excel_found = True
+        else:
+            new_metadata = {
+                "title": Path(filename).stem,
+                "subject": f"No metadata found in imported file for {filename}",
+                "keywords": "",
+            }
+
+    return {
+        "success": True,
+        "filename": filename,
+        "file_type": file_type.value,
+        "current_metadata": old_metadata,
+        "suggested_metadata": new_metadata,
+        "filepath": filepath,
+        "metadata_source": metadata_source,
+        "excel_found": excel_found,
+    }
--- a/app/session/__init__.py
+++ b/app/session/__init__.py
--- a/app/session/store.py
+++ b/app/session/store.py
@ -0,0 +1,298 @@
+"""SQLite-backed session store for file processing and import sessions."""
+
+import json
+import sqlite3
+import secrets
+import logging
+from datetime import datetime, timedelta
+from typing import Optional, Dict, List, Any
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+class SessionStore:
+    """Persistent session store replacing in-memory dicts.
+
+    Stores file processing sessions and imported metadata maps in SQLite,
+    surviving server restarts and supporting multi-worker deployments.
+    """
+
+    def __init__(self, db_path: str):
+        self.db_path = db_path
+        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
+        self._init_tables()
+
+    def _get_conn(self) -> sqlite3.Connection:
+        """Create a new connection per call (thread-safe)."""
+        conn = sqlite3.connect(self.db_path, timeout=10)
+        conn.row_factory = sqlite3.Row
+        conn.execute("PRAGMA journal_mode=WAL")
+        return conn
+
+    def _init_tables(self):
+        conn = self._get_conn()
+        try:
+            conn.execute("""
+                CREATE TABLE IF NOT EXISTS file_sessions (
+                    session_id TEXT PRIMARY KEY,
+                    user_id INTEGER NOT NULL,
+                    metadata_source TEXT DEFAULT 'manual',
+                    import_session_id TEXT DEFAULT '',
+                    files_json TEXT DEFAULT '[]',
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    expires_at TIMESTAMP NOT NULL
+                )
+            """)
+            conn.execute("""
+                CREATE TABLE IF NOT EXISTS import_sessions (
+                    session_id TEXT PRIMARY KEY,
+                    user_id INTEGER NOT NULL,
+                    session_type TEXT DEFAULT 'import',
+                    metadata_json TEXT DEFAULT '{}',
+                    file_info_json TEXT DEFAULT '{}',
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    expires_at TIMESTAMP NOT NULL
+                )
+            """)
+            conn.execute("CREATE INDEX IF NOT EXISTS idx_fs_user ON file_sessions(user_id)")
+            conn.execute("CREATE INDEX IF NOT EXISTS idx_fs_expires ON file_sessions(expires_at)")
+            conn.execute("CREATE INDEX IF NOT EXISTS idx_is_user ON import_sessions(user_id)")
+            conn.execute("CREATE INDEX IF NOT EXISTS idx_is_expires ON import_sessions(expires_at)")
+            conn.commit()
+            logger.info(f"Session store initialized at {self.db_path}")
+        finally:
+            conn.close()
+
+    # --- File Sessions ---
+
+    def create_file_session(
+        self,
+        user_id: int,
+        metadata_source: str = "manual",
+        import_session_id: str = "",
+        expires_hours: int = 24,
+    ) -> str:
+        """Create a new file processing session with a secure random ID."""
+        session_id = secrets.token_urlsafe(32)
+        expires_at = datetime.utcnow() + timedelta(hours=expires_hours)  # UTC, to match SQLite's datetime('now')
+        conn = self._get_conn()
+        try:
+            conn.execute(
+                "INSERT INTO file_sessions (session_id, user_id, metadata_source, import_session_id, expires_at) VALUES (?,?,?,?,?)",
+                (session_id, user_id, metadata_source, import_session_id, expires_at),
+            )
+            conn.commit()
+            logger.info(f"Created file session {session_id[:8]}... for user {user_id}")
+            return session_id
+        finally:
+            conn.close()
+
+    def get_file_session(self, session_id: str) -> Optional[Dict[str, Any]]:
+        """Get file session by ID. Returns None if expired or not found."""
+        conn = self._get_conn()
+        try:
+            row = conn.execute(
+                "SELECT * FROM file_sessions WHERE session_id = ? AND expires_at > datetime('now')",
+                (session_id,),
+            ).fetchone()
+            if row:
+                result = dict(row)
+                result["files"] = json.loads(result.pop("files_json"))
+                return result
+            return None
+        finally:
+            conn.close()
+
+    def add_file_to_session(self, session_id: str, file_entry: Dict[str, Any]):
+        """Add a processed file entry to a session."""
+        conn = self._get_conn()
+        try:
+            row = conn.execute(
+                "SELECT files_json FROM file_sessions WHERE session_id = ?",
+                (session_id,),
+            ).fetchone()
+            if row:
+                files = json.loads(row["files_json"])
+                files.append(file_entry)
+                conn.execute(
+                    "UPDATE file_sessions SET files_json = ? WHERE session_id = ?",
+                    (json.dumps(files, ensure_ascii=False), session_id),
+                )
+                conn.commit()
+        finally:
+            conn.close()
+
+    def update_file_in_session(
+        self, session_id: str, file_index: int, updates: Dict[str, Any]
+    ):
+        """Update specific fields of a file entry within a session."""
+        conn = self._get_conn()
+        try:
+            row = conn.execute(
+                "SELECT files_json FROM file_sessions WHERE session_id = ?",
+                (session_id,),
+            ).fetchone()
+            if row:
+                files = json.loads(row["files_json"])
+                if 0 <= file_index < len(files):
+                    files[file_index].update(updates)
+                    conn.execute(
+                        "UPDATE file_sessions SET files_json = ? WHERE session_id = ?",
+                        (json.dumps(files, ensure_ascii=False), session_id),
+                    )
+                    conn.commit()
+        finally:
+            conn.close()
+
+    def get_file_session_files(self, session_id: str) -> List[Dict[str, Any]]:
+        """Get just the files list from a session."""
+        session = self.get_file_session(session_id)
+        if session:
+            return session["files"]
+        return []
+
+    def delete_file_session(self, session_id: str):
+        """Delete a file session."""
+        conn = self._get_conn()
+        try:
+            conn.execute("DELETE FROM file_sessions WHERE session_id = ?", (session_id,))
+            conn.commit()
+        finally:
+            conn.close()
+
+    def get_user_file_sessions(self, user_id: int) -> List[str]:
+        """Get all active session IDs for a user."""
+        conn = self._get_conn()
+        try:
+            rows = conn.execute(
+                "SELECT session_id FROM file_sessions WHERE user_id = ? AND expires_at > datetime('now')",
+                (user_id,),
+            ).fetchall()
+            return [row["session_id"] for row in rows]
+        finally:
+            conn.close()
+
+    # --- Import Sessions ---
+
+    def create_import_session(
+        self,
+        user_id: int,
+        session_type: str = "import",
+        metadata_map: Optional[Dict] = None,
+        file_info: Optional[Dict] = None,
+        expires_hours: int = 24,
+    ) -> str:
+        """Create an import/excel session."""
+        session_id = f"{session_type}_{secrets.token_urlsafe(8)}"
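+        # Store expiry in UTC: the queries below compare against SQLite's datetime('now'), which is UTC
+        expires_at = datetime.utcnow() + timedelta(hours=expires_hours)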
+        conn = self._get_conn()
+        try:
+            conn.execute(
+                "INSERT INTO import_sessions (session_id, user_id, session_type, metadata_json, file_info_json, expires_at) VALUES (?,?,?,?,?,?)",
+                (
+                    session_id,
+                    user_id,
+                    session_type,
+                    json.dumps(metadata_map or {}, ensure_ascii=False),
+                    json.dumps(file_info or {}, ensure_ascii=False),
+                    expires_at,
+                ),
+            )
+            conn.commit()
+            logger.info(f"Created {session_type} session {session_id} for user {user_id}")
+            return session_id
+        finally:
+            conn.close()
+
+    def get_import_session(self, session_id: str) -> Optional[Dict[str, Any]]:
+        """Get import session by ID."""
+        conn = self._get_conn()
+        try:
+            row = conn.execute(
+                "SELECT * FROM import_sessions WHERE session_id = ? AND expires_at > datetime('now')",
+                (session_id,),
+            ).fetchone()
+            if row:
+                result = dict(row)
+                result["metadata_map"] = json.loads(result.pop("metadata_json"))
+                result["file_info"] = json.loads(result.pop("file_info_json"))
+                return result
+            return None
+        finally:
+            conn.close()
+
+    def update_import_session(
+        self,
+        session_id: str,
+        metadata_map: Optional[Dict] = None,
+        file_info: Optional[Dict] = None,
+    ):
+        """Update an import session's metadata map or file info."""
+        conn = self._get_conn()
+        try:
+            updates = []
+            params = []
+            if metadata_map is not None:
+                updates.append("metadata_json = ?")
+                params.append(json.dumps(metadata_map, ensure_ascii=False))
+            if file_info is not None:
+                updates.append("file_info_json = ?")
+                params.append(json.dumps(file_info, ensure_ascii=False))
+            if updates:
+                params.append(session_id)
+                conn.execute(
+                    f"UPDATE import_sessions SET {', '.join(updates)} WHERE session_id = ?",
+                    params,
+                )
+                conn.commit()
+        finally:
+            conn.close()
+
+    def delete_import_session(self, session_id: str):
+        """Delete an import session."""
+        conn = self._get_conn()
+        try:
+            conn.execute("DELETE FROM import_sessions WHERE session_id = ?", (session_id,))
+            conn.commit()
+        finally:
+            conn.close()
+
+    # --- Cleanup ---
+
+    def cleanup_expired(self) -> int:
+        """Remove all expired sessions. Returns count of deleted rows."""
+        conn = self._get_conn()
+        try:
+            c1 = conn.execute("DELETE FROM file_sessions WHERE expires_at < datetime('now')")
+            c2 = conn.execute("DELETE FROM import_sessions WHERE expires_at < datetime('now')")
+            conn.commit()
+            total = c1.rowcount + c2.rowcount
+            if total > 0:
+                logger.info(f"Cleaned up {total} expired sessions")
+            return total
+        finally:
+            conn.close()
+
+    def cleanup_user_sessions(self, user_id: int) -> List[str]:
+        """Delete all sessions for a user. Returns file paths for cleanup."""
+        conn = self._get_conn()
+        try:
+            # Collect file paths before deleting
+            rows = conn.execute(
+                "SELECT files_json FROM file_sessions WHERE user_id = ?",
+                (user_id,),
+            ).fetchall()
+            file_paths = []
+            for row in rows:
+                files = json.loads(row["files_json"])
+                for f in files:
+                    if f.get("filepath"):
+                        file_paths.append(f["filepath"])
+
+            conn.execute("DELETE FROM file_sessions WHERE user_id = ?", (user_id,))
+            conn.execute("DELETE FROM import_sessions WHERE user_id = ?", (user_id,))
+            conn.commit()
+            return file_paths
+        finally:
+            conn.close()
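The expiry filters above compare `expires_at` with SQLite's `datetime('now')`, which is always UTC and formatted as `YYYY-MM-DD HH:MM:SS`. Below is a minimal sketch of that comparison, assuming timestamps are stored as UTC text in the same format. The `demo_sessions` table and both helper names are hypothetical and not part of this module.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def utc_expiry(hours: int) -> str:
    """Return an expiry timestamp as UTC text, matching SQLite's datetime('now') format."""
    expires = datetime.now(timezone.utc) + timedelta(hours=hours)
    return expires.strftime("%Y-%m-%d %H:%M:%S")


def active_session_ids(conn: sqlite3.Connection) -> list:
    """Return IDs of sessions whose expiry is still in the future (UTC)."""
    rows = conn.execute(
        "SELECT session_id FROM demo_sessions "
        "WHERE expires_at > datetime('now') ORDER BY session_id"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory table standing in for file_sessions / import_sessions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sessions (session_id TEXT PRIMARY KEY, expires_at TEXT)")
conn.execute("INSERT INTO demo_sessions VALUES (?, ?)", ("live", utc_expiry(24)))
conn.execute("INSERT INTO demo_sessions VALUES (?, ?)", ("stale", utc_expiry(-1)))
conn.commit()
```

Here `active_session_ids(conn)` keeps the `live` row and filters out `stale`. If the stored value were local time instead of UTC, the same query could expire sessions early or late by the host's UTC offset.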
--- a/deploy.sh
+++ b/deploy.sh
@ -0,0 +1,78 @@
+#!/bin/bash
+# Solventum Image Metadata — Idempotent Deployment Script
+# Usage: ./deploy.sh
+#
+# First run:
+#   cd /opt/oliver-metadata-tool
+#   cp .env.example .env   # edit with your secrets
+#   chmod +x deploy.sh
+#   ./deploy.sh
+#
+# Subsequent updates:
+#   cd /opt/oliver-metadata-tool && ./deploy.sh
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+COMPOSE_PROJECT="solventum-image-metadata"
+
+cd "$SCRIPT_DIR"
+
+echo "=== Solventum Image Metadata — Deploy ==="
+echo "Directory: $SCRIPT_DIR"
+echo ""
+
+# 1. Pull latest code from Bitbucket
+echo ">>> Pulling latest code..."
+git pull
+
+# 2. Check .env exists (first-run guard)
+if [ ! -f .env ]; then
+    echo ""
+    echo "ERROR: .env file not found!"
+    echo ""
+    echo "  cp .env.example .env"
+    echo "  Then edit .env with your secrets (AZURE_CLIENT_SECRET, SECRET_KEY, etc.)"
+    echo ""
+    exit 1
+fi
+
+# 3. Build Docker image (uses layer cache, picks up code changes via COPY . .)
+echo ">>> Building Docker image..."
+docker compose -p "$COMPOSE_PROJECT" build
+
+# 4. Start or restart containers (idempotent — creates if missing, restarts if running)
+echo ">>> Starting containers..."
+docker compose -p "$COMPOSE_PROJECT" up -d
+
+# 5. Wait for health check
+#    Database auto-initializes on first container startup:
+#    - Tables created via CREATE TABLE IF NOT EXISTS
+#    - Migrations run in-code (check-before-act pattern)
+#    - Superadmin created if SUPERADMIN_EMAIL is set
+echo ">>> Waiting for app to be healthy..."
+HEALTHY=false
+for i in $(seq 1 20); do
+    if curl -sf http://127.0.0.1:5001/login > /dev/null 2>&1; then
+        echo ">>> App is healthy!"
+        HEALTHY=true
+        break
+    fi
+    echo "  Waiting... ($i/20)"
+    sleep 3
+done
+
+if [ "$HEALTHY" = false ]; then
+    echo ""
+    echo "WARNING: App may not be healthy after 60 seconds."
+    echo "Check logs:"
+    echo "  docker compose -p $COMPOSE_PROJECT logs --tail 50"
+    echo ""
+    exit 1
+fi
+
+echo ""
+echo "=== Deploy complete ==="
+echo "URL: https://ai-sandbox.oliver.solutions/solventum-image-metadata/"
+echo ""
+docker compose -p "$COMPOSE_PROJECT" ps
--- a/deploy/apache-solventum-metadata.conf
+++ b/deploy/apache-solventum-metadata.conf
@ -0,0 +1,17 @@
+# Solventum Image Metadata Tool — Apache Config Additions
+# Add these directives inside your existing <VirtualHost *:443> for ai-sandbox.oliver.solutions
+#
+# The main reverse proxy rule is already configured:
+#   ProxyPass /solventum-image-metadata/ http://localhost:5001/
+#   ProxyPassReverse /solventum-image-metadata/ http://localhost:5001/
+
+# SSE support (disable buffering for realtime AI progress events)
+<LocationMatch "^/solventum-image-metadata/events/">
+    SetEnv proxy-sendchunked 1
+    SetEnv proxy-interim-response RFC
+</LocationMatch>
+
+# Upload size limit (500MB)
+<Location /solventum-image-metadata/>
+    LimitRequestBody 524288000
+</Location>
--- a/deploy/deploy.sh
+++ b/deploy/deploy.sh
@ -0,0 +1,94 @@
+#!/bin/bash
+# Oliver Metadata Tool — Deployment Script
+# Usage: sudo ./deploy.sh [--first-run]
+set -euo pipefail
+
+APP_DIR="/var/www/oliver"
+SERVICE_NAME="oliver-metadata"
+VENV_DIR="$APP_DIR/venv"
+REPO_BRANCH="${DEPLOY_BRANCH:-main}"
+
+echo "=== Oliver Metadata Tool Deployment ==="
+echo "Directory: $APP_DIR"
+echo "Service:   $SERVICE_NAME"
+echo ""
+
+# Check we're running as root or with sudo
+if [ "$EUID" -ne 0 ]; then
+    echo "Please run with sudo"
+    exit 1
+fi
+
+cd "$APP_DIR"
+
+# First run setup
+if [ "${1:-}" = "--first-run" ]; then
+    echo ">>> First-run setup..."
+
+    # System dependencies
+    apt-get update
+    apt-get install -y python3.11 python3.11-venv python3.11-dev \
+        libimage-exiftool-perl tesseract-ocr tesseract-ocr-eng \
+        tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor \
+        poppler-utils ffmpeg gcc
+
+    # Create venv
+    python3.11 -m venv "$VENV_DIR"
+
+    # Create directories
+    mkdir -p "$APP_DIR/uploads" "$APP_DIR/data" "$APP_DIR/templates_saved"
+
+    # Set permissions
+    chown -R www-data:www-data "$APP_DIR"
+
+    # Install systemd service
+    cp "$APP_DIR/deploy/oliver-metadata.service" /etc/systemd/system/
+    systemctl daemon-reload
+    systemctl enable "$SERVICE_NAME"
+
+    # Install Apache config (if Apache is installed)
+    if command -v apache2 &> /dev/null; then
+        cp "$APP_DIR/deploy/oliver-metadata.conf" /etc/apache2/sites-available/
+        a2enmod proxy proxy_http headers rewrite ssl expires
+        a2ensite oliver-metadata
+        echo ">>> Apache config installed. Update SSL paths and restart Apache."
+    fi
+
+    echo ">>> First-run setup complete."
+    echo ">>> Edit $APP_DIR/.env before starting the service."
+    echo ""
+fi
+
+# Pull latest code
+echo ">>> Pulling latest code..."
+sudo -u www-data git pull origin "$REPO_BRANCH"
+
+# Install/update Python deps
+echo ">>> Installing Python dependencies..."
+"$VENV_DIR/bin/pip" install --upgrade pip
+"$VENV_DIR/bin/pip" install -r requirements.txt
+
+# Restart service
+echo ">>> Restarting service..."
+systemctl restart "$SERVICE_NAME"
+
+# Wait for health
+echo ">>> Waiting for service to start..."
+sleep 3
+
+# Health check
+for i in {1..10}; do
+    if curl -sf http://127.0.0.1:5001/login > /dev/null 2>&1; then
+        echo ">>> Service is healthy!"
+        systemctl status "$SERVICE_NAME" --no-pager -l
+        echo ""
+        echo "=== Deployment complete ==="
+        exit 0
+    fi
+    echo "  Waiting... ($i/10)"
+    sleep 2
+done
+
+echo ">>> WARNING: Service may not be healthy. Check logs:"
+echo "  journalctl -u $SERVICE_NAME -n 50 --no-pager"
+exit 1
--- a/deploy/oliver-metadata.conf
+++ b/deploy/oliver-metadata.conf
@ -0,0 +1,57 @@
+<VirtualHost *:443>
+    ServerName metadata.oliver.agency
+
+    # SSL — provide your own certificates
+    SSLEngine on
+    SSLCertificateFile /etc/ssl/certs/oliver-metadata.crt
+    SSLCertificateKeyFile /etc/ssl/private/oliver-metadata.key
+    # SSLCertificateChainFile /etc/ssl/certs/ca-bundle.crt
+
+    # Serve static files directly via Apache (bypass gunicorn)
+    Alias /static /var/www/oliver/static
+    <Directory /var/www/oliver/static>
+        Require all granted
+        Options -Indexes
+        ExpiresActive On
+        ExpiresDefault "access plus 1 week"
+        Header set Cache-Control "public, max-age=604800"
+    </Directory>
+
+    # Proxy to gunicorn/uvicorn
+    ProxyPreserveHost On
+    ProxyPass /static !
+    ProxyPass / http://127.0.0.1:5001/
+    ProxyPassReverse / http://127.0.0.1:5001/
+
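+    # SSE support: disable buffering for event streams.
+    # Requests under /events/ are already proxied by the "ProxyPass /" rule above.
+    # ProxyPass does not interpret a LocationMatch regex, so only the
+    # environment flags are set in this section.
+    <LocationMatch "/events/">
+        SetEnv proxy-sendchunked 1
+        SetEnv proxy-interim-response RFC
+    </LocationMatch>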
+
+    # Timeouts (AI generation can take 30+ seconds per file)
+    ProxyTimeout 120
+    Timeout 120
+
+    # Upload size limit (500MB)
+    LimitRequestBody 524288000
+
+    # Security headers
+    Header always set X-Content-Type-Options "nosniff"
+    Header always set X-Frame-Options "DENY"
+    Header always set X-XSS-Protection "1; mode=block"
+    Header always set Referrer-Policy "strict-origin-when-cross-origin"
+
+    # Logging
+    ErrorLog ${APACHE_LOG_DIR}/oliver-metadata-error.log
+    CustomLog ${APACHE_LOG_DIR}/oliver-metadata-access.log combined
+</VirtualHost>
+
+# Redirect HTTP to HTTPS
+<VirtualHost *:80>
+    ServerName metadata.oliver.agency
+    RewriteEngine On
+    RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R=301,L]
+</VirtualHost>
--- a/deploy/oliver-metadata.service
+++ b/deploy/oliver-metadata.service
@ -0,0 +1,37 @@
+[Unit]
+Description=Oliver Metadata Tool (FastAPI)
+After=network.target
+Wants=network-online.target
+
+[Service]
+Type=notify
+User=www-data
+Group=www-data
+WorkingDirectory=/var/www/oliver
+Environment="PATH=/var/www/oliver/venv/bin:/usr/local/bin:/usr/bin:/bin"
+EnvironmentFile=/var/www/oliver/.env
+
+ExecStart=/var/www/oliver/venv/bin/gunicorn app.main:app \
+    --worker-class uvicorn.workers.UvicornWorker \
+    --workers 2 \
+    --bind 127.0.0.1:5001 \
+    --timeout 120 \
+    --graceful-timeout 30 \
+    --access-logfile - \
+    --error-logfile -
+
+ExecReload=/bin/kill -s HUP $MAINPID
+KillMode=mixed
+TimeoutStopSec=10
+Restart=on-failure
+RestartSec=5
+
+# Security hardening
+NoNewPrivileges=yes
+ProtectSystem=strict
+ProtectHome=yes
+ReadWritePaths=/var/www/oliver/uploads /var/www/oliver/data /var/www/oliver/oliver_metadata.db /var/www/oliver/oliver_sessions.db /tmp
+PrivateTmp=yes
+
+[Install]
+WantedBy=multi-user.target
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,44 @@
+services:
+  oliver-metadata:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    container_name: oliver-metadata-tool
+    ports:
+      - "127.0.0.1:5001:5001"
+    volumes:
+      # Persistent storage for uploads
+      - uploads:/app/uploads
+      # Persistent storage for database
+      - database:/app/data
+      # Persistent storage for output/backups/reports
+      - output:/app/output
+
+    # Load environment variables from the .env file (must exist; deploy.sh checks for it)
+    env_file:
+      - .env
+
+    environment:
+      # Docker mode enabled
+      - DOCKER_MODE=true
+
+    restart: unless-stopped
+
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:5001/login"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+volumes:
+  uploads:
+    driver: local
+  database:
+    driver: local
+  output:
+    driver: local
+
+networks:
+  default:
+    name: oliver-metadata-network
--- a/docker-run.sh
+++ b/docker-run.sh
@ -0,0 +1,165 @@
+#!/bin/bash
+# Oliver Metadata Tool - Docker Management Script
+
+set -e
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Functions
+print_header() {
+    echo -e "${BLUE}============================================${NC}"
+    echo -e "${BLUE}  Oliver Metadata Tool - Docker Manager${NC}"
+    echo -e "${BLUE}============================================${NC}"
+}
+
+print_success() {
+    echo -e "${GREEN}✓ $1${NC}"
+}
+
+print_error() {
+    echo -e "${RED}✗ $1${NC}"
+}
+
+print_info() {
+    echo -e "${YELLOW}ℹ $1${NC}"
+}
+
+# Check if Docker is installed
+check_docker() {
+    if ! command -v docker &> /dev/null; then
+        print_error "Docker is not installed. Please install Docker first."
+        exit 1
+    fi
+
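+    # Prefer the standalone docker-compose binary. If only the Compose v2 plugin
+    # is installed, shim docker-compose so the commands below keep working.
+    if ! command -v docker-compose &> /dev/null; then
+        if docker compose version &> /dev/null; then
+            docker-compose() { docker compose "$@"; }
+        else
+            print_error "Docker Compose is not installed. Please install Docker Compose first."
+            exit 1
+        fi
+    fi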
+}
+
+# Build Docker image
+build() {
+    print_header
+    print_info "Building Docker image..."
+    docker-compose build
+    print_success "Docker image built successfully"
+}
+
+# Start containers
+start() {
+    print_header
+    print_info "Starting Oliver Metadata Tool..."
+    docker-compose up -d
+    print_success "Application started successfully"
+    print_info "Access the application at: http://localhost:5001"
+    print_info "Test credentials (only when ENABLE_TEST_USER=true): tester / oliveradmin"
+}
+
+# Stop containers
+stop() {
+    print_header
+    print_info "Stopping Oliver Metadata Tool..."
+    docker-compose down
+    print_success "Application stopped successfully"
+}
+
+# View logs
+logs() {
+    print_header
+    print_info "Showing application logs (Ctrl+C to exit)..."
+    docker-compose logs -f
+}
+
+# Restart containers
+restart() {
+    print_header
+    print_info "Restarting Oliver Metadata Tool..."
+    docker-compose restart
+    print_success "Application restarted successfully"
+}
+
+# Show status
+status() {
+    print_header
+    docker-compose ps
+}
+
+# Clean up (remove containers and volumes)
+clean() {
+    print_header
+    print_error "WARNING: This will remove all containers, volumes, and data!"
+    read -p "Are you sure? (yes/no): " confirm
+    if [ "$confirm" == "yes" ]; then
+        print_info "Cleaning up..."
+        docker-compose down -v
+        print_success "Cleanup completed"
+    else
+        print_info "Cleanup cancelled"
+    fi
+}
+
+# Show help
+show_help() {
+    print_header
+    echo ""
+    echo "Usage: ./docker-run.sh [command]"
+    echo ""
+    echo "Commands:"
+    echo "  build    - Build Docker image"
+    echo "  start    - Start the application"
+    echo "  stop     - Stop the application"
+    echo "  restart  - Restart the application"
+    echo "  logs     - View application logs"
+    echo "  status   - Show container status"
+    echo "  clean    - Remove containers and volumes (WARNING: deletes data)"
+    echo "  help     - Show this help message"
+    echo ""
+    echo "Examples:"
+    echo "  ./docker-run.sh build    # Build image"
+    echo "  ./docker-run.sh start    # Start application"
+    echo "  ./docker-run.sh logs     # View logs"
+    echo ""
+}
+
+# Main script
+check_docker
+
+case "$1" in
+    build)
+        build
+        ;;
+    start)
+        start
+        ;;
+    stop)
+        stop
+        ;;
+    restart)
+        restart
+        ;;
+    logs)
+        logs
+        ;;
+    status)
+        status
+        ;;
+    clean)
+        clean
+        ;;
+    help|--help|-h)
+        show_help
+        ;;
+    "")
+        show_help
+        ;;
+    *)
+        print_error "Unknown command: $1"
+        show_help
+        exit 1
+        ;;
+esac
--- a/docs/EXIFTOOL_SETUP.md
+++ b/docs/EXIFTOOL_SETUP.md
@ -0,0 +1,243 @@
+# ExifTool Setup Guide
+
+ExifTool is a powerful command-line application for reading, writing, and editing metadata in a wide variety of files. Oliver Metadata Tool uses ExifTool to provide enhanced metadata support for 300+ file formats.
+
+## Why ExifTool?
+
+- **Unified API**: Single tool handles images, videos, PDFs, and more
+- **300+ formats**: Support for virtually all media file types
+- **Better performance**: Batch operations run 10-60x faster than spawning a new ExifTool process per file
+- **Battle-tested**: 20+ years of development and widespread use
+- **PDF writing support**: Writes PDF metadata through the same interface used for images and video
+
+## Installation
+
+### macOS
+
+```bash
+brew install exiftool
+```
+
+Verify installation:
+```bash
+exiftool -ver
+# Should show version 12.15 or higher
+```
+
+### Linux (Ubuntu/Debian)
+
+```bash
+sudo apt-get update
+sudo apt-get install libimage-exiftool-perl
+```
+
+Verify installation:
+```bash
+exiftool -ver
+```
+
+### Linux (Fedora/RHEL/CentOS)
+
+```bash
+sudo yum install perl-Image-ExifTool
+```
+
+### Windows
+
+**Option 1: Chocolatey**
+```powershell
+choco install exiftool
+```
+
+**Option 2: Manual installation**
+1. Download from https://exiftool.org/
+2. Extract the `.zip` file
+3. Rename `exiftool(-k).exe` to `exiftool.exe`
+4. Add the directory to your PATH
+
+Verify installation:
+```powershell
+exiftool -ver
+```
+
+## Verification
+
+After installation, verify ExifTool is accessible:
+
+```bash
+# Check version
+exiftool -ver
+
+# Check location
+which exiftool  # macOS/Linux
+where exiftool  # Windows
+
+# Test with a file
+exiftool your-image.jpg
+```
+
+## What Oliver Metadata Tool Uses ExifTool For
+
+### Supported Operations
+
+1. **Images (JPEG, PNG, GIF, TIFF, HEIC, RAW formats)**
+   - Read/write Title, Description, Keywords
+   - Access EXIF, IPTC, XMP metadata
+   - Support for camera metadata
+
+2. **Videos (MP4, MOV, AVI, MKV)**
+   - Read/write Title, Description, Keywords
+   - QuickTime metadata support
+   - Unified API across formats
+
+3. **PDFs**
+   - Read/write PDF metadata fields
+   - Better than pypdf for metadata writing
+   - Preserves document structure
+
+### Format Coverage
+
+ExifTool adds support for formats beyond those covered by the Python libraries:
+
+- **Images**: HEIC, CR2, NEF, ARW, DNG (RAW formats)
+- **Video**: MKV, WebM, FLV, WMV (extended video formats)
+- **Audio**: MP3, FLAC, WAV, OGG (audio files)
+- **Documents**: EPUB, MOBI (ebook formats)
+- **3D/CAD**: STL, DWG, DXF
+- And 250+ more formats
+
+## PyExifTool Python Wrapper
+
+Oliver Metadata Tool uses the PyExifTool library to interact with ExifTool from Python:
+
+```python
+from exiftool import ExifToolHelper
+
+# Read metadata
+with ExifToolHelper() as et:
+    metadata = et.get_metadata(["image.jpg"])
+    print(metadata[0])
+
+# Write metadata
+with ExifToolHelper() as et:
+    et.set_tags(
+        ["image.jpg"],
+        tags={"EXIF:ImageDescription": "New Title"},
+        params=["-overwrite_original"]
+    )
+```
+
+### Batch Mode Performance
+
+PyExifTool uses ExifTool's `-stay_open` mode, which keeps one ExifTool process running for multiple operations:
+
+- **Single file operations**: ~50-100ms overhead
+- **Batch operations (100 files)**: 10-60x faster than spawning new processes
+- **Memory efficient**: One process handles all operations
+
+## Troubleshooting
+
+### ExifTool not found
+
+**Error:** `ExifTool not found` or `exiftool command not available`
+
+**Solution:**
+1. Install ExifTool using the instructions above
+2. Restart your terminal/command prompt
+3. Verify with `exiftool -ver`
+4. If still not found, check your PATH environment variable
+
+### Permission denied
+
+**Error:** `Permission denied when executing exiftool`
+
+**Solution (macOS/Linux):**
+```bash
+chmod +x /path/to/exiftool
+```
+
+### PyExifTool import error
+
+**Error:** `ModuleNotFoundError: No module named 'exiftool'`
+
+**Solution:**
+```bash
+pip install "PyExifTool>=0.5.6"
+```
+
+### Encoding issues with Unicode filenames
+
+ExifTool handles Unicode filenames natively. If you encounter issues:
+
+1. Ensure your terminal supports UTF-8
+2. Use the PyExifTool wrapper (handles encoding automatically)
+3. Check that the file system supports Unicode filenames
+
+## Performance Tips
+
+### Use batch mode for multiple files
+
+```python
+# Good: Process multiple files in one batch
+with ExifToolHelper() as et:
+    et.set_tags(
+        ["file1.jpg", "file2.jpg", "file3.jpg"],
+        tags={"EXIF:ImageDescription": "Title"},
+        params=["-overwrite_original"]
+    )
+
+# Avoid: Processing files one at a time
+for file in files:
+    with ExifToolHelper() as et:
+        et.set_tags([file], tags={...})
+```
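For very large uploads, a caller might split the file list into bounded batches while still reusing a single ExifTool process. The following is only a sketch of that idea: `batched` is a hypothetical helper, the batch size of 100 is an arbitrary assumption, and the commented usage relies on the `set_tags` call shown above.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


# Hypothetical usage: one ExifToolHelper for the whole run, one set_tags call per batch.
# with ExifToolHelper() as et:
#     for batch in batched(all_files, 100):
#         et.set_tags(batch, tags={"XMP:Title": "Title"}, params=["-overwrite_original"])
```

Keeping the helper open across batches preserves the single-process benefit, while each batch gives a natural point to report progress or handle a failure.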
+
+### Use specific tag names
+
+```python
+from exiftool import ExifToolHelper
+
+with ExifToolHelper() as et:
+    # Good: Specific tag queries
+    tags = et.get_tags(["image.jpg"], tags=["EXIF:ImageDescription", "XMP:Title"])
+
+    # Slower: Extract all tags
+    metadata = et.get_metadata(["image.jpg"])  # Returns 100+ tags
+```
+
+### Skip unnecessary tags with -fast
+
+For read-only operations where you only need basic metadata:
+
+```python
+et.execute("-fast", "-json", "image.jpg")
+```
+
+## Integration with Oliver Metadata Tool
+
+Oliver Metadata Tool automatically detects ExifTool and uses it when available:
+
+1. **On startup**: Checks for ExifTool installation
+2. **Hybrid approach**: Uses ExifTool for images/video/PDF, Python libraries for Office docs
+3. **Graceful fallback**: Falls back to pure Python if ExifTool unavailable
+
+### Check ExifTool status
+
+```python
+from src.config import Config
+
+if Config.check_exiftool():
+    print("ExifTool available")
+else:
+    print("Using Python libraries")
+```
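The implementation of `Config.check_exiftool()` is not shown here. One plausible way to express the detect-then-fall-back decision is to look for the binary on `PATH`, sketched below. The function name `pick_metadata_backend` and the backend labels are assumptions for illustration only, not the tool's actual API.

```python
import shutil
from typing import Callable, Optional


def pick_metadata_backend(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "exiftool" when the binary is found on PATH, otherwise "python"."""
    return "exiftool" if which("exiftool") else "python"


if __name__ == "__main__":
    # Prints which backend this machine would use under the assumption above.
    print(pick_metadata_backend())
```

Passing the lookup function as a parameter keeps the decision testable on machines without ExifTool installed.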
+
+## References
+
+- [ExifTool Official Website](https://exiftool.org/)
+- [ExifTool Documentation](https://exiftool.org/exiftool_pod.html)
+- [PyExifTool GitHub](https://github.com/sylikc/pyexiftool)
+- [PyExifTool Documentation](https://sylikc.github.io/pyexiftool/)
+- [Supported File Types](https://exiftool.org/#supported)
+- [Tag Names Reference](https://exiftool.org/TagNames/)
+
+## License
+
+ExifTool is free software licensed under the Perl Artistic License or GPL version 1 or later.
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,54 @@
+# Core Libraries
+python-magic>=0.4.27
+python-dotenv>=1.0.1
+tqdm>=4.66.0
+
+# Excel Processing
+pandas>=2.0.0
+openpyxl>=3.1.0
+
+# PDF Processing
+pypdf>=4.0.0
+pdfplumber>=0.11.0
+PyPDF2>=3.0.0
+
+# Image Processing
+Pillow>=10.2.0
+pytesseract>=0.3.0
+pdf2image>=1.16.0
+piexif>=1.1.0
+iptcinfo3>=2.1.0
+
+# Office Documents
+python-docx>=1.0.0
+python-pptx>=0.6.0
+
+# Video Processing
+mutagen>=1.45.0
+ffmpeg-python>=0.2.0
+pymediainfo>=7.0.0
+
+# AI & Metadata Generation
+openai>=1.0.0
+tiktoken>=0.5.0
+tenacity>=8.2.0
+
+# ExifTool Integration (optional but recommended)
+PyExifTool>=0.5.6
+
+# Web Framework (FastAPI)
+fastapi>=0.109.0
+uvicorn[standard]>=0.27.0
+gunicorn>=21.2.0
+python-multipart>=0.0.6
+pydantic-settings>=2.1.0
+jinja2>=3.1.0
+
+# Password Hashing (from Flask ecosystem, still needed)
+Werkzeug>=3.0.0
+
+# Authentication & SSO
+msal>=1.20.0  # Microsoft Authentication Library for SSO (optional)
+
+# Security
+slowapi>=0.1.9
--- a/run.py
+++ b/run.py
@ -0,0 +1,13 @@
+#!/usr/bin/env python3
+"""Development entry point for Oliver Metadata Tool."""
+
+import uvicorn
+
+if __name__ == "__main__":
+    uvicorn.run(
+        "app.main:app",
+        host="127.0.0.1",
+        port=5001,
+        reload=True,
+        log_level="info",
+    )
--- a/src/__init__.py
+++ b/src/__init__.py
@ -0,0 +1,4 @@
+"""Universal Metadata Automation Tool"""
+
+__version__ = "1.0.0"
+__author__ = "Oliver Team"
--- a/src/auth.py
+++ b/src/auth.py
@ -0,0 +1,324 @@
+"""Authentication and authorization module."""
+
+import os
+import secrets
+from functools import wraps
+from flask import session, redirect, url_for, request
+from typing import Dict, Optional
+from .database import Database
+from .utils import get_logger
+
+logger = get_logger(__name__)
+
+# Initialize database
+db = Database()
+
+
+def login_required(f):
+    """
+    Decorator to require login for routes.
+
+    Usage:
+        @app.route('/protected')
+        @login_required
+        def protected_route():
+            return 'Protected content'
+    """
+    @wraps(f)
+    def decorated_function(*args, **kwargs):
+        if 'user_id' not in session:
+            # Save the original URL to redirect after login
+            return redirect(url_for('login', next=request.url))
+
+        # Check if session is still valid in database
+        session_id = session.get('session_id')
+        if session_id:
+            db_session = db.get_session(session_id)
+            if not db_session:
+                # Session expired or invalid
+                session.clear()
+                return redirect(url_for('login', next=request.url))
+
+        return f(*args, **kwargs)
+    return decorated_function
+
+
+def authenticate_user(username: str, password: str) -> Dict:
+    """
+    Authenticate user with username and password.
+
+    Args:
+        username: Username
+        password: Plain text password
+
+    Returns:
+        Dictionary with 'success' boolean and either 'user' dict or 'error' message
+    """
+    try:
+        # Import werkzeug for password verification
+        from werkzeug.security import check_password_hash
+
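+        # Check test user first (hardcoded for testing; requires ENABLE_TEST_USER=true)
+        test_user_enabled = os.getenv('ENABLE_TEST_USER', 'false').lower() == 'true'
+        if test_user_enabled and username == 'tester' and password == 'oliveradmin':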
+            user = db.get_user_by_username('tester')
+            if user:
+                logger.info(f"Test user '{username}' authenticated successfully")
+                return {'success': True, 'user': user}
+
+        # Check database for other users
+        user = db.get_user_by_username(username)
+
+        if user and user.get('password_hash'):
+            if check_password_hash(user['password_hash'], password):
+                logger.info(f"User '{username}' authenticated successfully (database)")
+                return {'success': True, 'user': user}
+
+        logger.warning(f"Authentication failed for user '{username}'")
+        return {'success': False, 'error': 'Invalid username or password'}
+
+    except ImportError:
+        logger.error("werkzeug not available - cannot verify passwords")
+        return {'success': False, 'error': 'Authentication system not available'}
+    except Exception as e:
+        logger.error(f"Authentication error: {e}")
+        return {'success': False, 'error': 'Authentication error occurred'}
+
+
+def create_user_session(user: Dict, ip_address: Optional[str] = None, user_agent: Optional[str] = None) -> Optional[str]:
+    """
+    Create a new session for authenticated user.
+
+    Args:
+        user: User dictionary from database
+        ip_address: Client IP address
+        user_agent: Client user agent string
+
+    Returns:
+        Session ID, or None if the session could not be created
+    """
+    session_id = secrets.token_urlsafe(32)
+    user_id = user['id']
+
+    # Create session in database
+    success = db.create_session(
+        user_id=user_id,
+        session_id=session_id,
+        expires_in_hours=24,
+        ip_address=ip_address,
+        user_agent=user_agent
+    )
+
+    if success:
+        # Update last login timestamp
+        db.update_last_login(user_id)
+
+        # Log the login action
+        db.log_action(user_id, 'login', f'IP: {ip_address}')
+
+        logger.info(f"Created session for user {user['username']} (ID: {user_id})")
+        return session_id
+    else:
+        logger.error(f"Failed to create session for user {user_id}")
+        return None
+
+
+def destroy_user_session(session_id: str, user_id: Optional[int] = None):
+    """
+    Destroy user session (logout).
+
+    Args:
+        session_id: Session ID to destroy
+        user_id: Optional user ID for logging
+    """
+    db.delete_session(session_id)
+
+    if user_id:
+        db.log_action(user_id, 'logout', f'Session: {session_id}')
+        logger.info(f"User {user_id} logged out")
+
+
+def get_current_user() -> Optional[Dict]:
+    """
+    Get current logged-in user from session.
+
+    Returns:
+        User dictionary or None if not logged in
+    """
+    user_id = session.get('user_id')
+    if user_id:
+        return db.get_user_by_id(user_id)
+    return None
+
+
+def cleanup_sessions():
+    """Clean up expired sessions from database."""
+    db.cleanup_expired_sessions()
+
+
+class MicrosoftSSO:
+    """Microsoft SSO authentication handler using MSAL."""
+
+    def __init__(self):
+        """Initialize Microsoft SSO with environment variables."""
+        self.client_id = os.getenv('AZURE_CLIENT_ID')
+        self.client_secret = os.getenv('AZURE_CLIENT_SECRET')
+        self.tenant_id = os.getenv('AZURE_TENANT_ID')
+        self.redirect_uri = os.getenv('REDIRECT_URI', 'http://localhost:5001/auth/callback')
+
+        # Check if SSO is configured
+        if not all([self.client_id, self.client_secret, self.tenant_id]):
+            self.enabled = False
+            logger.warning("Microsoft SSO not configured (missing Azure credentials)")
+            return
+
+        try:
+            import msal
+            self.authority = f"https://login.microsoftonline.com/{self.tenant_id}"
+            self.app = msal.ConfidentialClientApplication(
+                self.client_id,
+                authority=self.authority,
+                client_credential=self.client_secret
+            )
+            self.enabled = True
+            logger.info("Microsoft SSO initialized successfully")
+        except ImportError:
+            self.enabled = False
+            logger.warning("Microsoft SSO not available (msal library not installed)")
+        except Exception as e:
+            self.enabled = False
+            logger.error(f"Failed to initialize Microsoft SSO: {e}")
+
+    def get_auth_url(self, state: Optional[str] = None) -> Optional[str]:
+        """
+        Get Microsoft login URL.
+
+        Args:
+            state: State parameter for CSRF protection
+
+        Returns:
+            Authorization URL or None if SSO not enabled
+        """
+        if not self.enabled:
+            return None
+
+        try:
+            return self.app.get_authorization_request_url(
+                scopes=["User.Read"],
+                state=state,
+                redirect_uri=self.redirect_uri
+            )
+        except Exception as e:
+            logger.error(f"Error generating auth URL: {e}")
+            return None
+
+    def acquire_token(self, auth_code: str) -> Optional[Dict]:
+        """
+        Exchange authorization code for access token.
+
+        Args:
+            auth_code: Authorization code from Microsoft
+
+        Returns:
+            Token result dictionary or None if failed
+        """
+        if not self.enabled:
+            return None
+
+        try:
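+            result = self.app.acquire_token_by_authorization_code(
+                auth_code,
+                scopes=["User.Read"],
+                redirect_uri=self.redirect_uri
+            )
+            # MSAL reports failures in the result dict rather than raising
+            if 'access_token' not in result:
+                logger.error(f"Token request failed: {result.get('error')}: {result.get('error_description')}")
+                return None
+            return result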
+        except Exception as e:
+            logger.error(f"Error acquiring token: {e}")
+            return None
+
+    def get_user_info(self, access_token: str) -> Optional[Dict]:
+        """
+        Get user info from Microsoft Graph API.
+
+        Args:
+            access_token: Access token from Microsoft
+
+        Returns:
+            User info dictionary or None if failed
+        """
+        if not self.enabled:
+            return None
+
+        try:
+            import requests
+            headers = {'Authorization': f'Bearer {access_token}'}
+            response = requests.get(
+                'https://graph.microsoft.com/v1.0/me',
+                headers=headers,
+                timeout=10
+            )
+
+            if response.status_code == 200:
+                return response.json()
+            else:
+                logger.error(f"Graph API error: {response.status_code}")
+                return None
+
+        except Exception as e:
+            logger.error(f"Error fetching user info: {e}")
+            return None
+
+    def create_or_update_user(self, user_info: Dict) -> Optional[Dict]:
+        """
+        Create or update user from SSO login.
+
+        Args:
+            user_info: User information from Microsoft Graph
+
+        Returns:
+            User dictionary or None if failed
+        """
+        try:
+            email = user_info.get('mail') or user_info.get('userPrincipalName')
+            username = email.split('@')[0] if email else user_info.get('displayName', 'unknown')
+            full_name = user_info.get('displayName')
+
+            # Check if user exists
+            user = db.get_user_by_username(username)
+
+            if not user:
+                # Create new user
+                user_id = db.create_user(
+                    username=username,
+                    email=email,
+                    full_name=full_name,
+                    auth_method='sso'
+                )
+
+                if user_id:
+                    user = db.get_user_by_id(user_id)
+                    logger.info(f"Created new SSO user: {username}")
+                else:
+                    logger.error(f"Failed to create SSO user: {username}")
+                    return None
+            else:
+                logger.info(f"Existing SSO user logged in: {username}")
+
+            return user
+
+        except Exception as e:
+            logger.error(f"Error creating/updating SSO user: {e}")
+            return None
+
+
+# Initialize Microsoft SSO
+sso = MicrosoftSSO()
+
+
+def is_sso_enabled() -> bool:
+    """Check if Microsoft SSO is enabled and configured."""
+    return sso.enabled
+
+
+def get_sso_instance() -> MicrosoftSSO:
+    """Get Microsoft SSO instance."""
+    return sso
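Reviewer note: `get_auth_url` accepts a `state` value for CSRF protection, but this file does not show how the value is generated or checked. A minimal sketch under stated assumptions follows; `new_oauth_state` and `state_matches` are hypothetical helpers, not part of this commit, and where the value is stored between the redirect and the callback is left to the caller.

```python
import hmac
import secrets
from typing import Optional


def new_oauth_state() -> str:
    """Return an unguessable value to pass as `state` to the login URL."""
    return secrets.token_urlsafe(32)


def state_matches(expected: Optional[str], received: Optional[str]) -> bool:
    """Compare the stored state with the one echoed back to the callback."""
    if not expected or not received:
        return False
    # Constant-time comparison on bytes avoids leaking how much matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Hypothetical flow: keep `state` server-side before redirecting, then
# check it in the callback handler before exchanging the authorization code.
state = new_oauth_state()
print(state_matches(state, state))
print(state_matches(state, "forged"))
print(state_matches(state, None))
```

Only the first comparison succeeds; a forged or missing value is rejected before any token exchange would take place.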
--- a/src/base_extractor.py
+++ b/src/base_extractor.py
@ -0,0 +1,64 @@
+"""Base class for all content extractors."""
+
+from abc import ABC, abstractmethod
+from typing import Dict
+
+class BaseExtractor(ABC):
+    """Abstract base class for content extractors."""
+
+    @abstractmethod
+    def extract_content(self, file_path: str) -> str:
+        """
+        Extract text content from file.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            Extracted text content
+        """
+        pass
+
+    @abstractmethod
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read existing metadata from file.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            Dictionary of metadata fields
+        """
+        pass
+
+    def truncate_content(self, content: str, max_length: int = 3000) -> str:
+        """
+        Truncate content to maximum length for AI processing.
+
+        Args:
+            content: Text content
+            max_length: Maximum length
+
+        Returns:
+            Content cut to max_length characters, with "..." appended if it was cut
+        """
+        if len(content) <= max_length:
+            return content
+        return content[:max_length] + "..."
+
+    def clean_text(self, text: str) -> str:
+        """
+        Clean extracted text (remove excessive whitespace, etc.).
+
+        Args:
+            text: Raw text
+
+        Returns:
+            Cleaned text
+        """
+        # Collapse runs of spaces and tabs within each line, keeping line breaks
+        lines = (' '.join(line.split()) for line in text.splitlines())
+        # Drop lines that are empty after collapsing
+        return '\n'.join(line for line in lines if line)
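Reviewer note: the sketch below shows how a concrete extractor might implement this interface. `PlainTextExtractor` is hypothetical and not part of this commit, and the base class is condensed here only so the example runs on its own.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class BaseExtractor(ABC):
    """Condensed stand-in for the abstract interface shown above."""

    @abstractmethod
    def extract_content(self, file_path: str) -> str: ...

    @abstractmethod
    def read_metadata(self, file_path: str) -> Dict[str, str]: ...

    def truncate_content(self, content: str, max_length: int = 3000) -> str:
        if len(content) <= max_length:
            return content
        return content[:max_length] + "..."


class PlainTextExtractor(BaseExtractor):
    """Hypothetical extractor for .txt files (an assumption for illustration)."""

    def extract_content(self, file_path: str) -> str:
        text = Path(file_path).read_text(encoding="utf-8", errors="replace")
        return self.truncate_content(text)

    def read_metadata(self, file_path: str) -> Dict[str, str]:
        # Plain text carries no embedded metadata; fall back to the file name.
        return {"title": Path(file_path).stem}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.txt"
    sample.write_text("x" * 3500, encoding="utf-8")
    extractor = PlainTextExtractor()
    content = extractor.extract_content(str(sample))
    metadata = extractor.read_metadata(str(sample))

print(len(content), metadata)
```

Because `truncate_content` appends the ellipsis after cutting, the returned string is three characters longer than `max_length` whenever truncation happens.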
--- a/src/base_updater.py
+++ b/src/base_updater.py
@ -0,0 +1,60 @@
+"""Base class for all metadata updaters."""
+
+from abc import ABC, abstractmethod
+from typing import Dict
+
+class BaseUpdater(ABC):
+    """Abstract base class for metadata updaters."""
+
+    @abstractmethod
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update file metadata.
+
+        Args:
+            file_path: Path to the file
+            metadata: Dictionary of metadata to update
+            backup: Whether to create backup before updating
+
+        Returns:
+            True if successful, False otherwise
+        """
+        pass
+
+    @abstractmethod
+    def verify_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify metadata was written correctly.
+
+        Args:
+            file_path: Path to the file
+            expected_metadata: Expected metadata values
+
+        Returns:
+            True if metadata matches expected values
+        """
+        pass
+
+    def validate_metadata(self, metadata: Dict[str, str]) -> bool:
+        """
+        Validate metadata before writing.
+
+        Args:
+            metadata: Metadata dictionary
+
+        Returns:
+            True if valid
+        """
+        # Check for required fields
+        required_fields = ['title']
+        for field in required_fields:
+            if field not in metadata or not metadata[field]:
+                return False
+
+        # Check field lengths
+        if len(metadata.get('title', '')) > 200:
+            return False
+        if len(metadata.get('keywords', '')) > 500:
+            return False
+
+        return True
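Reviewer note: the rules in `validate_metadata` are easiest to see with a few inputs. The block below is a standalone copy of those checks written as a free function for illustration; the sample metadata values are invented.

```python
from typing import Dict

REQUIRED_FIELDS = ("title",)
MAX_LENGTHS = {"title": 200, "keywords": 500}


def validate_metadata(metadata: Dict[str, str]) -> bool:
    """Standalone copy of the checks in BaseUpdater.validate_metadata."""
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            return False
    # Length limits apply only to fields that are present.
    return all(
        len(metadata.get(field, "")) <= limit
        for field, limit in MAX_LENGTHS.items()
    )


print(validate_metadata({"title": "Wound dressing, front view"}))
print(validate_metadata({"keywords": "dressing, foam"}))
print(validate_metadata({"title": "t" * 201}))
print(validate_metadata({"title": "ok", "keywords": "k" * 501}))
```

Only the first call passes: the second has no title, and the last two exceed the 200- and 500-character limits.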
--- a/src/config.py
+++ b/src/config.py
@ -0,0 +1,70 @@
+"""Configuration management for Oliver Metadata Tool."""
+
+import os
+import shutil
+import logging
+from pathlib import Path
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+logger = logging.getLogger(__name__)
+
+class Config:
+    """Configuration class for managing settings."""
+
+    # App Info
+    APP_NAME = "Oliver Metadata Tool"
+    APP_VERSION = "3.0.0"
+    APP_DESCRIPTION = "Universal metadata creation and management tool"
+
+    # Paths
+    PROJECT_ROOT = Path(__file__).parent.parent
+    OUTPUT_DIR = PROJECT_ROOT / 'output'
+    BACKUP_DIR = OUTPUT_DIR / 'backup'
+    REPORTS_DIR = OUTPUT_DIR / 'reports'
+
+    # External tool paths (optional)
+    TESSERACT_PATH = os.getenv('TESSERACT_PATH')
+    FFMPEG_PATH = os.getenv('FFMPEG_PATH')
+
+    # Processing Settings
+    PDF_MAX_PAGES = 3  # Maximum pages to extract from PDF
+
+    # OCR Settings - languages for Tesseract (CGA region support)
+    # eng=English, chi_sim=Chinese Simplified, chi_tra=Chinese Traditional,
+    # jpn=Japanese, kor=Korean
+    OCR_LANGUAGES = os.getenv('OCR_LANGUAGES', 'eng+chi_sim+chi_tra+jpn+kor')
+
+    # AI Settings (for CLI and Web AI mode)
+    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
+    AI_MODEL = os.getenv('AI_MODEL', 'gpt-4o-mini')  # Better than gpt-3.5-turbo
+    MAX_TOKENS = int(os.getenv('MAX_TOKENS', '500'))
+    TEMPERATURE = float(os.getenv('TEMPERATURE', '0.5'))  # 0.5 better for factual content
+    MAX_TEXT_LENGTH = int(os.getenv('MAX_TEXT_LENGTH', '4000'))
+
+    # API Rate Limiting & Retry (from open source analysis)
+    API_TIMEOUT = int(os.getenv('API_TIMEOUT', '30'))
+    API_MAX_RETRIES = int(os.getenv('API_MAX_RETRIES', '3'))
+    API_RETRY_DELAY = float(os.getenv('API_RETRY_DELAY', '1.0'))  # exponential backoff multiplier
+
+    @classmethod
+    def ensure_directories(cls):
+        """Ensure required directories exist."""
+        cls.OUTPUT_DIR.mkdir(exist_ok=True)
+        cls.BACKUP_DIR.mkdir(exist_ok=True)
+        cls.REPORTS_DIR.mkdir(exist_ok=True)
+
+    @classmethod
+    def check_exiftool(cls):
+        """Check if ExifTool is installed."""
+        exiftool_path = shutil.which('exiftool')
+        if not exiftool_path:
+            logger.warning("⚠️  ExifTool not found. Install with: brew install exiftool (macOS) or apt-get install libimage-exiftool-perl (Linux)")
+            return False
+        logger.info(f"✓ ExifTool found at {exiftool_path}")
+        return True
+
+# Ensure directories on import
+Config.ensure_directories()
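Reviewer note: `API_RETRY_DELAY` is described as an exponential backoff multiplier, but the retry loop itself is not in this file. One plausible reading, stated here as an assumption, is that the delay doubles on each attempt starting from that base value. `backoff_delays` is a hypothetical helper used only to show the arithmetic.

```python
from typing import List


def backoff_delays(max_retries: int = 3, base_delay: float = 1.0) -> List[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ..."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


# Defaults mirror API_MAX_RETRIES=3 and API_RETRY_DELAY=1.0 from Config.
print(backoff_delays())
print(backoff_delays(4, 0.5))
```

With the defaults the waits sum to 7 seconds, separate from the 30-second `API_TIMEOUT`; how that timeout is applied is not shown in this file.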
--- a/src/database.py
+++ b/src/database.py
@ -0,0 +1,525 @@
+"""Database management for user authentication and sessions."""
+
+import sqlite3
+import os
+from datetime import datetime, timedelta
+from typing import Optional, Dict, List
+from pathlib import Path
+from .utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class Database:
+    """SQLite database manager for Oliver Metadata Tool.
+
+    Opens a new connection for each operation, so no connection is shared
+    between threads or uvicorn worker processes.
+    """
+
+    def __init__(self, db_path: Optional[str] = None):
+        # Auto-detect database path based on environment
+        if db_path is None:
+            DOCKER_MODE = os.getenv('DOCKER_MODE', 'false').lower() == 'true'
+            if DOCKER_MODE:
+                db_dir = Path('/app/data')
+                db_dir.mkdir(parents=True, exist_ok=True)
+                db_path = str(db_dir / 'oliver_metadata.db')
+            else:
+                db_path = 'oliver_metadata.db'
+
+        self.db_path = db_path
+        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
+        self._create_tables()
+        logger.info(f"Database initialized at {db_path}")
+
+    def _get_conn(self) -> sqlite3.Connection:
+        """Create a new connection per call (thread-safe)."""
+        conn = sqlite3.connect(self.db_path, timeout=10)
+        conn.row_factory = sqlite3.Row
+        conn.execute("PRAGMA journal_mode=WAL")
+        return conn
+
+    def _create_tables(self):
+        """Create database tables if they don't exist."""
+        conn = self._get_conn()
+        try:
+            # Users table (with role column)
+            conn.execute('''
+                CREATE TABLE IF NOT EXISTS users (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    username TEXT UNIQUE NOT NULL,
+                    password_hash TEXT,
+                    email TEXT,
+                    full_name TEXT,
+                    role TEXT DEFAULT 'user',
+                    auth_method TEXT DEFAULT 'local',
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    last_login TIMESTAMP,
+                    is_active BOOLEAN DEFAULT 1
+                )
+            ''')
+
+            # Sessions table
+            conn.execute('''
+                CREATE TABLE IF NOT EXISTS sessions (
+                    session_id TEXT PRIMARY KEY,
+                    user_id INTEGER NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    expires_at TIMESTAMP NOT NULL,
+                    ip_address TEXT,
+                    user_agent TEXT,
+                    FOREIGN KEY (user_id) REFERENCES users (id)
+                )
+            ''')
+
+            # Audit log table
+            conn.execute('''
+                CREATE TABLE IF NOT EXISTS audit_log (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    user_id INTEGER NOT NULL,
+                    action TEXT NOT NULL,
+                    details TEXT,
+                    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    FOREIGN KEY (user_id) REFERENCES users (id)
+                )
+            ''')
+
+            # AI usage table
+            conn.execute('''
+                CREATE TABLE IF NOT EXISTS ai_usage (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    user_id INTEGER NOT NULL,
+                    filename TEXT,
+                    tokens_total INTEGER DEFAULT 0,
+                    model TEXT DEFAULT '',
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    FOREIGN KEY (user_id) REFERENCES users (id)
+                )
+            ''')
+
+            # Indexes
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_sessions_user_id ON sessions(user_id)')
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_sessions_expires_at ON sessions(expires_at)')
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_audit_user_id ON audit_log(user_id)')
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_audit_timestamp ON audit_log(timestamp)')
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_ai_usage_user_id ON ai_usage(user_id)')
+            conn.execute('CREATE INDEX IF NOT EXISTS idx_ai_usage_created ON ai_usage(created_at)')
+
+            conn.commit()
+            logger.info("Database tables created/verified")
+
+            # Add role column to existing databases (migration)
+            self._migrate_add_role_column(conn)
+
+            # Create test user if enabled
+            enable_test = os.getenv('ENABLE_TEST_USER', 'false').lower() == 'true'
+            if enable_test:
+                self._create_test_user(conn)
+
+            # Create superadmin if configured
+            superadmin_email = os.getenv('SUPERADMIN_EMAIL', '')
+            if superadmin_email:
+                self._create_superadmin(conn, superadmin_email)
+
+        finally:
+            conn.close()
+
+    def _migrate_add_role_column(self, conn: sqlite3.Connection):
+        """Add role column if it doesn't exist (for existing databases)."""
+        try:
+            cursor = conn.execute("PRAGMA table_info(users)")
+            columns = [row['name'] for row in cursor.fetchall()]
+            if 'role' not in columns:
+                conn.execute("ALTER TABLE users ADD COLUMN role TEXT DEFAULT 'user'")
+                conn.commit()
+                logger.info("Added 'role' column to users table")
+        except Exception as e:
+            logger.error(f"Error migrating role column: {e}")
+
+    def _create_test_user(self, conn: sqlite3.Connection):
+        """Create test user (tester/oliveradmin) if it doesn't exist."""
+        try:
+            cursor = conn.execute('SELECT id FROM users WHERE username = ?', ('tester',))
+            if not cursor.fetchone():
+                try:
+                    from werkzeug.security import generate_password_hash
+                    password_hash = generate_password_hash('oliveradmin')
+                    conn.execute(
+                        'INSERT INTO users (username, password_hash, email, full_name, role, auth_method) VALUES (?, ?, ?, ?, ?, ?)',
+                        ('tester', password_hash, 'tester@oliver.local', 'Test User', 'user', 'local'),
+                    )
+                    conn.commit()
+                    logger.info("Test user 'tester' created")
+                except ImportError:
+                    logger.warning("werkzeug not available - test user not created")
+        except Exception as e:
+            logger.error(f"Error creating test user: {e}")
+
+    def _create_superadmin(self, conn: sqlite3.Connection, email: str):
+        """Create or promote superadmin user."""
+        try:
+            username = email.split('@')[0]
+            cursor = conn.execute('SELECT id, role FROM users WHERE username = ? OR email = ?', (username, email))
+            row = cursor.fetchone()
+            if row:
+                if row['role'] != 'admin':
+                    conn.execute('UPDATE users SET role = ? WHERE id = ?', ('admin', row['id']))
+                    conn.commit()
+                    logger.info(f"Promoted user '{username}' to admin")
+            else:
+                conn.execute(
+                    'INSERT INTO users (username, email, full_name, role, auth_method) VALUES (?, ?, ?, ?, ?)',
+                    (username, email, username, 'admin', 'sso'),
+                )
+                conn.commit()
+                logger.info(f"Created superadmin user '{username}' ({email})")
+        except Exception as e:
+            logger.error(f"Error creating superadmin: {e}")
+
+    # --- User Operations ---
+
+    def get_user_by_username(self, username: str) -> Optional[Dict]:
+        """Get user by username."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute('SELECT * FROM users WHERE username = ? AND is_active = 1', (username,))
+            row = cursor.fetchone()
+            return dict(row) if row else None
+        except Exception as e:
+            logger.error(f"Error fetching user '{username}': {e}")
+            return None
+        finally:
+            conn.close()
+
+    def get_user_by_id(self, user_id: int) -> Optional[Dict]:
+        """Get user by ID."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute('SELECT * FROM users WHERE id = ? AND is_active = 1', (user_id,))
+            row = cursor.fetchone()
+            return dict(row) if row else None
+        except Exception as e:
+            logger.error(f"Error fetching user ID {user_id}: {e}")
+            return None
+        finally:
+            conn.close()
+
+    def create_user(
+        self,
+        username: str,
+        password_hash: Optional[str] = None,
+        email: Optional[str] = None,
+        full_name: Optional[str] = None,
+        auth_method: str = 'local',
+        role: str = 'user',
+    ) -> Optional[int]:
+        """Create a new user. Returns user ID if successful."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute(
+                'INSERT INTO users (username, password_hash, email, full_name, role, auth_method) VALUES (?, ?, ?, ?, ?, ?)',
+                (username, password_hash, email, full_name, role, auth_method),
+            )
+            conn.commit()
+            user_id = cursor.lastrowid
+            logger.info(f"Created user '{username}' (ID: {user_id})")
+            return user_id
+        except sqlite3.IntegrityError:
+            logger.warning(f"User '{username}' already exists")
+            return None
+        except Exception as e:
+            logger.error(f"Error creating user '{username}': {e}")
+            return None
+        finally:
+            conn.close()
+
+    def update_last_login(self, user_id: int):
+        """Update user's last login timestamp."""
+        conn = self._get_conn()
+        try:
+            conn.execute('UPDATE users SET last_login = CURRENT_TIMESTAMP WHERE id = ?', (user_id,))
+            conn.commit()
+        except Exception as e:
+            logger.error(f"Error updating last login for user {user_id}: {e}")
+        finally:
+            conn.close()
+
+    # --- Session Operations ---
+
+    def create_session(
+        self,
+        user_id: int,
+        session_id: str,
+        expires_in_hours: int = 24,
+        ip_address: Optional[str] = None,
+        user_agent: Optional[str] = None,
+    ) -> bool:
+        """Create new session for user."""
+        conn = self._get_conn()
+        try:
+            # Let SQLite compute the expiry in UTC so it is comparable with
+            # CURRENT_TIMESTAMP (datetime.now() would store local time)
+            conn.execute(
+                "INSERT INTO sessions (session_id, user_id, expires_at, ip_address, user_agent) "
+                "VALUES (?, ?, datetime('now', ?), ?, ?)",
+                (session_id, user_id, f'{int(expires_in_hours):+d} hours', ip_address, user_agent),
+            )
+            conn.commit()
+            return True
+        except Exception as e:
+            logger.error(f"Error creating session: {e}")
+            return False
+        finally:
+            conn.close()
+
+    def get_session(self, session_id: str) -> Optional[Dict]:
+        """Get session by ID. Returns None if expired or not found."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute('''
+                SELECT s.*, u.username, u.email, u.full_name
+                FROM sessions s
+                JOIN users u ON s.user_id = u.id
+                WHERE s.session_id = ? AND s.expires_at > CURRENT_TIMESTAMP
+            ''', (session_id,))
+            row = cursor.fetchone()
+            return dict(row) if row else None
+        except Exception as e:
+            logger.error(f"Error fetching session: {e}")
+            return None
+        finally:
+            conn.close()
+
+    def delete_session(self, session_id: str) -> bool:
+        """Delete session (logout)."""
+        conn = self._get_conn()
+        try:
+            conn.execute('DELETE FROM sessions WHERE session_id = ?', (session_id,))
+            conn.commit()
+            return True
+        except Exception as e:
+            logger.error(f"Error deleting session: {e}")
+            return False
+        finally:
+            conn.close()
+
+    def cleanup_expired_sessions(self):
+        """Remove expired sessions from database."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute('DELETE FROM sessions WHERE expires_at < CURRENT_TIMESTAMP')
+            conn.commit()
+            deleted_count = cursor.rowcount
+            if deleted_count > 0:
+                logger.info(f"Cleaned up {deleted_count} expired sessions")
+        except Exception as e:
+            logger.error(f"Error cleaning up sessions: {e}")
+        finally:
+            conn.close()
+
+    # --- Audit Log ---
+
+    def log_action(self, user_id: int, action: str, details: Optional[str] = None):
+        """Log user action to audit trail."""
+        conn = self._get_conn()
+        try:
+            conn.execute(
+                'INSERT INTO audit_log (user_id, action, details) VALUES (?, ?, ?)',
+                (user_id, action, details),
+            )
+            conn.commit()
+        except Exception as e:
+            logger.error(f"Error logging action: {e}")
+        finally:
+            conn.close()
+
+    def get_user_activity(self, user_id: int, limit: int = 100, offset: int = 0) -> List[Dict]:
+        """Get user activity log."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute(
+                'SELECT * FROM audit_log WHERE user_id = ? ORDER BY timestamp DESC LIMIT ? OFFSET ?',
+                (user_id, limit, offset),
+            )
+            return [dict(row) for row in cursor.fetchall()]
+        except Exception as e:
+            logger.error(f"Error fetching user activity: {e}")
+            return []
+        finally:
+            conn.close()
+
+    def get_all_users(self, include_inactive: bool = False) -> List[Dict]:
+        """Get all users."""
+        conn = self._get_conn()
+        try:
+            query = 'SELECT * FROM users'
+            if not include_inactive:
+                query += ' WHERE is_active = 1'
+            query += ' ORDER BY created_at DESC'
+            cursor = conn.execute(query)
+            return [dict(row) for row in cursor.fetchall()]
+        except Exception as e:
+            logger.error(f"Error fetching users: {e}")
+            return []
+        finally:
+            conn.close()
+
+    def get_stats(self) -> Dict:
+        """Get database statistics."""
+        conn = self._get_conn()
+        try:
+            stats = {}
+            cursor = conn.execute('SELECT COUNT(*) as count FROM users WHERE is_active = 1')
+            stats['active_users'] = cursor.fetchone()['count']
+
+            cursor = conn.execute('SELECT COUNT(*) as count FROM sessions WHERE expires_at > CURRENT_TIMESTAMP')
+            stats['active_sessions'] = cursor.fetchone()['count']
+
+            cursor = conn.execute('SELECT COUNT(*) as count FROM audit_log')
+            stats['audit_entries'] = cursor.fetchone()['count']
+
+            cursor = conn.execute("SELECT COUNT(*) as count FROM audit_log WHERE timestamp > datetime('now', '-24 hours')")
+            stats['recent_activity'] = cursor.fetchone()['count']
+
+            return stats
+        except Exception as e:
+            logger.error(f"Error fetching stats: {e}")
+            return {}
+        finally:
+            conn.close()
+
+    # --- User Update ---
+
+    def update_user(self, user_id: int, updates: Dict) -> bool:
+        """Update user fields. Returns True on success."""
+        allowed = {'role', 'is_active', 'full_name', 'email'}
+        filtered = {k: v for k, v in updates.items() if k in allowed}
+        if not filtered:
+            return False
+        conn = self._get_conn()
+        try:
+            set_clause = ', '.join(f'{k} = ?' for k in filtered)
+            values = list(filtered.values()) + [user_id]
+            cursor = conn.execute(f'UPDATE users SET {set_clause} WHERE id = ?', values)
+            conn.commit()
+            return cursor.rowcount > 0
+        except Exception as e:
+            logger.error(f"Error updating user {user_id}: {e}")
+            return False
+        finally:
+            conn.close()
+
+    # --- Audit Log (extended) ---
+
+    def get_audit_log(
+        self,
+        user_id: Optional[int] = None,
+        action: Optional[str] = None,
+        limit: int = 100,
+        offset: int = 0,
+    ) -> List[Dict]:
+        """Get audit log with optional filters."""
+        conn = self._get_conn()
+        try:
+            query = '''
+                SELECT a.*, u.username
+                FROM audit_log a
+                LEFT JOIN users u ON a.user_id = u.id
+            '''
+            conditions = []
+            params = []
+            if user_id is not None:
+                conditions.append('a.user_id = ?')
+                params.append(user_id)
+            if action:
+                conditions.append('a.action = ?')
+                params.append(action)
+            if conditions:
+                query += ' WHERE ' + ' AND '.join(conditions)
+            query += ' ORDER BY a.timestamp DESC LIMIT ? OFFSET ?'
+            params.extend([limit, offset])
+            cursor = conn.execute(query, params)
+            return [dict(row) for row in cursor.fetchall()]
+        except Exception as e:
+            logger.error(f"Error fetching audit log: {e}")
+            return []
+        finally:
+            conn.close()
+
+    # --- AI Usage ---
+
+    def log_ai_usage(
+        self,
+        user_id: int,
+        filename: str = "",
+        tokens_total: int = 0,
+        model: str = "",
+    ):
+        """Log AI token usage for a file."""
+        conn = self._get_conn()
+        try:
+            conn.execute(
+                'INSERT INTO ai_usage (user_id, filename, tokens_total, model) VALUES (?, ?, ?, ?)',
+                (user_id, filename, tokens_total, model),
+            )
+            conn.commit()
+        except Exception as e:
+            logger.error(f"Error logging AI usage: {e}")
+        finally:
+            conn.close()
+
+    def get_ai_usage_stats(self) -> Dict:
+        """Get aggregate AI usage statistics."""
+        conn = self._get_conn()
+        try:
+            stats = {}
+            cursor = conn.execute('SELECT COUNT(*) as count, COALESCE(SUM(tokens_total), 0) as total_tokens FROM ai_usage')
+            row = cursor.fetchone()
+            stats['total_requests'] = row['count']
+            stats['total_tokens'] = row['total_tokens']
+
+            cursor = conn.execute(
+                "SELECT COUNT(*) as count, COALESCE(SUM(tokens_total), 0) as tokens FROM ai_usage WHERE created_at > datetime('now', '-24 hours')"
+            )
+            row = cursor.fetchone()
+            stats['requests_24h'] = row['count']
+            stats['tokens_24h'] = row['tokens']
+
+            cursor = conn.execute(
+                "SELECT COUNT(*) as count, COALESCE(SUM(tokens_total), 0) as tokens FROM ai_usage WHERE created_at > datetime('now', '-7 days')"
+            )
+            row = cursor.fetchone()
+            stats['requests_7d'] = row['count']
+            stats['tokens_7d'] = row['tokens']
+
+            return stats
+        except Exception as e:
+            logger.error(f"Error fetching AI usage stats: {e}")
+            return {}
+        finally:
+            conn.close()
+
+    def get_ai_usage_by_user(self, limit: int = 50) -> List[Dict]:
+        """Get AI usage broken down by user."""
+        conn = self._get_conn()
+        try:
+            cursor = conn.execute('''
+                SELECT u.username, u.id as user_id,
+                       COUNT(*) as request_count,
+                       COALESCE(SUM(a.tokens_total), 0) as total_tokens,
+                       MAX(a.created_at) as last_used
+                FROM ai_usage a
+                JOIN users u ON a.user_id = u.id
+                GROUP BY u.id
+                ORDER BY total_tokens DESC
+                LIMIT ?
+            ''', (limit,))
+            return [dict(row) for row in cursor.fetchall()]
+        except Exception as e:
+            logger.error(f"Error fetching AI usage by user: {e}")
+            return []
+        finally:
+            conn.close()
+
+    def close(self):
+        """No-op for connection-per-operation pattern."""
+        pass
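Reviewer note: `get_session` and `cleanup_expired_sessions` compare `expires_at` with `CURRENT_TIMESTAMP`, which SQLite evaluates in UTC, so stored expiry values need to be on the same clock and in the same text format. The sketch below uses an in-memory database and a reduced `sessions` table (both assumptions for illustration) to show expiry values computed by SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, expires_at TIMESTAMP NOT NULL)"
)


def add_session(session_id: str, hours: int) -> None:
    # datetime('now', '+N hours') is evaluated by SQLite in UTC, the same
    # clock and text format that CURRENT_TIMESTAMP uses.
    conn.execute(
        "INSERT INTO sessions (session_id, expires_at) VALUES (?, datetime('now', ?))",
        (session_id, f"{hours:+d} hours"),
    )


add_session("live", 24)
add_session("stale", -1)

active = [
    row[0]
    for row in conn.execute(
        "SELECT session_id FROM sessions WHERE expires_at > CURRENT_TIMESTAMP"
    )
]
print(active)
conn.close()
```

Only the session that expires in the future is returned, regardless of the server's local time zone.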
--- a/src/excel_metadata_lookup.py
+++ b/src/excel_metadata_lookup.py
@ -0,0 +1,171 @@
+"""Excel-based metadata lookup service."""
+
+import pandas as pd
+from pathlib import Path
+from typing import Dict, Optional
+from .utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class ExcelMetadataLookup:
+    """Lookup metadata from Excel spreadsheet by filename."""
+
+    def __init__(self, excel_path: str):
+        """
+        Initialize the lookup service.
+
+        Args:
+            excel_path: Path to the Excel file with metadata
+        """
+        self.excel_path = Path(excel_path)
+        self.filename_to_metadata = {}
+        self._load_excel()
+
+    def _load_excel(self):
+        """Load and index the Excel file from multiple sheets."""
+        try:
+            logger.info(f"Loading metadata from: {self.excel_path}")
+
+            # Load Sheet 1: DSB Celum ID to Path mapping
+            self._load_dsb_sheet()
+
+            # Load Sheet 2: Medsurg Metadata Cheat (fallback)
+            self._load_medsurg_sheet()
+
+            logger.info(f"✅ Total loaded: {len(self.filename_to_metadata)} metadata records")
+
+        except Exception as e:
+            logger.error(f"Failed to load Excel file: {e}", exc_info=True)
+            raise
+
+    def _load_dsb_sheet(self):
+        """Load DSB Celum ID to Path mapping sheet."""
+        try:
+            df = pd.read_excel(
+                self.excel_path,
+                sheet_name="DSB Celum ID to Path mapping"
+            )
+
+            # Skip header row (first row contains template)
+            df = df[df['Celum ID'].notna()][1:]
+
+            count = 0
+            for _, row in df.iterrows():
+                filename = row.get('File Name')
+                if pd.notna(filename):
+                    # Get filename without extension for indexing
+                    filename_stem = Path(str(filename).strip()).stem.lower()
+
+                    metadata = {
+                        'celum_id': str(row['Celum ID']) if pd.notna(row.get('Celum ID')) else '',
+                        'title': str(row['Title']) if pd.notna(row.get('Title')) else '',
+                        'description': str(row['External Description/Alt Text']) if pd.notna(row.get('External Description/Alt Text')) else '',
+                        'business': str(row['Business']) if pd.notna(row.get('Business')) else '',
+                        'original_filename': str(filename).strip(),
+                        'source_sheet': 'DSB'
+                    }
+
+                    # Only add if not already exists
+                    if filename_stem not in self.filename_to_metadata:
+                        self.filename_to_metadata[filename_stem] = metadata
+                        count += 1
+
+            logger.info(f"✅ Loaded {count} records from DSB sheet")
+
+        except Exception as e:
+            logger.warning(f"Failed to load DSB sheet: {e}")
+
+    def _load_medsurg_sheet(self):
+        """Load Medsurg Metadata Cheat sheet."""
+        try:
+            df = pd.read_excel(
+                self.excel_path,
+                sheet_name="Medsurg Metadata Cheat"
+            )
+
+            # Skip header row
+            df = df[df['Celum ID'].notna()][1:]
+
+            count = 0
+            for _, row in df.iterrows():
+                # Get filename from Solventum DAM Asset Path (extract filename from path)
+                asset_path = row.get('Solventum DAM Asset Path')
+                if pd.notna(asset_path):
+                    # Extract filename from path
+                    filename = Path(str(asset_path).strip()).name
+                    filename_stem = Path(filename).stem.lower()
+
+                    metadata = {
+                        'celum_id': str(row['Celum ID']) if pd.notna(row.get('Celum ID')) else '',
+                        'title': str(row['Title']) if pd.notna(row.get('Title')) else '',
+                        'description': str(row['External Description/Alt Text']) if pd.notna(row.get('External Description/Alt Text')) else '',
+                        'business': str(row['Business']) if pd.notna(row.get('Business')) else '',
+                        'original_filename': filename,
+                        'source_sheet': 'Medsurg'
+                    }
+
+                    # Only add if not already exists (DSB has priority)
+                    if filename_stem not in self.filename_to_metadata:
+                        self.filename_to_metadata[filename_stem] = metadata
+                        count += 1
+
+            logger.info(f"✅ Loaded {count} records from Medsurg sheet")
+
+        except Exception as e:
+            logger.warning(f"Failed to load Medsurg sheet: {e}")
+
+    def lookup_by_filename(self, filename: str) -> Optional[Dict[str, str]]:
+        """
+        Lookup metadata by filename (ignoring extension).
+
+        Args:
+            filename: Name of the file (with or without extension)
+
+        Returns:
+            Dictionary with metadata fields, or None if not found
+        """
+        # Extract just the filename without path and extension
+        filename_stem = Path(filename).stem.lower()
+
+        # Direct lookup by stem (case-insensitive)
+        if filename_stem in self.filename_to_metadata:
+            result = self.filename_to_metadata[filename_stem]
+            logger.info(f"✅ Found match for: {filename} (from {result.get('source_sheet', 'unknown')} sheet)")
+            return result
+
+        logger.warning(f"⚠️ No metadata found for: {filename} (searched: {filename_stem})")
+        return None
+
+    def search_by_celum_id(self, celum_id: str) -> Optional[Dict[str, str]]:
+        """
+        Search metadata by Celum ID.
+
+        Args:
+            celum_id: Celum ID to search for
+
+        Returns:
+            Dictionary with metadata fields, or None if not found
+        """
+        celum_id = str(celum_id).strip()
+
+        for metadata in self.filename_to_metadata.values():
+            if metadata['celum_id'] == celum_id:
+                logger.info(f"✅ Found metadata for Celum ID: {celum_id}")
+                return metadata
+
+        logger.warning(f"⚠️ No metadata found for Celum ID: {celum_id}")
+        return None
+
+    def get_stats(self) -> Dict[str, int]:
+        """Get statistics about loaded metadata."""
+        dsb_count = sum(1 for m in self.filename_to_metadata.values() if m.get('source_sheet') == 'DSB')
+        medsurg_count = sum(1 for m in self.filename_to_metadata.values() if m.get('source_sheet') == 'Medsurg')
+
+        return {
+            'total_records': len(self.filename_to_metadata),
+            'dsb_records': dsb_count,
+            'medsurg_records': medsurg_count,
+            'with_title': sum(1 for m in self.filename_to_metadata.values() if m['title']),
+            'with_description': sum(1 for m in self.filename_to_metadata.values() if m['description']),
+        }
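
The indexing rule used by `ExcelMetadataLookup` (key on the lowercased filename stem, first sheet loaded wins) can be sketched without pandas. This is an illustration under assumptions: `stem_key`, `build_index`, `lookup`, and the sample rows are hypothetical and stand in for the spreadsheet loading code.

```python
from pathlib import Path


def stem_key(name):
    """Normalize a filename or path to its lowercased stem (no directory, no extension)."""
    return Path(str(name).strip()).stem.lower()


def build_index(dsb_rows, medsurg_rows):
    """Index rows by stem; DSB rows are inserted first, so they take priority."""
    index = {}
    for source, rows in (("DSB", dsb_rows), ("Medsurg", medsurg_rows)):
        for row in rows:
            key = stem_key(row["filename"])
            # setdefault keeps the existing entry, so earlier sources win on conflicts
            index.setdefault(key, {**row, "source_sheet": source})
    return index


def lookup(index, filename):
    """Return the record for a filename, ignoring case and extension, or None."""
    return index.get(stem_key(filename))


if __name__ == "__main__":
    index = build_index(
        [{"filename": "Hero_Shot.jpg", "title": "DSB title"}],
        [{"filename": "/dam/x/hero_shot.tif", "title": "Medsurg title"}],
    )
    print(lookup(index, "HERO_SHOT.PNG"))
```

One consequence of keying on the stem is that files differing only by extension or case share a single record, which is the intended behavior here but worth keeping in mind if two distinct assets ever share a base name.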
--- a/src/extractors/__init__.py
+++ b/src/extractors/__init__.py
@ -0,0 +1 @@
+"""Content extractors for different file types."""
--- a/src/extractors/exiftool_extractor.py
+++ b/src/extractors/exiftool_extractor.py
@ -0,0 +1,174 @@
+"""Unified metadata extractor using ExifTool for images, video, and PDF files."""
+
+from typing import Dict, Optional
+from pathlib import Path
+import logging
+
+try:
+    from exiftool import ExifToolHelper
+    EXIFTOOL_AVAILABLE = True
+except ImportError:
+    EXIFTOOL_AVAILABLE = False
+
+from ..base_extractor import BaseExtractor
+from ..utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class ExifToolExtractor(BaseExtractor):
+    """
+    Extract metadata using ExifTool.
+
+    Supports images (JPEG, PNG, GIF, TIFF, HEIC, RAW),
+    videos (MP4, MOV, AVI, MKV), and PDF metadata extraction.
+
+    Note: This does NOT extract content (text) from files - only metadata.
+    For content extraction, use the regular extractors (PDFExtractor, ImageExtractor with OCR).
+    """
+
+    # Map ExifTool tags to our standard metadata fields
+    TAG_MAPPING = {
+        # Images (JPEG/PNG/TIFF)
+        'EXIF:ImageDescription': 'title',
+        'XMP:Description': 'subject',
+        'IPTC:Caption-Abstract': 'subject',
+        'IPTC:Headline': 'title',
+        'XMP:Title': 'title',
+        'EXIF:XPSubject': 'subject',
+        'EXIF:XPKeywords': 'keywords',
+        'IPTC:Keywords': 'keywords',
+        'XMP:Subject': 'keywords',
+
+        # PDF
+        'PDF:Title': 'title',
+        'PDF:Subject': 'subject',
+        'PDF:Keywords': 'keywords',
+
+        # Video (QuickTime/MP4)
+        'QuickTime:Title': 'title',
+        'QuickTime:Description': 'subject',
+        'QuickTime:Keywords': 'keywords',
+        'UserData:Title': 'title',
+        'UserData:Description': 'subject',
+    }
+
+    def __init__(self):
+        """Initialize ExifTool extractor."""
+        if not EXIFTOOL_AVAILABLE:
+            raise ImportError(
+                "PyExifTool not installed. Install with: pip install 'PyExifTool>=0.5.6'\n"
+                "Also ensure ExifTool is installed on your system."
+            )
+
+    def extract_content(self, file_path: str) -> str:
+        """
+        ExifTool does not extract text content - only metadata.
+
+        This method returns empty string. For content extraction:
+        - PDFs: Use PDFExtractor
+        - Images: Use ImageExtractor with OCR
+        - Office docs: Use OfficeExtractor
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            Empty string (ExifTool doesn't extract content)
+        """
+        logger.debug(f"ExifToolExtractor.extract_content called for {file_path} - returning empty (metadata only)")
+        return ""
+
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read metadata using ExifTool.
+
+        Extracts title, subject, and keywords from various metadata fields.
+        Supports images, videos, and PDFs.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            Dictionary with metadata (title, subject, keywords)
+        """
+        try:
+            with ExifToolHelper() as et:
+                metadata_list = et.get_metadata([file_path])
+                if not metadata_list:
+                    logger.warning(f"No metadata returned by ExifTool for {file_path}")
+                    return {'title': '', 'subject': '', 'keywords': ''}
+
+                exif_data = metadata_list[0]
+                result = {'title': '', 'subject': '', 'keywords': ''}
+
+                # Map ExifTool tags to standard fields
+                for exif_tag, standard_key in self.TAG_MAPPING.items():
+                    if exif_tag in exif_data and exif_data[exif_tag]:
+                        value = exif_data[exif_tag]
+
+                        # Handle list values (keywords often come as arrays)
+                        if isinstance(value, list):
+                            value = ', '.join(str(v) for v in value)
+                        else:
+                            value = str(value)
+
+                        # First non-empty value wins (priority based on TAG_MAPPING order)
+                        if not result[standard_key] and value.strip():
+                            result[standard_key] = value.strip()
+
+                logger.info(f"Extracted metadata from {Path(file_path).name}: "
+                           f"title={bool(result['title'])}, "
+                           f"subject={bool(result['subject'])}, "
+                           f"keywords={bool(result['keywords'])}")
+
+                return result
+
+        except Exception as e:
+            logger.error(f"ExifTool extraction failed for {file_path}: {e}")
+            return {'title': '', 'subject': '', 'keywords': ''}
+
+    def get_all_tags(self, file_path: str) -> Dict:
+        """
+        Get all available metadata tags from a file.
+
+        Useful for debugging or exploring available metadata fields.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            Dictionary of all metadata tags
+        """
+        try:
+            with ExifToolHelper() as et:
+                metadata_list = et.get_metadata([file_path])
+                if metadata_list:
+                    return metadata_list[0]
+                return {}
+        except Exception as e:
+            logger.error(f"Failed to get all tags for {file_path}: {e}")
+            return {}
+
+    def get_specific_tags(self, file_path: str, tags: list) -> Dict:
+        """
+        Get specific metadata tags from a file.
+
+        More efficient than get_all_tags when you know which tags you need.
+
+        Args:
+            file_path: Path to the file
+            tags: List of tag names (e.g., ['EXIF:ImageDescription', 'PDF:Title'])
+
+        Returns:
+            Dictionary of requested tags
+        """
+        try:
+            with ExifToolHelper() as et:
+                metadata_list = et.get_tags([file_path], tags=tags)
+                if metadata_list:
+                    return metadata_list[0]
+                return {}
+        except Exception as e:
+            logger.error(f"Failed to get specific tags for {file_path}: {e}")
+            return {}
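
The "first non-empty value wins" rule in `read_metadata` depends on dictionary insertion order. Below is a small sketch of that rule using a plain dictionary in place of ExifTool output; the shortened `TAG_MAPPING` and the function name `map_tags` are assumptions for illustration, not the extractor's actual API.

```python
# Shortened, hypothetical mapping; earlier entries have higher priority.
TAG_MAPPING = {
    "IPTC:Headline": "title",
    "XMP:Title": "title",
    "XMP:Description": "subject",
    "IPTC:Keywords": "keywords",
    "XMP:Subject": "keywords",
}


def map_tags(exif_data, tag_mapping=TAG_MAPPING):
    """Collapse raw tags into title/subject/keywords; the first non-empty value wins."""
    result = {"title": "", "subject": "", "keywords": ""}
    for tag, field in tag_mapping.items():
        value = exif_data.get(tag)
        if not value:
            continue
        # Keywords often arrive as a list; flatten to a comma-separated string
        if isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        value = str(value).strip()
        if value and not result[field]:
            result[field] = value
    return result


if __name__ == "__main__":
    print(map_tags({
        "XMP:Title": "Second choice",
        "IPTC:Headline": "First choice",
        "IPTC:Keywords": ["wound care", "dressing"],
    }))
```

Because priority is implicit in the order of the mapping, reordering entries in `TAG_MAPPING` changes which tag supplies a field when a file carries several.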
--- a/src/extractors/image_extractor.py
+++ b/src/extractors/image_extractor.py
@ -0,0 +1,179 @@
+"""Image content and metadata extractor."""
+
+import pytesseract
+import piexif
+from PIL import Image
+from typing import Dict
+import os
+
+from ..base_extractor import BaseExtractor
+from ..config import Config
+from ..utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class ImageExtractor(BaseExtractor):
+    """Extractor for image files (JPEG, PNG, etc.) with OCR and EXIF metadata."""
+
+    def __init__(self):
+        """Initialize image extractor."""
+        self.tesseract_path = Config.TESSERACT_PATH
+        if self.tesseract_path and os.path.exists(self.tesseract_path):
+            pytesseract.pytesseract.tesseract_cmd = self.tesseract_path
+        # Get OCR languages from config (supports Chinese, Japanese, Korean, etc.)
+        self.ocr_lang = Config.OCR_LANGUAGES
+
+    def extract_content(self, file_path: str) -> str:
+        """
+        Extract text content from image using OCR.
+
+        Uses pytesseract to perform optical character recognition on the image.
+        Supports multiple languages including Chinese, Japanese, Korean.
+
+        Args:
+            file_path: Path to the image file
+
+        Returns:
+            Extracted text content, or an empty string if OCR finds no text
+
+        Note:
+            Errors are logged and an empty string is returned; nothing is raised.
+        """
+        try:
+            logger.info(f"Starting image OCR extraction from {file_path}")
+
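+            # Open the image; the context manager closes the file handle afterwards
+            with Image.open(file_path) as image:
+                # Apply OCR with multi-language support
+                # (languages come from Config.OCR_LANGUAGES)
+                text = pytesseract.image_to_string(image, lang=self.ocr_lang)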
+
+            if text and len(text.strip()) > 0:
+                cleaned_text = self.clean_text(text)
+                logger.info(f"Successfully extracted {len(cleaned_text)} characters from {file_path}")
+                return cleaned_text
+            else:
+                logger.warning(f"OCR extraction returned empty content for {file_path}")
+                return ""
+
+        except Exception as e:
+            logger.error(f"Failed to extract content from image {file_path}: {e}", exc_info=True)
+            return ""
+
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read image metadata from EXIF and IPTC data.
+
+        Extracts standard image metadata fields including camera info, date taken,
+        copyright, etc.
+
+        Args:
+            file_path: Path to the image file
+
+        Returns:
+            Dictionary of metadata fields, or an empty dictionary on failure
+
+        Note:
+            Errors are logged and an empty dictionary is returned; nothing is raised.
+        """
+        metadata = {}
+
+        try:
+            # Get file extension to determine format
+            file_ext = file_path.lower().split('.')[-1]
+
+            # Try EXIF data
+            metadata = self._read_exif_metadata(file_path)
+
+            # For PNG files, also merge in the PNG info dictionary (text chunks etc.)
+            if file_ext in ['png']:
+                iptc_metadata = self._read_iptc_metadata(file_path)
+                metadata.update(iptc_metadata)
+
+            logger.info(f"Successfully read metadata from {file_path}")
+            return metadata
+
+        except Exception as e:
+            logger.error(f"Failed to read image metadata from {file_path}: {e}", exc_info=True)
+            return {}
+
+    def _read_exif_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read EXIF metadata from image.
+
+        Args:
+            file_path: Path to image file
+
+        Returns:
+            Dictionary of EXIF metadata
+        """
+        try:
+            # Try piexif first for JPEG
+            if file_path.lower().endswith(('.jpg', '.jpeg')):
+                try:
+                    exif_dict = piexif.load(file_path)
+                    metadata = {}
+
+                    # Extract commonly useful EXIF fields
+                    if "0th" in exif_dict:
+                        for tag, value in exif_dict["0th"].items():
+                            tag_name = piexif.TAGS["0th"][tag]["name"]
+                            try:
+                                if isinstance(value, bytes):
+                                    value = value.decode('utf-8', errors='ignore')
+                                metadata[tag_name.lower()] = str(value).strip()
+                            except Exception:
+                                pass
+
+                    return metadata
+                except Exception as e:
+                    logger.debug(f"piexif extraction failed: {e}")
+
+            # Fallback to PIL for all image types
+            metadata = {}
+            with Image.open(file_path) as image:
+                # _getexif() exists only on some formats; call it once and reuse the result
+                exif_data = image._getexif() if hasattr(image, '_getexif') else None
+
+            for tag_id, value in (exif_data or {}).items():
+                tag_name = piexif.TAGS["0th"].get(tag_id, {}).get("name", f"tag_{tag_id}")
+                if isinstance(value, bytes):
+                    value = value.decode('utf-8', errors='ignore')
+                metadata[tag_name.lower()] = str(value).strip()
+
+            return metadata
+
+        except Exception as e:
+            logger.debug(f"EXIF metadata extraction failed: {e}")
+            return {}
+
+    def _read_iptc_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read IPTC metadata from image.
+
+        Args:
+            file_path: Path to image file
+
+        Returns:
+            Dictionary of IPTC metadata
+        """
+        try:
+            # Image is already imported at module level. Despite the method
+            # name, this reads Pillow's info dictionary (for PNG: text chunks
+            # and similar ancillary data), not IPTC records.
+            metadata = {}
+
+            with Image.open(file_path) as image:
+                info = dict(getattr(image, 'info', {}) or {})
+
+            for key, value in info.items():
+                if isinstance(value, bytes):
+                    value = value.decode('utf-8', errors='ignore')
+                metadata[str(key).lower()] = str(value).strip()
+
+            return metadata
+
+        except Exception as e:
+            logger.debug(f"IPTC metadata extraction failed: {e}")
+            return {}
--- a/src/extractors/office_extractor.py
+++ b/src/extractors/office_extractor.py
@ -0,0 +1,207 @@
+"""Office document content and metadata extractor."""
+
+from docx import Document as DocxDocument
+from openpyxl import load_workbook
+from pptx import Presentation
+from typing import Dict
+
+from ..base_extractor import BaseExtractor
+from ..utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class OfficeExtractor(BaseExtractor):
+    """Extractor for Office files (DOCX, XLSX, PPTX)."""
+
+    SUPPORTED_FORMATS = ['docx', 'xlsx', 'pptx']
+
+    def extract_content(self, file_path: str) -> str:
+        """
+        Extract text content from Office document.
+
+        Routes to appropriate extraction method based on file format.
+
+        Args:
+            file_path: Path to the Office file
+
+        Returns:
+            Extracted text content
+        """
+        try:
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext == 'docx':
+                return self._extract_docx_content(file_path)
+            elif file_ext == 'xlsx':
+                return self._extract_xlsx_content(file_path)
+            elif file_ext == 'pptx':
+                return self._extract_pptx_content(file_path)
+            else:
+                logger.error(f"Unsupported Office format: {file_ext}")
+                return ""
+
+        except Exception as e:
+            logger.error(f"Failed to extract content from Office file {file_path}: {e}", exc_info=True)
+            return ""
+
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read metadata from Office document.
+
+        Routes to appropriate metadata reading method based on file format.
+
+        Args:
+            file_path: Path to the Office file
+
+        Returns:
+            Dictionary of metadata fields
+        """
+        try:
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext == 'docx':
+                return self._read_docx_metadata(file_path)
+            elif file_ext == 'xlsx':
+                return self._read_xlsx_metadata(file_path)
+            elif file_ext == 'pptx':
+                return self._read_pptx_metadata(file_path)
+            else:
+                logger.error(f"Unsupported Office format: {file_ext}")
+                return {}
+
+        except Exception as e:
+            logger.error(f"Failed to read metadata from Office file {file_path}: {e}", exc_info=True)
+            return {}
+
+    def _extract_docx_content(self, file_path: str) -> str:
+        """Extract text content from DOCX file."""
+        try:
+            logger.info(f"Extracting content from DOCX: {file_path}")
+            doc = DocxDocument(file_path)
+            paragraphs = [para.text for para in doc.paragraphs if para.text.strip()]
+            content = "\n".join(paragraphs)
+            cleaned_content = self.clean_text(content)
+            logger.info(f"Successfully extracted {len(cleaned_content)} characters from DOCX")
+            return cleaned_content
+        except Exception as e:
+            logger.error(f"Failed to extract DOCX content: {e}", exc_info=True)
+            return ""
+
+    def _extract_xlsx_content(self, file_path: str) -> str:
+        """Extract text content from XLSX file."""
+        try:
+            logger.info(f"Extracting content from XLSX: {file_path}")
+            workbook = load_workbook(file_path)
+            content_parts = []
+
+            for sheet_name in workbook.sheetnames:
+                sheet = workbook[sheet_name]
+                content_parts.append(f"Sheet: {sheet_name}")
+
+                for row in sheet.iter_rows(values_only=True):
+                    row_text = " | ".join(str(cell) if cell is not None else "" for cell in row)
+                    if row_text.strip():
+                        content_parts.append(row_text)
+
+            content = "\n".join(content_parts)
+            cleaned_content = self.clean_text(content)
+            logger.info(f"Successfully extracted {len(cleaned_content)} characters from XLSX")
+            return cleaned_content
+        except Exception as e:
+            logger.error(f"Failed to extract XLSX content: {e}", exc_info=True)
+            return ""
+
+    def _extract_pptx_content(self, file_path: str) -> str:
+        """Extract text content from PPTX file."""
+        try:
+            logger.info(f"Extracting content from PPTX: {file_path}")
+            presentation = Presentation(file_path)
+            content_parts = []
+
+            for slide_num, slide in enumerate(presentation.slides, 1):
+                content_parts.append(f"Slide {slide_num}:")
+
+                for shape in slide.shapes:
+                    if hasattr(shape, "text") and shape.text.strip():
+                        content_parts.append(shape.text)
+
+            content = "\n".join(content_parts)
+            cleaned_content = self.clean_text(content)
+            logger.info(f"Successfully extracted {len(cleaned_content)} characters from PPTX")
+            return cleaned_content
+        except Exception as e:
+            logger.error(f"Failed to extract PPTX content: {e}", exc_info=True)
+            return ""
+
+    def _read_docx_metadata(self, file_path: str) -> Dict[str, str]:
+        """Read metadata from DOCX file."""
+        try:
+            logger.info(f"Reading metadata from DOCX: {file_path}")
+            doc = DocxDocument(file_path)
+            core_props = doc.core_properties
+
+            metadata = {
+                'title': getattr(core_props, 'title', '') or '',
+                'subject': getattr(core_props, 'subject', '') or '',
+                'keywords': getattr(core_props, 'keywords', '') or '',
+                'author': getattr(core_props, 'author', '') or '',
+                'comments': getattr(core_props, 'comments', '') or '',
+                'category': getattr(core_props, 'category', '') or '',
+            }
+
+            # Remove empty values
+            metadata = {k: v for k, v in metadata.items() if v}
+            logger.info("Successfully read metadata from DOCX")
+            return metadata
+        except Exception as e:
+            logger.error(f"Failed to read DOCX metadata: {e}", exc_info=True)
+            return {}
+
+    def _read_xlsx_metadata(self, file_path: str) -> Dict[str, str]:
+        """Read metadata from XLSX file."""
+        try:
+            logger.info(f"Reading metadata from XLSX: {file_path}")
+            workbook = load_workbook(file_path)
+            props = workbook.properties
+
+            metadata = {
+                'title': getattr(props, 'title', '') or '',
+                'subject': getattr(props, 'subject', '') or '',
+                'keywords': getattr(props, 'keywords', '') or '',
+                'author': getattr(props, 'author', '') or '',
+                'comments': getattr(props, 'comments', '') or '',
+                'category': getattr(props, 'category', '') or '',
+            }
+
+            # Remove empty values
+            metadata = {k: v for k, v in metadata.items() if v}
+            logger.info("Successfully read metadata from XLSX")
+            return metadata
+        except Exception as e:
+            logger.error(f"Failed to read XLSX metadata: {e}", exc_info=True)
+            return {}
+
+    def _read_pptx_metadata(self, file_path: str) -> Dict[str, str]:
+        """Read metadata from PPTX file."""
+        try:
+            logger.info(f"Reading metadata from PPTX: {file_path}")
+            presentation = Presentation(file_path)
+            core_props = presentation.core_properties
+
+            metadata = {
+                'title': getattr(core_props, 'title', '') or '',
+                'subject': getattr(core_props, 'subject', '') or '',
+                'keywords': getattr(core_props, 'keywords', '') or '',
+                'author': getattr(core_props, 'author', '') or '',
+                'comments': getattr(core_props, 'comments', '') or '',
+                'category': getattr(core_props, 'category', '') or '',
+            }
+
+            # Remove empty values
+            metadata = {k: v for k, v in metadata.items() if v}
+            logger.info("Successfully read metadata from PPTX")
+            return metadata
+        except Exception as e:
+            logger.error(f"Failed to read PPTX metadata: {e}", exc_info=True)
+            return {}
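
The extension routing in `OfficeExtractor` could also be written as a lookup table. The sketch below is an alternative shape, not what the commit does: `route_by_extension` and `demo_handlers` are hypothetical names. It uses `Path.suffix`, so a path with no extension yields an empty string instead of the whole path, which is what `file_path.lower().split('.')[-1]` returns in that case.

```python
from pathlib import Path


def route_by_extension(file_path, handlers, default=""):
    """Call the handler registered for the file's extension, or return default."""
    ext = Path(file_path).suffix.lower().lstrip(".")
    handler = handlers.get(ext)
    if handler is None:
        return default
    return handler(file_path)


# Hypothetical stand-ins for the per-format extraction methods
demo_handlers = {
    "docx": lambda p: f"docx:{Path(p).name}",
    "xlsx": lambda p: f"xlsx:{Path(p).name}",
    "pptx": lambda p: f"pptx:{Path(p).name}",
}

if __name__ == "__main__":
    print(route_by_extension("Report.DOCX", demo_handlers))
    print(repr(route_by_extension("notes", demo_handlers)))
```

With a table like this, `extract_content` and `read_metadata` would each pass their own handler mapping and default (`""` or `{}`), so supporting a new format means adding one entry per table rather than another `elif` branch in two places.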
--- a/src/extractors/pdf_extractor.py
+++ b/src/extractors/pdf_extractor.py
@ -0,0 +1,228 @@
+"""PDF content extractor."""
+
+import pypdf
+import pdfplumber
+from pdf2image import convert_from_path
+import pytesseract
+from typing import Dict
+from pathlib import Path
+import os
+
+from ..base_extractor import BaseExtractor
+from ..config import Config
+from ..utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class PDFExtractor(BaseExtractor):
+    """Extractor for PDF files with fallback to OCR."""
+
+    def __init__(self):
+        """Initialize PDF extractor."""
+        self.tesseract_path = Config.TESSERACT_PATH
+        if self.tesseract_path and os.path.exists(self.tesseract_path):
+            pytesseract.pytesseract.tesseract_cmd = self.tesseract_path
+        self.max_pages = Config.PDF_MAX_PAGES
+
+    def extract_content(self, file_path: str) -> str:
+        """
+        Extract text content from PDF using multiple fallback strategies.
+
+        First tries pypdf, then pdfplumber, then OCR if both return too little text.
+        Limits extraction to the first Config.PDF_MAX_PAGES pages.
+
+        Args:
+            file_path: Path to the PDF file
+
+        Returns:
+            Extracted text content, or an empty string if no strategy yields enough text
+
+        Note:
+            Errors are logged and an empty string is returned; nothing is raised.
+        """
+        try:
+            logger.info(f"Starting PDF extraction from {file_path}")
+
+            # Strategy 1: Try pypdf
+            content = self._extract_with_pypdf(file_path)
+            if content and len(content.strip()) > 100:
+                logger.info(f"Successfully extracted {len(content)} characters using pypdf")
+                return self.clean_text(content)
+
+            logger.debug("pypdf returned minimal content, trying pdfplumber")
+
+            # Strategy 2: Try pdfplumber
+            content = self._extract_with_pdfplumber(file_path)
+            if content and len(content.strip()) > 100:
+                logger.info(f"Successfully extracted {len(content)} characters using pdfplumber")
+                return self.clean_text(content)
+
+            logger.debug("pdfplumber returned minimal content, attempting OCR")
+
+            # Strategy 3: Try OCR as last resort
+            content = self._extract_with_ocr(file_path)
+            if content and len(content.strip()) > 50:
+                logger.info(f"Successfully extracted {len(content)} characters using OCR")
+                return self.clean_text(content)
+
+            logger.warning(f"All extraction methods returned minimal content for {file_path}")
+            return ""
+
+        except Exception as e:
+            logger.error(f"Failed to extract PDF content from {file_path}: {e}", exc_info=True)
+            return ""
+
+    def _extract_with_pypdf(self, file_path: str) -> str:
+        """
+        Extract text using pypdf library.
+
+        Args:
+            file_path: Path to PDF file
+
+        Returns:
+            Extracted text
+        """
+        try:
+            content = []
+            with open(file_path, 'rb') as f:
+                pdf_reader = pypdf.PdfReader(f)
+                num_pages = min(len(pdf_reader.pages), self.max_pages)
+
+                for page_num in range(num_pages):
+                    try:
+                        page = pdf_reader.pages[page_num]
+                        text = page.extract_text()
+                        if text:
+                            content.append(text)
+                    except Exception as e:
+                        logger.debug(f"Error extracting page {page_num} with pypdf: {e}")
+                        continue
+
+            return "\n".join(content)
+
+        except Exception as e:
+            logger.debug(f"pypdf extraction failed: {e}")
+            return ""
+
+    def _extract_with_pdfplumber(self, file_path: str) -> str:
+        """
+        Extract text using pdfplumber library.
+
+        Args:
+            file_path: Path to PDF file
+
+        Returns:
+            Extracted text
+        """
+        try:
+            content = []
+            with pdfplumber.open(file_path) as pdf:
+                num_pages = min(len(pdf.pages), self.max_pages)
+
+                for page_num in range(num_pages):
+                    try:
+                        page = pdf.pages[page_num]
+                        text = page.extract_text()
+                        if text:
+                            content.append(text)
+                    except Exception as e:
+                        logger.debug(f"Error extracting page {page_num} with pdfplumber: {e}")
+                        continue
+
+            return "\n".join(content)
+
+        except Exception as e:
+            logger.debug(f"pdfplumber extraction failed: {e}")
+            return ""
+
+    def _extract_with_ocr(self, file_path: str) -> str:
+        """
+        Extract text using OCR via pdf2image and pytesseract.
+
+        Args:
+            file_path: Path to PDF file
+
+        Returns:
+            Extracted text
+        """
+        try:
+            content = []
+
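+            # Convert only the first max_pages pages to images; rendering the
+            # whole document first wastes time and memory on large PDFs
+            images = convert_from_path(file_path, last_page=self.max_pages)
+
+            # Defensive cap in case more pages than requested are returned
+            images = images[:self.max_pages]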
+
+            # Get OCR languages from config (supports Chinese, Japanese, Korean, etc.)
+            ocr_lang = Config.OCR_LANGUAGES
+
+            # Apply OCR to each image
+            for page_num, image in enumerate(images):
+                try:
+                    text = pytesseract.image_to_string(image, lang=ocr_lang)
+                    if text:
+                        content.append(text)
+                except Exception as e:
+                    logger.debug(f"Error running OCR on page {page_num}: {e}")
+                    continue
+
+            return "\n".join(content)
+
+        except Exception as e:
+            logger.debug(f"OCR extraction failed: {e}")
+            return ""
+
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read PDF metadata from document properties.
+
+        Extracts standard PDF metadata fields: Title, Subject, Keywords, Author, Creator.
+
+        Args:
+            file_path: Path to PDF file
+
+        Returns:
+            Dictionary of metadata fields with lowercase keys
+
+        Note:
+            Errors are logged and an empty dictionary is returned; no exception is raised.
+        """
+        metadata = {}
+
+        try:
+            with open(file_path, 'rb') as f:
+                pdf_reader = pypdf.PdfReader(f)
+
+                # Get document information
+                doc_info = pdf_reader.metadata
+
+                if doc_info:
+                    # Map PDF metadata fields to standardized keys
+                    field_mapping = {
+                        '/Title': 'title',
+                        '/Subject': 'subject',
+                        '/Keywords': 'keywords',
+                        '/Author': 'author',
+                        '/Creator': 'creator',
+                    }
+
+                    for pdf_field, standard_field in field_mapping.items():
+                        try:
+                            value = doc_info.get(pdf_field)
+                            if value:
+                                # Convert bytes to string if necessary
+                                if isinstance(value, bytes):
+                                    value = value.decode('utf-8', errors='ignore')
+                                metadata[standard_field] = str(value).strip()
+                        except Exception as e:
+                            logger.debug(f"Error reading field {pdf_field}: {e}")
+                            continue
+
+            logger.info(f"Successfully read metadata from {file_path}")
+            return metadata
+
+        except Exception as e:
+            logger.error(f"Failed to read PDF metadata from {file_path}: {e}", exc_info=True)
+            return {}
--- a/src/extractors/video_extractor.py
+++ b/src/extractors/video_extractor.py
@ -0,0 +1,153 @@
+"""Video metadata extractor."""
+
+from typing import Dict
+
+from ..base_extractor import BaseExtractor
+from ..utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class VideoExtractor(BaseExtractor):
+    """Extractor for video files (MP4, MOV, AVI) - metadata extraction only."""
+
+    SUPPORTED_FORMATS = ['mp4', 'mov', 'avi', 'mkv', 'flv', 'wmv', 'webm']
+
+    def extract_content(self, file_path: str) -> str:
+        """
+        Extract text content from video (not supported).
+
+        Extracting text from video would require expensive OCR or
+        speech-to-text processing, so this method always returns an empty string.
+
+        Args:
+            file_path: Path to the video file
+
+        Returns:
+            Empty string (not supported for video)
+        """
+        logger.info(f"Text extraction not supported for video files: {file_path}")
+        return ""
+
+    def read_metadata(self, file_path: str) -> Dict[str, str]:
+        """
+        Read metadata from video file using mutagen.
+
+        Extracts standard video metadata tags.
+
+        Args:
+            file_path: Path to the video file
+
+        Returns:
+            Dictionary of metadata fields
+        """
+        try:
+            logger.info(f"Reading metadata from video: {file_path}")
+            metadata = self._read_with_mutagen(file_path)
+            logger.info(f"Successfully read metadata from video: {file_path}")
+            return metadata
+
+        except Exception as e:
+            logger.error(f"Failed to read video metadata from {file_path}: {e}", exc_info=True)
+            return {}
+
+    def _read_with_mutagen(self, file_path: str) -> Dict[str, str]:
+        """
+        Read video metadata using mutagen.
+
+        Args:
+            file_path: Path to video file
+
+        Returns:
+            Dictionary of metadata
+        """
+        try:
+            from mutagen import File
+        except ImportError:
+            logger.warning("mutagen not installed, attempting pymediainfo fallback")
+            return self._read_with_pymediainfo(file_path)
+
+        try:
+            audio = File(file_path)
+            metadata = {}
+
+            if audio is not None:
+                # Extract common tags
+                tag_mapping = {
+                    'TIT2': 'title',
+                    '\xa9nam': 'title',
+                    'Title': 'title',
+                    'TIT3': 'subtitle',
+                    '\xa9cmt': 'comments',
+                    'Comments': 'comments',
+                    'TPE1': 'artist',
+                    '\xa9ART': 'artist',
+                    'Artist': 'artist',
+                    'TALB': 'album',
+                    '\xa9alb': 'album',
+                    'Album': 'album',
+                    'TXXX:KEYWORDS': 'keywords',
+                    'TXXX:Description': 'description',
+                }
+
+                for key, value in audio.items():
+                    # Check direct mapping
+                    if key in tag_mapping:
+                        standard_key = tag_mapping[key]
+                        if isinstance(value, list):
+                            value = value[0] if value else ""
+                        if value:
+                            metadata[standard_key] = str(value).strip()
+
+                    # Generic fallback for other tags
+                    elif isinstance(value, (list, tuple)):
+                        if value:
+                            metadata[key.lower()] = str(value[0]).strip()
+                    else:
+                        metadata[key.lower()] = str(value).strip()
+
+            return metadata
+
+        except Exception as e:
+            logger.debug(f"Mutagen extraction failed: {e}")
+            return self._read_with_pymediainfo(file_path)
+
+    def _read_with_pymediainfo(self, file_path: str) -> Dict[str, str]:
+        """
+        Read video metadata using pymediainfo.
+
+        Args:
+            file_path: Path to video file
+
+        Returns:
+            Dictionary of metadata
+        """
+        try:
+            from pymediainfo import MediaInfo
+        except ImportError:
+            logger.warning("pymediainfo not installed, cannot extract video metadata")
+            return {}
+
+        try:
+            media_info = MediaInfo.parse(file_path)
+            metadata = {}
+
+            # Extract from general track
+            for track in media_info.tracks:
+                if track.track_type == "General":
+                    if hasattr(track, 'title') and track.title:
+                        metadata['title'] = track.title
+                    if hasattr(track, 'comment') and track.comment:
+                        metadata['comments'] = track.comment
+                    if hasattr(track, 'performer') and track.performer:
+                        metadata['artist'] = track.performer
+                    if hasattr(track, 'description') and track.description:
+                        metadata['description'] = track.description
+
+                    break
+
+            return metadata
+
+        except Exception as e:
+            logger.debug(f"pymediainfo extraction failed: {e}")
+            return {}
--- a/src/field_mapper.py
+++ b/src/field_mapper.py
@ -0,0 +1,409 @@
+"""Field mapping with automatic detection and manual override."""
+
+import json
+from typing import Dict, List, Optional, Tuple
+from difflib import SequenceMatcher
+from pathlib import Path
+from .utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class FieldMapper:
+    """Map source fields to standard metadata fields with fuzzy matching."""
+
+    # Standard metadata fields used in Oliver Metadata Tool
+    STANDARD_FIELDS = ['title', 'subject', 'keywords', 'description']
+
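+    # Common aliases for fuzzy matching (case-insensitive).
+    # Some aliases (e.g. 'description', 'summary', 'desc') appear under both
+    # 'subject' and 'description'. _find_best_match keeps the first field to
+    # reach the top score, so those names resolve to 'subject'.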
+    FIELD_ALIASES = {
+        'title': [
+            'title', 'name', 'heading', 'filename', 'file_name', 'document_title',
+            'asset_title', 'resource_title', 'object_name', 'label'
+        ],
+        'subject': [
+            'subject', 'description', 'summary', 'abstract', 'alt_text',
+            'external_description', 'caption', 'about', 'overview', 'details',
+            'desc', 'long_description', 'content'
+        ],
+        'keywords': [
+            'keywords', 'tags', 'categories', 'labels', 'subjects', 'topics',
+            'taxonomy', 'classification', 'key_words', 'search_terms'
+        ],
+        'description': [
+            'description', 'desc', 'summary', 'notes', 'comments', 'remarks',
+            'details', 'about', 'information', 'info'
+        ]
+    }
+
+    # Similarity threshold for fuzzy matching (0.0 to 1.0)
+    SIMILARITY_THRESHOLD = 0.6
+
+    def __init__(self, presets_path: Optional[str] = None):
+        """
+        Initialize field mapper.
+
+        Args:
+            presets_path: Path to JSON file for saving/loading mapping presets
+        """
+        self.presets_path = presets_path or 'field_mapping_presets.json'
+
+    def auto_map(self, source_fields: List[str], strict: bool = False) -> Dict[str, Tuple[str, float]]:
+        """
+        Automatically map source fields to standard fields using fuzzy matching.
+
+        Args:
+            source_fields: List of field names from source data
+            strict: If True, only accept matches above high confidence threshold (0.8)
+
+        Returns:
+            Dictionary mapping {source_field: (target_field, confidence_score)}
+            Example: {'File Name': ('title', 0.85), 'Alt Text': ('subject', 0.92)}
+        """
+        mapping = {}
+        threshold = 0.8 if strict else self.SIMILARITY_THRESHOLD
+
+        for source_field in source_fields:
+            best_match = self._find_best_match(source_field, threshold)
+            if best_match:
+                target_field, score = best_match
+                mapping[source_field] = (target_field, score)
+                logger.info(f"Auto-mapped '{source_field}' -> '{target_field}' (confidence: {score:.2f})")
+
+        return mapping
+
+    def _find_best_match(self, source_field: str, threshold: float = 0.6) -> Optional[Tuple[str, float]]:
+        """
+        Find best matching standard field for source field.
+
+        Args:
+            source_field: Source field name
+            threshold: Minimum similarity score (0.0 to 1.0)
+
+        Returns:
+            Tuple of (target_field, confidence_score) or None
+        """
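+        source_lower = source_field.lower().replace(' ', '_').replace('-', '_')
+        # An empty name is a substring of every alias and would otherwise score 0.85
+        if not source_lower:
+            return None
+        best_score = 0.0
+        best_field = None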
+
+        for standard_field, aliases in self.FIELD_ALIASES.items():
+            for alias in aliases:
+                # Calculate similarity score
+                score = SequenceMatcher(None, source_lower, alias).ratio()
+
+                # Exact match bonus
+                if source_lower == alias:
+                    score = 1.0
+
+                # Substring match bonus
+                elif alias in source_lower or source_lower in alias:
+                    score = max(score, 0.85)
+
+                if score > best_score and score >= threshold:
+                    best_score = score
+                    best_field = standard_field
+
+        if best_field:
+            return (best_field, best_score)
+        return None
+
+    def validate_mapping(self, mapping: Dict[str, str]) -> Dict[str, List[str]]:
+        """
+        Validate a field mapping configuration.
+
+        Args:
+            mapping: Dictionary mapping {source_field: target_field}
+
+        Returns:
+            Dictionary with validation results:
+            {
+                'valid': [list of valid mappings],
+                'invalid': [list of invalid mappings],
+                'warnings': [list of warnings]
+            }
+        """
+        result = {
+            'valid': [],
+            'invalid': [],
+            'warnings': []
+        }
+
+        # Track which target fields are used
+        target_usage = {}
+
+        for source_field, target_field in mapping.items():
+            # Check if target field is valid
+            if target_field not in self.STANDARD_FIELDS:
+                result['invalid'].append(
+                    f"'{target_field}' is not a valid target field (source: '{source_field}')"
+                )
+                continue
+
+            result['valid'].append(f"'{source_field}' -> '{target_field}'")
+
+            # Track multiple sources mapping to same target
+            if target_field in target_usage:
+                target_usage[target_field].append(source_field)
+            else:
+                target_usage[target_field] = [source_field]
+
+        # Warn about multiple sources mapping to same target
+        for target_field, sources in target_usage.items():
+            if len(sources) > 1:
+                result['warnings'].append(
+                    f"Multiple source fields map to '{target_field}': {', '.join(sources)}"
+                )
+
+        return result
+
+    def apply_mapping(self, data: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
+        """
+        Apply field mapping to transform source data to standard format.
+
+        Args:
+            data: Source data dictionary
+            mapping: Field mapping {source_field: target_field}
+
+        Returns:
+            Transformed data with standard field names
+        """
+        result = {field: '' for field in self.STANDARD_FIELDS}
+
+        for source_field, target_field in mapping.items():
+            if source_field in data and target_field in self.STANDARD_FIELDS:
+                value = data[source_field]
+
+                # Handle multiple values mapping to same target (concatenate)
+                if result[target_field]:
+                    result[target_field] += f"; {value}"
+                else:
+                    result[target_field] = value
+
+        return result
+
+    def save_preset(self, name: str, mapping: Dict[str, str], description: str = ""):
+        """
+        Save mapping preset to file.
+
+        Args:
+            name: Preset name
+            mapping: Field mapping dictionary
+            description: Optional description
+        """
+        presets = self._load_presets()
+
+        presets[name] = {
+            'mapping': mapping,
+            'description': description,
+            'created_at': self._get_timestamp()
+        }
+
+        try:
+            with open(self.presets_path, 'w') as f:
+                json.dump(presets, f, indent=2)
+            logger.info(f"Saved mapping preset: {name}")
+        except Exception as e:
+            logger.error(f"Failed to save preset '{name}': {e}")
+            raise
+
+    def load_preset(self, name: str) -> Optional[Dict[str, str]]:
+        """
+        Load mapping preset from file.
+
+        Args:
+            name: Preset name
+
+        Returns:
+            Mapping dictionary or None if not found
+        """
+        presets = self._load_presets()
+
+        if name in presets:
+            logger.info(f"Loaded mapping preset: {name}")
+            return presets[name].get('mapping', {})
+
+        logger.warning(f"Preset not found: {name}")
+        return None
+
+    def list_presets(self) -> List[Dict[str, str]]:
+        """
+        List all saved presets.
+
+        Returns:
+            List of preset information dictionaries
+        """
+        presets = self._load_presets()
+
+        return [
+            {
+                'name': name,
+                'description': data.get('description', ''),
+                'created_at': data.get('created_at', ''),
+                'fields': len(data.get('mapping', {}))
+            }
+            for name, data in presets.items()
+        ]
+
+    def delete_preset(self, name: str) -> bool:
+        """
+        Delete a mapping preset.
+
+        Args:
+            name: Preset name
+
+        Returns:
+            True if deleted, False if not found
+        """
+        presets = self._load_presets()
+
+        if name in presets:
+            del presets[name]
+
+            try:
+                with open(self.presets_path, 'w') as f:
+                    json.dump(presets, f, indent=2)
+                logger.info(f"Deleted mapping preset: {name}")
+                return True
+            except Exception as e:
+                logger.error(f"Failed to delete preset '{name}': {e}")
+                raise
+
+        return False
+
+    def suggest_mapping(self, source_fields: List[str]) -> Dict:
+        """
+        Generate mapping suggestions with confidence scores and alternatives.
+
+        Args:
+            source_fields: List of source field names
+
+        Returns:
+            Dictionary with suggestions:
+            {
+                'source_field': {
+                    'best_match': 'target_field',
+                    'confidence': 0.85,
+                    'alternatives': [
+                        {'field': 'other_target', 'confidence': 0.65},
+                        ...
+                    ]
+                }
+            }
+        """
+        suggestions = {}
+
+        for source_field in source_fields:
+            # Find all potential matches
+            matches = self._find_all_matches(source_field)
+
+            if matches:
+                best_match = matches[0]
+                suggestions[source_field] = {
+                    'best_match': best_match[0],
+                    'confidence': best_match[1],
+                    'alternatives': [
+                        {'field': field, 'confidence': score}
+                        for field, score in matches[1:3]  # Top 2 alternatives
+                    ]
+                }
+            else:
+                suggestions[source_field] = {
+                    'best_match': None,
+                    'confidence': 0.0,
+                    'alternatives': []
+                }
+
+        return suggestions
+
+    def _find_all_matches(self, source_field: str, min_threshold: float = 0.4) -> List[Tuple[str, float]]:
+        """
+        Find all matching standard fields above threshold, sorted by score.
+
+        Args:
+            source_field: Source field name
+            min_threshold: Minimum similarity score
+
+        Returns:
+            List of (target_field, score) tuples sorted by score descending
+        """
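+        source_lower = source_field.lower().replace(' ', '_').replace('-', '_')
+        # An empty name is a substring of every alias and would otherwise score 0.85
+        if not source_lower:
+            return []
+        matches = []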
+
+        for standard_field, aliases in self.FIELD_ALIASES.items():
+            best_score = 0.0
+
+            for alias in aliases:
+                score = SequenceMatcher(None, source_lower, alias).ratio()
+
+                # Exact match
+                if source_lower == alias:
+                    score = 1.0
+                # Substring match
+                elif alias in source_lower or source_lower in alias:
+                    score = max(score, 0.85)
+
+                best_score = max(best_score, score)
+
+            if best_score >= min_threshold:
+                matches.append((standard_field, best_score))
+
+        # Sort by score descending
+        matches.sort(key=lambda x: x[1], reverse=True)
+        return matches
+
+    def _load_presets(self) -> Dict:
+        """Load all presets from file."""
+        if Path(self.presets_path).exists():
+            try:
+                with open(self.presets_path, 'r') as f:
+                    return json.load(f)
+            except Exception as e:
+                logger.error(f"Failed to load presets: {e}")
+                return {}
+        return {}
+
+    def _get_timestamp(self) -> str:
+        """Get current timestamp as ISO format string."""
+        from datetime import datetime
+        return datetime.now().isoformat()
+
+    def get_unmapped_fields(self, source_fields: List[str], mapping: Dict[str, str]) -> List[str]:
+        """
+        Get list of source fields that are not mapped.
+
+        Args:
+            source_fields: All source field names
+            mapping: Current mapping dictionary
+
+        Returns:
+            List of unmapped source fields
+        """
+        return [field for field in source_fields if field not in mapping]
+
+    def get_mapping_coverage(self, source_fields: List[str], mapping: Dict[str, str]) -> Dict:
+        """
+        Calculate mapping coverage statistics.
+
+        Args:
+            source_fields: All source field names
+            mapping: Current mapping dictionary
+
+        Returns:
+            Statistics dictionary with coverage info
+        """
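+        total_fields = len(source_fields)
+        unmapped = self.get_unmapped_fields(source_fields, mapping)
+        # Count only mapped fields that are actually present in source_fields,
+        # so that coverage cannot exceed 100%
+        mapped_fields = total_fields - len(unmapped)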
+
+        # Count unique target fields used
+        unique_targets = len(set(mapping.values()))
+
+        return {
+            'total_source_fields': total_fields,
+            'mapped_fields': mapped_fields,
+            'unmapped_fields': len(unmapped),
+            'coverage_percent': (mapped_fields / total_fields * 100) if total_fields > 0 else 0,
+            'unique_targets_used': unique_targets,
+            'unmapped_field_list': unmapped
+        }
--- a/src/file_detector.py
+++ b/src/file_detector.py
@ -0,0 +1,97 @@
+"""File type detection and routing."""
+
+from enum import Enum
+from pathlib import Path
+from typing import Optional
+import mimetypes
+
+class FileType(Enum):
+    """Supported file types."""
+    PDF = "pdf"
+    IMAGE = "image"
+    OFFICE_DOC = "office_doc"
+    OFFICE_SHEET = "office_sheet"
+    OFFICE_PRESENTATION = "office_presentation"
+    VIDEO = "video"
+    UNSUPPORTED = "unsupported"
+
+class FileDetector:
+    """Detect file type and route to appropriate handlers."""
+
+    # File extension mappings
+    PDF_EXTENSIONS = {'.pdf'}
+    IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.tiff', '.tif', '.bmp', '.webp'}
+    OFFICE_DOC_EXTENSIONS = {'.docx'}
+    OFFICE_SHEET_EXTENSIONS = {'.xlsx'}
+    OFFICE_PRESENTATION_EXTENSIONS = {'.pptx'}
+    VIDEO_EXTENSIONS = {'.mp4', '.mov', '.avi', '.mkv', '.m4v', '.wmv'}
+
+    @classmethod
+    def detect_file_type(cls, file_path: str) -> FileType:
+        """
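+        Detect file type from the file extension, falling back to the MIME type.
+
+        The MIME type is guessed from the file name by ``mimetypes``; file
+        contents are not inspected.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            FileType enum value
+
+        Raises:
+            FileNotFoundError: If file_path does not exist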
+        """
+        path = Path(file_path)
+
+        if not path.exists():
+            raise FileNotFoundError(f"File not found: {file_path}")
+
+        extension = path.suffix.lower()
+
+        # Check by extension first
+        if extension in cls.PDF_EXTENSIONS:
+            return FileType.PDF
+        elif extension in cls.IMAGE_EXTENSIONS:
+            return FileType.IMAGE
+        elif extension in cls.OFFICE_DOC_EXTENSIONS:
+            return FileType.OFFICE_DOC
+        elif extension in cls.OFFICE_SHEET_EXTENSIONS:
+            return FileType.OFFICE_SHEET
+        elif extension in cls.OFFICE_PRESENTATION_EXTENSIONS:
+            return FileType.OFFICE_PRESENTATION
+        elif extension in cls.VIDEO_EXTENSIONS:
+            return FileType.VIDEO
+
+        # Fallback to MIME type check
+        mime_type, _ = mimetypes.guess_type(str(path))
+        if mime_type:
+            if 'pdf' in mime_type:
+                return FileType.PDF
+            elif 'image' in mime_type:
+                return FileType.IMAGE
+            elif 'video' in mime_type:
+                return FileType.VIDEO
+            elif 'officedocument.wordprocessingml' in mime_type:
+                return FileType.OFFICE_DOC
+            elif 'officedocument.spreadsheetml' in mime_type:
+                return FileType.OFFICE_SHEET
+            elif 'officedocument.presentationml' in mime_type:
+                return FileType.OFFICE_PRESENTATION
+
+        return FileType.UNSUPPORTED
+
+    @classmethod
+    def is_supported(cls, file_path: str) -> bool:
+        """Check if file type is supported."""
+        file_type = cls.detect_file_type(file_path)
+        return file_type != FileType.UNSUPPORTED
+
+    @classmethod
+    def get_file_type_name(cls, file_type: FileType) -> str:
+        """Get human-readable file type name."""
+        type_names = {
+            FileType.PDF: "PDF Document",
+            FileType.IMAGE: "Image",
+            FileType.OFFICE_DOC: "Word Document",
+            FileType.OFFICE_SHEET: "Excel Spreadsheet",
+            FileType.OFFICE_PRESENTATION: "PowerPoint Presentation",
+            FileType.VIDEO: "Video",
+            FileType.UNSUPPORTED: "Unsupported File"
+        }
+        return type_names.get(file_type, "Unknown")
--- a/src/main.py
+++ b/src/main.py
@ -0,0 +1,293 @@
+#!/usr/bin/env python3
+"""Main CLI application for metadata automation."""
+
+import sys
+import argparse
+from pathlib import Path
+from typing import List, Dict
+from tqdm import tqdm
+import csv
+from datetime import datetime
+
+# Import project modules
+from .config import Config
+from .file_detector import FileDetector, FileType
+from .metadata_analyzer import MetadataAnalyzer
+from .utils import (
+    create_backup, get_logger, format_metadata_comparison,
+    validate_file_path, create_report_entry
+)
+
+# Import extractors
+from .extractors.pdf_extractor import PDFExtractor
+from .extractors.image_extractor import ImageExtractor
+from .extractors.office_extractor import OfficeExtractor
+from .extractors.video_extractor import VideoExtractor
+
+# Import updaters
+from .updaters.pdf_updater import PDFUpdater
+from .updaters.image_updater import ImageUpdater
+from .updaters.office_updater import OfficeUpdater
+from .updaters.video_updater import VideoUpdater
+
+logger = get_logger(__name__)
+
+class MetadataProcessor:
+    """Main processor for metadata automation."""
+
+    def __init__(self, preview_mode: bool = False):
+        """
+        Initialize the processor.
+
+        Args:
+            preview_mode: If True, show changes without applying them
+        """
+        self.preview_mode = preview_mode
+        self.analyzer = MetadataAnalyzer()
+
+        # Initialize extractors and updaters
+        self.extractors = {
+            FileType.PDF: PDFExtractor(),
+            FileType.IMAGE: ImageExtractor(),
+            FileType.OFFICE_DOC: OfficeExtractor(),
+            FileType.OFFICE_SHEET: OfficeExtractor(),
+            FileType.OFFICE_PRESENTATION: OfficeExtractor(),
+            FileType.VIDEO: VideoExtractor()
+        }
+
+        self.updaters = {
+            FileType.PDF: PDFUpdater(),
+            FileType.IMAGE: ImageUpdater(),
+            FileType.OFFICE_DOC: OfficeUpdater(),
+            FileType.OFFICE_SHEET: OfficeUpdater(),
+            FileType.OFFICE_PRESENTATION: OfficeUpdater(),
+            FileType.VIDEO: VideoUpdater()
+        }
+
+        self.report_data = []
+
+    def process_file(self, file_path: str) -> bool:
+        """
+        Process a single file.
+
+        Args:
+            file_path: Path to the file
+
+        Returns:
+            True if the file was processed (or previewed) successfully, False otherwise
+        """
+        try:
+            logger.info(f"\nProcessing: {file_path}")
+
+            # Validate file
+            if not validate_file_path(file_path):
+                logger.error(f"Invalid file path: {file_path}")
+                return False
+
+            # Detect file type
+            file_type = FileDetector.detect_file_type(file_path)
+
+            if file_type == FileType.UNSUPPORTED:
+                logger.warning(f"Unsupported file type: {file_path}")
+                return False
+
+            logger.info(f"File type: {FileDetector.get_file_type_name(file_type)}")
+
+            # Get appropriate extractor
+            extractor = self.extractors.get(file_type)
+            if not extractor:
+                logger.error(f"No extractor found for {file_type}")
+                return False
+
+            # Extract content and current metadata
+            logger.info("Extracting content...")
+            content = extractor.extract_content(file_path)
+
+            if not content or len(content.strip()) < 10:
+                logger.warning("Insufficient content extracted, using filename only")
+                content = Path(file_path).stem
+
+            logger.info(f"Extracted {len(content)} characters")
+
+            logger.info("Reading current metadata...")
+            old_metadata = extractor.read_metadata(file_path)
+
+            # Analyze content and generate new metadata
+            logger.info("Analyzing content with AI...")
+            filename = Path(file_path).name
+            new_metadata = self.analyzer.analyze_content(content, filename, file_type)
+
+            # Display comparison
+            print(format_metadata_comparison(old_metadata, new_metadata))
+
+            # Store report data
+            self.report_data.append(
+                create_report_entry(
+                    file_path, file_type.value, old_metadata, new_metadata,
+                    "preview" if self.preview_mode else "pending"
+                )
+            )
+
+            # Update metadata if not in preview mode
+            if not self.preview_mode:
+                updater = self.updaters.get(file_type)
+                if not updater:
+                    logger.error(f"No updater found for {file_type}")
+                    return False
+
+                logger.info("Updating metadata...")
+                success = updater.update_metadata(file_path, new_metadata, backup=True)
+
+                if success:
+                    logger.info("✓ Metadata updated successfully!")
+                    self.report_data[-1]['status'] = 'success'
+
+                    # Verify metadata
+                    if updater.verify_metadata(file_path, new_metadata):
+                        logger.info("✓ Metadata verified!")
+                    else:
+                        logger.warning("⚠ Metadata verification failed")
+                else:
+                    logger.error("✗ Failed to update metadata")
+                    self.report_data[-1]['status'] = 'failed'
+                    return False
+            else:
+                logger.info("[PREVIEW MODE] Changes not applied")
+
+            return True
+
+        except Exception as e:
+            logger.error(f"Error processing {file_path}: {e}", exc_info=True)
+            return False
+
+    def process_directory(self, directory: str, recursive: bool = False) -> Dict[str, int]:
+        """
+        Process all supported files in a directory.
+
+        Args:
+            directory: Path to directory
+            recursive: Process subdirectories
+
+        Returns:
+            Dictionary with processing statistics
+        """
+        dir_path = Path(directory)
+
+        if not dir_path.exists() or not dir_path.is_dir():
+            logger.error(f"Invalid directory: {directory}")
+            return {}
+
+        # Find all files
+        pattern = '**/*' if recursive else '*'
+        all_files = list(dir_path.glob(pattern))
+
+        # Filter supported files
+        supported_files = [
+            f for f in all_files
+            if f.is_file() and FileDetector.is_supported(str(f))
+        ]
+
+        logger.info(f"Found {len(supported_files)} supported files")
+
+        # Process files with progress bar
+        stats = {'success': 0, 'failed': 0, 'total': len(supported_files)}
+
+        for file_path in tqdm(supported_files, desc="Processing files"):
+            if self.process_file(str(file_path)):
+                stats['success'] += 1
+            else:
+                stats['failed'] += 1
+
+        return stats
+
+    def save_report(self, output_path: str = None):
+        """Save processing report to CSV."""
+        if not self.report_data:
+            logger.info("No report data to save")
+            return
+
+        if not output_path:
+            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+            output_path = Config.REPORTS_DIR / f"metadata_report_{timestamp}.csv"
+
+        output_path = Path(output_path)
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(output_path, 'w', newline='', encoding='utf-8') as f:
+            # report_data is known to be non-empty here (checked at the top of the method)
+            writer = csv.DictWriter(f, fieldnames=self.report_data[0].keys())
+            writer.writeheader()
+            writer.writerows(self.report_data)
+
+        logger.info(f"Report saved to: {output_path}")
+
+def main():
+    """Main CLI entry point."""
+    parser = argparse.ArgumentParser(
+        description='Universal Metadata Automation Tool',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Process single file
+  python -m src.main file.pdf
+
+  # Preview changes without applying
+  python -m src.main --preview file.pdf
+
+  # Process entire directory
+  python -m src.main --directory ./files
+
+  # Process directory recursively
+  python -m src.main --directory ./files --recursive
+
+  # Save report
+  python -m src.main file.pdf --report report.csv
+        """
+    )
+
+    parser.add_argument('input', nargs='?', help='Input file or directory')
+    parser.add_argument('--directory', '-d', help='Process entire directory')
+    parser.add_argument('--recursive', '-r', action='store_true', help='Process subdirectories')
+    parser.add_argument('--preview', '-p', action='store_true', help='Preview mode (no changes)')
+    parser.add_argument('--report', help='Save report to CSV file')
+
+    args = parser.parse_args()
+
+    # Validate input
+    if not args.input and not args.directory:
+        parser.print_help()
+        sys.exit(1)
+
+    # Initialize processor
+    processor = MetadataProcessor(preview_mode=args.preview)
+
+    try:
+        # Process input
+        exit_code = 0
+        if args.directory:
+            stats = processor.process_directory(args.directory, args.recursive)
+            print(f"\n{'='*60}")
+            print("BATCH PROCESSING RESULTS")
+            print(f"{'='*60}")
+            print(f"Total files: {stats.get('total', 0)}")
+            print(f"Successful: {stats.get('success', 0)}")
+            print(f"Failed: {stats.get('failed', 0)}")
+            print(f"{'='*60}\n")
+        elif args.input:
+            success = processor.process_file(args.input)
+            exit_code = 0 if success else 1
+
+        # Save report before exiting, so single-file runs also honour --report
+        if args.report:
+            processor.save_report(args.report)
+        elif processor.report_data:
+            processor.save_report()
+
+        sys.exit(exit_code)
+
+    except KeyboardInterrupt:
+        print("\n\nOperation cancelled by user")
+        sys.exit(1)
+    except Exception as e:
+        logger.error(f"Fatal error: {e}", exc_info=True)
+        sys.exit(1)
+
+if __name__ == '__main__':
+    main()
--- a/src/metadata_analyzer.py
+++ b/src/metadata_analyzer.py
@ -0,0 +1,424 @@
+"""AI-powered metadata analysis using OpenAI GPT with production-ready features."""
+
+import json
+from openai import OpenAI
+from typing import Dict, Optional
+from .config import Config
+from .file_detector import FileType
+from .utils import get_logger, sanitize_metadata_value
+
+# Production-ready imports
+try:
+    import tiktoken
+    TIKTOKEN_AVAILABLE = True
+except ImportError:
+    TIKTOKEN_AVAILABLE = False
+
+try:
+    from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
+    TENACITY_AVAILABLE = True
+except ImportError:
+    TENACITY_AVAILABLE = False
+
+logger = get_logger(__name__)
+
+class MetadataAnalyzer:
+    """Analyze content and generate metadata using OpenAI GPT with production-ready error handling."""
+
+    # Valid OpenAI models (as of January 2026)
+    VALID_MODELS = [
+        # GPT-5 models (released August 2025)
+        'gpt-5', 'gpt-5-mini', 'gpt-5-nano',
+        'gpt-5-mini-2025-08-07', 'gpt-5-nano-2025-08-07',
+        # GPT-4 models
+        'gpt-4o', 'gpt-4o-mini', 'gpt-4o-mini-2024-07-18',
+        'gpt-4-turbo', 'gpt-4', 'gpt-3.5-turbo',
+        # Reasoning models
+        'o1', 'o1-mini', 'o1-preview'
+    ]
+
+    def __init__(self):
+        """Initialize the analyzer with OpenAI client."""
+        if not Config.OPENAI_API_KEY:
+            raise ValueError("OpenAI API key not configured")
+
+        self.client = OpenAI(api_key=Config.OPENAI_API_KEY)
+        self.model = Config.AI_MODEL
+
+        # Validate model name
+        if not self._is_valid_model(self.model):
+            logger.warning(f"⚠️  Model '{self.model}' may not be valid. Valid models: {', '.join(self.VALID_MODELS)}")
+            logger.warning(f"⚠️  Using fallback model: gpt-4o-mini")
+            self.model = 'gpt-4o-mini'
+
+        self.max_tokens = Config.MAX_TOKENS
+        self.temperature = Config.TEMPERATURE
+
+        logger.info(f"Initialized MetadataAnalyzer with model: {self.model}")
+
+        # Initialize tiktoken encoding for proper token counting
+        if TIKTOKEN_AVAILABLE:
+            try:
+                self.encoding = tiktoken.encoding_for_model(self.model)
+            except KeyError:
+                # Fallback for models not in tiktoken registry
+                self.encoding = tiktoken.get_encoding("cl100k_base")
+        else:
+            self.encoding = None
+            logger.warning("tiktoken not available - using character-based truncation")
+
+    def _count_tokens(self, text: str) -> int:
+        """Count tokens using tiktoken (proper tokenization)."""
+        if self.encoding:
+            return len(self.encoding.encode(text))
+        else:
+            # Fallback: rough estimate (1 token ≈ 4 characters)
+            return len(text) // 4
+
+    def _truncate_content(self, content: str, max_tokens: int = 3000) -> str:
+        """Intelligently truncate content to fit token limit."""
+        if not self.encoding:
+            # Character-based fallback
+            max_chars = max_tokens * 4
+            if len(content) <= max_chars:
+                return content
+            return content[:max_chars]
+
+        tokens = self.encoding.encode(content)
+        if len(tokens) <= max_tokens:
+            return content
+
+        # Truncate and decode back
+        truncated_tokens = tokens[:max_tokens]
+        return self.encoding.decode(truncated_tokens)
+
+    def _is_valid_model(self, model: str) -> bool:
+        """Check if model name is valid."""
+        # Exact match
+        if model in self.VALID_MODELS:
+            return True
+        # Check if it starts with a valid prefix (for dated versions)
+        for valid_model in self.VALID_MODELS:
+            if model.startswith(valid_model):
+                return True
+        return False
+
+    def _is_new_model(self) -> bool:
+        """
+        Check if model is a new generation model.
+        New models (GPT-5, GPT-4o, GPT-4-turbo, o1) are sent max_completion_tokens instead of max_tokens.
+        """
+        new_models = ['gpt-5', 'gpt-4o', 'gpt-4-turbo', 'o1']
+        return any(self.model.startswith(prefix) for prefix in new_models)
+
+    def _get_api_params(self) -> dict:
+        """
+        Get the correct API parameters based on model.
+        Newer models (GPT-5, GPT-4o, GPT-4-turbo, o1) use max_completion_tokens; older models use max_tokens.
+        GPT-5 and o1 models accept only the default temperature (1); all other models get the configured temperature.
+        """
+        params = {}
+
+        # Token parameter
+        if self._is_new_model():
+            params['max_completion_tokens'] = self.max_tokens
+            logger.debug(f"Using max_completion_tokens for {self.model}")
+        else:
+            params['max_tokens'] = self.max_tokens
+            logger.debug(f"Using max_tokens for {self.model}")
+
+        # Temperature parameter: GPT-4o and GPT-4-turbo do accept a custom value
+        if not self.model.startswith(('gpt-5', 'o1')):
+            params['temperature'] = self.temperature
+
+        return params
+
+    def _call_openai_api(self, messages: list) -> dict:
+        """
+        Call OpenAI API with automatic retry on failures.
+        Uses tenacity for exponential backoff if available.
+        """
+        # Get the correct API parameters
+        api_params = self._get_api_params()
+
+        if TENACITY_AVAILABLE:
+            # Use retry decorator dynamically
+            retry_decorator = retry(
+                stop=stop_after_attempt(Config.API_MAX_RETRIES),
+                wait=wait_exponential(multiplier=Config.API_RETRY_DELAY, min=2, max=10),
+                retry=retry_if_exception_type((Exception,)),
+                reraise=True
+            )
+
+            @retry_decorator
+            def _api_call():
+                return self.client.chat.completions.create(
+                    model=self.model,
+                    messages=messages,
+                    timeout=Config.API_TIMEOUT,
+                    **api_params
+                )
+
+            return _api_call()
+        else:
+            # Fallback: simple retry without exponential backoff
+            import time
+            last_error = None
+
+            for attempt in range(Config.API_MAX_RETRIES):
+                try:
+                    return self.client.chat.completions.create(
+                        model=self.model,
+                        messages=messages,
+                        timeout=Config.API_TIMEOUT,
+                        **api_params
+                    )
+                except Exception as e:
+                    last_error = e
+                    if attempt < Config.API_MAX_RETRIES - 1:
+                        wait_time = Config.API_RETRY_DELAY * (2 ** attempt)
+                        logger.warning(f"API call failed (attempt {attempt + 1}/{Config.API_MAX_RETRIES}), retrying in {wait_time}s: {e}")
+                        time.sleep(wait_time)
+
+            raise last_error
+
+    def analyze_content(self, content: str, filename: str, file_type: FileType) -> Dict[str, str]:
+        """
+        Analyze content and generate appropriate metadata with production-ready error handling.
+
+        Args:
+            content: Extracted text content
+            filename: Original filename
+            file_type: Type of file
+
+        Returns:
+            Dictionary with metadata (title, subject, keywords, _tokens_used, _confidence)
+        """
+        try:
+            # Truncate content if needed with proper token counting
+            content_tokens = self._count_tokens(content)
+            if content_tokens > Config.MAX_TEXT_LENGTH:
+                content = self._truncate_content(content, Config.MAX_TEXT_LENGTH)
+                logger.info(f"Truncated content from {content_tokens} to {self._count_tokens(content)} tokens")
+
+            # Generate prompt based on file type
+            prompt = self._create_prompt(content, filename, file_type)
+
+            # Count total tokens before API call
+            prompt_tokens = self._count_tokens(prompt)
+            logger.info(f"API call for {filename}: {prompt_tokens} prompt tokens")
+
+            # Call API with retry logic
+            response = self._call_openai_api([
+                {"role": "system", "content": "You are a metadata expert who generates professional, accurate metadata for documents in English."},
+                {"role": "user", "content": prompt}
+            ])
+
+            # Parse response with detailed logging
+            logger.info(f"API Response for {filename}:")
+            logger.info(f"  - Model used: {response.model}")
+            logger.info(f"  - Finish reason: {response.choices[0].finish_reason}")
+            logger.info(f"  - Tokens: prompt={response.usage.prompt_tokens}, completion={response.usage.completion_tokens}, total={response.usage.total_tokens}")
+
+            metadata_text = response.choices[0].message.content
+            logger.info(f"  - Content length: {len(metadata_text) if metadata_text else 0} chars")
+            logger.info(f"  - Content preview: {metadata_text[:200] if metadata_text else '(empty)'}")
+
+            # Check if content is None or empty
+            if not metadata_text or len(metadata_text.strip()) == 0:
+                logger.error(f"❌ API returned empty content for {filename}!")
+                logger.error(f"   This usually means:")
+                logger.error(f"   1. Invalid model name: {self.model}")
+                logger.error(f"   2. Model doesn't support this request type")
+                logger.error(f"   3. Content was filtered/refused")
+                logger.error(f"   Using fallback metadata instead.")
+                return self._generate_fallback_metadata(filename, file_type)
+
+            metadata = self._parse_metadata_response(metadata_text)
+
+            # Sanitize metadata values
+            metadata = {
+                key: sanitize_metadata_value(value)
+                for key, value in metadata.items()
+            }
+
+            # Add metadata about the generation
+            metadata['_tokens_used'] = response.usage.total_tokens
+            metadata['_confidence'] = 0.9  # Placeholder constant; not derived from the response
+
+            logger.info(f"Generated metadata for {filename} (tokens used: {metadata['_tokens_used']})")
+            return metadata
+
+        except Exception as e:
+            logger.error(f"Error analyzing content for {filename}: {e}")
+            # Return fallback metadata with error info
+            fallback = self._generate_fallback_metadata(filename, file_type)
+            fallback['_ai_error'] = str(e)
+            fallback['_tokens_used'] = 0
+            return fallback
+
+    def _create_prompt(self, content: str, filename: str, file_type: FileType) -> str:
+        """Create AI prompt based on file type."""
+        file_type_descriptions = {
+            FileType.PDF: "PDF document",
+            FileType.IMAGE: "image file",
+            FileType.OFFICE_DOC: "Word document",
+            FileType.OFFICE_SHEET: "Excel spreadsheet",
+            FileType.OFFICE_PRESENTATION: "PowerPoint presentation",
+            FileType.VIDEO: "video file"
+        }
+
+        file_desc = file_type_descriptions.get(file_type, "file")
+
+        prompt = f"""Analyze the following {file_desc} content and generate professional metadata in English.
+
+Filename: {filename}
+Content: {content}
+
+Generate metadata with these fields:
+1. Title: A concise, professional title (50-100 characters) that clearly describes the document/content
+2. Subject: A brief description (1-2 sentences) of the document's purpose and content
+3. Keywords: 5-10 relevant keywords separated by commas (include product names, categories, topics)
+
+Rules:
+- All text MUST be in English
+- Title should identify the main product/service and document type (e.g., "guide", "brochure", "manual")
+- Subject should explain what the document is about and its purpose
+- Keywords should be searchable terms relevant to the content
+- Be professional and concise
+- Return ONLY a JSON object with fields: title, subject, keywords
+
+Example output format:
+{{
+  "title": "3M Filtek Universal Restorative - Shade Selection Guide",
+  "subject": "Shade selection guide for 3M Filtek Universal Restorative dental material",
+  "keywords": "Filtek, Universal Restorative, shade selection, dental, restorative material, 3M, dentistry, composite"
+}}
+
+Return only the JSON object, no additional text."""
+
+        return prompt
+
+    def _parse_metadata_response(self, response_text: str) -> Dict[str, str]:
+        """Parse AI response into metadata dictionary."""
+        try:
+            # Try to parse as JSON first
+            response_text = response_text.strip()
+            logger.info(f"Parsing response (length={len(response_text)}): {response_text[:200]}")
+
+            # Remove markdown code blocks if present
+            if response_text.startswith('```'):
+                lines = response_text.split('\n')
+                # Find first and last code block markers
+                start_idx = 0
+                end_idx = len(lines)
+                for i, line in enumerate(lines):
+                    if line.startswith('```'):
+                        if start_idx == 0:
+                            start_idx = i + 1
+                        else:
+                            end_idx = i
+                            break
+                response_text = '\n'.join(lines[start_idx:end_idx])
+
+            # Try to find JSON object in text
+            # Look for { ... } pattern
+            start = response_text.find('{')
+            end = response_text.rfind('}')
+            if start != -1 and end != -1:
+                json_str = response_text[start:end+1]
+                metadata = json.loads(json_str)
+            else:
+                metadata = json.loads(response_text)
+
+            # Ensure all required fields are present
+            required_fields = ['title', 'subject', 'keywords']
+            for field in required_fields:
+                if field not in metadata:
+                    metadata[field] = ""
+
+            # Validate that we got actual content
+            if not metadata.get('title') or len(metadata.get('title', '').strip()) < 3:
+                logger.warning("JSON parsed but title is empty or too short, using text parsing")
+                return self._parse_metadata_text(response_text)
+
+            return metadata
+
+        except (json.JSONDecodeError, ValueError, KeyError) as e:
+            logger.warning(f"Failed to parse JSON response ({str(e)}), using text parsing")
+            return self._parse_metadata_text(response_text)
+
+    def _parse_metadata_text(self, text: str) -> Dict[str, str]:
+        """Parse metadata from plain text response."""
+        metadata = {
+            'title': '',
+            'subject': '',
+            'keywords': ''
+        }
+
+        # Improved text parsing
+        lines = text.split('\n')
+
+        for line in lines:
+            line = line.strip()
+            if not line or line.startswith('#') or line.startswith('//'):
+                continue
+
+            # Remove surrounding quotes
+            line_clean = line.strip('"\'')
+
+            # Look for "field: value" pairs (field names are matched case-insensitively)
+            if ':' in line_clean:
+                parts = line_clean.split(':', 1)
+                key = parts[0].strip().lower()
+                value = parts[1].strip().strip('",\'')
+
+                if 'title' in key and not metadata['title']:
+                    metadata['title'] = value
+                elif 'subject' in key and not metadata['subject']:
+                    metadata['subject'] = value
+                elif 'keyword' in key and not metadata['keywords']:
+                    metadata['keywords'] = value
+
+        # If still empty, try to extract from unstructured text
+        if not metadata['title']:
+            # Look for first substantial line as title
+            for line in lines:
+                line = line.strip().strip('"\'')
+                if len(line) > 10 and not line.lower().startswith(('title', 'subject', 'keyword')):
+                    metadata['title'] = line[:200]  # Limit length
+                    break
+
+        logger.info(f"Text parsing result: title='{metadata['title'][:50]}...', subject='{metadata['subject'][:50]}...'")
+        return metadata
+
+    def _generate_fallback_metadata(self, filename: str, file_type: FileType) -> Dict[str, str]:
+        """Generate basic metadata based on filename when AI fails."""
+        from pathlib import Path
+        # Drop the extension, turn separators into spaces, and collapse repeated whitespace
+        # so names like "a_-_b" do not produce empty keywords
+        words = Path(filename).stem.replace('_', ' ').replace('-', ' ').split()
+        clean_name = ' '.join(words)
+
+        return {
+            'title': clean_name,
+            'subject': f"{clean_name} - {FileType(file_type).value}",
+            'keywords': ', '.join(words)
+        }
+
+    def generate_metadata_for_pdf(self, text: str) -> Dict[str, str]:
+        """Specialized metadata generation for PDF documents."""
+        # Wrapper for PDF-specific logic if needed
+        return self.analyze_content(text, "document.pdf", FileType.PDF)
+
+    def generate_metadata_for_image(self, text: str) -> Dict[str, str]:
+        """Specialized metadata generation for images."""
+        return self.analyze_content(text, "image.jpg", FileType.IMAGE)
+
+    def generate_metadata_for_office(self, text: str) -> Dict[str, str]:
+        """Specialized metadata generation for Office documents."""
+        return self.analyze_content(text, "document.docx", FileType.OFFICE_DOC)
+
+    def generate_metadata_for_video(self, metadata: Dict[str, str]) -> Dict[str, str]:
+        """Specialized metadata generation for videos."""
+        # For videos, we might use existing metadata as input
+        text = f"Video title: {metadata.get('title', 'N/A')}"
+        return self.analyze_content(text, "video.mp4", FileType.VIDEO)
--- a/src/metadata_importer.py
+++ b/src/metadata_importer.py
@ -0,0 +1,427 @@
+"""Metadata importer for external files (CSV, Excel, JSON)."""
+
+import pandas as pd
+import json
+from pathlib import Path
+from typing import Dict, Optional, List, Tuple
+from .utils import get_logger
+from .field_mapper import FieldMapper
+
+logger = get_logger(__name__)
+
+
+class MetadataImporter:
+    """Import metadata from various file formats (CSV, Excel, JSON)."""
+
+    def import_from_csv(self, csv_path: str) -> Dict[str, Dict]:
+        """
+        Import metadata from CSV file.
+        Expected columns: filename, title, subject/description, keywords
+
+        Args:
+            csv_path: Path to CSV file
+
+        Returns:
+            Dictionary mapping filename stems to metadata dicts
+        """
+        try:
+            df = pd.read_csv(csv_path, encoding='utf-8')
+            logger.info(f"Loaded CSV with {len(df)} rows from {csv_path}")
+            return self._parse_dataframe(df)
+
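+        except UnicodeDecodeError:
+            # Try alternative encodings. cp1252 goes first: latin1 (an alias of
+            # iso-8859-1) accepts every byte sequence, so nothing after it is ever tried.
+            for encoding in ['cp1252', 'latin1']:
+                try:
+                    df = pd.read_csv(csv_path, encoding=encoding)
+                except UnicodeDecodeError:
+                    continue
+                # Parse outside the try block so that parsing errors (e.g. a missing
+                # filename column) are not misreported as encoding failures
+                logger.info(f"Loaded CSV with {len(df)} rows using {encoding} encoding")
+                return self._parse_dataframe(df)
+
+            raise ValueError(f"Could not read CSV file {csv_path} with any supported encoding")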
+
+        except Exception as e:
+            logger.error(f"Error importing from CSV: {e}")
+            raise
+
+    def import_from_excel(self, excel_path: str, sheet_name: Optional[str] = None) -> Dict[str, Dict]:
+        """
+        Import metadata from Excel file.
+
+        Args:
+            excel_path: Path to Excel file (.xlsx, .xls)
+            sheet_name: Name of sheet to read (None = first sheet)
+
+        Returns:
+            Dictionary mapping filename stems to metadata dicts
+        """
+        try:
+            # Read Excel file
+            if sheet_name:
+                df = pd.read_excel(excel_path, sheet_name=sheet_name)
+                logger.info(f"Loaded Excel sheet '{sheet_name}' with {len(df)} rows")
+            else:
+                df = pd.read_excel(excel_path)
+                logger.info(f"Loaded Excel with {len(df)} rows from first sheet")
+
+            return self._parse_dataframe(df)
+
+        except Exception as e:
+            logger.error(f"Error importing from Excel: {e}")
+            raise
+
+    def import_from_json(self, json_path: str) -> Dict[str, Dict]:
+        """
+        Import metadata from JSON file.
+
+        Expected format:
+        {
+            "filename.pdf": {"title": "...", "subject": "...", "keywords": "..."},
+            "image.jpg": {"title": "...", "subject": "...", "keywords": "..."}
+        }
+
+        Or array format:
+        [
+            {"filename": "file.pdf", "title": "...", "subject": "...", "keywords": "..."},
+            {"filename": "image.jpg", "title": "...", "subject": "...", "keywords": "..."}
+        ]
+
+        Args:
+            json_path: Path to JSON file
+
+        Returns:
+            Dictionary mapping filename stems to metadata dicts
+        """
+        try:
+            with open(json_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+
+            metadata_map = {}
+
+            if isinstance(data, dict):
+                # Object format: {"filename": {metadata}}
+                for filename, metadata in data.items():
+                    if not isinstance(metadata, dict):
+                        logger.warning(f"Skipping non-object metadata for {filename}")
+                        continue
+                    filename_stem = Path(filename).stem.lower()
+                    metadata_map[filename_stem] = self._normalize_metadata(metadata)
+
+            elif isinstance(data, list):
+                # Array format: [{filename, metadata}]
+                for item in data:
+                    if not isinstance(item, dict):
+                        continue
+
+                    # Find filename field
+                    filename = None
+                    for key in ['filename', 'file', 'name', 'file_name']:
+                        if key in item:
+                            filename = item[key]
+                            break
+
+                    if not filename:
+                        logger.warning(f"Skipping item without filename: {item}")
+                        continue
+
+                    filename_stem = Path(filename).stem.lower()
+                    metadata_map[filename_stem] = self._normalize_metadata(item)
+
+            else:
+                raise ValueError("JSON must be an object or array")
+
+            logger.info(f"Loaded {len(metadata_map)} metadata records from JSON")
+            return metadata_map
+
+        except Exception as e:
+            logger.error(f"Error importing from JSON: {e}")
+            raise
+
+    def _parse_dataframe(self, df: pd.DataFrame) -> Dict[str, Dict]:
+        """
+        Parse pandas DataFrame into metadata map.
+
+        Args:
+            df: DataFrame with metadata
+
+        Returns:
+            Dictionary mapping filename stems to metadata dicts
+        """
+        metadata_map = {}
+
+        # Detect filename column (try common names)
+        filename_col = self._detect_column(df, ['filename', 'file', 'name', 'file_name', 'path'])
+
+        if not filename_col:
+            raise ValueError("Could not find filename column in data. Tried: filename, file, name, file_name, path")
+
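+        # Detect metadata columns
+        # 'name' may already be claimed as the filename column; don't reuse that column as the title
+        title_col = self._detect_column(
+            df.drop(columns=[filename_col]), ['title', 'heading', 'name', 'document_title']
+        )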
+        subject_col = self._detect_column(df, ['subject', 'description', 'summary', 'desc', 'external_description', 'alt_text'])
+        keywords_col = self._detect_column(df, ['keywords', 'tags', 'categories', 'labels'])
+
+        logger.info(f"Detected columns - filename: {filename_col}, title: {title_col}, subject: {subject_col}, keywords: {keywords_col}")
+
+        # Parse rows
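+        for _, row in df.iterrows():
+            raw_filename = row.get(filename_col)
+            # Check for NaN before str(): str(NaN) is the non-empty string 'nan'
+            if pd.isna(raw_filename):
+                continue
+            filename = str(raw_filename).strip()
+            if not filename:
+                continue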
+
+            filename_stem = Path(filename).stem.lower()
+
+            metadata_map[filename_stem] = {
+                'title': self._get_value(row, title_col),
+                'subject': self._get_value(row, subject_col),
+                'keywords': self._get_value(row, keywords_col)
+            }
+
+        logger.info(f"Parsed {len(metadata_map)} metadata records from DataFrame")
+        return metadata_map
+
+    def _detect_column(self, df: pd.DataFrame, candidates: List[str]) -> Optional[str]:
+        """
+        Detect column name from a list of candidates (case-insensitive).
+
+        Args:
+            df: DataFrame to search
+            candidates: List of possible column names
+
+        Returns:
+            Actual column name if found, None otherwise
+        """
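+        # Create lowercase mapping (str() guards against non-string column labels,
+        # strip() against stray whitespace in headers)
+        col_map = {str(col).strip().lower(): col for col in df.columns}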
+
+        # Try each candidate
+        for candidate in candidates:
+            if candidate.lower() in col_map:
+                return col_map[candidate.lower()]
+
+        return None
+
+    def _get_value(self, row: pd.Series, column: Optional[str]) -> str:
+        """
+        Get value from row, handling None column and NaN values.
+
+        Args:
+            row: DataFrame row
+            column: Column name (can be None)
+
+        Returns:
+            String value or empty string
+        """
+        if column is None:
+            return ''
+
+        value = row.get(column, '')
+
+        if pd.isna(value):
+            return ''
+
+        return str(value).strip()
+
+    def _normalize_metadata(self, metadata: Dict) -> Dict[str, str]:
+        """
+        Normalize metadata dictionary to standard format.
+
+        Args:
+            metadata: Raw metadata dict
+
+        Returns:
+            Normalized metadata with title, subject, keywords keys
+        """
+        normalized = {
+            'title': '',
+            'subject': '',
+            'keywords': ''
+        }
+
+        # Map title
+        for key in ['title', 'heading', 'name', 'document_title']:
+            if key in metadata and metadata[key]:
+                normalized['title'] = str(metadata[key]).strip()
+                break
+
+        # Map subject/description
+        for key in ['subject', 'description', 'summary', 'desc', 'external_description', 'alt_text']:
+            if key in metadata and metadata[key]:
+                normalized['subject'] = str(metadata[key]).strip()
+                break
+
+        # Map keywords
+        for key in ['keywords', 'tags', 'categories', 'labels']:
+            if key in metadata and metadata[key]:
+                value = metadata[key]
+                # Handle arrays
+                if isinstance(value, list):
+                    normalized['keywords'] = ', '.join(str(v) for v in value)
+                else:
+                    normalized['keywords'] = str(value).strip()
+                break
+
+        return normalized
+
+    def get_metadata_for_file(self, metadata_map: Dict[str, Dict], filename: str) -> Optional[Dict[str, str]]:
+        """
+        Get metadata for a specific file from imported map.
+
+        Args:
+            metadata_map: Dictionary returned by import_* methods
+            filename: Filename to look up (with or without extension)
+
+        Returns:
+            Metadata dict if found, None otherwise
+        """
+        filename_stem = Path(filename).stem.lower()
+        return metadata_map.get(filename_stem)
+
+    def validate_import(self, metadata_map: Dict[str, Dict]) -> Dict:
+        """
+        Validate imported metadata and return statistics.
+
+        Args:
+            metadata_map: Dictionary returned by import_* methods
+
+        Returns:
+            Statistics about the import
+        """
+        stats = {
+            'total_records': len(metadata_map),
+            'with_title': 0,
+            'with_subject': 0,
+            'with_keywords': 0,
+            'empty_records': 0
+        }
+
+        for metadata in metadata_map.values():
+            if metadata.get('title'):
+                stats['with_title'] += 1
+            if metadata.get('subject'):
+                stats['with_subject'] += 1
+            if metadata.get('keywords'):
+                stats['with_keywords'] += 1
+
+            if not any([metadata.get('title'), metadata.get('subject'), metadata.get('keywords')]):
+                stats['empty_records'] += 1
+
+        return stats
+
+    def preview_file_structure(self, file_path: str, file_type: str = 'auto') -> Tuple[List[str], List[Dict], Dict]:
+        """
+        Preview file structure and suggest field mappings without importing.
+
+        Args:
+            file_path: Path to file (CSV, Excel, JSON)
+            file_type: File type ('csv', 'excel', 'json', or 'auto')
+
+        Returns:
+            Tuple of (column_names, sample_rows, suggested_mapping)
+        """
+        if file_type == 'auto':
+            ext = Path(file_path).suffix.lower()
+            if ext == '.csv':
+                file_type = 'csv'
+            elif ext in ['.xlsx', '.xls']:
+                file_type = 'excel'
+            elif ext == '.json':
+                file_type = 'json'
+            else:
+                raise ValueError(f"Unsupported file type: {ext}")
+
+        # Load file
+        if file_type == 'csv':
+            df = pd.read_csv(file_path, encoding='utf-8', nrows=10)
+        elif file_type == 'excel':
+            df = pd.read_excel(file_path, nrows=10)
+        elif file_type == 'json':
+            with open(file_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+                if isinstance(data, list) and len(data) > 0:
+                    df = pd.DataFrame(data[:10])
+                elif isinstance(data, dict):
+                    # Convert dict to list
+                    items = [{'filename': k, **v} for k, v in list(data.items())[:10]]
+                    df = pd.DataFrame(items)
+                else:
+                    raise ValueError("JSON format not supported for preview")
+
+        # Get column names
+        columns = df.columns.tolist()
+
+        # Get sample rows
+        sample_rows = df.head(5).to_dict('records')
+
+        # Suggest field mapping
+        mapper = FieldMapper()
+        suggestions = mapper.suggest_mapping(columns)
+
+        return (columns, sample_rows, suggestions)
+
+    def import_with_mapping(self, file_path: str, mapping: Dict[str, str], file_type: str = 'auto') -> Dict[str, Dict]:
+        """
+        Import file with custom field mapping.
+
+        Args:
+            file_path: Path to file
+            mapping: Field mapping {source_field: target_field}
+            file_type: File type ('csv', 'excel', 'json', or 'auto')
+
+        Returns:
+            Dictionary mapping filename stems to metadata dicts
+        """
+        # Load file
+        if file_type == 'auto':
+            ext = Path(file_path).suffix.lower()
+            if ext == '.csv':
+                file_type = 'csv'
+            elif ext in ['.xlsx', '.xls']:
+                file_type = 'excel'
+            elif ext == '.json':
+                file_type = 'json'
+            else:
+                raise ValueError(f"Unsupported file type: {ext}")
+
+        if file_type == 'csv':
+            df = pd.read_csv(file_path, encoding='utf-8')
+        elif file_type == 'excel':
+            df = pd.read_excel(file_path)
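+        elif file_type == 'json':
+            with open(file_path, 'r', encoding='utf-8') as f:
+                data = json.load(f)
+            if isinstance(data, list):
+                df = pd.DataFrame(data)
+            elif isinstance(data, dict):
+                items = [{'filename': k, **v} for k, v in data.items()]
+                df = pd.DataFrame(items)
+            else:
+                # Without this branch, df would be unbound below
+                raise ValueError("JSON must be a list of records or a dict keyed by filename")
+        else:
+            raise ValueError(f"Unsupported file type: {file_type}")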
+
+        # Apply field mapper
+        mapper = FieldMapper()
+        metadata_map = {}
+
+        # Find filename column
+        filename_col = None
+        for col in df.columns:
+            if str(col).lower() in ['filename', 'file', 'name', 'file_name']:
+                filename_col = col
+                break
+
+        if not filename_col:
+            raise ValueError("Could not find filename column")
+
+        # Process each row
+        for _, row in df.iterrows():
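+            # Check for NaN before str(): str(NaN) is 'nan', which would pass the empty check
+            raw_filename = row.get(filename_col, '')
+            if pd.isna(raw_filename):
+                continue
+            filename = str(raw_filename).strip()
+            if not filename:
+                continue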
+
+            filename_stem = Path(filename).stem.lower()
+
+            # Apply mapping to transform row data
+            row_dict = row.to_dict()
+            metadata = mapper.apply_mapping(row_dict, mapping)
+
+            metadata_map[filename_stem] = {
+                'title': str(metadata.get('title', '')).strip(),
+                'subject': str(metadata.get('subject', '')).strip(),
+                'keywords': str(metadata.get('keywords', '')).strip()
+            }
+
+        logger.info(f"Imported {len(metadata_map)} records with custom mapping")
+        return metadata_map
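Note on the stem-keyed lookup used by `get_metadata_for_file` and `import_with_mapping`: the sketch below is a minimal, standard-library illustration of how keys are derived, not the importer itself. `stem_key` and `lookup` are hypothetical helper names introduced only for this example. It shows that matching is case-insensitive, that only the final extension is dropped, and (by implication) that two files differing only in extension share one record.

```python
from pathlib import Path
from typing import Dict, Optional


def stem_key(filename: str) -> str:
    """Hypothetical helper: lowercased filename without its final extension."""
    return Path(filename).stem.lower()


def lookup(metadata_map: Dict[str, Dict[str, str]], filename: str) -> Optional[Dict[str, str]]:
    """Mirror of the lookup rule: normalise the name, then do a plain dict get."""
    return metadata_map.get(stem_key(filename))


# One imported record, keyed the same way the importer keys its rows
metadata_map = {
    stem_key("Hero_Shot.JPG"): {"title": "Hero shot", "subject": "", "keywords": ""},
}

# Same stem, different case and extension: still resolves to the record
match = lookup(metadata_map, "hero_shot.png")

# Only the last suffix is removed, so multi-part extensions keep a dot
multi = stem_key("archive.tar.gz")

# Unknown files yield None rather than raising
missing = lookup(metadata_map, "missing.pdf")
```

Because the key ignores the extension, an import sheet that lists both `photo.jpg` and `photo.png` would keep only the later row under this scheme.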
--- a/src/template_manager.py
+++ b/src/template_manager.py
@ -0,0 +1,410 @@
+"""Metadata template manager with variable substitution."""
+
+import json
+from pathlib import Path
+from typing import Dict, List, Optional
+from datetime import datetime
+from .utils import get_logger
+
+logger = get_logger(__name__)
+
+
+class TemplateManager:
+    """Manage metadata templates with variable substitution."""
+
+    # Available variables for substitution
+    AVAILABLE_VARIABLES = {
+        '{filename}': 'Original filename without extension',
+        '{date}': 'Current date (YYYY-MM-DD)',
+        '{datetime}': 'Current date and time',
+        '{user}': 'Current username',
+        '{year}': 'Current year',
+        '{month}': 'Current month',
+        '{day}': 'Current day'
+    }
+
+    def __init__(self, templates_path: Optional[str] = None):
+        """
+        Initialize template manager.
+
+        Args:
+            templates_path: Path to JSON file for storing templates
+        """
+        self.templates_path = templates_path or 'metadata_templates.json'
+
+    def create_template(
+        self,
+        name: str,
+        title_template: str,
+        subject_template: str,
+        keywords_template: str,
+        description: str = ''
+    ) -> Dict:
+        """
+        Create a new metadata template.
+
+        Args:
+            name: Template name
+            title_template: Title template with variables (e.g., "{filename} - Product Guide")
+            subject_template: Subject template with variables
+            keywords_template: Keywords template with variables
+            description: Optional description of template usage
+
+        Returns:
+            Template dictionary
+        """
+        template = {
+            'name': name,
+            'description': description,
+            'title': title_template,
+            'subject': subject_template,
+            'keywords': keywords_template,
+            'created_at': self._get_timestamp(),
+            'updated_at': self._get_timestamp()
+        }
+
+        # Validate template
+        validation = self.validate_template(template)
+        if validation['invalid']:
+            logger.warning(f"Template '{name}' has invalid variables: {validation['invalid']}")
+
+        return template
+
+    def save_template(self, template: Dict) -> bool:
+        """
+        Save template to storage.
+
+        Args:
+            template: Template dictionary
+
+        Returns:
+            True if successful
+        """
+        try:
+            templates = self._load_templates()
+            template['updated_at'] = self._get_timestamp()
+            templates[template['name']] = template
+
+            with open(self.templates_path, 'w', encoding='utf-8') as f:
+                json.dump(templates, f, indent=2, ensure_ascii=False)
+
+            logger.info(f"Saved template: {template['name']}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to save template '{template.get('name', '<unnamed>')}': {e}")
+            return False
+
+    def load_template(self, name: str) -> Optional[Dict]:
+        """
+        Load template by name.
+
+        Args:
+            name: Template name
+
+        Returns:
+            Template dictionary or None if not found
+        """
+        templates = self._load_templates()
+        template = templates.get(name)
+
+        if template:
+            logger.info(f"Loaded template: {name}")
+        else:
+            logger.warning(f"Template not found: {name}")
+
+        return template
+
+    def list_templates(self) -> List[Dict]:
+        """
+        List all available templates.
+
+        Returns:
+            List of template summaries
+        """
+        templates = self._load_templates()
+
+        return [
+            {
+                'name': name,
+                'description': data.get('description', ''),
+                'created_at': data.get('created_at', ''),
+                'updated_at': data.get('updated_at', ''),
+                'variables_used': self._extract_variables(data)
+            }
+            for name, data in templates.items()
+        ]
+
+    def delete_template(self, name: str) -> bool:
+        """
+        Delete a template.
+
+        Args:
+            name: Template name
+
+        Returns:
+            True if deleted, False if not found
+        """
+        templates = self._load_templates()
+
+        if name in templates:
+            del templates[name]
+
+            try:
+                with open(self.templates_path, 'w', encoding='utf-8') as f:
+                    json.dump(templates, f, indent=2, ensure_ascii=False)
+
+                logger.info(f"Deleted template: {name}")
+                return True
+            except Exception as e:
+                logger.error(f"Failed to delete template '{name}': {e}")
+                return False
+
+        logger.warning(f"Template not found: {name}")
+        return False
+
+    def apply_template(
+        self,
+        template: Dict,
+        filename: str,
+        user: str = 'Unknown',
+        custom_vars: Optional[Dict[str, str]] = None
+    ) -> Dict[str, str]:
+        """
+        Apply template to generate metadata for a file.
+
+        Args:
+            template: Template dictionary
+            filename: Filename to process
+            user: Username for {user} variable
+            custom_vars: Additional custom variables (e.g., {'product_line': 'Dental'})
+
+        Returns:
+            Dictionary with title, subject, keywords
+        """
+        # Build variable substitution map
+        variables = self._build_variable_map(filename, user, custom_vars)
+
+        # Apply substitutions
+        metadata = {
+            'title': self._substitute_variables(template.get('title', ''), variables),
+            'subject': self._substitute_variables(template.get('subject', ''), variables),
+            'keywords': self._substitute_variables(template.get('keywords', ''), variables)
+        }
+
+        logger.info(f"Applied template '{template.get('name', '<unnamed>')}' to {filename}")
+        return metadata
+
+    def validate_template(self, template: Dict) -> Dict[str, List[str]]:
+        """
+        Validate template for correct variable usage.
+
+        Args:
+            template: Template dictionary
+
+        Returns:
+            Dictionary with 'valid' and 'invalid' variable lists
+        """
+        result = {
+            'valid': [],
+            'invalid': []
+        }
+
+        # Extract all variables from template
+        all_text = (
+            template.get('title', '') +
+            template.get('subject', '') +
+            template.get('keywords', '')
+        )
+
+        # Find all {variable} patterns
+        import re
+        variables = re.findall(r'\{[^}]+\}', all_text)
+
+        for var in variables:
+            if var in self.AVAILABLE_VARIABLES:
+                if var not in result['valid']:
+                    result['valid'].append(var)
+            else:
+                if var not in result['invalid']:
+                    result['invalid'].append(var)
+
+        return result
+
+    def _load_templates(self) -> Dict:
+        """Load all templates from file."""
+        if Path(self.templates_path).exists():
+            try:
+                with open(self.templates_path, 'r', encoding='utf-8') as f:
+                    return json.load(f)
+            except Exception as e:
+                logger.error(f"Failed to load templates: {e}")
+                return {}
+        return {}
+
+    def _get_timestamp(self) -> str:
+        """Get current timestamp as ISO format string."""
+        return datetime.now().isoformat()
+
+    def _build_variable_map(
+        self,
+        filename: str,
+        user: str,
+        custom_vars: Optional[Dict[str, str]]
+    ) -> Dict[str, str]:
+        """
+        Build variable substitution map.
+
+        Args:
+            filename: Filename (with or without extension)
+            user: Username
+            custom_vars: Custom variables
+
+        Returns:
+            Dictionary mapping variable names to values
+        """
+        # Get filename without extension
+        filename_stem = Path(filename).stem
+
+        # Current date/time
+        now = datetime.now()
+
+        variables = {
+            '{filename}': filename_stem,
+            '{date}': now.strftime('%Y-%m-%d'),
+            '{datetime}': now.strftime('%Y-%m-%d %H:%M:%S'),
+            '{user}': user,
+            '{year}': str(now.year),
+            '{month}': now.strftime('%m'),
+            '{day}': now.strftime('%d')
+        }
+
+        # Add custom variables
+        if custom_vars:
+            for key, value in custom_vars.items():
+                # Ensure custom variables are wrapped in {}
+                var_key = f'{{{key}}}' if not key.startswith('{') else key
+                variables[var_key] = value
+
+        return variables
+
+    def _substitute_variables(self, template_text: str, variables: Dict[str, str]) -> str:
+        """
+        Substitute variables in template text.
+
+        Args:
+            template_text: Text with {variable} placeholders
+            variables: Variable substitution map
+
+        Returns:
+            Text with variables replaced
+        """
+        result = template_text
+
+        for var, value in variables.items():
+            result = result.replace(var, str(value))
+
+        return result
+
+    def _extract_variables(self, template: Dict) -> List[str]:
+        """
+        Extract all variables used in a template.
+
+        Args:
+            template: Template dictionary
+
+        Returns:
+            List of variable names (e.g., ['{filename}', '{date}'])
+        """
+        import re
+        all_text = (
+            template.get('title', '') +
+            template.get('subject', '') +
+            template.get('keywords', '')
+        )
+
+        variables = re.findall(r'\{[^}]+\}', all_text)
+        return sorted(set(variables))
+
+    def get_available_variables(self) -> Dict[str, str]:
+        """
+        Get list of available variables with descriptions.
+
+        Returns:
+            Dictionary mapping variable names to descriptions
+        """
+        return self.AVAILABLE_VARIABLES.copy()
+
+    def preview_template(
+        self,
+        template: Dict,
+        sample_filename: str = 'example.pdf',
+        user: str = 'User',
+        custom_vars: Optional[Dict[str, str]] = None
+    ) -> Dict[str, str]:
+        """
+        Preview template output with sample data.
+
+        Args:
+            template: Template dictionary
+            sample_filename: Sample filename for preview
+            user: Sample username
+            custom_vars: Sample custom variables
+
+        Returns:
+            Preview metadata
+        """
+        return self.apply_template(template, sample_filename, user, custom_vars)
+
+    def export_template(self, name: str, export_path: str) -> bool:
+        """
+        Export single template to JSON file.
+
+        Args:
+            name: Template name
+            export_path: Path to save template
+
+        Returns:
+            True if successful
+        """
+        template = self.load_template(name)
+        if not template:
+            return False
+
+        try:
+            with open(export_path, 'w', encoding='utf-8') as f:
+                json.dump(template, f, indent=2, ensure_ascii=False)
+
+            logger.info(f"Exported template '{name}' to {export_path}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to export template '{name}': {e}")
+            return False
+
+    def import_template(self, import_path: str) -> Optional[Dict]:
+        """
+        Import template from JSON file.
+
+        Args:
+            import_path: Path to template JSON file
+
+        Returns:
+            Imported template dictionary or None
+        """
+        try:
+            with open(import_path, 'r', encoding='utf-8') as f:
+                template = json.load(f)
+
+            # Validate required fields
+            required_fields = ['name', 'title', 'subject', 'keywords']
+            if not all(field in template for field in required_fields):
+                logger.error("Invalid template file: missing required fields")
+                return None
+
+            logger.info(f"Imported template from {import_path}")
+            return template
+
+        except Exception as e:
+            logger.error(f"Failed to import template: {e}")
+            return None
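Note on variable substitution in `TemplateManager`: the following is a minimal sketch of the replace-based approach, assuming the same `{name}` placeholder convention. `build_variables` and `substitute` are hypothetical stand-ins for `_build_variable_map` and `_substitute_variables`, and the `now` parameter is an assumption added here so the output is reproducible. It illustrates two behaviours worth knowing: custom variable names are wrapped in braces automatically, and placeholders with no matching variable are left in the output untouched.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


def build_variables(
    filename: str,
    user: str,
    custom_vars: Optional[Dict[str, str]] = None,
    now: Optional[datetime] = None,  # assumption: injectable clock for repeatable output
) -> Dict[str, str]:
    """Hypothetical stand-in for the variable map: built-ins first, then custom entries."""
    now = now or datetime.now()
    variables = {
        "{filename}": Path(filename).stem,
        "{date}": now.strftime("%Y-%m-%d"),
        "{user}": user,
        "{year}": str(now.year),
    }
    for key, value in (custom_vars or {}).items():
        # Accept both 'product_line' and '{product_line}' as keys
        var_key = key if key.startswith("{") else f"{{{key}}}"
        variables[var_key] = str(value)
    return variables


def substitute(text: str, variables: Dict[str, str]) -> str:
    """Plain string replacement; unknown placeholders are left as they are."""
    for var, value in variables.items():
        text = text.replace(var, value)
    return text


variables = build_variables(
    "brochure.pdf", "alice", {"product_line": "Dental"}, now=datetime(2026, 2, 9)
)
title = substitute("{filename} - {product_line} ({year}) {unknown}", variables)
print(title)  # brochure - Dental (2026) {unknown}
```

Since `validate_template` only recognises the built-in names, a template that relies on custom variables such as `{product_line}` will be reported as having invalid variables even though substitution succeeds when they are supplied.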
--- a/src/updaters/__init__.py
+++ b/src/updaters/__init__.py
@ -0,0 +1 @@
+"""Metadata updaters for different file types."""
--- a/src/updaters/exiftool_updater.py
+++ b/src/updaters/exiftool_updater.py
@ -0,0 +1,223 @@
+"""Unified metadata updater using ExifTool for images, video, and PDF files."""
+
+from typing import Dict
+from pathlib import Path
+import logging
+
+try:
+    from exiftool import ExifToolHelper
+    EXIFTOOL_AVAILABLE = True
+except ImportError:
+    EXIFTOOL_AVAILABLE = False
+
+from ..base_updater import BaseUpdater
+from ..utils import get_logger, create_backup
+
+logger = get_logger(__name__)
+
+
+class ExifToolUpdater(BaseUpdater):
+    """
+    Update metadata using ExifTool.
+
+    Supports images (JPEG, PNG, GIF, TIFF, HEIC, RAW),
+    videos (MP4, MOV, AVI, MKV), and PDF files.
+
+    Provides a unified API for metadata updates across all supported formats.
+    """
+
+    def __init__(self):
+        """Initialize ExifTool updater."""
+        if not EXIFTOOL_AVAILABLE:
+            raise ImportError(
+                "PyExifTool not installed. Install with: pip install 'PyExifTool>=0.5.6'\n"
+                "Also ensure ExifTool is installed on your system."
+            )
+
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update file metadata using ExifTool.
+
+        Writes title, subject, and keywords to appropriate metadata fields
+        based on file type (images use EXIF/IPTC/XMP, PDFs use PDF fields, etc.).
+
+        Args:
+            file_path: Path to the file
+            metadata: Dictionary with 'title', 'subject', 'keywords' keys
+            backup: Whether to create backup before updating (default: True)
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            # Validate metadata
+            if not self.validate_metadata(metadata):
+                logger.error(f"Invalid metadata for {file_path}")
+                return False
+
+            # Create backup if requested
+            if backup:
+                backup_path = create_backup(file_path)
+                if not backup_path:
+                    logger.warning(f"Failed to create backup for {file_path}, proceeding anyway")
+
+            # Build ExifTool tags dict
+            updates = {}
+
+            # Determine file type and set appropriate tags
+            file_ext = Path(file_path).suffix.lower()
+
+            if self._is_image(file_ext):
+                updates = self._build_image_tags(metadata)
+            elif self._is_video(file_ext):
+                updates = self._build_video_tags(metadata)
+            elif self._is_pdf(file_ext):
+                updates = self._build_pdf_tags(metadata)
+            else:
+                logger.warning(f"Unknown file type {file_ext}, trying generic metadata tags")
+                updates = self._build_generic_tags(metadata)
+
+            # Apply updates using ExifTool
+            if not updates:
+                logger.warning(f"No metadata tags to update for {file_path}")
+                return True
+
+            with ExifToolHelper() as et:
+                et.set_tags(
+                    [file_path],
+                    tags=updates,
+                    params=["-overwrite_original", "-P"]  # Preserve file modification date
+                )
+
+            logger.info(f"Successfully updated metadata for {Path(file_path).name}")
+
+            # Verify the update
+            if self.verify_update(file_path, metadata):
+                logger.info(f"Metadata verification passed for {Path(file_path).name}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {Path(file_path).name}, but update succeeded")
+                return True  # Still return True as update itself worked
+
+        except Exception as e:
+            logger.error(f"Failed to update metadata for {file_path}: {e}")
+            return False
+
+    def verify_update(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify that metadata was successfully written to the file.
+
+        Args:
+            file_path: Path to the file
+            expected_metadata: Metadata that was supposed to be written
+
+        Returns:
+            True if verification passes, False otherwise
+        """
+        try:
+            from .exiftool_extractor import ExifToolExtractor
+            extractor = ExifToolExtractor()
+            actual_metadata = extractor.read_metadata(file_path)
+
+            # Check each field (allow partial matches for verification)
+            for key in ['title', 'subject', 'keywords']:
+                expected = expected_metadata.get(key, '').strip()
+                actual = actual_metadata.get(key, '').strip()
+
+                if expected and expected not in actual:
+                    logger.warning(f"Verification mismatch for {key}: expected '{expected}', got '{actual}'")
+                    return False
+
+            return True
+
+        except Exception as e:
+            logger.error(f"Verification failed for {file_path}: {e}")
+            return False
+
+    def _is_image(self, ext: str) -> bool:
+        """Check if file extension is an image format."""
+        image_exts = {'.jpg', '.jpeg', '.png', '.gif', '.tif', '.tiff', '.bmp', '.webp', '.heic', '.heif'}
+        return ext in image_exts
+
+    def _is_video(self, ext: str) -> bool:
+        """Check if file extension is a video format."""
+        video_exts = {'.mp4', '.mov', '.avi', '.mkv', '.m4v', '.wmv', '.flv', '.webm'}
+        return ext in video_exts
+
+    def _is_pdf(self, ext: str) -> bool:
+        """Check if file extension is PDF."""
+        return ext == '.pdf'
+
+    def _build_image_tags(self, metadata: Dict[str, str]) -> Dict[str, str]:
+        """
+        Build ExifTool tags for image files.
+
+        Uses EXIF, IPTC, and XMP tags for maximum compatibility.
+        """
+        tags = {}
+
+        if metadata.get('title'):
+            tags['EXIF:ImageDescription'] = metadata['title']
+            tags['IPTC:Headline'] = metadata['title']
+            tags['XMP:Title'] = metadata['title']
+
+        if metadata.get('subject'):
+            tags['EXIF:XPSubject'] = metadata['subject']
+            tags['IPTC:Caption-Abstract'] = metadata['subject']
+            tags['XMP:Description'] = metadata['subject']
+
+        if metadata.get('keywords'):
+            tags['EXIF:XPKeywords'] = metadata['keywords']
+            tags['IPTC:Keywords'] = metadata['keywords']
+            tags['XMP:Subject'] = metadata['keywords']
+
+        return tags
+
+    def _build_video_tags(self, metadata: Dict[str, str]) -> Dict[str, str]:
+        """Build ExifTool tags for video files."""
+        tags = {}
+
+        if metadata.get('title'):
+            tags['QuickTime:Title'] = metadata['title']
+            tags['UserData:Title'] = metadata['title']
+
+        if metadata.get('subject'):
+            tags['QuickTime:Description'] = metadata['subject']
+            tags['UserData:Description'] = metadata['subject']
+
+        if metadata.get('keywords'):
+            tags['QuickTime:Keywords'] = metadata['keywords']
+
+        return tags
+
+    def _build_pdf_tags(self, metadata: Dict[str, str]) -> Dict[str, str]:
+        """Build ExifTool tags for PDF files."""
+        tags = {}
+
+        if metadata.get('title'):
+            tags['PDF:Title'] = metadata['title']
+
+        if metadata.get('subject'):
+            tags['PDF:Subject'] = metadata['subject']
+
+        if metadata.get('keywords'):
+            tags['PDF:Keywords'] = metadata['keywords']
+
+        return tags
+
+    def _build_generic_tags(self, metadata: Dict[str, str]) -> Dict[str, str]:
+        """Build generic metadata tags for unknown file types."""
+        tags = {}
+
+        # Try common tags that might work
+        if metadata.get('title'):
+            tags['Title'] = metadata['title']
+
+        if metadata.get('subject'):
+            tags['Description'] = metadata['subject']
+            tags['Subject'] = metadata['subject']
+
+        if metadata.get('keywords'):
+            tags['Keywords'] = metadata['keywords']
+
+        return tags
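Note on the extension-based tag selection in `ExifToolUpdater`: the sketch below restates the dispatch as a lookup table so the mapping from file kind to tag names can be read at a glance. It is a simplified illustration, not the updater: `TAGS_BY_KIND`, `KIND_BY_EXT`, and `build_tags` are hypothetical names, only a subset of extensions is listed, and unknown types return an empty dict here (the class above falls back to generic tags instead). No ExifTool call is made.

```python
from pathlib import Path
from typing import Dict, List

# Tag names taken from the builders above, grouped by file kind
TAGS_BY_KIND: Dict[str, Dict[str, List[str]]] = {
    "image": {
        "title": ["EXIF:ImageDescription", "IPTC:Headline", "XMP:Title"],
        "subject": ["EXIF:XPSubject", "IPTC:Caption-Abstract", "XMP:Description"],
        "keywords": ["EXIF:XPKeywords", "IPTC:Keywords", "XMP:Subject"],
    },
    "video": {
        "title": ["QuickTime:Title", "UserData:Title"],
        "subject": ["QuickTime:Description", "UserData:Description"],
        "keywords": ["QuickTime:Keywords"],
    },
    "pdf": {
        "title": ["PDF:Title"],
        "subject": ["PDF:Subject"],
        "keywords": ["PDF:Keywords"],
    },
}

# Subset of the extensions handled above, for illustration only
KIND_BY_EXT: Dict[str, str] = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp4": "video", ".mov": "video",
    ".pdf": "pdf",
}


def build_tags(file_path: str, metadata: Dict[str, str]) -> Dict[str, str]:
    """Return the tag dict for a file, skipping empty metadata fields."""
    kind = KIND_BY_EXT.get(Path(file_path).suffix.lower())
    if kind is None:
        return {}  # sketch only: the real updater tries generic tags here
    tags: Dict[str, str] = {}
    for field, tag_names in TAGS_BY_KIND[kind].items():
        value = metadata.get(field)
        if value:
            for tag in tag_names:
                tags[tag] = value
    return tags


pdf_tags = build_tags("Report.PDF", {"title": "Q1 report", "subject": "", "keywords": "finance"})
```

Empty fields produce no tags, which matches the builders above: an empty `subject` leaves any existing subject in the file untouched rather than clearing it.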
--- a/src/updaters/image_updater.py
+++ b/src/updaters/image_updater.py
@ -0,0 +1,221 @@
+"""Image metadata updater."""
+
+import piexif
+from PIL import Image
+from PIL.PngImagePlugin import PngInfo
+from typing import Dict
+from pathlib import Path
+
+from ..base_updater import BaseUpdater
+from ..utils import get_logger, create_backup, sanitize_metadata_value
+
+logger = get_logger(__name__)
+
+
+class ImageUpdater(BaseUpdater):
+    """Updater for image file metadata (JPEG and PNG; GIF and BMP are accepted but left unchanged)."""
+
+    SUPPORTED_FORMATS = ['jpg', 'jpeg', 'png', 'gif', 'bmp']
+
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update image metadata using EXIF for JPEG and PIL for PNG.
+
+        Args:
+            file_path: Path to the image file
+            metadata: Dictionary with 'title', 'subject', 'keywords' keys
+            backup: Whether to create backup before updating
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            # Validate metadata
+            if not self.validate_metadata(metadata):
+                logger.error(f"Invalid metadata for {file_path}")
+                return False
+
+            # Check file format
+            file_ext = Path(file_path).suffix.lower().lstrip('.')
+            if file_ext not in self.SUPPORTED_FORMATS:
+                logger.error(f"Unsupported image format: {file_ext}")
+                return False
+
+            # Create backup if requested
+            if backup:
+                backup_path = create_backup(file_path)
+                if not backup_path:
+                    logger.warning(f"Failed to create backup for {file_path}, proceeding anyway")
+
+            # Route to appropriate update method
+            if file_ext in ['jpg', 'jpeg']:
+                success = self._update_jpeg_metadata(file_path, metadata)
+            elif file_ext == 'png':
+                success = self._update_png_metadata(file_path, metadata)
+            else:
+                # For GIF, BMP and other formats - skip metadata update
+                # These formats don't support metadata in the same way
+                logger.warning(f"Metadata update not supported for {file_ext} format")
+                return True  # Return success to not block the workflow
+
+            if success:
+                logger.info(f"Successfully updated metadata for {file_path}")
+            else:
+                logger.error(f"Failed to update metadata for {file_path}")
+
+            return success
+
+        except Exception as e:
+            logger.error(f"Failed to update image metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def _update_jpeg_metadata(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """
+        Update JPEG metadata using EXIF.
+
+        Args:
+            file_path: Path to JPEG file
+            metadata: Metadata dictionary
+
+        Returns:
+            True if successful
+        """
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Read existing EXIF
+            try:
+                exif_dict = piexif.load(file_path)
+            except (piexif.InvalidImageDataError, FileNotFoundError):
+                exif_dict = {"0th": {}, "Exif": {}, "GPS": {}, "1st": {}}
+
+            # Update metadata fields (XP* tags are normally UTF-16LE; UTF-8 is kept to match verification)
+            exif_dict["0th"][piexif.ImageIFD.ImageDescription] = title.encode('utf-8')
+            exif_dict["0th"][piexif.ImageIFD.XPSubject] = subject.encode('utf-8')
+            exif_dict["0th"][piexif.ImageIFD.XPKeywords] = keywords.encode('utf-8')
+
+            # Encode EXIF data
+            exif_bytes = piexif.dump(exif_dict)
+
+            # Insert the new EXIF block in place; unlike re-saving through
+            # PIL, this does not re-encode (and degrade) the JPEG image data
+            piexif.insert(exif_bytes, file_path)
+
+            logger.debug(f"Updated JPEG metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update JPEG metadata: {e}", exc_info=True)
+            return False
+
+    def _update_png_metadata(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """
+        Update PNG metadata using PIL.
+
+        Args:
+            file_path: Path to PNG file
+            metadata: Metadata dictionary
+
+        Returns:
+            True if successful
+        """
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Open image
+            image = Image.open(file_path)
+
+            # Create metadata dictionary
+            pnginfo = PngInfo()
+            pnginfo.add_text("Title", title)
+            pnginfo.add_text("Subject", subject)
+            pnginfo.add_text("Keywords", keywords)
+
+            # Save image with new metadata
+            image.save(file_path, pnginfo=pnginfo)
+
+            logger.debug(f"Updated PNG metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update PNG metadata: {e}", exc_info=True)
+            return False
+
+    def verify_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify that metadata was written correctly to image.
+
+        Args:
+            file_path: Path to the image file
+            expected_metadata: Expected metadata values
+
+        Returns:
+            True if metadata matches expected values, False otherwise
+        """
+        try:
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext in ['jpg', 'jpeg']:
+                return self._verify_jpeg_metadata(file_path, expected_metadata)
+            else:
+                return self._verify_png_metadata(file_path, expected_metadata)
+
+        except Exception as e:
+            logger.error(f"Failed to verify image metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def _verify_jpeg_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """Verify JPEG metadata."""
+        try:
+            exif_dict = piexif.load(file_path)
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
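+            # Check fields. piexif returns BYTE-typed tags (XPSubject, XPKeywords)
+            # as tuples of ints rather than bytes, so normalize before decoding.
+            def _as_text(raw) -> str:
+                raw = (raw,) if isinstance(raw, int) else raw
+                return bytes(raw).decode('utf-8', errors='ignore')
+
+            zeroth = exif_dict["0th"]
+            actual_title = _as_text(zeroth.get(piexif.ImageIFD.ImageDescription, b""))
+            actual_subject = _as_text(zeroth.get(piexif.ImageIFD.XPSubject, b""))
+            actual_keywords = _as_text(zeroth.get(piexif.ImageIFD.XPKeywords, b""))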
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                return False
+
+        except Exception as e:
+            logger.debug(f"JPEG metadata verification failed: {e}")
+            return False
+
+    def _verify_png_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """Verify PNG metadata."""
+        try:
+            image = Image.open(file_path)
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+            # Check metadata
+            actual_title = image.info.get('Title', '').strip()
+            actual_subject = image.info.get('Subject', '').strip()
+            actual_keywords = image.info.get('Keywords', '').strip()
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                return False
+
+        except Exception as e:
+            logger.debug(f"PNG metadata verification failed: {e}")
+            return False
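Review note on the JPEG path: `XPSubject` and `XPKeywords` are written as UTF-8 bytes, whereas Windows conventionally stores the XP* tags as null-terminated UTF-16LE. As far as I can tell, piexif also hands BYTE-typed tags back as a tuple of ints rather than `bytes`. A minimal sketch of a round-trip codec follows; `encode_xp` and `decode_xp` are hypothetical helper names, not part of this commit.

```python
from typing import Iterable, Union


def encode_xp(text: str) -> bytes:
    """Encode text the way Windows XP* EXIF tags are conventionally stored."""
    # UTF-16LE payload followed by a two-byte null terminator
    return text.encode("utf-16-le") + b"\x00\x00"


def decode_xp(raw: Union[bytes, Iterable[int], int]) -> str:
    """Decode an XP* value that may arrive as bytes, a tuple of ints, or one int."""
    if isinstance(raw, int):
        raw = (raw,)
    data = bytes(raw)
    # Drop an odd trailing byte so the decoder always sees whole code units
    if len(data) % 2:
        data = data[:-1]
    return data.decode("utf-16-le", errors="ignore").rstrip("\x00")
```

If adopted, the same pair would have to be used on both the write side and the verify side, otherwise the comparison in `_verify_jpeg_metadata` would never match.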
--- a/src/updaters/office_updater.py
+++ b/src/updaters/office_updater.py
@ -0,0 +1,253 @@
+"""Office document metadata updater."""
+
+from docx import Document as DocxDocument
+from openpyxl import load_workbook
+from pptx import Presentation
+from typing import Dict
+
+from ..base_updater import BaseUpdater
+from ..utils import get_logger, create_backup, sanitize_metadata_value
+
+logger = get_logger(__name__)
+
+
+class OfficeUpdater(BaseUpdater):
+    """Updater for Office file metadata (DOCX, XLSX, PPTX)."""
+
+    SUPPORTED_FORMATS = ['docx', 'xlsx', 'pptx']
+
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update Office document metadata.
+
+        Updates core properties (title, subject, keywords) for DOCX, XLSX, and PPTX files.
+
+        Args:
+            file_path: Path to the Office file
+            metadata: Dictionary with 'title', 'subject', 'keywords' keys
+            backup: Whether to create backup before updating
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            # Validate metadata
+            if not self.validate_metadata(metadata):
+                logger.error(f"Invalid metadata for {file_path}")
+                return False
+
+            # Check file format
+            file_ext = file_path.lower().split('.')[-1]
+            if file_ext not in self.SUPPORTED_FORMATS:
+                logger.error(f"Unsupported Office format: {file_ext}")
+                return False
+
+            # Create backup if requested
+            if backup:
+                backup_path = create_backup(file_path)
+                if not backup_path:
+                    logger.warning(f"Failed to create backup for {file_path}, proceeding anyway")
+
+            # Route to appropriate update method
+            if file_ext == 'docx':
+                success = self._update_docx_metadata(file_path, metadata)
+            elif file_ext == 'xlsx':
+                success = self._update_xlsx_metadata(file_path, metadata)
+            elif file_ext == 'pptx':
+                success = self._update_pptx_metadata(file_path, metadata)
+            else:
+                return False
+
+            if success:
+                logger.info(f"Successfully updated metadata for {file_path}")
+            else:
+                logger.error(f"Failed to update metadata for {file_path}")
+
+            return success
+
+        except Exception as e:
+            logger.error(f"Failed to update Office metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def _update_docx_metadata(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """Update DOCX metadata."""
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Open document
+            doc = DocxDocument(file_path)
+            core_props = doc.core_properties
+
+            # Update properties
+            core_props.title = title
+            core_props.subject = subject
+            core_props.keywords = keywords
+
+            # Save document
+            doc.save(file_path)
+
+            logger.debug(f"Updated DOCX metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update DOCX metadata: {e}", exc_info=True)
+            return False
+
+    def _update_xlsx_metadata(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """Update XLSX metadata."""
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Open workbook
+            workbook = load_workbook(file_path)
+            props = workbook.properties
+
+            # Update properties
+            props.title = title
+            props.subject = subject
+            props.keywords = keywords
+
+            # Save workbook
+            workbook.save(file_path)
+
+            logger.debug(f"Updated XLSX metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update XLSX metadata: {e}", exc_info=True)
+            return False
+
+    def _update_pptx_metadata(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """Update PPTX metadata."""
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Open presentation
+            presentation = Presentation(file_path)
+            core_props = presentation.core_properties
+
+            # Update properties
+            core_props.title = title
+            core_props.subject = subject
+            core_props.keywords = keywords
+
+            # Save presentation
+            presentation.save(file_path)
+
+            logger.debug(f"Updated PPTX metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update PPTX metadata: {e}", exc_info=True)
+            return False
+
+    def verify_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify that metadata was written correctly to Office document.
+
+        Args:
+            file_path: Path to the Office file
+            expected_metadata: Expected metadata values
+
+        Returns:
+            True if metadata matches expected values, False otherwise
+        """
+        try:
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext == 'docx':
+                return self._verify_docx_metadata(file_path, expected_metadata)
+            elif file_ext == 'xlsx':
+                return self._verify_xlsx_metadata(file_path, expected_metadata)
+            elif file_ext == 'pptx':
+                return self._verify_pptx_metadata(file_path, expected_metadata)
+            else:
+                return False
+
+        except Exception as e:
+            logger.error(f"Failed to verify Office metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def _verify_docx_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """Verify DOCX metadata."""
+        try:
+            doc = DocxDocument(file_path)
+            core_props = doc.core_properties
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+            actual_title = (core_props.title or '').strip()
+            actual_subject = (core_props.subject or '').strip()
+            actual_keywords = (core_props.keywords or '').strip()
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                return False
+
+        except Exception as e:
+            logger.debug(f"DOCX metadata verification failed: {e}")
+            return False
+
+    def _verify_xlsx_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """Verify XLSX metadata."""
+        try:
+            workbook = load_workbook(file_path)
+            props = workbook.properties
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+            actual_title = (props.title or '').strip()
+            actual_subject = (props.subject or '').strip()
+            actual_keywords = (props.keywords or '').strip()
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                return False
+
+        except Exception as e:
+            logger.debug(f"XLSX metadata verification failed: {e}")
+            return False
+
+    def _verify_pptx_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """Verify PPTX metadata."""
+        try:
+            presentation = Presentation(file_path)
+            core_props = presentation.core_properties
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+            actual_title = (core_props.title or '').strip()
+            actual_subject = (core_props.subject or '').strip()
+            actual_keywords = (core_props.keywords or '').strip()
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                return False
+
+        except Exception as e:
+            logger.debug(f"PPTX metadata verification failed: {e}")
+            return False
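Review note on the Office verifiers: `_verify_docx_metadata`, `_verify_xlsx_metadata`, and `_verify_pptx_metadata` differ only in how the properties object is obtained. A sketch of one shared comparison is below, assuming the properties object exposes `title`, `subject`, and `keywords` attributes as it does in the code above. `core_props_match`, `FIELD_LIMITS`, and the local `_sanitize` stand-in are hypothetical names introduced so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Dict, Mapping

# Per-field length limits, matching the max_length values used by the updaters
FIELD_LIMITS: Dict[str, int] = {"title": 200, "subject": 300, "keywords": 500}


def _sanitize(value: str, max_length: int) -> str:
    """Stand-in for sanitize_metadata_value so the sketch is self-contained."""
    if not value:
        return ""
    value = " ".join(value.split())
    if len(value) > max_length:
        value = value[: max_length - 3] + "..."
    return value.strip()


def core_props_match(props: object, expected: Mapping[str, str]) -> bool:
    """Compare title/subject/keywords on props against sanitized expected values."""
    for field, limit in FIELD_LIMITS.items():
        want = _sanitize(expected.get(field, ""), limit)
        have = (getattr(props, field, None) or "").strip()
        if have != want:
            return False
    return True


# Usage with a stand-in for doc.core_properties / workbook.properties
props = SimpleNamespace(title="Report", subject="Q1 results", keywords=None)
print(core_props_match(props, {"title": " Report ", "subject": "Q1  results"}))
```

Each format-specific verifier would then reduce to opening the file and passing its properties object to the helper.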
--- a/src/updaters/pdf_updater.py
+++ b/src/updaters/pdf_updater.py
@ -0,0 +1,132 @@
+"""PDF metadata updater."""
+
+import pypdf
+from typing import Dict
+from pathlib import Path
+
+from ..base_updater import BaseUpdater
+from ..utils import get_logger, create_backup, sanitize_metadata_value
+
+logger = get_logger(__name__)
+
+
+class PDFUpdater(BaseUpdater):
+    """Updater for PDF file metadata."""
+
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update PDF metadata fields.
+
+        Updates /Title, /Subject, /Keywords fields in the PDF document information dictionary.
+
+        Args:
+            file_path: Path to the PDF file
+            metadata: Dictionary with 'title', 'subject', 'keywords' keys
+            backup: Whether to create backup before updating
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            # Validate metadata
+            if not self.validate_metadata(metadata):
+                logger.error(f"Invalid metadata for {file_path}")
+                return False
+
+            # Create backup if requested
+            if backup:
+                backup_path = create_backup(file_path)
+                if not backup_path:
+                    logger.warning(f"Failed to create backup for {file_path}, proceeding anyway")
+
+            # Sanitize metadata values
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Read existing PDF
+            with open(file_path, 'rb') as f:
+                pdf_reader = pypdf.PdfReader(f)
+                pdf_writer = pypdf.PdfWriter()
+
+                # Copy all pages
+                for page in pdf_reader.pages:
+                    pdf_writer.add_page(page)
+
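+                # Set the new fields. A fresh PdfWriter starts with its own info
+                # dictionary, so other existing fields (e.g. /Author) are not carried over.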
+                pdf_writer.add_metadata({
+                    '/Title': title,
+                    '/Subject': subject,
+                    '/Keywords': keywords,
+                })
+
+            # Write updated PDF
+            with open(file_path, 'wb') as f:
+                pdf_writer.write(f)
+
+            logger.info(f"Successfully updated metadata for {file_path}")
+            logger.debug(f"Updated fields - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update PDF metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def verify_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify that metadata was written correctly to PDF.
+
+        Checks if the written metadata matches the expected values.
+
+        Args:
+            file_path: Path to the PDF file
+            expected_metadata: Expected metadata values
+
+        Returns:
+            True if metadata matches expected values, False otherwise
+        """
+        try:
+            # Read the updated PDF
+            with open(file_path, 'rb') as f:
+                pdf_reader = pypdf.PdfReader(f)
+                doc_info = pdf_reader.metadata
+
+                if not doc_info:
+                    logger.warning(f"No metadata found in {file_path}")
+                    return False
+
+                # Check each expected field
+                expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+                expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+                expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+                # Get actual values and handle bytes
+                actual_title = doc_info.get('/Title')
+                if isinstance(actual_title, bytes):
+                    actual_title = actual_title.decode('utf-8', errors='ignore')
+                actual_title = str(actual_title).strip() if actual_title else ""
+
+                actual_subject = doc_info.get('/Subject')
+                if isinstance(actual_subject, bytes):
+                    actual_subject = actual_subject.decode('utf-8', errors='ignore')
+                actual_subject = str(actual_subject).strip() if actual_subject else ""
+
+                actual_keywords = doc_info.get('/Keywords')
+                if isinstance(actual_keywords, bytes):
+                    actual_keywords = actual_keywords.decode('utf-8', errors='ignore')
+                actual_keywords = str(actual_keywords).strip() if actual_keywords else ""
+
+                # Compare
+                if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                    logger.info(f"Metadata verification successful for {file_path}")
+                    return True
+                else:
+                    logger.warning(f"Metadata verification failed for {file_path}")
+                    logger.debug(f"Expected - Title: {expected_title}, Subject: {expected_subject}, Keywords: {expected_keywords}")
+                    logger.debug(f"Actual - Title: {actual_title}, Subject: {actual_subject}, Keywords: {actual_keywords}")
+                    return False
+
+        except Exception as e:
+            logger.error(f"Failed to verify PDF metadata for {file_path}: {e}", exc_info=True)
+            return False
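Review note on the PDF write: `update_metadata` reopens `file_path` with `'wb'`, which truncates the original before the new bytes are written, so a failure mid-write can leave a corrupt file. One common alternative is to write to a temporary sibling file and swap it in. The sketch below uses only the standard library; `atomic_write` is a hypothetical helper, not something this commit defines.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def atomic_write(file_path: str, write_fn: Callable[[BinaryIO], None]) -> None:
    """Write via a temporary sibling file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_fn(tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when source and target share a filesystem
        os.replace(tmp_path, file_path)
    except BaseException:
        # Leave the original untouched and remove the partial file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In the updater this would presumably be called as `atomic_write(file_path, pdf_writer.write)`. Note that `mkstemp` creates the file with owner-only permissions, so the original file mode may need to be restored afterwards.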
--- a/src/updaters/video_updater.py
+++ b/src/updaters/video_updater.py
@ -0,0 +1,185 @@
+"""Video metadata updater."""
+
+from typing import Dict
+
+from ..base_updater import BaseUpdater
+from ..utils import get_logger, create_backup, sanitize_metadata_value
+
+logger = get_logger(__name__)
+
+
+class VideoUpdater(BaseUpdater):
+    """Updater for video file metadata (MP4, MOV, AVI)."""
+
+    SUPPORTED_FORMATS = ['mp4', 'mov', 'avi', 'mkv', 'flv', 'wmv', 'webm']
+
+    def update_metadata(self, file_path: str, metadata: Dict[str, str], backup: bool = True) -> bool:
+        """
+        Update video metadata using mutagen.
+
+        Args:
+            file_path: Path to the video file
+            metadata: Dictionary with 'title', 'subject', 'keywords' keys
+            backup: Whether to create backup before updating
+
+        Returns:
+            True if successful, False otherwise
+        """
+        try:
+            # Validate metadata
+            if not self.validate_metadata(metadata):
+                logger.error(f"Invalid metadata for {file_path}")
+                return False
+
+            # Check file format
+            file_ext = file_path.lower().split('.')[-1]
+            if file_ext not in self.SUPPORTED_FORMATS:
+                logger.error(f"Unsupported video format: {file_ext}")
+                return False
+
+            # Create backup if requested
+            if backup:
+                backup_path = create_backup(file_path)
+                if not backup_path:
+                    logger.warning(f"Failed to create backup for {file_path}, proceeding anyway")
+
+            # Update using mutagen
+            success = self._update_with_mutagen(file_path, metadata)
+
+            if success:
+                logger.info(f"Successfully updated metadata for {file_path}")
+            else:
+                logger.error(f"Failed to update metadata for {file_path}")
+
+            return success
+
+        except Exception as e:
+            logger.error(f"Failed to update video metadata for {file_path}: {e}", exc_info=True)
+            return False
+
+    def _update_with_mutagen(self, file_path: str, metadata: Dict[str, str]) -> bool:
+        """
+        Update video metadata using mutagen.
+
+        Args:
+            file_path: Path to video file
+            metadata: Metadata dictionary
+
+        Returns:
+            True if successful
+        """
+        try:
+            from mutagen import File
+        except ImportError:
+            logger.error("mutagen not installed, cannot update video metadata")
+            return False
+
+        try:
+            # Sanitize metadata
+            title = sanitize_metadata_value(metadata.get('title', ''), max_length=200)
+            subject = sanitize_metadata_value(metadata.get('subject', ''), max_length=300)
+            keywords = sanitize_metadata_value(metadata.get('keywords', ''), max_length=500)
+
+            # Open the file; mutagen.File returns None if it cannot identify the container
+            audio = File(file_path)
+
+            if audio is None:
+                logger.warning(f"mutagen could not identify file format: {file_path}")
+                return False
+
+            # Update tags based on file format
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext in ('mp4', 'mov'):
+                # MP4 and MOV share the same iTunes-style atom names.
+                # 'TXXX:...' is an ID3 frame name, not an MP4 atom, so keywords
+                # go in the 'keyw' text atom and are always overwritten.
+                audio['\xa9nam'] = [title]
+                audio['\xa9cmt'] = [subject]
+                audio['keyw'] = [keywords]
+            else:
+                # For other formats (AVI, MKV, etc.), use generic ID3/Vorbis tags
+                if hasattr(audio, 'add'):
+                    # ID3v2 style
+                    audio.add_tags()
+                    audio['TIT2'] = title
+                    audio['TXXX:Subject'] = subject
+                    audio['TXXX:Keywords'] = keywords
+                else:
+                    # Vorbis Comment style
+                    audio['title'] = title
+                    audio['subject'] = subject
+                    audio['keywords'] = keywords
+
+            # Save file
+            audio.save()
+
+            logger.debug(f"Updated video metadata - Title: {title}, Subject: {subject}, Keywords: {keywords}")
+            return True
+
+        except Exception as e:
+            logger.error(f"Failed to update video metadata with mutagen: {e}", exc_info=True)
+            return False
+
+    def verify_metadata(self, file_path: str, expected_metadata: Dict[str, str]) -> bool:
+        """
+        Verify that metadata was written correctly to video.
+
+        Args:
+            file_path: Path to the video file
+            expected_metadata: Expected metadata values
+
+        Returns:
+            True if metadata matches expected values, False otherwise
+        """
+        try:
+            from mutagen import File
+        except ImportError:
+            logger.error("mutagen not installed, cannot verify video metadata")
+            return False
+
+        try:
+            audio = File(file_path)
+
+            if audio is None:
+                logger.warning(f"Could not read file for verification: {file_path}")
+                return False
+
+            expected_title = sanitize_metadata_value(expected_metadata.get('title', ''), max_length=200)
+            expected_subject = sanitize_metadata_value(expected_metadata.get('subject', ''), max_length=300)
+            expected_keywords = sanitize_metadata_value(expected_metadata.get('keywords', ''), max_length=500)
+
+            # Get actual values
+            file_ext = file_path.lower().split('.')[-1]
+
+            if file_ext in ['mp4', 'mov']:
+                actual_title = audio.get('\xa9nam', [''])[0]
+                actual_subject = audio.get('\xa9cmt', [''])[0]
+                # Prefer the standard 'keyw' atom, falling back to 'TXXX:Keywords'
+                actual_keywords = (audio.get('keyw') or audio.get('TXXX:Keywords') or [''])[0]
+            else:
+                actual_title = audio.get('TIT2', [''])[0] if 'TIT2' in audio else audio.get('title', [''])[0] if 'title' in audio else ""
+                actual_subject = audio.get('TXXX:Subject', [''])[0] if 'TXXX:Subject' in audio else audio.get('subject', [''])[0] if 'subject' in audio else ""
+                actual_keywords = audio.get('TXXX:Keywords', [''])[0] if 'TXXX:Keywords' in audio else audio.get('keywords', [''])[0] if 'keywords' in audio else ""
+
+            # Normalize strings
+            actual_title = str(actual_title).strip() if actual_title else ""
+            actual_subject = str(actual_subject).strip() if actual_subject else ""
+            actual_keywords = str(actual_keywords).strip() if actual_keywords else ""
+
+            if actual_title == expected_title and actual_subject == expected_subject and actual_keywords == expected_keywords:
+                logger.info(f"Metadata verification successful for {file_path}")
+                return True
+            else:
+                logger.warning(f"Metadata verification failed for {file_path}")
+                logger.debug(f"Expected - Title: {expected_title}, Subject: {expected_subject}, Keywords: {expected_keywords}")
+                logger.debug(f"Actual - Title: {actual_title}, Subject: {actual_subject}, Keywords: {actual_keywords}")
+                return False
+
+        except Exception as e:
+            logger.error(f"Failed to verify video metadata for {file_path}: {e}", exc_info=True)
+            return False
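Review note on extension detection: every updater uses `file_path.lower().split('.')[-1]`, which returns the whole tail of the path when the file name has no extension but a directory contains a dot. A small sketch using `pathlib` is below; `file_extension` is a hypothetical helper name.

```python
from pathlib import Path


def file_extension(file_path: str) -> str:
    """Return the lowercase extension without the dot, or '' if there is none."""
    return Path(file_path).suffix.lower().lstrip(".")


# The split-based approach picks up part of the directory name here
print("/data/v1.2/clip".lower().split(".")[-1])  # 2/clip
print(repr(file_extension("/data/v1.2/clip")))   # ''
print(file_extension("Clip.MP4"))                # mp4
```

Because the result is an empty string for extensionless files, the existing `not in self.SUPPORTED_FORMATS` checks would reject them with a cleaner log message.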
--- a/src/utils.py
+++ b/src/utils.py
@ -0,0 +1,175 @@
+"""Utility functions for backup, logging, and file operations."""
+
+import shutil
+import logging
+from pathlib import Path
+from datetime import datetime
+from typing import Optional
+from .config import Config
+
+# Setup logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+def create_backup(file_path: str) -> Optional[Path]:
+    """
+    Create a backup of the file before modification.
+
+    Args:
+        file_path: Path to the file to backup
+
+    Returns:
+        Path to the backup file, or None if backup failed
+    """
+    try:
+        source = Path(file_path)
+        if not source.exists():
+            logger.error(f"File not found for backup: {file_path}")
+            return None
+
+        # Create backup filename with timestamp
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        backup_filename = f"{source.stem}_{timestamp}{source.suffix}"
+        backup_path = Config.BACKUP_DIR / backup_filename
+
+        # Ensure backup directory exists
+        Config.BACKUP_DIR.mkdir(parents=True, exist_ok=True)
+
+        # Copy file
+        shutil.copy2(source, backup_path)
+        logger.info(f"Backup created: {backup_path}")
+
+        return backup_path
+
+    except Exception as e:
+        logger.error(f"Failed to create backup for {file_path}: {e}")
+        return None
+
+def get_logger(name: str) -> logging.Logger:
+    """
+    Get a logger instance.
+
+    Args:
+        name: Logger name
+
+    Returns:
+        Logger instance
+    """
+    return logging.getLogger(name)
+
+def format_metadata_comparison(old_metadata: dict, new_metadata: dict) -> str:
+    """
+    Format metadata comparison for display.
+
+    Args:
+        old_metadata: Old metadata dictionary
+        new_metadata: New metadata dictionary
+
+    Returns:
+        Formatted comparison string
+    """
+    lines = ["\n" + "="*60]
+    lines.append("METADATA COMPARISON")
+    lines.append("="*60)
+
+    all_keys = set(old_metadata.keys()) | set(new_metadata.keys())
+
+    for key in sorted(all_keys):
+        old_value = old_metadata.get(key, "N/A")
+        new_value = new_metadata.get(key, "N/A")
+
+        lines.append(f"\n{key.upper()}:")
+        lines.append(f"  Old: {old_value}")
+        lines.append(f"  New: {new_value}")
+
+        if old_value != new_value:
+            lines.append("  [CHANGED]")
+
+    lines.append("="*60 + "\n")
+    return "\n".join(lines)
+
+def sanitize_metadata_value(value: str, max_length: int = 500) -> str:
+    """
+    Sanitize and truncate metadata value.
+
+    Args:
+        value: Metadata value
+        max_length: Maximum length
+
+    Returns:
+        Sanitized value
+    """
+    if not value:
+        return ""
+
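+    # Collapse all whitespace runs (spaces, tabs, newlines) into single spaces.
+    # Note: non-whitespace control characters (e.g. NUL) are not removed here.
+    value = ' '.join(value.split())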
+
+    # Truncate if too long
+    if len(value) > max_length:
+        value = value[:max_length-3] + "..."
+
+    return value.strip()
+
+def validate_file_path(file_path: str) -> bool:
+    """
+    Validate file path exists and is accessible.
+
+    Args:
+        file_path: Path to validate
+
+    Returns:
+        True if valid
+    """
+    try:
+        path = Path(file_path)
+        return path.exists() and path.is_file()
+    except Exception:
+        return False
+
+def get_file_size_mb(file_path: str) -> float:
+    """
+    Get file size in MB.
+
+    Args:
+        file_path: Path to file
+
+    Returns:
+        File size in MB
+    """
+    try:
+        size_bytes = Path(file_path).stat().st_size
+        return size_bytes / (1024 * 1024)
+    except Exception:
+        return 0.0
+
+def create_report_entry(file_path: str, file_type: str, old_metadata: dict,
+                       new_metadata: dict, status: str) -> dict:
+    """
+    Create a report entry for CSV export.
+
+    Args:
+        file_path: Path to file
+        file_type: Type of file
+        old_metadata: Old metadata
+        new_metadata: New metadata
+        status: Processing status (success/failed)
+
+    Returns:
+        Dictionary with report data
+    """
+    return {
+        'timestamp': datetime.now().isoformat(),
+        'file_path': file_path,
+        'file_type': file_type,
+        'old_title': old_metadata.get('title', 'N/A'),
+        'new_title': new_metadata.get('title', 'N/A'),
+        'old_subject': old_metadata.get('subject', 'N/A'),
+        'new_subject': new_metadata.get('subject', 'N/A'),
+        'old_keywords': old_metadata.get('keywords', 'N/A'),
+        'new_keywords': new_metadata.get('keywords', 'N/A'),
+        'status': status
+    }
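A minimal sketch of how `sanitize_metadata_value` behaves on typical input, assuming the implementation shown in the diff above (it is reproduced here only so the snippet runs on its own). `strip_control_chars` is a hypothetical helper, not part of this commit; it illustrates one possible way to also drop non-whitespace control characters such as NUL, which `str.split()` leaves in place.

```python
import unicodedata


def sanitize_metadata_value(value: str, max_length: int = 500) -> str:
    """Reference copy of the helper from the diff, for illustration only."""
    if not value:
        return ""
    # split() with no arguments splits on any whitespace run
    value = ' '.join(value.split())
    if len(value) > max_length:
        value = value[:max_length-3] + "..."
    return value.strip()


def strip_control_chars(value: str) -> str:
    """Hypothetical helper: drop Unicode 'Cc' characters that are not whitespace."""
    return "".join(
        ch for ch in value
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )


# Tabs and newlines are collapsed into single spaces.
collapsed = sanitize_metadata_value("  Annual\treport\n 2025  ")

# Values over max_length are cut so that the result, including "...", fits the limit.
truncated = sanitize_metadata_value("x" * 600)

# A NUL byte is not whitespace, so the whitespace collapse alone keeps it.
with_nul = sanitize_metadata_value("bad\x00value")

# Stripping control characters first removes it.
cleaned = sanitize_metadata_value(strip_control_chars("bad\x00value"))
```

Whether control characters need to be removed before writing metadata depends on the target file format, so the extra step is shown as an option rather than a requirement.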
--- a/static/css/admin.css
+++ b/static/css/admin.css
@ -0,0 +1,204 @@
+/* Admin Dashboard Styles */
+
+.admin-stats {
+    display: grid;
+    grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
+    gap: 15px;
+    margin-bottom: 25px;
+}
+
+.stat-card {
+    background: white;
+    border-radius: 12px;
+    padding: 20px;
+    text-align: center;
+    box-shadow: 0 2px 8px rgba(0,0,0,0.06);
+    border: 1px solid #e5e7eb;
+}
+
+.stat-value {
+    font-size: 28px;
+    font-weight: 700;
+    color: var(--primary-gold-dark, #e6b007);
+}
+
+.stat-label {
+    font-size: 13px;
+    color: #6b7280;
+    margin-top: 5px;
+}
+
+.admin-tabs {
+    display: flex;
+    gap: 5px;
+    margin-bottom: 20px;
+    border-bottom: 2px solid #e5e7eb;
+    padding-bottom: 0;
+}
+
+.admin-tab {
+    padding: 10px 20px;
+    border: none;
+    background: none;
+    cursor: pointer;
+    font-size: 14px;
+    font-weight: 500;
+    color: #6b7280;
+    border-bottom: 2px solid transparent;
+    margin-bottom: -2px;
+    transition: all 0.2s;
+}
+
+.admin-tab:hover {
+    color: #1f2937;
+}
+
+.admin-tab.active {
+    color: var(--primary-gold-dark, #e6b007);
+    border-bottom-color: var(--primary-gold, #FFC407);
+}
+
+.admin-panel {
+    background: white;
+    border-radius: 12px;
+    padding: 20px;
+    box-shadow: 0 2px 8px rgba(0,0,0,0.06);
+    border: 1px solid #e5e7eb;
+}
+
+.panel-header {
+    display: flex;
+    justify-content: space-between;
+    align-items: center;
+    margin-bottom: 15px;
+}
+
+.panel-header h3 {
+    margin: 0;
+    font-size: 18px;
+    color: #1f2937;
+}
+
+.admin-table-container {
+    overflow-x: auto;
+}
+
+.admin-table {
+    width: 100%;
+    border-collapse: collapse;
+    font-size: 13px;
+}
+
+.admin-table th,
+.admin-table td {
+    padding: 10px 12px;
+    text-align: left;
+    border-bottom: 1px solid #e5e7eb;
+}
+
+.admin-table th {
+    background: #f9fafb;
+    font-weight: 600;
+    color: #374151;
+    white-space: nowrap;
+}
+
+.admin-table tr:hover {
+    background: #f9fafb;
+}
+
+.badge {
+    display: inline-block;
+    padding: 2px 8px;
+    border-radius: 10px;
+    font-size: 11px;
+    font-weight: 600;
+}
+
+.badge-admin {
+    background: #fef3c7;
+    color: #92400e;
+}
+
+.badge-user {
+    background: #dbeafe;
+    color: #1e40af;
+}
+
+.badge-active {
+    background: #d1fae5;
+    color: #065f46;
+}
+
+.badge-inactive {
+    background: #fee2e2;
+    color: #991b1b;
+}
+
+.btn-sm {
+    padding: 6px 14px;
+    font-size: 12px;
+    border-radius: 6px;
+}
+
+.btn-action {
+    padding: 4px 10px;
+    font-size: 11px;
+    border: 1px solid #d1d5db;
+    background: white;
+    border-radius: 4px;
+    cursor: pointer;
+    color: #374151;
+}
+
+.btn-action:hover {
+    background: #f3f4f6;
+}
+
+.btn-action.danger {
+    color: #dc2626;
+    border-color: #fca5a5;
+}
+
+.btn-action.danger:hover {
+    background: #fef2f2;
+}
+
+.audit-filters {
+    display: flex;
+    gap: 10px;
+    align-items: center;
+}
+
+.audit-filters select {
+    padding: 6px 10px;
+    border: 1px solid #d1d5db;
+    border-radius: 6px;
+    font-size: 13px;
+}
+
+.ai-stats-grid {
+    display: grid;
+    grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
+    gap: 12px;
+    margin-bottom: 20px;
+}
+
+.ai-stat-card {
+    background: #f9fafb;
+    border-radius: 8px;
+    padding: 15px;
+    text-align: center;
+}
+
+.ai-stat-value {
+    font-size: 22px;
+    font-weight: 600;
+    color: #1f2937;
+}
+
+.ai-stat-label {
+    font-size: 12px;
+    color: #6b7280;
+    margin-top: 3px;
+}
--- a/static/css/app.css
+++ b/static/css/app.css
@ -0,0 +1,811 @@
+        /* ========== CSS VARIABLES ========== */
+        :root {
+            /* Main colors */
+            --primary-gold: #FFC407;
+            --primary-gold-dark: #e6b007;
+            --primary-gold-light: #ffcf33;
+
+            /* Dark colors */
+            --dark-primary: #2c2c2c;
+            --dark-secondary: #1a1a1a;
+
+            /* Light colors */
+            --white: #ffffff;
+            --light-bg: #fafafa;
+            --light-bg-gradient: #f8fafc;
+
+            /* Text colors */
+            --text-primary: #1f2937;
+            --text-secondary: #374151;
+            --text-muted: #6b7280;
+
+            /* Status colors */
+            --success-green: #4ade80;
+            --error-red: #ef4444;
+
+            /* Opacity */
+            --overlay-light: rgba(255, 255, 255, 0.95);
+            --overlay-dark: rgba(0, 0, 0, 0.5);
+            --border-light: rgba(255, 255, 255, 0.2);
+            --border-subtle: rgba(0, 0, 0, 0.05);
+
+            /* Shadows */
+            --shadow-sm: 0 2px 8px rgba(0, 0, 0, 0.1);
+            --shadow-md: 0 10px 25px rgba(0, 0, 0, 0.15);
+            --shadow-lg: 0 20px 40px rgba(0, 0, 0, 0.1);
+
+            /* Radius */
+            --radius-sm: 4px;
+            --radius-md: 12px;
+            --radius-lg: 18px;
+            --radius-xl: 20px;
+
+            /* Spacing */
+            --spacing-xs: 4px;
+            --spacing-sm: 8px;
+            --spacing-md: 12px;
+            --spacing-lg: 16px;
+            --spacing-xl: 20px;
+            --spacing-2xl: 25px;
+
+            /* Fonts */
+            --font-family: 'Montserrat', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+
+            /* Transitions */
+            --transition-fast: 0.15s ease;
+            --transition-normal: 0.3s ease;
+            --transition-slow: 0.5s ease;
+        }
+
+        * { margin: 0; padding: 0; box-sizing: border-box; }
+
+        body {
+            font-family: var(--font-family);
+            background: linear-gradient(135deg, var(--dark-primary) 0%, var(--dark-secondary) 100%);
+            min-height: 100vh;
+            padding: 20px;
+        }
+        .container {
+            max-width: 1200px;
+            margin: 0 auto;
+            background: var(--overlay-light);
+            backdrop-filter: blur(20px);
+            border-radius: var(--radius-xl);
+            box-shadow: var(--shadow-lg);
+            overflow: hidden;
+            border: 1px solid var(--border-light);
+        }
+        .header {
+            background: linear-gradient(135deg, var(--primary-gold) 0%, var(--primary-gold-dark) 100%);
+            color: var(--dark-secondary);
+            padding: 30px;
+            text-align: center;
+            position: relative;
+        }
+        .header::before {
+            content: '';
+            position: absolute;
+            top: 0;
+            left: 0;
+            right: 0;
+            bottom: 0;
+            background: linear-gradient(45deg, transparent 30%, rgba(255,255,255,0.1) 50%, transparent 70%);
+            animation: shimmer 3s infinite;
+            pointer-events: none;
+        }
+        .header h1 {
+            font-size: 28px;
+            margin-bottom: 10px;
+            font-weight: 600;
+            position: relative;
+            z-index: 1;
+        }
+        .header p {
+            opacity: 0.9;
+            font-size: 14px;
+            position: relative;
+            z-index: 1;
+        }
+        @keyframes shimmer {
+            0% { transform: translateX(-100%); }
+            100% { transform: translateX(100%); }
+        }
+        .content {
+            padding: 40px;
+            background: linear-gradient(180deg, var(--light-bg) 0%, var(--light-bg-gradient) 100%);
+        }
+
+        @keyframes slideIn {
+            from {
+                opacity: 0;
+                transform: translateY(20px);
+            }
+            to {
+                opacity: 1;
+                transform: translateY(0);
+            }
+        }
+
+        @keyframes fadeIn {
+            from { opacity: 0; }
+            to { opacity: 1; }
+        }
+
+        @keyframes pulse {
+            0%, 100% { transform: scale(1); }
+            50% { transform: scale(1.05); }
+        }
+
+        .upload-section {
+            background: var(--white);
+            border-radius: var(--radius-md);
+            padding: 20px;
+            margin-bottom: 30px;
+            box-shadow: var(--shadow-sm);
+        }
+
+        .upload-area {
+            border: 3px dashed var(--primary-gold);
+            border-radius: var(--radius-md);
+            padding: 60px 20px;
+            text-align: center;
+            cursor: pointer;
+            transition: all var(--transition-normal);
+            background: var(--light-bg);
+            margin-bottom: 20px;
+        }
+        .upload-area:hover {
+            background: #fffbf0;
+            border-color: var(--primary-gold-dark);
+            transform: translateY(-2px);
+        }
+        .upload-area.dragover {
+            background: #fff9e6;
+            transform: scale(1.02);
+            border-color: var(--primary-gold-dark);
+        }
+
+        #fileInput { display: none; }
+        .upload-icon { font-size: 48px; margin-bottom: 15px; }
+
+        .output-dir-section {
+            display: flex;
+            align-items: center;
+            gap: 15px;
+            margin-bottom: 20px;
+            padding: 15px;
+            background: white;
+            border-radius: 8px;
+        }
+
+        .output-dir-section label {
+            font-weight: 600;
+            color: #495057;
+            min-width: 120px;
+        }
+
+        #outputDir {
+            flex: 1;
+            padding: 10px;
+            border: 2px solid #dee2e6;
+            border-radius: var(--radius-sm);
+            font-size: 14px;
+            font-family: var(--font-family);
+            transition: border-color var(--transition-fast);
+        }
+
+        #outputDir:focus {
+            outline: none;
+            border-color: var(--primary-gold);
+        }
+
+        .output-dir-hint {
+            font-size: 12px;
+            color: #6c757d;
+            margin-top: 5px;
+        }
+
+        .btn {
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            color: var(--dark-secondary);
+            border: none;
+            padding: 12px 30px;
+            border-radius: var(--radius-md);
+            cursor: pointer;
+            font-size: 16px;
+            font-weight: 600;
+            font-family: var(--font-family);
+            transition: all var(--transition-fast);
+            margin: 5px;
+        }
+        .btn:hover:not(:disabled) {
+            transform: translateY(-2px);
+            box-shadow: 0 4px 12px rgba(255, 196, 7, 0.4);
+        }
+        .btn:active:not(:disabled) {
+            transform: translateY(0);
+        }
+        .btn:disabled {
+            opacity: 0.5;
+            cursor: not-allowed;
+            transform: none;
+        }
+
+        .btn-small {
+            padding: 8px 20px;
+            font-size: 14px;
+        }
+
+        .progress-bar {
+            width: 100%;
+            height: 30px;
+            background: #e9ecef;
+            border-radius: 15px;
+            overflow: hidden;
+            margin: 20px 0;
+            display: none;
+        }
+
+        .progress-fill {
+            height: 100%;
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            transition: width var(--transition-normal);
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            color: var(--dark-secondary);
+            font-weight: 600;
+            font-size: 14px;
+        }
+
+        .file-list {
+            margin-top: 30px;
+            display: none;
+        }
+
+        .batch-toolbar {
+            background: var(--white);
+            border-radius: var(--radius-md);
+            padding: 15px;
+            margin-bottom: 20px;
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            gap: 15px;
+            box-shadow: var(--shadow-sm);
+        }
+
+        .batch-toolbar-left {
+            display: flex;
+            gap: 10px;
+            align-items: center;
+        }
+
+        .batch-toolbar-right {
+            display: flex;
+            gap: 10px;
+        }
+
+        .btn-toolbar {
+            background: #6c757d;
+            color: white;
+            border: none;
+            padding: 8px 16px;
+            border-radius: 20px;
+            cursor: pointer;
+            font-size: 13px;
+            font-weight: 600;
+            transition: transform 0.2s;
+        }
+
+        .btn-toolbar:hover {
+            transform: translateY(-2px);
+            background: #5a6268;
+        }
+
+        .btn-export {
+            background: linear-gradient(135deg, #28a745 0%, #20c997 100%);
+        }
+
+        .btn-export:hover {
+            background: linear-gradient(135deg, #218838 0%, #1fa589 100%);
+        }
+
+        .selection-count {
+            font-size: 13px;
+            color: #495057;
+            font-weight: 600;
+        }
+
+        .file-item {
+            background: var(--white);
+            border-radius: var(--radius-md);
+            padding: 20px;
+            margin-bottom: 20px;
+            border-left: 4px solid var(--primary-gold);
+            box-shadow: var(--shadow-sm);
+            transition: all var(--transition-fast);
+        }
+
+        .file-item:hover {
+            box-shadow: var(--shadow-md);
+            transform: translateX(2px);
+        }
+
+        .file-item.selected {
+            background: #fffbf0;
+            border-left-color: var(--success-green);
+        }
+
+        .file-header {
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            margin-bottom: 15px;
+        }
+
+        .file-header-left {
+            display: flex;
+            align-items: center;
+            gap: 12px;
+        }
+
+        .file-checkbox {
+            width: 20px;
+            height: 20px;
+            cursor: pointer;
+        }
+
+        .file-name {
+            font-weight: 600;
+            font-size: 16px;
+            color: #495057;
+        }
+
+        .file-type {
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            color: var(--dark-secondary);
+            padding: 4px 12px;
+            border-radius: 12px;
+            font-size: 12px;
+            font-weight: 600;
+        }
+
+        .metadata-comparison {
+            display: grid;
+            grid-template-columns: 1fr 1fr;
+            gap: 15px;
+        }
+
+        .metadata-box {
+            background: var(--light-bg);
+            border-radius: var(--radius-sm);
+            padding: 15px;
+            border: 1px solid var(--border-subtle);
+        }
+
+        .metadata-box h4 {
+            color: var(--primary-gold-dark);
+            margin-bottom: 10px;
+            font-size: 14px;
+            font-weight: 600;
+        }
+
+        .metadata-item {
+            display: flex;
+            flex-direction: column;
+            padding: 8px 0;
+            border-bottom: 1px solid #dee2e6;
+        }
+
+        .metadata-item:last-child { border-bottom: none; }
+        .metadata-label { font-weight: 600; color: #495057; font-size: 12px; margin-bottom: 4px; }
+        .metadata-value { color: #6c757d; font-size: 13px; }
+
+        .alert {
+            padding: 15px;
+            border-radius: 8px;
+            margin: 15px 0;
+            display: none;
+        }
+        .alert-error { background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb; }
+        .alert-success { background: #d4edda; color: #155724; border: 1px solid #c3e6cb; }
+        .alert-info { background: #d1ecf1; color: #0c5460; border: 1px solid #bee5eb; }
+
+        .actions {
+            text-align: center;
+            margin-top: 20px;
+        }
+
+        .spinner {
+            border: 3px solid #f3f3f3;
+            border-top: 3px solid var(--primary-gold);
+            border-radius: 50%;
+            width: 40px;
+            height: 40px;
+            animation: spin 1s linear infinite;
+            margin: 20px auto;
+            display: none;
+        }
+
+        @keyframes spin {
+            0% { transform: rotate(0deg); }
+            100% { transform: rotate(360deg); }
+        }
+
+        .footer {
+            text-align: center;
+            padding: 20px;
+            color: #6c757d;
+            font-size: 12px;
+            border-top: 1px solid #dee2e6;
+        }
+
+        /* Metadata Source Selector */
+        .metadata-source-selector {
+            background: white;
+            border-radius: 8px;
+            padding: 15px;
+            margin-bottom: 20px;
+            display: flex;
+            align-items: center;
+            gap: 15px;
+        }
+
+        .metadata-source-selector label {
+            font-weight: 600;
+            color: #495057;
+            min-width: 140px;
+        }
+
+        .source-select {
+            flex: 1;
+            padding: 10px;
+            border: 2px solid var(--primary-gold);
+            border-radius: var(--radius-sm);
+            font-size: 14px;
+            font-family: var(--font-family);
+            cursor: pointer;
+            background: var(--white);
+            transition: border-color var(--transition-fast);
+        }
+        .source-select:focus {
+            outline: none;
+            border-color: var(--primary-gold-dark);
+        }
+
+        .source-info {
+            font-size: 12px;
+            color: #6c757d;
+            margin-left: 10px;
+        }
+
+        /* Editable Metadata Fields */
+        .editable-field {
+            width: 100%;
+            padding: 8px;
+            border: 2px solid #dee2e6;
+            border-radius: 5px;
+            font-size: 13px;
+            font-family: inherit;
+            transition: border-color 0.3s;
+        }
+
+        .editable-field:focus {
+            outline: none;
+            border-color: var(--primary-gold);
+            box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.1);
+        }
+
+        .editable-field.invalid {
+            border-color: #dc3545;
+        }
+
+        textarea.editable-field {
+            min-height: 60px;
+            resize: vertical;
+        }
+
+        .char-count {
+            font-size: 11px;
+            color: #6c757d;
+            margin-top: 4px;
+            display: block;
+        }
+
+        .char-count.warning {
+            color: #ffc107;
+        }
+
+        .char-count.danger {
+            color: #dc3545;
+        }
+
+        .metadata-field {
+            margin-bottom: 15px;
+        }
+
+        .metadata-field label {
+            display: block;
+            font-weight: 600;
+            color: #495057;
+            font-size: 12px;
+            margin-bottom: 5px;
+        }
+
+        /* File Action Buttons */
+        .file-actions {
+            display: flex;
+            gap: 10px;
+            margin-top: 15px;
+        }
+
+        .btn-save {
+            background: linear-gradient(135deg, #28a745 0%, #20c997 100%);
+            color: white;
+            border: none;
+            padding: 8px 20px;
+            border-radius: 20px;
+            cursor: pointer;
+            font-size: 14px;
+            font-weight: 600;
+            transition: transform 0.2s;
+        }
+
+        .btn-save:hover {
+            transform: translateY(-2px);
+        }
+
+        .btn-save:disabled {
+            opacity: 0.5;
+            cursor: not-allowed;
+            transform: none;
+        }
+
+        .btn-reset {
+            background: #6c757d;
+            color: white;
+            border: none;
+            padding: 8px 20px;
+            border-radius: 20px;
+            cursor: pointer;
+            font-size: 14px;
+            font-weight: 600;
+            transition: transform 0.2s;
+        }
+
+        .btn-reset:hover {
+            transform: translateY(-2px);
+            background: #5a6268;
+        }
+
+        /* Import Metadata Section */
+        .import-section {
+            background: white;
+            border-radius: 8px;
+            padding: 15px;
+            margin-bottom: 15px;
+            border: 2px dashed #dee2e6;
+        }
+
+        .import-section.active {
+            border-color: var(--success-green);
+            background: #f0fff4;
+        }
+
+        .btn-import {
+            background: linear-gradient(135deg, #17a2b8 0%, #138496 100%);
+            color: white;
+            border: none;
+            padding: 8px 20px;
+            border-radius: 20px;
+            cursor: pointer;
+            font-size: 14px;
+            font-weight: 600;
+            transition: transform 0.2s;
+        }
+
+        .btn-import:hover {
+            transform: translateY(-2px);
+        }
+
+        .import-stats {
+            font-size: 12px;
+            color: #28a745;
+            margin-top: 10px;
+            padding: 8px;
+            background: white;
+            border-radius: 5px;
+        }
+
+        /* Template Section */
+        .template-section {
+            background: white;
+            border-radius: 8px;
+            padding: 15px;
+            margin-bottom: 15px;
+            border: 2px dashed #dee2e6;
+        }
+
+        .template-section.active {
+            border-color: var(--primary-gold);
+            background: #fffbf0;
+        }
+
+        .template-controls {
+            display: flex;
+            gap: 10px;
+            align-items: center;
+            flex-wrap: wrap;
+        }
+
+        .template-select {
+            flex: 1;
+            min-width: 200px;
+            padding: 8px;
+            border: 2px solid var(--primary-gold);
+            border-radius: var(--radius-sm);
+            font-size: 13px;
+            font-family: var(--font-family);
+            cursor: pointer;
+            transition: border-color var(--transition-fast);
+        }
+
+        .template-select:focus {
+            outline: none;
+            border-color: var(--primary-gold-dark);
+        }
+
+        .btn-template {
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            color: var(--dark-secondary);
+            border: none;
+            padding: 8px 16px;
+            border-radius: var(--radius-md);
+            cursor: pointer;
+            font-size: 13px;
+            font-weight: 600;
+            font-family: var(--font-family);
+            transition: all var(--transition-fast);
+        }
+
+        .btn-template:hover:not(:disabled) {
+            transform: translateY(-2px);
+            box-shadow: 0 4px 12px rgba(255, 196, 7, 0.3);
+        }
+
+        .btn-template:disabled {
+            opacity: 0.5;
+            cursor: not-allowed;
+            transform: none;
+        }
+
+        .template-preview {
+            margin-top: 10px;
+            padding: 10px;
+            background: white;
+            border-radius: 5px;
+            font-size: 12px;
+            color: #495057;
+            display: none;
+        }
+
+        .template-preview-item {
+            margin-bottom: 5px;
+        }
+
+        .template-preview-label {
+            font-weight: 600;
+            color: var(--primary-gold-dark);
+        }
+
+        /* Modal Styles */
+        .modal {
+            display: none;
+            position: fixed;
+            z-index: 1000;
+            left: 0;
+            top: 0;
+            width: 100%;
+            height: 100%;
+            background-color: rgba(0,0,0,0.5);
+        }
+
+        .modal-content {
+            background-color: white;
+            margin: 5% auto;
+            padding: 30px;
+            border-radius: 15px;
+            width: 90%;
+            max-width: 600px;
+            box-shadow: 0 20px 60px rgba(0,0,0,0.3);
+        }
+
+        .modal-header {
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            margin-bottom: 20px;
+        }
+
+        .modal-header h3 {
+            color: var(--primary-gold-dark);
+            margin: 0;
+            font-weight: 600;
+        }
+
+        .close-modal {
+            font-size: 28px;
+            font-weight: bold;
+            color: #aaa;
+            cursor: pointer;
+        }
+
+        .close-modal:hover {
+            color: #000;
+        }
+
+        .form-group {
+            margin-bottom: 15px;
+        }
+
+        .form-group label {
+            display: block;
+            font-weight: 600;
+            color: #495057;
+            margin-bottom: 5px;
+            font-size: 13px;
+        }
+
+        .form-group input,
+        .form-group textarea {
+            width: 100%;
+            padding: 10px;
+            border: 2px solid #dee2e6;
+            border-radius: var(--radius-sm);
+            font-size: 13px;
+            font-family: var(--font-family);
+            transition: border-color var(--transition-fast);
+        }
+
+        .form-group input:focus,
+        .form-group textarea:focus {
+            outline: none;
+            border-color: var(--primary-gold);
+            box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.1);
+        }
+
+        .form-group textarea {
+            min-height: 60px;
+            resize: vertical;
+        }
+
+        .form-group small {
+            font-size: 11px;
+            color: #6c757d;
+            margin-top: 3px;
+            display: block;
+        }
+
+        .variable-hint {
+            background: #fffbf0;
+            padding: 8px;
+            border-radius: var(--radius-sm);
+            font-size: 11px;
+            color: var(--primary-gold-dark);
+            margin-top: 5px;
+            border: 1px solid rgba(255, 196, 7, 0.2);
+        }
+
+        @media (max-width: 768px) {
+            .metadata-comparison {
+                grid-template-columns: 1fr;
+            }
+            .metadata-source-selector {
+                flex-direction: column;
+                align-items: flex-start;
+            }
+            .metadata-source-selector label {
+                min-width: auto;
+            }
+        }
--- a/static/js/admin.js
+++ b/static/js/admin.js
@ -0,0 +1,265 @@
+// Admin Dashboard JavaScript
+
+document.addEventListener('DOMContentLoaded', () => {
+    loadUsers();
+});
+
+function switchTab(tab) {
+    document.querySelectorAll('.admin-tab').forEach(t => t.classList.remove('active'));
+    document.querySelectorAll('.admin-panel').forEach(p => p.style.display = 'none');
+
+    event.target.classList.add('active');
+
+    if (tab === 'users') {
+        document.getElementById('usersPanel').style.display = 'block';
+        loadUsers();
+    } else if (tab === 'audit') {
+        document.getElementById('auditPanel').style.display = 'block';
+        loadAuditLog();
+    } else if (tab === 'ai-usage') {
+        document.getElementById('aiUsagePanel').style.display = 'block';
+        loadAiUsage();
+    }
+}
+
+// --- Users ---
+
+async function loadUsers() {
+    try {
+        const resp = await fetch(BASE_PATH + '/admin/users?include_inactive=true');
+        const data = await resp.json();
+        if (data.success) {
+            renderUsersTable(data.users);
+            populateAuditUserFilter(data.users);
+        }
+    } catch (err) {
+        console.error('Failed to load users:', err);
+    }
+}
+
+function renderUsersTable(users) {
+    const tbody = document.getElementById('usersTableBody');
+    if (!users.length) {
+        tbody.innerHTML = '<tr><td colspan="8" style="text-align:center;color:#6b7280;">No users found</td></tr>';
+        return;
+    }
+    tbody.innerHTML = users.map(u => `
+        <tr>
+            <td>${u.id}</td>
+            <td><strong>${escapeHtml(u.username)}</strong></td>
+            <td>${escapeHtml(u.email || '-')}</td>
+            <td><span class="badge badge-${u.role}">${u.role}</span></td>
+            <td>${u.auth_method || 'local'}</td>
+            <td>${u.last_login ? formatDate(u.last_login) : 'Never'}</td>
+            <td><span class="badge badge-${u.is_active ? 'active' : 'inactive'}">${u.is_active ? 'Active' : 'Inactive'}</span></td>
+            <td>
+                ${u.is_active
+                    ? `<button class="btn-action danger" onclick="toggleUser(${u.id}, false)">Deactivate</button>`
+                    : `<button class="btn-action" onclick="toggleUser(${u.id}, true)">Activate</button>`
+                }
+                <button class="btn-action" onclick="toggleRole(${u.id}, '${u.role}')">${u.role === 'admin' ? 'Demote' : 'Promote'}</button>
+            </td>
+        </tr>
+    `).join('');
+}
+
+async function toggleUser(userId, activate) {
+    try {
+        const resp = await fetch(`${BASE_PATH}/admin/users/${userId}`, {
+            method: 'PUT',
+            headers: {'Content-Type': 'application/json'},
+            body: JSON.stringify({is_active: activate ? 1 : 0}),
+        });
+        const data = await resp.json();
+        if (data.success) loadUsers();
+        else alert(data.error || 'Failed to update user');
+    } catch (err) {
+        alert('Error: ' + err.message);
+    }
+}
+
+async function toggleRole(userId, currentRole) {
+    const newRole = currentRole === 'admin' ? 'user' : 'admin';
+    if (!confirm(`Change user role to "${newRole}"?`)) return;
+    try {
+        const resp = await fetch(`${BASE_PATH}/admin/users/${userId}`, {
+            method: 'PUT',
+            headers: {'Content-Type': 'application/json'},
+            body: JSON.stringify({role: newRole}),
+        });
+        const data = await resp.json();
+        if (data.success) loadUsers();
+        else alert(data.error || 'Failed to update role');
+    } catch (err) {
+        alert('Error: ' + err.message);
+    }
+}
+
+function showCreateUserModal() {
+    document.getElementById('createUserModal').style.display = 'flex';
+}
+
+function closeCreateUserModal() {
+    document.getElementById('createUserModal').style.display = 'none';
+    document.getElementById('newUsername').value = '';
+    document.getElementById('newEmail').value = '';
+    document.getElementById('newFullName').value = '';
+    document.getElementById('newPassword').value = '';
+    document.getElementById('newRole').value = 'user';
+    document.getElementById('newAuthMethod').value = 'local';
+}
+
+async function createUser() {
+    const username = document.getElementById('newUsername').value.trim();
+    if (!username) { alert('Username is required'); return; }
+
+    const payload = {
+        username,
+        email: document.getElementById('newEmail').value.trim(),
+        full_name: document.getElementById('newFullName').value.trim(),
+        password: document.getElementById('newPassword').value || null,
+        role: document.getElementById('newRole').value,
+        auth_method: document.getElementById('newAuthMethod').value,
+    };
+
+    try {
+        const resp = await fetch(BASE_PATH + '/admin/users', {
+            method: 'POST',
+            headers: {'Content-Type': 'application/json'},
+            body: JSON.stringify(payload),
+        });
+        const data = await resp.json();
+        if (data.success) {
+            closeCreateUserModal();
+            loadUsers();
+        } else {
+            alert(data.error || 'Failed to create user');
+        }
+    } catch (err) {
+        alert('Error: ' + err.message);
+    }
+}
+
+// --- Audit Log ---
+
+function populateAuditUserFilter(users) {
+    const select = document.getElementById('auditUserFilter');
+    const currentVal = select.value;
+    select.innerHTML = '<option value="">All Users</option>';
+    users.forEach(u => {
+        // new Option(text, value) sets plain strings, so no HTML escaping or re-parsing is needed.
+        select.add(new Option(u.username, u.id));
+    });
+    select.value = currentVal;
+}
+
+async function loadAuditLog() {
+    const userId = document.getElementById('auditUserFilter').value;
+    let url = BASE_PATH + '/admin/audit?limit=200';
+    if (userId) url += `&user_id=${encodeURIComponent(userId)}`;
+
+    try {
+        const resp = await fetch(url);
+        const data = await resp.json();
+        if (data.success) {
+            renderAuditTable(data.entries);
+        }
+    } catch (err) {
+        console.error('Failed to load audit log:', err);
+    }
+}
+
+function renderAuditTable(entries) {
+    const tbody = document.getElementById('auditTableBody');
+    if (!entries.length) {
+        tbody.innerHTML = '<tr><td colspan="4" style="text-align:center;color:#6b7280;">No audit entries</td></tr>';
+        return;
+    }
+    tbody.innerHTML = entries.map(e => `
+        <tr>
+            <td style="white-space:nowrap;">${formatDate(e.timestamp)}</td>
+            <td>${escapeHtml(e.username || 'Unknown')}</td>
+            <td><strong>${escapeHtml(e.action)}</strong></td>
+            <td style="max-width:400px;overflow:hidden;text-overflow:ellipsis;">${escapeHtml(e.details || '-')}</td>
+        </tr>
+    `).join('');
+}
+
+// --- AI Usage ---
+
+async function loadAiUsage() {
+    try {
+        const resp = await fetch(BASE_PATH + '/admin/ai-usage');
+        const data = await resp.json();
+        if (data.success) {
+            renderAiStats(data.stats);
+            renderAiUsageTable(data.by_user);
+        }
+    } catch (err) {
+        console.error('Failed to load AI usage:', err);
+    }
+}
+
+function renderAiStats(stats) {
+    const grid = document.getElementById('aiStatsGrid');
+    grid.innerHTML = `
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${stats.total_requests || 0}</div>
+            <div class="ai-stat-label">Total Requests</div>
+        </div>
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${(stats.total_tokens || 0).toLocaleString()}</div>
+            <div class="ai-stat-label">Total Tokens</div>
+        </div>
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${stats.requests_24h || 0}</div>
+            <div class="ai-stat-label">Requests (24h)</div>
+        </div>
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${(stats.tokens_24h || 0).toLocaleString()}</div>
+            <div class="ai-stat-label">Tokens (24h)</div>
+        </div>
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${stats.requests_7d || 0}</div>
+            <div class="ai-stat-label">Requests (7d)</div>
+        </div>
+        <div class="ai-stat-card">
+            <div class="ai-stat-value">${(stats.tokens_7d || 0).toLocaleString()}</div>
+            <div class="ai-stat-label">Tokens (7d)</div>
+        </div>
+    `;
+}
+
+function renderAiUsageTable(byUser) {
+    const tbody = document.getElementById('aiUsageTableBody');
+    if (!byUser.length) {
+        tbody.innerHTML = '<tr><td colspan="4" style="text-align:center;color:#6b7280;">No AI usage data</td></tr>';
+        return;
+    }
+    tbody.innerHTML = byUser.map(u => `
+        <tr>
+            <td><strong>${escapeHtml(u.username)}</strong></td>
+            <td>${u.request_count}</td>
+            <td>${(u.total_tokens || 0).toLocaleString()}</td>
+            <td>${u.last_used ? formatDate(u.last_used) : '-'}</td>
+        </tr>
+    `).join('');
+}
+
+// --- Helpers ---
+
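+function escapeHtml(str) {
+    // Only null/undefined map to ''; other falsy values such as 0 are kept.
+    if (str === null || str === undefined) return '';
+    // Let the browser escape &, < and > by serializing a text node.
+    const div = document.createElement('div');
+    div.textContent = String(str);
+    return div.innerHTML;
+}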
+
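+function formatDate(dateStr) {
+    if (!dateStr) return '-';
+    // new Date() does not throw on unparseable input; it yields an Invalid Date,
+    // so check for NaN and fall back to the raw string.
+    const d = new Date(dateStr);
+    return Number.isNaN(d.getTime()) ? dateStr : d.toLocaleString();
+}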
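Note on the helpers above: `escapeHtml` in admin.js relies on DOM text-node serialization, which escapes `&`, `<` and `>` but leaves quote characters alone. That is sufficient for the table cells rendered here. A minimal DOM-free sketch follows, assuming values might later be interpolated into quoted attribute values as well; `escapeHtmlAttr` and `HTML_ESCAPES` are hypothetical names and are not part of this commit.

```javascript
// Hypothetical helper, not part of admin.js: escapes the five characters
// that are significant in HTML text and in quoted attribute values.
const HTML_ESCAPES = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
};

function escapeHtmlAttr(value) {
    // Treat null/undefined as empty, but keep falsy values such as 0.
    if (value === null || value === undefined) return '';
    return String(value).replace(/[&<>"']/g, ch => HTML_ESCAPES[ch]);
}
```

Because it works on strings only, this variant can also be exercised outside the browser, for example in a Node-based unit test.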
--- a/static/js/app.js
+++ b/static/js/app.js
--- a/templates/admin.html
+++ b/templates/admin.html
@ -0,0 +1,187 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Admin - Oliver Metadata Tool</title>
+    <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap" rel="stylesheet">
+    <link rel="stylesheet" href="{{ request.scope.get('root_path', '') }}/static/css/app.css">
+    <link rel="stylesheet" href="{{ request.scope.get('root_path', '') }}/static/css/admin.css">
+</head>
+<body>
+{% set base = request.scope.get('root_path', '') %}
+    <div class="container">
+        <div class="header">
+            <h1>Admin Dashboard</h1>
+            <p>Oliver Metadata Tool - Administration</p>
+            <div style="position: absolute; top: 15px; right: 20px; font-size: 13px; color: #6b7280;">
+                {{ username }} |
+                <a href="{{ base }}/" style="color: #FFC407;">Home</a> |
+                <a href="{{ base }}/logout" style="color: #FFC407;">Logout</a>
+            </div>
+        </div>
+
+        <div class="content">
+            <!-- Stats Cards -->
+            <div class="admin-stats">
+                <div class="stat-card">
+                    <div class="stat-value" id="statActiveUsers">{{ stats.active_users | default(0) }}</div>
+                    <div class="stat-label">Active Users</div>
+                </div>
+                <div class="stat-card">
+                    <div class="stat-value" id="statActiveSessions">{{ stats.active_sessions | default(0) }}</div>
+                    <div class="stat-label">Active Sessions</div>
+                </div>
+                <div class="stat-card">
+                    <div class="stat-value" id="statAuditEntries">{{ stats.recent_activity | default(0) }}</div>
+                    <div class="stat-label">Activity (24h)</div>
+                </div>
+                <div class="stat-card">
+                    <div class="stat-value" id="statTotalTokens">{{ stats.ai_usage.total_tokens | default(0) }}</div>
+                    <div class="stat-label">AI Tokens Used</div>
+                </div>
+            </div>
+
+            <!-- Tabs -->
+            <div class="admin-tabs">
+                <button class="admin-tab active" onclick="switchTab('users')">Users</button>
+                <button class="admin-tab" onclick="switchTab('audit')">Audit Log</button>
+                <button class="admin-tab" onclick="switchTab('ai-usage')">AI Usage</button>
+            </div>
+
+            <!-- Users Tab -->
+            <div class="admin-panel" id="usersPanel">
+                <div class="panel-header">
+                    <h3>User Management</h3>
+                    <button class="btn btn-sm" onclick="showCreateUserModal()">+ Create User</button>
+                </div>
+                <div class="admin-table-container">
+                    <table class="admin-table" id="usersTable">
+                        <thead>
+                            <tr>
+                                <th>ID</th>
+                                <th>Username</th>
+                                <th>Email</th>
+                                <th>Role</th>
+                                <th>Auth</th>
+                                <th>Last Login</th>
+                                <th>Status</th>
+                                <th>Actions</th>
+                            </tr>
+                        </thead>
+                        <tbody id="usersTableBody">
+                            <tr><td colspan="8" style="text-align: center; color: #6b7280;">Loading...</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+            </div>
+
+            <!-- Audit Log Tab -->
+            <div class="admin-panel" id="auditPanel" style="display: none;">
+                <div class="panel-header">
+                    <h3>Audit Log</h3>
+                    <div class="audit-filters">
+                        <select id="auditUserFilter" onchange="loadAuditLog()">
+                            <option value="">All Users</option>
+                        </select>
+                        <button class="btn btn-sm" onclick="loadAuditLog()">Refresh</button>
+                    </div>
+                </div>
+                <div class="admin-table-container">
+                    <table class="admin-table" id="auditTable">
+                        <thead>
+                            <tr>
+                                <th>Time</th>
+                                <th>User</th>
+                                <th>Action</th>
+                                <th>Details</th>
+                            </tr>
+                        </thead>
+                        <tbody id="auditTableBody">
+                            <tr><td colspan="4" style="text-align: center; color: #6b7280;">Loading...</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+            </div>
+
+            <!-- AI Usage Tab -->
+            <div class="admin-panel" id="aiUsagePanel" style="display: none;">
+                <div class="panel-header">
+                    <h3>AI Usage Statistics</h3>
+                    <button class="btn btn-sm" onclick="loadAiUsage()">Refresh</button>
+                </div>
+                <div class="ai-stats-grid" id="aiStatsGrid">
+                    <!-- Populated by JS -->
+                </div>
+                <h4 style="margin: 20px 0 10px;">Usage by User</h4>
+                <div class="admin-table-container">
+                    <table class="admin-table" id="aiUsageTable">
+                        <thead>
+                            <tr>
+                                <th>User</th>
+                                <th>Requests</th>
+                                <th>Total Tokens</th>
+                                <th>Last Used</th>
+                            </tr>
+                        </thead>
+                        <tbody id="aiUsageTableBody">
+                            <tr><td colspan="4" style="text-align: center; color: #6b7280;">Loading...</td></tr>
+                        </tbody>
+                    </table>
+                </div>
+            </div>
+        </div>
+
+        <div class="footer">
+            Oliver Metadata Tool v4.0.0 | Admin Dashboard
+        </div>
+    </div>
+
+    <!-- Create User Modal -->
+    <div id="createUserModal" class="modal">
+        <div class="modal-content" style="max-width: 500px;">
+            <div class="modal-header">
+                <h3>Create New User</h3>
+                <span class="close-modal" onclick="closeCreateUserModal()">&times;</span>
+            </div>
+            <div class="form-group">
+                <label for="newUsername">Username *</label>
+                <input type="text" id="newUsername" placeholder="username" required>
+            </div>
+            <div class="form-group">
+                <label for="newEmail">Email</label>
+                <input type="email" id="newEmail" placeholder="user@example.com">
+            </div>
+            <div class="form-group">
+                <label for="newFullName">Full Name</label>
+                <input type="text" id="newFullName" placeholder="Full Name">
+            </div>
+            <div class="form-group">
+                <label for="newPassword">Password (for local auth)</label>
+                <input type="password" id="newPassword" placeholder="Leave empty for SSO-only">
+            </div>
+            <div class="form-group">
+                <label for="newRole">Role</label>
+                <select id="newRole">
+                    <option value="user">User</option>
+                    <option value="admin">Admin</option>
+                </select>
+            </div>
+            <div class="form-group">
+                <label for="newAuthMethod">Auth Method</label>
+                <select id="newAuthMethod">
+                    <option value="local">Local (Password)</option>
+                    <option value="sso">SSO (Microsoft)</option>
+                </select>
+            </div>
+            <div style="display: flex; gap: 10px; margin-top: 20px;">
+                <button class="btn" onclick="createUser()">Create User</button>
+                <button class="btn" style="background: #6c757d;" onclick="closeCreateUserModal()">Cancel</button>
+            </div>
+        </div>
+    </div>
+
+    <script>const BASE_PATH = "{{ base }}";</script>
+    <script src="{{ base }}/static/js/admin.js"></script>
+</body>
+</html>
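Note on subpath deployment: the template exposes the ASGI `root_path` to the browser as `BASE_PATH`, and admin.js prefixes every request path with it. A small sketch of how that prefixing and the audit-log query string could be centralized is shown here; `buildUrl` is a hypothetical helper (it does not exist in this commit), and the `/solventum-image-metadata` prefix is taken from `.env.example` purely as an illustration.

```javascript
// Hypothetical helper: joins the deployment subpath with an endpoint and
// encodes query parameters via URLSearchParams.
function buildUrl(basePath, endpoint, params = {}) {
    const query = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
        // Skip unset filters so that "All Users" sends no user_id at all.
        if (value !== undefined && value !== null && value !== '') {
            query.set(key, String(value));
        }
    }
    const qs = query.toString();
    return basePath + endpoint + (qs ? `?${qs}` : '');
}

// Example: the audit-log request for user 7 under a deployment subpath.
const auditUrl = buildUrl('/solventum-image-metadata', '/admin/audit', {
    limit: 200,
    user_id: 7,
});
```

With an empty `BASE_PATH` (root deployment) the same call simply yields a root-relative URL, so the calling code does not need to distinguish the two cases.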
--- a/templates/index.html
+++ b/templates/index.html
@ -0,0 +1,184 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Oliver Metadata Tool</title>
+    <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap" rel="stylesheet">
+    <link rel="stylesheet" href="{{ request.scope.get('root_path', '') }}/static/css/app.css">
+</head>
+<body>
+{% set base = request.scope.get('root_path', '') %}
+    <div class="container">
+        <div class="header">
+            <h1>Oliver Metadata Tool</h1>
+            <p>Universal metadata creation and management for all file types</p>
+            <div style="position: absolute; top: 15px; right: 20px; font-size: 13px; color: #6b7280;">
+                {{ username }} | <a href="{{ base }}/logout" style="color: #FFC407;">Logout</a>
+            </div>
+        </div>
+
+        <div class="content">
+            <div class="upload-section">
+                <div class="metadata-source-selector">
+                    <label for="metadataSource">Metadata Source:</label>
+                    <select id="metadataSource" class="source-select" onchange="handleSourceChange()">
+                        <option value="import" selected>Import from File (CSV/Excel/JSON)</option>
+                        <option value="manual">Manual Entry</option>
+                        <option value="ai">AI Generation (Slower)</option>
+                    </select>
+                    <span class="source-info">Choose how to generate metadata</span>
+                </div>
+
+                <div class="import-section" id="importSection" style="display: block;">
+                    <h4 style="margin-bottom: 10px; color: #495057;">Import Metadata File</h4>
+                    <p style="font-size: 13px; color: #6c757d; margin-bottom: 10px;">
+                        Upload a CSV, Excel (.xlsx, .xls), or JSON file with metadata. You'll configure column mapping after upload.
+                    </p>
+                    <input type="file" id="importFileInput" accept=".csv,.xlsx,.xls,.json" style="display: none;">
+                    <button class="btn-import" onclick="document.getElementById('importFileInput').click()">
+                        Choose File to Import
+                    </button>
+                    <div id="importStats" class="import-stats" style="display: none;"></div>
+                </div>
+
+                <div class="template-section" id="templateSection">
+                    <h4 style="margin-bottom: 10px; color: #495057;">Metadata Templates</h4>
+                    <p style="font-size: 13px; color: #6c757d; margin-bottom: 10px;">
+                        Use templates with variables like {filename}, {date}, {user} for quick metadata generation
+                    </p>
+                    <div class="template-controls">
+                        <select id="templateSelect" class="template-select">
+                            <option value="">Select a template...</option>
+                        </select>
+                        <button class="btn-template" onclick="applyTemplate()" id="applyTemplateBtn" disabled>
+                            Apply Template
+                        </button>
+                        <button class="btn-template" onclick="showCreateTemplateModal()">
+                            Create New
+                        </button>
+                        <button class="btn-template" onclick="manageTemplates()">
+                            Manage
+                        </button>
+                    </div>
+                    <div id="templatePreview" class="template-preview"></div>
+                </div>
+
+                <div class="upload-area" id="uploadArea">
+                    <div class="upload-icon">📁</div>
+                    <h3>Drop files here or click to browse</h3>
+                    <p style="color: #6c757d; margin-top: 10px;">Supported: PDF, JPG, PNG, GIF, DOCX, XLSX, PPTX, MP4, MOV, AVI</p>
+                    <p style="color: #667eea; margin-top: 5px; font-weight: 600;">Multiple files supported!</p>
+                    <input type="file" id="fileInput" accept=".pdf,.jpg,.jpeg,.png,.gif,.docx,.xlsx,.pptx,.mp4,.mov,.avi" multiple>
+                </div>
+
+                {% if not docker_mode %}
+                <div class="output-dir-section">
+                    <label for="outputDir">Save to folder:</label>
+                    <input type="text" id="outputDir" placeholder="Leave empty to save in original location or paste folder path here" style="flex: 1;" />
+                </div>
+                <div class="output-dir-hint">
+                    <strong>How to copy folder path:</strong><br>
+                    <span style="display: inline-block; margin-top: 5px;">
+                        <strong>Mac:</strong> Right-click folder in Finder → hold Option key → click "Copy ... as Pathname"<br>
+                        <strong>Windows:</strong> Shift + Right-click folder → "Copy as path" (remove quotes after pasting)
+                    </span>
+                </div>
+                {% else %}
+                <div class="output-dir-hint" style="background: #e3f2fd; border-left: 4px solid #2196f3; padding: 12px; margin: 10px 0;">
+                    <strong>Docker Mode:</strong> Files will be updated and available for download from your browser after processing.
+                </div>
+                {% endif %}
+            </div>
+
+            <div class="progress-bar" id="progressBar">
+                <div class="progress-fill" id="progressFill">0%</div>
+            </div>
+
+            <div class="spinner" id="spinner"></div>
+            <div class="alert alert-error" id="errorAlert"></div>
+            <div class="alert alert-success" id="successAlert"></div>
+            <div class="alert alert-info" id="infoAlert"></div>
+
+            <div class="file-list" id="fileList">
+                <div class="batch-toolbar" id="batchToolbar" style="display: none;">
+                    <div class="batch-toolbar-left">
+                        <button class="btn-toolbar" onclick="selectAllFiles()">Select All</button>
+                        <button class="btn-toolbar" onclick="deselectAllFiles()">Deselect All</button>
+                        <span class="selection-count" id="selectionCount">0 selected</span>
+                    </div>
+                    <div class="batch-toolbar-right">
+                        <button class="btn-toolbar btn-export" onclick="exportResults()">Export Results</button>
+                    </div>
+                </div>
+            </div>
+
+            <div class="actions" id="actions" style="display: none;">
+                <button class="btn" id="updateAllBtn" onclick="updateAllFiles()">
+                    Update Selected Files
+                </button>
+                <button class="btn" onclick="resetForm()">
+                    Process More Files
+                </button>
+            </div>
+        </div>
+
+        <div class="footer">
+            Oliver Metadata Tool v4.0.0 | Multiple metadata sources | Import &bull; AI &bull; Manual &bull; Templates
+        </div>
+    </div>
+
+    <!-- Import Mapping Modal -->
+    <div id="importMappingModal" class="modal">
+        <div class="modal-content" style="max-width: 700px;">
+            <div class="modal-header">
+                <h3>Configure Import Mapping</h3>
+                <span class="close-modal" onclick="closeImportMappingModal()">&times;</span>
+            </div>
+            <div id="importMappingContent">
+                <!-- Will be populated dynamically -->
+            </div>
+        </div>
+    </div>
+
+    <!-- Create Template Modal -->
+    <div id="createTemplateModal" class="modal">
+        <div class="modal-content">
+            <div class="modal-header">
+                <h3>Create Metadata Template</h3>
+                <span class="close-modal" onclick="closeCreateTemplateModal()">&times;</span>
+            </div>
+            <div class="form-group">
+                <label for="templateName">Template Name *</label>
+                <input type="text" id="templateName" placeholder="e.g., Product Brochure Template" required>
+            </div>
+            <div class="form-group">
+                <label for="templateDescription">Description</label>
+                <input type="text" id="templateDescription" placeholder="Optional description of this template">
+            </div>
+            <div class="form-group">
+                <label for="templateTitle">Title Template *</label>
+                <input type="text" id="templateTitle" placeholder="e.g., {filename} - Product Guide">
+                <div class="variable-hint">
+                    Available variables: {filename}, {date}, {datetime}, {user}, {year}, {month}, {day}
+                </div>
+            </div>
+            <div class="form-group">
+                <label for="templateSubject">Subject Template *</label>
+                <textarea id="templateSubject" placeholder="e.g., Product information guide for {filename}"></textarea>
+            </div>
+            <div class="form-group">
+                <label for="templateKeywords">Keywords Template *</label>
+                <input type="text" id="templateKeywords" placeholder="e.g., product, guide, {year}">
+            </div>
+            <div style="display: flex; gap: 10px; margin-top: 20px;">
+                <button class="btn" onclick="saveNewTemplate()">Save Template</button>
+                <button class="btn" style="background: #6c757d;" onclick="closeCreateTemplateModal()">Cancel</button>
+            </div>
+        </div>
+    </div>
+
+    <script>const BASE_PATH = "{{ base }}";</script>
+    <script src="{{ base }}/static/js/app.js"></script>
+</body>
+</html>
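Note on template variables: the Create Template modal advertises `{filename}`, `{date}`, `{datetime}`, `{user}`, `{year}`, `{month}` and `{day}`, but the expansion logic itself is not visible in this part of the diff. The sketch below shows one way such placeholders could be expanded. `expandTemplate` and `pad2` are hypothetical names, and the `YYYY-MM-DD` and `YYYY-MM-DD HH:MM` formats are assumptions rather than the app's confirmed behavior.

```javascript
// Hypothetical sketch; the app's real expansion logic is not shown in this diff.
function pad2(n) {
    return String(n).padStart(2, '0');
}

function expandTemplate(template, { filename, user, now = new Date() } = {}) {
    const year = String(now.getFullYear());
    const month = pad2(now.getMonth() + 1);   // getMonth() is zero-based
    const day = pad2(now.getDate());
    const date = `${year}-${month}-${day}`;   // assumed date format
    const values = {
        filename,
        user,
        year,
        month,
        day,
        date,
        // Assumed datetime format: date plus local hours and minutes.
        datetime: `${date} ${pad2(now.getHours())}:${pad2(now.getMinutes())}`,
    };
    // Replace known {name} placeholders; leave unknown or unset ones untouched.
    return template.replace(/\{(\w+)\}/g, (match, name) => {
        const known = Object.prototype.hasOwnProperty.call(values, name);
        const value = known ? values[name] : undefined;
        return value === undefined ? match : String(value);
    });
}
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes typos in a template visible in the generated metadata.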
--- a/templates/login.html
+++ b/templates/login.html
@ -0,0 +1,302 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Login - Oliver Metadata Tool</title>
+    <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap" rel="stylesheet">
+    <style>
+        :root {
+            --primary-gold: #FFC407;
+            --primary-gold-dark: #e6b007;
+            --primary-gold-light: #ffcf33;
+            --dark-primary: #2c2c2c;
+            --dark-secondary: #1a1a1a;
+            --white: #ffffff;
+            --text-primary: #1f2937;
+            --text-muted: #6b7280;
+            --overlay-light: rgba(255, 255, 255, 0.95);
+            --border-light: rgba(255, 255, 255, 0.2);
+            --shadow-lg: 0 20px 40px rgba(0, 0, 0, 0.1);
+            --radius-md: 12px;
+            --radius-xl: 20px;
+            --font-family: 'Montserrat', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+            --transition-fast: 0.15s ease;
+        }
+
+        * { margin: 0; padding: 0; box-sizing: border-box; }
+
+        @keyframes shimmer {
+            0% { transform: translateX(-100%); }
+            100% { transform: translateX(100%); }
+        }
+
+        @keyframes pulse {
+            0%, 100% { transform: scale(1); }
+            50% { transform: scale(1.05); }
+        }
+
+        body {
+            font-family: var(--font-family);
+            background: linear-gradient(135deg, var(--dark-primary) 0%, var(--dark-secondary) 100%);
+            min-height: 100vh;
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            padding: 20px;
+        }
+
+        .login-container {
+            background: var(--overlay-light);
+            backdrop-filter: blur(20px);
+            border-radius: var(--radius-xl);
+            box-shadow: var(--shadow-lg);
+            border: 1px solid var(--border-light);
+            width: 100%;
+            max-width: 450px;
+            padding: 40px;
+        }
+
+        .logo {
+            text-align: center;
+            margin-bottom: 30px;
+            position: relative;
+        }
+
+        .logo h1 {
+            color: var(--primary-gold-dark);
+            font-size: 32px;
+            margin-bottom: 10px;
+            font-weight: 700;
+            text-shadow: 0 2px 4px rgba(255, 196, 7, 0.2);
+        }
+
+        .logo p {
+            color: var(--text-muted);
+            font-size: 14px;
+            font-weight: 500;
+        }
+
+        .divider {
+            text-align: center;
+            margin: 30px 0;
+            position: relative;
+        }
+
+        .divider::before {
+            content: '';
+            position: absolute;
+            left: 0;
+            right: 0;
+            top: 50%;
+            height: 2px;
+            background: linear-gradient(90deg, transparent, var(--primary-gold-light), transparent);
+        }
+
+        .divider span {
+            background: var(--overlay-light);
+            padding: 0 15px;
+            color: var(--text-muted);
+            font-size: 13px;
+            font-weight: 600;
+            position: relative;
+            z-index: 1;
+        }
+
+        .form-group {
+            margin-bottom: 20px;
+        }
+
+        .form-group label {
+            display: block;
+            font-weight: 600;
+            color: var(--text-primary);
+            margin-bottom: 8px;
+            font-size: 14px;
+        }
+
+        .form-group input {
+            width: 100%;
+            padding: 12px;
+            border: 2px solid #dee2e6;
+            border-radius: var(--radius-md);
+            font-size: 14px;
+            font-family: var(--font-family);
+            transition: all var(--transition-fast);
+        }
+
+        .form-group input:focus {
+            outline: none;
+            border-color: var(--primary-gold);
+            box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.1);
+        }
+
+        .btn {
+            width: 100%;
+            padding: 14px;
+            border: none;
+            border-radius: var(--radius-md);
+            font-size: 16px;
+            font-weight: 600;
+            font-family: var(--font-family);
+            cursor: pointer;
+            transition: all var(--transition-fast);
+        }
+
+        .btn:hover {
+            transform: translateY(-2px);
+        }
+
+        .btn-primary {
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            color: var(--dark-secondary);
+            margin-bottom: 15px;
+            box-shadow: 0 4px 12px rgba(255, 196, 7, 0.3);
+        }
+
+        .btn-primary:hover {
+            box-shadow: 0 6px 16px rgba(255, 196, 7, 0.4);
+        }
+
+        .btn-sso {
+            background: var(--white);
+            color: var(--text-primary);
+            border: 2px solid var(--primary-gold);
+            text-decoration: none;
+            display: block;
+            text-align: center;
+        }
+
+        .btn-sso:hover {
+            border-color: var(--primary-gold-dark);
+            background: #fffbf0;
+            color: var(--primary-gold-dark);
+        }
+
+        .alert {
+            padding: 12px;
+            border-radius: var(--radius-md);
+            margin-bottom: 20px;
+            font-size: 14px;
+            font-weight: 500;
+        }
+
+        .alert-error {
+            background: #fee;
+            color: #c33;
+            border: 2px solid #fcc;
+        }
+
+        .alert-info {
+            background: #fffbf0;
+            color: var(--primary-gold-dark);
+            border: 2px solid var(--primary-gold-light);
+        }
+
+        .test-user-info {
+            background: #fffbf0;
+            border: 2px dashed var(--primary-gold);
+            border-radius: var(--radius-md);
+            padding: 15px;
+            margin-bottom: 20px;
+            font-size: 13px;
+            color: var(--text-primary);
+            animation: pulse 3s infinite;
+        }
+
+        .test-user-info strong {
+            color: var(--primary-gold-dark);
+            font-weight: 600;
+        }
+
+        .test-user-info code {
+            background: rgba(255, 196, 7, 0.15);
+            padding: 2px 6px;
+            border-radius: 4px;
+            font-family: 'Courier New', monospace;
+            color: var(--primary-gold-dark);
+            font-weight: 600;
+        }
+
+        .footer-text {
+            text-align: center;
+            margin-top: 20px;
+            font-size: 12px;
+            color: var(--text-muted);
+            font-weight: 500;
+        }
+
+        .microsoft-icon {
+            display: inline-block;
+            margin-right: 8px;
+        }
+    </style>
+</head>
+<body>
+{% set base = request.scope.get('root_path', '') %}
+    <div class="login-container">
+        <div class="logo">
+            <h1>Oliver Metadata Tool</h1>
+            <p>Sign in to continue</p>
+        </div>
+
+        {% if error %}
+        <div class="alert alert-error">
+            {{ error }}
+        </div>
+        {% endif %}
+
+        {% if info %}
+        <div class="alert alert-info">
+            {{ info }}
+        </div>
+        {% endif %}
+
+        {% if enable_test_user %}
+        <div class="test-user-info">
+            <strong>Test Account</strong><br>
+            Username: <code>tester</code><br>
+            Password: <code>oliveradmin</code>
+        </div>
+        {% endif %}
+
+        <form method="POST" action="{{ base }}/login">
+            <div class="form-group">
+                <label for="username">Username</label>
+                <input type="text" id="username" name="username" required autofocus placeholder="Enter your username">
+            </div>
+
+            <div class="form-group">
+                <label for="password">Password</label>
+                <input type="password" id="password" name="password" required placeholder="Enter your password">
+            </div>
+
+            <button type="submit" class="btn btn-primary">
+                Sign In
+            </button>
+        </form>
+
+        {% if sso_enabled %}
+        <div class="divider">
+            <span>OR</span>
+        </div>
+
+        <a href="{{ base }}/login/microsoft" class="btn btn-sso">
+            <span class="microsoft-icon">
+                <svg width="20" height="20" viewBox="0 0 23 23" style="vertical-align: middle;">
+                    <path fill="#f25022" d="M1 1h10v10H1z"/>
+                    <path fill="#00a4ef" d="M12 1h10v10H12z"/>
+                    <path fill="#7fba00" d="M1 12h10v10H1z"/>
+                    <path fill="#ffb900" d="M12 12h10v10H12z"/>
+                </svg>
+            </span>
+            Sign in with Microsoft
+        </a>
+        {% endif %}
+
+        <div class="footer-text">
+            Oliver Metadata Tool v{{ app_version | default('4.0.0') }} | Enterprise Edition
+        </div>
+    </div>
+</body>
+</html>
--- a/tests/__init__.py
+++ b/tests/__init__.py
--- a/tests/conftest.py
+++ b/tests/conftest.py
@ -0,0 +1,95 @@
+"""Test fixtures for Oliver Metadata Tool."""
+
+import os
+import tempfile
+import shutil
+from pathlib import Path
+
+import pytest
+from fastapi.testclient import TestClient
+
+# Set test environment BEFORE importing app
+os.environ["SECRET_KEY"] = "test-secret-key-for-testing-only"
+os.environ["ENABLE_TEST_USER"] = "true"
+os.environ["DOCKER_MODE"] = "false"
+os.environ["OPENAI_API_KEY"] = ""  # No AI in tests
+
+
+@pytest.fixture(scope="session")
+def temp_dir():
+    """Create a temporary directory for test artifacts."""
+    d = tempfile.mkdtemp(prefix="oliver_test_")
+    yield d
+    shutil.rmtree(d, ignore_errors=True)
+
+
+@pytest.fixture(scope="session")
+def app(temp_dir):
+    """Create test FastAPI application."""
+    os.environ["UPLOAD_FOLDER"] = str(Path(temp_dir) / "uploads")
+    os.environ["DB_PATH"] = str(Path(temp_dir) / "test.db")
+    os.environ["SESSION_DB_PATH"] = str(Path(temp_dir) / "test_sessions.db")
+    os.environ["TEMPLATES_DIR"] = str(Path(__file__).parent.parent / "templates")
+
+    # Force settings reload
+    from app.config import get_settings
+    import app.config as config_module
+    config_module._settings = None
+
+    from app.main import create_app
+    return create_app()
+
+
+@pytest.fixture(scope="session")
+def client(app):
+    """Create test HTTP client."""
+    return TestClient(app)
+
+
+@pytest.fixture
+def auth_client(client):
+    """Authenticated test client (logged in as tester)."""
+    response = client.post(
+        "/login",
+        data={"username": "tester", "password": "oliveradmin"},
+        follow_redirects=False,
+    )
+    assert response.status_code == 302
+    yield client
+    client.cookies.clear()  # drop the login so the shared session-scoped client starts clean
+
+
+@pytest.fixture
+def sample_pdf(temp_dir):
+    """Create a minimal PDF for testing."""
+    pdf_path = Path(temp_dir) / "test.pdf"
+    # Minimal valid PDF
+    pdf_content = b"""%PDF-1.4
+1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
+2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj
+3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R>>endobj
+xref
+0 4
+0000000000 65535 f
+0000000009 00000 n
+0000000058 00000 n
+0000000115 00000 n
+trailer<</Size 4/Root 1 0 R>>
+startxref
+190
+%%EOF"""
+    pdf_path.write_bytes(pdf_content)
+    return str(pdf_path)
+
+
+@pytest.fixture
+def sample_csv(temp_dir):
+    """Create a sample CSV for import testing."""
+    csv_path = Path(temp_dir) / "metadata.csv"
+    csv_path.write_text(
+        "filename,title,subject,keywords\n"
+        "test.pdf,Test Title,Test Subject,keyword1 keyword2\n"
+        "image.jpg,Image Title,Image Subject,photo landscape\n",
+        encoding="utf-8",
+    )
+    return str(csv_path)
--- a/tests/test_admin.py
+++ b/tests/test_admin.py
@ -0,0 +1,30 @@
+"""Tests for admin endpoints."""
+
+
+class TestAdminAccess:
+    def test_admin_requires_auth(self, client):
+        """GET /admin requires authentication."""
+        client.cookies.clear()
+        response = client.get("/admin", follow_redirects=False)
+        assert response.status_code == 302
+
+    def test_admin_requires_admin_role(self, auth_client):
+        """GET /admin returns 403 for non-admin users."""
+        response = auth_client.get("/admin")
+        # tester user has role='user', should get 403
+        assert response.status_code == 403
+
+    def test_admin_users_requires_admin(self, auth_client):
+        """GET /admin/users returns 403 for non-admin users."""
+        response = auth_client.get("/admin/users")
+        assert response.status_code == 403
+
+    def test_admin_audit_requires_admin(self, auth_client):
+        """GET /admin/audit returns 403 for non-admin users."""
+        response = auth_client.get("/admin/audit")
+        assert response.status_code == 403
+
+    def test_admin_ai_usage_requires_admin(self, auth_client):
+        """GET /admin/ai-usage returns 403 for non-admin users."""
+        response = auth_client.get("/admin/ai-usage")
+        assert response.status_code == 403
--- a/tests/test_auth.py
+++ b/tests/test_auth.py
@ -0,0 +1,68 @@
+"""Tests for authentication endpoints."""
+
+
+class TestLoginPage:
+    def test_login_page_renders(self, client):
+        """GET /login returns login form."""
+        response = client.get("/login")
+        assert response.status_code == 200
+        assert "login" in response.text.lower()
+
+    def test_unauthenticated_redirect(self, client):
+        """Unauthenticated access to / redirects to /login."""
+        response = client.get("/", follow_redirects=False)
+        assert response.status_code == 302
+        assert "/login" in response.headers.get("location", "")
+
+
+class TestLogin:
+    def test_login_success(self, client):
+        """POST /login with valid credentials redirects to /."""
+        response = client.post(
+            "/login",
+            data={"username": "tester", "password": "oliveradmin"},
+            follow_redirects=False,
+        )
+        assert response.status_code == 302
+        assert response.headers.get("location") == "/"
+
+    def test_login_wrong_password(self, client):
+        """POST /login with wrong password shows error."""
+        response = client.post(
+            "/login",
+            data={"username": "tester", "password": "wrongpass"},
+        )
+        assert response.status_code == 200
+        # Should show error message on the login page
+        assert any(word in response.text.lower() for word in ("error", "invalid", "incorrect"))
+
+    def test_login_empty_fields(self, client):
+        """POST /login with empty fields shows error."""
+        response = client.post(
+            "/login",
+            data={"username": "", "password": ""},
+        )
+        assert response.status_code == 200
+
+
+class TestLogout:
+    def test_logout_redirects(self, auth_client):
+        """GET /logout redirects to /login."""
+        response = auth_client.get("/logout", follow_redirects=False)
+        assert response.status_code == 302
+        assert "/login" in response.headers.get("location", "")
+
+
+class TestProtectedRoutes:
+    def test_index_requires_auth(self, client):
+        """/ requires authentication."""
+        # Clear any existing session
+        client.cookies.clear()
+        response = client.get("/", follow_redirects=False)
+        assert response.status_code == 302
+
+    def test_index_accessible_when_authenticated(self, auth_client):
+        """/ is accessible after login."""
+        response = auth_client.get("/")
+        assert response.status_code == 200
+        assert "Oliver Metadata Tool" in response.text
--- a/tests/test_imports.py
+++ b/tests/test_imports.py
@ -0,0 +1,36 @@
+"""Tests for import endpoints."""
+
+
+class TestImport:
+    def test_import_csv(self, auth_client, sample_csv):
+        """POST /import-metadata with CSV file returns columns and sample data."""
+        with open(sample_csv, "rb") as f:
+            response = auth_client.post(
+                "/import-metadata",
+                files={"import_file": ("metadata.csv", f, "text/csv")},
+            )
+        data = response.json()
+        assert data.get("success") is True
+        assert "columns" in data
+        assert "filename" in data["columns"]
+        assert "title" in data["columns"]
+        assert len(data["sample_data"]) > 0
+
+    def test_import_unsupported_format(self, auth_client, temp_dir):
+        """POST /import-metadata with unsupported file returns error."""
+        import io
+        response = auth_client.post(
+            "/import-metadata",
+            files={"import_file": ("data.txt", io.BytesIO(b"hello"), "text/plain")},
+        )
+        assert response.status_code == 400 or "error" in response.json()
+
+    def test_import_requires_auth(self, client):
+        """POST /import-metadata requires authentication."""
+        client.cookies.clear()
+        response = client.post(
+            "/import-metadata",
+            files={"import_file": ("data.csv", b"a,b\n1,2", "text/csv")},
+            follow_redirects=False,
+        )
+        assert response.status_code == 302
--- a/tests/test_session_store.py
+++ b/tests/test_session_store.py
@ -0,0 +1,95 @@
+"""Tests for the SQLite-backed session store."""
+
+import tempfile
+import os
+from pathlib import Path
+
+import pytest
+
+from app.session.store import SessionStore
+
+
+@pytest.fixture
+def store():
+    """Create a temporary session store."""
+    fd, path = tempfile.mkstemp(suffix=".db")
+    os.close(fd)
+    s = SessionStore(db_path=path)
+    yield s
+    os.unlink(path)
+
+
+class TestFileSession:
+    def test_create_and_get(self, store):
+        """Create and retrieve a file session."""
+        sid = store.create_file_session(user_id=1, metadata_source="manual")
+        assert sid
+        session = store.get_file_session(sid)
+        assert session is not None
+        assert session["user_id"] == 1
+        assert session["files"] == []
+
+    def test_add_file_to_session(self, store):
+        """Add files to a session."""
+        sid = store.create_file_session(user_id=1)
+        store.add_file_to_session(sid, {"filename": "test.pdf", "success": True})
+        store.add_file_to_session(sid, {"filename": "img.jpg", "success": True})
+
+        session = store.get_file_session(sid)
+        assert len(session["files"]) == 2
+        assert session["files"][0]["filename"] == "test.pdf"
+
+    def test_update_file_in_session(self, store):
+        """Update a specific file entry."""
+        sid = store.create_file_session(user_id=1)
+        store.add_file_to_session(sid, {"filename": "test.pdf", "status": "pending"})
+        store.update_file_in_session(sid, 0, {"status": "complete", "metadata": {"title": "T"}})
+
+        session = store.get_file_session(sid)
+        assert session["files"][0]["status"] == "complete"
+        assert session["files"][0]["metadata"]["title"] == "T"
+
+    def test_delete_session(self, store):
+        """Delete a file session."""
+        sid = store.create_file_session(user_id=1)
+        store.delete_file_session(sid)
+        assert store.get_file_session(sid) is None
+
+    def test_session_id_is_secure(self, store):
+        """Session IDs should be cryptographically random."""
+        ids = [store.create_file_session(user_id=1) for _ in range(5)]
+        assert len(set(ids)) == 5  # All unique
+        for sid in ids:
+            assert len(sid) > 20  # Long enough for security
+
+
+class TestImportSession:
+    def test_create_import_session(self, store):
+        """Create and retrieve an import session."""
+        sid = store.create_import_session(
+            user_id=1,
+            session_type="import",
+            file_info={"path": "/tmp/test.csv", "filename": "test.csv"},
+        )
+        session = store.get_import_session(sid)
+        assert session is not None
+        assert session["file_info"]["filename"] == "test.csv"
+
+    def test_update_import_metadata_map(self, store):
+        """Update import session with metadata map."""
+        sid = store.create_import_session(user_id=1, session_type="import")
+        metadata_map = {"test": {"title": "Test Title", "subject": "Test Subject"}}
+        store.update_import_session(sid, metadata_map=metadata_map)
+
+        session = store.get_import_session(sid)
+        assert session["metadata_map"]["test"]["title"] == "Test Title"
+
+
+class TestCleanup:
+    def test_cleanup_expired(self, store):
+        """Cleanup removes expired sessions."""
+        # Create a session with 0 hours expiry (immediately expired)
+        sid = store.create_file_session(user_id=1, expires_hours=0)
+        count = store.cleanup_expired()
+        assert count >= 1
+        assert store.get_file_session(sid) is None
--- a/tests/test_templates.py
+++ b/tests/test_templates.py
@ -0,0 +1,93 @@
+"""Tests for template management endpoints."""
+
+import json
+
+
+class TestTemplates:
+    def test_list_templates(self, auth_client):
+        """GET /templates/list returns template list."""
+        response = auth_client.get("/templates/list")
+        data = response.json()
+        assert data.get("success") is True
+        assert "templates" in data
+
+    def test_save_template(self, auth_client):
+        """POST /templates/save creates a new template."""
+        response = auth_client.post(
+            "/templates/save",
+            content=json.dumps({
+                "name": "Test Template",
+                "title": "{filename} - Test",
+                "subject": "Test subject for {filename}",
+                "keywords": "test, {year}",
+                "description": "A test template",
+            }),
+            headers={"Content-Type": "application/json"},
+        )
+        data = response.json()
+        assert data.get("success") is True
+
+    def test_load_template(self, auth_client):
+        """GET /templates/load/{name} loads a template."""
+        # First save, then load
+        auth_client.post(
+            "/templates/save",
+            content=json.dumps({
+                "name": "LoadTest",
+                "title": "{filename}",
+                "subject": "Subject",
+                "keywords": "kw",
+            }),
+            headers={"Content-Type": "application/json"},
+        )
+        response = auth_client.get("/templates/load/LoadTest")
+        data = response.json()
+        assert data.get("success") is True
+        assert data["template"]["name"] == "LoadTest"
+
+    def test_load_nonexistent_template(self, auth_client):
+        """GET /templates/load/{name} returns 404 for missing template."""
+        response = auth_client.get("/templates/load/NonExistent12345")
+        assert response.status_code == 404
+
+    def test_save_template_empty_name(self, auth_client):
+        """POST /templates/save with empty name returns error."""
+        response = auth_client.post(
+            "/templates/save",
+            content=json.dumps({"name": "", "title": "t", "subject": "s", "keywords": "k"}),
+            headers={"Content-Type": "application/json"},
+        )
+        assert response.status_code == 400
+
+    def test_delete_template(self, auth_client):
+        """DELETE /templates/delete/{name} removes a template."""
+        # Create first
+        auth_client.post(
+            "/templates/save",
+            content=json.dumps({
+                "name": "DeleteMe",
+                "title": "t",
+                "subject": "s",
+                "keywords": "k",
+            }),
+            headers={"Content-Type": "application/json"},
+        )
+        response = auth_client.delete("/templates/delete/DeleteMe")
+        data = response.json()
+        assert data.get("success") is True
+
+    def test_preview_template(self, auth_client):
+        """POST /templates/preview returns preview output."""
+        response = auth_client.post(
+            "/templates/preview",
+            content=json.dumps({
+                "title": "{filename} - Preview",
+                "subject": "Subject for {filename}",
+                "keywords": "test, {year}",
+                "sample_filename": "example.pdf",
+            }),
+            headers={"Content-Type": "application/json"},
+        )
+        data = response.json()
+        assert data.get("success") is True
+        assert "preview" in data
--- a/tests/test_upload.py
+++ b/tests/test_upload.py
@ -0,0 +1,52 @@
+"""Tests for upload endpoints."""
+
+import io
+from pathlib import Path
+
+
+class TestUpload:
+    def test_upload_no_files(self, auth_client):
+        """POST /upload with no files returns error."""
+        response = auth_client.post(
+            "/upload",
+            data={"metadata_source": "manual"},
+            files={"files": ("", b"", "application/octet-stream")},
+        )
+        assert response.status_code == 400
+
+    def test_upload_manual_source(self, auth_client, sample_pdf):
+        """POST /upload with manual source processes file."""
+        with open(sample_pdf, "rb") as f:
+            response = auth_client.post(
+                "/upload",
+                data={"metadata_source": "manual"},
+                files={"files": ("test.pdf", f, "application/pdf")},
+            )
+        data = response.json()
+        assert data.get("success") is True
+        assert "session_id" in data
+        assert len(data["files"]) == 1
+
+    def test_upload_response_no_filepath(self, auth_client, sample_pdf):
+        """API response should not expose server file paths."""
+        with open(sample_pdf, "rb") as f:
+            response = auth_client.post(
+                "/upload",
+                data={"metadata_source": "manual"},
+                files={"files": ("test.pdf", f, "application/pdf")},
+            )
+        data = response.json()
+        for file_result in data.get("files", []):
+            assert "filepath" not in file_result
+
+
+class TestUploadExcel:
+    def test_upload_excel_requires_auth(self, client):
+        """POST /upload-excel requires authentication."""
+        client.cookies.clear()
+        response = client.post(
+            "/upload-excel",
+            files={"excel_file": ("test.xlsx", b"fake", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")},
+            follow_redirects=False,
+        )
+        assert response.status_code == 302
--- a/web_app.py
+++ b/web_app.py
@ -0,0 +1 @@
+"""Content extractors for different file types."""
@ -0,0 +1 @@
+"""Metadata updaters for different file types."""