v3.1 Enterprise Edition: Excel/Import mapping, UI fixes, documentation update

Features:
- Smart column mapping for Excel and Import files (CSV/Excel/JSON)
- Modal dialogs for configuring sheet and column mappings
- Auto-detection of common column names (filename, title, description, keywords)
- Preview of first 3 rows before confirming mapping
- Case-insensitive filename matching without extension

UI Improvements:
- Fixed output folder selection (now uses text input instead of folder browser)
- Removed non-functional Reset button from metadata editor
- Clear button for output folder path

Documentation:
- Updated README.md with v3.1 Enterprise Edition information
- Developer: Vadym Samoilenko
- License: Corporate License - Oliver Marketing
- Added AI usage tracking and logging documentation
- Complete installation guide with all dependencies
- API endpoint documentation
- Security and privacy section
- Troubleshooting guide

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 17:06:18 +00:00 · 804c8acbbb
commit 804c8acbbb
parent e9784d7da8
5 changed files with 1849 additions and 221 deletions
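The commit message mentions auto-detection of common column names and case-insensitive filename matching without extension. A minimal sketch of how those two ideas can fit together is shown here. The alias table and the function names (`guess_column_mapping`, `build_lookup`, `find_metadata`) are hypothetical and are not taken from this commit; the real logic lives in `web_app.py` and `src/field_mapper.py`, and the README describes the latter as fuzzy matching, which this exact-alias sketch does not attempt.

```python
from pathlib import Path

# Hypothetical alias table: header spellings treated as equivalent per field.
COMMON_ALIASES = {
    "filename": ("filename", "file name", "file"),
    "title": ("title",),
    "description": ("description", "desc", "alt text"),
    "keywords": ("keywords", "tags"),
}


def guess_column_mapping(columns):
    """Map each metadata field to the first column whose header matches an alias."""
    normalized = {str(col).strip().lower(): col for col in columns}
    mapping = {}
    for field, aliases in COMMON_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                mapping[field] = normalized[alias]
                break
    return mapping


def build_lookup(rows, mapping):
    """Index rows by lowercased filename stem (extension dropped)."""
    lookup = {}
    for row in rows:
        raw = str(row.get(mapping["filename"], "")).strip()
        if raw:
            lookup[Path(raw).stem.lower()] = row
    return lookup


def find_metadata(lookup, filename):
    """Return the row for a file, ignoring case and extension, or None."""
    return lookup.get(Path(filename).stem.lower())
```

With this indexing, a spreadsheet row for `Report_Q1.PDF` would also match an uploaded `report_q1.docx`, because only the lowercased stem is compared. Two rows that share a stem overwrite each other, so the last row wins.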
--- a/.gitignore
+++ b/.gitignore
@@ -51,6 +51,7 @@ Thumbs.db
 # Python virtual environments
 venv/
 venv_new/
+venv_local/
 env/
 ENV/
 .venv/
@@ -76,3 +77,19 @@ Files/
 .vscode/
 .claude/

+# Database files
+*.db
+*.sqlite
+*.sqlite3
+
+# Server files
+server.pid
+server.log
+nohup.out
+
+# Test files
+test_*.csv
+test_*.xlsx
+test_*.json
+TEST_REPORT.md
+
--- a/README.md
+++ b/README.md
@@ -1,97 +1,486 @@
-# Oliver Metadata Tool
+# Oliver Metadata Tool v3.1 Enterprise Edition

-Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface.
+Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface, user authentication, and AI-powered metadata generation.
+
+**Developer:** Vadym Samoilenko
+**License:** Corporate License - Oliver Marketing
+**Version:** 3.1 (Enterprise Edition)
+
+---

 ## Features

- **Excel-based metadata lookup**: Reads metadata from "Celum ID to Adobe Asset Path Mapping Spreadsheet"
- **Multi-format support**: PDF, images (JPG, PNG, etc.), Office documents (Word, Excel, PowerPoint), video files
- **Unicode support**: Full support for Chinese, Japanese, Korean characters (CGA region)
- **OCR capabilities**: Multi-language text extraction with Tesseract
- **Web interface**: Flask-based UI for easy batch processing
- **Dual-sheet Excel lookup**: Primary lookup from DSB sheet, fallback to Medsurg sheet
+### Multiple Metadata Sources
+- **📊 Excel Lookup**: Configure custom Excel files with column mapping
+- **🤖 AI Generation**: OpenAI-powered intelligent metadata generation
+- **✏️ Manual Entry**: Direct editing with real-time validation
+- **📂 File Import**: Import from CSV, Excel, or JSON with custom mapping
+- **📋 Templates**: Reusable metadata templates with variables
+
+### Enterprise Features
+- **🔐 Authentication**: Local user authentication + Microsoft SSO support
+- **👥 User Management**: SQLite database for users and sessions
+- **📊 Audit Logging**: Track all user actions and metadata changes
+- **🔍 AI Usage Tracking**: Monitor OpenAI token usage and costs
+
+### File Support
+- **300+ File Formats** via ExifTool integration
+- **PDF Files**: Full metadata support (title, subject, keywords, author, copyright)
+- **Images**: JPEG, PNG, GIF, HEIC, TIFF, RAW formats
+- **Office Documents**: Word, Excel, PowerPoint
+- **Video Files**: MP4, MOV, AVI, MKV
+- **Unicode Support**: Full support for Chinese, Japanese, Korean characters
+
+### Advanced Capabilities
+- **Smart Field Mapping**: Auto-detect columns with fuzzy matching
+- **Batch Processing**: Process multiple files with selective updates
+- **Custom Metadata Fields**: Add unlimited custom fields
+- **CSV Export**: Export metadata and processing results
+- **Template Variables**: {filename}, {date}, {user}, custom variables
+
+---

 ## Requirements

- Python 3.8+
- Tesseract OCR (for image text extraction)
- Poppler (for PDF processing)
- **ExifTool 12.15+** (recommended - enables 300+ file formats and improved performance)
+### System Dependencies
+- **Python 3.8+**
+- **ExifTool 12.15+** (required for 300+ format support)
+- **Tesseract OCR** (optional - for image text extraction)
+- **Poppler** (optional - for PDF content extraction)
+
+### Python Dependencies
+All listed in `requirements.txt`:
+- Flask 2.3.0+ (Web framework)
+- pandas, openpyxl (Excel/CSV processing)
+- PyExifTool 0.5.6+ (Metadata operations)
+- openai 1.0.0+ (AI generation)
+- tiktoken 0.5.0+ (Token counting)
+- tenacity 8.2.0+ (Retry logic)
+- msal (Microsoft SSO - optional)
+
+---

 ## Installation

-1. Install system dependencies:
-```bash
-# macOS
-brew install tesseract tesseract-lang poppler exiftool
+### 1. Install System Dependencies

-# Linux (Ubuntu/Debian)
-sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils libimage-exiftool-perl
+**macOS:**
+```bash
+brew install exiftool tesseract tesseract-lang poppler
 ```

-**Note:** ExifTool is optional but highly recommended. It provides:
- Support for 300+ file formats
- 10-60x faster batch operations
- Better PDF metadata writing
- See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions
-
-2. Create virtual environment and install Python packages:
+**Linux (Ubuntu/Debian):**
+```bash
+sudo apt-get install libimage-exiftool-perl tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils
+```
+
+**Windows:**
+```bash
+# Install ExifTool from: https://exiftool.org/
+choco install exiftool tesseract
+```
+
+**Verify ExifTool Installation:**
+```bash
+exiftool -ver
+# Should show version 12.15 or higher
+```
+
+See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions.
+
+### 2. Create Virtual Environment
+
+```bash
+python3 -m venv venv_local
+source venv_local/bin/activate  # On Windows: venv_local\Scripts\activate
+```
+
+### 3. Install Python Dependencies
+
 ```bash
-python3 -m venv venv
-source venv/bin/activate  # On Windows: venv\Scripts\activate
 pip install -r requirements.txt
 ```

-3. Set up environment variables (create `.env` file):
+### 4. Configure Environment Variables
+
+Create a `.env` file in the project root:
+
+```env
+# Required: OpenAI API Key (for AI metadata generation)
+OPENAI_API_KEY=your-openai-api-key-here
+
+# Optional: Microsoft SSO (for enterprise authentication)
+# AZURE_CLIENT_ID=your-azure-client-id
+# AZURE_CLIENT_SECRET=your-azure-client-secret
+# AZURE_TENANT_ID=your-azure-tenant-id
+# REDIRECT_URI=http://localhost:5001/auth/callback
+
+# Optional: Flask secret key (auto-generated if not set)
+# SECRET_KEY=your-secret-key-here
+
+# Optional: AI settings (defaults shown)
+# AI_MODEL=gpt-4o-mini
+# MAX_TOKENS=500
+# TEMPERATURE=0.5
+# API_TIMEOUT=30
+# API_MAX_RETRIES=3
 ```
-UPLOAD_FOLDER=uploads
-OUTPUT_FOLDER=output
-TESSERACT_PATH=/opt/homebrew/bin/tesseract
-OCR_LANGUAGES=eng+chi_sim+chi_tra+jpn+kor
+
+### 5. Initialize Database
+
+The database will be created automatically on first run. To manually initialize:
+
+```bash
+python -c "from src.database import Database; db = Database(); print('Database initialized')"
 ```

+---
+
 ## Usage

-### Web Interface
+### Starting the Web Application

 ```bash
 python web_app.py
 ```

-Open browser at `http://localhost:5001`
+The application will:
+1. ✅ Check for ExifTool availability
+2. ✅ Initialize SQLite database (users, sessions, audit_log)
+3. ✅ Start Flask server on http://localhost:5001
+4. 🌐 Open browser automatically

-### GUI Application
+### Login

-```bash
-python run_gui.py
+**Test Account:**
+- Username: `tester`
+- Password: `oliveradmin`
+
+**Microsoft SSO** (if configured):
+- Click "Sign in with Microsoft" button
+- Authenticate via Azure AD
+- Users auto-created on first login
+
+### Using Metadata Sources
+
+#### 1. Excel Lookup
+1. Click "Upload Excel File"
+2. Configure mapping modal:
+   - Select sheet name
+   - Map columns: Filename (required), Title, Description, Keywords
+   - Preview first 3 rows
+3. Confirm mapping
+4. Upload files to process
+
+#### 2. AI Generation
+1. Select "AI Generation" from metadata source dropdown
+2. Upload files
+3. AI generates metadata (10-30 seconds per file)
+4. Review and edit generated metadata
+5. Save changes
+
+#### 3. Manual Entry
+1. Select "Manual Entry"
+2. Upload files
+3. Fill in metadata fields manually
+4. Save changes
+
+#### 4. Import from File
+1. Click "Import from File"
+2. Upload CSV/Excel/JSON file
+3. Configure column mapping (same as Excel)
+4. Upload files to match metadata
+
+#### 5. Templates
+1. Create template with variables
+2. Select template from dropdown
+3. Apply to selected files
+4. Review and save
+
+### Batch Operations
+
+1. Upload multiple files
+2. Use checkboxes to select files
+3. "Select All" / "Deselect All" buttons
+4. Edit metadata individually
+5. Click "Update Selected Files" to save all at once
+6. Export results to CSV
+
+---
+
+## Configuration
+
+### Database Schema
+
+**Users Table:**
+- id, username, password_hash, email, full_name
+- auth_method (local/sso)
+- created_at, last_login, is_active
+
+**Sessions Table:**
+- session_id, user_id, created_at, expires_at
+- ip_address, user_agent
+
+**Audit Log Table:**
+- id, user_id, action, details, timestamp
+
+### AI Usage Tracking
+
+Every AI metadata generation is logged with:
+- User ID
+- Timestamp
+- Tokens used (prompt + completion)
+- Cost estimate (based on gpt-4o-mini pricing)
+
+View logs in database:
+```sql
+SELECT * FROM audit_log WHERE action = 'ai_generation' ORDER BY timestamp DESC;
 ```

-## Excel Data Structure
+### User Management

-The tool reads metadata from Excel file with two sheets:
+**Create New User:**
+```python
+from src.database import Database
+db = Database()
+db.create_user(
+    username='newuser',
+    password='password123',
+    email='user@example.com',
+    full_name='New User',
+    auth_method='local'
+)
+```

-### Sheet 1: DSB Celum ID to Path mapping (Primary)
- Column B: Celum ID
- Column E: Title
- Column F: External Description/Alt Text
+**List All Users:**
+```python
+users = db.get_all_users()
+for user in users:
+    print(f"{user['username']} - Last login: {user['last_login']}")
+```

-### Sheet 2: Medsurg Metadata Cheat (Fallback)
- Column: Solventum DAM Asset Path (contains filename)
- Metadata columns for Title and Description
-
-Lookup is performed by filename (without extension), case-insensitive.
+---

 ## Architecture

- `web_app.py` - Flask web application
- `run_gui.py` - GUI launcher
- `src/` - Core modules
-  - `extractors/` - Content extraction for different file types
-  - `updaters/` - Metadata update for different file types
-  - `excel_metadata_lookup.py` - Excel-based metadata lookup
-  - `main.py` - Core processing logic
-  - `config.py` - Configuration management
+### File Structure

-## License
+```
+oliver-metadata-tool/
+├── web_app.py              # Flask web application (main entry point)
+├── requirements.txt        # Python dependencies
+├── .env                    # Environment configuration
+├── oliver_metadata.db      # SQLite database (auto-created)
+├── src/
+│   ├── config.py           # Configuration management
+│   ├── database.py         # Database operations
+│   ├── auth.py             # Authentication logic
+│   ├── metadata_analyzer.py    # AI metadata generation
+│   ├── metadata_importer.py    # Import from files
+│   ├── template_manager.py     # Template system
+│   ├── field_mapper.py         # Column mapping
+│   ├── excel_metadata_lookup.py # Excel lookup
+│   ├── extractors/
+│   │   ├── pdf_extractor.py
+│   │   ├── image_extractor.py
+│   │   ├── office_extractor.py
+│   │   ├── video_extractor.py
+│   │   └── exiftool_extractor.py
+│   └── updaters/
+│       ├── pdf_updater.py
+│       ├── image_updater.py
+│       ├── office_updater.py
+│       ├── video_updater.py
+│       └── exiftool_updater.py
+├── templates/
+│   ├── index.html          # Main UI
+│   └── login.html          # Login page
+└── docs/
+    └── EXIFTOOL_SETUP.md   # ExifTool setup guide
+```

-Proprietary - Solventum
+### Technology Stack
+
+- **Backend:** Flask (Python)
+- **Database:** SQLite
+- **Frontend:** HTML5, CSS3, JavaScript (Vanilla)
+- **Design:** Montserrat font, Dark & Gold theme
+- **Authentication:** Flask-Session, werkzeug.security, MSAL
+- **AI:** OpenAI API (gpt-4o-mini)
+- **Metadata:** PyExifTool, pypdf, python-docx, openpyxl
+
+---
+
+## API Endpoints
+
+### Authentication
+- `GET /login` - Login page
+- `POST /login` - Authenticate user
+- `GET /logout` - Destroy session
+- `GET /login/microsoft` - Microsoft SSO redirect
+- `GET /auth/callback` - SSO callback
+
+### File Operations
+- `POST /upload` - Upload files and generate metadata
+- `POST /update-manual` - Update file metadata manually
+- `GET /download/<filename>` - Download processed file
+
+### Metadata Sources
+- `POST /upload-excel` - Upload Excel file for mapping
+- `POST /preview-excel-sheet` - Preview Excel sheet structure
+- `POST /configure-excel-mapping` - Configure Excel column mapping
+- `POST /import-metadata` - Upload import file for mapping
+- `POST /configure-import-mapping` - Configure import column mapping
+
+### Templates
+- `GET /templates/list` - List all templates
+- `POST /templates/save` - Save new template
+- `POST /templates/load` - Load template by name
+- `DELETE /templates/delete` - Delete template
+- `POST /templates/apply` - Apply template to files
+- `POST /templates/preview` - Preview template output
+
+---
+
+## Security & Privacy
+
+### Authentication
+- Passwords hashed with werkzeug.security (pbkdf2:sha256)
+- Session tokens: 32-byte cryptographically secure random strings
+- Sessions expire after 24 hours
+- Microsoft SSO via OAuth2 + Azure AD
+
+### Data Protection
+- All credentials stored in `.env` (excluded from git)
+- Database file excluded from git
+- API keys never logged or exposed to frontend
+- Audit trail for all user actions
+
+### Production Recommendations
+1. **HTTPS:** Use SSL/TLS certificates in production
+2. **Database:** Migrate to PostgreSQL for better concurrency
+3. **Rate Limiting:** Add rate limits to prevent abuse
+4. **CSRF Protection:** Enable Flask-WTF for form security
+5. **Error Tracking:** Integrate Sentry or similar service
+6. **Backups:** Regular database backups
+7. **Monitoring:** Track AI token usage for cost management
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**ExifTool not found:**
+```bash
+# Verify installation
+exiftool -ver
+
+# macOS: Reinstall with Homebrew
+brew reinstall exiftool
+
+# Linux: Reinstall with apt
+sudo apt-get install --reinstall libimage-exiftool-perl
+```
+
+**Database locked error:**
+```bash
+# Stop all instances
+lsof -ti:5001 | xargs kill -9
+
+# Restart application
+python web_app.py
+```
+
+**OpenAI API errors:**
+- Check API key in `.env` file
+- Verify API key is valid at https://platform.openai.com/api-keys
+- Check token usage limits on OpenAI dashboard
+
+**Import failed - column not found:**
+- Use the mapping modal to manually select columns
+- Check that your file has headers in the first row
+- Verify file encoding is UTF-8
+
+---
+
+## Development
+
+### Running Tests
+
+```bash
+# Unit tests (if implemented)
+pytest tests/
+
+# Manual integration test
+python -c "from src.database import Database; from src.config import Config; print('✅ All imports successful')"
+```
+
+### Git Workflow
+
+```bash
+# Check status
+git status
+
+# Add changes
+git add .
+
+# Commit with message
+git commit -m "Your commit message"
+
+# Push to remote
+git push origin main
+```
+
+---
+
+## License & Credits
+
+**License:** Corporate License - Oliver Marketing
+All rights reserved. Unauthorized copying, distribution, or modification is prohibited.
+
+**Developer:** Vadym Samoilenko
+**Company:** Oliver Marketing
+**Version:** 3.1 Enterprise Edition
+**Release Date:** January 2026
+
+**Third-Party Software:**
+- ExifTool by Phil Harvey (Perl Artistic License)
+- Flask by Pallets (BSD License)
+- OpenAI API (Commercial License)
+- PyExifTool (LGPL License)
+
+---
+
+## Support
+
+For issues, questions, or feature requests:
+- **Internal Support:** Contact IT department
+- **Developer:** Vadym Samoilenko
+- **Documentation:** See `docs/` folder
+
+---
+
+## Changelog
+
+### v3.1 (January 2026) - Enterprise Edition
+- ✅ User authentication (local + Microsoft SSO)
+- ✅ SQLite database with audit logging
+- ✅ Smart column mapping for Excel/CSV import
+- ✅ Custom metadata fields support
+- ✅ AI usage tracking and cost monitoring
+- ✅ Dark & Gold UI redesign
+- ✅ Template variables and preview
+- ✅ Batch selection and CSV export
+
+### v3.0 (January 2026)
+- ✅ ExifTool integration (300+ formats)
+- ✅ Multiple metadata sources (Excel, AI, Manual, Import)
+- ✅ Field mapping with fuzzy matching
+- ✅ Metadata templates system
+- ✅ Rebranded to Oliver Metadata Tool
+
+### v2.x (Prior)
+- Basic Excel lookup functionality
+- Multi-format file support
+- Web and GUI interfaces
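The README's Security & Privacy section states that session tokens are 32-byte cryptographically secure random strings and that sessions expire after 24 hours. The sketch below illustrates that policy with the standard library only. The function names and the dictionary layout are hypothetical; the actual implementation in `src/auth.py` and `src/database.py` is not part of this diff and may differ.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Lifetime taken from the README ("Sessions expire after 24 hours").
SESSION_LIFETIME = timedelta(hours=24)


def new_session(user_id, now=None):
    """Create a session record with a token built from 32 random bytes."""
    now = now or datetime.now(timezone.utc)
    return {
        "session_id": secrets.token_urlsafe(32),  # 32 bytes of entropy
        "user_id": user_id,
        "created_at": now,
        "expires_at": now + SESSION_LIFETIME,
    }


def is_expired(session, now=None):
    """Return True once the current time reaches the expiry timestamp."""
    now = now or datetime.now(timezone.utc)
    return now >= session["expires_at"]
```

The fields mirror the `session_id`, `user_id`, `created_at`, and `expires_at` columns listed for the Sessions table. Passing `now` explicitly keeps the expiry check easy to test.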
--- a/templates/index.html
+++ b/templates/index.html
--- a/templates/login.html
+++ b/templates/login.html
@@ -4,11 +4,41 @@
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Login - Oliver Metadata Tool</title>
+    <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@300;400;500;600;700&display=swap" rel="stylesheet">
    <style>
+        :root {
+            --primary-gold: #FFC407;
+            --primary-gold-dark: #e6b007;
+            --primary-gold-light: #ffcf33;
+            --dark-primary: #2c2c2c;
+            --dark-secondary: #1a1a1a;
+            --white: #ffffff;
+            --text-primary: #1f2937;
+            --text-muted: #6b7280;
+            --overlay-light: rgba(255, 255, 255, 0.95);
+            --border-light: rgba(255, 255, 255, 0.2);
+            --shadow-lg: 0 20px 40px rgba(0, 0, 0, 0.1);
+            --radius-md: 12px;
+            --radius-xl: 20px;
+            --font-family: 'Montserrat', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+            --transition-fast: 0.15s ease;
+        }
+
        * { margin: 0; padding: 0; box-sizing: border-box; }
+
+        @keyframes shimmer {
+            0% { transform: translateX(-100%); }
+            100% { transform: translateX(100%); }
+        }
+
+        @keyframes pulse {
+            0%, 100% { transform: scale(1); }
+            50% { transform: scale(1.05); }
+        }
+
        body {
-            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
-            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+            font-family: var(--font-family);
+            background: linear-gradient(135deg, var(--dark-primary) 0%, var(--dark-secondary) 100%);
            min-height: 100vh;
            display: flex;
            align-items: center;
@@ -17,9 +47,11 @@
        }

        .login-container {
-            background: white;
-            border-radius: 20px;
-            box-shadow: 0 20px 60px rgba(0,0,0,0.3);
+            background: var(--overlay-light);
+            backdrop-filter: blur(20px);
+            border-radius: var(--radius-xl);
+            box-shadow: var(--shadow-lg);
+            border: 1px solid var(--border-light);
            width: 100%;
            max-width: 450px;
            padding: 40px;
@@ -28,17 +60,21 @@
        .logo {
            text-align: center;
            margin-bottom: 30px;
+            position: relative;
        }

        .logo h1 {
-            color: #667eea;
-            font-size: 28px;
+            color: var(--primary-gold-dark);
+            font-size: 32px;
            margin-bottom: 10px;
+            font-weight: 700;
+            text-shadow: 0 2px 4px rgba(255, 196, 7, 0.2);
        }

        .logo p {
-            color: #6c757d;
+            color: var(--text-muted);
            font-size: 14px;
+            font-weight: 500;
        }

        .divider {
@@ -53,15 +89,16 @@
            left: 0;
            right: 0;
            top: 50%;
-            height: 1px;
-            background: #dee2e6;
+            height: 2px;
+            background: linear-gradient(90deg, transparent, var(--primary-gold-light), transparent);
        }

        .divider span {
-            background: white;
+            background: var(--overlay-light);
            padding: 0 15px;
-            color: #6c757d;
+            color: var(--text-muted);
            font-size: 13px;
+            font-weight: 600;
            position: relative;
            z-index: 1;
        }
@@ -73,7 +110,7 @@
        .form-group label {
            display: block;
            font-weight: 600;
-            color: #495057;
+            color: var(--text-primary);
            margin-bottom: 8px;
            font-size: 14px;
        }
@@ -82,25 +119,28 @@
            width: 100%;
            padding: 12px;
            border: 2px solid #dee2e6;
-            border-radius: 8px;
+            border-radius: var(--radius-md);
            font-size: 14px;
-            transition: border-color 0.3s;
+            font-family: var(--font-family);
+            transition: all var(--transition-fast);
        }

        .form-group input:focus {
            outline: none;
-            border-color: #667eea;
+            border-color: var(--primary-gold);
+            box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.1);
        }

        .btn {
            width: 100%;
            padding: 14px;
            border: none;
-            border-radius: 8px;
+            border-radius: var(--radius-md);
            font-size: 16px;
            font-weight: 600;
+            font-family: var(--font-family);
            cursor: pointer;
-            transition: transform 0.2s;
+            transition: all var(--transition-fast);
        }

        .btn:hover {
@@ -108,60 +148,79 @@
        }

        .btn-primary {
-            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-            color: white;
+            background: linear-gradient(135deg, var(--primary-gold), var(--primary-gold-dark));
+            color: var(--dark-secondary);
            margin-bottom: 15px;
+            box-shadow: 0 4px 12px rgba(255, 196, 7, 0.3);
+        }
+
+        .btn-primary:hover {
+            box-shadow: 0 6px 16px rgba(255, 196, 7, 0.4);
        }

        .btn-sso {
-            background: white;
-            color: #495057;
-            border: 2px solid #dee2e6;
+            background: var(--white);
+            color: var(--text-primary);
+            border: 2px solid var(--primary-gold);
        }

        .btn-sso:hover {
-            border-color: #667eea;
-            color: #667eea;
+            border-color: var(--primary-gold-dark);
+            background: #fffbf0;
+            color: var(--primary-gold-dark);
        }

        .alert {
            padding: 12px;
-            border-radius: 8px;
+            border-radius: var(--radius-md);
            margin-bottom: 20px;
            font-size: 14px;
+            font-weight: 500;
        }

        .alert-error {
-            background: #f8d7da;
-            color: #721c24;
-            border: 1px solid #f5c6cb;
+            background: #fee;
+            color: #c33;
+            border: 2px solid #fcc;
        }

        .alert-info {
-            background: #d1ecf1;
-            color: #0c5460;
-            border: 1px solid #bee5eb;
+            background: #fffbf0;
+            color: var(--primary-gold-dark);
+            border: 2px solid var(--primary-gold-light);
        }

        .test-user-info {
-            background: #f8f9ff;
-            border: 2px dashed #667eea;
-            border-radius: 8px;
+            background: #fffbf0;
+            border: 2px dashed var(--primary-gold);
+            border-radius: var(--radius-md);
            padding: 15px;
            margin-bottom: 20px;
            font-size: 13px;
-            color: #495057;
+            color: var(--text-primary);
+            animation: pulse 3s infinite;
        }

        .test-user-info strong {
-            color: #667eea;
+            color: var(--primary-gold-dark);
+            font-weight: 600;
+        }
+
+        .test-user-info code {
+            background: rgba(255, 196, 7, 0.15);
+            padding: 2px 6px;
+            border-radius: 4px;
+            font-family: 'Courier New', monospace;
+            color: var(--primary-gold-dark);
+            font-weight: 600;
        }

        .footer-text {
            text-align: center;
            margin-top: 20px;
            font-size: 12px;
-            color: #6c757d;
+            color: var(--text-muted);
+            font-weight: 500;
        }

        .microsoft-icon {
--- a/web_app.py
+++ b/web_app.py
@@ -6,7 +6,7 @@ Flask-based web app for local or server deployment.
 Supports multiple metadata sources: Excel, AI, manual entry, and file import.
 """

-from flask import Flask, render_template, request, jsonify, send_file
+from flask import Flask, render_template, request, jsonify, send_file, session, redirect, url_for
 from werkzeug.utils import secure_filename  # noqa: F401 - kept as fallback
 from pathlib import Path
 import os
@@ -259,7 +259,19 @@ def upload_file():
    }

    # Get metadata lookup (only if using Excel source)
-    lookup = get_metadata_lookup() if metadata_source == 'excel' else None
+    excel_session_id = request.form.get('excel_session_id')
+    lookup = None
+
+    if metadata_source == 'excel':
+        if excel_session_id and excel_session_id in imported_metadata:
+            # Use uploaded Excel file
+            lookup = imported_metadata[excel_session_id]
+        else:
+            # Try default Excel file if available
+            try:
+                lookup = get_metadata_lookup()
+            except Exception:
+                return jsonify({'error': 'Please upload an Excel file first using the Upload Excel File button'}), 400

    # Get imported metadata (only if using import source)
    import_map = None
@@ -504,9 +516,22 @@ def update_manual_metadata():
    custom_metadata = {
        'title': data.get('title', '').strip()[:200],
        'subject': data.get('subject', '').strip()[:300],
-        'keywords': data.get('keywords', '').strip()[:500]
+        'keywords': data.get('keywords', '').strip()[:500],
+        'author': data.get('author', '').strip()[:100],
+        'copyright': data.get('copyright', '').strip()[:150],
+        'comments': data.get('comments', '').strip()[:500]
    }

+    # Add custom fields if provided
+    custom_fields = data.get('custom_fields', {})
+    if custom_fields and isinstance(custom_fields, dict):
+        for field_name, field_value in custom_fields.items():
+            # Sanitize custom field names and values
+            safe_name = str(field_name).strip()[:50]
+            safe_value = str(field_value).strip()[:200]
+            if safe_name and safe_value:
+                custom_metadata[safe_name] = safe_value
+
    # Validate session
    if not session_id or session_id not in sessions:
        return jsonify({'error': 'Invalid or expired session'}), 400
@@ -566,10 +591,178 @@ def download_file(filename):
        return send_file(filepath, as_attachment=True)
    return jsonify({'error': 'File not found'}), 404

+@app.route('/upload-excel', methods=['POST'])
+@login_required
+def upload_excel():
+    """Upload Excel file for Excel Lookup metadata source."""
+    if 'excel_file' not in request.files:
+        return jsonify({'error': 'No file provided'}), 400
+
+    file = request.files['excel_file']
+    if file.filename == '':
+        return jsonify({'error': 'No file selected'}), 400
+
+    try:
+        import pandas as pd
+
+        # Save temp file
+        excel_filename = safe_filename(file.filename)
+        temp_path = Path(app.config['UPLOAD_FOLDER']) / excel_filename
+        file.save(str(temp_path))
+
+        # Preview Excel structure instead of loading directly
+        excel_file = pd.ExcelFile(str(temp_path))
+        sheet_names = excel_file.sheet_names
+
+        # Get columns and sample data from each sheet (up to the first 5)
+        preview_data = {}
+        for sheet_name in sheet_names[:5]:  # Limit to first 5 sheets
+            df = pd.read_excel(excel_file, sheet_name=sheet_name, nrows=5)
+            preview_data[sheet_name] = {
+                'columns': df.columns.tolist(),
+                'sample_data': df.head(3).fillna('').to_dict('records')
+            }
+
+        # Store file path temporarily for later configuration
+        excel_session_id = f"excel_{secrets.token_urlsafe(8)}"
+        if 'excel_files' not in imported_metadata:
+            imported_metadata['excel_files'] = {}
+        imported_metadata['excel_files'][excel_session_id] = {
+            'path': str(temp_path),
+            'filename': excel_filename,
+            'sheet_names': sheet_names
+        }
+
+        return jsonify({
+            'success': True,
+            'excel_session_id': excel_session_id,
+            'filename': excel_filename,
+            'sheets': sheet_names,
+            'preview': preview_data,
+            'message': 'Excel file uploaded. Please configure column mapping.'
+        })
+
+    except Exception as e:
+        import logging
+        logging.getLogger(__name__).error(f"Excel upload failed: {e}")
+        return jsonify({'error': f'Excel upload failed: {str(e)}'}), 500
+
+@app.route('/preview-excel-sheet', methods=['POST'])
+@login_required
+def preview_excel_sheet():
+    """Preview a specific sheet from uploaded Excel file."""
+    try:
+        import pandas as pd
+
+        data = request.json
+        excel_session_id = data.get('excel_session_id')
+        sheet_name = data.get('sheet_name')
+
+        if not excel_session_id or excel_session_id not in imported_metadata.get('excel_files', {}):
+            return jsonify({'error': 'Invalid session ID'}), 400
+
+        excel_info = imported_metadata['excel_files'][excel_session_id]
+        excel_path = excel_info['path']
+
+        # Read the specific sheet
+        df = pd.read_excel(excel_path, sheet_name=sheet_name, nrows=10)
+
+        return jsonify({
+            'success': True,
+            'columns': df.columns.tolist(),
+            'sample_data': df.head(5).fillna('').to_dict('records')
+        })
+
+    except Exception as e:
+        import logging
+        logging.getLogger(__name__).error(f"Sheet preview failed: {e}")
+        return jsonify({'error': f'Sheet preview failed: {str(e)}'}), 500
+
+@app.route('/configure-excel-mapping', methods=['POST'])
+@login_required
+def configure_excel_mapping():
+    """Configure Excel column mapping and load metadata."""
+    try:
+        import pandas as pd
+
+        data = request.json
+        excel_session_id = data.get('excel_session_id')
+        sheet_name = data.get('sheet_name')
+        column_mapping = data.get('column_mapping', {})  # {filename: 'col', title: 'col', ...}
+
+        if not excel_session_id or excel_session_id not in imported_metadata.get('excel_files', {}):
+            return jsonify({'error': 'Invalid session ID'}), 400
+
+        excel_info = imported_metadata['excel_files'][excel_session_id]
+        excel_path = excel_info['path']
+
+        # Read the configured sheet
+        df = pd.read_excel(excel_path, sheet_name=sheet_name)
+
+        # Build metadata map using configured columns
+        metadata_map = {}
+        filename_col = column_mapping.get('filename')
+        title_col = column_mapping.get('title')
+        description_col = column_mapping.get('description')
+        keywords_col = column_mapping.get('keywords')
+
+        if not filename_col:
+            return jsonify({'error': 'Filename column is required'}), 400
+
+        for _, row in df.iterrows():
+            filename = row.get(filename_col)
+            if pd.notna(filename) and str(filename).strip():
+                # Get filename without extension for indexing (case-insensitive)
+                filename_stem = Path(str(filename).strip()).stem.lower()
+
+                metadata = {
+                    'title': str(row.get(title_col, '')).strip() if title_col and pd.notna(row.get(title_col)) else '',
+                    'description': str(row.get(description_col, '')).strip() if description_col and pd.notna(row.get(description_col)) else '',
+                    'keywords': str(row.get(keywords_col, '')).strip() if keywords_col and pd.notna(row.get(keywords_col)) else '',
+                    'original_filename': str(filename).strip()
+                }
+
+                metadata_map[filename_stem] = metadata
+
+        # Create a simple lookup object
+        class ConfiguredExcelLookup:
+            def __init__(self, metadata_map):
+                self.metadata_map = metadata_map
+                self.filename_to_metadata = metadata_map
+
+            def lookup_by_filename(self, filename: str):
+                filename_stem = Path(filename).stem.lower()
+                return self.metadata_map.get(filename_stem)
+
+        lookup = ConfiguredExcelLookup(metadata_map)
+
+        # Store configured lookup
+        imported_metadata[excel_session_id] = lookup
+
+        # Get stats
+        stats = {
+            'total_records': len(metadata_map),
+            'with_title': sum(1 for v in metadata_map.values() if v.get('title')),
+            'with_description': sum(1 for v in metadata_map.values() if v.get('description')),
+            'with_keywords': sum(1 for v in metadata_map.values() if v.get('keywords'))
+        }
+
+        return jsonify({
+            'success': True,
+            'excel_session_id': excel_session_id,
+            'stats': stats,
+            'message': f'Configured mapping for {stats["total_records"]} records from sheet "{sheet_name}"'
+        })
+
+    except Exception as e:
+        import logging
+        logging.getLogger(__name__).error(f"Excel configuration failed: {e}")
+        return jsonify({'error': f'Excel configuration failed: {str(e)}'}), 500
+
@app.route('/import-metadata', methods=['POST'])
@login_required
 def import_metadata():
-    """Import metadata from external file (CSV, Excel, JSON)."""
+    """Upload an import file and return its columns and sample rows for mapping."""
    if 'import_file' not in request.files:
        return jsonify({'error': 'No file provided'}), 400

@ -578,45 +771,142 @@ def import_metadata():
        return jsonify({'error': 'No file selected'}), 400

    try:
+        import pandas as pd
+
        # Save temp file
        import_filename = safe_filename(file.filename)
        temp_path = Path(app.config['UPLOAD_FOLDER']) / import_filename
        file.save(str(temp_path))

-        # Import based on file type
-        importer = MetadataImporter()
        file_ext = temp_path.suffix.lower()

+        # Read file and get structure
        if file_ext == '.csv':
-            metadata_map = importer.import_from_csv(str(temp_path))
+            df = pd.read_csv(str(temp_path), nrows=5, encoding='utf-8')
        elif file_ext in ['.xlsx', '.xls']:
-            metadata_map = importer.import_from_excel(str(temp_path))
+            df = pd.read_excel(str(temp_path), nrows=5)
        elif file_ext == '.json':
-            metadata_map = importer.import_from_json(str(temp_path))
+            import json
+            with open(str(temp_path), 'r', encoding='utf-8') as f:
+                data = json.load(f)
+                # Convert to DataFrame
+                if isinstance(data, list):
+                    df = pd.DataFrame(data[:5])
+                elif isinstance(data, dict):
+                    df = pd.DataFrame([data])
+                else:
+                    return jsonify({'error': 'Invalid JSON format'}), 400
        else:
-            return jsonify({'error': f'Unsupported file format: {file_ext}. Supported: .csv, .xlsx, .xls, .json'}), 400
+            temp_path.unlink(missing_ok=True)  # Remove the rejected upload
+            return jsonify({'error': f'Unsupported file format: {file_ext}'}), 400

-        # Validate import
-        stats = importer.validate_import(metadata_map)
+        columns = df.columns.tolist()
+        sample_data = df.fillna('').to_dict('records')

-        # Store in global dict with unique session ID
-        import_session_id = f"import_{len(imported_metadata) + 1}"
+        # Store file path for later configuration
+        import_session_id = f"import_{secrets.token_urlsafe(8)}"
+        if 'import_files' not in imported_metadata:
+            imported_metadata['import_files'] = {}
+        imported_metadata['import_files'][import_session_id] = {
+            'path': str(temp_path),
+            'filename': import_filename,
+            'file_type': file_ext
+        }
+
+        return jsonify({
+            'success': True,
+            'import_session_id': import_session_id,
+            'filename': import_filename,
+            'columns': columns,
+            'sample_data': sample_data,
+            'message': 'Import file uploaded. Please configure column mapping.'
+        })
+
+    except Exception as e:
+        import logging
+        logging.getLogger(__name__).error(f"Import upload failed: {e}")
+        return jsonify({'error': f'Import upload failed: {str(e)}'}), 500
+
+@app.route('/configure-import-mapping', methods=['POST'])
+@login_required
+def configure_import_mapping():
+    """Configure import column mapping and load metadata."""
+    try:
+        import pandas as pd
+        import json
+
+        data = request.get_json(silent=True) or {}  # Tolerate a missing or non-JSON body
+        import_session_id = data.get('import_session_id')
+        column_mapping = data.get('column_mapping', {})
+
+        if not import_session_id or import_session_id not in imported_metadata.get('import_files', {}):
+            return jsonify({'error': 'Invalid session ID'}), 400
+
+        import_info = imported_metadata['import_files'][import_session_id]
+        import_path = import_info['path']
+        file_ext = import_info['file_type']
+
+        # Read the full file
+        if file_ext == '.csv':
+            df = pd.read_csv(import_path, encoding='utf-8')
+        elif file_ext in ['.xlsx', '.xls']:
+            df = pd.read_excel(import_path)
+        elif file_ext == '.json':
+            with open(import_path, 'r', encoding='utf-8') as f:
+                json_data = json.load(f)
+                if isinstance(json_data, list):
+                    df = pd.DataFrame(json_data)
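+                else:
+                    df = pd.DataFrame([json_data])
+        else:
+            # Guard so df is always defined before the mapping loop
+            return jsonify({'error': f'Unsupported file format: {file_ext}'}), 400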
+
+        # Build metadata map using configured columns
+        metadata_map = {}
+        filename_col = column_mapping.get('filename')
+        title_col = column_mapping.get('title')
+        subject_col = column_mapping.get('subject')
+        keywords_col = column_mapping.get('keywords')
+
+        if not filename_col:
+            return jsonify({'error': 'Filename column is required'}), 400
+
+        for _, row in df.iterrows():
+            filename = row.get(filename_col)
+            if pd.notna(filename) and str(filename).strip():
+                filename_stem = Path(str(filename).strip()).stem.lower()
+
+                metadata = {
+                    'title': str(row.get(title_col, '')).strip() if title_col and pd.notna(row.get(title_col)) else '',
+                    'subject': str(row.get(subject_col, '')).strip() if subject_col and pd.notna(row.get(subject_col)) else '',
+                    'keywords': str(row.get(keywords_col, '')).strip() if keywords_col and pd.notna(row.get(keywords_col)) else '',
+                    'original_filename': str(filename).strip()
+                }
+
+                metadata_map[filename_stem] = metadata
+
+        # Store configured metadata map
        imported_metadata[import_session_id] = metadata_map

        # Clean up temp file
-        temp_path.unlink()
+        Path(import_path).unlink(missing_ok=True)
+
+        # Get stats
+        stats = {
+            'total_records': len(metadata_map),
+            'with_title': sum(1 for v in metadata_map.values() if v.get('title')),
+            'with_subject': sum(1 for v in metadata_map.values() if v.get('subject')),
+            'with_keywords': sum(1 for v in metadata_map.values() if v.get('keywords'))
+        }

        return jsonify({
            'success': True,
            'import_session_id': import_session_id,
            'stats': stats,
-            'message': f'Imported {stats["total_records"]} metadata records from {import_filename}'
+            'message': f'Configured mapping for {stats["total_records"]} records'
        })

    except Exception as e:
        import logging
-        logging.getLogger(__name__).error(f"Import failed: {e}")
-        return jsonify({'error': f'Import failed: {str(e)}'}), 500
+        logging.getLogger(__name__).error(f"Import configuration failed: {e}")
+        return jsonify({'error': f'Import configuration failed: {str(e)}'}), 500

@app.route('/preview-import', methods=['POST'])
@login_required