# Oliver Metadata Tool v3.1 Enterprise Edition

Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface, user authentication, and AI-powered metadata generation.

**Developer:** Vadym Samoilenko  
**License:** Corporate License - Oliver Marketing  
**Version:** 3.1 (Enterprise Edition)

---

## Features

### Multiple Metadata Sources

- **📂 File Import**: Import metadata from CSV, Excel, or JSON with smart column mapping and sheet selection
- **🤖 AI Generation**: OpenAI-powered intelligent metadata generation
- **✏️ Manual Entry**: Direct editing with real-time validation
- **📋 Templates**: Reusable metadata templates with variables

### Enterprise Features

- **🔐 Authentication**: Local user authentication + Microsoft SSO support
- **👥 User Management**: SQLite database for users and sessions
- **📊 Audit Logging**: Track all user actions and metadata changes
- **🔍 AI Usage Tracking**: Monitor OpenAI token usage and costs

### File Support

- **300+ File Formats** via ExifTool integration
- **PDF Files**: Full metadata support (title, subject, keywords, author, copyright)
- **Images**: JPEG, PNG, GIF, HEIC, TIFF, RAW formats
- **Office Documents**: Word, Excel, PowerPoint
- **Video Files**: MP4, MOV, AVI, MKV
- **Unicode Support**: Full support for Chinese, Japanese, Korean characters

### Advanced Capabilities

- **Smart Field Mapping**: Auto-detect columns with fuzzy matching
- **Batch Processing**: Process multiple files with selective updates
- **Custom Metadata Fields**: Add unlimited custom fields
- **CSV Export**: Export metadata and processing results
- **Template Variables**: `{filename}`, `{date}`, `{user}`, custom variables

---

## Requirements

### System Dependencies

- **Python 3.8+**
- **ExifTool 12.15+** (required for 300+ format support)
- **Tesseract OCR** (optional - for image text extraction)
- **Poppler** (optional - for PDF content extraction)

### Python Dependencies
All listed in `requirements.txt`:

- Flask 2.3.0+ (Web framework)
- pandas, openpyxl (Excel/CSV processing)
- PyExifTool 0.5.6+ (Metadata operations)
- openai 1.0.0+ (AI generation)
- tiktoken 0.5.0+ (Token counting)
- tenacity 8.2.0+ (Retry logic)
- msal (Microsoft SSO - optional)

---

## Installation

### 1. Install System Dependencies

**macOS:**

```bash
brew install exiftool tesseract tesseract-lang poppler
```

**Linux (Ubuntu/Debian):**

```bash
sudo apt-get install libimage-exiftool-perl tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils
```

**Windows:**

```bash
# Install ExifTool from: https://exiftool.org/
choco install exiftool tesseract
```

**Verify ExifTool Installation:**

```bash
exiftool -ver
# Should show version 12.15 or higher
```

See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions.

### 2. Create Virtual Environment

```bash
python3 -m venv venv_local
source venv_local/bin/activate  # On Windows: venv_local\Scripts\activate
```

### 3. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure Environment Variables

Create a `.env` file in the project root:

```env
# Required: OpenAI API Key (for AI metadata generation)
OPENAI_API_KEY=your-openai-api-key-here

# Optional: Microsoft SSO (for enterprise authentication)
# AZURE_CLIENT_ID=your-azure-client-id
# AZURE_CLIENT_SECRET=your-azure-client-secret
# AZURE_TENANT_ID=your-azure-tenant-id
# REDIRECT_URI=http://localhost:5001/auth/callback

# Optional: Flask secret key (auto-generated if not set)
# SECRET_KEY=your-secret-key-here

# Optional: AI settings (defaults shown)
# AI_MODEL=gpt-4o-mini
# MAX_TOKENS=500
# TEMPERATURE=0.5
# API_TIMEOUT=30
# API_MAX_RETRIES=3
```

### 5. Initialize Database

The database will be created automatically on first run.
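
Automatic creation of this kind is usually idempotent, so it can run on every startup without touching existing data. The sketch below illustrates that idea with the standard `sqlite3` module. It is not the tool's actual `src.database` code: the `init_db` and `list_tables` helpers are hypothetical, the table and column names are taken from the Database Schema section of this README, and the column types and constraints are assumptions.

```python
import sqlite3

# Hypothetical schema: table and column names follow this README's
# "Database Schema" section; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    password_hash TEXT,
    email TEXT,
    full_name TEXT,
    auth_method TEXT DEFAULT 'local',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    last_login TEXT,
    is_active INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    expires_at TEXT,
    ip_address TEXT,
    user_agent TEXT
);
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    action TEXT NOT NULL,
    details TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path="oliver_metadata.db"):
    """Create any missing tables; existing tables and rows are left alone."""
    conn = sqlite3.connect(path)
    try:
        conn.executescript(SCHEMA)
        conn.commit()
    finally:
        conn.close()


def list_tables(path):
    """Return the names of all tables in the database, sorted alphabetically."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Because every statement uses `CREATE TABLE IF NOT EXISTS`, calling `init_db()` a second time is a no-op, which is what makes an "initialize on every start" approach safe.
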
To manually initialize:

```bash
python -c "from src.database import Database; db = Database(); print('Database initialized')"
```

---

## Docker Deployment (Recommended)

### Quick Start with Docker

```bash
# Build and start
docker-compose up -d

# Or use the helper script
./docker-run.sh build
./docker-run.sh start

# Access at http://localhost:5001
```

**Benefits:**

- ✅ No manual dependency installation
- ✅ Consistent environment across systems
- ✅ Persistent data storage via volumes
- ✅ Easy updates and rollbacks
- ✅ Production-ready configuration

**See [DOCKER.md](DOCKER.md) for complete Docker deployment guide.**

---

## Usage

### Starting the Web Application

**Local Development:**

```bash
python web_app.py
```

**Docker:**

```bash
docker-compose up -d
```

The application will:

1. ✅ Check for ExifTool availability
2. ✅ Initialize SQLite database (users, sessions, audit_log)
3. ✅ Start Flask server on http://localhost:5001
4. 🌐 Open browser automatically (local mode only)

### Login

**Test Account:**

- Username: `tester`
- Password: `oliveradmin`

**Microsoft SSO** (if configured):

- Click "Sign in with Microsoft" button
- Authenticate via Azure AD
- Users auto-created on first login

### Using Metadata Sources

#### 1. Import from File

1. Select "Import from File (CSV/Excel/JSON)" from metadata source dropdown (default)
2. Click "Choose File" and select your metadata file
3. Configure mapping modal:
   - For Excel files: Select sheet name
   - Map columns: Filename (required), Title, Description, Keywords
   - Auto-detection suggests best matches
   - Preview first 3 rows
4. Confirm mapping
5. Upload files to process; the tool matches them to imported rows by filename

#### 2. AI Generation

1. Select "AI Generation" from metadata source dropdown
2. Upload files
3. AI generates metadata (10-30 seconds per file)
4. Review and edit generated metadata
5. Save changes

#### 3. Manual Entry

1. Select "Manual Entry"
2. Upload files
3. Fill in metadata fields manually
4. Save changes

#### 4. Templates

1. Create template with variables
2. Select template from dropdown
3. Apply to selected files
4. Review and save

### Batch Operations

1. Upload multiple files
2. Use checkboxes to select files
3. Use the "Select All" / "Deselect All" buttons to change the selection in bulk
4. Edit metadata individually
5. Click "Update Selected Files" to save all at once
6. Export results to CSV

---

## Configuration

### Database Schema

**Users Table:**

- id, username, password_hash, email, full_name
- auth_method (local/sso)
- created_at, last_login, is_active

**Sessions Table:**

- session_id, user_id, created_at, expires_at
- ip_address, user_agent

**Audit Log Table:**

- id, user_id, action, details, timestamp

### AI Usage Tracking

Every AI metadata generation is logged with:

- User ID
- Timestamp
- Tokens used (prompt + completion)
- Cost estimate (based on gpt-4o-mini pricing)

View logs in database:

```sql
SELECT *
FROM audit_log
WHERE action = 'ai_generation'
ORDER BY timestamp DESC;
```

### User Management

**Create New User:**

```python
from src.database import Database

db = Database()
db.create_user(
    username='newuser',
    password='password123',
    email='user@example.com',
    full_name='New User',
    auth_method='local'
)
```

**List All Users:**

```python
# Reuses the `db` instance created in the previous snippet
users = db.get_all_users()
for user in users:
    print(f"{user['username']} - Last login: {user['last_login']}")
```

---

## Architecture

### File Structure

```
oliver-metadata-tool/
├── web_app.py                    # Flask web application (main entry point)
├── requirements.txt              # Python dependencies
├── .env                          # Environment configuration
├── oliver_metadata.db            # SQLite database (auto-created)
├── src/
│   ├── config.py                 # Configuration management
│   ├── database.py               # Database operations
│   ├── auth.py                   # Authentication logic
│   ├── metadata_analyzer.py      # AI metadata generation
│   ├── metadata_importer.py      # Import from files
│   ├── template_manager.py       # Template system
│   ├── field_mapper.py           # Column mapping
│   ├── excel_metadata_lookup.py  # Excel lookup
│   ├── extractors/
│   │   ├── pdf_extractor.py
│   │   ├── image_extractor.py
│   │   ├── office_extractor.py
│   │   ├── video_extractor.py
│   │   └── exiftool_extractor.py
│   └── updaters/
│       ├── pdf_updater.py
│       ├── image_updater.py
│       ├── office_updater.py
│       ├── video_updater.py
│       └── exiftool_updater.py
├── templates/
│   ├── index.html                # Main UI
│   └── login.html                # Login page
└── docs/
    └── EXIFTOOL_SETUP.md         # ExifTool setup guide
```

### Technology Stack

- **Backend:** Flask (Python)
- **Database:** SQLite
- **Frontend:** HTML5, CSS3, JavaScript (Vanilla)
- **Design:** Montserrat font, Dark & Gold theme
- **Authentication:** Flask-Session, werkzeug.security, MSAL
- **AI:** OpenAI API (gpt-4o-mini)
- **Metadata:** PyExifTool, pypdf, python-docx, openpyxl

---

## API Endpoints

### Authentication

- `GET /login` - Login page
- `POST /login` - Authenticate user
- `GET /logout` - Destroy session
- `GET /login/microsoft` - Microsoft SSO redirect
- `GET /auth/callback` - SSO callback

### File Operations

- `POST /upload` - Upload files and generate metadata
- `POST /update-manual` - Update file metadata manually
- `GET /download/` - Download processed file

### Metadata Sources

- `POST /upload-excel` - Upload Excel file for mapping
- `POST /preview-excel-sheet` - Preview Excel sheet structure
- `POST /configure-excel-mapping` - Configure Excel column mapping
- `POST /import-metadata` - Upload import file for mapping
- `POST /configure-import-mapping` - Configure import column mapping

### Templates

- `GET /templates/list` - List all templates
- `POST /templates/save` - Save new template
- `POST /templates/load` - Load template by name
- `DELETE /templates/delete` - Delete template
- `POST /templates/apply` - Apply template to files
- `POST /templates/preview` - Preview template output

---

## Security & Privacy

### Authentication

- Passwords hashed with werkzeug.security (pbkdf2:sha256)
- Session tokens: 32-byte cryptographically secure random strings
- Sessions expire after 24 hours
- Microsoft SSO via OAuth2 + Azure AD

### Data Protection

- All credentials stored in `.env` (excluded from git)
- Database file excluded from git
- API keys never logged or exposed to frontend
- Audit trail for all user actions

### Production Recommendations

1. **HTTPS:** Use SSL/TLS certificates in production
2. **Database:** Migrate to PostgreSQL for better concurrency
3. **Rate Limiting:** Add rate limits to prevent abuse
4. **CSRF Protection:** Enable Flask-WTF for form security
5. **Error Tracking:** Integrate Sentry or similar service
6. **Backups:** Regular database backups
7. **Monitoring:** Track AI token usage for cost management

---

## Troubleshooting

### Common Issues

**ExifTool not found:**

```bash
# Verify installation
exiftool -ver

# macOS: Reinstall with Homebrew
brew reinstall exiftool

# Linux: Reinstall with apt
sudo apt-get install --reinstall libimage-exiftool-perl
```

**Database locked error:**

```bash
# Stop all instances
lsof -ti:5001 | xargs kill -9

# Restart application
python web_app.py
```

**OpenAI API errors:**

- Check API key in `.env` file
- Verify API key is valid at https://platform.openai.com/api-keys
- Check token usage limits on OpenAI dashboard

**Import failed - column not found:**

- Use the mapping modal to manually select columns
- Check that your file has headers in the first row
- Verify file encoding is UTF-8

---

## Development

### Running Tests

```bash
# Unit tests (if implemented)
pytest tests/

# Manual smoke test (verifies that the core modules import)
python -c "from src.database import Database; from src.config import Config; print('✅ All imports successful')"
```

### Git Workflow

```bash
# Check status
git status

# Add changes
git add .

# Commit with message
git commit -m "Your commit message"

# Push to remote
git push origin main
```

---

## License & Credits

**License:** Corporate License - Oliver Marketing  
All rights reserved. Unauthorized copying, distribution, or modification is prohibited.

**Developer:** Vadym Samoilenko  
**Company:** Oliver Marketing  
**Version:** 3.1 Enterprise Edition  
**Release Date:** January 2026

**Third-Party Software:**

- ExifTool by Phil Harvey (Perl Artistic License)
- Flask by Pallets (BSD License)
- OpenAI API (Commercial License)
- PyExifTool (LGPL License)

---

## Support

For issues, questions, or feature requests:

- **Internal Support:** Contact IT department
- **Developer:** Vadym Samoilenko
- **Documentation:** See `docs/` folder

---

## Changelog

### v3.1 (January 2026) - Enterprise Edition

- ✅ User authentication (local + Microsoft SSO)
- ✅ SQLite database with audit logging
- ✅ Unified import from file (CSV/Excel/JSON) with smart column mapping
- ✅ Excel sheet selection and preview
- ✅ Custom metadata fields support
- ✅ AI usage tracking and cost monitoring
- ✅ Dark & Gold UI redesign
- ✅ Template variables and preview
- ✅ Batch selection and CSV export
- ✅ Consolidated metadata sources (removed redundant Excel Lookup)

### v3.0 (January 2026)

- ✅ ExifTool integration (300+ formats)
- ✅ Multiple metadata sources (Import, AI, Manual)
- ✅ Field mapping with fuzzy matching
- ✅ Metadata templates system
- ✅ Rebranded to Oliver Metadata Tool

### v2.x (Prior)

- Basic Excel lookup functionality
- Multi-format file support
- Web interface