# Oliver Metadata Tool v3.1 Enterprise Edition

Universal metadata creation and management tool for all file types. Create, import, and manage metadata from multiple sources with an intuitive web interface, user authentication, and AI-powered metadata generation.

**Developer:** Vadym Samoilenko  
**License:** Corporate License - Oliver Marketing  
**Version:** 3.1 (Enterprise Edition)

---

## Features

### Multiple Metadata Sources
- **📂 File Import**: Import metadata from CSV, Excel, or JSON with smart column mapping and sheet selection
- **🤖 AI Generation**: OpenAI-powered intelligent metadata generation
- **✏️ Manual Entry**: Direct editing with real-time validation
- **📋 Templates**: Reusable metadata templates with variables

### Enterprise Features
- **🔐 Authentication**: Local user authentication + Microsoft SSO support
- **👥 User Management**: SQLite database for users and sessions
- **📊 Audit Logging**: Track all user actions and metadata changes
- **🔍 AI Usage Tracking**: Monitor OpenAI token usage and costs

### File Support
- **300+ File Formats** via ExifTool integration
- **PDF Files**: Full metadata support (title, subject, keywords, author, copyright)
- **Images**: JPEG, PNG, GIF, HEIC, TIFF, RAW formats
- **Office Documents**: Word, Excel, PowerPoint
- **Video Files**: MP4, MOV, AVI, MKV
- **Unicode Support**: Full support for Chinese, Japanese, and Korean characters

### Advanced Capabilities
- **Smart Field Mapping**: Auto-detect columns with fuzzy matching
- **Batch Processing**: Process multiple files with selective updates
- **Custom Metadata Fields**: Add unlimited custom fields
- **CSV Export**: Export metadata and processing results
- **Template Variables**: `{filename}`, `{date}`, `{user}`, custom variables

---

## Requirements

### System Dependencies
- **Python 3.8+**
- **ExifTool 12.15+** (required for 300+ format support)
- **Tesseract OCR** (optional - for image text extraction)
- **Poppler** (optional - for PDF content extraction)

### Python Dependencies
All listed in `requirements.txt`:
- Flask 2.3.0+ (Web framework)
- pandas, openpyxl (Excel/CSV processing)
- PyExifTool 0.5.6+ (Metadata operations)
- openai 1.0.0+ (AI generation)
- tiktoken 0.5.0+ (Token counting)
- tenacity 8.2.0+ (Retry logic)
- msal (Microsoft SSO - optional)

---

## Installation

### 1. Install System Dependencies

**macOS:**
```bash
brew install exiftool tesseract tesseract-lang poppler
```

**Linux (Ubuntu/Debian):**
```bash
sudo apt-get install libimage-exiftool-perl tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra tesseract-ocr-jpn tesseract-ocr-kor poppler-utils
```

**Windows:**
```bash
# Install ExifTool from https://exiftool.org/ or via Chocolatey:
choco install exiftool tesseract
```

**Verify ExifTool Installation:**
```bash
exiftool -ver
# Should show version 12.15 or higher
```

See [docs/EXIFTOOL_SETUP.md](docs/EXIFTOOL_SETUP.md) for detailed setup instructions.

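
The same version check can be scripted, for example as a pre-flight step before starting the app. This is a minimal sketch, not the tool's own startup check: the function names are hypothetical, and it assumes `exiftool -ver` prints a plain `major.minor` version string.

```python
import shutil
import subprocess
from typing import Optional, Tuple

MIN_EXIFTOOL = (12, 15)  # minimum version stated in this README


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '12.76' into (12, 76) for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_exiftool_version() -> Optional[Tuple[int, ...]]:
    """Return the installed ExifTool version, or None if it is not on PATH."""
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        ["exiftool", "-ver"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def meets_minimum(version: Optional[Tuple[int, ...]]) -> bool:
    """True when a version was found and is at least MIN_EXIFTOOL."""
    return version is not None and version >= MIN_EXIFTOOL
```

Calling `meets_minimum(installed_exiftool_version())` returns `False` both when ExifTool is missing and when it is older than 12.15.
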
### 2. Create Virtual Environment

```bash
python3 -m venv venv_local
source venv_local/bin/activate  # On Windows: venv_local\Scripts\activate
```

### 3. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure Environment Variables

Create a `.env` file in the project root:

```env
# Required: OpenAI API Key (for AI metadata generation)
OPENAI_API_KEY=your-openai-api-key-here

# Optional: Microsoft SSO (for enterprise authentication)
# AZURE_CLIENT_ID=your-azure-client-id
# AZURE_CLIENT_SECRET=your-azure-client-secret
# AZURE_TENANT_ID=your-azure-tenant-id
# REDIRECT_URI=http://localhost:5001/auth/callback

# Optional: Flask secret key (auto-generated if not set)
# SECRET_KEY=your-secret-key-here

# Optional: AI settings (defaults shown)
# AI_MODEL=gpt-4o-mini
# MAX_TOKENS=500
# TEMPERATURE=0.5
# API_TIMEOUT=30
# API_MAX_RETRIES=3
```

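
To illustrate how the optional AI settings fall back to the defaults shown above, here is a minimal sketch. It is not the contents of `src/config.py`; the `AISettings` class and `load_ai_settings` function are hypothetical names, and the sketch assumes the variables from `.env` have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AISettings:
    api_key: str
    model: str
    max_tokens: int
    temperature: float
    timeout: int
    max_retries: int


def load_ai_settings(env: Mapping[str, str] = os.environ) -> AISettings:
    """Read AI settings, falling back to the defaults listed in this README."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return AISettings(
        api_key=api_key,
        model=env.get("AI_MODEL", "gpt-4o-mini"),
        max_tokens=int(env.get("MAX_TOKENS", "500")),
        temperature=float(env.get("TEMPERATURE", "0.5")),
        timeout=int(env.get("API_TIMEOUT", "30")),
        max_retries=int(env.get("API_MAX_RETRIES", "3")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the fallback behaviour easy to check without touching the real environment.
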
### 5. Initialize Database

The database will be created automatically on first run. To manually initialize:

```bash
python -c "from src.database import Database; db = Database(); print('Database initialized')"
```

---

## Docker Deployment (Recommended)

### Quick Start with Docker

```bash
# Build and start
docker-compose up -d

# Or use the helper script
./docker-run.sh build
./docker-run.sh start

# Access at http://localhost:5001
```

**Benefits:**
- ✅ No manual dependency installation
- ✅ Consistent environment across systems
- ✅ Persistent data storage via volumes
- ✅ Easy updates and rollbacks
- ✅ Production-ready configuration

**See [DOCKER.md](DOCKER.md) for complete Docker deployment guide.**

---

## Usage

### Starting the Web Application

**Local Development:**
```bash
python web_app.py
```

**Docker:**
```bash
docker-compose up -d
```

The application will:
1. ✅ Check for ExifTool availability
2. ✅ Initialize SQLite database (users, sessions, audit_log)
3. ✅ Start Flask server on http://localhost:5001
4. 🌐 Open browser automatically (local mode only)

### Login

**Test Account:**
- Username: `tester`
- Password: `oliveradmin`

**Microsoft SSO** (if configured):
- Click "Sign in with Microsoft" button
- Authenticate via Azure AD
- Users auto-created on first login

### Using Metadata Sources

#### 1. Import from File
1. Select "Import from File (CSV/Excel/JSON)" from metadata source dropdown (default)
2. Click "Choose File" and select your metadata file
3. Configure mapping modal:
   - For Excel files: Select sheet name
   - Map columns: Filename (required), Title, Description, Keywords
   - Auto-detection suggests best matches
   - Preview first 3 rows
4. Confirm mapping
5. Upload files to process - tool matches files by filename

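
The column auto-detection in step 3 can be pictured with a small fuzzy-matching sketch. This is an illustration built on the standard library's `difflib`, not the logic in `src/field_mapper.py`; the alias lists, the `suggest_mapping` name, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical aliases per target field; the tool's real list is not documented here.
FIELD_ALIASES: Dict[str, List[str]] = {
    "filename": ["filename", "file name", "file", "name"],
    "title": ["title", "headline"],
    "description": ["description", "desc", "summary"],
    "keywords": ["keywords", "tags"],
}


def suggest_mapping(columns: List[str], cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Suggest one source column per target field, or None if nothing is close."""
    # Compare on a normalized form but report the original column header.
    normalized = {col.strip().lower().replace("_", " "): col for col in columns}
    mapping: Dict[str, Optional[str]] = {}
    for field, aliases in FIELD_ALIASES.items():
        best: Optional[str] = None
        for alias in aliases:
            hits = difflib.get_close_matches(alias, list(normalized), n=1, cutoff=cutoff)
            if hits:
                best = normalized[hits[0]]
                break
        mapping[field] = best
    return mapping
```

A field mapped to `None` is one the user would have to pick by hand in the mapping modal. The sketch does not prevent two fields from suggesting the same column.
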
#### 2. AI Generation
1. Select "AI Generation" from metadata source dropdown
2. Upload files
3. AI generates metadata (10-30 seconds per file)
4. Review and edit generated metadata
5. Save changes

#### 3. Manual Entry
1. Select "Manual Entry"
2. Upload files
3. Fill in metadata fields manually
4. Save changes

#### 4. Templates
1. Create template with variables
2. Select template from dropdown
3. Apply to selected files
4. Review and save

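
As a rough picture of how template variables such as `{filename}`, `{date}`, and `{user}` could be expanded, consider the sketch below. It is not the code in `src/template_manager.py`: the `render_template` function is hypothetical, the ISO date format is an assumption, and leaving unknown placeholders untouched is a design choice made for this example.

```python
from datetime import date
from typing import Dict, Optional


class _KeepUnknown(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_template(
    template: str,
    filename: str,
    user: str,
    today: Optional[date] = None,
    custom: Optional[Dict[str, str]] = None,
) -> str:
    """Fill built-in and custom variables into a template string."""
    values = _KeepUnknown(
        filename=filename,
        user=user,
        date=(today or date.today()).isoformat(),
    )
    values.update(custom or {})
    return template.format_map(values)
```

For example, `render_template("{filename} reviewed by {user}", "report.pdf", "tester")` yields `report.pdf reviewed by tester`.
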
### Batch Operations

1. Upload multiple files
2. Use checkboxes to select files
3. "Select All" / "Deselect All" buttons
4. Edit metadata individually
5. Click "Update Selected Files" to save all at once
6. Export results to CSV

---

## Configuration

### Database Schema

**Users Table:**
- id, username, password_hash, email, full_name
- auth_method (local/sso)
- created_at, last_login, is_active

**Sessions Table:**
- session_id, user_id, created_at, expires_at
- ip_address, user_agent

**Audit Log Table:**
- id, user_id, action, details, timestamp

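
For orientation, the three tables could be declared roughly as follows. Only the column names come from the lists above; the column types, constraints, and defaults are assumptions, and the actual definitions in `src/database.py` may differ.

```python
import sqlite3

# Column names follow this README; types and constraints are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT,
    email         TEXT,
    full_name     TEXT,
    auth_method   TEXT NOT NULL DEFAULT 'local'
                  CHECK (auth_method IN ('local', 'sso')),
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_login    TEXT,
    is_active     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT
);
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id   INTEGER REFERENCES users(id),
    action    TEXT NOT NULL,
    details   TEXT,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the tables if they do not exist yet (safe to call repeatedly)."""
    conn.executescript(SCHEMA)


# Demo against an in-memory database so no file is touched.
conn = sqlite3.connect(":memory:")
init_schema(conn)
```

Using `CREATE TABLE IF NOT EXISTS` keeps initialization idempotent, which matches the "created automatically on first run" behaviour described earlier.
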
### AI Usage Tracking

Every AI metadata generation is logged with:
- User ID
- Timestamp
- Tokens used (prompt + completion)
- Cost estimate (based on gpt-4o-mini pricing)

View logs in database:
```sql
SELECT * FROM audit_log WHERE action = 'ai_generation' ORDER BY timestamp DESC;
```

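
The same query can feed a per-user summary. The sketch below is hedged in two ways: it assumes the `details` column holds a JSON object with `prompt_tokens` and `completion_tokens` keys, which this README does not confirm, and it takes the per-million-token prices as parameters rather than hard-coding any real pricing.

```python
import json
import sqlite3
from typing import Dict


def summarize_ai_usage(
    conn: sqlite3.Connection,
    input_usd_per_1m: float,
    output_usd_per_1m: float,
) -> Dict[int, Dict[str, float]]:
    """Total tokens and estimated cost per user from ai_generation audit rows."""
    rows = conn.execute(
        "SELECT user_id, details FROM audit_log WHERE action = 'ai_generation'"
    )
    summary: Dict[int, Dict[str, float]] = {}
    for user_id, details in rows:
        # Assumed layout: {"prompt_tokens": ..., "completion_tokens": ...}
        usage = json.loads(details) if details else {}
        prompt = int(usage.get("prompt_tokens", 0))
        completion = int(usage.get("completion_tokens", 0))
        entry = summary.setdefault(
            user_id, {"prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0}
        )
        entry["prompt_tokens"] += prompt
        entry["completion_tokens"] += completion
        entry["cost_usd"] += (
            prompt * input_usd_per_1m + completion * output_usd_per_1m
        ) / 1_000_000
    return summary
```

Look up the current prices for the configured model before relying on the cost figure; the function only multiplies whatever rates it is given.
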
### User Management

**Create New User:**
```python
from src.database import Database

db = Database()
db.create_user(
    username='newuser',
    password='password123',
    email='user@example.com',
    full_name='New User',
    auth_method='local'
)
```

**List All Users:**
```python
from src.database import Database

db = Database()
users = db.get_all_users()
for user in users:
    print(f"{user['username']} - Last login: {user['last_login']}")
```

---

## Architecture

### File Structure

```
oliver-metadata-tool/
├── web_app.py                      # Flask web application (main entry point)
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── oliver_metadata.db              # SQLite database (auto-created)
├── src/
│   ├── config.py                   # Configuration management
│   ├── database.py                 # Database operations
│   ├── auth.py                     # Authentication logic
│   ├── metadata_analyzer.py        # AI metadata generation
│   ├── metadata_importer.py        # Import from files
│   ├── template_manager.py         # Template system
│   ├── field_mapper.py             # Column mapping
│   ├── excel_metadata_lookup.py    # Excel lookup
│   ├── extractors/
│   │   ├── pdf_extractor.py
│   │   ├── image_extractor.py
│   │   ├── office_extractor.py
│   │   ├── video_extractor.py
│   │   └── exiftool_extractor.py
│   └── updaters/
│       ├── pdf_updater.py
│       ├── image_updater.py
│       ├── office_updater.py
│       ├── video_updater.py
│       └── exiftool_updater.py
├── templates/
│   ├── index.html                  # Main UI
│   └── login.html                  # Login page
└── docs/
    └── EXIFTOOL_SETUP.md           # ExifTool setup guide
```

### Technology Stack

- **Backend:** Flask (Python)
- **Database:** SQLite
- **Frontend:** HTML5, CSS3, JavaScript (Vanilla)
- **Design:** Montserrat font, Dark & Gold theme
- **Authentication:** Flask-Session, werkzeug.security, MSAL
- **AI:** OpenAI API (gpt-4o-mini)
- **Metadata:** PyExifTool, pypdf, python-docx, openpyxl

---

## API Endpoints

### Authentication
- `GET /login` - Login page
- `POST /login` - Authenticate user
- `GET /logout` - Destroy session
- `GET /login/microsoft` - Microsoft SSO redirect
- `GET /auth/callback` - SSO callback

### File Operations
- `POST /upload` - Upload files and generate metadata
- `POST /update-manual` - Update file metadata manually
- `GET /download/<filename>` - Download processed file

### Metadata Sources
- `POST /upload-excel` - Upload Excel file for mapping
- `POST /preview-excel-sheet` - Preview Excel sheet structure
- `POST /configure-excel-mapping` - Configure Excel column mapping
- `POST /import-metadata` - Upload import file for mapping
- `POST /configure-import-mapping` - Configure import column mapping

### Templates
- `GET /templates/list` - List all templates
- `POST /templates/save` - Save new template
- `POST /templates/load` - Load template by name
- `DELETE /templates/delete` - Delete template
- `POST /templates/apply` - Apply template to files
- `POST /templates/preview` - Preview template output

---

## Security & Privacy

### Authentication
- Passwords hashed with werkzeug.security (pbkdf2:sha256)
- Session tokens: 32-byte cryptographically secure random strings
- Sessions expire after 24 hours
- Microsoft SSO via OAuth2 + Azure AD

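
The session-token and expiry rules above can be sketched with the standard library alone. This is an illustration rather than the code in `src/auth.py`: the function names are hypothetical, and hex encoding of the 32 random bytes is an assumption (the tool may encode them differently).

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

SESSION_LIFETIME = timedelta(hours=24)  # expiry stated in this README


def new_session(now: Optional[datetime] = None) -> Tuple[str, datetime]:
    """Return a fresh session token and the moment it expires."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_hex(32)  # 32 random bytes, hex-encoded to 64 characters
    return token, now + SESSION_LIFETIME


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the current time has reached the expiry timestamp."""
    return (now or datetime.now(timezone.utc)) >= expires_at
```

The `secrets` module draws from the operating system's cryptographically secure random source, which is what makes the token suitable as a session identifier.
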
### Data Protection
- All credentials stored in `.env` (excluded from git)
- Database file excluded from git
- API keys never logged or exposed to frontend
- Audit trail for all user actions

### Production Recommendations
1. **HTTPS:** Use SSL/TLS certificates in production
2. **Database:** Migrate to PostgreSQL for better concurrency
3. **Rate Limiting:** Add rate limits to prevent abuse
4. **CSRF Protection:** Enable Flask-WTF for form security
5. **Error Tracking:** Integrate Sentry or similar service
6. **Backups:** Regular database backups
7. **Monitoring:** Track AI token usage for cost management

---

## Troubleshooting

### Common Issues

**ExifTool not found:**
```bash
# Verify installation
exiftool -ver

# macOS: Reinstall with Homebrew
brew reinstall exiftool

# Linux: Reinstall with apt
sudo apt-get install --reinstall libimage-exiftool-perl
```

**Database locked error:**
```bash
# Stop all instances
lsof -ti:5001 | xargs kill -9

# Restart application
python web_app.py
```

**OpenAI API errors:**
- Check API key in `.env` file
- Verify API key is valid at https://platform.openai.com/api-keys
- Check token usage limits on OpenAI dashboard

**Import failed - column not found:**
- Use the mapping modal to manually select columns
- Check that your file has headers in the first row
- Verify file encoding is UTF-8

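
The last two checks can be run on a CSV file before importing it. The helper below is a hypothetical sketch and not part of the tool: it decodes the raw bytes as UTF-8 (accepting a byte-order mark) and returns the first-row headers, raising `ValueError` with a hint when either check fails.

```python
import csv
import io
from typing import List


def read_headers(raw: bytes) -> List[str]:
    """Decode as UTF-8 (accepting a BOM) and return the first-row headers."""
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("File is not valid UTF-8; re-save it as UTF-8 and retry") from exc
    first_row = next(csv.reader(io.StringIO(text)), [])
    headers = [cell.strip() for cell in first_row]
    if not any(headers):
        raise ValueError("First row is empty; expected column headers")
    return headers
```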
---

## Development

### Running Tests

```bash
# Unit tests (if implemented)
pytest tests/

# Manual integration test
python -c "from src.database import Database; from src.config import Config; print('✅ All imports successful')"
```

### Git Workflow

```bash
# Check status
git status

# Add changes
git add .

# Commit with message
git commit -m "Your commit message"

# Push to remote
git push origin main
```

---

## License & Credits

**License:** Corporate License - Oliver Marketing  
All rights reserved. Unauthorized copying, distribution, or modification is prohibited.

**Developer:** Vadym Samoilenko  
**Company:** Oliver Marketing  
**Version:** 3.1 Enterprise Edition  
**Release Date:** January 2026

**Third-Party Software:**
- ExifTool by Phil Harvey (Perl Artistic License)
- Flask by Pallets (BSD License)
- OpenAI API (Commercial License)
- PyExifTool (GPLv3+ or BSD License)

---

## Support

For issues, questions, or feature requests:
- **Internal Support:** Contact IT department
- **Developer:** Vadym Samoilenko
- **Documentation:** See `docs/` folder

---

## Changelog

### v3.1 (January 2026) - Enterprise Edition
- ✅ User authentication (local + Microsoft SSO)
- ✅ SQLite database with audit logging
- ✅ Unified import from file (CSV/Excel/JSON) with smart column mapping
- ✅ Excel sheet selection and preview
- ✅ Custom metadata fields support
- ✅ AI usage tracking and cost monitoring
- ✅ Dark & Gold UI redesign
- ✅ Template variables and preview
- ✅ Batch selection and CSV export
- ✅ Consolidated metadata sources (removed redundant Excel Lookup)

### v3.0 (January 2026)
- ✅ ExifTool integration (300+ formats)
- ✅ Multiple metadata sources (Import, AI, Manual)
- ✅ Field mapping with fuzzy matching
- ✅ Metadata templates system
- ✅ Rebranded to Oliver Metadata Tool

### v2.x (Prior)
- Basic Excel lookup functionality
- Multi-format file support
- Web interface