Commit graph

8 commits

Author SHA1 Message Date
SamoilenkoVadym
e9784d7da8 Phase 4 Complete: Authentication, Database, and Microsoft SSO
This commit implements a complete authentication system with local users,
session management, and Microsoft SSO support for enterprise environments.

New Files Created:
- src/database.py: SQLite database management with users, sessions, audit_log
- src/auth.py: Authentication module with login, SSO, and session management
- templates/login.html: Modern login page with SSO button

Database Schema:
- users table: username, password_hash, email, full_name, auth_method
- sessions table: session management with expiration
- audit_log table: user activity tracking
- Indexes for performance optimization

Authentication Features:
- Local authentication with test user (tester/oliveradmin)
- Password hashing with Werkzeug
- Session management with 24-hour expiration
- @login_required decorator for route protection
- Automatic session cleanup

Microsoft SSO Integration:
- MSAL library integration for Azure AD
- OAuth2 authorization code flow
- Microsoft Graph API user info retrieval
- Automatic user creation/update from SSO
- CSRF protection with state parameter
- Graceful fallback when SSO not configured

Security Improvements:
- All routes protected with @login_required
- Session-based authentication with database storage
- IP address and user agent logging
- Audit trail for user actions
- Secure session token generation

Configuration:
- Environment variables for Azure AD (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
- SECRET_KEY for Flask session encryption
- Optional MSAL dependency (SSO works only if configured)

Dependencies Added:
- Werkzeug>=3.0.0 for password hashing
- msal>=1.20.0 for Microsoft SSO (optional)

Test Credentials:
- Username: tester
- Password: oliveradmin

Phase 4 Status: Complete
Next Phase: Phase 5 (Modern UI Overhaul) for v3.1 release

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:57:47 +00:00
SamoilenkoVadym
f99aa118bf Phase 3 Complete: Batch Selection, CSV Export, and Metadata Templates
This commit completes Phase 3 implementation with advanced batch processing
and metadata template system.

Changes:
- Added batch file selection with checkboxes
- Implemented select all/deselect all functionality
- Updated batch processing to handle only selected files
- Added CSV export for processing results
- Created template_manager.py with variable substitution system
- Added template endpoints (list, save, load, delete, apply, preview)
- Integrated template UI with modal dialog for creation
- Template variables: {filename}, {date}, {datetime}, {user}, {year}, {month}, {day}

Phase 3 Status: Complete
Next Phase: Phase 4 (Authentication + SSO) for v3.1 release

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:52:05 +00:00
SamoilenkoVadym
61210a5e3d Phase 3.1: Field mapping foundation with auto-detection
Created comprehensive FieldMapper module (400+ lines):
- Fuzzy field matching with SequenceMatcher (60% similarity threshold)
- 10+ aliases per standard field (title, subject, keywords, description)
- Auto-mapping with confidence scores (0.0 to 1.0)
- Mapping suggestions with alternatives (top 2 per field)
- Exact match detection (score 1.0) and substring bonuses (0.85)
- Preset save/load/delete for reusable mappings
- Mapping validation (duplicate targets, coverage stats)
- Unmapped field detection and coverage percentage

FieldMapper features:
- auto_map(): Generate mapping from source fields
- suggest_mapping(): Get best match + alternatives for each field
- validate_mapping(): Check for conflicts and warnings
- apply_mapping(): Transform data using field mapping
- get_mapping_coverage(): Calculate mapping completeness
- Preset management: save, load, list, delete

MetadataImporter enhancements:
- preview_file_structure(): Preview columns and suggest mappings
- import_with_mapping(): Import with custom field mapping
- Integration with FieldMapper for smart detection
- Sample row preview (5 rows) before import

Web API additions:
- /preview-import endpoint: Preview file structure and field suggestions
- Returns: columns, sample rows, mapping suggestions with confidence
- Supports CSV, Excel, JSON format detection

Field mapping workflow:
1. User uploads import file for preview
2. System analyzes columns and suggests mappings
3. User reviews/adjusts mappings (confidence scores shown)
4. User confirms and imports with mapping
5. Optional: Save mapping as preset for reuse

Technical highlights:
- SequenceMatcher from difflib for fuzzy string matching
- Normalize field names (lowercase, underscores)
- Multiple alias sets per target field
- Confidence-based ranking of matches
- Preset persistence via JSON file

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:45:11 +00:00
SamoilenkoVadym
03079080d8 Phase 2.4: Metadata import from external files (CSV, Excel, JSON)
Created comprehensive metadata_importer.py module:
- CSV import with multiple encoding support (UTF-8, Latin1, ISO-8859-1, CP1252)
- Excel import (.xlsx, .xls) with sheet selection
- JSON import (object and array formats)
- Intelligent column detection for filename, title, subject, keywords
- Fuzzy column matching (case-insensitive, multiple aliases)
- Metadata normalization to standard format
- Import validation with statistics
- File lookup by filename stem (case-insensitive)

Web interface enhancements:
- /import-metadata endpoint for file uploads
- Import section UI (appears when Import source selected)
- Real-time import statistics display (records, title/subject/keywords counts)
- Import session management with unique session IDs
- Visual feedback (active state, success/error messages)
- Validation: requires import file before processing with import source

Import workflow:
1. User selects "Import from File" metadata source
2. Import section appears with file chooser
3. User uploads CSV/Excel/JSON with metadata
4. System validates and shows statistics
5. User uploads files to process
6. System matches files to imported metadata by filename

Supported import formats:
- CSV: filename, title, subject/description, keywords columns
- Excel: Any sheet with filename and metadata columns
- JSON: {filename: {metadata}} or [{filename, metadata}] formats

Technical features:
- Pandas DataFrame parsing for CSV/Excel
- Flexible column name detection (10+ aliases per field)
- NaN/null value handling
- List/array keyword support
- Unicode filename support

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:39:27 +00:00
SamoilenkoVadym
1bf2483f2d Phase 2.3: AI metadata generation with production-ready features
Enhanced metadata_analyzer.py with production-ready capabilities:
- Token counting with tiktoken for accurate OpenAI usage tracking
- Exponential backoff retry logic with tenacity library
- Intelligent content truncation based on token limits (not characters)
- Configurable timeout and max retries from Config
- Graceful fallback when tiktoken/tenacity unavailable
- Enhanced error reporting with _ai_error and _tokens_used metadata

Integrated AI generation in web interface:
- AI analyzer lazy initialization in web_app.py
- Real content extraction and AI analysis in upload endpoint
- Error handling for insufficient content or API failures
- Token usage logging for monitoring and optimization

UI improvements for AI experience:
- Special loading message for AI processing (10-30s per file)
- Display token usage for AI-generated metadata
- Show AI errors prominently with helpful messages
- Filter internal metadata fields (_tokens_used, _ai_error) from forms

Dependencies leveraged:
- tiktoken: Proper OpenAI token counting (10x more accurate)
- tenacity: Exponential backoff retry (3 attempts, 2-10s delays)
- openai: Production timeout support (30s default)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:36:48 +00:00
SamoilenkoVadym
ae19179752 Phase 1.4: ExifTool integration for enhanced metadata support
Added ExifTool integration to support 300+ file formats with improved
performance and unified API for metadata operations.

Changes:
- Added PyExifTool>=0.5.6 to requirements.txt
- Created comprehensive ExifTool setup guide (docs/EXIFTOOL_SETUP.md)
- Created ExifToolExtractor for reading metadata from images/video/PDF
- Created ExifToolUpdater for writing metadata to images/video/PDF
- Updated README with ExifTool installation instructions

ExifTool Benefits:
- Unified API for images, videos, PDFs (vs 5+ separate libraries)
- Support for 300+ formats (HEIC, RAW, MKV, and more)
- 10-60x faster batch operations with stay_open mode
- Better PDF metadata writing (current pypdf is read-only)
- Battle-tested tool with 20+ years of development

Architecture:
- Hybrid approach: ExifTool for images/video/PDF, Python libs for Office
- Graceful fallback if ExifTool not installed
- Automatic detection on startup with helpful messages
- Tag mapping from ExifTool tags to standard fields (title/subject/keywords)

Implementation follows existing extractor/updater patterns for consistency.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:26:01 +00:00
SamoilenkoVadym
7db62e06da Phase 1.1: Rebrand to Oliver Metadata Tool v3.0
- Updated application name to "Oliver Metadata Tool"
- Updated version to 3.0.0
- Added App Info constants to config.py (APP_NAME, APP_VERSION, APP_DESCRIPTION)
- Updated web interface (title, header, footer)
- Updated README with new branding and description
- Added AI configuration settings to config.py
- Added ExifTool check method to config.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:15:26 +00:00
SamoilenkoVadym
2082ea7ce7 Initial commit: Universal metadata tool with Excel-based lookup
- Added Flask web interface for batch metadata processing
- Added Excel-based metadata lookup (Celum ID mapping)
- Dual-sheet support: DSB (primary) and Medsurg (fallback)
- Unicode/hieroglyph support for CGA region (Chinese, Japanese, Korean)
- Multi-format support: PDF, images, Office docs, video
- OCR with multi-language support (Tesseract)
- Filename matching without extension (case-insensitive)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 14:23:42 +00:00