Fixed three critical issues:
1. Session persistence - Cookies not saved after page refresh
   - Replaced APPLICATION_ROOT with SESSION_COOKIE_PATH
   - Added proper cookie settings for reverse proxy (HttpOnly, SameSite)
   - Set correct cookie path matching URL_PREFIX
2. AJAX detection for FormData uploads (JPG, etc.)
   - Enhanced @login_required to detect POST/PUT/DELETE as AJAX
   - Added Content-Type check for JSON requests
   - Added path prefix check for API endpoints
3. JavaScript AJAX identification
   - Updated fetchWithAuth() to add X-Requested-With header
   - Properly handles both JSON and FormData requests using Headers API
   - Ensures all fetch calls are identified as AJAX by server
Changes:
- web_app.py: Fixed Flask session cookie configuration
- src/auth.py: Improved AJAX detection logic in login_required decorator
- templates/index.html: Enhanced fetchWithAuth() with proper headers
This fixes:
- Users having to log in again on every page refresh
- "Unexpected token '<'" errors when uploading JPG files
- Session cookies not persisting through reverse proxy
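The cookie changes in item 1 could be sketched roughly as follows. `SESSION_COOKIE_PATH`, `SESSION_COOKIE_HTTPONLY`, and `SESSION_COOKIE_SAMESITE` are standard Flask configuration keys; the helper name, the `Lax` value, and the way the prefix is normalized are assumptions for illustration, not the contents of web_app.py.

```python
def session_cookie_settings(url_prefix: str) -> dict:
    """Build Flask session-cookie settings scoped to a reverse-proxy prefix.

    Hypothetical helper: the real web_app.py may set these keys directly.
    """
    # Normalize so "" or "/" mean the site root and "/app/" becomes "/app".
    path = "/" + url_prefix.strip("/")
    return {
        "SESSION_COOKIE_PATH": path,        # cookie is sent for every URL under the prefix
        "SESSION_COOKIE_HTTPONLY": True,    # not readable from JavaScript
        "SESSION_COOKIE_SAMESITE": "Lax",   # assumed value; the commit only names SameSite
    }


# With a Flask app object this would be applied as:
#   app.config.update(session_cookie_settings(URL_PREFIX))
```

Scoping the cookie path to the prefix is what lets the browser send the cookie back on refresh when the app is served below a sub-path by the proxy.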
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Fixed two critical issues with API authentication on production server:
1. Modified @login_required decorator to detect AJAX/API requests and return
   JSON error with 401 status instead of HTML redirect. This prevents
   "Unexpected token '<'" errors when session expires.
2. Created fetchWithAuth() helper function in JavaScript that automatically
   handles 401 responses by redirecting to login page. Updated all 11 API
   fetch calls to use this wrapper.
Changes:
- src/auth.py: Added AJAX detection and JSON error responses to login_required
- templates/index.html: Added fetchWithAuth() and updated all fetch() calls
This fixes the console errors:
- "Failed to load templates: SyntaxError: Unexpected token '<'"
- 502 Bad Gateway errors now properly handled with session checks
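A framework-free sketch of the decision the decorator makes for an unauthenticated request. The signals checked (X-Requested-With, JSON Content-Type, write methods, path prefix) follow the heuristics named in this log; the function names, the `/api/` prefix, and the error payload are hypothetical and do not come from src/auth.py.

```python
import json
from typing import Mapping, Tuple


def is_ajax_request(method: str, headers: Mapping[str, str], path: str,
                    api_prefix: str = "/api/") -> bool:
    """Guess whether the caller expects JSON rather than an HTML page."""
    # Header names are case-insensitive, so compare on lowercased keys.
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("x-requested-with", "") == "XMLHttpRequest":
        return True
    if "application/json" in h.get("content-type", "").lower():
        return True
    if method.upper() in {"POST", "PUT", "DELETE"}:
        return True
    return path.startswith(api_prefix)


def unauthenticated_response(method: str, headers: Mapping[str, str], path: str,
                             login_url: str = "/login") -> Tuple[int, str, str]:
    """Return (status, content_type, body_or_location) for a request without a session."""
    if is_ajax_request(method, headers, path):
        body = json.dumps({"error": "Authentication required"})
        return 401, "application/json", body
    # Plain page loads are still redirected to the login page.
    return 302, "text/html", login_url
```

Returning JSON with 401 means the client-side `response.json()` call never receives the login page's HTML, which is what produced the "Unexpected token '<'" error.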
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
- Use initiate_auth_code_flow for PKCE (required by Azure AD for public clients)
- Store auth flow in session for token exchange
- Fix AADSTS9002325 error
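For context, MSAL's initiate_auth_code_flow generates the PKCE values itself, so the app only has to keep the returned flow in the session. The stdlib sketch below is not MSAL code; it only illustrates, under RFC 7636's S256 method, what a verifier/challenge pair looks like. The function names are hypothetical.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def challenge_for(verifier: str) -> str:
    """Derive the S256 code challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Create a random code verifier and its matching code challenge."""
    # 32 random bytes encode to 43 URL-safe characters, the minimum verifier length.
    verifier = secrets.token_urlsafe(32)
    return verifier, challenge_for(verifier)
```

The challenge goes out with the authorization request and the verifier with the token request, which is why the flow has to survive in the session between the two steps.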
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove -back suffix, use single path for monolithic Flask app
- All routes now use /solventum-image-metadata/ prefix
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add URL_PREFIX for all redirect URLs
- Redirects now go to /solventum-image-metadata-back/login instead of /login
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add deploy.sh for idempotent Docker deployments
- Configure API_BASE for /solventum-image-metadata-back/ reverse proxy
- Enable Azure AD SSO with public client flow (no secret required)
- Remove hardcoded tester user for production security
- Add ProxyFix middleware for reverse proxy header handling
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GPT-5, GPT-5-mini, and GPT-5-nano to valid models list
- Add model validation with automatic fallback to gpt-4o-mini
- Update _is_new_model() to recognize GPT-5 as new generation
- Add detailed API response logging (model used, tokens, content preview)
- Add empty content detection with helpful error messages
- Fix API parameter selection for GPT-5 models
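The validation-with-fallback step might look like this minimal sketch. The fallback name gpt-4o-mini and the GPT-5 entries come from this commit; the rest of the set and the function name are assumptions, since the real list lives in the analyzer module.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Illustrative subset; the project's actual valid-models list may differ.
VALID_MODELS = {"gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-4o", "gpt-4o-mini"}
FALLBACK_MODEL = "gpt-4o-mini"


def validate_model(requested: Optional[str]) -> str:
    """Return the requested model if it is known, otherwise the fallback."""
    name = (requested or "").strip().lower()
    if name in VALID_MODELS:
        return name
    # Log instead of raising so a bad config value degrades gracefully.
    logger.warning("Unknown model %r, falling back to %s", requested, FALLBACK_MODEL)
    return FALLBACK_MODEL
```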
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enhanced JSON and text parsing:
- Smarter JSON extraction from responses with extra text
- Find JSON object {...} anywhere in response text
- Improved markdown code block removal
- Validate parsed JSON has meaningful content
- Better text parsing fallback with multiple strategies
- Added debug logging for raw AI responses
- Handle edge cases like empty titles or malformed JSON
Fixes issue where AI-generated metadata was not displayed correctly.
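One way to implement the extraction described above, as a sketch only: strip a markdown code block if present, then scan for the first `{` from which a JSON object decodes and which has at least one non-empty value. The function name and the "meaningful content" rule are assumptions, not the project's parser.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_object(text: str) -> Optional[dict]:
    """Find the first meaningful JSON object anywhere in an AI response."""
    # 1. Prefer the body of a markdown code block when the model wrapped its answer.
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)

    # 2. Try to decode an object starting at each '{' in turn.
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        # 3. Accept only dicts with at least one non-empty value.
        if isinstance(obj, dict) and any(v not in ("", None, [], {}) for v in obj.values()):
            return obj
        start = text.find("{", start + 1)
    return None
```

A caller would fall back to plain-text parsing when this returns None.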
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed two critical issues:
1. OpenAI API temperature parameter:
   - New models (gpt-5-mini, gpt-4o, o1, o3) only support default temperature=1
   - Modified _get_api_params() to exclude temperature for new models
   - Older models still use custom temperature setting
   - Fixes 400 error: 'temperature' unsupported value
2. File cleanup to prevent disk space issues:
   - Added cleanup_session_files() to remove files when session ends
   - Added cleanup_old_files() to remove files older than 24 hours
   - Cleanup runs automatically on app startup in Docker mode
   - Cleanup runs on logout to free up space immediately
   - Added /cleanup-session endpoint for manual cleanup
   - Files no longer accumulate in Docker volume
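The age-based cleanup in item 2 could be sketched as below. Only the name cleanup_old_files and the 24-hour limit come from this commit; the signature, the use of modification time, and the recursive walk are assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, as stated in the commit


def cleanup_old_files(folder: Path, max_age: float = MAX_AGE_SECONDS,
                      now: Optional[float] = None) -> List[Path]:
    """Delete files under `folder` whose mtime is older than `max_age` seconds."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not affect the walk.
    for path in list(folder.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

Passing `now` explicitly makes the function easy to test without waiting for files to age.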
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed two critical issues:
1. OpenAI API compatibility for newer models:
   - Added _get_token_param() method to detect model type
   - Newer models (gpt-5-mini, gpt-4o, o1, o3) use max_completion_tokens
   - Older models (gpt-3.5-turbo) use max_tokens
   - Fixes 400 error: 'max_tokens' not supported parameter
2. Progress bar for AI generation:
   - Added startProgressAnimation() and stopProgressAnimation()
   - Animated progress bar shows activity during AI processing
   - Progress slowly increments to 90% to indicate work in progress
   - Stops animation when processing completes or errors occur
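A minimal sketch of the parameter selection from item 1, combined with the temperature rule described in the related temperature fix in this log. The model grouping mirrors the commit text rather than the OpenAI documentation, so treat the prefix list as an assumption and check the current API reference before relying on it; the function names are hypothetical.

```python
from typing import Any, Dict

# Prefixes the commits treat as "new generation" models (assumption taken from the log).
NEW_MODEL_PREFIXES = ("gpt-5", "gpt-4o", "o1", "o3")


def is_new_model(model: str) -> bool:
    """True when the model name starts with one of the new-generation prefixes."""
    return model.lower().startswith(NEW_MODEL_PREFIXES)


def token_param(model: str) -> str:
    """Pick the token-limit parameter name for the given model."""
    return "max_completion_tokens" if is_new_model(model) else "max_tokens"


def build_api_params(model: str, token_limit: int, temperature: float) -> Dict[str, Any]:
    """Assemble request parameters, omitting what the model group rejects."""
    params: Dict[str, Any] = {"model": model, token_param(model): token_limit}
    if not is_new_model(model):
        # Only older models receive a custom temperature.
        params["temperature"] = temperature
    return params
```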
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Major Changes:
- Removed GUI version (run_gui.py, src/gui_app.py) - Web-only application
- Fixed duplicate JavaScript variable declaration (importSessionId)
- Fixed metadata import endpoint to use session data instead of Excel lookup
- Added .env.example with all configuration options
Bug Fixes:
- Fixed /update endpoint to use suggested_metadata from session
- Fixed JavaScript updateAllFiles() to send session_id and file_index
- Updated README.md to reflect web-only interface
Dependencies:
- Updated requirements.txt to use minimum version constraints (>=)
Configuration:
- Added comprehensive .env.example with all environment variables
- Documented OpenAI API, Microsoft SSO, and optional tool paths
Testing:
- Verified import metadata workflow end-to-end
- Confirmed file upload and metadata update functionality
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements a complete authentication system with local users,
session management, and Microsoft SSO support for enterprise environments.
New Files Created:
- src/database.py: SQLite database management with users, sessions, audit_log
- src/auth.py: Authentication module with login, SSO, and session management
- templates/login.html: Modern login page with SSO button
Database Schema:
- users table: username, password_hash, email, full_name, auth_method
- sessions table: session management with expiration
- audit_log table: user activity tracking
- Indexes for performance optimization
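A rough sketch of what such a schema could look like with the stdlib sqlite3 module. The users columns are the ones listed above; the id columns, the sessions and audit_log columns, and the index names are hypothetical, since the commit does not spell them out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT,                      -- NULL for SSO-only users (assumption)
    email         TEXT,
    full_name     TEXT,
    auth_method   TEXT NOT NULL DEFAULT 'local'
);
CREATE TABLE IF NOT EXISTS sessions (
    token      TEXT PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    expires_at TEXT NOT NULL                 -- ISO timestamp, 24 hours after login
);
CREATE TABLE IF NOT EXISTS audit_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER REFERENCES users(id),
    action     TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);
CREATE INDEX IF NOT EXISTS idx_audit_user ON audit_log(user_id);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create tables and indexes if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```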
Authentication Features:
- Local authentication with test user (tester/oliveradmin)
- Password hashing with Werkzeug
- Session management with 24-hour expiration
- @login_required decorator for route protection
- Automatic session cleanup
Microsoft SSO Integration:
- MSAL library integration for Azure AD
- OAuth2 authorization code flow
- Microsoft Graph API user info retrieval
- Automatic user creation/update from SSO
- CSRF protection with state parameter
- Graceful fallback when SSO not configured
Security Improvements:
- All routes protected with @login_required
- Session-based authentication with database storage
- IP address and user agent logging
- Audit trail for user actions
- Secure session token generation
Configuration:
- Environment variables for Azure AD (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
- SECRET_KEY for Flask session cookie signing
- Optional MSAL dependency (SSO works only if configured)
Dependencies Added:
- Werkzeug>=3.0.0 for password hashing
- msal>=1.20.0 for Microsoft SSO (optional)
Test Credentials:
- Username: tester
- Password: oliveradmin
Phase 4 Status: Complete
Next Phase: Phase 5 (Modern UI Overhaul) for v3.1 release
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created comprehensive FieldMapper module (400+ lines):
- Fuzzy field matching with SequenceMatcher (60% similarity threshold)
- 10+ aliases per standard field (title, subject, keywords, description)
- Auto-mapping with confidence scores (0.0 to 1.0)
- Mapping suggestions with alternatives (top 2 per field)
- Exact match detection (score 1.0) and substring bonuses (0.85)
- Preset save/load/delete for reusable mappings
- Mapping validation (duplicate targets, coverage stats)
- Unmapped field detection and coverage percentage
FieldMapper features:
- auto_map(): Generate mapping from source fields
- suggest_mapping(): Get best match + alternatives for each field
- validate_mapping(): Check for conflicts and warnings
- apply_mapping(): Transform data using field mapping
- get_mapping_coverage(): Calculate mapping completeness
- Preset management: save, load, list, delete
MetadataImporter enhancements:
- preview_file_structure(): Preview columns and suggest mappings
- import_with_mapping(): Import with custom field mapping
- Integration with FieldMapper for smart detection
- Sample row preview (5 rows) before import
Web API additions:
- /preview-import endpoint: Preview file structure and field suggestions
- Returns: columns, sample rows, mapping suggestions with confidence
- Supports CSV, Excel, JSON format detection
Field mapping workflow:
1. User uploads import file for preview
2. System analyzes columns and suggests mappings
3. User reviews/adjusts mappings (confidence scores shown)
4. User confirms and imports with mapping
5. Optional: Save mapping as preset for reuse
Technical highlights:
- SequenceMatcher from difflib for fuzzy string matching
- Normalize field names (lowercase, underscores)
- Multiple alias sets per target field
- Confidence-based ranking of matches
- Preset persistence via JSON file
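The matching rules above (exact match 1.0, substring 0.85, otherwise SequenceMatcher ratio with a 0.6 cutoff) can be sketched as follows. The alias lists are shortened, illustrative examples and the function names are hypothetical; this is not the FieldMapper source.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Tuple

# Shortened alias sets for illustration; the real module has 10+ per field.
ALIASES = {
    "title": ["title", "name", "headline"],
    "subject": ["subject", "topic"],
    "keywords": ["keywords", "tags"],
    "description": ["description", "summary", "caption"],
}
THRESHOLD = 0.6


def normalize(name: str) -> str:
    """Lowercase and convert spaces/hyphens to underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def score(source: str, alias: str) -> float:
    """Score one source column against one alias on a 0.0 to 1.0 scale."""
    s, a = normalize(source), normalize(alias)
    if s == a:
        return 1.0
    if s and (a in s or s in a):
        return 0.85  # substring bonus
    return SequenceMatcher(None, s, a).ratio()


def best_match(source: str) -> Tuple[Optional[str], float]:
    """Return (target_field, confidence), or (None, confidence) below the threshold."""
    best_field, best = None, 0.0
    for field, aliases in ALIASES.items():
        for alias in aliases:
            current = score(source, alias)
            if current > best:
                best_field, best = field, current
    return (best_field, best) if best >= THRESHOLD else (None, best)


def auto_map(columns: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each source column that clears the threshold to its best target field."""
    mapping = {}
    for column in columns:
        field, confidence = best_match(column)
        if field is not None:
            mapping[column] = (field, round(confidence, 2))
    return mapping
```

Columns that stay below the threshold are simply left out of the mapping, which corresponds to the unmapped-field detection listed above.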
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enhanced metadata_analyzer.py with production-ready capabilities:
- Token counting with tiktoken for accurate OpenAI usage tracking
- Exponential backoff retry logic with tenacity library
- Intelligent content truncation based on token limits (not characters)
- Configurable timeout and max retries from Config
- Graceful fallback when tiktoken/tenacity unavailable
- Enhanced error reporting with _ai_error and _tokens_used metadata
Integrated AI generation in web interface:
- AI analyzer lazy initialization in web_app.py
- Real content extraction and AI analysis in upload endpoint
- Error handling for insufficient content or API failures
- Token usage logging for monitoring and optimization
UI improvements for AI experience:
- Special loading message for AI processing (10-30s per file)
- Display token usage for AI-generated metadata
- Show AI errors prominently with helpful messages
- Filter internal metadata fields (_tokens_used, _ai_error) from forms
Dependencies leveraged:
- tiktoken: Proper OpenAI token counting (10x more accurate)
- tenacity: Exponential backoff retry (3 attempts, 2-10s delays)
- openai: Production timeout support (30s default)
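The retry behaviour (3 attempts, delays growing from 2s and capped at 10s) is provided by tenacity in the project. As a stand-in that needs no third-party package, the same idea can be sketched with the stdlib; the function name and the injectable `sleep` parameter are assumptions made for testability.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T], attempts: int = 3,
                       min_delay: float = 2.0, max_delay: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, retrying on any exception with exponentially growing delays."""
    delay = min_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(delay, max_delay))
            delay *= 2
    raise RuntimeError("attempts must be at least 1")
```

In the analyzer this would wrap the OpenAI call, so transient API failures are retried before an _ai_error is reported.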
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added ExifTool integration to support 300+ file formats with improved
performance and unified API for metadata operations.
Changes:
- Added PyExifTool>=0.5.6 to requirements.txt
- Created comprehensive ExifTool setup guide (docs/EXIFTOOL_SETUP.md)
- Created ExifToolExtractor for reading metadata from images/video/PDF
- Created ExifToolUpdater for writing metadata to images/video/PDF
- Updated README with ExifTool installation instructions
ExifTool Benefits:
- Unified API for images, videos, PDFs (vs 5+ separate libraries)
- Support for 300+ formats (HEIC, RAW, MKV, and more)
- 10-60x faster batch operations with stay_open mode
- Better PDF metadata writing (the current pypdf-based implementation is read-only)
- Battle-tested tool with 20+ years of development
Architecture:
- Hybrid approach: ExifTool for images/video/PDF, Python libs for Office
- Graceful fallback if ExifTool not installed
- Automatic detection on startup with helpful messages
- Tag mapping from ExifTool tags to standard fields (title/subject/keywords)
Implementation follows existing extractor/updater patterns for consistency.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Updated application name to "Oliver Metadata Tool"
- Updated version to 3.0.0
- Added App Info constants to config.py (APP_NAME, APP_VERSION, APP_DESCRIPTION)
- Updated web interface (title, header, footer)
- Updated README with new branding and description
- Added AI configuration settings to config.py
- Added ExifTool check method to config.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added Flask web interface for batch metadata processing
- Added Excel-based metadata lookup (Celum ID mapping)
- Dual-sheet support: DSB (primary) and Medsurg (fallback)
- Unicode support for CJK text (Chinese, Japanese, Korean) in the CGA region
- Multi-format support: PDF, images, Office docs, video
- OCR with multi-language support (Tesseract)
- Filename matching without extension (case-insensitive)
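The lookup order (DSB first, Medsurg as fallback) with extension-less, case-insensitive matching might look like this sketch. It assumes each sheet has already been loaded into a dict keyed by file name or Celum ID; the function names and the dict shape are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def filename_key(filename: str) -> str:
    """Drop the extension and fold case so 'IMG_001.JPG' matches 'img_001'."""
    return Path(filename).stem.casefold()


def lookup_metadata(filename: str, primary: Dict[str, dict],
                    fallback: Dict[str, dict]) -> Optional[dict]:
    """Look the file up in the primary sheet, then in the fallback sheet."""
    key = filename_key(filename)
    for sheet in (primary, fallback):
        # Normalize the sheet keys the same way as the incoming file name.
        normalized = {filename_key(name): row for name, row in sheet.items()}
        if key in normalized:
            return normalized[key]
    return None
```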
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>