Fixed three critical issues:
1. Session persistence - Cookies not saved after page refresh
- Replaced APPLICATION_ROOT with SESSION_COOKIE_PATH
- Added proper cookie settings for reverse proxy (HttpOnly, SameSite)
- Set correct cookie path matching URL_PREFIX
2. AJAX detection for FormData uploads (JPG, etc.)
- Enhanced @login_required to detect POST/PUT/DELETE as AJAX
- Added Content-Type check for JSON requests
- Added path prefix check for API endpoints
3. JavaScript AJAX identification
- Updated fetchWithAuth() to add X-Requested-With header
- Properly handles both JSON and FormData requests using Headers API
- Ensures all fetch calls are identified as AJAX by server
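A minimal sketch of the cookie settings in item 1, assuming Flask's standard SESSION_COOKIE_* config keys. If SESSION_COOKIE_PATH is not set, Flask falls back to APPLICATION_ROOT. The helper name session_cookie_config and the SameSite=Lax / Secure values are assumptions, not necessarily what web_app.py uses:

```python
def session_cookie_config(url_prefix: str, https: bool = True) -> dict:
    """Build Flask session-cookie settings for an app served under url_prefix.

    Hypothetical helper, meant for app.config.update(...).
    """
    # "/solventum-image-metadata/" -> "/solventum-image-metadata"; "" -> "/".
    # No trailing slash, so the cookie also matches the bare prefix URL.
    cookie_path = "/" + url_prefix.strip("/")
    return {
        "SESSION_COOKIE_PATH": cookie_path,  # scope the cookie to the proxied prefix
        "SESSION_COOKIE_HTTPONLY": True,     # not readable from JavaScript
        "SESSION_COOKIE_SAMESITE": "Lax",    # still sent on top-level navigations (SSO redirects)
        "SESSION_COOKIE_SECURE": https,      # only over HTTPS when the proxy terminates TLS
    }
```

Usage: app.config.update(session_cookie_config(URL_PREFIX)).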
Changes:
- web_app.py: Fixed Flask session cookie configuration
- src/auth.py: Improved AJAX detection logic in login_required decorator
- templates/index.html: Enhanced fetchWithAuth() with proper headers
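The detection order in item 2 can be sketched as a pure predicate that @login_required consults before choosing between a JSON 401 and a redirect to the login page. Returning JSON is what prevents the browser from parsing the HTML login page ("Unexpected token '<'"). The function name and the API_PREFIXES list are hypothetical:

```python
from typing import Iterable, Mapping

# Hypothetical list; the real API paths in src/auth.py may differ.
API_PREFIXES = ("/api/", "/upload", "/update", "/download-selected")


def is_ajax_request(
    method: str,
    path: str,
    headers: Mapping[str, str],
    url_prefix: str = "",
    api_prefixes: Iterable[str] = API_PREFIXES,
) -> bool:
    """True if an unauthenticated request should get JSON 401 instead of a redirect."""
    h = {k.lower(): v for k, v in headers.items()}
    # 1. Explicit marker added by fetchWithAuth() on the client.
    if h.get("x-requested-with", "").lower() == "xmlhttprequest":
        return True
    # 2. State-changing methods come from scripts (e.g. FormData uploads), not navigation.
    if method.upper() in {"POST", "PUT", "DELETE"}:
        return True
    # 3. Plain HTML forms never send JSON bodies.
    if h.get("content-type", "").lower().startswith("application/json"):
        return True
    # 4. Known API endpoints, checked after stripping the reverse-proxy prefix.
    prefix = url_prefix.rstrip("/")
    if prefix and path.startswith(prefix):
        path = path[len(prefix):] or "/"
    return any(path.startswith(p) for p in api_prefixes)
```

Inside the decorator this would be called as is_ajax_request(request.method, request.path, request.headers, URL_PREFIX).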
This fixes:
- Users having to re-login on every page refresh
- "Unexpected token '<'" errors when uploading JPG files
- Session cookies not persisting through reverse proxy
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
- Login button now uses MSAL.js loginRedirect() for PKCE
- oauth_callback uses MSAL.js handleRedirectPromise() to complete token exchange
- PKCE flow is now entirely in browser (SPA compatible)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Render oauth_callback.html with MSAL.js for browser token exchange
- Add /auth/token endpoint to receive token from JavaScript
- Token exchange happens in browser (cross-origin) for SPA compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Check for OAuth code in query params on main page
- Process SSO login without requiring /auth/callback route
- Redirect to clean URL after successful login
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use initiate_auth_code_flow for PKCE (required by Azure AD for public clients)
- Store auth flow in session for token exchange
- Fix AADSTS9002325 error (PKCE required for cross-origin authorization code redemption)
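MSAL Python's initiate_auth_code_flow() generates the PKCE values itself and returns them in the flow dict, which is why the flow must be kept in the session until acquire_token_by_auth_code_flow() runs. For reference, a sketch of the S256 derivation defined in RFC 7636 (not MSAL's internal code):

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Random PKCE code_verifier: 32 random bytes -> 43 base64url characters."""
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).rstrip(b"=").decode("ascii")


def s256_code_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The authorization request carries the challenge; the token request later proves possession by sending the verifier.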
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove -back suffix, use single path for monolithic Flask app
- All routes now use /solventum-image-metadata/ prefix
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add URL_PREFIX for all redirect URLs
- Redirects now go to /solventum-image-metadata-back/login instead of /login
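A small sketch of prefixing redirect targets so "/login" becomes "<URL_PREFIX>/login" regardless of stray slashes; the helper name prefixed is hypothetical:

```python
def prefixed(url_prefix: str, path: str) -> str:
    """Join the reverse-proxy prefix and an app-relative path with exactly one slash."""
    prefix = url_prefix.rstrip("/")
    return f"{prefix}/{path.lstrip('/')}"
```

Usage: redirect(prefixed(URL_PREFIX, "/login")). An alternative is having the proxy send X-Forwarded-Prefix so that url_for() includes the prefix automatically.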
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add deploy.sh for idempotent Docker deployments
- Configure API_BASE for /solventum-image-metadata-back/ reverse proxy
- Enable Azure AD SSO with public client flow (no secret required)
- Remove hardcoded tester user for production security
- Add ProxyFix middleware for reverse proxy header handling
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed two critical issues:
1. OpenAI API temperature parameter:
- New models (gpt-5-mini, gpt-4o, o1, o3) only support default temperature=1
- Modified _get_api_params() to exclude temperature for new models
- Older models still use custom temperature setting
- Fixes 400 error: 'temperature' unsupported value
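A sketch of the parameter logic in item 1, matching model names by prefix. The function name get_api_params and the prefix tuple are stand-ins for _get_api_params(), not its actual code:

```python
# Hypothetical: prefixes taken from the model list above.
FIXED_TEMPERATURE_PREFIXES = ("gpt-5-mini", "gpt-4o", "o1", "o3")


def get_api_params(model: str, temperature: float) -> dict:
    """Build request kwargs, omitting temperature for models that reject custom values."""
    params = {"model": model}
    if not model.lower().startswith(FIXED_TEMPERATURE_PREFIXES):
        params["temperature"] = temperature
    return params
```

Omitting the key (rather than sending temperature=1) lets the API apply its own default for every model.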
2. File cleanup to prevent disk space issues:
- Added cleanup_session_files() to remove files when session ends
- Added cleanup_old_files() to remove files older than 24 hours
- Cleanup runs automatically on app startup in Docker mode
- Cleanup runs on logout to free up space immediately
- Added /cleanup-session endpoint for manual cleanup
- Files no longer accumulate in Docker volume
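The 24-hour sweep in item 2 can be sketched with pathlib and file modification times; the signature below is an assumption, not the actual cleanup_old_files():

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(directory: str, max_age_hours: float = 24.0, now: Optional[float] = None) -> List[str]:
    """Delete regular files under directory whose mtime is older than max_age_hours."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed: List[str] = []
    root = Path(directory)
    if not root.is_dir():
        return removed
    for path in list(root.rglob("*")):  # materialize before deleting
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(str(path))
        except OSError:
            # File vanished or is locked: skip it rather than abort the sweep.
            continue
    return removed
```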
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Modified download feature to work with selected files instead of all files:
- Button now shows 'Download Selected Files (N) as ZIP' with count
- New endpoint /download-selected accepts POST with file_indices
- Frontend sends only selected file indices to backend
- Button text updates dynamically when selection changes
- All files selected by default as before
- Users can select/deselect files before downloading
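Because /download-selected receives file_indices from the browser, the server should validate them before touching the filesystem. A sketch (resolve_selected is a hypothetical name; it skips bad values, though rejecting the request with 400 would be an equally valid choice):

```python
from typing import Iterable, List, Sequence


def resolve_selected(files: Sequence[str], file_indices: Iterable[object]) -> List[str]:
    """Map client-supplied indices to session files, ignoring duplicates and bad values."""
    selected: List[str] = []
    seen = set()
    for raw in file_indices:
        try:
            i = int(raw)
        except (TypeError, ValueError):
            continue  # not an integer-like value
        # Reject negatives explicitly: Python would otherwise index from the end.
        if 0 <= i < len(files) and i not in seen:
            seen.add(i)
            selected.append(files[i])
    return selected
```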
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added functionality to download all processed files at once in a ZIP archive:
- New endpoint /download-all/<session_id> in web_app.py
- Creates timestamped ZIP archive with all files from session
- Download All button appears after successful file updates
- Button appears at the bottom of the results with distinct styling
- Added zipfile and datetime imports
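A sketch of building the timestamped archive in memory with zipfile and datetime; the download-name pattern and the build_zip helper are assumptions:

```python
import io
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Tuple


def build_zip(paths: Iterable[str], session_id: str, now: Optional[datetime] = None) -> Tuple[str, bytes]:
    """Pack files into an in-memory ZIP and return (download_name, archive_bytes)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    download_name = f"processed_files_{session_id}_{stamp}.zip"  # hypothetical naming
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store only the base name so the server's directory layout is not exposed.
            zf.write(p, arcname=Path(p).name)
    return download_name, buffer.getvalue()
```

In Flask the result can be returned with send_file(io.BytesIO(data), mimetype="application/zip", as_attachment=True, download_name=name).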
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem:
- Users entered local folder paths that Docker container cannot access
- Files were not saved to user's folders
- output_dir check (os.path.isdir) always failed for host paths
Solution:
1. Backend (web_app.py):
- Only use output_dir in non-Docker mode
- In Docker mode, always update files in-place
- Users download files via browser instead
2. Frontend (templates/index.html):
- Hide output_dir field in Docker mode
- Show info message: files available for download
- Null-safe JS check for the outputDir element
3. Template rendering:
- Pass docker_mode flag to template
- Conditional display of output directory section
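The backend rule can be sketched as one function; how docker_mode is detected is not stated here, so it is passed in as a flag, and resolve_output_dir is a hypothetical name:

```python
import os
from typing import Callable, Optional


def resolve_output_dir(
    output_dir: Optional[str],
    docker_mode: bool,
    isdir: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Return a directory to save into, or None to update files in place."""
    if docker_mode:
        # Host paths typed by the user are not visible inside the container;
        # update in place and let the browser download the result instead.
        return None
    if output_dir and isdir(output_dir):
        return output_dir
    return None
```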
Result:
✅ Docker mode: Files updated in-place, downloadable via browser
✅ Local mode: output_dir still works for direct folder saving
✅ No more confusion about folder paths
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Major Changes:
- Removed GUI version (run_gui.py, src/gui_app.py) - Web-only application
- Fixed duplicate JavaScript variable declaration (importSessionId)
- Fixed metadata import endpoint to use session data instead of Excel lookup
- Added .env.example with all configuration options
Bug Fixes:
- Fixed /update endpoint to use suggested_metadata from session
- Fixed JavaScript updateAllFiles() to send session_id and file_index
- Updated README.md to reflect web-only interface
Dependencies:
- Updated requirements.txt to use minimum version constraints (>=)
Configuration:
- Added comprehensive .env.example with all environment variables
- Documented OpenAI API, Microsoft SSO, and optional tool paths
Testing:
- Verified import metadata workflow end-to-end
- Confirmed file upload and metadata update functionality
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements a complete authentication system with local users,
session management, and Microsoft SSO support for enterprise environments.
New Files Created:
- src/database.py: SQLite database management with users, sessions, audit_log
- src/auth.py: Authentication module with login, SSO, and session management
- templates/login.html: Modern login page with SSO button
Database Schema:
- users table: username, password_hash, email, full_name, auth_method
- sessions table: session management with expiration
- audit_log table: user activity tracking
- Indexes for performance optimization
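A sketch of the schema with sqlite3. The users columns come from the list above; the id/token/timestamp/action columns, the auth_method values, and the index choices are assumptions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT,                            -- NULL for SSO-only accounts
    email         TEXT,
    full_name     TEXT,
    auth_method   TEXT NOT NULL DEFAULT 'local'    -- e.g. 'local' or 'microsoft'
);
CREATE TABLE IF NOT EXISTS sessions (
    token       TEXT PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     INTEGER REFERENCES users(id),
    action      TEXT NOT NULL,
    ip_address  TEXT,
    user_agent  TEXT,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_sessions_user ON sessions(user_id);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);
CREATE INDEX IF NOT EXISTS idx_audit_user_time ON audit_log(user_id, created_at);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn
```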
Authentication Features:
- Local authentication with test user (tester/oliveradmin)
- Password hashing with Werkzeug
- Session management with 24-hour expiration
- @login_required decorator for route protection
- Automatic session cleanup
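A sketch of 24-hour session tokens using the secrets module; the function names are hypothetical:

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

SESSION_LIFETIME = timedelta(hours=24)


def new_session(now: Optional[datetime] = None) -> Tuple[str, datetime]:
    """Create an unguessable session token and its UTC expiry time."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe characters
    return token, now + SESSION_LIFETIME


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    return (now or datetime.now(timezone.utc)) >= expires_at
```

If expiry times are stored as UTC ISO-8601 strings in one consistent format, cleanup can be a single DELETE FROM sessions WHERE expires_at < ? with the current time in that format.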
Microsoft SSO Integration:
- MSAL library integration for Azure AD
- OAuth2 authorization code flow
- Microsoft Graph API user info retrieval
- Automatic user creation/update from SSO
- CSRF protection with state parameter
- Graceful fallback when SSO not configured
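A sketch of state-parameter CSRF protection: store a random value before redirecting to Azure AD, then compare it once, in constant time, on the callback. The session key oauth_state is hypothetical, and session can be any dict-like object such as Flask's session:

```python
import hmac
import secrets
from typing import MutableMapping, Optional


def issue_state(session: MutableMapping[str, str]) -> str:
    """Generate an OAuth2 state value and remember it in the user's session."""
    state = secrets.token_urlsafe(16)
    session["oauth_state"] = state  # hypothetical session key
    return state


def verify_state(session: MutableMapping[str, str], returned: Optional[str]) -> bool:
    """Single-use, constant-time check of the state echoed back on the callback."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned:
        return False
    return hmac.compare_digest(expected, returned)
```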
Security Improvements:
- All routes protected with @login_required
- Session-based authentication with database storage
- IP address and user agent logging
- Audit trail for user actions
- Secure session token generation
Configuration:
- Environment variables for Azure AD (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
- SECRET_KEY for signing Flask session cookies
- Optional MSAL dependency (SSO works only if configured)
Dependencies Added:
- Werkzeug>=3.0.0 for password hashing
- msal>=1.20.0 for Microsoft SSO (optional)
Test Credentials:
- Username: tester
- Password: oliveradmin
Phase 4 Status: Complete
Next Phase: Phase 5 (Modern UI Overhaul) for v3.1 release
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created comprehensive FieldMapper module (400+ lines):
- Fuzzy field matching with SequenceMatcher (60% similarity threshold)
- 10+ aliases per standard field (title, subject, keywords, description)
- Auto-mapping with confidence scores (0.0 to 1.0)
- Mapping suggestions with alternatives (top 2 per field)
- Exact match detection (score 1.0) and substring bonuses (0.85)
- Preset save/load/delete for reusable mappings
- Mapping validation (duplicate targets, coverage stats)
- Unmapped field detection and coverage percentage
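A sketch of the scoring rules above (normalization, exact match 1.0, substring 0.85, otherwise SequenceMatcher ratio, 0.6 threshold). The alias table is a small illustrative subset, not the module's real aliases:

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Illustrative subset; the real FieldMapper has 10+ aliases per field.
ALIASES: Dict[str, List[str]] = {
    "title": ["title", "name", "headline", "image_title"],
    "subject": ["subject", "topic", "category"],
    "keywords": ["keywords", "keyword", "tags"],
    "description": ["description", "desc", "caption", "summary"],
}
THRESHOLD = 0.6


def normalize(name: str) -> str:
    """'Image Title' -> 'image_title'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def score(source: str, alias: str) -> float:
    if source == alias:
        return 1.0   # exact match
    if alias in source or source in alias:
        return 0.85  # substring bonus
    return SequenceMatcher(None, source, alias).ratio()


def best_match(field: str) -> Optional[Tuple[str, float]]:
    """Return (target_field, confidence) for the best alias, or None below THRESHOLD."""
    src = normalize(field)
    best: Optional[Tuple[str, float]] = None
    for target, aliases in ALIASES.items():
        s = max(score(src, normalize(a)) for a in aliases)
        if best is None or s > best[1]:
            best = (target, s)
    return best if best is not None and best[1] >= THRESHOLD else None


def auto_map(fields: List[str]) -> Dict[str, Tuple[str, float]]:
    """Map every source column that clears the threshold."""
    mapping = {}
    for field in fields:
        match = best_match(field)
        if match is not None:
            mapping[field] = match
    return mapping
```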
FieldMapper features:
- auto_map(): Generate mapping from source fields
- suggest_mapping(): Get best match + alternatives for each field
- validate_mapping(): Check for conflicts and warnings
- apply_mapping(): Transform data using field mapping
- get_mapping_coverage(): Calculate mapping completeness
- Preset management: save, load, list, delete
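Given a source-to-target mapping, validation, application, and coverage reduce to a few dictionary operations. A sketch; the warning text and the coverage definition (share of source columns mapped) are assumptions:

```python
from collections import Counter
from typing import Dict, List, Mapping


def validate_mapping(mapping: Mapping[str, str]) -> List[str]:
    """Warn when two source columns feed the same target field."""
    counts = Counter(mapping.values())
    return [f"'{t}' is mapped from {n} source fields" for t, n in counts.items() if n > 1]


def apply_mapping(row: Mapping[str, object], mapping: Mapping[str, str]) -> Dict[str, object]:
    """Rename a source row's keys to standard fields, dropping unmapped columns."""
    return {target: row[source] for source, target in mapping.items() if source in row}


def get_mapping_coverage(source_fields: List[str], mapping: Mapping[str, str]) -> float:
    """Percentage of source columns that have a mapping."""
    if not source_fields:
        return 0.0
    mapped = sum(1 for f in source_fields if f in mapping)
    return 100.0 * mapped / len(source_fields)
```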
MetadataImporter enhancements:
- preview_file_structure(): Preview columns and suggest mappings
- import_with_mapping(): Import with custom field mapping
- Integration with FieldMapper for smart detection
- Sample row preview (5 rows) before import
Web API additions:
- /preview-import endpoint: Preview file structure and field suggestions
- Returns: columns, sample rows, mapping suggestions with confidence
- Detects CSV, Excel, and JSON formats
Field mapping workflow:
1. User uploads import file for preview
2. System analyzes columns and suggests mappings
3. User reviews/adjusts mappings (confidence scores shown)
4. User confirms and imports with mapping
5. Optional: Save mapping as preset for reuse
Technical highlights:
- SequenceMatcher from difflib for fuzzy string matching
- Normalize field names (lowercase, underscores)
- Multiple alias sets per target field
- Confidence-based ranking of matches
- Preset persistence via JSON file
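A sketch of JSON-file preset persistence. The file layout ({preset_name: {source: target}}) and the PresetStore class are assumptions; writing to a temporary file and renaming it avoids leaving half-written JSON after a crash:

```python
import json
from pathlib import Path
from typing import Dict, List


class PresetStore:
    """Named field-mapping presets kept in a single JSON file (hypothetical layout)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _read(self) -> Dict[str, Dict[str, str]]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _write(self, presets: Dict[str, Dict[str, str]]) -> None:
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(presets, indent=2, ensure_ascii=False), encoding="utf-8")
        tmp.replace(self.path)  # atomic rename on the same filesystem

    def save(self, name: str, mapping: Dict[str, str]) -> None:
        presets = self._read()
        presets[name] = dict(mapping)
        self._write(presets)

    def load(self, name: str) -> Dict[str, str]:
        return self._read()[name]

    def list_presets(self) -> List[str]:
        return sorted(self._read())

    def delete(self, name: str) -> bool:
        presets = self._read()
        if presets.pop(name, None) is None:
            return False
        self._write(presets)
        return True
```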
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enhanced metadata_analyzer.py with production-ready capabilities:
- Token counting with tiktoken for accurate OpenAI usage tracking
- Exponential backoff retry logic with tenacity library
- Intelligent content truncation based on token limits (not characters)
- Configurable timeout and max retries from Config
- Graceful fallback when tiktoken/tenacity unavailable
- Enhanced error reporting with _ai_error and _tokens_used metadata
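A sketch of token-based truncation with a character fallback for when tiktoken is missing. The encoding argument is anything with encode/decode, such as the object returned by tiktoken.encoding_for_model(model). The 4-characters-per-token estimate is a rough assumption for English text:

```python
from typing import Optional, Protocol


class Encoding(Protocol):
    def encode(self, text: str) -> list: ...
    def decode(self, tokens: list) -> str: ...


CHARS_PER_TOKEN = 4  # rough fallback estimate when no tokenizer is available


def truncate_to_tokens(text: str, max_tokens: int, encoding: Optional[Encoding] = None) -> str:
    """Cut text to at most max_tokens, using a real tokenizer when one is available."""
    if encoding is not None:
        tokens = encoding.encode(text)
        if len(tokens) <= max_tokens:
            return text
        return encoding.decode(tokens[:max_tokens])
    # Fallback: approximate the token budget by characters.
    return text[: max_tokens * CHARS_PER_TOKEN]
```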
Integrated AI generation in web interface:
- AI analyzer lazy initialization in web_app.py
- Real content extraction and AI analysis in upload endpoint
- Error handling for insufficient content or API failures
- Token usage logging for monitoring and optimization
UI improvements for AI experience:
- Special loading message for AI processing (10-30s per file)
- Display token usage for AI-generated metadata
- Show AI errors prominently with helpful messages
- Filter internal metadata fields (_tokens_used, _ai_error) from forms
Dependencies leveraged:
- tiktoken: Exact OpenAI token counts instead of character-based estimates
- tenacity: Exponential backoff retry (3 attempts, 2-10s delays)
- openai: Production timeout support (30s default)
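With tenacity installed, the retry policy corresponds roughly to @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10)). The stdlib sketch below is a hypothetical fallback for when tenacity is unavailable. Its delays (2s, then 4s, capped at 10s) approximate tenacity's schedule but do not match it exactly:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    min_wait: float = 2.0,
    max_wait: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponentially growing delays between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(min(max_wait, min_wait * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```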
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added comprehensive startup checks in web_app.py:
- Check for Excel file existence with helpful error message
- Validate OpenAI API key availability (optional)
- Check ExifTool installation (optional)
- Display available metadata sources based on configuration
- Updated branding in startup messages
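A sketch of the startup checks as a function that returns report lines. The OPENAI_API_KEY variable name and the message wording are assumptions; only the Excel file is treated as required:

```python
import os
import shutil
from typing import List, Mapping, Optional


def startup_report(excel_path: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect human-readable startup findings for required and optional dependencies."""
    env = os.environ if env is None else env
    lines: List[str] = []
    if os.path.isfile(excel_path):
        lines.append(f"OK    Excel lookup file: {excel_path}")
    else:
        lines.append(f"ERROR Excel file not found: {excel_path} (check the configured path)")
    if env.get("OPENAI_API_KEY"):
        lines.append("OK    OpenAI API key found: AI metadata generation available")
    else:
        lines.append("WARN  OPENAI_API_KEY not set: AI metadata generation disabled (optional)")
    exiftool = shutil.which("exiftool")
    if exiftool:
        lines.append(f"OK    ExifTool: {exiftool}")
    else:
        lines.append("WARN  ExifTool not found on PATH (optional)")
    return lines
```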
Benefits:
- Users see clear error messages for missing dependencies
- Easy troubleshooting of configuration issues
- Graceful degradation when optional features unavailable
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Updated application name to "Oliver Metadata Tool"
- Updated version to 3.0.0
- Added App Info constants to config.py (APP_NAME, APP_VERSION, APP_DESCRIPTION)
- Updated web interface (title, header, footer)
- Updated README with new branding and description
- Added AI configuration settings to config.py
- Added ExifTool check method to config.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added Flask web interface for batch metadata processing
- Added Excel-based metadata lookup (Celum ID mapping)
- Dual-sheet support: DSB (primary) and Medsurg (fallback)
- Unicode support for CJK characters in the CGA region (Chinese, Japanese, Korean)
- Multi-format support: PDF, images, Office docs, video
- OCR with multi-language support (Tesseract)
- Filename matching without extension (case-insensitive)
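A sketch of the lookup order: strip the extension, compare case-insensitively, and try DSB before Medsurg. Reading the Excel sheets (e.g. with pandas or openpyxl) is omitted; each sheet is assumed to be already loaded as a dict keyed by Celum ID, and both helper names are hypothetical:

```python
from pathlib import PurePath
from typing import Dict, Mapping, Optional, Tuple


def build_index(rows: Mapping[str, dict]) -> Dict[str, dict]:
    """Key one sheet's rows by case-folded Celum ID."""
    return {str(k).strip().casefold(): v for k, v in rows.items()}


def lookup_metadata(
    filename: str, dsb: Mapping[str, dict], medsurg: Mapping[str, dict]
) -> Optional[Tuple[str, dict]]:
    """Find metadata for a file: DSB sheet first, then Medsurg as the fallback."""
    key = PurePath(filename).stem.casefold()  # 'IMG_001.JPG' -> 'img_001'
    for sheet_name, index in (("DSB", dsb), ("Medsurg", medsurg)):
        if key in index:
            return sheet_name, index[key]
    return None
```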
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>