Commit graph

27 commits

Author SHA1 Message Date
SamoilenkoVadym
0b136bac6c Fix session expiration issues and improve error handling
Fixed two critical session-related issues:

1. Session expiration during file processing
   - Added proper error message when session expires mid-process
   - Prevents silent failure and missing download buttons
   - Shows clear "Session expired" message to user

2. Session lifetime and cookie configuration
   - Increased session lifetime from 24 hours to 7 days (configurable)
   - Made sessions permanent (session.permanent = True) in all login flows
   - Improved cookie security settings with environment variable control
   - Added SESSION_COOKIE_SECURE and SESSION_LIFETIME_DAYS env vars
   - Fixed cookie configuration for HTTPS reverse proxy

Changes:
- web_app.py: Enhanced session configuration and made sessions permanent
- templates/index.html: Better error handling for session expiration

This fixes:
- "Unexpected token '<'" errors appearing intermittently
- Missing download buttons after metadata update
- Sessions expiring too quickly requiring frequent re-login

Environment variables (optional):
- SESSION_COOKIE_SECURE=true (default for HTTPS)
- SESSION_LIFETIME_DAYS=7 (default 7 days)
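A minimal sketch of how these two variables could feed Flask's session settings. The helper name `build_session_config` and the `HttpOnly`/`SameSite` values are assumptions for illustration; only the config keys themselves are standard Flask settings.

```python
import os
from datetime import timedelta


def build_session_config(env=None):
    """Return Flask session settings derived from environment variables."""
    env = os.environ if env is None else env
    # Assumed parsing: anything other than "true" (case-insensitive) disables Secure.
    secure = env.get("SESSION_COOKIE_SECURE", "true").strip().lower() == "true"
    days = int(env.get("SESSION_LIFETIME_DAYS", "7"))
    return {
        "SESSION_COOKIE_SECURE": secure,
        "SESSION_COOKIE_HTTPONLY": True,      # assumption, not stated in this commit
        "SESSION_COOKIE_SAMESITE": "Lax",     # assumption, not stated in this commit
        "PERMANENT_SESSION_LIFETIME": timedelta(days=days),
    }
```

In a Flask app this would be applied with `app.config.update(build_session_config())`. The lifetime only takes effect for sessions marked with `session.permanent = True`.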

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-09 11:46:51 +00:00
SamoilenkoVadym
2ea4673bf0 Fix session persistence and improve AJAX detection for file uploads
Fixed three critical issues:

1. Session persistence - Cookies not saved after page refresh
   - Replaced APPLICATION_ROOT with SESSION_COOKIE_PATH
   - Added proper cookie settings for reverse proxy (HttpOnly, SameSite)
   - Set correct cookie path matching URL_PREFIX

2. AJAX detection for FormData uploads (JPG, etc.)
   - Enhanced @login_required to detect POST/PUT/DELETE as AJAX
   - Added Content-Type check for JSON requests
   - Added path prefix check for API endpoints

3. JavaScript AJAX identification
   - Updated fetchWithAuth() to add X-Requested-With header
   - Properly handles both JSON and FormData requests using Headers API
   - Ensures all fetch calls are identified as AJAX by server

Changes:
- web_app.py: Fixed Flask session cookie configuration
- src/auth.py: Improved AJAX detection logic in login_required decorator
- templates/index.html: Enhanced fetchWithAuth() with proper headers

This fixes:
- Users having to re-login on every page refresh
- "Unexpected token '<'" errors when uploading JPG files
- Session cookies not persisting through reverse proxy
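The AJAX checks listed above can be sketched as a plain function. The name `looks_like_ajax` and the `/api/` prefix are hypothetical; the real decorator in src/auth.py may order or combine the checks differently. Presumably the point is that a request identified this way receives a JSON error rather than an HTML login redirect, which is what a browser reports as "Unexpected token '<'" when it tries to parse the page as JSON.

```python
def looks_like_ajax(method, headers, path, api_prefix="/api/"):
    """Heuristically decide whether a request expects JSON rather than HTML."""
    # Normalise header names so lookups are case-insensitive.
    lowered = {key.lower(): value for key, value in headers.items()}
    if lowered.get("x-requested-with") == "XMLHttpRequest":
        return True
    if lowered.get("content-type", "").startswith("application/json"):
        return True
    # State-changing methods are treated as AJAX (covers FormData uploads).
    if method.upper() in {"POST", "PUT", "DELETE"}:
        return True
    # api_prefix is an illustrative assumption, not taken from the commit.
    return path.startswith(api_prefix)
```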

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-09 11:24:49 +00:00
SamoilenkoVadym
25c5d1ba11 Complete SPA OAuth flow - login button uses MSAL.js
- Login button now uses MSAL.js loginRedirect() for PKCE
- oauth_callback uses MSAL.js handleRedirectPromise() to complete token exchange
- PKCE flow is now entirely in browser (SPA compatible)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:50:23 +00:00
SamoilenkoVadym
2e64ae9d15 Add SPA-compatible OAuth flow with MSAL.js
- Render oauth_callback.html with MSAL.js for browser token exchange
- Add /auth/token endpoint to receive token from JavaScript
- Token exchange happens in browser (cross-origin) for SPA compatibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:40:42 +00:00
SamoilenkoVadym
497ab446ad Handle OAuth callback on root path /
- Check for OAuth code in query params on main page
- Process SSO login without requiring /auth/callback route
- Redirect to clean URL after successful login

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:27:55 +00:00
SamoilenkoVadym
992787bef1 Add debug logging to auth callback
2026-02-06 23:25:42 +00:00
SamoilenkoVadym
14ea29c5cb Add PKCE support for Azure AD public client SSO
- Use initiate_auth_code_flow for PKCE (required by Azure AD for public clients)
- Store auth flow in session for token exchange
- Fix AADSTS9002325 error
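For context, a sketch of the S256 verifier/challenge pair that PKCE (RFC 7636) is built on. MSAL's `initiate_auth_code_flow` generates and stores these values itself, so the application does not need this code; the function names here are hypothetical and only illustrate the mechanism.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes=32):
    """Return a random URL-safe verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier):
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and the verifier with the token request, so the server can confirm both came from the same party.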

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:21:20 +00:00
SamoilenkoVadym
614322e135 Fix URL prefix to use single path /solventum-image-metadata
- Remove -back suffix, use single path for monolithic Flask app
- All routes now use /solventum-image-metadata/ prefix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 23:12:10 +00:00
SamoilenkoVadym
a1ddf28108 Fix redirect URLs for reverse proxy
- Add URL_PREFIX for all redirect URLs
- Redirects now go to /solventum-image-metadata-back/login instead of /login

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 22:32:54 +00:00
SamoilenkoVadym
189cb3dab3 Add deployment script and configure reverse proxy with Azure SSO
- Add deploy.sh for idempotent Docker deployments
- Configure API_BASE for /solventum-image-metadata-back/ reverse proxy
- Enable Azure AD SSO with public client flow (no secret required)
- Remove hardcoded tester user for production security
- Add ProxyFix middleware for reverse proxy header handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 16:37:19 +00:00
SamoilenkoVadym
0b9a29b0c4 Fix temperature parameter for new models and add file cleanup
Fixed two critical issues:

1. OpenAI API temperature parameter:
   - New models (gpt-5-mini, gpt-4o, o1, o3) only support default temperature=1
   - Modified _get_api_params() to exclude temperature for new models
   - Older models still use custom temperature setting
   - Fixes 400 error: 'temperature' unsupported value

2. File cleanup to prevent disk space issues:
   - Added cleanup_session_files() to remove files when session ends
   - Added cleanup_old_files() to remove files older than 24 hours
   - Cleanup runs automatically on app startup in Docker mode
   - Cleanup runs on logout to free up space immediately
   - Added /cleanup-session endpoint for manual cleanup
   - Files no longer accumulate in Docker volume
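A minimal sketch of age-based cleanup along the lines of `cleanup_old_files()`. The signature, the use of modification time, and the return value are assumptions; the actual function in web_app.py is not shown in this commit.

```python
import os
import time


def cleanup_old_files(directory, max_age_hours=24, now=None):
    """Delete regular files older than max_age_hours; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    # Materialise the listing first so deletions do not disturb iteration.
    with os.scandir(directory) as entries:
        candidates = [entry for entry in entries if entry.is_file()]
    removed = []
    for entry in candidates:
        if entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```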

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 14:10:00 +00:00
SamoilenkoVadym
cba7275764 Fix KeyError: change 'path' to 'filepath' in download_selected_files
Fixed critical bug in download_selected_files function:
- Session stores files with 'filepath' key, not 'path'
- Changed file_info['path'] to file_info['filepath']
- Added extensive logging to catch future issues

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 14:01:01 +00:00
SamoilenkoVadym
ec3a2e2ffe Change Download All to Download Selected Files functionality
Modified download feature to work with selected files instead of all files:
- Button now shows 'Download Selected Files (N) as ZIP' with count
- New endpoint /download-selected accepts POST with file_indices
- Frontend sends only selected file indices to backend
- Button text updates dynamically when selection changes
- All files selected by default as before
- Users can select/deselect files before downloading
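A sketch of building the ZIP from selected indices only. The function name `zip_selected` is hypothetical, and the session file list is assumed to be a list of dicts with a 'filepath' key; out-of-range indices are skipped here, which may differ from the real endpoint.

```python
import io
import os
import zipfile


def zip_selected(files, file_indices):
    """Return ZIP bytes containing only the entries of files named by file_indices."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index in file_indices:
            # Ignore indices that do not refer to a stored file.
            if 0 <= index < len(files):
                path = files[index]["filepath"]
                archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()
```

Files are stored under their base names, so two selected files with the same name in different folders would collide; a real implementation would need to disambiguate them.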

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 13:56:13 +00:00
SamoilenkoVadym
74639b949a Add Download All Files feature with ZIP archive support
Added functionality to download all processed files at once in a ZIP archive:
- New endpoint /download-all/<session_id> in web_app.py
- Creates timestamped ZIP archive with all files from session
- Download All button appears after successful file updates
- Button shows at bottom of results with clear styling
- Added zipfile and datetime imports

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 13:51:37 +00:00
SamoilenkoVadym
f5d77b8b39 Fix output directory issue in Docker mode
Problem:
- Users entered local folder paths that Docker container cannot access
- Files were not saved to user's folders
- output_dir check (os.path.isdir) always failed for host paths

Solution:
1. Backend (web_app.py):
   - Only use output_dir in non-Docker mode
   - In Docker mode, always update files in-place
   - Users download files via browser instead

2. Frontend (templates/index.html):
   - Hide output_dir field in Docker mode
   - Show info message: files available for download
   - Safe JS check for outputDir element

3. Template rendering:
   - Pass docker_mode flag to template
   - Conditional display of output directory section

Result:
- Docker mode: Files updated in-place, downloadable via browser
- Local mode: output_dir still works for direct folder saving
- No more confusion about folder paths

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 13:44:13 +00:00
SamoilenkoVadym
acc071927e Add Docker support with complete deployment setup
Features:
- Docker mode detection via DOCKER_MODE env var
- Persistent volumes for uploads, database, and output
- Health checks and auto-restart
- Complete docker-compose.yml configuration
- Helper script (docker-run.sh) for easy management
- Comprehensive DOCKER.md documentation

Changes:
- web_app.py: Auto-detect Docker mode, use persistent dirs
- src/database.py: Auto-detect database path based on environment
- Dockerfile: Multi-stage build with all dependencies (ExifTool, Tesseract, Poppler, FFmpeg)
- docker-compose.yml: Production-ready configuration
- docker-run.sh: Management script (build, start, stop, logs, etc.)
- DOCKER.md: Complete deployment and troubleshooting guide
- README.md: Added Docker quick start section
- .gitignore: Added Docker-related entries

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 13:07:15 +00:00
SamoilenkoVadym
007597c88a Remove GUI version, fix import metadata bugs, add .env.example
Major Changes:
- Removed GUI version (run_gui.py, src/gui_app.py) - Web-only application
- Fixed duplicate JavaScript variable declaration (importSessionId)
- Fixed metadata import endpoint to use session data instead of Excel lookup
- Added .env.example with all configuration options

Bug Fixes:
- Fixed /update endpoint to use suggested_metadata from session
- Fixed JavaScript updateAllFiles() to send session_id and file_index
- Updated README.md to reflect web-only interface

Dependencies:
- Updated requirements.txt to use minimum version constraints (>=)

Configuration:
- Added comprehensive .env.example with all environment variables
- Documented OpenAI API, Microsoft SSO, and optional tool paths

Testing:
- Verified import metadata workflow end-to-end
- Confirmed file upload and metadata update functionality

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 12:14:18 +00:00
SamoilenkoVadym
804c8acbbb v3.1 Enterprise Edition: Excel/Import mapping, UI fixes, documentation update
Features:
- Smart column mapping for Excel and Import files (CSV/Excel/JSON)
- Modal dialogs for configuring sheet and column mappings
- Auto-detection of common column names (filename, title, description, keywords)
- Preview of first 3 rows before confirming mapping
- Case-insensitive filename matching without extension
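Matching by filename without extension can be sketched as follows. `match_key` and `build_index` are hypothetical names, and the row layout is assumed for illustration.

```python
from pathlib import Path


def match_key(filename):
    """Return the lookup key: the name without its last extension, casefolded."""
    return Path(filename).stem.casefold()


def build_index(rows, filename_field="filename"):
    """Map each row's match key to the row itself."""
    return {match_key(row[filename_field]): row for row in rows}
```

With this key, an uploaded `img_0012.tiff` would find a spreadsheet row for `IMG_0012.JPG`. Only the last extension is stripped, so `Report.Final.PDF` becomes `report.final`.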

UI Improvements:
- Fixed output folder selection (now uses text input instead of folder browser)
- Removed non-functional Reset button from metadata editor
- Clear button for output folder path

Documentation:
- Updated README.md with v3.1 Enterprise Edition information
- Developer: Vadym Samoilenko
- License: Corporate License - Oliver Marketing
- Added AI usage tracking and logging documentation
- Complete installation guide with all dependencies
- API endpoint documentation
- Security and privacy section
- Troubleshooting guide

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 17:06:18 +00:00
SamoilenkoVadym
e9784d7da8 Phase 4 Complete: Authentication, Database, and Microsoft SSO
This commit implements a complete authentication system with local users,
session management, and Microsoft SSO support for enterprise environments.

New Files Created:
- src/database.py: SQLite database management with users, sessions, audit_log
- src/auth.py: Authentication module with login, SSO, and session management
- templates/login.html: Modern login page with SSO button

Database Schema:
- users table: username, password_hash, email, full_name, auth_method
- sessions table: session management with expiration
- audit_log table: user activity tracking
- Indexes for performance optimization

Authentication Features:
- Local authentication with test user (tester/oliveradmin)
- Password hashing with Werkzeug
- Session management with 24-hour expiration
- @login_required decorator for route protection
- Automatic session cleanup

Microsoft SSO Integration:
- MSAL library integration for Azure AD
- OAuth2 authorization code flow
- Microsoft Graph API user info retrieval
- Automatic user creation/update from SSO
- CSRF protection with state parameter
- Graceful fallback when SSO not configured

Security Improvements:
- All routes protected with @login_required
- Session-based authentication with database storage
- IP address and user agent logging
- Audit trail for user actions
- Secure session token generation
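A sketch of token generation with a 24-hour expiry, using only the standard library. The dict layout and function names are assumptions; the actual code stores sessions in the SQLite sessions table.

```python
import secrets
from datetime import datetime, timedelta, timezone


def new_session(hours=24, now=None):
    """Create a session record with an unguessable token and an expiry time."""
    now = now or datetime.now(timezone.utc)
    return {
        "token": secrets.token_urlsafe(32),  # 32 random bytes, URL-safe text
        "expires_at": now + timedelta(hours=hours),
    }


def is_expired(session, now=None):
    """Return True once the current time has reached the expiry time."""
    now = now or datetime.now(timezone.utc)
    return now >= session["expires_at"]
```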

Configuration:
- Environment variables for Azure AD (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
- SECRET_KEY for Flask session encryption
- Optional MSAL dependency (SSO works only if configured)

Dependencies Added:
- Werkzeug>=3.0.0 for password hashing
- msal>=1.20.0 for Microsoft SSO (optional)

Test Credentials:
- Username: tester
- Password: oliveradmin

Phase 4 Status: Complete
Next Phase: Phase 5 (Modern UI Overhaul) for v3.1 release

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:57:47 +00:00
SamoilenkoVadym
f99aa118bf Phase 3 Complete: Batch Selection, CSV Export, and Metadata Templates
This commit completes Phase 3 implementation with advanced batch processing
and metadata template system.

Changes:
- Added batch file selection with checkboxes
- Implemented select all/deselect all functionality
- Updated batch processing to handle only selected files
- Added CSV export for processing results
- Created template_manager.py with variable substitution system
- Added template endpoints (list, save, load, delete, apply, preview)
- Integrated template UI with modal dialog for creation
- Template variables: {filename}, {date}, {datetime}, {user}, {year}, {month}, {day}
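The variable substitution could work roughly like this. The function name and the date formats are assumptions; template_manager.py may format values differently.

```python
import re
from datetime import datetime

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_template_text(template, filename, user, now=None):
    """Replace known {variables} in one pass; unknown ones are left untouched."""
    now = now or datetime.now()
    values = {
        "filename": filename,
        "user": user,
        "date": now.strftime("%Y-%m-%d"),               # assumed format
        "datetime": now.strftime("%Y-%m-%d %H:%M:%S"),  # assumed format
        "year": now.strftime("%Y"),
        "month": now.strftime("%m"),
        "day": now.strftime("%d"),
    }
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

A single regex pass avoids re-expanding placeholders that happen to appear inside a substituted value such as the filename.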

Phase 3 Status: Complete
Next Phase: Phase 4 (Authentication + SSO) for v3.1 release

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:52:05 +00:00
SamoilenkoVadym
61210a5e3d Phase 3.1: Field mapping foundation with auto-detection
Created comprehensive FieldMapper module (400+ lines):
- Fuzzy field matching with SequenceMatcher (60% similarity threshold)
- 10+ aliases per standard field (title, subject, keywords, description)
- Auto-mapping with confidence scores (0.0 to 1.0)
- Mapping suggestions with alternatives (top 2 per field)
- Exact match detection (score 1.0) and substring bonuses (0.85)
- Preset save/load/delete for reusable mappings
- Mapping validation (duplicate targets, coverage stats)
- Unmapped field detection and coverage percentage
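The scoring rules above (exact match 1.0, substring 0.85, otherwise SequenceMatcher ratio with a 0.6 threshold) can be sketched like this. The alias table is a small illustrative assumption, and the real FieldMapper has more aliases and additional features.

```python
from difflib import SequenceMatcher

# Illustrative aliases only; the actual module defines 10+ per field.
ALIASES = {
    "title": ["title", "name", "headline"],
    "keywords": ["keywords", "tags"],
    "description": ["description", "summary", "caption"],
}


def normalize(name):
    """Lowercase and convert spaces and hyphens to underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def score(source, alias):
    """Score one source field against one alias on a 0.0 to 1.0 scale."""
    a, b = normalize(source), normalize(alias)
    if a == b:
        return 1.0
    if a and b and (a in b or b in a):
        return 0.85
    return SequenceMatcher(None, a, b).ratio()


def auto_map(source_fields, threshold=0.6):
    """Map each source field to its best target if the score meets the threshold."""
    mapping = {}
    for field in source_fields:
        best_target, best_score = None, 0.0
        for target, aliases in ALIASES.items():
            candidate = max(score(field, alias) for alias in aliases)
            if candidate > best_score:
                best_target, best_score = target, candidate
        if best_score >= threshold:
            mapping[field] = (best_target, round(best_score, 2))
    return mapping
```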

FieldMapper features:
- auto_map(): Generate mapping from source fields
- suggest_mapping(): Get best match + alternatives for each field
- validate_mapping(): Check for conflicts and warnings
- apply_mapping(): Transform data using field mapping
- get_mapping_coverage(): Calculate mapping completeness
- Preset management: save, load, list, delete

MetadataImporter enhancements:
- preview_file_structure(): Preview columns and suggest mappings
- import_with_mapping(): Import with custom field mapping
- Integration with FieldMapper for smart detection
- Sample row preview (5 rows) before import

Web API additions:
- /preview-import endpoint: Preview file structure and field suggestions
- Returns: columns, sample rows, mapping suggestions with confidence
- Supports CSV, Excel, JSON format detection

Field mapping workflow:
1. User uploads import file for preview
2. System analyzes columns and suggests mappings
3. User reviews/adjusts mappings (confidence scores shown)
4. User confirms and imports with mapping
5. Optional: Save mapping as preset for reuse

Technical highlights:
- SequenceMatcher from difflib for fuzzy string matching
- Normalize field names (lowercase, underscores)
- Multiple alias sets per target field
- Confidence-based ranking of matches
- Preset persistence via JSON file

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:45:11 +00:00
SamoilenkoVadym
03079080d8 Phase 2.4: Metadata import from external files (CSV, Excel, JSON)
Created comprehensive metadata_importer.py module:
- CSV import with multiple encoding support (UTF-8, Latin1, ISO-8859-1, CP1252)
- Excel import (.xlsx, .xls) with sheet selection
- JSON import (object and array formats)
- Intelligent column detection for filename, title, subject, keywords
- Fuzzy column matching (case-insensitive, multiple aliases)
- Metadata normalization to standard format
- Import validation with statistics
- File lookup by filename stem (case-insensitive)
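A sketch of the encoding fallback for CSV import. The commit uses pandas; this version uses the standard-library `csv` module instead, and the function name is hypothetical. Latin-1 (the same codec as ISO-8859-1 in Python) accepts every byte sequence, so it has to be tried last.

```python
import csv
import io

# Order matters: latin-1 never fails, so stricter codecs go first.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def read_csv_rows(raw_bytes, encodings=ENCODINGS):
    """Decode with the first encoding that works; return (rows, encoding used)."""
    for encoding in encodings:
        try:
            text = raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
        return list(csv.DictReader(io.StringIO(text))), encoding
    raise ValueError("could not decode CSV with any configured encoding")
```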

Web interface enhancements:
- /import-metadata endpoint for file uploads
- Import section UI (appears when Import source selected)
- Real-time import statistics display (records, title/subject/keywords counts)
- Import session management with unique session IDs
- Visual feedback (active state, success/error messages)
- Validation: requires import file before processing with import source

Import workflow:
1. User selects "Import from File" metadata source
2. Import section appears with file chooser
3. User uploads CSV/Excel/JSON with metadata
4. System validates and shows statistics
5. User uploads files to process
6. System matches files to imported metadata by filename

Supported import formats:
- CSV: filename, title, subject/description, keywords columns
- Excel: Any sheet with filename and metadata columns
- JSON: {filename: {metadata}} or [{filename, metadata}] formats
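The two JSON shapes can be normalized to one lookup table as sketched below. The exact array layout is not spelled out in the commit, so this sketch assumes each item either nests its fields under a "metadata" key or carries them next to "filename"; the function name is hypothetical.

```python
import json


def load_metadata_json(text):
    """Return {filename: metadata} from either the object or the array format."""
    data = json.loads(text)
    if isinstance(data, dict):
        return {name: dict(meta) for name, meta in data.items()}
    if isinstance(data, list):
        result = {}
        for item in data:
            fields = dict(item)
            name = fields.pop("filename")
            nested = fields.get("metadata")
            # Accept both nested and flat items (assumed layouts).
            result[name] = dict(nested) if isinstance(nested, dict) else fields
        return result
    raise ValueError("expected a JSON object or array")
```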

Technical features:
- Pandas DataFrame parsing for CSV/Excel
- Flexible column name detection (10+ aliases per field)
- NaN/null value handling
- List/array keyword support
- Unicode filename support

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:39:27 +00:00
SamoilenkoVadym
1bf2483f2d Phase 2.3: AI metadata generation with production-ready features
Enhanced metadata_analyzer.py with production-ready capabilities:
- Token counting with tiktoken for accurate OpenAI usage tracking
- Exponential backoff retry logic with tenacity library
- Intelligent content truncation based on token limits (not characters)
- Configurable timeout and max retries from Config
- Graceful fallback when tiktoken/tenacity unavailable
- Enhanced error reporting with _ai_error and _tokens_used metadata

Integrated AI generation in web interface:
- AI analyzer lazy initialization in web_app.py
- Real content extraction and AI analysis in upload endpoint
- Error handling for insufficient content or API failures
- Token usage logging for monitoring and optimization

UI improvements for AI experience:
- Special loading message for AI processing (10-30s per file)
- Display token usage for AI-generated metadata
- Show AI errors prominently with helpful messages
- Filter internal metadata fields (_tokens_used, _ai_error) from forms

Dependencies leveraged:
- tiktoken: Proper OpenAI token counting (10x more accurate)
- tenacity: Exponential backoff retry (3 attempts, 2-10s delays)
- openai: Production timeout support (30s default)
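The retry behaviour (3 attempts, delays between 2 and 10 seconds) can be illustrated without tenacity. This is a standard-library sketch of exponential backoff, not the tenacity configuration used in metadata_analyzer.py; the function name and the injectable `sleep` parameter are assumptions.

```python
import time


def retry_with_backoff(func, attempts=3, min_delay=2.0, max_delay=10.0, sleep=time.sleep):
    """Call func until it succeeds, doubling the wait between failed attempts."""
    delay = min_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.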

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:36:48 +00:00
SamoilenkoVadym
fa2b4da2f7 Phase 2.1 & 2.2: Manual metadata editing and multiple sources
Implemented manual metadata editing UI:
- Added editable input fields for title (200 chars), subject (300 chars), keywords (500 chars)
- Character counters with warning/danger indicators at 90%/100%
- Real-time validation with visual feedback
- Save and Reset buttons for each file
- Individual file metadata updates via /update-manual endpoint
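The counter thresholds can be expressed as a small rule. The UI logic lives in JavaScript; this Python sketch only mirrors the rule, and the state names and function name are assumptions.

```python
# Character limits taken from the field list above.
LIMITS = {"title": 200, "subject": 300, "keywords": 500}


def counter_state(field, text):
    """Return 'danger' at 100% of the limit, 'warning' at 90%, otherwise 'ok'."""
    limit = LIMITS[field]
    used = len(text)
    if used >= limit:
        return "danger"
    # Integer comparison avoids floating-point rounding at the 90% boundary.
    if used * 10 >= limit * 9:
        return "warning"
    return "ok"
```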

Implemented multiple metadata sources:
- Added metadata source selector dropdown (Excel, Manual, AI, Import)
- Modified /upload endpoint to handle different metadata sources
- Excel lookup: existing functionality (fastest)
- Manual entry: empty fields for user input
- AI generation: placeholder for Phase 2.3
- Import: placeholder for Phase 2.4

Technical improvements:
- Session-based metadata storage for persistence
- Graceful success/error feedback with visual indicators
- Sanitized metadata input with length limits
- Backup creation before updates

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:34:05 +00:00
SamoilenkoVadym
f4e1017964 Phase 1.3: Improve startup error handling and dependency checks
Added comprehensive startup checks in web_app.py:
- Check for Excel file existence with helpful error message
- Validate OpenAI API key availability (optional)
- Check ExifTool installation (optional)
- Display available metadata sources based on configuration
- Updated branding in startup messages
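A sketch of what such startup checks might look like. The function name, the `OPENAI_API_KEY` variable name, the `exiftool` executable name, and the message texts are assumptions, not taken from web_app.py.

```python
import os
import shutil


def startup_report(excel_path, env=None, which=shutil.which):
    """Return (available metadata sources, warnings) for the current setup."""
    env = os.environ if env is None else env
    sources, warnings = [], []
    if os.path.isfile(excel_path):
        sources.append("Excel")
    else:
        warnings.append(f"Excel file not found: {excel_path}")
    if env.get("OPENAI_API_KEY"):  # assumed variable name
        sources.append("AI")
    else:
        warnings.append("OPENAI_API_KEY not set; AI generation disabled")
    if which("exiftool") is None:  # assumed executable name
        warnings.append("ExifTool not found on PATH")
    return sources, warnings
```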

Benefits:
- Users see clear error messages for missing dependencies
- Easy troubleshooting of configuration issues
- Graceful degradation when optional features unavailable

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:17:03 +00:00
SamoilenkoVadym
7db62e06da Phase 1.1: Rebrand to Oliver Metadata Tool v3.0
- Updated application name to "Oliver Metadata Tool"
- Updated version to 3.0.0
- Added App Info constants to config.py (APP_NAME, APP_VERSION, APP_DESCRIPTION)
- Updated web interface (title, header, footer)
- Updated README with new branding and description
- Added AI configuration settings to config.py
- Added ExifTool check method to config.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 15:15:26 +00:00
SamoilenkoVadym
2082ea7ce7 Initial commit: Universal metadata tool with Excel-based lookup
- Added Flask web interface for batch metadata processing
- Added Excel-based metadata lookup (Celum ID mapping)
- Dual-sheet support: DSB (primary) and Medsurg (fallback)
- Unicode/CJK character support for CGA region (Chinese, Japanese, Korean)
- Multi-format support: PDF, images, Office docs, video
- OCR with multi-language support (Tesseract)
- Filename matching without extension (case-insensitive)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 14:23:42 +00:00