Michael Clervi 0d54765bbe changed port from 5010 to 5011 for back end

2025-11-03 15:23:55 +00:00

14 KiB

Executable file

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Voice to Text is a two-tier web application that transcribes audio files using OpenAI Whisper and optionally translates the transcripts using the DeepL API. It consists of two tiers plus an authentication layer:

Python Flask API (backend): Handles transcription and translation
PHP web interface (frontend): User interface, authentication, and request handling
Microsoft Azure AD SSO: OAuth2 with PKCE flow for authentication

Development Setup

Initial Setup

# 1. Configure authentication
cp .env.example .env
# Edit .env with your Azure AD credentials

# 2. Install PHP dependencies
composer install

# 3. Install Python dependencies and create virtual environment
./setup.sh

# 4. Start the Python API server
./start_api.sh

# Or manually:
source venv/bin/activate
python api.py

The API runs on http://localhost:5010 by default. Serve the PHP frontend with MAMP, Apache, or any PHP-enabled web server, with HTTPS enabled in production.

Testing the Application

# Check if Python API is running
curl http://localhost:5010/health

# Or use the PHP diagnostic page
# Visit: check_api.php in browser

# Test downloads
# Visit: test_download.php in browser
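The health check can also be scripted, for example from a deployment or monitoring script. A minimal sketch using only the Python standard library. It assumes `/health` answers HTTP 200 when the API is up and does not inspect the response body:

```python
import urllib.request


def api_is_healthy(base_url: str = "http://localhost:5010", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health responds with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses.
        return False
```

`api_is_healthy()` checks the default port. Pass a different base URL if `PYTHON_API_URL` points elsewhere.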

Python Version Compatibility

Recommended: Python 3.10 or 3.11
Supported: Python 3.8+
Warning: Python 3.12+ may have compatibility issues with some dependencies

Architecture

Three-Layer Design

The application separates concerns into three layers:

Authentication Layer: Microsoft Azure AD SSO with OAuth2 PKCE flow
Python API (api.py): Computation-heavy tasks (Whisper transcription, DeepL translation)
PHP Frontend: User interface, session management, file handling, and proxying requests to Python API

Authentication Flow

User Browser
    ↓
login.php (landing page)
    ↓ (clicks "Sign in with Microsoft")
auth.php
    ↓ (generates PKCE code_verifier & code_challenge)
Azure AD OAuth2 Authorization Endpoint
    ↓ (user authenticates)
auth.php (callback)
    ↓ (exchanges code + code_verifier for token)
Microsoft Graph API (/me)
    ↓ (retrieves user info)
Session initialized:
    - $_SESSION['authenticated'] = true
    - $_SESSION['user_id'], ['user_name'], ['user_email']
    - $_SESSION['user_files'] = []
    ↓
index.php (main app)

Request Flow (After Authentication)

User Browser (index.php)
    ↓ (jQuery AJAX + FormData)
process.php
    ↓ (auth check via isAuthenticated())
    ↓ (cURL to Python API)
api.py (Flask)
    ↓ (Whisper transcription)
    ↓ (Optional: DeepL translation)
outputs/ directory
    ↓ (files tracked in $_SESSION['user_files'])
download.php
    ↓ (auth + ownership check)
User Browser (download)

Key Components

Authentication & Configuration Files:

auth_config.php (Authentication & Environment Configuration):

Loads environment variables from .env using vlucas/phpdotenv
Defines Azure AD configuration constants (CLIENT_ID, AUTHORITY, REDIRECT_URI)
Configures secure session settings (httponly, secure, samesite)
Provides helper functions:
- isAuthenticated(): Check if user is logged in and session is valid
- requireAuth(): Redirect to login.php if not authenticated
- getCurrentUser(): Get current user info from session
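Here is a rough Python model of the isAuthenticated() contract, useful for reasoning about session expiry. The `login_time` key is hypothetical because the session field that auth_config.php actually checks is not documented here. `SESSION_TIMEOUT` uses the documented 8-hour default:

```python
import time
from collections.abc import Mapping
from typing import Optional

SESSION_TIMEOUT = 28800  # seconds; the documented default (8 hours)


def is_authenticated(session: Mapping, now: Optional[float] = None,
                     timeout: int = SESSION_TIMEOUT) -> bool:
    """Sketch of the isAuthenticated() contract: logged in and not expired."""
    if session.get("authenticated") is not True:
        return False
    # Hypothetical key: Unix timestamp recorded when the session was initialized.
    login_time = session.get("login_time")
    if login_time is None:
        return False
    now = time.time() if now is None else now
    return now - login_time < timeout
```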

login.php (Landing Page):

First page users see when not authenticated
Displays "Sign in with Microsoft" button with Microsoft logo
Matches black/gold theme of main application
Redirects to index.php if already authenticated

auth.php (OAuth2 PKCE Handler):

Implements OAuth2 Authorization Code flow with PKCE
Step 1: Generates code_verifier (64-char random string) and code_challenge (Base64URL-encoded SHA-256 hash of the verifier, method S256)
Step 2: Redirects to Azure AD with PKCE parameters
Step 3: Handles callback, verifies state (CSRF protection)
Step 4: Exchanges authorization code + code_verifier for access token
Step 5: Calls Microsoft Graph API to get user info
Step 6: Initializes session with user data and empty file list
Step 7: Redirects to index.php

logout.php (Session Destruction):

Clears all session variables
Destroys session cookie
Destroys session
Redirects to login.php

config.php (Configuration Loader):

Requires auth_config.php
Starts session if not already started
All configuration now loaded from .env via auth_config.php

API & Core Files:

api.py (Flask REST API - Port 5010):

/health: Health check endpoint
/transcribe: Main endpoint - accepts audio file, format (txt/vtt/srt), translation settings
/download/<filename>: Serves transcribed files
Whisper model loaded once at startup and kept in memory
DeepL translator initialized at startup
Generates both original and translated files when translation is enabled

process.php (PHP request handler):

Auth check: Calls isAuthenticated() - returns error if not authenticated
Receives multipart/form-data from frontend
Validates file size (350MB limit)
Forwards to Python API via cURL
File tracking: Adds original and translated filenames to $_SESSION['user_files']
Returns formatted HTML for display (truncated at 10,000 chars for preview)
Provides download links for full files
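The two size and length rules in process.php can be expressed as a small sketch. It assumes the 350MB limit means 350 * 1024 * 1024 bytes (matching PHP's `350M` shorthand) and that the preview is HTML-escaped. The exact truncation marker that process.php prints is not documented, so the one below is made up:

```python
import html

MAX_UPLOAD_BYTES = 350 * 1024 * 1024  # assumed binary megabytes, like PHP's "350M"
PREVIEW_CHARS = 10_000                # documented preview length


def validate_upload_size(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Raise ValueError for empty or oversized uploads."""
    if size_bytes <= 0:
        raise ValueError("empty upload")
    if size_bytes > limit:
        raise ValueError(f"file is {size_bytes} bytes; limit is {limit} bytes")


def preview_html(transcript: str, limit: int = PREVIEW_CHARS) -> str:
    """Escape and truncate a transcript for in-page display."""
    body = html.escape(transcript[:limit])
    if len(transcript) > limit:
        # Hypothetical marker; the full text is available via the download link
        body += "\n[... preview truncated; download the file for the full text]"
    return "<pre>" + body + "</pre>"
```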

index.php (Main UI):

Auth required: Calls requireAuth() at top - redirects to login.php if not authenticated
Displays user header with name, email, and logout button
jQuery-based AJAX file upload
Format selector (txt/vtt/srt)
Translation toggle with language selector (30+ languages)
Real-time progress bar during processing
In-page preview of transcriptions
Download buttons for original and translated files

download.php (File server):

Auth required: Calls isAuthenticated() - returns 401 if not authenticated
Ownership check: Verifies requested file is in $_SESSION['user_files']
Returns 403 Forbidden if user doesn't own the file
Logs unauthorized download attempts
Serves files from outputs/ directory
Security: Uses basename() to prevent directory traversal
Sets proper Content-Type headers based on file extension
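The checks in download.php run in a fixed order: authentication (401), then ownership (403), then serving the file. Here is a Python sketch of that order. The 404 branch for a tracked-but-missing file and the extension-to-Content-Type table are assumptions and are not taken from download.php:

```python
import os

# Assumed mapping; unknown extensions fall back to a generic binary type
CONTENT_TYPES = {
    ".txt": "text/plain; charset=utf-8",
    ".vtt": "text/vtt; charset=utf-8",
    ".srt": "application/x-subrip",
}


def authorize_download(requested: str, session: dict, outputs_dir: str = "outputs"):
    """Return (status, path_or_None, content_type_or_None)."""
    if not session.get("authenticated"):
        return 401, None, None
    # Strip any directory components before the ownership lookup
    name = os.path.basename(requested)
    if not name or name not in session.get("user_files", []):
        return 403, None, None
    path = os.path.join(outputs_dir, name)
    if not os.path.isfile(path):
        return 404, None, None
    ext = os.path.splitext(name)[1].lower()
    return 200, path, CONTENT_TYPES.get(ext, "application/octet-stream")
```

Because `basename()` strips a `../` prefix before the ownership lookup, a traversal attempt gets a 403 unless the stripped name is one of the user's own files.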

.env (Environment Variables):

AZURE_CLIENT_ID: Azure AD application client ID
AZURE_AUTHORITY: Azure AD authority URL with tenant ID
AZURE_REDIRECT_URI: OAuth2 redirect URI (must match Azure AD config)
DEEPL_API_KEY: DeepL API key for translation
PYTHON_API_URL: Python Flask API endpoint (default: http://localhost:5010)
SESSION_TIMEOUT: Session timeout in seconds (default: 28800 = 8 hours)
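auth_config.php loads these variables through vlucas/phpdotenv. For tooling on the Python side, a minimal stdlib-only loader with the same keys and defaults might look like the sketch below. It handles `KEY=value` lines, comments, and simple quoting, but not phpdotenv's full syntax such as variable interpolation or multi-line values. Letting real environment variables take precedence over `.env` entries is a design choice of this sketch:

```python
import os

DEFAULTS = {
    "PYTHON_API_URL": "http://localhost:5010",
    "SESSION_TIMEOUT": "28800",
}
REQUIRED = ("AZURE_CLIENT_ID", "AZURE_AUTHORITY", "AZURE_REDIRECT_URI", "DEEPL_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Remove one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_config(text: str, environ=os.environ) -> dict:
    config = {**DEFAULTS, **parse_env(text)}
    # Variables already set in the process environment override .env
    config.update({k: environ[k] for k in (*REQUIRED, *DEFAULTS) if k in environ})
    missing = [k for k in REQUIRED if not config.get(k)]
    if missing:
        raise KeyError("missing required settings: " + ", ".join(missing))
    config["SESSION_TIMEOUT"] = int(config["SESSION_TIMEOUT"])
    return config
```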

Output Formats

Text (.txt)

Plain text transcription - full text of audio

VTT (.vtt)

WebVTT subtitle format with timestamps:

WEBVTT

00:00:00.000 --> 00:00:05.123
First segment text

00:00:05.123 --> 00:00:10.456
Second segment text

SRT (.srt)

SubRip subtitle format with timestamps:

1
00:00:00,000 --> 00:00:05,123
First segment text

2
00:00:05,123 --> 00:00:10,456
Second segment text

Key Difference: VTT uses a period (.) before the milliseconds and SRT uses a comma (,). VTT files also start with a WEBVTT header, while SRT numbers each cue.
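Because the two timestamp styles differ mainly in the millisecond separator, one formatter can serve both. The sketch below assumes segments shaped like Whisper's `result["segments"]`, which are dicts with `start`, `end` (in seconds), and `text`. It reproduces the two samples above exactly:

```python
def format_timestamp(seconds: float, fmt: str) -> str:
    """HH:MM:SS.mmm for "vtt", HH:MM:SS,mmm for "srt"."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render_subtitles(segments, fmt: str) -> str:
    """Render segments as WebVTT or SubRip text."""
    blocks = ["WEBVTT\n"] if fmt == "vtt" else []
    for i, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'], fmt)} --> {format_timestamp(seg['end'], fmt)}"
        text = seg["text"].strip()  # Whisper segment text usually has a leading space
        if fmt == "srt":
            blocks.append(f"{i}\n{timing}\n{text}\n")  # SRT cues are numbered
        else:
            blocks.append(f"{timing}\n{text}\n")
    # A blank line separates cues in both formats
    return "\n".join(blocks)
```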

Whisper Models

Available models (edit api.py line 26 to change):

tiny: Fastest, least accurate
base: Default - good balance
small: Better accuracy, slower
medium: High accuracy, much slower
large: Best accuracy, very slow

Changing the model:

# In api.py line 26:
model = whisper.load_model("small")  # Change from "base" to desired model

File Size and Timeout Limits

Maximum file size: 350MB (configured in .htaccess and process.php)
Processing timeout: 5 minutes (300 seconds in process.php)
PHP settings (.htaccess):
- upload_max_filesize: 350M
- post_max_size: 350M
- max_execution_time: 1200 seconds
- memory_limit: 512M
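PHP's shorthand sizes (`K`, `M`, `G`) are powers of 1024, so `350M` is 367,001,600 bytes. The PHP manual also states that `post_max_size` must be larger than `upload_max_filesize` for large uploads, because the POST body also carries multipart boundaries and other form fields. With both values set to 350M, a file very close to the limit may therefore be rejected. The following Python sketch checks these relationships (the function names are hypothetical):

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert PHP shorthand such as "350M" or "512m" to bytes."""
    v = value.strip().upper()
    if v and v[-1] in UNITS:
        return int(v[:-1]) * UNITS[v[-1]]
    return int(v)


def check_upload_limits(upload_max_filesize: str, post_max_size: str,
                        app_limit_bytes: int) -> list:
    """Return human-readable problems with the configured limits."""
    problems = []
    upload = php_size_to_bytes(upload_max_filesize)
    post = php_size_to_bytes(post_max_size)
    if post <= upload:
        problems.append("post_max_size should be larger than upload_max_filesize")
    if upload < app_limit_bytes:
        problems.append("upload_max_filesize is below the application limit")
    return problems
```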

Translation

Translation is powered by DeepL API:

Supports 30+ languages
Translation happens after transcription
Original language is auto-detected by Whisper
Both original and translated files are saved with suffixes:
- filename_original.{ext}
- filename_translated.{ext}
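The naming rule can be sketched as follows. It makes two assumptions: `{original_filename}` is the uploaded name without its audio extension, and only the `_original` file is written when translation is off:

```python
from pathlib import Path
from typing import List


def output_filenames(upload_name: str, fmt: str, translate: bool) -> List[str]:
    """Build output names like meeting_original.srt / meeting_translated.srt."""
    # Drop any client-supplied directories, then the audio extension (assumption)
    stem = Path(Path(upload_name).name).stem
    names = [f"{stem}_original.{fmt}"]
    if translate:
        names.append(f"{stem}_translated.{fmt}")
    return names
```

If api.py adds no unique prefix, two users who upload `meeting.mp3` in the same format would target the same path in outputs/. The session check in download.php does not prevent that overwrite.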

File Handling

outputs/ Directory

All transcribed files are saved here. The directory:

Created automatically by setup.sh or api.py
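Must be writable by the web server and API processes (currently 777 in production; owning the directory as www-data:www-data with narrower permissions is safer than world-writable)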
Files are named: {original_filename}_original.{ext} and {original_filename}_translated.{ext}
Not tracked by git (see .gitignore)

Temporary Files

Audio files are saved temporarily during processing
Cleaned up automatically after transcription (api.py lines 186-187)
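The cleanup pattern can be sketched in Python: write the upload to a temporary file, then delete it in `finally` so the file is removed even when transcription raises. The `transcribe` callable is a stand-in, for example `lambda p: model.transcribe(p)` with the loaded Whisper model. This is not the code at api.py lines 186-187:

```python
import os
import tempfile


def transcribe_upload(data: bytes, suffix: str, transcribe):
    """Write the upload to a temp file, run transcribe(path), always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)  # suffix keeps the extension for FFmpeg
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return transcribe(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```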

Authentication & Security

Microsoft Azure AD SSO

OAuth2 with PKCE: Uses Proof Key for Code Exchange (RFC 7636)
No client secret needed: PKCE allows public clients to authenticate securely
Code verifier: 64-character random string generated for each auth request
Code challenge: Base64URL-encoded SHA-256 hash of code_verifier (method S256), sent to Azure AD
Token exchange: Authorization code + code_verifier exchanged for access token

Session-Based File Access Control

Session tracking: Files tracked in $_SESSION['user_files'] array
Upload tracking: When user transcribes audio, both original and translated filenames added to their session
Download validation: download.php checks if requested file is in user's session before serving
Session timeout: Configurable (default: 8 hours) - after timeout, user loses access to their files
Trade-off: Files remain in outputs/ directory but become inaccessible after session expires

Session Security

httponly: Session cookies not accessible via JavaScript (XSS protection)
secure: Session cookies only transmitted over HTTPS (production)
samesite: Set to 'Lax' to mitigate CSRF attacks
strict_mode: Rejects uninitialized session IDs
Session regeneration: Session ID regenerated after login to prevent session fixation
CSRF protection: OAuth2 state parameter validates callback authenticity

File Security

basename(): Prevents directory traversal attacks in download.php
File size validation: 350MB limit enforced in both .htaccess and process.php
Ownership logging: Unauthorized download attempts logged with user ID
No file type validation: Relies on FFmpeg to handle/reject unsupported formats

Environment Variables

.env file: All sensitive credentials stored in .env (not in git)
API keys: DeepL and Azure credentials loaded from environment
.gitignore: .env explicitly excluded from version control

Production Considerations

HTTPS required: Secure cookies require HTTPS in production
File cleanup: Old files in outputs/ should be cleaned via cron job
Session storage: Consider Redis/Memcached for multi-server deployments
Rate limiting: No rate limiting currently - consider adding for production
Logging: Unauthorized attempts logged - monitor for suspicious activity
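Here is a sketch for the cron cleanup. Files become undownloadable once the owning session expires. If the timeout is measured from login, deleting files older than `SESSION_TIMEOUT` (8 hours by default) removes only files that nobody can fetch anymore. The function name and dry-run flag are hypothetical, and the function skips dotfiles such as a `.gitkeep`:

```python
import time
from pathlib import Path
from typing import List, Optional


def clean_outputs(outputs_dir: str, max_age_seconds: int = 8 * 3600,
                  now: Optional[float] = None, dry_run: bool = False) -> List[str]:
    """Delete regular files older than max_age_seconds; return the names removed."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(outputs_dir).iterdir():
        if not path.is_file() or path.name.startswith("."):
            continue
        if now - path.stat().st_mtime > max_age_seconds:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Running it with `dry_run=True` first shows what would be deleted without touching the directory.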

Session-Only File Tracking

How It Works

Files are tracked in the PHP session ($_SESSION['user_files'] array) rather than a database. This approach was chosen for simplicity.

File Lifecycle

User uploads audio → process.php transcribes → adds filenames to $_SESSION['user_files']
User can download files as long as session is active
Session expires or user logs out → files become inaccessible
Files remain in outputs/ directory but cannot be downloaded

Trade-offs

Pros:

Simple implementation - no database needed
Automatic "expiration" via session timeout
Works well for temporary transcription tasks

Cons:

Files inaccessible after session expires
Can't access files across multiple devices/browsers
Orphaned files accumulate in outputs/ directory

Future Upgrades

To implement persistent file ownership:

Add SQLite/MySQL database with users and files tables
Store file ownership in database instead of session
Modify download.php to check database ownership
Consider filename-based ownership (encode user_id in filename)
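Below is a minimal SQLite sketch of the users/files tables and an ownership check, using Python's built-in sqlite3. The table and column names are illustrative, not an existing schema. `users.id` would hold the Azure AD user id already stored in `$_SESSION['user_id']`:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id    TEXT PRIMARY KEY,   -- Azure AD user id
    email TEXT,
    name  TEXT
);
CREATE TABLE IF NOT EXISTS files (
    filename   TEXT PRIMARY KEY,   -- name inside outputs/
    user_id    TEXT NOT NULL REFERENCES users(id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_file(conn, user_id, email, name, filename) -> None:
    # One transaction: the user row and file row are committed together
    with conn:
        conn.execute("INSERT OR IGNORE INTO users (id, email, name) VALUES (?, ?, ?)",
                     (user_id, email, name))
        conn.execute("INSERT INTO files (filename, user_id) VALUES (?, ?)",
                     (filename, user_id))


def user_owns(conn, user_id, filename) -> bool:
    row = conn.execute("SELECT 1 FROM files WHERE filename = ? AND user_id = ?",
                       (filename, user_id)).fetchone()
    return row is not None
```

Making `filename` the primary key means a second user who produces the same output name raises `sqlite3.IntegrityError` instead of silently sharing the file.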

Common Development Tasks

Changing Whisper Model

Edit api.py line 26 and restart the API:

# After editing
./start_api.sh

Adjusting File Size Limits

Update the limit in all of these places:

.htaccess - PHP upload limits
process.php line 12 - PHP validation
If using production Apache: /etc/php/.../php.ini

Testing Authentication Flow

Clear your browser cookies
Visit the application root
Should redirect to login.php
Click "Sign in with Microsoft"
Authenticate with Azure AD
Should redirect back to index.php with user header visible

Testing Transcription

Via Web UI:

Log in via login.php
Upload a test audio file
Check that files appear in test_download.php

Via API directly (bypasses auth):

curl -X POST http://localhost:5010/transcribe \
  -F "audio=@test.mp3" \
  -F "format=txt" \
  -F "translate=0"

Testing File Access Control

Upload a file while logged in
Note the filename from the download link
Log out
Try to access download.php?file=filename directly
Should receive 401 Unauthorized

Adding New Languages

Edit the language selector in index.php (lines 41-73) to add DeepL-supported languages.

Production Deployment

See README.md sections:

"Production Deployment (Apache)" for full Apache setup
"Setup Python API as Systemd Service" for running API as a service
"Monitoring and Maintenance" for logs and cleanup

Key production considerations:

Set up systemd service for Python API (voice2text-api.service)
Configure Apache virtual host
Set proper file permissions (www-data:www-data)
Set up log rotation
Configure cron job to clean old files in outputs/
Keep API keys in environment variables (.env), not in source code

Debugging

API Not Responding

Check if API is running: curl http://localhost:5010/health
Check process: ps aux | grep python
Test Python directly: source venv/bin/activate && python api.py
Visit check_api.php in browser for diagnostic info

Upload Fails

Check outputs/ directory exists and is writable
Verify file size is under 350MB
Check Apache/PHP error logs
Verify FFmpeg is installed: which ffmpeg

Transcription Errors

Check Python API logs (stdout/stderr)
Verify audio file format is supported by FFmpeg
Test with a small sample file first
Check available disk space in /tmp

Code Style Notes

Python: Uses Flask conventions, logging via Python logging module
PHP: Uses procedural style, cURL for HTTP requests
JavaScript: jQuery-based, uses AJAX for async file upload
CSS: BEM-like naming, black/gold theme with animations
