2025-11-03 08:45:12 -06:00


Voice to Text with Whisper & DeepL Translation

A secure web application that transcribes audio files with OpenAI's Whisper model and translates the transcripts with the DeepL API. It authenticates users through Microsoft Azure AD and supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).

Features

🔐 Microsoft Azure AD SSO authentication with OAuth2 PKCE flow
🎤 Audio transcription using OpenAI Whisper (multiple models available)
🌍 Translation using DeepL API (30+ languages)
📝 Multiple output formats: Text, VTT, SRT
🚀 Python Flask API backend (port 5010)
💻 PHP frontend (MAMP/Apache compatible with PHP-FPM)
📦 350MB file size limit for audio uploads
📄 Generates both original and translated files
🎨 Modern black/gold UI with dark theme
📊 Real-time progress bar during processing
👀 In-page preview of transcriptions
⬇️ One-click download for all formats
🔒 Session-based file access control - users can only access their own files
🧪 Dev mode for local testing without Microsoft authentication

Requirements

Required Software

Python 3.8+ (Recommended: 3.10 or 3.11)
PHP 7.4+ with PHP-FPM
MAMP or Apache web server
Composer for PHP dependency management
FFmpeg for audio processing

For Production (Optional for Local Dev)

Microsoft Azure AD application with registered redirect URI
HTTPS/SSL certificate (required for production secure cookies)

Quick Start (Local Development)

For local testing without Azure AD setup:

# 1. Copy environment file
cp .env.example .env

# 2. Enable dev mode in .env
# Edit .env and set: DEV_MODE=true

# 3. Install PHP dependencies
composer install

# 4. Install Python dependencies
./setup.sh

# 5. Start Python API
./start_api.sh

# 6. Access via MAMP at http://localhost:8888/voice2text/

Dev Mode: When DEV_MODE=true, authentication is bypassed and you'll see a mock user "Dev User (Local)". This allows testing without Azure AD configuration.

Full Installation

1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with PKCE flow.

Step 1: Copy and configure environment file

cp .env.example .env

Step 2: Edit .env file with your configuration:

# Set to true for local testing (bypasses Microsoft auth)
DEV_MODE=false

# Azure AD Configuration (required for production)
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/

# API Keys
DEEPL_API_KEY=your_deepl_api_key_here

# API Configuration
PYTHON_API_URL=http://localhost:5010

# Session timeout in seconds (default: 8 hours)
SESSION_TIMEOUT=28800

For Local Development:

Set DEV_MODE=true to bypass authentication
Azure credentials not required when in dev mode

For Production:

Set DEV_MODE=false
Configure valid Azure AD credentials
Register your redirect URI in Azure AD Portal
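
The checklist above can be automated before deployment. The following is a minimal Python sketch, not part of the project: it reads the keys shown in the .env template and flags values that are missing or still placeholders. The function names and the validation rules are assumptions for illustration, and the parser is deliberately simpler than vlucas/phpdotenv.

```python
from typing import Dict, List

# Keys come from the .env template above; the rules applied to them are assumptions.
REQUIRED_IN_PRODUCTION = ("AZURE_CLIENT_ID", "AZURE_AUTHORITY", "AZURE_REDIRECT_URI")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0]  # drop a trailing inline comment
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    dev_mode = env.get("DEV_MODE", "false").lower() == "true"
    if dev_mode:
        return problems  # Azure settings are not needed in dev mode
    for key in REQUIRED_IN_PRODUCTION:
        value = env.get(key, "")
        if not value or value.startswith("your_") or "your_tenant_id_here" in value:
            problems.append(f"{key} is missing or still a placeholder")
    redirect = env.get("AZURE_REDIRECT_URI", "")
    if redirect and not redirect.startswith("https://"):
        problems.append("AZURE_REDIRECT_URI should use https:// in production")
    return problems
```

A possible use is `find_problems(parse_env(open(".env").read()))` in a deploy script, failing the deploy when the list is not empty.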

Step 3: Install PHP dependencies

composer install

This installs:

league/oauth2-client - OAuth2 PKCE authentication
vlucas/phpdotenv - Environment variable management

2. Install FFmpeg

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

3. Setup Python Environment

Run the setup script:

chmod +x setup.sh
./setup.sh

This will:

Create a Python virtual environment
Install all dependencies (Flask, Whisper, etc.)
Create the outputs directory

4. Start the API Server

chmod +x start_api.sh
./start_api.sh

Or manually:

source venv/bin/activate
python api.py

The API will run on http://localhost:5010

5. Configure Web Server

MAMP Setup:

Point MAMP document root to this directory
Ensure PHP is enabled (PHP 7.4+ recommended)
IMPORTANT: MAMP uses PHP-FPM, so PHP configuration is in .user.ini (not .htaccess)
Restart MAMP servers after changing .user.ini
Access at: http://localhost:8888/voice2text/

Apache Setup:

See "Production Deployment (Apache)" section below for full Apache configuration

Usage

Development Mode (DEV_MODE=true)

Start the Python API server: ./start_api.sh
Open the web application in your browser
You'll be automatically logged in as "Dev User (Local)"
See orange "DEV MODE ACTIVE" banner at the top
Select output format (Text/VTT/SRT)
(Optional) Enable translation and select target language
Upload an audio file (max 350MB)
Wait for processing
Download original and/or translated transcription

Production Mode (DEV_MODE=false)

Start the Python API server: ./start_api.sh
Open the web application in your browser
You'll see a login page with "Sign in with Microsoft" button
Click and authenticate with your Microsoft account
After authentication, you'll be redirected to the main application
See your name and email in the user header
Select output format (Text/VTT/SRT)
(Optional) Enable translation and select target language
Upload an audio file (max 350MB)
Wait for processing - see real-time progress bar
View transcription preview in-page (truncated at 10,000 chars)
Download original and/or translated transcription files
Your files are associated with your session and only accessible to you
Click "Logout" when finished

Translation

The app uses DeepL API for high-quality translations. When translation is enabled:

The audio is first transcribed in its original language
The transcription is then translated to your selected target language
Both original and translated files are generated
Both files are tracked in your session for access control
Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
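
For subtitle formats, the translation step has to change the caption text while leaving cue numbers and timestamps alone. This README does not show how api.py does that, so the sketch below is only an illustration under that assumption. `translate` is a hypothetical callable standing in for the DeepL request.

```python
from typing import Callable, List


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the caption text of an SRT document.

    Cue numbers and "00:00:01,000 --> 00:00:04,000" timing lines are copied
    unchanged so the translated file stays in sync with the audio.
    """
    out: List[str] = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            caption = translate("\n".join(lines[2:]))
            out.append("\n".join(lines[:2] + [caption]))
        else:
            out.append(block)  # leave anything unexpected untouched
    return "\n\n".join(out) + "\n"
```

Passing the translator in as a function keeps the sketch testable offline. In a real deployment the callable would wrap whatever DeepL client the API uses.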

File Upload Configuration

MAMP (PHP-FPM): PHP settings are configured in .user.ini (automatically created during setup):

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M

Note: PHP caches .user.ini for up to 5 minutes (user_ini.cache_ttl defaults to 300 seconds), so changes may not apply immediately. Restart the MAMP servers to pick them up right away.
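
PHP size values such as 350M use binary units, so 350M is 350 × 1024 × 1024 bytes. The sketch below is a hypothetical helper, not part of the project. It converts the shorthand to bytes and checks a file against the limit before uploading.

```python
import os

# PHP shorthand suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_ini_size(value: str) -> int:
    """Convert PHP shorthand such as "350M" to a byte count."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)


def fits_upload_limit(path: str, limit: str = "350M") -> bool:
    """True if the file at `path` is no larger than the configured limit."""
    return os.path.getsize(path) <= parse_ini_size(limit)
```

Because post_max_size covers the whole request body, including multipart overhead, a file sitting exactly at the limit may still be rejected when both settings are 350M.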

Apache (mod_php): Uncomment the settings in .htaccess or configure in php.ini

API Endpoints

POST /transcribe

Transcribe audio file to text/VTT/SRT

Parameters:

audio (file): Audio file to transcribe
format (string): Output format (txt/vtt/srt)

Response:

{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
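
A request to this endpoint can be sketched with the Python standard library alone. The field names `audio` and `format` come from the parameter list above. The helper names, the timeout, and the assumption that the API accepts a plain multipart/form-data POST are illustrative only.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str, path: Path) -> Tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Reads the whole file into memory, which is fine for a sketch but
    wasteful for uploads near the 350MB limit.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        text = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        parts.append(text.encode("utf-8"))
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{path.name}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    parts.append(header.encode("utf-8") + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, fmt: str = "txt", base_url: str = "http://localhost:5010") -> dict:
    """POST an audio file to /transcribe and return the decoded JSON response."""
    body, content_type = build_multipart({"format": fmt}, "audio", Path(path))
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, `transcribe("meeting.mp3", "srt")` should return a dictionary shaped like the response above. `meeting.mp3` is a placeholder file name.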

GET /health

Health check endpoint

GET /download/

Download transcribed file

Whisper Models

The default model is base which provides a good balance of speed and accuracy.

Available models:

tiny - Fastest, least accurate
base - Good balance (default)
small - Better accuracy, slower
medium - High accuracy, much slower
large - Best accuracy, very slow

To change the model, edit the whisper.load_model() call in api.py:

model = whisper.load_model("base")  # Change to desired model
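
One way to avoid editing the source for each model change is to read the model name from the environment. This is a sketch only. `WHISPER_MODEL` is a hypothetical variable that the project does not define, and the helper name is an assumption.

```python
import os
from typing import Mapping, Optional

# Model names listed in this README.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env: Optional[Mapping[str, str]] = None, default: str = "base") -> str:
    """Pick the Whisper model from WHISPER_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of {VALID_MODELS}")
    return name


# In api.py the hard-coded name could then become:
#   model = whisper.load_model(resolve_model_name())
```

Validating the name up front gives a clear error at startup instead of a failure inside the model download.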

Authentication & Security

Microsoft Azure AD SSO (Production)

OAuth2 with PKCE (Proof Key for Code Exchange) flow - RFC 7636
No client secrets needed - secure public client authentication
Code challenge: Base64url-encoded SHA-256 hash of a random 64-character verifier
State validation: CSRF protection on callback
Microsoft Graph API: Retrieves user profile information

Development Mode

DEV_MODE=true: Bypasses authentication entirely
Mock user: Auto-creates session with "Dev User (Local)"
Visual indicator: Orange banner shows when dev mode is active
No Microsoft credentials required: Perfect for local testing

Session Management

Secure session cookies: httponly, secure (HTTPS only), samesite=Lax
Session timeout: Configurable (default: 8 hours)
Session regeneration: After login to prevent fixation attacks
Auto-timeout: Sessions expire and require re-authentication

File Access Control

Session-based tracking: Files stored in $_SESSION['user_files'] array
Upload tracking: Files automatically added to user's session on transcription
Download validation: Only files in user's session can be downloaded
Ownership logging: Unauthorized attempts logged with user ID
No cross-user access: Users cannot access other users' files

Security Architecture

User Request
    ↓
Authentication Check (isAuthenticated())
    ↓
Process Request (process.php)
    ↓
Add Files to Session ($_SESSION['user_files'])
    ↓
Download Request (download.php)
    ↓
Verify File in User's Session
    ↓
Serve File or 403 Forbidden
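
The last two steps of the flow can be modeled as a small pure function. The real check lives in download.php, so the Python below is only an illustrative model. The function name, the return shape, and the rejection of path components are assumptions.

```python
import os
from typing import Iterable, Tuple


def authorize_download(
    requested: str, session_files: Iterable[str], outputs_dir: str = "outputs"
) -> Tuple[int, str]:
    """Return (HTTP status, file path or refusal reason) for a download request."""
    name = os.path.basename(requested)  # strip any directory components
    if not name or name != requested:
        return 403, "path components are not allowed"
    if name not in set(session_files):
        return 403, "file is not registered in this session"
    return 200, os.path.join(outputs_dir, name)
```

Here `session_files` plays the role of `$_SESSION['user_files']`: a file that exists on disk but is absent from the list is refused just like a file that does not exist.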

Important Security Notes

✅ .env file excluded from git (contains sensitive credentials)
✅ HTTPS required in production for secure cookie transmission
⚠️ Files persist in outputs/ after session expires - can't be downloaded but still exist on disk
💡 Recommendation: Set up cron job to clean old files from outputs/ directory
🔒 Session-only access: Files become inaccessible when session expires or user logs out
🚨 Dev mode security: Only use DEV_MODE=true for local development, never in production

File Structure

voice2text/
├── api.py                  # Python Flask API with Whisper & DeepL
├── login.php               # Landing page with Microsoft SSO button
├── auth.php                # OAuth2 PKCE authentication handler
├── logout.php              # Session destruction handler
├── index.php               # Main application interface (auth required)
├── process.php             # PHP request handler (auth + file tracking)
├── download.php            # File download handler (auth + ownership check)
├── check_api.php           # API status checker (auth required)
├── test_download.php       # Download functionality tester (auth required)
├── config.php              # Configuration loader
├── auth_config.php         # Authentication & environment config
├── style.css               # Black/gold theme styles
├── V2T.svg                 # Application logo
│
├── .env                    # Environment variables (NOT in git) ⚠️
├── .env.example            # Environment variables template
├── .user.ini               # PHP-FPM configuration (MAMP)
├── .htaccess               # Apache rewrite rules
├── .gitignore              # Git ignore rules
│
├── composer.json           # PHP dependencies manifest
├── composer.lock           # PHP dependency lock file (NOT in git)
├── requirements.txt        # Python dependencies
├── setup.sh                # Python environment setup script
├── start_api.sh            # Python API start script
│
├── README.md               # This file - comprehensive documentation
├── CLAUDE.md               # Claude Code guidance for AI assistance
│
├── outputs/                # Transcribed files directory (files NOT in git)
├── vendor/                 # Composer PHP dependencies (NOT in git)
└── venv/                   # Python virtual environment (NOT in git)

Key Files Explained:
- .env: Contains secrets (Azure credentials, API keys) - NEVER commit
- .user.ini: PHP settings for PHP-FPM (MAMP) - upload limits, timeouts
- auth_config.php: Core authentication logic and session management
- process.php: Handles uploads, calls Python API, tracks files in session
- download.php: Validates ownership before serving files

Production Deployment (Apache)

Prerequisites

Apache 2.4+
PHP 7.4+ with mod_php or PHP-FPM
Python 3.8+
FFmpeg
Root/sudo access for system configuration

Step 1: Install Required Apache Modules

Ubuntu/Debian:

sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2

CentOS/RHEL:

sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd

Step 2: Deploy Application Files

# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs  # www-data owns the directory, so world-writable 777 is unnecessary

Step 3: Configure Apache Virtual Host

Create /etc/apache2/sites-available/voice2text.conf:

<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com

    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files
    <FilesMatch "^(config\.php|\.env|\.user\.ini|\.git|\.htaccess)">
        Require all denied
    </FilesMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>

Enable the site:

sudo a2ensite voice2text.conf
sudo systemctl reload apache2

Step 4: Configure PHP for Large Uploads

Edit /etc/php/7.4/apache2/php.ini (adjust version as needed):

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M

Restart Apache:

sudo systemctl restart apache2

Step 5: Setup Python API as Systemd Service

Create /etc/systemd/system/voice2text-api.service:

[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging
StandardOutput=append:/var/log/voice2text-api.log
StandardError=append:/var/log/voice2text-api-error.log

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

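# View service events (start, stop, restart)
sudo journalctl -u voice2text-api -f

# View API output (the unit file above appends stdout/stderr to these files)
sudo tail -f /var/log/voice2text-api.log /var/log/voice2text-api-error.log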

Step 6: Configure Firewall

UFW (Ubuntu):

sudo ufw allow 'Apache Full'
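# Open 5010/tcp only if the PHP frontend reaches the Python API from another host.
# With PYTHON_API_URL=http://localhost:5010, keep it closed so the API is not exposed publicly.
# sudo ufw allow 5010/tcp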
sudo ufw enable

Firewalld (CentOS):

sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
# Only if the PHP frontend reaches the Python API from another host:
# sudo firewall-cmd --permanent --add-port=5010/tcp
sudo firewall-cmd --reload

Step 7: SSL Configuration (Required for Production)

Using Let's Encrypt with Certbot:

# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run

Step 8: Verify Deployment

Check Apache status: sudo systemctl status apache2
Check API status: sudo systemctl status voice2text-api
Visit: https://voice2text.yourdomain.com/check_api.php (sign in first; the page requires authentication)
Test file upload with a small audio file

Monitoring and Maintenance

Check API Status

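# View service events
sudo journalctl -u voice2text-api -n 100

# View API output (written to log files by the systemd unit)
sudo tail -n 100 /var/log/voice2text-api.log /var/log/voice2text-api-error.log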

# Check if API is responding
curl http://localhost:5010/health
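
The same check can be scripted, for example from a monitoring job. This is a minimal sketch that assumes the /health endpoint answers a plain GET with HTTP 200. The function name and the default timeout are illustrative.

```python
import http.client
import urllib.request


def api_is_healthy(url: str = "http://localhost:5010/health", timeout: float = 5.0) -> bool:
    """True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses, and malformed replies.
        return False
```

A cron job or systemd timer could call this and restart voice2text-api when it returns False.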

Check Apache Logs

# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log

Restart Services

# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api

Clean Old Files

The outputs/ directory can grow large. Set up a cron job to clean old files:

# Edit crontab
sudo crontab -e

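# Add this line to delete files older than 24 hours, daily at 2 AM
# (-mmin +1440 means "more than 1440 minutes old"; -mtime +1 would only match files at least 48 hours old)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete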
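
Where cron plus find is not available, the same cleanup can be sketched in Python. The function name and parameters below are assumptions. Unlike find, this version looks only at the top level of the directory.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_files(
    directory: str, max_age_hours: float = 24.0, now: Optional[float] = None
) -> List[str]:
    """Delete regular files older than max_age_hours; return the names removed."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed: List[str] = []
    for entry in Path(directory).iterdir():
        # Subdirectories are skipped; only files directly inside `directory` are considered.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Calling `remove_old_files("/var/www/voice2text/outputs")` from a scheduled job would mirror the cron entry above.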

Troubleshooting

MAMP/PHP-FPM Issues

"Invalid command 'php_value'" error:

This means MAMP is using PHP-FPM instead of mod_php
Solution: Use .user.ini instead of .htaccess for PHP settings
The .user.ini file should already be created
Restart MAMP servers after any changes to .user.ini

"Session ini settings cannot be changed" warnings:

Cause: session_start() being called before session configuration
Solution: Always load config.php BEFORE starting sessions
Fixed in current version - if you see this, ensure you have latest code

Changes to .user.ini not taking effect:

PHP-FPM caches .user.ini for up to 5 minutes
Solution: Restart MAMP servers completely
Wait a few minutes or check phpinfo() to verify settings

Authentication Issues

Stuck on login page in dev mode:

Check .env file: DEV_MODE=true
Clear browser cookies and cache
Restart MAMP servers
Verify auth_config.php has latest dev mode logic

Microsoft authentication fails:

Ensure DEV_MODE=false in .env
Verify Azure AD credentials are correct
Check redirect URI matches Azure AD Portal configuration
Ensure the redirect URI matches the registered value exactly, including any trailing slash
Check browser console for detailed OAuth errors

"Authentication required" on every page:

Session may not be persisting
Check browser allows cookies
Verify .user.ini session settings are loaded
Try clearing browser cookies

API Issues

API not connecting:

Check if API is running: sudo systemctl status voice2text-api
Test health endpoint: curl http://localhost:5010/health
Check API logs: sudo journalctl -u voice2text-api -n 50
If the frontend and the API run on different hosts, verify the firewall allows port 5010
Visit check_api.php in browser for detailed status

API won't start:

Check Python version: python3 --version (must be 3.8+)
Verify virtual environment: ls -la venv/
Check dependencies: source venv/bin/activate && pip list
Review error logs: sudo journalctl -u voice2text-api -xe
Ensure FFmpeg is installed: which ffmpeg
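
The checks above can be bundled into one preflight script. This sketch is not part of the project. The check names are assumptions, and the venv path follows the Unix layout used by setup.sh in this README.

```python
import shutil
import sys
from pathlib import Path
from typing import Dict


def preflight(project_dir: str = ".") -> Dict[str, bool]:
    """Run the basic startup checks and report each one as True or False."""
    root = Path(project_dir)
    return {
        "python_3_8_plus": sys.version_info >= (3, 8),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "venv_present": (root / "venv" / "bin" / "python").exists(),
        "outputs_dir_present": (root / "outputs").is_dir(),
    }
```

Printing the failed keys, for example `[name for name, ok in preflight().items() if not ok]`, points to the first item worth investigating.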

Upload Issues

File upload fails:

Check file size limits in php.ini
Verify .htaccess is being read (requires AllowOverride All)
Check disk space: df -h
Verify outputs/ directory permissions: ls -ld outputs/
Check Apache error log: tail -f /var/log/apache2/voice2text-error.log

"413 Request Entity Too Large":

If using Nginx as reverse proxy, add to nginx config:

client_max_body_size 350M;

Transcription Issues

Transcription fails:

Verify FFmpeg is installed: ffmpeg -version
Check audio file format (supported: mp3, wav, m4a, etc.)
Review API logs for specific errors
Test with a small file first
Ensure enough disk space in /tmp

Slow transcription:

Use a smaller Whisper model (tiny or base)
Consider using GPU acceleration (requires CUDA setup)
Upgrade server hardware (more CPU/RAM)
Reduce audio file length/quality

Translation Issues

Translation fails:

Verify DEEPL_API_KEY in .env is set and valid
Check DeepL API usage: https://www.deepl.com/pro-account
Review API response for specific error messages
Ensure internet connectivity for DeepL API

Permission Issues

403 Forbidden errors:

sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs

Can't write to outputs directory:

sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 775 /var/www/voice2text/outputs

Performance Issues

Out of memory:

Use a smaller Whisper model (tiny or base)
Increase PHP memory limit in php.ini
Increase system swap space
Add more RAM to server

Timeout errors:

Increase PHP max_execution_time in php.ini
Increase Apache timeout in virtual host config
Process smaller audio files
Use faster Whisper model

Debugging Tips

Enable debug mode (development only; displayed errors can leak paths and configuration): add to config.php:

error_reporting(E_ALL);
ini_set('display_errors', 1);

Check system resources:

# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'

Test components individually:

Test PHP: Create test.php with <?php phpinfo(); ?>
Test Python API: curl http://localhost:5010/health
Test file upload: Use small test file first
Check browser console for JavaScript errors (F12)

Quick Reference

Development Commands

# Start Python API
./start_api.sh
# or manually:
source venv/bin/activate && python api.py

# Check Python API status
curl http://localhost:5010/health

# Install/update PHP dependencies
composer install

# Install/update Python dependencies
./setup.sh

# View Python API output
# (if running in background, check terminal where you started it)

Configuration Files

.env - Environment variables (authentication, API keys, dev mode)
.user.ini - PHP settings for MAMP/PHP-FPM (upload limits, timeouts)
api.py - Whisper model selection in the whisper.load_model() call (tiny, base, small, medium, large)

Dev Mode Toggle

# Edit .env file:
DEV_MODE=true   # Local testing - bypasses authentication
DEV_MODE=false  # Production - requires Microsoft SSO

Diagnostic Pages

check_api.php - Verify Python API connection and view status
test_download.php - View your accessible files and test downloads
Access them at http://localhost:8888/voice2text/check_api.php and http://localhost:8888/voice2text/test_download.php

Log Locations

Python API: Terminal output where ./start_api.sh was run
PHP errors: Check MAMP logs directory
Download attempts: PHP error log (unauthorized attempts are logged)

License

MIT
