Voice to Text with Whisper & DeepL Translation
A secure web application that converts audio files to text using OpenAI's Whisper model and translates the transcriptions using the DeepL API. It features Microsoft Azure AD authentication and supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).
Features
- 🔐 Microsoft Azure AD SSO authentication with OAuth2 PKCE flow
- 🎤 Audio transcription using OpenAI Whisper (multiple models available)
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend (port 5010)
- 💻 PHP frontend (MAMP/Apache compatible with PHP-FPM)
- 📦 350MB file size limit for audio uploads
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats
- 🔒 Session-based file access control - users can only access their own files
- 🧪 Dev mode for local testing without Microsoft authentication
Requirements
Required Software
- Python 3.8+ (Recommended: 3.10 or 3.11)
- PHP 7.4+ with PHP-FPM
- MAMP or Apache web server
- Composer for PHP dependency management
- FFmpeg for audio processing
For Production (Optional for Local Dev)
- Microsoft Azure AD application with registered redirect URI
- HTTPS/SSL certificate (required for production secure cookies)
Quick Start (Local Development)
For local testing without Azure AD setup:
# 1. Copy environment file
cp .env.example .env
# 2. Enable dev mode in .env
# Edit .env and set: DEV_MODE=true
# 3. Install PHP dependencies
composer install
# 4. Install Python dependencies
./setup.sh
# 5. Start Python API
./start_api.sh
# 6. Access via MAMP at http://localhost:8888/voice2text/
Dev Mode: When DEV_MODE=true, authentication is bypassed and you'll see a mock user "Dev User (Local)". This allows testing without Azure AD configuration.
Full Installation
1. Configure Authentication
This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with PKCE flow.
Step 1: Copy and configure environment file
cp .env.example .env
Step 2: Edit .env file with your configuration:
# Set to true for local testing (bypasses Microsoft auth)
DEV_MODE=false
# Azure AD Configuration (required for production)
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
# API Keys
DEEPL_API_KEY=your_deepl_api_key_here
# API Configuration
PYTHON_API_URL=http://localhost:5010
# Session timeout in seconds (default: 8 hours)
SESSION_TIMEOUT=28800
For Local Development:
- Set DEV_MODE=true to bypass authentication
- Azure credentials are not required in dev mode
For Production:
- Set DEV_MODE=false
- Configure valid Azure AD credentials
- Register your redirect URI in the Azure AD Portal
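A quick way to catch a half-edited .env before starting anything is a small checker. The following is a minimal Python sketch, assuming .env uses plain KEY=VALUE lines as in the example above; the script, its function names and its placeholder heuristics are hypothetical, not part of this repository.

```python
from pathlib import Path

# Checked in every mode (assumption: the README does not say whether the
# DeepL key may be left empty when translation is never used).
ALWAYS_REQUIRED = ("PYTHON_API_URL", "DEEPL_API_KEY")
# Only needed when DEV_MODE=false, per the lists above.
PRODUCTION_REQUIRED = ("AZURE_CLIENT_ID", "AZURE_AUTHORITY", "AZURE_REDIRECT_URI")
# Fragments of the .env.example placeholder values.
PLACEHOLDER_MARKERS = ("_here", "yourdomain.com")


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks, # comments and inline ' #' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.split(" #", 1)[0].strip().strip("\"'")
        env[key.strip()] = value
    return env


def missing_settings(env):
    """Return required keys that are absent, empty, or still placeholders."""
    dev_mode = env.get("DEV_MODE", "false").strip().lower() == "true"
    required = ALWAYS_REQUIRED if dev_mode else ALWAYS_REQUIRED + PRODUCTION_REQUIRED
    return [key for key in required
            if not env.get(key) or any(m in env[key] for m in PLACEHOLDER_MARKERS)]


if __name__ == "__main__":
    problems = missing_settings(parse_env(Path(".env").read_text()))
    print("OK" if not problems else "Fix these .env keys: " + ", ".join(problems))
```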
Step 3: Install PHP dependencies
composer install
This installs:
- league/oauth2-client - OAuth2 PKCE authentication
- vlucas/phpdotenv - Environment variable management
2. Install FFmpeg
macOS:
brew install ffmpeg
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install ffmpeg
Windows: Download from https://ffmpeg.org/download.html
3. Setup Python Environment
Run the setup script:
chmod +x setup.sh
./setup.sh
This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory
4. Start the API Server
chmod +x start_api.sh
./start_api.sh
Or manually:
source venv/bin/activate
python api.py
The API will run on http://localhost:5010
5. Configure Web Server
MAMP Setup:
- Place this directory inside the MAMP document root (e.g. htdocs/voice2text) so it is served at /voice2text/
- Ensure PHP is enabled (PHP 7.4+ recommended)
- IMPORTANT: MAMP uses PHP-FPM, so PHP configuration is in .user.ini (not .htaccess)
- Restart MAMP servers after changing .user.ini
- Access at: http://localhost:8888/voice2text/
Apache Setup:
- See "Production Deployment (Apache)" section below for full Apache configuration
Usage
Development Mode (DEV_MODE=true)
- Start the Python API server: ./start_api.sh
- Open the web application in your browser
- You'll be automatically logged in as "Dev User (Local)"
- See orange "DEV MODE ACTIVE" banner at the top
- Select output format (Text/VTT/SRT)
- (Optional) Enable translation and select target language
- Upload an audio file (max 350MB)
- Wait for processing
- Download original and/or translated transcription
Production Mode (DEV_MODE=false)
- Start the Python API server: ./start_api.sh
- Open the web application in your browser
- You'll see a login page with "Sign in with Microsoft" button
- Click and authenticate with your Microsoft account
- After authentication, you'll be redirected to the main application
- See your name and email in the user header
- Select output format (Text/VTT/SRT)
- (Optional) Enable translation and select target language
- Upload an audio file (max 350MB)
- Wait for processing - see real-time progress bar
- View transcription preview in-page (truncated at 10,000 chars)
- Download original and/or translated transcription files
- Your files are associated with your session and only accessible to you
- Click "Logout" when finished
Translation
The app uses DeepL API for high-quality translations. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Both files are tracked in your session for access control
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
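The translation step can be sketched with Python's standard library, assuming api.py calls DeepL's REST endpoint /v2/translate directly (the README does not say whether it uses the official deepl package instead). The function names are hypothetical. DeepL Free keys end in :fx and use the api-free.deepl.com host; sending several strings in text returns the translations in the same order.

```python
import json
import urllib.request


def deepl_endpoint(api_key):
    """DeepL Free keys end in ':fx' and are served from a separate host."""
    host = "api-free.deepl.com" if api_key.endswith(":fx") else "api.deepl.com"
    return f"https://{host}/v2/translate"


def build_translate_request(texts, target_lang, api_key):
    """Build a JSON POST request translating a list of strings."""
    body = json.dumps({"text": list(texts), "target_lang": target_lang.upper()}).encode("utf-8")
    return urllib.request.Request(
        deepl_endpoint(api_key),
        data=body,
        headers={"Authorization": f"DeepL-Auth-Key {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_translations(payload):
    """Extract translated strings, in request order, from a DeepL response body."""
    return [item["text"] for item in json.loads(payload)["translations"]]


def translate(texts, target_lang, api_key):
    request = build_translate_request(texts, target_lang, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_translations(response.read())
```

For VTT/SRT output, one reasonable design (an assumption here) is to send only the cue text lines and put the translations back next to the original timestamps, so DeepL never sees or alters the timing lines.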
File Upload Configuration
MAMP (PHP-FPM):
PHP settings are configured in .user.ini (automatically created during setup):
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
Note: PHP-FPM caches .user.ini for up to 5 minutes, so restart MAMP for changes to take effect immediately.
Apache (mod_php):
Uncomment the settings in .htaccess or configure in php.ini
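Because upload_max_filesize, post_max_size and memory_limit have to agree, a small Python sketch can read the same ini text and flag mismatches. It assumes the PHP shorthand sizes shown above (K/M/G suffixes) and applies the PHP manual's guidance that post_max_size must be at least upload_max_filesize and that memory_limit should be larger than post_max_size. The helper names are hypothetical; the fallbacks are PHP's stock defaults.

```python
import re

# PHP shorthand multipliers (K, M, G are powers of 1024).
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
# PHP's stock defaults, used when a setting is absent.
DEFAULTS = {"upload_max_filesize": "2M", "post_max_size": "8M", "memory_limit": "128M"}


def to_bytes(value):
    """Convert PHP shorthand such as '350M' to bytes; '-1' (unlimited) stays -1."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([KMGkmg]?)\s*", value)
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


def parse_ini(text):
    """Read key = value lines; ';' starts a comment in PHP ini files."""
    settings = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_upload_limits(settings, needed="350M"):
    """Return human-readable problems with the upload-related settings."""
    def size(key):
        return to_bytes(settings.get(key, DEFAULTS[key]))

    upload, post, memory = size("upload_max_filesize"), size("post_max_size"), size("memory_limit")
    problems = []
    if upload < to_bytes(needed):
        problems.append(f"upload_max_filesize is below {needed}")
    if post < upload:
        problems.append("post_max_size must be >= upload_max_filesize")
    if memory != -1 and memory <= post:
        problems.append("memory_limit should be larger than post_max_size")
    return problems


if __name__ == "__main__":
    from pathlib import Path
    for problem in check_upload_limits(parse_ini(Path(".user.ini").read_text())) or ["OK"]:
        print(problem)
```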
API Endpoints
POST /transcribe
Transcribe audio file to text/VTT/SRT
Parameters:
- audio (file): Audio file to transcribe
- format (string): Output format (txt/vtt/srt)
Response:
{
"success": true,
"text": "transcribed text...",
"filename": "output.txt",
"format": "txt"
}
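The three output formats differ only in how timed segments are laid out. A minimal sketch, assuming api.py builds its files from the segments list that Whisper's model.transcribe() returns (each segment has start and end in seconds plus text); the function names are hypothetical, and the openai-whisper package also ships ready-made subtitle writers in whisper.utils.

```python
def _timestamp(seconds, separator):
    """Format seconds as HH:MM:SS plus milliseconds (',' for SRT, '.' for VTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_txt(segments):
    """Plain text: segment texts joined with single spaces."""
    return " ".join(seg["text"].strip() for seg in segments)


def to_srt(segments):
    """Numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{_timestamp(seg['start'], ',')} --> {_timestamp(seg['end'], ',')}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WEBVTT header followed by unnumbered cues."""
    cues = [
        f"{_timestamp(seg['start'], '.')} --> {_timestamp(seg['end'], '.')}\n{seg['text'].strip()}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

With result = model.transcribe("talk.mp3"), calling to_srt(result["segments"]) would produce an SRT body; the file name is a placeholder.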
GET /health
Health check endpoint
GET /download/
Download transcribed file
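To exercise the API without the PHP frontend, a standard-library client can post to /transcribe directly. This sketch assumes only the two documented parameters (audio, format); translation options are not documented here, so they are omitted. Requests made this way presumably bypass the PHP layer, so the resulting files are not added to any user's session. The file name sample.mp3 is a placeholder.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # default PYTHON_API_URL from .env


def build_multipart(fields, files):
    """Encode form fields and {name: (filename, bytes)} files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    for name, (filename, data) in files.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
        )
        parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
        parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, fmt="txt", api_url=API_URL):
    """POST an audio file to /transcribe and return the decoded JSON response."""
    audio = Path(path)
    # Reads the whole file into memory: fine for a test clip, not for 350 MB uploads.
    body, content_type = build_multipart({"format": fmt}, {"audio": (audio.name, audio.read_bytes())})
    request = urllib.request.Request(
        f"{api_url}/transcribe", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    # Long timeout: larger Whisper models can take many minutes per file.
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.load(response)


if __name__ == "__main__":
    result = transcribe("sample.mp3", fmt="srt")
    print(result.get("filename"), result.get("format"))
```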
Whisper Models
The default model is base which provides a good balance of speed and accuracy.
Available models:
- tiny - Fastest, least accurate
- base - Good balance (default)
- small - Better accuracy, slower
- medium - High accuracy, much slower
- large - Best accuracy, very slow
To change the model, edit the whisper.load_model() call in api.py:
model = whisper.load_model("base") # Change to desired model
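If you would rather not edit api.py for each deployment, one option (a hypothetical variant, not how api.py currently works) is to read the model name from an environment variable such as WHISPER_MODEL and use the GPU when PyTorch sees one. whisper.load_model() accepts a device argument, and whisper.available_models() lists further variants (for example English-only models) beyond the five named above.

```python
import os

# The five model sizes named in this README, fastest to most accurate.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env=None, default="base"):
    """Read the hypothetical WHISPER_MODEL variable, falling back to 'base'."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown Whisper model {name!r}; expected one of {KNOWN_MODELS}")
    return name


if __name__ == "__main__":
    import torch  # installed as a Whisper dependency
    import whisper

    model_name = resolve_model_name()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)
    print(f"Loaded Whisper '{model_name}' on {device}")
```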
Authentication & Security
Microsoft Azure AD SSO (Production)
- OAuth2 with PKCE (Proof Key for Code Exchange) flow - RFC 7636
- No client secrets needed - secure public client authentication
- Code challenge: SHA256 hash of random 64-character verifier
- State validation: CSRF protection on callback
- Microsoft Graph API: Retrieves user profile information
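The PKCE and state values described above can be illustrated in Python. This is a sketch of the RFC 7636 S256 method, not the app's PHP code (which relies on league/oauth2-client); the function names are hypothetical. The example verifier/challenge pair in the RFC's Appendix B is a handy check for any implementation.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier(n_bytes=48):
    """48 random bytes encode to 64 base64url characters (RFC 7636 allows 43-128)."""
    return secrets.token_urlsafe(n_bytes)


def code_challenge_s256(verifier):
    """BASE64URL(SHA256(verifier)) without '=' padding, per the S256 method."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_state():
    """Random value stored in the session and echoed back on the callback."""
    return secrets.token_urlsafe(32)


def state_is_valid(expected, returned):
    """Constant-time comparison guards the callback against forged state values."""
    return returned is not None and hmac.compare_digest(expected, returned)


if __name__ == "__main__":
    verifier = make_code_verifier()
    # The challenge goes on the authorize URL (code_challenge_method=S256);
    # the verifier stays server-side and is sent later with the token request.
    print("code_challenge:", code_challenge_s256(verifier))
```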
Development Mode
- DEV_MODE=true: Bypasses authentication entirely
- Mock user: Auto-creates session with "Dev User (Local)"
- Visual indicator: Orange banner shows when dev mode is active
- No Microsoft credentials required: Perfect for local testing
Session Management
- Secure session cookies: httponly, secure (HTTPS only), samesite=Lax
- Session timeout: Configurable (default: 8 hours)
- Session regeneration: After login to prevent fixation attacks
- Auto-timeout: Sessions expire and require re-authentication
File Access Control
- Session-based tracking: Files are stored in the $_SESSION['user_files'] array
- Upload tracking: Files are automatically added to the user's session on transcription
- Download validation: Only files in user's session can be downloaded
- Ownership logging: Unauthorized attempts logged with user ID
- No cross-user access: Users cannot access other users' files
Security Architecture
User Request
↓
Authentication Check (isAuthenticated())
↓
Process Request (process.php)
↓
Add Files to Session ($_SESSION['user_files'])
↓
Download Request (download.php)
↓
Verify File in User's Session
↓
Serve File or 403 Forbidden
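The "Verify File in User's Session" step boils down to a membership check plus path sanitising. The sketch below models it in Python with a hypothetical session dictionary (the real logic lives in download.php and isAuthenticated() in PHP); the 401 and 400 codes for unauthenticated or malformed requests are assumptions, since the diagram only specifies the 403 case.

```python
import os


def authorize_download(session, requested_name):
    """Return (http_status, filename) for a download request.

    `session` stands in for PHP's $_SESSION: a dict with a hypothetical
    'authenticated' flag and the 'user_files' list described above.
    """
    if not session.get("authenticated"):
        return 401, None  # the real app may redirect to login.php instead
    # Reject anything with directory parts, e.g. "../.env", so only plain
    # file names inside outputs/ can ever be served.
    name = os.path.basename(requested_name)
    if not name or name != requested_name:
        return 400, None
    if name not in session.get("user_files", []):
        return 403, None  # belongs to someone else, or does not exist
    return 200, name
```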
Important Security Notes
- ✅ .env file excluded from git (contains sensitive credentials)
- ✅ HTTPS required in production for secure cookie transmission
- ⚠️ Files persist in outputs/ after the session expires: they can't be downloaded but still exist on disk
- 💡 Recommendation: Set up a cron job to clean old files from the outputs/ directory
- 🔒 Session-only access: Files become inaccessible when the session expires or the user logs out
- 🚨 Dev mode security: Only use DEV_MODE=true for local development, never in production
File Structure
voice2text/
├── api.py # Python Flask API with Whisper & DeepL
├── login.php # Landing page with Microsoft SSO button
├── auth.php # OAuth2 PKCE authentication handler
├── logout.php # Session destruction handler
├── index.php # Main application interface (auth required)
├── process.php # PHP request handler (auth + file tracking)
├── download.php # File download handler (auth + ownership check)
├── check_api.php # API status checker (auth required)
├── test_download.php # Download functionality tester (auth required)
├── config.php # Configuration loader
├── auth_config.php # Authentication & environment config
├── style.css # Black/gold theme styles
├── V2T.svg # Application logo
│
├── .env # Environment variables (NOT in git) ⚠️
├── .env.example # Environment variables template
├── .user.ini # PHP-FPM configuration (MAMP)
├── .htaccess # Apache rewrite rules
├── .gitignore # Git ignore rules
│
├── composer.json # PHP dependencies manifest
├── composer.lock # PHP dependency lock file (NOT in git)
├── requirements.txt # Python dependencies
├── setup.sh # Python environment setup script
├── start_api.sh # Python API start script
│
├── README.md # This file - comprehensive documentation
├── CLAUDE.md # Claude Code guidance for AI assistance
│
├── outputs/ # Transcribed files directory (files NOT in git)
├── vendor/ # Composer PHP dependencies (NOT in git)
└── venv/ # Python virtual environment (NOT in git)
Key Files Explained:
- .env: Contains secrets (Azure credentials, API keys) - NEVER commit
- .user.ini: PHP settings for PHP-FPM (MAMP) - upload limits, timeouts
- auth_config.php: Core authentication logic and session management
- process.php: Handles uploads, calls Python API, tracks files in session
- download.php: Validates ownership before serving files
Production Deployment (Apache)
Prerequisites
- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Root/sudo access for system configuration
Step 1: Install Required Apache Modules
Ubuntu/Debian:
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
CentOS/RHEL:
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
Step 2: Deploy Application Files
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text
# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh
# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 777 /var/www/voice2text/outputs
Step 3: Configure Apache Virtual Host
Create /etc/apache2/sites-available/voice2text.conf:
<VirtualHost *:80>
ServerName voice2text.yourdomain.com
ServerAdmin admin@yourdomain.com
DocumentRoot /var/www/voice2text
<Directory /var/www/voice2text>
Options -Indexes +FollowSymLinks
AllowOverride All
Require all granted
# PHP settings for large uploads
php_value upload_max_filesize 350M
php_value post_max_size 350M
php_value max_execution_time 1200
php_value max_input_time 1200
php_value memory_limit 512M
</Directory>
# Protect sensitive files
<FilesMatch "^(config\.php|\.env|\.git|\.htaccess)">
Require all denied
</FilesMatch>
# Logging
ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
Enable the site:
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
Step 4: Configure PHP for Large Uploads
Edit /etc/php/7.4/apache2/php.ini (adjust version as needed):
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
Restart Apache:
sudo systemctl restart apache2
Step 5: Setup Python API as Systemd Service
Create /etc/systemd/system/voice2text-api.service:
[Unit]
Description=Voice to Text Whisper API
After=network.target
[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
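# Keep the system directories on PATH so Whisper can find the ffmpeg binary
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"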
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10
# Security settings
NoNewPrivileges=true
PrivateTmp=true
# Logging: stdout/stderr go to the systemd journal (view with journalctl -u voice2text-api)
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api
# Check status
sudo systemctl status voice2text-api
# View logs
sudo journalctl -u voice2text-api -f
Step 6: Configure Firewall
UFW (Ubuntu):
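sudo ufw allow OpenSSH  # keep SSH reachable before enabling the firewall
sudo ufw allow 'Apache Full'
# Python API: only open 5010 if the PHP frontend runs on another host.
# With the default PYTHON_API_URL=http://localhost:5010 it can stay closed,
# since authentication is handled by the PHP layer, not the API.
# sudo ufw allow 5010/tcp
sudo ufw enable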
Firewalld (CentOS):
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
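# Only needed if the PHP frontend runs on another host; otherwise the API is reached via localhost
# sudo firewall-cmd --permanent --add-port=5010/tcp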
sudo firewall-cmd --reload
Step 7: SSL Configuration (Required for Production)
Using Let's Encrypt with Certbot:
# Install Certbot
sudo apt install certbot python3-certbot-apache
# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com
# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run
Step 8: Verify Deployment
- Check Apache status: sudo systemctl status apache2
- Check API status: sudo systemctl status voice2text-api
- Visit: https://voice2text.yourdomain.com/check_api.php
- Test file upload with a small audio file
Monitoring and Maintenance
Check API Status
# View API logs
sudo journalctl -u voice2text-api -n 100
# Check if API is responding
curl http://localhost:5010/health
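For scripted checks (after a deploy or from a monitoring job), the same health probe can be wrapped in a small retrying Python script. It assumes /health answers HTTP 200 when the API is up; the function name and retry defaults are illustrative.

```python
import sys
import time
import urllib.error
import urllib.request


def api_is_healthy(base_url="http://localhost:5010", attempts=3, delay=2.0, timeout=5.0):
    """Poll GET /health; True on the first HTTP 200, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, connection refused, HTTP error, or timeout
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Exit code 0/1 makes the script usable from cron or a monitoring agent.
    sys.exit(0 if api_is_healthy() else 1)
```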
Check Apache Logs
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log
# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
Restart Services
# Restart Apache
sudo systemctl restart apache2
# Restart Python API
sudo systemctl restart voice2text-api
# Restart both
sudo systemctl restart apache2 voice2text-api
Clean Old Files
The outputs/ directory can grow large. Set up a cron job to clean old files:
# Edit crontab
sudo crontab -e
# Add this line to delete files older than 24 hours daily at 2 AM
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
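Where cron and find are unavailable (for example on a local MAMP machine), the same cleanup can be sketched in Python. It walks outputs/ recursively like find, compares file modification times, and supports a dry run; the function name, the path and the 24-hour default are illustrative choices.

```python
import time
from pathlib import Path


def clean_outputs(directory, max_age_hours=24.0, dry_run=False, now=None):
    """Delete files under `directory` older than max_age_hours; return their relative paths."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    base = Path(directory)
    removed = []
    # Materialise the listing first so deletions do not disturb the directory walk.
    for path in list(base.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(str(path.relative_to(base)))
    return sorted(removed)


if __name__ == "__main__":
    # Hypothetical path: point it at your deployment's outputs/ directory.
    for name in clean_outputs("/var/www/voice2text/outputs", dry_run=True):
        print("would delete", name)
```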
Troubleshooting
MAMP/PHP-FPM Issues
"Invalid command 'php_value'" error:
- This means MAMP is using PHP-FPM instead of mod_php
- Solution: Use .user.ini instead of .htaccess for PHP settings
- The .user.ini file should already be created
- Restart MAMP servers after any changes to .user.ini
"Session ini settings cannot be changed" warnings:
- Cause: session_start() being called before session configuration
- Solution: Always load config.php BEFORE starting sessions
- Fixed in the current version; if you see this, ensure you have the latest code
Changes to .user.ini not taking effect:
- PHP-FPM caches .user.ini for up to 5 minutes
- Solution: Restart MAMP servers completely
- Wait a few minutes or check phpinfo() to verify settings
Authentication Issues
Stuck on login page in dev mode:
- Check the .env file: DEV_MODE=true
- Clear browser cookies and cache
- Restart MAMP servers
- Verify auth_config.php has the latest dev mode logic
Microsoft authentication fails:
- Ensure DEV_MODE=false in .env
- Verify Azure AD credentials are correct
- Check redirect URI matches Azure AD Portal configuration
- Ensure redirect URI ends with trailing slash if configured that way
- Check browser console for detailed OAuth errors
"Authentication required" on every page:
- Session may not be persisting
- Check browser allows cookies
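- Verify .user.ini session settings are loaded
- Try clearing browser cookies
- In production, make sure the site is served over HTTPS (the session cookie is marked secure, so browsers will not send it over plain HTTP)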
API Issues
API not connecting:
- Check if the API is running: sudo systemctl status voice2text-api
- Test the health endpoint: curl http://localhost:5010/health
- Check API logs: sudo journalctl -u voice2text-api -n 50
- If the PHP frontend runs on a different host, verify the firewall allows port 5010
- Visit check_api.php in the browser for detailed status
API won't start:
- Check Python version: python3 --version (must be 3.8+)
- Verify virtual environment: ls -la venv/
- Check dependencies: source venv/bin/activate && pip list
- Review error logs: sudo journalctl -u voice2text-api -xe
- Ensure FFmpeg is installed: which ffmpeg
Upload Issues
File upload fails:
- Check file size limits in php.ini (or .user.ini under PHP-FPM)
- Verify .htaccess is being read (requires AllowOverride All)
- Check disk space: df -h
- Verify outputs/ directory permissions: ls -ld outputs/
- Check the Apache error log: tail -f /var/log/apache2/voice2text-error.log
"413 Request Entity Too Large":
- If using Nginx as reverse proxy, add to nginx config:
client_max_body_size 350M;
Transcription Issues
Transcription fails:
- Verify FFmpeg is installed: ffmpeg -version
- Check audio file format (supported: mp3, wav, m4a, etc.)
- Review API logs for specific errors
- Test with a small file first
- Ensure there is enough disk space in /tmp
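Several of these checks can be bundled into a preflight script. This is a hypothetical helper, not shipped with the app; the 2 GB free-space threshold is an arbitrary assumption, and tempfile.gettempdir() resolves to /tmp unless TMPDIR points elsewhere. Run it as the same user as the API service (for example sudo -u www-data venv/bin/python preflight.py), because PATH and file permissions differ between users.

```python
import os
import shutil
import tempfile


def preflight(outputs_dir="outputs", min_free_gb=2.0):
    """Return {check_name: passed} for common causes of failed transcriptions."""
    free_gb = shutil.disk_usage(tempfile.gettempdir()).free / 1024 ** 3
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "tmp_free_space": free_gb >= min_free_gb,
        "outputs_writable": os.path.isdir(outputs_dir) and os.access(outputs_dir, os.W_OK),
    }


if __name__ == "__main__":
    for check, passed in preflight().items():
        print(f"{'OK  ' if passed else 'FAIL'} {check}")
```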
Slow transcription:
- Use a smaller Whisper model (tiny or base)
- Consider using GPU acceleration (requires CUDA setup)
- Upgrade server hardware (more CPU/RAM)
- Reduce audio file length/quality
Translation Issues
Translation fails:
- Verify the DeepL API key (DEEPL_API_KEY) is valid in .env
- Check DeepL API usage: https://www.deepl.com/pro-account
- Review API response for specific error messages
- Ensure internet connectivity for DeepL API
Permission Issues
403 Forbidden errors:
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 777 /var/www/voice2text/outputs
Can't write to outputs directory:
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 777 /var/www/voice2text/outputs
Performance Issues
Out of memory:
- Use a smaller Whisper model (tiny or base)
- Increase PHP memory limit in php.ini
- Increase system swap space
- Add more RAM to server
Timeout errors:
- Increase PHP max_execution_time in php.ini
- Increase Apache timeout in virtual host config
- Process smaller audio files
- Use faster Whisper model
Debugging Tips
Enable debug mode:
Add to config.php:
error_reporting(E_ALL);
ini_set('display_errors', 1);
Check system resources:
# CPU and memory usage
htop
# Disk space
df -h
# Check running processes
ps aux | grep -E 'python|apache'
Test components individually:
- Test PHP: Create test.php with <?php phpinfo(); ?> (delete it afterwards, since phpinfo() exposes server details)
- Test Python API: curl http://localhost:5010/health
- Test file upload: Use a small test file first
- Check browser console for JavaScript errors (F12)
Quick Reference
Development Commands
# Start Python API
./start_api.sh
# or manually:
source venv/bin/activate && python api.py
# Check Python API status
curl http://localhost:5010/health
# Install/update PHP dependencies
composer install
# Install/update Python dependencies
./setup.sh
# View Python API output
# (if running in background, check terminal where you started it)
Configuration Files
- .env - Environment variables (authentication, API keys, dev mode)
- .user.ini - PHP settings for MAMP/PHP-FPM (upload limits, timeouts)
- api.py - Whisper model selection in the whisper.load_model() call (tiny, base, small, medium, large)
Dev Mode Toggle
# Edit .env file:
DEV_MODE=true # Local testing - bypasses authentication
DEV_MODE=false # Production - requires Microsoft SSO
Diagnostic Pages
- check_api.php - Verify Python API connection and view status
- test_download.php - View your accessible files and test downloads
- Access them at: http://localhost:8888/voice2text/check_api.php and http://localhost:8888/voice2text/test_download.php
Log Locations
- Python API: Terminal output where ./start_api.sh was run
- PHP errors: Check the MAMP logs directory
- Download attempts: PHP error log (unauthorized attempts are logged)
License
MIT