Voice to Text with Whisper & DeepL Translation
A web application that converts audio files to text using OpenAI's Whisper model and optionally translates the transcripts using the DeepL API. Supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).
Features
- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats
Requirements
- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- Composer (for PHP dependencies)
- Microsoft Azure AD application (for SSO authentication)
Installation
1. Configure Authentication
This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with PKCE flow.
Step 1: Copy and configure environment file
cp .env.example .env
Step 2: Edit .env file with your Azure AD credentials:
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
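The values above are plain KEY=VALUE pairs. As a minimal sketch of how such a file can be read from Python, assuming no quoting or variable expansion is needed (the helper name `parse_env` is hypothetical and not taken from api.py, which may load its settings differently):

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Sample text mirroring two of the keys from .env.example
example = """\
# comment lines are ignored
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
"""
settings = parse_env(example)

# SESSION_TIMEOUT is expressed in seconds; 28800 seconds is 8 hours
timeout_hours = int(settings["SESSION_TIMEOUT"]) / 3600
```

In a real deployment you would pass the contents of the .env file instead of the inline sample.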
Step 3: Install PHP dependencies
composer install
2. Install FFmpeg
macOS:
brew install ffmpeg
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install ffmpeg
Windows: Download from https://ffmpeg.org/download.html
3. Setup Python Environment
Run the setup script:
chmod +x setup.sh
./setup.sh
This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory
4. Start the API Server
chmod +x start_api.sh
./start_api.sh
Or manually:
source venv/bin/activate
python api.py
The API will run on http://localhost:5010
5. Configure Web Server
Ensure your MAMP/Apache server points to this directory and PHP is enabled.
Usage
- Start the Python API server (see step 4 above)
- Open the web application in your browser (you'll see a login page)
- Click "Sign in with Microsoft" and authenticate with your Microsoft account
- After authentication, you'll be redirected to the main application
- Select output format (Text/VTT/SRT)
- (Optional) Enable translation and select target language
- Upload an audio file (max 350MB)
- Wait for processing
- Download original and/or translated transcription
- Your files are associated with your session and only accessible to you
Translation
The app uses DeepL API for high-quality translations. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
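The ordering described above (transcribe first, then translate the transcript, keeping both results) can be sketched as follows. This is an illustration only: `run_pipeline` and the stand-in callables are hypothetical names, and the real Whisper and DeepL calls in api.py are not shown here.

```python
from typing import Callable, Dict, Optional


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    target_lang: Optional[str] = None,
) -> Dict[str, str]:
    """Transcribe first, then translate the transcript if a target is set."""
    outputs = {"original": transcribe(audio_path)}
    if target_lang:
        # Translation always works on the finished transcript, not the audio
        outputs["translated"] = translate(outputs["original"], target_lang)
    return outputs


# Stand-in callables so the sketch runs without Whisper or a DeepL key
def fake_transcribe(path: str) -> str:
    return "hello world"


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


result = run_pipeline("talk.mp3", fake_transcribe, fake_translate, "ES")
```

When no target language is given, only the original transcript is returned.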
Note: PHP settings are configured via .htaccess for 350MB uploads. To allow larger files, raise the following php.ini values (shown here at the current 350MB limit):
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
API Endpoints
POST /transcribe
Transcribe audio file to text/VTT/SRT
Parameters:
- audio (file): Audio file to transcribe
- format (string): Output format (txt/vtt/srt)
Response:
{
"success": true,
"text": "transcribed text...",
"filename": "output.txt",
"format": "txt"
}
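A minimal client sketch for this endpoint, using only the Python standard library. The field names `audio` and `format` come from the parameter list above; the helper names `build_multipart` and `transcribe` are hypothetical, and the sketch assumes the API is reachable at http://localhost:5010 without extra authentication.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(filename: str, data: bytes, fmt: str) -> Tuple[bytes, str]:
    """Encode the `format` field and the `audio` file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    # Note: the filename is inserted as-is; names containing quotes are not escaped
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="format"\r\n\r\n'
        f"{fmt}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, fmt: str = "txt",
               base_url: str = "http://localhost:5010") -> dict:
    """POST an audio file to /transcribe and return the decoded JSON reply."""
    with open(path, "rb") as handle:
        # Reads the whole file into memory, which is fine for a sketch
        body, content_type = build_multipart(os.path.basename(path), handle.read(), fmt)
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `transcribe("talk.mp3", "srt")` would return a dictionary shaped like the response shown above.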
GET /health
Health check endpoint
GET /download/
Download transcribed file
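The VTT and SRT outputs differ mainly in their timestamp syntax: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The sketch below shows that difference; it assumes segments are dictionaries with `start`, `end` (in seconds) and `text` keys, and the function names are hypothetical rather than taken from api.py.

```python
from typing import Dict, List


def timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{timestamp(seg['start'], ',')} --> {timestamp(seg['end'], ',')}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments: List[Dict]) -> str:
    """Render WebVTT: a WEBVTT header followed by unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{timestamp(seg['start'], '.')} --> {timestamp(seg['end'], '.')}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello"},
    {"start": 61.25, "end": 63.0, "text": "World"},
]
srt_text = to_srt(segments)
vtt_text = to_vtt(segments)
```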
Whisper Models
The default model is base which provides a good balance of speed and accuracy.
Available models:
- tiny - Fastest, least accurate
- base - Good balance (default)
- small - Better accuracy, slower
- medium - High accuracy, much slower
- large - Best accuracy, very slow
To change the model, edit api.py line 24:
model = whisper.load_model("base") # Change to desired model
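If you would rather not edit the source each time, one possible approach is to read the model name from an environment variable and validate it against the list above. This is only a sketch: the `WHISPER_MODEL` variable and the `resolve_model_name` helper are assumptions and do not exist in the project as described.

```python
import os
from typing import Mapping, Optional

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env: Optional[Mapping[str, str]] = None,
                       default: str = "base") -> str:
    """Pick the Whisper model name from WHISPER_MODEL (hypothetical variable)."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of {VALID_MODELS}")
    return name


# In api.py the hard-coded name could then become:
#   model = whisper.load_model(resolve_model_name())
```

Failing fast on an unknown name avoids a confusing error later, when the model is first loaded.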
Authentication & Security
Microsoft Azure AD SSO
- Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
- Secure authentication without client secrets
- Session-based file access control
- Users can only download files they've uploaded in their current session
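For readers unfamiliar with PKCE, the sketch below shows how a code verifier and its S256 code challenge are derived, following RFC 7636. It is written in Python for illustration only; the application's actual handler is auth.php, and the function names here are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes yield 43 characters."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

The challenge is sent with the authorization request and the verifier with the token request, which is why no client secret is needed.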
Session Management
- Secure session cookies (httponly, secure, samesite)
- Configurable session timeout (default: 8 hours)
- Session regeneration after login for security
File Access Control
- Files are tracked per-user session in $_SESSION['user_files']
- Download attempts are validated against the user's file list
- Unauthorized access attempts are logged and blocked
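The ownership check described above amounts to a membership test on a sanitized filename. The following Python sketch illustrates the logic only; the real check lives in download.php, and `may_download` is a hypothetical name.

```python
import os
from typing import Iterable


def may_download(requested: str, user_files: Iterable[str]) -> bool:
    """Allow a download only for a bare filename recorded for this session."""
    name = os.path.basename(requested)
    if name != requested or name in ("", ".", ".."):
        # Anything containing a directory component is rejected outright
        return False
    return name in set(user_files)


# Stand-in for the per-session list kept in $_SESSION['user_files']
session_files = ["talk.srt"]
allowed = may_download("talk.srt", session_files)
blocked = may_download("../talk.srt", session_files)
```

Rejecting path components before the lookup keeps a crafted name from escaping the outputs/ directory.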
Important Security Notes
- Ensure your .env file is never committed to git (it's in .gitignore)
- Use HTTPS in production for secure cookie transmission
- Files become inaccessible after the session expires (files remain in outputs/ but can't be downloaded)
- Consider setting up a cron job to clean old files from the outputs/ directory
File Structure
.
├── api.py # Python Flask API with Whisper & DeepL
├── login.php # Landing page with Microsoft SSO
├── auth.php # OAuth2 PKCE authentication handler
├── logout.php # Session destruction handler
├── index.php # Main application interface (auth required)
├── process.php # PHP request handler (auth required)
├── download.php # File download handler (auth + ownership check)
├── check_api.php # API status checker (auth required)
├── test_download.php # Download functionality tester (auth required)
├── config.php # Configuration loader
├── auth_config.php # Authentication & environment config
├── style.css # Black/gold theme styles
├── .env # Environment variables (NOT in git)
├── .env.example # Environment variables template
├── .htaccess # PHP upload limits
├── .gitignore # Git ignore rules
├── composer.json # PHP dependencies
├── requirements.txt # Python dependencies
├── setup.sh # Setup script
├── start_api.sh # API start script
├── README.md # This file
├── CLAUDE.md # Claude Code guidance
├── outputs/ # Transcribed files directory
├── vendor/ # Composer dependencies (NOT in git)
└── venv/ # Python virtual environment
Production Deployment (Apache)
Prerequisites
- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Root/sudo access for system configuration
Step 1: Install Required Apache Modules
Ubuntu/Debian:
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
CentOS/RHEL:
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
Step 2: Deploy Application Files
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text
# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh
# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
# outputs/ only needs to be writable by www-data, which runs both Apache and the API
sudo chmod 750 /var/www/voice2text/outputs
Step 3: Configure Apache Virtual Host
Create /etc/apache2/sites-available/voice2text.conf:
<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com
    DocumentRoot /var/www/voice2text
    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted
        # PHP settings for large uploads
        # (php_value requires mod_php; with PHP-FPM set these in php.ini instead)
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>
    # Protect sensitive files (including .env, which holds the API key)
    <FilesMatch "^(config\.php|\.env|\.git|\.htaccess)">
        Require all denied
    </FilesMatch>
    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
Enable the site:
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
Step 4: Configure PHP for Large Uploads
Edit /etc/php/7.4/apache2/php.ini (adjust version as needed):
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
Restart Apache:
sudo systemctl restart apache2
Step 5: Setup Python API as Systemd Service
Create /etc/systemd/system/voice2text-api.service:
[Unit]
Description=Voice to Text Whisper API
After=network.target
[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
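# Keep the system directories on PATH so Whisper can still find ffmpeg
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"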
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10
# Security settings
NoNewPrivileges=true
PrivateTmp=true
# Logging
StandardOutput=append:/var/log/voice2text-api.log
StandardError=append:/var/log/voice2text-api-error.log
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api
# Check status
sudo systemctl status voice2text-api
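# View service events (start, stop, restarts)
sudo journalctl -u voice2text-api -f
# View application output, which the unit file above appends to these files
sudo tail -f /var/log/voice2text-api.log /var/log/voice2text-api-error.log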
Step 6: Configure Firewall
UFW (Ubuntu):
sudo ufw allow 'Apache Full'
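# Only needed if the PHP frontend runs on a different host than the API;
# with PYTHON_API_URL=http://localhost:5010 the port can stay closed
sudo ufw allow 5010/tcp # Python API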
sudo ufw enable
Firewalld (CentOS):
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
# Only needed if the PHP frontend runs on a different host than the API
sudo firewall-cmd --permanent --add-port=5010/tcp
sudo firewall-cmd --reload
Step 7: SSL Configuration (Optional but Recommended)
Using Let's Encrypt with Certbot:
# Install Certbot
sudo apt install certbot python3-certbot-apache
# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com
# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run
Step 8: Verify Deployment
- Check Apache status: sudo systemctl status apache2
- Check API status: sudo systemctl status voice2text-api
- Visit: http://voice2text.yourdomain.com/check_api.php
- Test file upload with a small audio file
Monitoring and Maintenance
Check API Status
# View API logs
sudo journalctl -u voice2text-api -n 100
# Check if API is responding
curl http://localhost:5010/health
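The same check can be scripted, for example from a monitoring job. This is a minimal sketch using only the standard library; it assumes /health answers with HTTP 200 when the API is up, and `api_is_healthy` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def api_is_healthy(base_url: str = "http://localhost:5010",
                   timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP status
        return False
```

A monitoring script could call `api_is_healthy()` and restart the service or send an alert when it returns False.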
Check Apache Logs
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log
# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
Restart Services
# Restart Apache
sudo systemctl restart apache2
# Restart Python API
sudo systemctl restart voice2text-api
# Restart both
sudo systemctl restart apache2 voice2text-api
Clean Old Files
The outputs/ directory can grow large. Set up a cron job to clean old files:
# Edit crontab
sudo crontab -e
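# Add this line to delete files older than 24 hours daily at 2 AM
# (-mmin +1440 matches files older than 24 hours; -mtime +1 would only match files older than 48 hours)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete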
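If you prefer to run the cleanup from Python (for instance from the same virtual environment as the API), the equivalent logic can be sketched as follows. The function name `remove_old_files` is hypothetical, and the sketch only touches regular files directly inside the given directory.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_files(directory: str,
                     max_age_seconds: float = 24 * 3600,
                     now: Optional[float] = None) -> List[str]:
    """Delete regular files whose modification time is older than the cutoff."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Calling `remove_old_files("/var/www/voice2text/outputs")` returns the names of the files it deleted, which is convenient for logging.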
Troubleshooting
API Issues
API not connecting:
- Check if API is running: sudo systemctl status voice2text-api
- Test health endpoint: curl http://localhost:5010/health
- Check API logs: sudo journalctl -u voice2text-api -n 50
- Verify firewall allows port 5010
- Visit check_api.php in browser for detailed status
API won't start:
- Check Python version: python3 --version (must be 3.8+)
- Verify virtual environment: ls -la venv/
- Check dependencies: source venv/bin/activate && pip list
- Review error logs: sudo journalctl -u voice2text-api -xe
- Ensure FFmpeg is installed: which ffmpeg
Upload Issues
File upload fails:
- Check file size limits in php.ini
- Verify .htaccess is being read (requires AllowOverride All)
- Check disk space: df -h
- Verify outputs/ directory permissions: ls -ld outputs/
- Check Apache error log: tail -f /var/log/apache2/error.log
"413 Request Entity Too Large":
- If using Nginx as reverse proxy, add to nginx config:
client_max_body_size 350M;
Transcription Issues
Transcription fails:
- Verify FFmpeg is installed: ffmpeg -version
- Check audio file format (supported: mp3, wav, m4a, etc.)
- Review API logs for specific errors
- Test with a small file first
- Ensure enough disk space in /tmp
Slow transcription:
- Use a smaller Whisper model (tiny or base)
- Consider using GPU acceleration (requires CUDA setup)
- Upgrade server hardware (more CPU/RAM)
- Reduce audio file length/quality
Translation Issues
Translation fails:
- Verify the DeepL API key (DEEPL_API_KEY in .env) is valid
- Check DeepL API usage: https://www.deepl.com/pro-account
- Review API response for specific error messages
- Ensure internet connectivity for DeepL API
Permission Issues
403 Forbidden errors:
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 750 /var/www/voice2text/outputs
Can't write to outputs directory:
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 750 /var/www/voice2text/outputs
Performance Issues
Out of memory:
- Use a smaller Whisper model (tiny or base)
- Increase PHP memory limit in php.ini
- Increase system swap space
- Add more RAM to server
Timeout errors:
- Increase PHP max_execution_time in php.ini
- Increase Apache timeout in virtual host config
- Process smaller audio files
- Use a faster (smaller) Whisper model
Debugging Tips
Enable debug mode:
Add to config.php:
error_reporting(E_ALL);
ini_set('display_errors', 1);
Check system resources:
# CPU and memory usage
htop
# Disk space
df -h
# Check running processes
ps aux | grep -E 'python|apache'
Test components individually:
- Test PHP: Create test.php with <?php phpinfo(); ?>
- Test Python API: curl http://localhost:5010/health
- Test file upload: Use small test file first
- Check browser console for JavaScript errors (F12)
License
MIT