# Voice to Text with Whisper & DeepL Translation

A web application that converts audio files to text using OpenAI's Whisper model and translates them using the DeepL API. Supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).

## Features

- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats

## Requirements

- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- Composer (for PHP dependencies)
- Microsoft Azure AD application (for SSO authentication)

## Installation

### 1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with PKCE flow.

**Step 1: Copy and configure environment file**

```bash
cp .env.example .env
```

**Step 2: Edit `.env` file with your Azure AD credentials:**

```env
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
```

**Step 3: Install PHP dependencies**

```bash
composer install
```

### 2. Install FFmpeg

**macOS:**

```bash
brew install ffmpeg
```

**Linux (Ubuntu/Debian):**

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows:**

Download from https://ffmpeg.org/download.html

### 3. Setup Python Environment

Run the setup script:

```bash
chmod +x setup.sh
./setup.sh
```

This will:

- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory

### 4. Start the API Server

```bash
chmod +x start_api.sh
./start_api.sh
```

Or manually:

```bash
source venv/bin/activate
python api.py
```

The API will run on http://localhost:5010

### 5. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

## Usage

1. Start the Python API server (see step 4 above)
2. Open the web application in your browser (you'll see a login page)
3. Click "Sign in with Microsoft" and authenticate with your Microsoft account
4. After authentication, you'll be redirected to the main application
5. Select output format (Text/VTT/SRT)
6. (Optional) Enable translation and select target language
7. Upload an audio file (max 350MB)
8. Wait for processing
9. Download original and/or translated transcription
10. Your files are associated with your session and only accessible to you

### Translation

The app uses DeepL API for high-quality translations. When translation is enabled:

- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more

**Note:** PHP settings are configured via `.htaccess` for 350MB uploads. To allow larger files, raise these values in `php.ini`:

```
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
```

## API Endpoints

### POST /transcribe

Transcribe audio file to text/VTT/SRT

**Parameters:**

- `audio` (file): Audio file to transcribe
- `format` (string): Output format (txt/vtt/srt)

**Response:**

```json
{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
```

### GET /health

Health check endpoint

### GET /download/

Download transcribed file

## Whisper Models

The default model is `base`, which provides a good balance of speed and accuracy.
Available models:

- `tiny` - Fastest, least accurate
- `base` - Good balance (default)
- `small` - Better accuracy, slower
- `medium` - High accuracy, much slower
- `large` - Best accuracy, very slow

To change the model, edit `api.py` line 24:

```python
model = whisper.load_model("base")  # Change to desired model
```

## Authentication & Security

### Microsoft Azure AD SSO

- Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
- Secure authentication without client secrets
- Session-based file access control
- Users can only download files they've uploaded in their current session

### Session Management

- Secure session cookies (httponly, secure, samesite)
- Configurable session timeout (default: 8 hours)
- Session regeneration after login for security

### File Access Control

- Files are tracked per-user session in `$_SESSION['user_files']`
- Download attempts are validated against user's file list
- Unauthorized access attempts are logged and blocked

### Important Security Notes

- Ensure your `.env` file is never committed to git (it's in `.gitignore`)
- Use HTTPS in production for secure cookie transmission
- Files become inaccessible after session expires (files remain in `outputs/` but can't be downloaded)
- Consider setting up a cron job to clean old files from `outputs/` directory

## File Structure

```
.
├── api.py              # Python Flask API with Whisper & DeepL
├── login.php           # Landing page with Microsoft SSO
├── auth.php            # OAuth2 PKCE authentication handler
├── logout.php          # Session destruction handler
├── index.php           # Main application interface (auth required)
├── process.php         # PHP request handler (auth required)
├── download.php        # File download handler (auth + ownership check)
├── check_api.php       # API status checker (auth required)
├── test_download.php   # Download functionality tester (auth required)
├── config.php          # Configuration loader
├── auth_config.php     # Authentication & environment config
├── style.css           # Black/gold theme styles
├── .env                # Environment variables (NOT in git)
├── .env.example        # Environment variables template
├── .htaccess           # PHP upload limits
├── .gitignore          # Git ignore rules
├── composer.json       # PHP dependencies
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── README.md           # This file
├── CLAUDE.md           # Claude Code guidance
├── outputs/            # Transcribed files directory
├── vendor/             # Composer dependencies (NOT in git)
└── venv/               # Python virtual environment
```

## Production Deployment (Apache)

### Prerequisites

- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Root/sudo access for system configuration

### Step 1: Install Required Apache Modules

**Ubuntu/Debian:**

```bash
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
```

**CentOS/RHEL:**

```bash
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
```

### Step 2: Deploy Application Files

```bash
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 777 /var/www/voice2text/outputs
```

### Step 3: Configure Apache Virtual Host

Create `/etc/apache2/sites-available/voice2text.conf`:

```apache
<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com
    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files (example pattern covering .env; adjust as needed)
    <FilesMatch "^\.env">
        Require all denied
    </FilesMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
```

Enable the site:

```bash
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
```

### Step 4: Configure PHP for Large Uploads

Edit `/etc/php/7.4/apache2/php.ini` (adjust version as needed):

```ini
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
```

Restart Apache:

```bash
sudo systemctl restart apache2
```

### Step 5: Setup Python API as Systemd Service

Create `/etc/systemd/system/voice2text-api.service`:

```ini
[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
# Keep the system directories on PATH so the service can still find ffmpeg
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging
StandardOutput=append:/var/log/voice2text-api.log
StandardError=append:/var/log/voice2text-api-error.log

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

# View logs
sudo journalctl -u voice2text-api -f
```

### Step 6: Configure Firewall

**UFW (Ubuntu):**

```bash
sudo ufw allow 'Apache Full'
sudo ufw allow 5010/tcp  # Python API (only required for access from other hosts)
sudo ufw enable
```

**Firewalld (CentOS):**

```bash
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --permanent --add-port=5010/tcp
sudo firewall-cmd --reload
```

### Step 7: SSL Configuration (Optional but Recommended)

Using Let's Encrypt with Certbot:

```bash
# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run
```

### Step 8: Verify Deployment

1. Check Apache status: `sudo systemctl status apache2`
2. Check API status: `sudo systemctl status voice2text-api`
3. Visit: `http://voice2text.yourdomain.com/check_api.php`
4. Test file upload with a small audio file

## Monitoring and Maintenance

### Check API Status

```bash
# View API logs
sudo journalctl -u voice2text-api -n 100

# Check if API is responding
curl http://localhost:5010/health
```

### Check Apache Logs

```bash
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
```

### Restart Services

```bash
# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api
```

### Clean Old Files

The `outputs/` directory can grow large. Set up a cron job to clean old files:

```bash
# Edit crontab
sudo crontab -e

# Add this line to delete files older than 24 hours daily at 2 AM
# (-mmin +1440 matches files modified more than 24 hours ago;
#  -mtime +1 would only match files older than 48 hours)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
```

## Troubleshooting

### API Issues

**API not connecting:**

1. Check if API is running: `sudo systemctl status voice2text-api`
2. Test health endpoint: `curl http://localhost:5010/health`
3. Check API logs: `sudo journalctl -u voice2text-api -n 50`
4. Verify firewall allows port 5010
5. Visit `check_api.php` in browser for detailed status

**API won't start:**

1. Check Python version: `python3 --version` (must be 3.8+)
2. Verify virtual environment: `ls -la venv/`
3. Check dependencies: `source venv/bin/activate && pip list`
4. Review error logs: `sudo journalctl -u voice2text-api -xe`
5. Ensure FFmpeg is installed: `which ffmpeg`

### Upload Issues

**File upload fails:**

1. Check file size limits in `php.ini`
2. Verify `.htaccess` is being read (requires `AllowOverride All`)
3. Check disk space: `df -h`
4. Verify `outputs/` directory permissions: `ls -ld outputs/`
5. Check Apache error log: `tail -f /var/log/apache2/error.log`

**"413 Request Entity Too Large":**

- If using Nginx as a reverse proxy, add to the Nginx config:

  ```nginx
  client_max_body_size 350M;
  ```

### Transcription Issues

**Transcription fails:**

1. Verify FFmpeg is installed: `ffmpeg -version`
2. Check audio file format (supported: mp3, wav, m4a, etc.)
3. Review API logs for specific errors
4. Test with a small file first
5. Ensure enough disk space in `/tmp`

**Slow transcription:**

1. Use a smaller Whisper model (`tiny` or `base`)
2. Consider using GPU acceleration (requires CUDA setup)
3. Upgrade server hardware (more CPU/RAM)
4. Reduce audio file length/quality

### Translation Issues

**Translation fails:**

1. Verify that the DeepL API key is valid (`DEEPL_API_KEY` in `.env`)
2. Check DeepL API usage: https://www.deepl.com/pro-account
3. Review API response for specific error messages
4. Ensure internet connectivity for DeepL API

### Permission Issues

**403 Forbidden errors:**

```bash
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 777 /var/www/voice2text/outputs
```

**Can't write to outputs directory:**

```bash
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 777 /var/www/voice2text/outputs
```

### Performance Issues

**Out of memory:**

1. Use a smaller Whisper model (`tiny` or `base`)
2. Increase PHP memory limit in `php.ini`
3. Increase system swap space
4. Add more RAM to server

**Timeout errors:**

1. Increase PHP `max_execution_time` in `php.ini`
2. Increase Apache timeout in virtual host config
3. Process smaller audio files
4. Use a faster Whisper model

### Debugging Tips

**Enable debug mode:**

Add to `config.php`:

```php
error_reporting(E_ALL);
ini_set('display_errors', 1);
```

**Check system resources:**

```bash
# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'
```

**Test components individually:**

1. Test PHP: Create a minimal `test.php` (for example, one that calls `phpinfo()`)
2. Test Python API: `curl http://localhost:5010/health`
3. Test file upload: Use small test file first
4. Check browser console for JavaScript errors (F12)

## License

MIT
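
## Example: Calling the API from Python

The `POST /transcribe` endpoint described under API Endpoints can also be exercised without the PHP frontend, which may help when debugging. The following is a minimal sketch using only the Python standard library. It assumes the Flask API accepts direct, unauthenticated multipart requests on `http://localhost:5010` and returns the JSON shape documented above; `encode_multipart` and `transcribe` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # default from this README; adjust if yours differs


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body. Returns (body_bytes, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, e.g. {"format": "srt"}
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The uploaded file itself
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, output_format="txt", api_url=API_URL):
    """POST an audio file to /transcribe and return the decoded JSON response.

    Assumption: the field names `audio` and `format` match the parameters
    listed in this README.
    """
    path = Path(audio_path)
    body, content_type = encode_multipart(
        {"format": output_format}, "audio", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{api_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API server running, `transcribe("sample.mp3", "srt")` should return a dictionary with the `success`, `text`, `filename`, and `format` keys, where `sample.mp3` stands in for any local audio file.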