# Voice to Text with Whisper & DeepL Translation

A web application that converts audio files to text using OpenAI's Whisper model and translates the transcripts using the DeepL API. Supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).

## Features

- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats

## Requirements

- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- Composer (for PHP dependencies)
- Microsoft Azure AD application (for SSO authentication)

## Installation

### 1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with the OAuth2 PKCE flow.

**Step 1: Copy and configure environment file**

```bash
cp .env.example .env
```

**Step 2: Edit `.env` with your Azure AD credentials and DeepL API key:**

```env
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
```

`AZURE_REDIRECT_URI` must exactly match a redirect URI registered for your Azure AD application, and `SESSION_TIMEOUT` is in seconds (28800 = 8 hours).

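A quick way to catch typos in `.env` before the first login attempt is a small check script. The sketch below is hypothetical (it is not shipped with the project): it assumes plain `KEY=value` lines without multi-line values, and checks only the keys listed above plus a few format rules (Azure authority host, HTTPS redirect URI, numeric timeout).

```python
from __future__ import annotations

# Hypothetical helper: key names come from the .env example above
REQUIRED_KEYS = (
    "AZURE_CLIENT_ID",
    "AZURE_AUTHORITY",
    "AZURE_REDIRECT_URI",
    "DEEPL_API_KEY",
    "PYTHON_API_URL",
    "SESSION_TIMEOUT",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    authority = values.get("AZURE_AUTHORITY", "")
    if authority and not authority.startswith("https://login.microsoftonline.com/"):
        problems.append("AZURE_AUTHORITY should be https://login.microsoftonline.com/<tenant>")
    redirect = values.get("AZURE_REDIRECT_URI", "")
    if redirect and not redirect.startswith("https://") and "localhost" not in redirect:
        problems.append("AZURE_REDIRECT_URI must use https (except localhost)")
    timeout = values.get("SESSION_TIMEOUT", "")
    if timeout and not timeout.isdigit():
        problems.append("SESSION_TIMEOUT must be a whole number of seconds")
    return problems
```

For example, `print(check_env(parse_env(Path(".env").read_text())))` (with `from pathlib import Path`) prints an empty list when the file passes these checks.
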
**Step 3: Install PHP dependencies**

```bash
composer install
```

### 2. Install FFmpeg

**macOS:**

```bash
brew install ffmpeg
```

**Linux (Ubuntu/Debian):**

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows:**

Download from https://ffmpeg.org/download.html and add its `bin` folder to your `PATH`.

### 3. Setup Python Environment

Run the setup script:

```bash
chmod +x setup.sh
./setup.sh
```

This will:

- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory

### 4. Start the API Server

```bash
chmod +x start_api.sh
./start_api.sh
```

Or manually:

```bash
source venv/bin/activate
python api.py
```

The API will run on http://localhost:5010 (the address the PHP frontend reads from `PYTHON_API_URL`).

### 5. Configure Web Server

Point your MAMP/Apache server at this directory, make sure PHP is enabled, and make sure the resulting URL matches `AZURE_REDIRECT_URI`.

## Usage

1. Start the Python API server (see step 4 above)
2. Open the web application in your browser (you'll see a login page)
3. Click "Sign in with Microsoft" and authenticate with your Microsoft account
4. After authentication, you'll be redirected to the main application
5. Select the output format (Text/VTT/SRT)
6. (Optional) Enable translation and select a target language
7. Upload an audio file (max 350MB)
8. Wait for processing
9. Download the original and/or translated transcription

Your files are associated with your session and are only accessible to you.

### Translation

The app uses the DeepL API for high-quality translations. When translation is enabled:

- The audio is first transcribed in its original language
- The transcription is then translated into your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more

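For VTT and SRT output, translation has to leave the cue timestamps untouched. One way to do that, sketched below under the assumption that transcription yields Whisper-style segments (dicts with `start`, `end` and `text`), is to translate only the text of each segment in batches and zip the results back onto the original timings. `translate_batch` is a hypothetical stand-in for the DeepL call, and the batch size of 50 reflects DeepL's documented per-request limit on texts at the time of writing.

```python
from __future__ import annotations

from typing import Callable, Sequence

Segment = dict  # Whisper-style: {"start": float, "end": float, "text": str}


def translate_segments(
    segments: Sequence[Segment],
    translate_batch: Callable[[list[str]], list[str]],
    batch_size: int = 50,
) -> list[Segment]:
    """Return new segments with translated text and the original timings."""
    texts = [seg["text"].strip() for seg in segments]
    translated: list[str] = []
    # Send the texts in chunks so each request stays within the API limit
    for i in range(0, len(texts), batch_size):
        chunk = texts[i : i + batch_size]
        result = translate_batch(chunk)
        if len(result) != len(chunk):
            raise ValueError("translator returned a different number of texts")
        translated.extend(result)
    # Copy each segment so the original list is left unchanged
    return [{**seg, "text": text} for seg, text in zip(segments, translated)]
```

With the official `deepl` Python package, `translate_batch` could be `lambda texts: [r.text for r in translator.translate_text(texts, target_lang="DE")]`, where `translator = deepl.Translator(auth_key)`.
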
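**Note:** PHP upload limits are set to 350MB in `.htaccess`. If your server ignores `.htaccess` PHP overrides (for example with PHP-FPM or `AllowOverride None`), or you need larger files, set the limits in `php.ini` instead and raise the values as needed:

```ini
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
```

`post_max_size` must be at least as large as `upload_max_filesize` (PHP's manual recommends making it larger), because the whole request body, file plus form fields, counts against it.
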
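PHP reads sizes such as `350M` as shorthand byte counts (`K`, `M`, `G` are powers of 1024). The hypothetical checker below, which is not part of the project, converts those values and flags the two limits that most often reject large uploads; unset keys fall back to PHP's stock defaults (`2M` and `8M`).

```python
from __future__ import annotations

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def php_size_to_bytes(value: str) -> int:
    """Convert PHP shorthand such as '350M' or '1G' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def check_upload_limits(ini_text: str, wanted_bytes: int) -> list[str]:
    """Flag php.ini limits that would reject an upload of wanted_bytes."""
    settings = {}
    for line in ini_text.splitlines():
        # Skip ';' comments and lines without a key = value pair
        if "=" in line and not line.lstrip().startswith(";"):
            key, _, val = line.partition("=")
            settings[key.strip()] = val.strip()
    upload = php_size_to_bytes(settings.get("upload_max_filesize", "2M"))
    post = php_size_to_bytes(settings.get("post_max_size", "8M"))
    problems = []
    if upload < wanted_bytes:
        problems.append("upload_max_filesize is below the target size")
    if post < upload:
        problems.append("post_max_size is smaller than upload_max_filesize")
    return problems
```

Running it on the snippet above with `350 * 1024**2` as the target returns an empty list.
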
## API Endpoints

### POST /transcribe

Transcribe audio file to text/VTT/SRT

**Parameters:**

- `audio` (file): Audio file to transcribe
- `format` (string): Output format (`txt`/`vtt`/`srt`)

**Response:**

```json
{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
```

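To exercise the API without the PHP frontend, a request can be sent to it directly. The sketch below uses only the standard library to build a `multipart/form-data` body with the `audio` and `format` fields documented above; it assumes the API is reachable at `http://localhost:5010` and answers with the JSON shown. The helper names are hypothetical, and the whole file is read into memory, so it is meant for small test clips.

```python
from __future__ import annotations

import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

FORMATS = {"txt", "vtt", "srt"}


def build_multipart(fields: dict[str, str], file_field: str, path: Path) -> tuple[bytes, str]:
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{path.name}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, fmt: str = "txt", api_url: str = "http://localhost:5010") -> dict:
    """POST an audio file to /transcribe and return the decoded JSON."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    body, content_type = build_multipart({"format": fmt}, "audio", Path(path))
    request = urllib.request.Request(
        f"{api_url}/transcribe", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Transcription of long files can take minutes, so allow a generous timeout
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `transcribe("sample.mp3", fmt="srt")["filename"]` would give the name to pass to `/download/`.
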
### GET /health

Health check endpoint

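The health endpoint is handy for scripts that need to wait until the API is up, for example right after `systemctl restart`. A minimal polling sketch, assuming `/health` answers HTTP 200 when the API is ready; the response body is not inspected because its format isn't documented here, and the function names are hypothetical.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request


def api_is_healthy(api_url: str = "http://localhost:5010", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{api_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


def wait_for_api(api_url: str = "http://localhost:5010", attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll /health until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if api_is_healthy(api_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```
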
### GET /download/`<filename>`

Download transcribed file

## Whisper Models

The default model is `base`, which provides a good balance of speed and accuracy.

Available models:

- `tiny` - Fastest, least accurate
- `base` - Good balance (default)
- `small` - Better accuracy, slower
- `medium` - High accuracy, much slower
- `large` - Best accuracy, very slow

To change the model, edit the `whisper.load_model()` call in `api.py` (line 24):

```python
model = whisper.load_model("base")  # Change to desired model
```

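The VTT and SRT outputs differ from plain text mainly in their cue timestamps. As a sketch of how Whisper's output maps onto them, assuming the `segments` list returned by `model.transcribe()` (each item carries `start`, `end` and `text`); this is an illustration, not necessarily how `api.py` writes its files. SRT numbers its cues and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a dot.

```python
from __future__ import annotations


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: list[dict]) -> str:
    """WebVTT file: 'WEBVTT' header, then unnumbered cues."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

For example, `to_srt(model.transcribe("audio.mp3")["segments"])` returns the SRT text for a file.
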
## Authentication & Security

### Microsoft Azure AD SSO

- Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
- Secure authentication without client secrets
- Session-based file access control
- Users can only download files they've uploaded in their current session

### Session Management

- Secure session cookies (`httponly`, `secure`, `samesite`)
- Configurable session timeout (default: 8 hours)
- Session regeneration after login for security

### File Access Control

- Files are tracked per-user session in `$_SESSION['user_files']`
- Download attempts are validated against the user's file list
- Unauthorized access attempts are logged and blocked

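The download check reduces to two rules: accept only a bare filename, and only if it appears in the current session's file list. The app implements this in PHP (`download.php` with `$_SESSION['user_files']`); the Python sketch below mirrors the idea with a hypothetical `user_files` set and is not the project's code.

```python
from __future__ import annotations

from pathlib import Path


def resolve_download(requested: str, user_files: set[str], outputs_dir: Path) -> Path | None:
    """Return the file path if this session owns it, else None."""
    name = Path(requested).name
    # Reject anything that is not a plain filename (e.g. '../.env' or 'a/b.txt')
    if not name or name != requested or name in {".", ".."}:
        return None
    if name not in user_files:
        return None  # not uploaded in this session: refuse (and log it)
    candidate = (outputs_dir / name).resolve()
    # Refuse symlinks that escape outputs/ and files that no longer exist
    if candidate.parent != outputs_dir.resolve() or not candidate.is_file():
        return None
    return candidate
```
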
### Important Security Notes

- Ensure your `.env` file is never committed to git (it's in `.gitignore`) and is never served by the web server
- Use HTTPS in production: session cookies are marked `secure`, so browsers won't send them over plain HTTP
- Files become inaccessible after the session expires (they remain in `outputs/` but can't be downloaded)
- Consider setting up a cron job to clean old files from the `outputs/` directory

## File Structure

```
.
├── api.py              # Python Flask API with Whisper & DeepL
├── login.php           # Landing page with Microsoft SSO
├── auth.php            # OAuth2 PKCE authentication handler
├── logout.php          # Session destruction handler
├── index.php           # Main application interface (auth required)
├── process.php         # PHP request handler (auth required)
├── download.php        # File download handler (auth + ownership check)
├── check_api.php       # API status checker (auth required)
├── test_download.php   # Download functionality tester (auth required)
├── config.php          # Configuration loader
├── auth_config.php     # Authentication & environment config
├── style.css           # Black/gold theme styles
├── .env                # Environment variables (NOT in git)
├── .env.example        # Environment variables template
├── .htaccess           # PHP upload limits
├── .gitignore          # Git ignore rules
├── composer.json       # PHP dependencies
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── README.md           # This file
├── CLAUDE.md           # Claude Code guidance
├── outputs/            # Transcribed files directory
├── vendor/             # Composer dependencies (NOT in git)
└── venv/               # Python virtual environment
```

## Production Deployment (Apache)

### Prerequisites

- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Composer (for PHP dependencies)
- Root/sudo access for system configuration

### Step 1: Install Required Apache Modules

**Ubuntu/Debian:**

```bash
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
```

**CentOS/RHEL:**

```bash
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
```

### Step 2: Deploy Application Files

```bash
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

# Configure environment variables and PHP dependencies (see Installation step 1)
sudo cp .env.example .env
sudo nano .env
sudo composer install --no-dev

# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs   # Apache and the API both run as www-data
sudo chmod 640 /var/www/voice2text/.env      # secrets must not be world-readable
```

### Step 3: Configure Apache Virtual Host

Create `/etc/apache2/sites-available/voice2text.conf`:

```apache
<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com

    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads (mod_php only; with PHP-FPM set these in php.ini)
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files (.env holds the Azure AD and DeepL credentials)
    <FilesMatch "^(\.env.*|\.git.*|\.htaccess|config\.php|auth_config\.php|composer\.(json|lock))$">
        Require all denied
    </FilesMatch>

    # Block internal directories that live inside the document root
    <DirectoryMatch "^/var/www/voice2text/(\.git|vendor|venv)/">
        Require all denied
    </DirectoryMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
```

Enable the site:

```bash
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
```

### Step 4: Configure PHP for Large Uploads

Edit `/etc/php/7.4/apache2/php.ini` (adjust the version as needed; with PHP-FPM, edit `/etc/php/<version>/fpm/php.ini` instead and also restart the `php<version>-fpm` service):

```ini
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
```

Restart Apache:

```bash
sudo systemctl restart apache2
```

### Step 5: Setup Python API as Systemd Service

Create `/etc/systemd/system/voice2text-api.service`:

```ini
[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
# Keep the system directories on PATH so Whisper can find the ffmpeg binary
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging: send output to the journal so `journalctl -u voice2text-api` shows it
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

# View logs
sudo journalctl -u voice2text-api -f
```

### Step 6: Configure Firewall

Only the web ports need to be public. Leave port 5010 closed: PHP reaches the Python API on `localhost`, and exposing the API directly would bypass the Azure AD login and the per-session file checks.

**UFW (Ubuntu):**

```bash
sudo ufw allow OpenSSH        # keep SSH access before enabling the firewall
sudo ufw allow 'Apache Full'
sudo ufw enable
```

**Firewalld (CentOS):**

```bash
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
```

### Step 7: SSL Configuration (Required for SSO)

Azure AD only accepts HTTPS redirect URIs (except for `localhost`), and the session cookies are marked `secure`, so a production deployment needs HTTPS. Using Let's Encrypt with Certbot:

```bash
# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run
```

### Step 8: Verify Deployment

1. Check Apache status: `sudo systemctl status apache2`
2. Check API status: `sudo systemctl status voice2text-api`
3. Sign in, then visit: `https://voice2text.yourdomain.com/check_api.php`
4. Test file upload with a small audio file

## Monitoring and Maintenance

### Check API Status

```bash
# View API logs
sudo journalctl -u voice2text-api -n 100

# Check if API is responding
curl http://localhost:5010/health
```

### Check Apache Logs

```bash
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
```

### Restart Services

```bash
# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api
```

### Clean Old Files

The `outputs/` directory can grow large. Set up a cron job to clean old files:

```bash
# Edit crontab
sudo crontab -e

# Add this line to delete files older than 24 hours, daily at 2 AM
# (-mmin +1440 means 24 hours; -mtime +1 would only match files at least 48 hours old)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
```

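The same rule can be expressed as a small Python function, which makes the cutoff explicit in hours and can be scheduled the same way (cron or a systemd timer). This is a hypothetical script, not part of the repository; like the `find` command, it removes regular files directly inside `outputs/` whose modification time is older than the cutoff.

```python
from __future__ import annotations

import time
from pathlib import Path


def clean_outputs(outputs_dir: Path, max_age_hours: float = 24.0, now: float | None = None) -> list[Path]:
    """Delete regular files older than max_age_hours and return what was removed."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed = []
    for path in outputs_dir.iterdir():
        # Skip directories and symlinks; only plain output files are removed
        if path.is_file() and not path.is_symlink() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example: `clean_outputs(Path("/var/www/voice2text/outputs"))`.
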
## Troubleshooting

### API Issues

**API not connecting:**

1. Check if API is running: `sudo systemctl status voice2text-api`
2. Test health endpoint: `curl http://localhost:5010/health`
3. Check API logs: `sudo journalctl -u voice2text-api -n 50`
4. Verify `PYTHON_API_URL` in `.env` points to the running API (default `http://localhost:5010`); if the API runs on another host, allow port 5010 from the web server only
5. Visit `check_api.php` in browser for detailed status

**API won't start:**

1. Check Python version: `python3 --version` (must be 3.8+)
2. Verify virtual environment: `ls -la venv/`
3. Check dependencies: `source venv/bin/activate && pip list`
4. Review error logs: `sudo journalctl -u voice2text-api -xe`
5. Ensure FFmpeg is installed and on the service's `PATH`: `which ffmpeg`

### Upload Issues

**File upload fails:**

1. Check file size limits in `php.ini`
2. Verify `.htaccess` is being read (requires `AllowOverride All`)
3. Check disk space: `df -h`
4. Verify `outputs/` directory permissions: `ls -ld outputs/`
5. Check Apache error log: `sudo tail -f /var/log/apache2/voice2text-error.log`

**"413 Request Entity Too Large":**

- If using Nginx as a reverse proxy, add to the Nginx config:

```nginx
client_max_body_size 350M;
```

### Transcription Issues

**Transcription fails:**

1. Verify FFmpeg is installed: `ffmpeg -version`
2. Check audio file format (supported: mp3, wav, m4a, etc.)
3. Review API logs for specific errors
4. Test with a small file first
5. Ensure enough disk space in `/tmp`

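Several of these failures can be caught before a file ever reaches Whisper. A hypothetical pre-flight check (not part of the project) that looks for FFmpeg on `PATH`, the 350MB limit, and a known audio extension; the extension list is an assumption, since FFmpeg can decode many more formats.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional

MAX_BYTES = 350 * 1024 * 1024  # mirrors the 350MB upload limit
# Assumed list of common audio/video containers; extend as needed
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mp4"}


def preflight(path: Path, which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return problems that would likely make transcription fail."""
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not path.is_file():
        return problems + ["file does not exist"]
    if path.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 350MB")
    if path.suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append(f"unexpected extension {path.suffix or '(none)'}")
    return problems
```

Passing a custom `which` function keeps the check testable without FFmpeg installed.
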
**Slow transcription:**

1. Use a smaller Whisper model (`tiny` or `base`)
2. Consider using GPU acceleration (requires CUDA setup)
3. Upgrade server hardware (more CPU/RAM)
4. Reduce audio file length/quality

### Translation Issues

**Translation fails:**

1. Verify `DEEPL_API_KEY` in `.env` is valid
2. Check DeepL API usage: https://www.deepl.com/pro-account
3. Review API response for specific error messages
4. Ensure internet connectivity for DeepL API

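A common cause of `403 Forbidden` from DeepL is sending a Free-plan key (these end in `:fx`) to the Pro host, or the reverse. The sketch below builds, but does not send, a request to DeepL's v2 `/translate` endpoint with the `DeepL-Auth-Key` authorization header, choosing the host from the key suffix. It is a hypothetical debugging aid, not the code in `api.py`; the official `deepl` Python package makes the same host choice automatically.

```python
from __future__ import annotations

import json
import urllib.request


def deepl_base_url(auth_key: str) -> str:
    """Free-plan keys end in ':fx' and use a separate host."""
    return "https://api-free.deepl.com" if auth_key.endswith(":fx") else "https://api.deepl.com"


def build_translate_request(auth_key: str, texts: list[str], target_lang: str) -> urllib.request.Request:
    """Prepare a POST /v2/translate request; send it with urllib.request.urlopen()."""
    payload = {"text": texts, "target_lang": target_lang.upper()}
    return urllib.request.Request(
        f"{deepl_base_url(auth_key)}/v2/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"DeepL-Auth-Key {auth_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and reading the error body of any `HTTPError` usually shows DeepL's own explanation of the failure.
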
### Permission Issues

**403 Forbidden errors:**

```bash
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs
sudo chmod 640 /var/www/voice2text/.env
```

**Can't write to outputs directory:**

```bash
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 775 /var/www/voice2text/outputs
```

Avoid `chmod 777`: Apache and the API both run as `www-data`, which already owns the directory.

### Performance Issues

**Out of memory:**

1. Use a smaller Whisper model (`tiny` or `base`)
2. Increase PHP memory limit in `php.ini`
3. Increase system swap space
4. Add more RAM to server

**Timeout errors:**

1. Increase PHP `max_execution_time` in `php.ini`
2. Increase Apache's `Timeout` directive in the virtual host config
3. Process smaller audio files
4. Use a faster Whisper model

### Debugging Tips

**Enable debug mode (development only):**

Add to `config.php`:

```php
error_reporting(E_ALL);
ini_set('display_errors', '1'); // never leave this on in production
```

**Check system resources:**

```bash
# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'
```

**Test components individually:**

1. Test PHP: create `test.php` containing `<?php phpinfo(); ?>` (delete it afterwards)
2. Test Python API: `curl http://localhost:5010/health`
3. Test file upload: use a small test file first
4. Check the browser console for JavaScript errors (F12)

## License

MIT