# Voice to Text with Whisper & DeepL Translation
A web application that converts audio files to text using OpenAI's Whisper model and translates the transcripts with the DeepL API. Supports three output formats: plain text, WebVTT (VTT), and SubRip (SRT).
## Features
- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats
## Requirements
- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- Composer (for PHP dependencies)
- Microsoft Azure AD application (for SSO authentication)
## Installation
### 1. Configure Authentication
This application uses Microsoft Azure AD for Single Sign-On (SSO), via the OAuth 2.0 authorization code flow with PKCE.
**Step 1: Copy and configure environment file**
```bash
cp .env.example .env
```
**Step 2: Edit `.env` file with your Azure AD credentials:**
```env
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
```
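
Before starting anything, it can help to catch placeholder values left over from `.env.example`. The sketch below is a hypothetical pre-flight helper (`parse_env`, `find_problems` and `check_env_file` are not part of this repository). It only understands simple `KEY=value` lines like the ones above and assumes that `your_` or `yourdomain` in a value means the placeholder was never replaced.

```python
from pathlib import Path
from typing import Dict, List

# Keys taken from the .env example above.
REQUIRED_KEYS = (
    "AZURE_CLIENT_ID",
    "AZURE_AUTHORITY",
    "AZURE_REDIRECT_URI",
    "DEEPL_API_KEY",
    "PYTHON_API_URL",
    "SESSION_TIMEOUT",
)
# Assumption: these substrings only occur in unedited template values.
PLACEHOLDER_MARKERS = ("your_", "yourdomain")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def find_problems(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif any(marker in value for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key} still contains a placeholder")
    timeout = values.get("SESSION_TIMEOUT", "")
    if timeout and not timeout.isdigit():
        problems.append("SESSION_TIMEOUT must be a whole number of seconds")
    return problems


def check_env_file(path: str = ".env") -> List[str]:
    """Read an .env file from disk and report its problems."""
    return find_problems(parse_env(Path(path).read_text(encoding="utf-8")))
```

Calling `check_env_file(".env")` from the project directory returns a list of issues; treat a non-empty list as a blocker before starting the services.
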
**Step 3: Install PHP dependencies**
```bash
composer install
```
### 2. Install FFmpeg
**macOS:**
```bash
brew install ffmpeg
```
**Linux (Ubuntu/Debian):**
```bash
sudo apt update
sudo apt install ffmpeg
```
**Windows:**
Download from https://ffmpeg.org/download.html
### 3. Setup Python Environment
Run the setup script:
```bash
chmod +x setup.sh
./setup.sh
```
This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory
### 4. Start the API Server
```bash
chmod +x start_api.sh
./start_api.sh
```
Or manually:
```bash
source venv/bin/activate
python api.py
```
The API will run on http://localhost:5010
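
To confirm from a script that the API is up, call the `/health` endpoint. This sketch assumes only that a healthy API answers `GET /health` with HTTP 200; the response body is not documented here, so it is ignored. `api_is_healthy` is a hypothetical helper name.

```python
import urllib.request


def api_is_healthy(base_url: str = "http://localhost:5010", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (hypothetical helper)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts are all OSError subclasses.
        return False
```

A `False` result covers both "not running" and "running but unhealthy"; the troubleshooting section below lists how to tell them apart.
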
### 5. Configure Web Server
Ensure your MAMP/Apache server points to this directory and PHP is enabled.
## Usage
1. Start the Python API server (see step 4 above)
2. Open the web application in your browser (you'll see a login page)
3. Click "Sign in with Microsoft" and authenticate with your Microsoft account
4. After authentication, you'll be redirected to the main application
5. Select output format (Text/VTT/SRT)
6. (Optional) Enable translation and select target language
7. Upload an audio file (max 350MB)
8. Wait for processing
9. Download original and/or translated transcription

Your files are tied to your session and are accessible only to you.
### Translation
The app uses the DeepL API for high-quality translations. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
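
For reference, a single DeepL translation call looks roughly like the sketch below. This README does not document how `api.py` talks to DeepL (it may use the official `deepl` Python package instead), so treat this as an illustration of the public DeepL v2 REST API rather than the app's code. `build_translate_request` is a hypothetical helper; it builds the request without sending it.

```python
import urllib.parse
import urllib.request


def build_translate_request(api_key: str, text: str, target_lang: str) -> urllib.request.Request:
    """Build (but do not send) a DeepL v2 /translate request (hypothetical helper)."""
    # DeepL Free keys end in ":fx" and are served from a separate host.
    host = "api-free.deepl.com" if api_key.endswith(":fx") else "api.deepl.com"
    body = urllib.parse.urlencode(
        {"text": text, "target_lang": target_lang.upper()}  # e.g. "DE", "EN-US"
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}/v2/translate",
        data=body,
        headers={
            "Authorization": f"DeepL-Auth-Key {api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending it with `json.load(urllib.request.urlopen(req))` yields a payload whose `["translations"][0]["text"]` holds the translated string.
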
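**Note:** The 350MB upload limits are set in `.htaccess`, which only takes effect under mod_php with `AllowOverride` enabled. Otherwise (for example with PHP-FPM), or if you need larger files, set these values in `php.ini`:
```ini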
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
```
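
PHP rejects an upload that exceeds either limit, and its documentation recommends making `post_max_size` larger than `upload_max_filesize`, because the POST body also carries multipart framing and any other form fields. With both at `350M`, a file right at 350MB may therefore still be refused. The hypothetical helpers below convert php.ini shorthand (K, M and G are multiples of 1024) to bytes and flag that case.

```python
from typing import List

# php.ini shorthand multipliers (case-insensitive).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert php.ini shorthand such as '350M' to a byte count."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)  # plain numbers are already bytes


def check_upload_limits(upload_max_filesize: str, post_max_size: str) -> List[str]:
    """Return warnings about the two upload-related settings (hypothetical helper)."""
    warnings: List[str] = []
    if php_size_to_bytes(post_max_size) <= php_size_to_bytes(upload_max_filesize):
        warnings.append("post_max_size should be larger than upload_max_filesize")
    return warnings
```

For example, `350M` works out to 367,001,600 bytes.
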
## API Endpoints
### POST /transcribe
Transcribe audio file to text/VTT/SRT
**Parameters:**
- `audio` (file): Audio file to transcribe
- `format` (string): Output format (txt/vtt/srt)
**Response:**
```json
{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
```
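
The `/transcribe` endpoint takes a multipart form upload. A minimal standard-library client might look like this, assuming only the two documented fields (`audio` and `format`); translation options are not documented here, so they are left out. `build_multipart` and `transcribe` are hypothetical names. Calling the API directly skips the SSO login handled by the PHP frontend, so this is intended for testing on the server itself.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(fields: dict, file_field: str, file_path: Path) -> Tuple[bytes, str]:
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_path.name}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
    )
    parts.append(file_path.read_bytes())  # whole file in memory: fine for tests
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, fmt: str = "txt", base_url: str = "http://localhost:5010") -> dict:
    """POST an audio file to /transcribe and return the parsed JSON response."""
    body, content_type = build_multipart({"format": fmt}, "audio", Path(audio_path))
    request = urllib.request.Request(
        base_url.rstrip("/") + "/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Long recordings can take many minutes; match the generous PHP time limit.
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.load(response)
```

`transcribe("meeting.mp3", "srt")["filename"]` (with a hypothetical `meeting.mp3`) gives the name to pass to `/download/<filename>`. Because the whole file is read into memory, this is a test client, not something to use near the 350MB limit.
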
### GET /health
Health check endpoint
### GET /download/<filename>
Download transcribed file
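
Fetching a result only needs the file name returned by `/transcribe`. In this sketch (hypothetical helpers `download_url` and `download`), the name is percent-encoded for the URL, and only its last path component is used locally so a crafted name cannot write outside the target directory.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download_url(filename: str, base_url: str = "http://localhost:5010") -> str:
    """Build the /download/<filename> URL, percent-encoding the whole name."""
    return f"{base_url.rstrip('/')}/download/{urllib.parse.quote(filename, safe='')}"


def download(filename: str, dest_dir: str = ".", base_url: str = "http://localhost:5010") -> Path:
    """Save a transcribed file into dest_dir and return its local path."""
    # Keep only the final path component (e.g. "../x.txt" becomes "x.txt").
    target = Path(dest_dir) / Path(filename).name
    with urllib.request.urlopen(download_url(filename, base_url), timeout=60) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream instead of loading into memory
    return target
```
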
## Whisper Models
The default model is `base` which provides a good balance of speed and accuracy.
Available models:
- `tiny` - Fastest, least accurate
- `base` - Good balance (default)
- `small` - Better accuracy, slower
- `medium` - High accuracy, much slower
- `large` - Best accuracy, very slow
To change the model, edit the `whisper.load_model(...)` call in `api.py` (line 24 at the time of writing):
```python
model = whisper.load_model("base") # Change to desired model
```
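
Instead of editing the source each time, one option is to read the model name from an environment variable. `WHISPER_MODEL` and `choose_model_name` are hypothetical; the current `api.py` hard-codes `"base"`. Whisper also offers other checkpoints (for example the English-only `.en` variants); add them to `KNOWN_MODELS` if you use them.

```python
import os

# The five model sizes listed above (hypothetical allow-list).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def choose_model_name(env=None, default: str = "base") -> str:
    """Pick the Whisper model from the hypothetical WHISPER_MODEL variable."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(
            f"Unknown Whisper model {name!r}; expected one of {', '.join(KNOWN_MODELS)}"
        )
    return name


# In api.py this would replace the hard-coded call:
#   import whisper
#   model = whisper.load_model(choose_model_name())
```

With the systemd unit from the production section, the variable could then be set through an additional `Environment=WHISPER_MODEL=small` line.
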
## Authentication & Security
### Microsoft Azure AD SSO
- Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
- Secure authentication without client secrets
- Session-based file access control
- Users can only download files they've uploaded in their current session
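
PKCE replaces the client secret with a one-time pair: the client sends `code_challenge` (with `code_challenge_method=S256`) in the authorization request, then proves possession by sending the matching `code_verifier` when it redeems the authorization code. `auth.php` handles this in PHP; the Python sketch below only illustrates the S256 computation from RFC 7636, and its function names are hypothetical.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def challenge_for(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 32 random bytes encode to a 43-character base64url string,
    # the minimum verifier length RFC 7636 allows (43 to 128 characters).
    verifier = secrets.token_urlsafe(32)
    return verifier, challenge_for(verifier)
```

The verifier must stay server-side (for example in the session) between the redirect to Microsoft and the callback.
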
### Session Management
- Secure session cookies (httponly, secure, samesite)
- Configurable session timeout (default: 8 hours)
- Session regeneration after login for security
### File Access Control
- Files are tracked per-user session in `$_SESSION['user_files']`
- Download attempts are validated against user's file list
- Unauthorized access attempts are logged and blocked
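
The ownership check described above amounts to a membership test against the session's file list. This Python sketch mimics the PHP behaviour with a plain dict standing in for `$_SESSION`; the helper names are hypothetical, and the rejection of path-like names is an extra safeguard assumed here rather than documented behaviour of `download.php`.

```python
import logging
from pathlib import PurePosixPath

log = logging.getLogger("voice2text.download")


def record_upload(session: dict, filename: str) -> None:
    """Remember a generated file under the session's 'user_files' list."""
    session.setdefault("user_files", []).append(filename)


def may_download(session: dict, requested: str) -> bool:
    """Allow a download only for bare file names this session produced."""
    # Assumed extra guard: reject anything that is not a bare name, e.g. "../config.php".
    if PurePosixPath(requested).name != requested:
        log.warning("Blocked path-like download request: %r", requested)
        return False
    allowed = requested in session.get("user_files", [])
    if not allowed:
        log.warning("Blocked download of %r: not in this session's file list", requested)
    return allowed
```
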
### Important Security Notes
- Ensure your `.env` file is never committed to git (it's in `.gitignore`)
- Use HTTPS in production for secure cookie transmission
- Files become inaccessible after session expires (files remain in `outputs/` but can't be downloaded)
- Consider setting up a cron job to clean old files from `outputs/` directory
## File Structure
```
.
├── api.py # Python Flask API with Whisper & DeepL
├── login.php # Landing page with Microsoft SSO
├── auth.php # OAuth2 PKCE authentication handler
├── logout.php # Session destruction handler
├── index.php # Main application interface (auth required)
├── process.php # PHP request handler (auth required)
├── download.php # File download handler (auth + ownership check)
├── check_api.php # API status checker (auth required)
├── test_download.php # Download functionality tester (auth required)
├── config.php # Configuration loader
├── auth_config.php # Authentication & environment config
├── style.css # Black/gold theme styles
├── .env # Environment variables (NOT in git)
├── .env.example # Environment variables template
├── .htaccess # PHP upload limits
├── .gitignore # Git ignore rules
├── composer.json # PHP dependencies
├── requirements.txt # Python dependencies
├── setup.sh # Setup script
├── start_api.sh # API start script
├── README.md # This file
├── CLAUDE.md # Claude Code guidance
├── outputs/ # Transcribed files directory
├── vendor/ # Composer dependencies (NOT in git)
└── venv/ # Python virtual environment
```
## Production Deployment (Apache)
### Prerequisites
- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Root/sudo access for system configuration
### Step 1: Install Required Apache Modules
**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
```
**CentOS/RHEL:**
```bash
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
```
### Step 2: Deploy Application Files
```bash
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text
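# Install PHP dependencies and create .env (see Installation, step 1)
sudo composer install --no-dev
sudo cp .env.example .env   # then edit it with your production values
# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh
# Set proper ownership and permissions
# (on CentOS/RHEL the Apache user and group are apache:apache)
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 640 /var/www/voice2text/.env   # credentials: readable by owner and group only
sudo chmod 775 /var/www/voice2text/outputs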
```
### Step 3: Configure Apache Virtual Host
Create `/etc/apache2/sites-available/voice2text.conf`:
```apache
<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com
    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads
        # (mod_php only; with PHP-FPM remove these and use php.ini, see Step 4)
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files (.env holds the Azure AD and DeepL credentials)
    <FilesMatch "^(\.env|config\.php|\.git|\.htaccess)">
        Require all denied
    </FilesMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
```
Enable the site:
```bash
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
```
### Step 4: Configure PHP for Large Uploads
Edit `/etc/php/7.4/apache2/php.ini` (adjust version as needed):
```ini
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
```
Restart Apache:
```bash
sudo systemctl restart apache2
```
### Step 5: Setup Python API as Systemd Service
Create `/etc/systemd/system/voice2text-api.service`:
```ini
[Unit]
Description=Voice to Text Whisper API
After=network.target
[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
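# Keep the system directories on PATH so Whisper can find the ffmpeg binary
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"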
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10
# Security settings
NoNewPrivileges=true
PrivateTmp=true
# Logging (sent to the journal so `journalctl -u voice2text-api` shows it)
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api
# Check status
sudo systemctl status voice2text-api
# View logs
sudo journalctl -u voice2text-api -f
```
### Step 6: Configure Firewall
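The PHP frontend reaches the Python API through `PYTHON_API_URL` (`http://localhost:5010` by default), and SSO is enforced by the PHP pages rather than by the API, so keep port 5010 closed to the outside. If the API runs on a separate host, allow 5010 only from the web server's address.

**UFW (Ubuntu):**
```bash
sudo ufw allow OpenSSH   # keep SSH access before enabling the firewall
sudo ufw allow 'Apache Full'
sudo ufw enable
```
**Firewalld (CentOS):**
```bash
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
```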
### Step 7: SSL Configuration (Optional but Recommended)
Using Let's Encrypt with Certbot:
```bash
# Install Certbot
sudo apt install certbot python3-certbot-apache
# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com
# The certbot package sets up automatic renewal
# Test renewal with:
sudo certbot renew --dry-run
```
### Step 8: Verify Deployment
1. Check Apache status: `sudo systemctl status apache2`
2. Check API status: `sudo systemctl status voice2text-api`
3. Visit: `http://voice2text.yourdomain.com/check_api.php`
4. Test file upload with a small audio file
## Monitoring and Maintenance
### Check API Status
```bash
# View API logs
sudo journalctl -u voice2text-api -n 100
# Check if API is responding
curl http://localhost:5010/health
```
### Check Apache Logs
```bash
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log
# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
```
### Restart Services
```bash
# Restart Apache
sudo systemctl restart apache2
# Restart Python API
sudo systemctl restart voice2text-api
# Restart both
sudo systemctl restart apache2 voice2text-api
```
### Clean Old Files
The `outputs/` directory can grow large. Set up a cron job to clean old files:
```bash
# Edit crontab
sudo crontab -e
# Add this line: every day at 2 AM, delete files last modified more than 24 hours (1440 minutes) ago
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
```
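
If you would rather run the cleanup as a script (for example from a systemd timer), this hypothetical `clean_outputs` helper does the same job as the cron line: it removes regular files whose modification time is older than the cutoff and returns what it deleted.

```python
import time
from pathlib import Path
from typing import List, Optional


def clean_outputs(outputs_dir: str, max_age_hours: float = 24, now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than max_age_hours; return the removed paths."""
    cutoff = (time.time() if now is None else now) - max_age_hours * 3600
    removed: List[Path] = []
    # Materialise the listing first so deleting does not disturb the directory walk.
    for path in list(Path(outputs_dir).rglob("*")):
        # Like `find -type f`: regular files only, symlinks are skipped.
        if path.is_file() and not path.is_symlink() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```
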
## Troubleshooting
### API Issues
**API not connecting:**
1. Check if API is running: `sudo systemctl status voice2text-api`
2. Test health endpoint: `curl http://localhost:5010/health`
3. Check API logs: `sudo journalctl -u voice2text-api -n 50`
4. If the API runs on a separate host, verify the firewall allows port 5010 from the web server
5. Visit `check_api.php` in browser for detailed status
**API won't start:**
1. Check Python version: `python3 --version` (must be 3.8+)
2. Verify virtual environment: `ls -la venv/`
3. Check dependencies: `source venv/bin/activate && pip list`
4. Review error logs: `sudo journalctl -u voice2text-api -xe`
5. Ensure FFmpeg is installed: `which ffmpeg`
### Upload Issues
**File upload fails:**
1. Check file size limits in `php.ini`
2. Verify `.htaccess` is being read (requires `AllowOverride All`)
3. Check disk space: `df -h`
4. Verify `outputs/` directory permissions: `ls -ld outputs/`
5. Check the Apache error log for this site: `sudo tail -f /var/log/apache2/voice2text-error.log`
**"413 Request Entity Too Large":**
If Nginx runs as a reverse proxy in front of Apache, raise its request body limit (in the `http`, `server`, or `location` block):
```nginx
client_max_body_size 350M;
```
### Transcription Issues
**Transcription fails:**
1. Verify FFmpeg is installed: `ffmpeg -version`
2. Check audio file format (supported: mp3, wav, m4a, etc.)
3. Review API logs for specific errors
4. Test with a small file first
5. Ensure enough disk space in `/tmp`
**Slow transcription:**
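1. Switch to the `tiny` model (the default is already `base`)
2. Consider using GPU acceleration (requires CUDA setup)
3. Upgrade server hardware (more CPU/RAM)
4. Shorten the audio: processing time grows with duration, while lowering quality helps little because Whisper resamples input to 16 kHz mono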
### Translation Issues
**Translation fails:**
1. Verify that the DeepL API key (`DEEPL_API_KEY` in `.env`) is valid
2. Check DeepL API usage: https://www.deepl.com/pro-account
3. Review API response for specific error messages
4. Ensure internet connectivity for DeepL API
### Permission Issues
**403 Forbidden errors:**
```bash
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs
```
**Can't write to outputs directory:**
```bash
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 775 /var/www/voice2text/outputs
```
### Performance Issues
**Out of memory:**
1. Use a smaller Whisper model (`tiny` is the smallest; the default is `base`)
2. Increase PHP memory limit in `php.ini`
3. Increase system swap space
4. Add more RAM to server
**Timeout errors:**
1. Increase PHP `max_execution_time` in `php.ini`
2. Increase Apache's `Timeout` directive in the virtual host config
3. Process smaller audio files
4. Use a faster (smaller) Whisper model
### Debugging Tips
**Enable debug mode:**
Add to `config.php` (and remove it afterwards, since `display_errors` shows error details to visitors):
```php
error_reporting(E_ALL);
ini_set('display_errors', 1);
```
**Check system resources:**
```bash
# CPU and memory usage
htop
# Disk space
df -h
# Check running processes
ps aux | grep -E 'python|apache'
```
**Test components individually:**
1. Test PHP: create `test.php` containing `<?php phpinfo(); ?>`, open it in the browser, then delete it (it exposes server configuration)
2. Test Python API: `curl http://localhost:5010/health`
3. Test file upload: Use small test file first
4. Check browser console for JavaScript errors (F12)
## License
MIT