# Voice to Text with Whisper & DeepL Translation

A web application that converts audio files to text using OpenAI's Whisper model and translates the transcripts with the DeepL API. It supports three output formats: plain text, WebVTT (VTT), and SubRip (SRT).

## Features

- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
- 🎨 Modern black/gold UI with dark theme
- 📊 Real-time progress bar during processing
- 👀 In-page preview of transcriptions
- ⬇️ One-click download for all formats

## Requirements

- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- Composer (for PHP dependencies)
- Microsoft Azure AD application (for SSO authentication)

## Installation

### 1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO) via the OAuth 2.0 authorization code flow with PKCE.

**Step 1: Copy and configure environment file**
```bash
cp .env.example .env
```

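**Step 2: Edit `.env` with your Azure AD credentials and service settings.** `AZURE_REDIRECT_URI` must exactly match a redirect URI registered for your app in Azure AD, `PYTHON_API_URL` is the address the PHP frontend uses to reach the Flask API, and `SESSION_TIMEOUT` is given in seconds (28800 = 8 hours):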
```env
AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
```
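A single unedited placeholder is enough to break sign-in or translation, so it can help to check `.env` before continuing. The snippet below is a hypothetical helper, not part of the repository (the PHP side loads `.env` itself); it assumes the plain `KEY=VALUE` layout shown above and treats any value containing `your_` as an untouched placeholder.

```python
from __future__ import annotations

# Keys taken from the .env example above (hypothetical checker, not shipped with the app).
REQUIRED_KEYS = (
    "AZURE_CLIENT_ID",
    "AZURE_AUTHORITY",
    "AZURE_REDIRECT_URI",
    "DEEPL_API_KEY",
    "PYTHON_API_URL",
    "SESSION_TIMEOUT",
)
PLACEHOLDER_MARKER = "your_"  # assumption: template values contain "your_"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_env(text: str) -> list[str]:
    """Return a list of problems; an empty list means every required key looks set."""
    values = parse_env(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif PLACEHOLDER_MARKER in value:
            problems.append(f"{key} still has a placeholder value")
    timeout = values.get("SESSION_TIMEOUT", "")
    if timeout and not timeout.isdigit():
        problems.append("SESSION_TIMEOUT must be a whole number of seconds")
    return problems


# Example, from the project root:
#   with open(".env") as fh:
#       for problem in check_env(fh.read()):
#           print("WARNING:", problem)
```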

**Step 3: Install PHP dependencies**
```bash
composer install
```

### 2. Install FFmpeg

**macOS:**
```bash
brew install ffmpeg
```

**Linux (Ubuntu/Debian):**
```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows:**
Download from https://ffmpeg.org/download.html

### 3. Setup Python Environment

Run the setup script:
```bash
chmod +x setup.sh
./setup.sh
```

This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory

### 4. Start the API Server

```bash
chmod +x start_api.sh
./start_api.sh
```

Or manually:
```bash
source venv/bin/activate
python api.py
```

The API will run on http://localhost:5010

### 5. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

## Usage

1. Start the Python API server (see step 4 above)
2. Open the web application in your browser (you'll see a login page)
3. Click "Sign in with Microsoft" and authenticate with your Microsoft account
4. After authentication, you'll be redirected to the main application
5. Select output format (Text/VTT/SRT)
6. (Optional) Enable translation and select target language
7. Upload an audio file (max 350MB)
8. Wait for processing
9. Download original and/or translated transcription
10. Your files are associated with your session and only accessible to you

### Translation

The app uses DeepL API for high-quality translations. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
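For VTT and SRT output, only the caption lines should go to DeepL; cue numbers and `-->` timing lines must come back unchanged. The following is a minimal sketch of that idea, not necessarily how `api.py` does it: `translate_subtitles` accepts any batch translator, and `deepl_translate_batch` is an assumed wrapper around DeepL's v2 `/translate` endpoint with header-based authentication.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable


def is_structural(line: str) -> bool:
    """Lines that must pass through untouched: blanks, cue numbers, timings, the VTT header."""
    s = line.strip()
    return not s or s.isdigit() or "-->" in s or s.startswith("WEBVTT")


def translate_subtitles(text: str, translate_batch: Callable[[list[str]], list[str]]) -> str:
    """Translate only the caption lines of an SRT/VTT document, keeping timestamps intact."""
    lines = text.splitlines()
    positions = [i for i, line in enumerate(lines) if not is_structural(line)]
    if positions:
        translated = translate_batch([lines[i] for i in positions])
        if len(translated) != len(positions):
            raise ValueError("translator returned a different number of lines")
        for i, new_line in zip(positions, translated):
            lines[i] = new_line
    result = "\n".join(lines)
    return result + "\n" if text.endswith("\n") else result


def deepl_translate_batch(
    texts: list[str],
    target_lang: str,
    api_key: str,
    endpoint: str = "https://api-free.deepl.com/v2/translate",  # Pro keys use api.deepl.com
) -> list[str]:
    """Send several texts in one DeepL request (form-encoded, repeated `text` fields)."""
    body = urllib.parse.urlencode(
        [("text", t) for t in texts] + [("target_lang", target_lang)]
    ).encode()
    request = urllib.request.Request(
        endpoint, data=body, headers={"Authorization": f"DeepL-Auth-Key {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return [item["text"] for item in payload["translations"]]
```

Usage would look like `translate_subtitles(srt_text, lambda lines: deepl_translate_batch(lines, "DE", key))`. Translating line by line keeps the timing intact but can split sentences across cues, which may lower quality. DeepL also caps how many `text` values one request may carry (50 at the time of writing), so long files would need chunking that this sketch omits.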

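**Note:** The `.htaccess` file raises PHP's limits for 350 MB uploads, but its `php_value` directives only take effect under mod_php. If they are ignored (for example with PHP-FPM), or you need larger files, set the limits in `php.ini` instead. PHP's documentation notes that `post_max_size` must be larger than `upload_max_filesize` for large uploads, since the request body carries form data on top of the file:
```ini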
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
```
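PHP reads size values such as `350M` as multiples of 1024, and the whole POST body (the file plus multipart headers and other form fields) must fit under `post_max_size`. Below is a small hypothetical helper for checking a file before uploading it; the 64 KiB allowance for request overhead is an assumption, not a measured figure.

```python
# PHP shorthand suffixes are powers of 1024 (K, M, G); a bare number means bytes.
_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert a php.ini size such as '350M' to bytes."""
    value = value.strip()
    suffix = value[-1].lower()
    if suffix in _MULTIPLIERS:
        return int(value[:-1]) * _MULTIPLIERS[suffix]
    return int(value)


def fits_upload_limits(
    file_size: int,
    upload_max_filesize: str = "350M",
    post_max_size: str = "350M",
    overhead: int = 64 * 1024,  # assumed allowance for multipart headers and form fields
) -> bool:
    """True if the file fits both the per-file limit and the whole-request limit."""
    return (
        file_size <= php_size_to_bytes(upload_max_filesize)
        and file_size + overhead <= php_size_to_bytes(post_max_size)
    )


# Example: fits_upload_limits(os.path.getsize("talk.mp3"))
```

With both limits at `350M`, this rejects a file of exactly 350 MiB, because the request overhead pushes the POST body past `post_max_size`.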

## API Endpoints

### POST /transcribe
Transcribe audio file to text/VTT/SRT

**Parameters:**
- `audio` (file): Audio file to transcribe
- `format` (string): Output format (txt/vtt/srt)

**Response:**
```json
{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
```
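Whisper's `transcribe()` result holds the full text under `text` and a list of `segments`, each with `start` and `end` times in seconds plus its own `text`. The sketch below shows how such segments can be rendered as SRT and WebVTT; it illustrates the formats under that assumption and is not the code in `api.py` (recent Whisper releases also ship their own writers in `whisper.utils`).

```python
from __future__ import annotations


def format_timestamp(seconds: float, separator: str) -> str:
    """HH:MM:SS<sep>mmm; SRT uses ',' before the milliseconds, WebVTT uses '.'."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments: list[dict]) -> str:
    """'WEBVTT' header, then unnumbered cues separated by blank lines."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

For example, `result = whisper.load_model("base").transcribe("talk.mp3")` followed by `segments_to_srt(result["segments"])`. Apart from the `WEBVTT` header and cue numbering, the two formats differ only in the millisecond separator.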

### GET /health
Health check endpoint

### GET /download/<filename>
Download transcribed file
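To exercise the API without the PHP frontend (for example while debugging on the server), the endpoints can be called directly. This is a minimal standard-library sketch, assuming `/transcribe` accepts the `audio` and `format` fields above as `multipart/form-data` and that the `filename` in its JSON response is what `/download/<filename>` expects; translation parameters are not documented here and are left out.

```python
from __future__ import annotations

import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # same default as PYTHON_API_URL in .env


def build_multipart(fields: dict[str, str], file_field: str, file_name: str, data: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data; returns (body, content type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields.items()
    ]
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{file_name}"\r\nContent-Type: {file_type}\r\n\r\n'
        ).encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, fmt: str = "txt", base_url: str = API_URL) -> dict:
    """POST an audio file to /transcribe and return the decoded JSON response."""
    path = Path(audio_path)
    body, content_type = build_multipart({"format": fmt}, "audio", path.name, path.read_bytes())
    request = urllib.request.Request(f"{base_url}/transcribe", data=body, headers={"Content-Type": content_type})
    # Long recordings can take many minutes on CPU, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=1800) as response:
        return json.load(response)


def download(filename: str, dest_dir: str = ".", base_url: str = API_URL) -> Path:
    """Fetch /download/<filename> and save it under dest_dir."""
    target = Path(dest_dir) / Path(filename).name  # never write outside dest_dir
    url = f"{base_url}/download/{urllib.parse.quote(filename)}"
    with urllib.request.urlopen(url, timeout=120) as response:
        target.write_bytes(response.read())
    return target


# Example:
#   result = transcribe("meeting.mp3", fmt="srt")
#   if result.get("success"):
#       print("saved", download(result["filename"]))
```

Since the Azure AD login lives in the PHP layer, run this on the server itself against `localhost`. It reads the whole file into memory, which is fine for tests but wasteful near the 350 MB limit.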

## Whisper Models

The default model is `base`, which offers a good balance of speed and accuracy.

Available models:
- `tiny` - Fastest, least accurate
- `base` - Good balance (default)
- `small` - Better accuracy, slower
- `medium` - High accuracy, much slower
- `large` - Best accuracy, very slow

To change the model, edit the `whisper.load_model(...)` call in `api.py` (line 24 at the time of writing):
```python
model = whisper.load_model("base")  # Change to desired model
```
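If you would rather not edit code to switch models, one option is to read the name from an environment variable. `WHISPER_MODEL` below is a hypothetical variable that `api.py` does not currently read; the sketch validates it against the model names the installed Whisper package reports via `whisper.available_models()`.

```python
from __future__ import annotations

from typing import Collection, Mapping

DEFAULT_MODEL = "base"


def choose_model_name(environ: Mapping[str, str], available: Collection[str]) -> str:
    """Pick the model from WHISPER_MODEL (hypothetical variable), falling back to 'base'."""
    name = environ.get("WHISPER_MODEL", "").strip() or DEFAULT_MODEL
    if name not in available:
        raise ValueError(f"unknown Whisper model {name!r}; choose from {sorted(available)}")
    return name


# In api.py this could replace the hard-coded name, e.g.:
#   import os, whisper
#   model = whisper.load_model(choose_model_name(os.environ, whisper.available_models()))
```

With that change, the systemd unit could select a model with a line such as `Environment="WHISPER_MODEL=small"`.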

## Authentication & Security

### Microsoft Azure AD SSO
- Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
- Secure authentication without client secrets
- Session-based file access control
- Users can only download files they've uploaded in their current session
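PKCE replaces the client secret with a one-time pair of values generated per login: a random `code_verifier` kept in the session, and its SHA-256-derived `code_challenge` sent with the authorization request. `auth.php` handles this in PHP; the Python sketch below only illustrates the S256 derivation from RFC 7636 and is not the project's code.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Random high-entropy verifier; 32 bytes encode to 43 base64url characters (RFC 7636 allows 43-128)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 method: BASE64URL(SHA256(ASCII(verifier))) with the '=' padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The authorization request carries code_challenge plus code_challenge_method=S256;
# the later token request sends the original code_verifier, which the server re-hashes.
```

Running the example verifier from RFC 7636, Appendix B, through `make_code_challenge` reproduces the challenge listed there, which makes a handy cross-check for any implementation.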

### Session Management
- Secure session cookies (httponly, secure, samesite)
- Configurable session timeout via `SESSION_TIMEOUT`, in seconds (default: 28800, i.e. 8 hours)
- Session regeneration after login for security

### File Access Control
- Files are tracked per-user session in `$_SESSION['user_files']`
- Download attempts are validated against user's file list
- Unauthorized access attempts are logged and blocked
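The ownership check itself lives in `download.php`. As an illustration only, here is a hypothetical Python equivalent: `user_files` stands in for `$_SESSION['user_files']`, and the function refuses anything that is not a bare filename before checking ownership, so path tricks such as `../config.php` never reach the filesystem.

```python
from __future__ import annotations

import logging
import os
from pathlib import Path

log = logging.getLogger("download")


def resolve_download(requested: str, user_files: list[str], outputs_dir: str) -> Path | None:
    """Return the file to serve, or None if the request must be refused."""
    name = os.path.basename(requested)
    # Reject anything that is not a bare file name (e.g. "../config.php").
    if not name or name != requested:
        log.warning("rejected suspicious filename %r", requested)
        return None
    # Only files recorded for this session may be downloaded.
    if name not in user_files:
        log.warning("blocked download of %r: not owned by this session", name)
        return None
    path = Path(outputs_dir) / name
    return path if path.is_file() else None
```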

### Important Security Notes
- Never commit `.env` to git (it's listed in `.gitignore`), and deny HTTP access to it: it sits inside the web root alongside the PHP files
- Use HTTPS in production for secure cookie transmission
- Files become inaccessible after session expires (files remain in `outputs/` but can't be downloaded)
- Consider setting up a cron job to clean old files from `outputs/` directory

## File Structure

```
.
├── api.py              # Python Flask API with Whisper & DeepL
├── login.php           # Landing page with Microsoft SSO
├── auth.php            # OAuth2 PKCE authentication handler
├── logout.php          # Session destruction handler
├── index.php           # Main application interface (auth required)
├── process.php         # PHP request handler (auth required)
├── download.php        # File download handler (auth + ownership check)
├── check_api.php       # API status checker (auth required)
├── test_download.php   # Download functionality tester (auth required)
├── config.php          # Configuration loader
├── auth_config.php     # Authentication & environment config
├── style.css           # Black/gold theme styles
├── .env                # Environment variables (NOT in git)
├── .env.example        # Environment variables template
├── .htaccess           # PHP upload limits
├── .gitignore          # Git ignore rules
├── composer.json       # PHP dependencies
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── README.md           # This file
├── CLAUDE.md           # Claude Code guidance
├── outputs/            # Transcribed files directory
├── vendor/             # Composer dependencies (NOT in git)
└── venv/               # Python virtual environment
```

## Production Deployment (Apache)

### Prerequisites

- Apache 2.4+
- PHP 7.4+ with mod_php or PHP-FPM
- Python 3.8+
- FFmpeg
- Root/sudo access for system configuration

### Step 1: Install Required Apache Modules

**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2
```

**CentOS/RHEL:**
```bash
sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd
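```

The remaining steps use Debian/Ubuntu conventions (`www-data`, `a2ensite`, `/etc/apache2`, `apt`). On CentOS/RHEL, run Apache as the `apache` user, put the virtual host in `/etc/httpd/conf.d/voice2text.conf`, and restart `httpd` instead of `apache2`.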

### Step 2: Deploy Application Files

```bash
# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

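# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

# Install PHP dependencies (vendor/ is not in git) and create .env from the template,
# then fill in your credentials as described in "1. Configure Authentication"
sudo composer install --no-dev
sudo cp .env.example .env

# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 640 /var/www/voice2text/.env      # secrets: readable by www-data only
sudo chmod 775 /var/www/voice2text/outputs   # writable by www-data; avoid 777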
```

### Step 3: Configure Apache Virtual Host

Create `/etc/apache2/sites-available/voice2text.conf`:

```apache
<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com

    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

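        # PHP settings for large uploads. php_value only works with mod_php;
        # under PHP-FPM, remove these lines and rely on php.ini (Step 4).
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files (.env holds the Azure AD and DeepL credentials)
    <FilesMatch "^(\.env.*|config\.php|auth_config\.php|\.htaccess|\.gitignore|composer\.(json|lock)|requirements\.txt|.*\.(sh|md|py))$">
        Require all denied
    </FilesMatch>

    # Never serve these directories directly. Assumes downloads go through
    # download.php, which reads outputs/ from disk rather than over HTTP.
    <DirectoryMatch "^/var/www/voice2text/(\.git|venv|vendor|outputs)(/|$)">
        Require all denied
    </DirectoryMatch>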

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>
```

Enable the site:
```bash
sudo a2ensite voice2text.conf
sudo systemctl reload apache2
```

### Step 4: Configure PHP for Large Uploads

Edit `/etc/php/7.4/apache2/php.ini` (adjust version as needed):

```ini
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M
```

Restart Apache:
```bash
sudo systemctl restart apache2
```

### Step 5: Setup Python API as Systemd Service

Create `/etc/systemd/system/voice2text-api.service`:

```ini
[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
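# Keep the system paths so Whisper can find ffmpeg (e.g. /usr/bin/ffmpeg)
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"
# www-data's home (/var/www) is usually not writable, so give Whisper a
# writable cache for model downloads (it honors XDG_CACHE_HOME)
CacheDirectory=voice2text
Environment="XDG_CACHE_HOME=/var/cache/voice2text"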
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging: keep output in the journal so `journalctl -u voice2text-api` shows it
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

# View logs
sudo journalctl -u voice2text-api -f
```

### Step 6: Configure Firewall

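**UFW (Ubuntu):**
```bash
sudo ufw allow OpenSSH        # keep SSH access before enabling the firewall
sudo ufw allow 'Apache Full'
sudo ufw enable
```

**Firewalld (CentOS):**
```bash
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
```

Do not open port 5010. The PHP frontend reaches the API via `PYTHON_API_URL` (`http://localhost:5010`), and exposing the port would let anyone call the API directly, bypassing the Azure AD login.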

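### Step 7: SSL Configuration (Required in Practice)

HTTPS is effectively mandatory for this app: Azure AD only accepts `https://` redirect URIs (except for `localhost`), and the session cookie is flagged `secure`, so browsers will not send it over plain HTTP.

Using Let's Encrypt with Certbot: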

```bash
# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run
```

### Step 8: Verify Deployment

1. Check Apache status: `sudo systemctl status apache2`
2. Check API status: `sudo systemctl status voice2text-api`
3. Sign in, then visit `https://voice2text.yourdomain.com/check_api.php` (the page requires authentication)
4. Test file upload with a small audio file

## Monitoring and Maintenance

### Check API Status
```bash
# View API logs
sudo journalctl -u voice2text-api -n 100

# Check if API is responding
curl http://localhost:5010/health
```
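For a cron job or an external monitor, the same check can be scripted. This is a minimal sketch, assuming `/health` returns HTTP 200 when the API is up (its response body isn't documented here, so it is ignored):

```python
import http.client
import urllib.request


def api_is_healthy(base_url: str = "http://localhost:5010", timeout: float = 5.0) -> bool:
    """True if GET <base_url>/health answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (error statuses), refused connections and timeouts are OSErrors;
        # HTTPException covers malformed responses.
        return False


# In a cron job or monitor, turn the result into an exit code, e.g.:
#   raise SystemExit(0 if api_is_healthy() else 1)
```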

### Check Apache Logs
```bash
# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log
```

### Restart Services
```bash
# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api
```

### Clean Old Files
The `outputs/` directory can grow large. Set up a cron job to clean old files:

```bash
# Edit crontab
sudo crontab -e

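# Add this line to delete files older than 24 hours (1440 minutes) daily at 2 AM.
# Note: -mtime +1 would only match files at least 48 hours old, because find
# rounds file ages down to whole days.
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete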
```
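If you prefer a script that also reports what it removed, a minimal Python equivalent of the `find` command is below. It is a hypothetical helper, not part of the repository; like the cron line, it looks only at modification times and walks subdirectories.

```python
from __future__ import annotations

import time
from pathlib import Path


def delete_old_files(directory: str, max_age_seconds: float = 24 * 3600, now: float | None = None) -> list[Path]:
    """Delete regular files under `directory` (recursively) last modified more than max_age_seconds ago."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = []
    # Materialize the listing first so deleting files doesn't disturb the directory walk.
    for path in list(Path(directory).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Example: delete_old_files("/var/www/voice2text/outputs")
```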

## Troubleshooting

### API Issues

**API not connecting:**
1. Check if API is running: `sudo systemctl status voice2text-api`
2. Test health endpoint: `curl http://localhost:5010/health`
3. Check API logs: `sudo journalctl -u voice2text-api -n 50`
4. Verify `PYTHON_API_URL` in `.env` matches the address the API listens on (default: `http://localhost:5010`)
5. Visit `check_api.php` in browser for detailed status

**API won't start:**
1. Check Python version: `python3 --version` (must be 3.8+)
2. Verify virtual environment: `ls -la venv/`
3. Check dependencies: `source venv/bin/activate && pip list`
4. Review error logs: `sudo journalctl -u voice2text-api -xe`
5. Ensure FFmpeg is installed: `which ffmpeg`

### Upload Issues

**File upload fails:**
1. Check file size limits in `php.ini`
2. Verify `.htaccess` is being read (requires `AllowOverride All`); its `php_value` lines only work with mod_php and trigger a 500 error under PHP-FPM
3. Check disk space: `df -h`
4. Verify `outputs/` directory permissions: `ls -ld outputs/`
5. Check the Apache error log: `sudo tail -f /var/log/apache2/voice2text-error.log`

**"413 Request Entity Too Large":**
- If using Nginx as reverse proxy, add to nginx config:
```nginx
client_max_body_size 350M;
```

### Transcription Issues

**Transcription fails:**
1. Verify FFmpeg is installed: `ffmpeg -version`
2. Check audio file format (supported: mp3, wav, m4a, etc.)
3. Review API logs for specific errors
4. Test with a small file first
5. Ensure enough disk space in `/tmp`

**Slow transcription:**
1. Use a smaller Whisper model (`tiny` or `base`)
2. Consider using GPU acceleration (requires CUDA setup)
3. Upgrade server hardware (more CPU/RAM)
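4. Split long recordings into shorter files (processing time grows with audio duration; lowering bitrate or quality helps little, since Whisper resamples all input to 16 kHz mono)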

### Translation Issues

**Translation fails:**
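1. Verify that `DEEPL_API_KEY` in `.env` holds a valid key
2. Check your DeepL API usage and quota: https://www.deepl.com/pro-account
3. Review the API response for specific error messages
4. Ensure the server can reach the DeepL API over the internet
5. DeepL API Free keys (ending in `:fx`) only work against `api-free.deepl.com`, and Pro keys only against `api.deepl.com`; make sure the key matches the endpoint the API uses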

### Permission Issues

**403 Forbidden errors:**
```bash
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs
```

**Can't write to outputs directory:**
```bash
sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 775 /var/www/voice2text/outputs
```

### Performance Issues

**Out of memory:**
1. Use a smaller Whisper model (`tiny` or `base`)
2. Increase PHP memory limit in `php.ini`
3. Increase system swap space
4. Add more RAM to server

**Timeout errors:**
1. Increase PHP `max_execution_time` in `php.ini`
2. Increase Apache's `Timeout` directive (default: 60 seconds) in the virtual host config
3. Process smaller audio files
4. Use a smaller, faster Whisper model

### Debugging Tips

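**Enable debug mode (development only):**
Add to `config.php`, and remove it again before going to production, since displayed errors can leak file paths and configuration values: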
```php
error_reporting(E_ALL);
ini_set('display_errors', 1);
```

**Check system resources:**
```bash
# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'
```

**Test components individually:**
1. Test PHP: Create `test.php` with `<?php phpinfo(); ?>` (delete it afterwards, since `phpinfo()` exposes server configuration)
2. Test Python API: `curl http://localhost:5010/health`
3. Test file upload: Use small test file first
4. Check browser console for JavaScript errors (F12)

## License

MIT