Voice to Text with Whisper & DeepL Translation

A web application that transcribes audio files with OpenAI's Whisper model and optionally translates the transcriptions using the DeepL API. Supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).

Features

  • 🎤 Audio transcription using OpenAI Whisper
  • 🌍 Translation using DeepL API (30+ languages)
  • 📝 Multiple output formats: Text, VTT, SRT
  • 🚀 Python Flask API backend
  • 💻 PHP frontend (MAMP/Apache compatible)
  • 📦 350MB file size limit
  • 📄 Generates both original and translated files
  • 🎨 Modern black/gold UI with dark theme
  • 📊 Real-time progress bar during processing
  • 👀 In-page preview of transcriptions
  • ⬇️ One-click download for all formats
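The three output formats differ mainly in how timestamps are written. The sketch below is an illustration, not the code in api.py: it assumes transcription segments shaped like Whisper's `segments` list (dicts with `start`, `end`, and `text`), and the helper names are hypothetical.

```python
def format_timestamp(seconds, separator):
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments):
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: 'WEBVTT' header, unnumbered cues, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_txt(segments):
    """Plain text: segment texts joined with spaces, no timing information."""
    return " ".join(seg["text"].strip() for seg in segments)
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids cues such as 00:00:59,1000 when a segment ends just below a minute boundary.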

Requirements

  • Python 3.8 or higher
  • PHP 7.4 or higher
  • MAMP or Apache server
  • FFmpeg (for audio processing)
  • Composer (for PHP dependencies)
  • Microsoft Azure AD application (for SSO authentication)

Installation

1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO) authentication with the OAuth2 PKCE flow.

Step 1: Copy and configure environment file

cp .env.example .env

Step 2: Edit the .env file with your Azure AD credentials, DeepL API key, and API settings:

AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
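Before starting the services, it can help to confirm that no placeholder values were left in .env. The following is a minimal sketch, not part of the repository: the script and its function names are hypothetical, and it assumes the file uses plain KEY=value lines as shown above.

```python
from pathlib import Path

REQUIRED_KEYS = (
    "AZURE_CLIENT_ID",
    "AZURE_AUTHORITY",
    "AZURE_REDIRECT_URI",
    "DEEPL_API_KEY",
    "PYTHON_API_URL",
    "SESSION_TIMEOUT",
)
# Fragments that appear only in the .env.example template values.
PLACEHOLDER_MARKERS = ("your_", "yourdomain.com")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(values):
    """Return human-readable problems; an empty list means the file looks complete."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif any(marker in value for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key} still holds the placeholder value")
    timeout = values.get("SESSION_TIMEOUT", "")
    if timeout and not timeout.isdigit():
        problems.append("SESSION_TIMEOUT must be a whole number of seconds")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; run: cp .env.example .env")
    else:
        for problem in find_problems(parse_env(env_path.read_text())):
            print(problem)
```

Run it from the project root; no output means every required key is set to a non-placeholder value.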

Step 3: Install PHP dependencies

composer install

2. Install FFmpeg

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

3. Setup Python Environment

Run the setup script:

chmod +x setup.sh
./setup.sh

This will:

  • Create a Python virtual environment
  • Install all dependencies (Flask, Whisper, etc.)
  • Create the outputs directory

4. Start the API Server

chmod +x start_api.sh
./start_api.sh

Or manually:

source venv/bin/activate
python api.py

The API will run on http://localhost:5010

5. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

Usage

  1. Start the Python API server (see step 4 above)
  2. Open the web application in your browser (you'll see a login page)
  3. Click "Sign in with Microsoft" and authenticate with your Microsoft account
  4. After authentication, you'll be redirected to the main application
  5. Select output format (Text/VTT/SRT)
  6. (Optional) Enable translation and select target language
  7. Upload an audio file (max 350MB)
  8. Wait for processing
  9. Download original and/or translated transcription
  10. Your files are associated with your session and only accessible to you

Translation

The app uses the DeepL API for translation. When translation is enabled:

  • The audio is first transcribed in its original language
  • The transcription is then translated to your selected target language
  • Both original and translated files are generated
  • Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
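For VTT and SRT output, the translated file needs the same cue timings as the original. One way to get that is to translate segment by segment, as in the sketch below. This is an assumption about how such a pipeline could look, not a description of api.py: the function names are hypothetical, and the DeepL call uses the official `deepl` Python package, which the project may or may not use.

```python
def translate_segments(segments, translate, target_lang):
    """Return (original, translated) segment lists with identical timings."""
    translated = []
    for seg in segments:
        text = seg["text"].strip()
        translated.append({
            "start": seg["start"],
            "end": seg["end"],
            # Empty cues are passed through without an API call.
            "text": translate(text, target_lang) if text else "",
        })
    return segments, translated


def make_deepl_translate(api_key):
    """Build a translate(text, target_lang) callable backed by the deepl package."""
    import deepl  # pip install deepl; imported lazily so the sketch loads without it

    translator = deepl.Translator(api_key)

    def translate(text, target_lang):
        return translator.translate_text(text, target_lang=target_lang).text

    return translate
```

Passing the translator in as a callable keeps the timing logic testable without network access or an API key.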

Note: .htaccess sets the PHP upload limits to 350MB. To allow larger files, or if .htaccess overrides are not honored on your server, set these values in php.ini and raise them as needed:

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200

API Endpoints

POST /transcribe

Transcribe audio file to text/VTT/SRT

Parameters:

  • audio (file): Audio file to transcribe
  • format (string): Output format (txt/vtt/srt)

Response:

{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
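A request to this endpoint could be made from Python roughly as follows. This is a sketch under assumptions: it presumes the endpoint accepts a standard multipart/form-data upload with the `audio` and `format` fields listed above, and the helper names are hypothetical. Only the standard library is used.

```python
import json
import os
import urllib.request
import uuid

MAX_UPLOAD_BYTES = 350 * 1024 * 1024  # the 350MB limit documented above


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, fmt="txt", base_url="http://localhost:5010"):
    """POST an audio file to /transcribe and return the decoded JSON response."""
    if fmt not in ("txt", "vtt", "srt"):
        raise ValueError(f"unsupported format: {fmt}")
    with open(path, "rb") as handle:
        data = handle.read()
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 350MB limit")
    body, content_type = build_multipart(
        {"format": fmt}, "audio", os.path.basename(path), data
    )
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Long timeout: transcription of large files can take many minutes.
    with urllib.request.urlopen(request, timeout=1200) as response:
        result = json.loads(response.read().decode("utf-8"))
    if not result.get("success"):
        raise RuntimeError(f"transcription failed: {result}")
    return result
```

For example, `transcribe("meeting.mp3", fmt="srt")["text"]` would return the subtitle text if the API is running locally.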

GET /health

Health check endpoint
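A script can poll this endpoint to wait until the API is ready, for example after a restart. The sketch below assumes only that /health answers with HTTP 200 when the API is up; the response body is not inspected because its shape is not documented here. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_api_healthy(base_url="http://localhost:5010", timeout=5):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx responses.
        return False


def wait_for_api(base_url="http://localhost:5010", attempts=10, delay=3.0):
    """Poll /health up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_api_healthy(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Loading a Whisper model can take a while on first start, so several attempts with a short delay are usually more useful than a single check.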

GET /download/

Download transcribed file

Whisper Models

The default model is base which provides a good balance of speed and accuracy.

Available models:

  • tiny - Fastest, least accurate
  • base - Good balance (default)
  • small - Better accuracy, slower
  • medium - High accuracy, much slower
  • large - Best accuracy, very slow

To change the model, edit api.py line 24:

model = whisper.load_model("base")  # Change to desired model
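To avoid editing the source for each deployment, the model name could be read from an environment variable instead. This is a suggestion, not existing behavior: the `WHISPER_MODEL` variable and the `resolve_model_name` helper are hypothetical.

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(default="base"):
    """Read the model name from WHISPER_MODEL, falling back to the default."""
    name = os.environ.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(
            f"unknown Whisper model {name!r}; expected one of {', '.join(VALID_MODELS)}"
        )
    return name


# In api.py, the hard-coded name would then become:
# model = whisper.load_model(resolve_model_name())
```

Validating the name up front gives a clear error at startup rather than a failure inside the model loader.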

Authentication & Security

Microsoft Azure AD SSO

  • Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
  • Secure authentication without client secrets
  • Session-based file access control
  • Users can only download files they've uploaded in their current session
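PKCE replaces the client secret with a one-time verifier/challenge pair. The sketch below shows how that pair is derived under the S256 method from RFC 7636. It is an illustration in Python only; the application's own implementation lives in auth.php and may differ in detail.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes=32):
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier):
    """S256 challenge: BASE64URL(SHA256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The challenge is sent with the authorization request, and the verifier is sent later with the token request, so an intercepted authorization code cannot be redeemed without the verifier.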

Session Management

  • Secure session cookies (httponly, secure, samesite)
  • Configurable session timeout (default: 8 hours)
  • Session regeneration after login for security

File Access Control

  • Files are tracked per-user session in $_SESSION['user_files']
  • Download attempts are validated against user's file list
  • Unauthorized access attempts are logged and blocked
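The ownership check described above amounts to two conditions: the requested name must be a bare filename, and it must appear in the session's file list. The real check is implemented in PHP in download.php; the Python sketch below only illustrates the logic, and the function name is hypothetical.

```python
import os


def is_download_allowed(requested_name, user_files):
    """Allow a download only for a bare filename recorded in the user's session."""
    # Reject empty names and anything containing a directory part (blocks ../ traversal).
    if not requested_name or os.path.basename(requested_name) != requested_name:
        return False
    if requested_name in (".", ".."):
        return False
    return requested_name in set(user_files)
```

Checking the name against the session list, rather than only checking that the file exists in outputs/, is what prevents one user from guessing another user's filenames.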

Important Security Notes

  • Ensure your .env file is never committed to git (it's in .gitignore)
  • Use HTTPS in production for secure cookie transmission
  • Files become inaccessible once the session expires (they remain in outputs/ but can no longer be downloaded)
  • Consider setting up a cron job to clean old files from outputs/ directory

File Structure

.
├── api.py              # Python Flask API with Whisper & DeepL
├── login.php           # Landing page with Microsoft SSO
├── auth.php            # OAuth2 PKCE authentication handler
├── logout.php          # Session destruction handler
├── index.php           # Main application interface (auth required)
├── process.php         # PHP request handler (auth required)
├── download.php        # File download handler (auth + ownership check)
├── check_api.php       # API status checker (auth required)
├── test_download.php   # Download functionality tester (auth required)
├── config.php          # Configuration loader
├── auth_config.php     # Authentication & environment config
├── style.css           # Black/gold theme styles
├── .env                # Environment variables (NOT in git)
├── .env.example        # Environment variables template
├── .htaccess           # PHP upload limits
├── .gitignore          # Git ignore rules
├── composer.json       # PHP dependencies
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── README.md           # This file
├── CLAUDE.md           # Claude Code guidance
├── outputs/            # Transcribed files directory
├── vendor/             # Composer dependencies (NOT in git)
└── venv/               # Python virtual environment

Production Deployment (Apache)

Prerequisites

  • Apache 2.4+
  • PHP 7.4+ with mod_php or PHP-FPM
  • Python 3.8+
  • FFmpeg
  • Root/sudo access for system configuration

Step 1: Install Required Apache Modules

Ubuntu/Debian:

sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2

CentOS/RHEL:

sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd

Step 2: Deploy Application Files

# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

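# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
# outputs/ is owned by www-data, which runs both Apache and the API,
# so owner/group access is enough and world-writable 777 is not needed
sudo chmod 750 /var/www/voice2text/outputs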

Step 3: Configure Apache Virtual Host

Create /etc/apache2/sites-available/voice2text.conf:

<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com

    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads (php_value requires mod_php;
        # with PHP-FPM, remove these lines and set the values in php.ini)
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

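    # Protect sensitive files (including .env, which holds the API keys)
    <FilesMatch "^(config\.php|auth_config\.php|\.env|\.git|\.htaccess)">
        Require all denied
    </FilesMatch>

    # FilesMatch only sees file names, so block the .git directory separately
    RedirectMatch 404 /\.git(/|$)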

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>

Enable the site:

sudo a2ensite voice2text.conf
sudo systemctl reload apache2

Step 4: Configure PHP for Large Uploads

Edit /etc/php/7.4/apache2/php.ini (adjust version as needed):

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M

Restart Apache:

sudo systemctl restart apache2

Step 5: Setup Python API as Systemd Service

Create /etc/systemd/system/voice2text-api.service:

[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
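# Keep the system directories on PATH so Whisper can find the ffmpeg binary
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"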
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging
StandardOutput=append:/var/log/voice2text-api.log
StandardError=append:/var/log/voice2text-api-error.log

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

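# View service events (start, stop, crashes)
sudo journalctl -u voice2text-api -f

# View API output (the unit file appends stdout and stderr to these files)
sudo tail -f /var/log/voice2text-api.log /var/log/voice2text-api-error.log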

Step 6: Configure Firewall

UFW (Ubuntu):

sudo ufw allow 'Apache Full'
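# Only needed if the API must be reachable from another host;
# PHP calls it on localhost, and an open port bypasses the SSO login
sudo ufw allow 5010/tcp  # Python API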
sudo ufw enable

Firewalld (CentOS):

sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
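# Only needed if the API must be reachable from another host;
# PHP calls it on localhost, and an open port bypasses the SSO login
sudo firewall-cmd --permanent --add-port=5010/tcp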
sudo firewall-cmd --reload

Step 7: Enable HTTPS

Using Let's Encrypt with Certbot:

# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run

Step 8: Verify Deployment

  1. Check Apache status: sudo systemctl status apache2
  2. Check API status: sudo systemctl status voice2text-api
  3. Sign in, then visit: https://voice2text.yourdomain.com/check_api.php (the page requires authentication)
  4. Test file upload with a small audio file

Monitoring and Maintenance

Check API Status

# View API logs (the systemd unit appends output to these files)
sudo tail -n 100 /var/log/voice2text-api.log /var/log/voice2text-api-error.log

# Check if API is responding
curl http://localhost:5010/health

Check Apache Logs

# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log

Restart Services

# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api

Clean Old Files

The outputs/ directory can grow large. Set up a cron job to clean old files:

# Edit crontab
sudo crontab -e

# Add this line to delete files older than 24 hours, daily at 2 AM
# (-mmin +1440 is used because -mtime +1 only matches files older than 48 hours)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
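The same cleanup can be expressed in Python, which is convenient if it should run inside the API process or needs logging. This is a minimal sketch with a hypothetical function name; it deletes only regular files directly inside the given directory.

```python
import time
from pathlib import Path


def clean_old_files(directory, max_age_hours=24, now=None):
    """Delete regular files older than max_age_hours; return the deleted names."""
    current = time.time() if now is None else now
    cutoff = current - max_age_hours * 3600
    deleted = []
    for path in Path(directory).iterdir():
        # Compare modification time against the cutoff; skip subdirectories.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

For example, `clean_old_files("/var/www/voice2text/outputs")` removes transcriptions last modified more than 24 hours ago.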

Troubleshooting

API Issues

API not connecting:

  1. Check if API is running: sudo systemctl status voice2text-api
  2. Test health endpoint: curl http://localhost:5010/health
  3. Check API logs: sudo tail -n 50 /var/log/voice2text-api.log /var/log/voice2text-api-error.log
  4. Verify firewall allows port 5010
  5. Visit check_api.php in browser for detailed status

API won't start:

  1. Check Python version: python3 --version (must be 3.8+)
  2. Verify virtual environment: ls -la venv/
  3. Check dependencies: source venv/bin/activate && pip list
  4. Review error logs: sudo journalctl -u voice2text-api -xe
  5. Ensure FFmpeg is installed: which ffmpeg

Upload Issues

File upload fails:

  1. Check file size limits in php.ini
  2. Verify .htaccess is being read (requires AllowOverride All)
  3. Check disk space: df -h
  4. Verify outputs/ directory permissions: ls -ld outputs/
  5. Check Apache error log: tail -f /var/log/apache2/voice2text-error.log

"413 Request Entity Too Large":

  • If using Nginx as reverse proxy, add to nginx config:
client_max_body_size 350M;

Transcription Issues

Transcription fails:

  1. Verify FFmpeg is installed: ffmpeg -version
  2. Check audio file format (supported: mp3, wav, m4a, etc.)
  3. Review API logs for specific errors
  4. Test with a small file first
  5. Ensure enough disk space in /tmp

Slow transcription:

  1. Use a smaller Whisper model (tiny or base)
  2. Consider using GPU acceleration (requires CUDA setup)
  3. Upgrade server hardware (more CPU/RAM)
  4. Reduce audio file length/quality

Translation Issues

Translation fails:

  1. Verify that DEEPL_API_KEY in .env holds a valid DeepL API key
  2. Check DeepL API usage: https://www.deepl.com/pro-account
  3. Review API response for specific error messages
  4. Ensure internet connectivity for DeepL API

Permission Issues

403 Forbidden errors:

sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 750 /var/www/voice2text/outputs

Can't write to outputs directory:

sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 750 /var/www/voice2text/outputs

Performance Issues

Out of memory:

  1. Use a smaller Whisper model (tiny or base)
  2. Increase PHP memory limit in php.ini
  3. Increase system swap space
  4. Add more RAM to server

Timeout errors:

  1. Increase PHP max_execution_time in php.ini
  2. Increase Apache timeout in virtual host config
  3. Process smaller audio files
  4. Use faster Whisper model

Debugging Tips

Enable debug mode: Add to config.php:

error_reporting(E_ALL);
ini_set('display_errors', 1);

Check system resources:

# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'

Test components individually:

  1. Test PHP: Create test.php with <?php phpinfo(); ?>
  2. Test Python API: curl http://localhost:5010/health
  3. Test file upload: Use small test file first
  4. Check browser console for JavaScript errors (F12)

License

MIT