michael 5de2003aff added SSO login via MSAL and PKCE, ready for deployment (theoretically)		2025-11-03 08:41:27 -06:00
.env.example	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
.gitignore	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
.htaccess	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
.user.ini	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
api.py	Fix VTT/SRT display and download functionality	2025-10-21 13:05:04 -04:00
auth.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
auth_config.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
check_api.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
CLAUDE.md	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
composer.json	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
config.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
download.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
index.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
login.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
logout.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
process.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
README.md	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
requirements.txt	Initial commit: Voice to Text with Whisper & DeepL Translation	2025-10-21 11:54:39 -04:00
setup.sh	Initial commit: Voice to Text with Whisper & DeepL Translation	2025-10-21 11:54:39 -04:00
start_api.sh	Initial commit: Voice to Text with Whisper & DeepL Translation	2025-10-21 11:54:39 -04:00
style.css	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
test_download.php	added SSO login via MSAL and PKCE, ready for deployment (theoretically)	2025-11-03 08:41:27 -06:00
V2T.svg	Initial commit: Voice to Text with Whisper & DeepL Translation	2025-10-21 11:54:39 -04:00

README.md

Voice to Text with Whisper & DeepL Translation

A web application that transcribes audio files with OpenAI's Whisper model and optionally translates the transcript with the DeepL API. Supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).

Features

🎤 Audio transcription using OpenAI Whisper
🌍 Translation using DeepL API (30+ languages)
📝 Multiple output formats: Text, VTT, SRT
🚀 Python Flask API backend
💻 PHP frontend (MAMP/Apache compatible)
📦 350MB file size limit
📄 Generates both original and translated files
🎨 Modern black/gold UI with dark theme
📊 Real-time progress bar during processing
👀 In-page preview of transcriptions
⬇️ One-click download for all formats
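
The subtitle formats differ mainly in how cue timestamps are written. As a rough illustration (not the code in api.py), the sketch below renders Whisper-style segments, assumed to be dicts with `start`, `end`, and `text` keys, as SRT and VTT. The function names are hypothetical.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header line, dot before the milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Plain text output would simply join the segment texts without timestamps.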

Requirements

Python 3.8 or higher
PHP 7.4 or higher
MAMP or Apache server
FFmpeg (for audio processing)
Composer (for PHP dependencies)
Microsoft Azure AD application (for SSO authentication)

Installation

1. Configure Authentication

This application uses Microsoft Azure AD for Single Sign-On (SSO), using the OAuth2 authorization code flow with PKCE.

Step 1: Copy and configure environment file

cp .env.example .env

Step 2: Edit .env file with your Azure AD credentials:

AZURE_CLIENT_ID=your_client_id_here
AZURE_AUTHORITY=https://login.microsoftonline.com/your_tenant_id_here
AZURE_REDIRECT_URI=https://yourdomain.com/voice2text/
DEEPL_API_KEY=your_deepl_api_key_here
PYTHON_API_URL=http://localhost:5010
SESSION_TIMEOUT=28800
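
A forgotten placeholder in .env tends to surface later as a confusing login or translation error. The sketch below is one possible pre-flight check, not part of the repository: it parses simple KEY=VALUE lines and reports missing keys and values still ending in `_here`. The helper names are hypothetical, and the parser deliberately ignores advanced .env syntax.

```python
REQUIRED_KEYS = (
    "AZURE_CLIENT_ID", "AZURE_AUTHORITY", "AZURE_REDIRECT_URI",
    "DEEPL_API_KEY", "PYTHON_API_URL", "SESSION_TIMEOUT",
)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_problems(values: dict) -> list:
    """Return human-readable problems; an empty list means the file looks sane."""
    problems = [f"{key} is missing" for key in REQUIRED_KEYS if not values.get(key)]
    problems += [
        f"{key} still has its placeholder value"
        for key, value in values.items()
        if value.endswith("_here")
    ]
    timeout = values.get("SESSION_TIMEOUT", "")
    if timeout and not timeout.isdigit():
        problems.append("SESSION_TIMEOUT must be a whole number of seconds")
    return problems
```

To use it, read the file and print the result, for example `print(find_problems(parse_env(open(".env").read())))`. The default SESSION_TIMEOUT of 28800 seconds corresponds to 8 hours.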

Step 3: Install PHP dependencies

composer install

2. Install FFmpeg

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

3. Setup Python Environment

Run the setup script:

chmod +x setup.sh
./setup.sh

This will:

Create a Python virtual environment
Install all dependencies (Flask, Whisper, etc.)
Create the outputs directory

4. Start the API Server

chmod +x start_api.sh
./start_api.sh

Or manually:

source venv/bin/activate
python api.py

The API will run on http://localhost:5010

5. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

Usage

Start the Python API server (see step 4 above)
Open the web application in your browser (you'll see a login page)
Click "Sign in with Microsoft" and authenticate with your Microsoft account
After authentication, you'll be redirected to the main application
Select output format (Text/VTT/SRT)
(Optional) Enable translation and select target language
Upload an audio file (max 350MB)
Wait for processing
Download original and/or translated transcription
Your files are associated with your session and only accessible to you

Translation

The app uses the DeepL API for translation. When translation is enabled:

The audio is first transcribed in its original language
The transcription is then translated to your selected target language
Both original and translated files are generated
Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
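
For reference, the translation step can be sketched against DeepL's REST endpoint using only the standard library. How api.py actually calls DeepL is not shown in this README (it may use the official `deepl` package instead), so treat the function names and structure below as assumptions.

```python
import json
import urllib.request


def build_deepl_request(text: str, target_lang: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for DeepL's /v2/translate endpoint."""
    # Free-plan keys end in ":fx" and are served from the api-free host.
    host = "api-free.deepl.com" if api_key.endswith(":fx") else "api.deepl.com"
    payload = {"text": [text], "target_lang": target_lang}
    return urllib.request.Request(
        f"https://{host}/v2/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"DeepL-Auth-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def translate(text: str, target_lang: str, api_key: str, timeout: float = 30) -> str:
    """Send the request and return the first translated string."""
    request = build_deepl_request(text, target_lang, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["translations"][0]["text"]
```

A call such as `translate(transcript, "DE", api_key)` would produce the text for the translated file, while the original transcript is saved unchanged.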

Note: PHP upload limits are set to 350MB via .htaccess. To accept larger files, raise the following values (shown at their current settings) in php.ini:

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
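
When scripting uploads, it can help to compare a file against the limit before sending it. The sketch below converts PHP's shorthand sizes (K, M, G as powers of 1024) to bytes; the helper names are hypothetical.

```python
import os

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value: str) -> int:
    """Convert PHP shorthand such as '350M' to a byte count."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # plain number of bytes


def fits_upload_limit(path: str, limit: str = "350M") -> bool:
    """True if the file on disk is no larger than the configured limit."""
    return os.path.getsize(path) <= php_size_to_bytes(limit)
```

A file right at the limit can still be rejected, because post_max_size counts the whole request body, including multipart overhead.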

API Endpoints

POST /transcribe

Transcribe audio file to text/VTT/SRT

Parameters:

audio (file): Audio file to transcribe
format (string): Output format (txt/vtt/srt)

Response:

{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
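
A minimal client for this endpoint might look like the following. It is a sketch using only the standard library and assumes the API runs at the default http://localhost:5010 and accepts a standard multipart/form-data upload with the `audio` and `format` fields listed above; `build_multipart` and `transcribe` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # matches PYTHON_API_URL in .env.example


def build_multipart(fields, file_field, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, output_format="txt", timeout=1200):
    """Upload an audio file and return the decoded JSON response."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"format": output_format}, "audio", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        f"{API_URL}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    if not result.get("success"):
        raise RuntimeError(f"Transcription failed: {result}")
    return result
```

For example, `transcribe("meeting.mp3", "srt")["filename"]` would give the name of the generated file. The sketch reads the whole file into memory, which is acceptable for a quick test but wasteful near the 350MB limit.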

GET /health

Health check endpoint
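
A small probe for this endpoint, usable from a monitoring script, could be sketched as follows. It only checks for an HTTP 200 status, since the response body is not documented here; the function name is hypothetical.

```python
import urllib.request


def is_api_healthy(base_url: str = "http://localhost:5010", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False
```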

GET /download/

Download transcribed file

Whisper Models

The default model is base, which provides a good balance of speed and accuracy.

Available models:

tiny - Fastest, least accurate
base - Good balance (default)
small - Better accuracy, slower
medium - High accuracy, much slower
large - Best accuracy, very slow

To change the model, edit api.py line 24:

model = whisper.load_model("base")  # Change to desired model
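
Instead of editing the source for each change, the model name could be read from the environment. This is a sketch of one way to do that, not how api.py currently works: `WHISPER_MODEL` is a hypothetical variable name, and the list of allowed names is the one given above.

```python
import os

WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(environ=os.environ, default: str = "base") -> str:
    """Pick the model from WHISPER_MODEL, falling back to the default."""
    name = environ.get("WHISPER_MODEL", default).strip().lower()
    if name not in WHISPER_MODELS:
        raise ValueError(
            f"Unknown Whisper model {name!r}; expected one of {', '.join(WHISPER_MODELS)}"
        )
    return name


# In api.py the hard-coded name would then become:
# model = whisper.load_model(resolve_model_name())
```

Failing fast on an unknown name avoids a long startup followed by a download error.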

Authentication & Security

Microsoft Azure AD SSO

Uses OAuth2 with PKCE (Proof Key for Code Exchange) flow
Secure authentication without client secrets
Session-based file access control
Users can only download files they've uploaded in their current session

Session Management

Secure session cookies (httponly, secure, samesite)
Configurable session timeout (default: 8 hours)
Session regeneration after login for security

File Access Control

Files are tracked per-user session in $_SESSION['user_files']
Download attempts are validated against user's file list
Unauthorized access attempts are logged and blocked
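
The ownership check amounts to a membership test plus a guard against path traversal. The real check lives in download.php and uses $_SESSION['user_files']; the Python sketch below mirrors that logic with a plain dict standing in for the session, and its function name is hypothetical.

```python
import os


def can_download(session: dict, requested_name: str) -> bool:
    """Allow a download only for bare file names recorded in this session."""
    # Reject empty names and anything containing a directory part (e.g. "../x").
    if not requested_name or os.path.basename(requested_name) != requested_name:
        return False
    return requested_name in session.get("user_files", [])
```

Checking against the session list, rather than against what exists in outputs/, is what keeps one user from fetching another user's files by guessing names.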

Important Security Notes

Ensure your .env file is never committed to git (it's in .gitignore)
Use HTTPS in production for secure cookie transmission
Files become inaccessible once the session expires (they remain in outputs/ but can no longer be downloaded)
Consider setting up a cron job to clean old files from the outputs/ directory

File Structure

.
├── api.py              # Python Flask API with Whisper & DeepL
├── login.php           # Landing page with Microsoft SSO
├── auth.php            # OAuth2 PKCE authentication handler
├── logout.php          # Session destruction handler
├── index.php           # Main application interface (auth required)
├── process.php         # PHP request handler (auth required)
├── download.php        # File download handler (auth + ownership check)
├── check_api.php       # API status checker (auth required)
├── test_download.php   # Download functionality tester (auth required)
├── config.php          # Configuration loader
├── auth_config.php     # Authentication & environment config
├── style.css           # Black/gold theme styles
├── .env                # Environment variables (NOT in git)
├── .env.example        # Environment variables template
├── .htaccess           # PHP upload limits
├── .gitignore          # Git ignore rules
├── composer.json       # PHP dependencies
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── README.md           # This file
├── CLAUDE.md           # Claude Code guidance
├── outputs/            # Transcribed files directory
├── vendor/             # Composer dependencies (NOT in git)
└── venv/               # Python virtual environment

Production Deployment (Apache)

Prerequisites

Apache 2.4+
PHP 7.4+ with mod_php or PHP-FPM
Python 3.8+
FFmpeg
Root/sudo access for system configuration

Step 1: Install Required Apache Modules

Ubuntu/Debian:

sudo apt update
sudo apt install apache2 libapache2-mod-php php-curl php-xml php-mbstring
sudo a2enmod rewrite
sudo systemctl restart apache2

CentOS/RHEL:

sudo yum install httpd php php-curl php-xml php-mbstring
sudo systemctl enable httpd
sudo systemctl start httpd

Step 2: Deploy Application Files

# Clone or copy your application
cd /var/www
sudo git clone https://bitbucket.org/zlalani/voice2text.git voice2text
cd voice2text

# Set up Python environment
sudo chmod +x setup.sh
sudo ./setup.sh

# Set proper ownership and permissions
sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs  # writable by www-data only, not world-writable

Step 3: Configure Apache Virtual Host

Create /etc/apache2/sites-available/voice2text.conf:

<VirtualHost *:80>
    ServerName voice2text.yourdomain.com
    ServerAdmin admin@yourdomain.com

    DocumentRoot /var/www/voice2text

    <Directory /var/www/voice2text>
        Options -Indexes +FollowSymLinks
        AllowOverride All
        Require all granted

        # PHP settings for large uploads (php_value requires mod_php; with PHP-FPM, set these in php.ini instead)
        php_value upload_max_filesize 350M
        php_value post_max_size 350M
        php_value max_execution_time 1200
        php_value max_input_time 1200
        php_value memory_limit 512M
    </Directory>

    # Protect sensitive files
    <FilesMatch "^(config\.php|\.env|\.git|\.htaccess)">
        Require all denied
    </FilesMatch>

    # Logging
    ErrorLog ${APACHE_LOG_DIR}/voice2text-error.log
    CustomLog ${APACHE_LOG_DIR}/voice2text-access.log combined
</VirtualHost>

Enable the site:

sudo a2ensite voice2text.conf
sudo systemctl reload apache2

Step 4: Configure PHP for Large Uploads

Edit /etc/php/7.4/apache2/php.ini (adjust version as needed):

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
max_input_time = 1200
memory_limit = 512M

Restart Apache:

sudo systemctl restart apache2

Step 5: Setup Python API as Systemd Service

Create /etc/systemd/system/voice2text-api.service:

[Unit]
Description=Voice to Text Whisper API
After=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/voice2text
Environment="PATH=/var/www/voice2text/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/var/www/voice2text/venv/bin/python /var/www/voice2text/api.py
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
PrivateTmp=true

# Logging
StandardOutput=append:/var/log/voice2text-api.log
StandardError=append:/var/log/voice2text-api-error.log

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable voice2text-api
sudo systemctl start voice2text-api

# Check status
sudo systemctl status voice2text-api

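# View logs (the unit writes stdout/stderr to these files, not to the journal)
sudo tail -f /var/log/voice2text-api.log /var/log/voice2text-api-error.log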

Step 6: Configure Firewall

UFW (Ubuntu):

sudo ufw allow 'Apache Full'
sudo ufw allow 5010/tcp  # Python API; skip this if only PHP on the same host calls it via localhost
sudo ufw enable

Firewalld (CentOS):

sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --permanent --add-port=5010/tcp  # skip if the API is only reached via localhost
sudo firewall-cmd --reload

Step 7: SSL Configuration (Required for SSO in Production)

Using Let's Encrypt with Certbot:

# Install Certbot
sudo apt install certbot python3-certbot-apache

# Get SSL certificate
sudo certbot --apache -d voice2text.yourdomain.com

# Auto-renewal is configured automatically
# Test renewal with:
sudo certbot renew --dry-run

Step 8: Verify Deployment

Check Apache status: sudo systemctl status apache2
Check API status: sudo systemctl status voice2text-api
Visit: https://voice2text.yourdomain.com/check_api.php (sign in first; the page requires authentication)
Test file upload with a small audio file

Monitoring and Maintenance

Check API Status

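# View API logs (the unit file sends stdout/stderr to these files, not to the journal)
sudo tail -n 100 /var/log/voice2text-api.log /var/log/voice2text-api-error.log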

# Check if API is responding
curl http://localhost:5010/health

Check Apache Logs

# Error log
sudo tail -f /var/log/apache2/voice2text-error.log

# Access log
sudo tail -f /var/log/apache2/voice2text-access.log

Restart Services

# Restart Apache
sudo systemctl restart apache2

# Restart Python API
sudo systemctl restart voice2text-api

# Restart both
sudo systemctl restart apache2 voice2text-api

Clean Old Files

The outputs/ directory can grow large. Set up a cron job to clean old files:

# Edit crontab
sudo crontab -e

# Add this line to delete files older than 24 hours, daily at 2 AM
# (-mmin +1440 is used because -mtime +1 only matches files at least 48 hours old)
0 2 * * * find /var/www/voice2text/outputs -type f -mmin +1440 -delete
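
If a cron one-liner is not an option (for example on a host without crontab access), the same cleanup can be done from Python. This is a sketch with a hypothetical function name; unlike find, it does not descend into subdirectories.

```python
import time
from pathlib import Path


def clean_old_files(directory, max_age_hours: float = 24.0, now=None):
    """Delete regular files older than max_age_hours; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    deleted = []
    for path in Path(directory).iterdir():
        # Compare the last-modified time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Calling `clean_old_files("/var/www/voice2text/outputs")` from a scheduled job would remove everything last modified more than a day ago.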

Troubleshooting

API Issues

API not connecting:

Check if API is running: sudo systemctl status voice2text-api
Test health endpoint: curl http://localhost:5010/health
Check API logs: sudo tail -n 50 /var/log/voice2text-api.log /var/log/voice2text-api-error.log
If the API runs on a different host than PHP, verify the firewall allows port 5010
Visit check_api.php in browser for detailed status

API won't start:

Check Python version: python3 --version (must be 3.8+)
Verify virtual environment: ls -la venv/
Check dependencies: source venv/bin/activate && pip list
Review startup errors: sudo journalctl -u voice2text-api -xe, and application errors: sudo tail -n 50 /var/log/voice2text-api-error.log
Ensure FFmpeg is installed: which ffmpeg

Upload Issues

File upload fails:

Check file size limits in php.ini
Verify .htaccess is being read (requires AllowOverride All)
Check disk space: df -h
Verify outputs/ directory permissions: ls -ld outputs/
Check Apache error log: tail -f /var/log/apache2/voice2text-error.log

"413 Request Entity Too Large":

If using Nginx as reverse proxy, add to nginx config:

client_max_body_size 350M;

Transcription Issues

Transcription fails:

Verify FFmpeg is installed: ffmpeg -version
Check audio file format (supported: mp3, wav, m4a, etc.)
Review API logs for specific errors
Test with a small file first
Ensure enough disk space in /tmp

Slow transcription:

Use a smaller Whisper model (tiny or base)
Consider using GPU acceleration (requires CUDA setup)
Upgrade server hardware (more CPU/RAM)
Reduce audio file length/quality

Translation Issues

Translation fails:

Verify that DEEPL_API_KEY in .env is set to a valid key
Check DeepL API usage: https://www.deepl.com/pro-account
Review API response for specific error messages
Ensure internet connectivity for DeepL API

Permission Issues

403 Forbidden errors:

sudo chown -R www-data:www-data /var/www/voice2text
sudo chmod -R 755 /var/www/voice2text
sudo chmod 775 /var/www/voice2text/outputs

Can't write to outputs directory:

sudo mkdir -p /var/www/voice2text/outputs
sudo chown www-data:www-data /var/www/voice2text/outputs
sudo chmod 775 /var/www/voice2text/outputs

Performance Issues

Out of memory:

Use a smaller Whisper model (tiny or base)
Increase PHP memory limit in php.ini
Increase system swap space
Add more RAM to server

Timeout errors:

Increase PHP max_execution_time in php.ini
Increase Apache timeout in virtual host config
Process smaller audio files
Use faster Whisper model

Debugging Tips

Enable debug mode (development only, since display_errors exposes internals to visitors): Add to config.php:

error_reporting(E_ALL);
ini_set('display_errors', 1);

Check system resources:

# CPU and memory usage
htop

# Disk space
df -h

# Check running processes
ps aux | grep -E 'python|apache'

Test components individually:

Test PHP: Create test.php with <?php phpinfo(); ?>
Test Python API: curl http://localhost:5010/health
Test file upload: Use small test file first
Check browser console for JavaScript errors (F12)

License

MIT