DJP 846693b097 Initial commit: Voice to Text with Whisper & DeepL Translation Features: - OpenAI Whisper for audio transcription - DeepL API for translation (30+ languages) - Multiple output formats: TXT, VTT, SRT - Flask Python API backend - PHP frontend with black/gold theme - Support for 350MB files - Generates both original and translated files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>		2025-10-21 11:54:39 -04:00


Voice to Text with Whisper & DeepL Translation

A web application that converts audio files to text using OpenAI's Whisper model and translates the transcriptions using the DeepL API. It supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).
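The three output formats differ mainly in how timing is written. The sketch below assumes Whisper-style segments (dicts with start and end in seconds plus text, the shape of the segments list returned by model.transcribe()). The function names are hypothetical, and this is not necessarily how api.py builds its files. SRT numbers each cue and uses a comma before the milliseconds; WebVTT starts with a WEBVTT header and uses a period.

```python
def format_timestamp(seconds, decimal_sep):
    """Render seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{millis:03d}"


def to_txt(segments):
    """Plain text: cue texts joined with spaces, no timing."""
    return " ".join(seg["text"].strip() for seg in segments)


def to_srt(segments):
    """SubRip: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def to_vtt(segments):
    """WebVTT: 'WEBVTT' header, then unnumbered cues separated by blank lines."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 65.123, "text": " Welcome back."},
]
print(to_srt(segments))
```

Keeping one timestamp helper with a separator argument makes the SRT/VTT difference explicit and keeps the two renderers in sync.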

Features

🎤 Audio transcription using OpenAI Whisper
🌍 Translation using DeepL API (30+ languages)
📝 Multiple output formats: Text, VTT, SRT
🚀 Python Flask API backend
💻 PHP frontend (MAMP/Apache compatible)
📦 350MB file size limit
📄 Generates both original and translated files

Requirements

Python 3.8 or higher
PHP 7.4 or higher
MAMP or Apache server
FFmpeg (for audio processing)

Installation

1. Install FFmpeg

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

2. Set Up the Python Environment

Run the setup script:

chmod +x setup.sh
./setup.sh

This will:

Create a Python virtual environment
Install all dependencies (Flask, Whisper, etc.)
Create the outputs directory

3. Start the API Server

chmod +x start_api.sh
./start_api.sh

Or manually:

source venv/bin/activate
python api.py

The API will run on http://localhost:5010

4. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

Usage

Start the Python API server (see step 3 above)
Open the web application in your browser
Select output format (Text/VTT/SRT)
(Optional) Enable translation and select target language
Upload an audio file (max 350MB)
Wait for processing
Download original and/or translated transcription

Translation

The app uses the DeepL API for high-quality translations. When translation is enabled:

The audio is first transcribed in its original language
The transcription is then translated to your selected target language
Both original and translated files are generated
Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
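One way to structure the translate step, sketched under assumptions: segments follow Whisper's start/end/text shape, formatter is any renderer such as a TXT/VTT/SRT function, and translate is a hypothetical callable that would wrap a DeepL client and return one translated string per input. Translating only the cue text keeps timestamps and cue numbers out of the translation request, so the translated VTT/SRT file stays aligned with the original. The _LANG file suffix is an illustrative assumption, not necessarily the app's naming.

```python
from pathlib import Path


def write_original_and_translation(segments, formatter, translate, out_dir,
                                   stem, ext, target_lang=None):
    """Write <stem>.<ext>, plus <stem>_<LANG>.<ext> when target_lang is given.

    segments:  list of dicts with "start", "end", "text" (Whisper's segment shape)
    formatter: callable(segments) -> str, e.g. a TXT/VTT/SRT renderer
    translate: hypothetical callable(list_of_texts, target_lang) -> list_of_texts,
               e.g. a thin wrapper around a DeepL client
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}

    # Step 1: the transcription in the audio's original language.
    paths["original"] = out / f"{stem}.{ext}"
    paths["original"].write_text(formatter(segments), encoding="utf-8")

    # Step 2: translate cue text only, so timestamps never go through DeepL.
    if target_lang:
        texts = translate([seg["text"] for seg in segments], target_lang)
        if len(texts) != len(segments):
            raise ValueError("translate() must return one string per segment")
        translated = [dict(seg, text=text) for seg, text in zip(segments, texts)]
        paths["translated"] = out / f"{stem}_{target_lang.upper()}.{ext}"
        paths["translated"].write_text(formatter(translated), encoding="utf-8")

    return paths
```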

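Note: PHP upload settings for 350MB files are configured in .htaccess. These overrides only take effect when PHP runs as an Apache module (mod_php). If they are not picked up, or if you need files larger than 350MB, set the equivalent values in php.ini (raise the sizes as needed; max_execution_time is in seconds):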

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
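PHP reads the M suffix as mebibytes (350M is 350 × 1024 × 1024 = 367,001,600 bytes), and a single upload is capped by the smaller of upload_max_filesize and post_max_size. Because post_max_size limits the entire request body, including multipart headers and any other form fields, setting both to the same value can reject a file that sits right at the limit; a slightly larger post_max_size avoids that. A small helper for checking these values (the function names are hypothetical):

```python
import re

# php.ini shorthand: K, M and G are powers of 1024; a bare number means bytes.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert a php.ini size such as '350M' into a byte count."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized php.ini size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix.lower()]


def max_upload_bytes(upload_max_filesize, post_max_size):
    """Upper bound for one file: the smaller limit (multipart overhead
    also counts against post_max_size)."""
    return min(php_size_to_bytes(upload_max_filesize),
               php_size_to_bytes(post_max_size))


print(php_size_to_bytes("350M"))  # 367001600
```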

API Endpoints

POST /transcribe

Transcribe an audio file to text, VTT, or SRT

Parameters:

audio (file): Audio file to transcribe
format (string): Output format (txt/vtt/srt)

Response:

{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
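The endpoint takes a multipart/form-data upload with the fields audio and format. Below is a dependency-free client sketch using only Python's standard library; the function names are hypothetical, the base URL assumes the default port 5010 mentioned above, and the timeout mirrors the 1200-second max_execution_time. It reads the whole file into memory, which keeps the code short but means a 350MB upload needs that much RAM on the client.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # default address from this README


def build_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: {name: value}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n')
        parts.append(part.encode("utf-8"))
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (f'--{boundary}\r\n'
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                  f'Content-Type: {ctype}\r\n\r\n')
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(audio_path, fmt="txt", api_url=API_URL, timeout=1200):
    """POST an audio file to /transcribe and return the decoded JSON response."""
    path = Path(audio_path)
    # Reads the whole file into memory: simple, but heavy for 350MB files.
    body, content_type = build_multipart(
        {"format": fmt}, {"audio": (path.name, path.read_bytes())}
    )
    request = urllib.request.Request(
        f"{api_url}/transcribe", data=body, method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, transcribe_file("interview.mp3", fmt="srt") should return a dict shaped like the response above, with success, text, filename and format keys.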

GET /health

Health check endpoint
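A quick way to confirm the API is reachable before uploading, for instance from the machine that runs PHP. This sketch (hypothetical function name) treats only an HTTP 200 from /health as healthy; the response body is not documented here, so it is not inspected.

```python
import urllib.request


def api_is_healthy(base_url="http://localhost:5010", timeout=5.0):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (refused, 4xx/5xx) and timeouts are all OSError subclasses.
        return False
```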

GET /download/

Download transcribed file

Whisper Models

The default model is base, which provides a good balance of speed and accuracy.

Available models:

tiny - Fastest, least accurate
base - Good balance (default)
small - Better accuracy, slower
medium - High accuracy, much slower
large - Best accuracy, very slow

To change the model, edit api.py line 24:

model = whisper.load_model("base")  # Change to desired model
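Editing api.py works, but if you switch models often you could read the name from an environment variable instead. WHISPER_MODEL and resolve_model_name are hypothetical names, not something api.py currently supports; the accepted names are the ones listed above.

```python
import os

# Model names listed in this README. Recent Whisper releases also accept
# others (for example "large-v3"); extend the tuple if you use them.
AVAILABLE_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env_var="WHISPER_MODEL", default="base"):
    """Return the model named by env_var, or default when it is unset."""
    name = os.environ.get(env_var, default).strip().lower()
    if name not in AVAILABLE_MODELS:
        raise ValueError(f"{env_var}={name!r}; expected one of {AVAILABLE_MODELS}")
    return name


# In api.py, the hard-coded call could then become:
#     model = whisper.load_model(resolve_model_name())
```

Under systemd, the variable could then be set in the [Service] section, e.g. Environment=WHISPER_MODEL=small.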

File Structure

.
├── api.py              # Python Flask API
├── index.php           # Frontend interface
├── process.php         # PHP request handler
├── download.php        # File download handler
├── config.php          # Configuration
├── .htaccess           # PHP upload limits (350MB)
├── style.css           # Styles
├── requirements.txt    # Python dependencies
├── setup.sh            # Setup script
├── start_api.sh        # API start script
├── outputs/            # Transcribed files directory
└── venv/               # Python virtual environment

Production Deployment

For Apache deployment:

Ensure mod_php is enabled
Point document root to this directory
Run the API as a systemd service (see below)

Systemd Service (Linux)

Create /etc/systemd/system/whisper-api.service:

[Unit]
Description=Whisper API Service
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/your/app/venv/bin/python /path/to/your/app/api.py
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable whisper-api
sudo systemctl start whisper-api

Troubleshooting

API not connecting:

Verify Python API is running on port 5010
Check config.php has correct API URL
Ensure firewall allows port 5010
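To check the first and last items from the machine that runs PHP, a plain TCP connection attempt is enough. It confirms only that something is listening on the port and reachable from where you run it, not that it is the Whisper API (the function name is hypothetical).

```python
import socket


def port_is_open(host="localhost", port=5010, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False
```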

Transcription fails:

Verify FFmpeg is installed: ffmpeg -version
Check audio file format is supported
Review API logs for errors
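Whisper calls the ffmpeg command to decode audio, so ffmpeg must be on the PATH of the process running api.py; under systemd (User=www-data) that PATH can differ from your login shell's. This helper (hypothetical name) mirrors the ffmpeg -version check from Python, so it can be run with the same interpreter and environment as the API.

```python
import shutil
import subprocess


def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```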

Out of memory:

Use a smaller Whisper model (tiny or base)
Reduce audio file size
Increase system memory
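The openai/whisper README lists approximate memory needs per model: roughly 1 GB for tiny and base, 2 GB for small, 5 GB for medium and 10 GB for large (VRAM figures for GPU inference; CPU RAM use differs). Treating those as rough guides, a hypothetical helper that picks the most accurate model fitting a memory budget, leaving some headroom:

```python
# Approximate memory per model from the openai/whisper README (GPU VRAM, GB).
# Rough guides only; actual peaks vary with device and audio.
APPROX_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
MODELS_BY_ACCURACY = ("tiny", "base", "small", "medium", "large")


def largest_model_that_fits(available_gb, headroom_gb=1.0):
    """Most accurate model whose approximate footprint fits, or None."""
    budget = available_gb - headroom_gb
    fitting = [m for m in MODELS_BY_ACCURACY if APPROX_MEMORY_GB[m] <= budget]
    return fitting[-1] if fitting else None
```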

License

MIT