Voice to Text with Whisper & DeepL Translation
A web application that converts audio files to text using OpenAI's Whisper model and translates them using DeepL API. Supports multiple output formats: plain text, VTT (WebVTT), and SRT (SubRip).
Features
- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
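The subtitle formats differ mainly in how cues and timestamps are written. The following is a minimal sketch, not the code in api.py (which is not shown here). It assumes Whisper-style segments, given as dicts with start and end (in seconds) and text keys; the function names are illustrative.

```python
from typing import Dict, List


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments as SubRip: numbered cues with comma-separated millis."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: List[Dict]) -> str:
    """Render segments as WebVTT: a WEBVTT header, then unnumbered cues."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

Plain text output needs no timing at all: it is just the segment texts joined together.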
Requirements
- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
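Before installing, it can help to confirm that the requirements above are met. This is a small illustrative check, not part of the project. It assumes the ffmpeg and php executables are on PATH; a MAMP-bundled PHP often is not, so a missing php here is not conclusive.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the requirements


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a dict mapping each requirement name to True or False."""
    return {
        "python": sys.version_info >= min_python,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "php": shutil.which("php") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'not found'}")
```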
Installation
1. Install FFmpeg
macOS:
brew install ffmpeg
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install ffmpeg
Windows: Download from https://ffmpeg.org/download.html
2. Setup Python Environment
Run the setup script:
chmod +x setup.sh
./setup.sh
This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory
3. Start the API Server
chmod +x start_api.sh
./start_api.sh
Or manually:
source venv/bin/activate
python api.py
The API will run on http://localhost:5010
4. Configure Web Server
Ensure your MAMP/Apache server points to this directory and PHP is enabled.
Usage
- Start the Python API server (see step 3 above)
- Open the web application in your browser
- Select output format (Text/VTT/SRT)
- (Optional) Enable translation and select target language
- Upload an audio file (max 350MB)
- Wait for processing
- Download original and/or translated transcription
Translation
The app uses DeepL API for high-quality translations. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
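The transcribe-then-translate flow above can be sketched as follows. This is an assumption about how such a pipeline might look, not the project's actual implementation: translate_segments and make_deepl_translate are hypothetical helpers, and whether api.py uses the official deepl Python package or calls the HTTP API directly is not stated here. Translating per segment keeps the original timestamps, so VTT and SRT output can be produced for both languages.

```python
from typing import Callable, Dict, List


def translate_segments(
    segments: List[Dict], translate: Callable[[str], str]
) -> List[Dict]:
    """Return new segments with translated text and unchanged timing."""
    translated = []
    for seg in segments:
        new_seg = dict(seg)  # copy, so the original-language segment is kept
        new_seg["text"] = translate(str(seg["text"]).strip())
        translated.append(new_seg)
    return translated


def make_deepl_translate(auth_key: str, target_lang: str) -> Callable[[str], str]:
    """Build a translate callable backed by the official `deepl` package.

    Assumes `pip install deepl` and a valid API key.
    """
    import deepl  # imported lazily so the rest of the sketch runs without it

    translator = deepl.Translator(auth_key)
    return lambda text: translator.translate_text(
        text, target_lang=target_lang
    ).text
```

With a real key, `translate_segments(segments, make_deepl_translate(key, "DE"))` would yield German segments while `segments` stays untouched for the original-language file.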
Note: PHP upload limits are set in .htaccess to allow 350MB files. To accept larger files, raise the following values in php.ini (shown here at the current 350MB setting):
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
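PHP size values such as 350M use shorthand in which K, M, and G are powers of 1024. The helper below is an illustrative sketch (the function names are not from the project) for converting such a value to bytes and checking a file against the limit before uploading.

```python
import re

# PHP shorthand units are binary multiples.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_size(value: str) -> int:
    """Convert a PHP shorthand size such as '350M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def fits_upload_limit(file_size_bytes: int, limit: str = "350M") -> bool:
    """Return True if a file of the given size is within the limit."""
    return file_size_bytes <= parse_php_size(limit)
```

Because post_max_size covers the whole request body and not just the file, it should be at least as large as upload_max_filesize.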
API Endpoints
POST /transcribe
Transcribe audio file to text/VTT/SRT
Parameters:
- audio (file): Audio file to transcribe
- format (string): Output format (txt/vtt/srt)
Response:
{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
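For calling this endpoint without the PHP frontend, the sketch below builds a multipart/form-data request using only the Python standard library. The audio and format field names and the port come from this README. The helper names and the 1200-second timeout (borrowed from max_execution_time) are assumptions. The file is read fully into memory, which may be unsuitable for uploads near the 350MB limit.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010/transcribe"


def build_multipart(fields, file_field, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, output_format="txt", url=API_URL, timeout=1200):
    """POST an audio file and return the decoded JSON response."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"format": output_format}, "audio", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `transcribe("talk.mp3", "srt")` would then return a dict shaped like the response above, provided the API server is running.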
GET /health
Health check endpoint
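A quick way to confirm the API is reachable is to request this endpoint from a script. The sketch below checks only for an HTTP 200 status, since the response body format is not documented here; the function name is illustrative.

```python
import urllib.error
import urllib.request


def api_is_healthy(url="http://localhost:5010/health", timeout=3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("API is up" if api_is_healthy() else "API is not reachable")
```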
GET /download/
Download transcribed file
Whisper Models
The default model is base, which provides a good balance of speed and accuracy.
Available models:
- tiny - Fastest, least accurate
- base - Good balance (default)
- small - Better accuracy, slower
- medium - High accuracy, much slower
- large - Best accuracy, very slow
To change the model, edit api.py line 24:
model = whisper.load_model("base") # Change to desired model
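Instead of editing the file for every change, the model name could be read from the environment and validated against the list above. This is a hypothetical variation, not how api.py currently works: the WHISPER_MODEL variable and both helper functions are assumptions introduced for illustration.

```python
import os

WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")
DEFAULT_MODEL = "base"


def resolve_model_name(requested=None):
    """Return a validated model name, falling back to the default."""
    name = (requested or DEFAULT_MODEL).strip().lower()
    if name not in WHISPER_MODELS:
        raise ValueError(
            f"unknown model {name!r}; choose one of {', '.join(WHISPER_MODELS)}"
        )
    return name


def model_name_from_env():
    """Read the hypothetical WHISPER_MODEL environment variable."""
    return resolve_model_name(os.environ.get("WHISPER_MODEL"))


# In api.py the hard-coded name could then become:
#   model = whisper.load_model(model_name_from_env())
```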
File Structure
.
├── api.py # Python Flask API
├── index.php # Frontend interface
├── process.php # PHP request handler
├── download.php # File download handler
├── config.php # Configuration
├── style.css # Styles
├── requirements.txt # Python dependencies
├── setup.sh # Setup script
├── start_api.sh # API start script
├── outputs/ # Transcribed files directory
└── venv/ # Python virtual environment
Production Deployment
For Apache deployment:
- Ensure mod_php is enabled
- Point document root to this directory
- Run the API as a systemd service (see below)
Systemd Service (Linux)
Create /etc/systemd/system/whisper-api.service:
[Unit]
Description=Whisper API Service
After=network.target
[Service]
Type=simple
User=www-data
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/your/app/venv/bin/python /path/to/your/app/api.py
Restart=always
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable whisper-api
sudo systemctl start whisper-api
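The /path/to/your/app placeholders in the unit file must be replaced with the real install location in three places. As a convenience, the unit text can be generated from a single path. This is an illustrative sketch: render_unit is a hypothetical helper, and /srv/v2t in the usage note is only an example path.

```python
from pathlib import PurePosixPath

UNIT_TEMPLATE = """\
[Unit]
Description=Whisper API Service
After=network.target

[Service]
Type=simple
User={user}
WorkingDirectory={app_dir}
ExecStart={app_dir}/venv/bin/python {app_dir}/api.py
Restart=always

[Install]
WantedBy=multi-user.target
"""


def render_unit(app_dir, user="www-data"):
    """Fill the unit template with the given install path and user."""
    app_path = PurePosixPath(app_dir)
    if not app_path.is_absolute():
        # systemd expects absolute paths for these settings
        raise ValueError("app_dir must be an absolute path")
    return UNIT_TEMPLATE.format(user=user, app_dir=app_path)
```

For example, `print(render_unit("/srv/v2t"))` produces a unit whose output can be saved as /etc/systemd/system/whisper-api.service.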
Troubleshooting
API not connecting:
- Verify Python API is running on port 5010
- Check config.php has the correct API URL
- Ensure firewall allows port 5010
Transcription fails:
- Verify FFmpeg is installed: ffmpeg -version
- Check audio file format is supported
- Review API logs for errors
Out of memory:
- Use a smaller Whisper model (tiny or base)
- Reduce audio file size
- Increase system memory
License
MIT