DJP 846693b097 Initial commit: Voice to Text with Whisper & DeepL Translation
Features:
- OpenAI Whisper for audio transcription
- DeepL API for translation (30+ languages)
- Multiple output formats: TXT, VTT, SRT
- Flask Python API backend
- PHP frontend with black/gold theme
- Support for 350MB files
- Generates both original and translated files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 11:54:39 -04:00

Voice to Text with Whisper & DeepL Translation

A web application that transcribes audio files with OpenAI's Whisper model and optionally translates the transcription with the DeepL API. Supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).

Features

  • 🎤 Audio transcription using OpenAI Whisper
  • 🌍 Translation using DeepL API (30+ languages)
  • 📝 Multiple output formats: Text, VTT, SRT
  • 🚀 Python Flask API backend
  • 💻 PHP frontend (MAMP/Apache compatible)
  • 📦 350MB file size limit
  • 📄 Generates both original and translated files
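
The three output formats differ mainly in how timestamps are written. The following is a minimal sketch of that difference, assuming segments shaped like the ones Whisper's `transcribe()` returns (dicts with `start` and `end` in seconds plus `text`). The helper names are hypothetical and are not taken from api.py.

```python
def format_timestamp(seconds, decimal_marker):
    """Render seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments):
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: 'WEBVTT' header, unnumbered cues, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_txt(segments):
    """Plain text: segment texts joined with spaces, no timing information."""
    return " ".join(seg["text"].strip() for seg in segments)
```

Keeping the timestamp logic in one function means SRT and VTT differ only in the decimal marker, the header, and the cue numbering.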

Requirements

  • Python 3.8 or higher
  • PHP 7.4 or higher
  • MAMP or Apache server
  • FFmpeg (for audio processing)

Installation

1. Install FFmpeg

macOS:

brew install ffmpeg

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

2. Setup Python Environment

Run the setup script:

chmod +x setup.sh
./setup.sh

This will:

  • Create a Python virtual environment
  • Install all dependencies (Flask, Whisper, etc.)
  • Create the outputs directory

3. Start the API Server

chmod +x start_api.sh
./start_api.sh

Or manually:

source venv/bin/activate
python api.py

The API will run on http://localhost:5010

4. Configure Web Server

Ensure your MAMP/Apache server points to this directory and PHP is enabled.

Usage

  1. Start the Python API server (see step 3 above)
  2. Open the web application in your browser
  3. Select output format (Text/VTT/SRT)
  4. (Optional) Enable translation and select target language
  5. Upload an audio file (max 350MB)
  6. Wait for processing
  7. Download original and/or translated transcription

Translation

The app uses the DeepL API for translation. When translation is enabled:

  • The audio is first transcribed in its original language
  • The transcription is then translated to your selected target language
  • Both original and translated files are generated
  • Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
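
The transcribe-then-translate flow above can be sketched roughly as follows for plain-text output. This is an illustration only: the function names, the `<stem>_<lang>` file naming, and the `DEEPL_API_KEY` environment variable are assumptions, and api.py may call DeepL differently. The `deepl_translate` helper assumes the official `deepl` Python package.

```python
from pathlib import Path


def write_original_and_translation(text, translate, target_lang, out_dir, stem, ext="txt"):
    """Write <stem>.<ext> and <stem>_<lang>.<ext>; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    original_path = out_dir / f"{stem}.{ext}"
    translated_path = out_dir / f"{stem}_{target_lang.lower()}.{ext}"
    # The original transcription is always written, then the translation.
    original_path.write_text(text, encoding="utf-8")
    translated_path.write_text(translate(text, target_lang), encoding="utf-8")
    return original_path, translated_path


def deepl_translate(text, target_lang):
    """Translate with the official `deepl` package (pip install deepl)."""
    import os
    import deepl  # imported lazily so the sketch loads without the package

    # DEEPL_API_KEY is a hypothetical variable name for this sketch.
    translator = deepl.Translator(os.environ["DEEPL_API_KEY"])
    return translator.translate_text(text, target_lang=target_lang).text
```

Passing the translator in as a callable keeps the file-writing step testable without an API key, for example `write_original_and_translation(text, deepl_translate, "ES", "outputs", "interview")`.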

Note: PHP upload limits are set in .htaccess to allow 350MB files. To accept larger files, raise the corresponding values in php.ini (the 350MB settings are shown below):

upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
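
PHP reads the `M` suffix as mebibytes, so `350M` means 350 × 1024 × 1024 bytes. A small sketch of that conversion, with a pre-upload size check built on it, is shown here; both function names are hypothetical and are not part of this project.

```python
import os

# PHP shorthand byte suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert a php.ini size such as '350M' to a byte count."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # no suffix: the value is already in bytes


def fits_upload_limit(path, limit="350M"):
    """Return True if the file at `path` is no larger than the configured limit."""
    return os.path.getsize(path) <= php_size_to_bytes(limit)
```

Because a multipart request carries some overhead beyond the file itself, a file right at the `upload_max_filesize` boundary may still be rejected when `post_max_size` is set to the same value.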

API Endpoints

POST /transcribe

Transcribe audio file to text/VTT/SRT

Parameters:

  • audio (file): Audio file to transcribe
  • format (string): Output format (txt/vtt/srt)

Response:

{
  "success": true,
  "text": "transcribed text...",
  "filename": "output.txt",
  "format": "txt"
}
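
For testing the endpoint without the PHP frontend, a client along these lines may help. It is a sketch using only the standard library, assuming the API accepts a standard multipart/form-data upload with the `audio` and `format` fields listed above; `build_multipart` and `transcribe` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5010"  # port taken from this README


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, fmt="txt", api_url=API_URL):
    """POST an audio file to /transcribe and return the decoded JSON response."""
    path = Path(path)
    # Note: this reads the whole file into memory, which matters near 350MB.
    body, content_type = build_multipart(
        {"format": fmt}, "audio", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{api_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    if not result.get("success"):
        raise RuntimeError(f"Transcription failed: {result}")
    return result
```

With the API running, `transcribe("interview.mp3", fmt="srt")["filename"]` would give the name of the generated file.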

GET /health

Health check endpoint
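
Since the Whisper model can take a while to load at startup, a script that depends on the API may want to poll this endpoint first. The sketch below assumes only that `/health` answers with HTTP 200 once the server is ready; the response body is not documented here, so it is ignored. `wait_for_api` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_api(base_url="http://localhost:5010", timeout=30.0, interval=1.0):
    """Poll GET /health until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, or it returned an error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```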

GET /download/

Download transcribed file

Whisper Models

The default model is base, which provides a good balance of speed and accuracy.

Available models:

  • tiny - Fastest, least accurate
  • base - Good balance (default)
  • small - Better accuracy, slower
  • medium - High accuracy, much slower
  • large - Best accuracy, very slow

To change the model, edit api.py line 24:

model = whisper.load_model("base")  # Change to desired model
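
As an alternative to editing the source, the model name could be read from the environment. This is a sketch of that idea rather than how api.py currently works: `WHISPER_MODEL` and `resolve_model_name` are hypothetical names, and the list of valid names is the one given above.

```python
import os

AVAILABLE_MODELS = ("tiny", "base", "small", "medium", "large")
DEFAULT_MODEL = "base"


def resolve_model_name(environ=os.environ):
    """Pick the model from WHISPER_MODEL, falling back to the default."""
    name = environ.get("WHISPER_MODEL", DEFAULT_MODEL).strip().lower()
    if name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unknown Whisper model {name!r}; expected one of {AVAILABLE_MODELS}"
        )
    return name


# In api.py the result would replace the hard-coded name:
#   model = whisper.load_model(resolve_model_name())
```

The server could then be started with, for example, `WHISPER_MODEL=small python api.py`.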

File Structure

.
├── api.py              # Python Flask API
├── index.php           # Frontend interface
├── process.php         # PHP request handler
├── download.php        # File download handler
├── config.php          # Configuration
├── style.css           # Styles
├── requirements.txt    # Python dependencies
├── setup.sh           # Setup script
├── start_api.sh       # API start script
├── outputs/           # Transcribed files directory
└── venv/              # Python virtual environment

Production Deployment

For Apache deployment:

  1. Ensure mod_php is enabled
  2. Point document root to this directory
  3. Run the API as a systemd service (see below)

Systemd Service (Linux)

Create /etc/systemd/system/whisper-api.service:

[Unit]
Description=Whisper API Service
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/your/app/venv/bin/python /path/to/your/app/api.py
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable whisper-api
sudo systemctl start whisper-api

Troubleshooting

API not connecting:

  • Verify that the Python API is running on port 5010
  • Check that config.php has the correct API URL
  • Ensure firewall allows port 5010

Transcription fails:

  • Verify FFmpeg is installed: ffmpeg -version
  • Check that the audio file format is one FFmpeg can decode
  • Review API logs for errors
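
The FFmpeg check can also be run from Python, for example at API startup, so that a missing binary is reported before the first upload. A minimal sketch with a hypothetical helper name:

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    if completed.returncode != 0 or not completed.stdout:
        return None
    return completed.stdout.splitlines()[0]
```

A `None` result means the process running the API cannot find FFmpeg, which can happen under systemd even when `ffmpeg -version` works in an interactive shell, because the service may have a different PATH.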

Out of memory:

  • Use a smaller Whisper model (tiny or base)
  • Reduce audio file size
  • Increase system memory

License

MIT