Initial commit: Voice to Text with Whisper & DeepL Translation

Features:
- OpenAI Whisper for audio transcription
- DeepL API for translation (30+ languages)
- Multiple output formats: TXT, VTT, SRT
- Flask Python API backend
- PHP frontend with black/gold theme
- Support for 350MB files
- Generates both original and translated files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
DJP 2025-10-21 11:54:39 -04:00
commit 846693b097
14 changed files with 1355 additions and 0 deletions

30
.gitignore vendored Normal file

@ -0,0 +1,30 @@
# Python
venv/
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
# Output files
outputs/*.txt
outputs/*.vtt
outputs/*.srt
outputs/.DS_Store
# Logs
*.log
# OS Files
.DS_Store
Thumbs.db
# IDE
.vscode/
.idea/
*.swp
*.swo
# Temporary files
temp_audio.wav
*.tmp

5
.htaccess Normal file

@ -0,0 +1,5 @@
php_value upload_max_filesize 350M
php_value post_max_size 350M
php_value max_execution_time 1200
php_value max_input_time 1200
php_value memory_limit 512M

208
README.md Normal file

@ -0,0 +1,208 @@
# Voice to Text with Whisper & DeepL Translation
A web application that transcribes audio files with OpenAI's Whisper model and optionally translates the transcript with the DeepL API. It supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).
## Features
- 🎤 Audio transcription using OpenAI Whisper
- 🌍 Translation using DeepL API (30+ languages)
- 📝 Multiple output formats: Text, VTT, SRT
- 🚀 Python Flask API backend
- 💻 PHP frontend (MAMP/Apache compatible)
- 📦 350MB file size limit
- 📄 Generates both original and translated files
## Requirements
- Python 3.8 or higher
- PHP 7.4 or higher
- MAMP or Apache server
- FFmpeg (for audio processing)
- DeepL API key (for translation)
## Installation
### 1. Install FFmpeg
**macOS:**
```bash
brew install ffmpeg
```
**Linux (Ubuntu/Debian):**
```bash
sudo apt update
sudo apt install ffmpeg
```
**Windows:**
Download from https://ffmpeg.org/download.html
### 2. Setup Python Environment
Run the setup script:
```bash
chmod +x setup.sh
./setup.sh
```
This will:
- Create a Python virtual environment
- Install all dependencies (Flask, Whisper, etc.)
- Create the outputs directory
### 3. Start the API Server
```bash
chmod +x start_api.sh
./start_api.sh
```
Or manually:
```bash
source venv/bin/activate
python api.py
```
The API will run on http://localhost:5010
### 4. Configure Web Server
Ensure your MAMP/Apache server points to this directory and PHP is enabled.
## Usage
1. Start the Python API server (see step 3 above)
2. Open the web application in your browser
3. Select output format (Text/VTT/SRT)
4. (Optional) Enable translation and select target language
5. Upload an audio file (max 350MB)
6. Wait for processing
7. Download original and/or translated transcription
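To make the difference between the two subtitle formats in step 3 concrete, here is a minimal sketch of how cue timestamps are typically written. The helper names (`format_timestamp`, `srt_cue`, `vtt_cue`) and the `separator` parameter are illustrative assumptions and not necessarily how `api.py` is organised. SRT puts a comma before the milliseconds and numbers each cue, while WebVTT uses a period and no cue number.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT cue."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"


def vtt_cue(start: float, end: float, text: str) -> str:
    """Build one WebVTT cue (the file itself must start with a 'WEBVTT' line)."""
    return f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text.strip()}\n"


if __name__ == "__main__":
    print(srt_cue(1, 0.0, 2.5, " Hello world"))
    print(vtt_cue(0.0, 2.5, " Hello world"))
```

Working in integer milliseconds avoids any wrap-around for recordings longer than 24 hours.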
### Translation
The app uses the DeepL API for translation. When translation is enabled:
- The audio is first transcribed in its original language
- The transcription is then translated to your selected target language
- Both original and translated files are generated
- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
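For subtitle output, one way to keep timestamps intact is to translate only the text of each segment and leave the timing fields alone. The sketch below assumes Whisper-style segment dictionaries with `start`, `end`, and `text` keys. `translate_segments` and the stand-in `fake_translate` are hypothetical names used for illustration; a real callable might wrap `translator.translate_text(text, target_lang=...).text` from the `deepl` package.

```python
from typing import Callable, Dict, List


def translate_segments(
    segments: List[Dict], translate: Callable[[str], str]
) -> List[Dict]:
    """Return copies of the segments with only the 'text' field translated."""
    translated = []
    for segment in segments:
        copy = dict(segment)  # leave the caller's segment untouched
        copy["text"] = translate(segment["text"].strip())
        translated.append(copy)
    return translated


def fake_translate(text: str) -> str:
    """Stand-in for a real translator so the sketch runs offline."""
    return text.upper()


segments = [
    {"start": 0.0, "end": 1.5, "text": " hello"},
    {"start": 1.5, "end": 3.0, "text": " world"},
]
result = translate_segments(segments, fake_translate)
```

The translated segments can then be passed to the same SRT or VTT generator as the originals, so both files share identical timing.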
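**Note:** PHP upload limits are set to 350MB in `.htaccess`. To accept larger files, raise the following values in `.htaccess` or `php.ini` (shown here at their current settings) and update the 350 MB check in `process.php` to match: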
```
upload_max_filesize = 350M
post_max_size = 350M
max_execution_time = 1200
```
## API Endpoints
### POST /transcribe
Transcribe audio file to text/VTT/SRT
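**Parameters:**
- `audio` (file): Audio file to transcribe
- `format` (string): Output format (`txt`, `vtt`, or `srt`; defaults to `txt`)
- `translate` (string, optional): Set to `1` to also produce a DeepL translation (defaults to `0`); the response then also contains `translated_filename` and, for `txt` output, `translated_text`
- `target_lang` (string, optional): DeepL target language code (defaults to `EN-US`)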
**Response:**
```json
{
"success": true,
"text": "transcribed text...",
"filename": "audio_original.txt",
"format": "txt"
}
```
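As a rough illustration of calling this endpoint without the PHP frontend, the sketch below builds the `multipart/form-data` request using only the Python standard library. The helper names `encode_multipart` and `transcribe_file` are hypothetical. The URL assumes the default `http://localhost:5010` mentioned above, and the field names `audio` and `format` are the ones listed under Parameters.

```python
import json
import os
import urllib.request
import uuid

API_URL = "http://localhost:5010"  # default address from this README


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, output_format="srt", api_url=API_URL):
    """POST an audio file to /transcribe and return the decoded JSON reply."""
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(
            {"format": output_format}, "audio", os.path.basename(path), audio.read()
        )
    request = urllib.request.Request(
        f"{api_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires the API server to be running):
# reply = transcribe_file("interview.mp3", output_format="vtt")
# print(reply["filename"])
```

This reads the whole file into memory, which is acceptable for a quick test but worth reconsidering for uploads near the 350MB limit.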
### GET /health
Health check endpoint
### GET /download/<filename>
Download transcribed file
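Because the filename for this endpoint comes from the URL, it is worth confirming that the resolved path stays inside `outputs/` before serving it. The following is a minimal sketch of such a check. `resolve_output_path` is a hypothetical helper, not a function from `api.py`, and Flask's own `send_from_directory` offers comparable protection.

```python
import os
from typing import Optional


def resolve_output_path(output_dir: str, filename: str) -> Optional[str]:
    """Return the absolute path of filename inside output_dir, or None if unsafe."""
    base = os.path.realpath(output_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # Reject the directory itself and anything that resolves outside of it.
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        return None
    return candidate


if __name__ == "__main__":
    print(resolve_output_path("outputs", "talk_original.srt"))
    print(resolve_output_path("outputs", "../config.php"))  # None
```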
## Whisper Models
The default model is `base` which provides a good balance of speed and accuracy.
Available models:
- `tiny` - Fastest, least accurate
- `base` - Good balance (default)
- `small` - Better accuracy, slower
- `medium` - High accuracy, much slower
- `large` - Best accuracy, very slow
To change the model, edit `api.py` line 24:
```python
model = whisper.load_model("base") # Change to desired model
```
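If editing the source for every model change is inconvenient, the model name could instead be taken from an environment variable. This is only a sketch: the variable name `WHISPER_MODEL` and the helper `resolve_model_name` are assumptions for illustration and do not exist in `api.py` as committed.

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(environ=os.environ, default: str = "base") -> str:
    """Pick the Whisper model name from WHISPER_MODEL, falling back to the default."""
    name = environ.get("WHISPER_MODEL", default).strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of {VALID_MODELS}")
    return name


# In api.py the hard-coded name could then become:
# model = whisper.load_model(resolve_model_name())
```

Starting the server with `WHISPER_MODEL=small python api.py` would then select the `small` model without touching the code.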
## File Structure
```
.
├── api.py # Python Flask API
├── index.php # Frontend interface
├── process.php # PHP request handler
├── download.php # File download handler
├── test_download.php # Download test page
├── config.php # Configuration
├── .htaccess # PHP upload limits
├── style.css # Styles
├── V2T.svg # Logo
├── requirements.txt # Python dependencies
├── setup.sh # Setup script
├── start_api.sh # API start script
├── outputs/ # Transcribed files directory
└── venv/ # Python virtual environment
```
## Production Deployment
For Apache deployment:
1. Ensure mod_php is enabled
2. Point document root to this directory
3. Run the API as a systemd service (see below)
### Systemd Service (Linux)
Create `/etc/systemd/system/whisper-api.service`:
```ini
[Unit]
Description=Whisper API Service
After=network.target
[Service]
Type=simple
User=www-data
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/your/app/venv/bin/python /path/to/your/app/api.py
Restart=always
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl enable whisper-api
sudo systemctl start whisper-api
```
## Troubleshooting
**API not connecting:**
- Verify Python API is running on port 5010
- Check `config.php` has correct API URL
- Ensure firewall allows port 5010
**Transcription fails:**
- Verify FFmpeg is installed: `ffmpeg -version`
- Check audio file format is supported
- Review API logs for errors
**Out of memory:**
- Use a smaller Whisper model (tiny or base)
- Reduce audio file size
- Increase system memory
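The first two checks above can be scripted. The sketch below uses only the Python standard library. The function names are illustrative, and the port defaults to 5010 as described in this README.

```python
import shutil
import socket


def ffmpeg_available() -> bool:
    """True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def port_open(host: str = "localhost", port: int = 5010, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_available())
    print("API reachable on port 5010:", port_open())
```

A reachable port only shows that something is listening; requesting `/health` confirms that it is the Whisper API.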
## License
MIT

49
V2T.svg Normal file

@ -0,0 +1,49 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 28.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 400 250" style="enable-background:new 0 0 400 250;" xml:space="preserve">
<style type="text/css">
.st0{font-family:'Montserrat-Light';}
.st1{font-size:14px;}
.st2{letter-spacing:23;}
.st3{letter-spacing:22;}
.st4{letter-spacing:19;}
.st5{letter-spacing:18;}
.st6{letter-spacing:28;}
.st7{letter-spacing:21;}
.st8{fill:none;stroke:#000000;stroke-width:2;stroke-miterlimit:10;}
</style>
<text transform="matrix(1 0 0 1 250.2509 47.3586)" class="st0 st1">T</text>
<text transform="matrix(1 0 0 1 258.1509 47.3586)" class="st0 st1 st2"> </text>
<text transform="matrix(1 0 0 1 286.2509 47.3586)" class="st0 st1">E</text>
<text transform="matrix(1 0 0 1 295.6509 47.3586)" class="st0 st1 st3"> </text>
<text transform="matrix(1 0 0 1 322.2509 47.3586)" class="st0 st1">X</text>
<text transform="matrix(1 0 0 1 331.2509 47.3586)" class="st0 st1 st3"> </text>
<text transform="matrix(1 0 0 1 358.2509 47.3586)" class="st0 st1">T</text>
<text transform="matrix(1 0 0 1 366.1509 47.3586)" class="st0 st1 st2"> </text>
<text transform="matrix(1 0 0 1 213.7628 47.3586)" class="st0 st1"> 2</text>
<text transform="matrix(1 0 0 1 16.5853 47.2885)" class="st0 st1 st4"> </text>
<text transform="matrix(1 0 0 1 41.0853 47.2885)" class="st0 st1"> V</text>
<text transform="matrix(1 0 0 1 54.2853 47.2885)" class="st0 st1 st5"> </text>
<text transform="matrix(1 0 0 1 77.0853 47.2885)" class="st0 st1">O</text>
<text transform="matrix(1 0 0 1 88.7853 47.2885)" class="st0 st1 st4"> </text>
<text transform="matrix(1 0 0 1 113.0852 47.2885)" class="st0 st1">I</text>
<text transform="matrix(1 0 0 1 117.1853 47.2885)" class="st0 st1 st6"> </text>
<text transform="matrix(1 0 0 1 149.0852 47.2885)" class="st0 st1">C</text>
<text transform="matrix(1 0 0 1 159.0852 47.2885)" class="st0 st1 st7"> </text>
<text transform="matrix(1 0 0 1 185.0852 47.2885)" class="st0 st1">E</text>
<path class="st8" d="M193.5,88.1h72.9c8.1,0,15.3,3.2,20.6,8.5s8.5,12.6,8.5,20.6v59.7c0,8.1-3.2,15.3-8.5,20.6
c-5.3,5.3-12.6,8.5-20.6,8.5h-72.9c-8.1,0-15.3-3.2-20.6-8.5c-5.3-5.3-8.5-12.6-8.5-20.6v-5.6c0-1.1-0.2-2-0.6-2.9s-1-1.7-1.9-2.3
l-7.7-6c-2-1.7-3.7-3.6-4.7-5.8c-1.1-2.2-1.7-4.7-1.7-7.3s0.6-5.1,1.7-7.3c1.1-2.2,2.7-4.2,4.7-5.8l7.7-6c0.8-0.6,1.5-1.4,1.9-2.3
c0.4-0.8,0.6-1.9,0.6-2.9v-5.6c0-8.1,3.2-15.3,8.5-20.6C178.2,91.5,185.5,88.1,193.5,88.1L193.5,88.1z M232.3,178.3
c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h39.3c3.5,0,6.4,2.9,6.4,6.4c0,3.5-2.9,6.4-6.4,6.4H232.3z M196.8,153.4
c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h74.8c3.5,0,6.4,2.9,6.4,6.4c0,3.5-2.9,6.4-6.4,6.4H196.8z M127.8,144.4
c1.9,2,1.8,5.2-0.2,7.1c-2,1.9-5.2,1.8-7.1-0.2c-3.7-3.9-5.7-8.9-6.1-14c-0.4-5,0.9-10.2,4.1-14.6s7.6-7.4,12.5-8.8
c4.9-1.4,10.4-1.1,15.2,1c2.5,1.1,3.7,4.1,2.6,6.6c-1.1,2.5-4.1,3.7-6.6,2.6c-2.8-1.2-5.7-1.4-8.5-0.6c-2.7,0.7-5.2,2.4-6.9,4.9
c-1.8,2.5-2.5,5.4-2.3,8.2C124.5,139.5,125.7,142.3,127.8,144.4L127.8,144.4z M96.8,146.8c1.6,2.3,0.9,5.5-1.4,6.9
c-2.3,1.6-5.5,0.9-6.9-1.4c-5-7.6-7.6-16.4-7.4-25.4c0.1-8.7,2.8-17.4,8.1-25.1c5.4-7.6,12.8-13.1,20.9-16.1
c8.4-3.1,17.6-3.7,26.4-1.5c2.7,0.6,4.4,3.4,3.7,6.1s-3.4,4.4-6.1,3.7c-6.9-1.7-14-1.3-20.5,1.1c-6.3,2.3-12,6.6-16.2,12.5
c-4.2,5.9-6.3,12.7-6.4,19.4C90.9,134.2,92.9,141,96.8,146.8L96.8,146.8z"/>
<path class="st8" d="M196.8,128.8c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h74.8c3.5,0,6.4,2.9,6.4,6.4s-2.9,6.4-6.4,6.4H196.8
z"/>
</svg>


209
api.py Normal file

@ -0,0 +1,209 @@
#!/usr/bin/env python3
"""
Voice to Text API using OpenAI Whisper with DeepL Translation
Transcribes audio files to text, VTT, or SRT format and optionally translates them
"""
from flask import Flask, request, jsonify, send_file
from flask_cors import CORS
import whisper
import deepl
import os
import tempfile
from datetime import timedelta
import logging
app = Flask(__name__)
CORS(app)
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Load Whisper model (using base model for balance of speed and accuracy)
# Options: tiny, base, small, medium, large
logger.info("Loading Whisper model...")
model = whisper.load_model("base")
logger.info("Whisper model loaded successfully")
# Initialize DeepL translator
# Read the key from the environment instead of committing it to the repository
DEEPL_API_KEY = os.environ["DEEPL_API_KEY"]
translator = deepl.Translator(DEEPL_API_KEY)
logger.info("DeepL translator initialized")
# Directory for output files
OUTPUT_DIR = os.path.join(os.path.dirname(__file__), 'outputs')
os.makedirs(OUTPUT_DIR, exist_ok=True)
def format_timestamp(seconds):
    """Convert seconds to SRT timestamp format (HH:MM:SS,mmm)"""
    td = timedelta(seconds=seconds)
    total_seconds = int(td.total_seconds())  # includes whole days, unlike td.seconds
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    secs = total_seconds % 60
    millis = td.microseconds // 1000
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_timestamp_vtt(seconds):
    """Convert seconds to VTT timestamp format (HH:MM:SS.mmm)"""
    td = timedelta(seconds=seconds)
    total_seconds = int(td.total_seconds())  # includes whole days, unlike td.seconds
    hours = total_seconds // 3600
    minutes = (total_seconds % 3600) // 60
    secs = total_seconds % 60
    millis = td.microseconds // 1000
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
def generate_srt(segments):
    """Generate SRT format from Whisper segments"""
    srt_content = []
    for i, segment in enumerate(segments, 1):
        start = format_timestamp(segment['start'])
        end = format_timestamp(segment['end'])
        text = segment['text'].strip()
        srt_content.append(f"{i}\n{start} --> {end}\n{text}\n")
    return "\n".join(srt_content)


def generate_vtt(segments):
    """Generate VTT format from Whisper segments"""
    vtt_content = ["WEBVTT\n"]
    for segment in segments:
        start = format_timestamp_vtt(segment['start'])
        end = format_timestamp_vtt(segment['end'])
        text = segment['text'].strip()
        vtt_content.append(f"{start} --> {end}\n{text}\n")
    return "\n".join(vtt_content)
def translate_text(text, target_lang):
    """Translate text using DeepL API"""
    try:
        logger.info(f"Translating text to {target_lang}...")
        result = translator.translate_text(text, target_lang=target_lang)
        return result.text
    except deepl.exceptions.DeepLException as e:
        logger.error(f"DeepL translation error: {str(e)}")
        raise Exception(f"Translation failed: {str(e)}")
@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint"""
    return jsonify({"status": "healthy", "model": "whisper-base"})
@app.route('/transcribe', methods=['POST'])
def transcribe():
    """
    Transcribe audio file to text, VTT, or SRT format with optional translation
    Expects: multipart/form-data with 'audio' file, 'format' (txt/vtt/srt),
    'translate' (0/1), and 'target_lang' (e.g., 'EN-US')
    """
    try:
        # Check if audio file is present
        if 'audio' not in request.files:
            return jsonify({"error": "No audio file provided"}), 400
        audio_file = request.files['audio']
        output_format = request.form.get('format', 'txt').lower()
        enable_translation = request.form.get('translate', '0') == '1'
        target_lang = request.form.get('target_lang', 'EN-US')
        if audio_file.filename == '':
            return jsonify({"error": "Empty filename"}), 400
        # Validate format
        if output_format not in ['txt', 'vtt', 'srt']:
            return jsonify({"error": "Invalid format. Use txt, vtt, or srt"}), 400
        logger.info(f"Processing {audio_file.filename} - Format: {output_format}, Translation: {enable_translation}, Target: {target_lang}")
        # Save uploaded file temporarily
        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(audio_file.filename)[1]) as temp_audio:
            audio_file.save(temp_audio.name)
            temp_audio_path = temp_audio.name
        try:
            # Transcribe with Whisper
            logger.info(f"Transcribing {audio_file.filename}...")
            result = model.transcribe(temp_audio_path, verbose=False)
            logger.info("Transcription complete")
            # Generate output based on format
            if output_format == 'txt':
                content = result['text']
                mimetype = 'text/plain'
                extension = 'txt'
            elif output_format == 'vtt':
                content = generate_vtt(result['segments'])
                mimetype = 'text/vtt'
                extension = 'vtt'
            elif output_format == 'srt':
                content = generate_srt(result['segments'])
                mimetype = 'text/plain'
                extension = 'srt'
            # Save original output file
            # basename() drops any directory components from the client-supplied name
            base_filename = os.path.splitext(os.path.basename(audio_file.filename))[0]
            output_filename = f"{base_filename}_original.{extension}"
            output_path = os.path.join(OUTPUT_DIR, output_filename)
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(content)
            response_data = {
                "success": True,
                "text": result['text'] if output_format == 'txt' else None,
                "filename": output_filename,
                "format": output_format
            }
            # Handle translation if requested
            if enable_translation:
                logger.info(f"Translating to {target_lang}...")
                translated_content = translate_text(content, target_lang)
                # Save translated output file
                translated_filename = f"{base_filename}_translated.{extension}"
                translated_path = os.path.join(OUTPUT_DIR, translated_filename)
                with open(translated_path, 'w', encoding='utf-8') as f:
                    f.write(translated_content)
                response_data["translated_filename"] = translated_filename
                response_data["translated_text"] = translated_content if output_format == 'txt' else None
                logger.info("Translation complete")
            return jsonify(response_data)
        finally:
            # Clean up temporary audio file
            if os.path.exists(temp_audio_path):
                os.remove(temp_audio_path)
    except Exception as e:
        logger.error(f"Error during transcription: {str(e)}")
        return jsonify({"error": f"Transcription failed: {str(e)}"}), 500
@app.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    """Download a transcribed file"""
    try:
        file_path = os.path.join(OUTPUT_DIR, filename)
        if not os.path.exists(file_path):
            return jsonify({"error": "File not found"}), 404
        return send_file(file_path, as_attachment=True)
    except Exception as e:
        logger.error(f"Error downloading file: {str(e)}")
        return jsonify({"error": str(e)}), 500
if __name__ == '__main__':
    # Run on port 5010 by default
    port = int(os.environ.get('PORT', 5010))
    app.run(host='0.0.0.0', port=port, debug=False)

13
config.php Normal file

@ -0,0 +1,13 @@
<?php
// Start session only if not already started
if (session_status() === PHP_SESSION_NONE) {
session_start();
}
// Python API endpoint (adjust port if needed)
define('PYTHON_API_URL', 'http://localhost:5010');
// DeepL API key (read from the environment rather than committed to the repository)
define('DEEPL_API_KEY', getenv('DEEPL_API_KEY') ?: '');
// Other configuration settings can be added here

69
download.php Normal file

@ -0,0 +1,69 @@
<?php
/**
* Download handler for transcribed files
*/
// Prevent any output before headers
ob_start();
// Enable error reporting for debugging
error_reporting(E_ALL);
ini_set('display_errors', 0); // Don't display errors, log them
if (!isset($_GET['file'])) {
http_response_code(400);
die('No file specified');
}
$filename = basename($_GET['file']); // Security: prevent directory traversal
$filepath = __DIR__ . '/outputs/' . $filename;
// Debug logging
error_log("Download request for: " . $filename);
error_log("Full path: " . $filepath);
error_log("File exists: " . (file_exists($filepath) ? 'yes' : 'no'));
if (!file_exists($filepath)) {
http_response_code(404);
error_log("File not found: " . $filepath);
die('File not found: ' . $filename);
}
// Check if file is readable
if (!is_readable($filepath)) {
http_response_code(403);
error_log("File not readable: " . $filepath);
die('File not readable');
}
// Determine content type based on extension
$extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
$contentTypes = [
'txt' => 'text/plain; charset=utf-8',
'vtt' => 'text/vtt; charset=utf-8',
'srt' => 'text/plain; charset=utf-8' // Changed to text/plain for better compatibility
];
$contentType = $contentTypes[$extension] ?? 'application/octet-stream';
// Clear all output buffers
while (ob_get_level()) {
ob_end_clean();
}
// Send download headers and disable caching
header('Content-Description: File Transfer');
header('Content-Type: ' . $contentType);
header('Content-Disposition: attachment; filename="' . basename($filename) . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($filepath));
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Pragma: public');
header('Expires: 0');
// Flush system output buffer
flush();
// Output file
readfile($filepath);
exit;

160
index.php Normal file

@ -0,0 +1,160 @@
<?php
require_once 'config.php';
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Voice to Text</title>
<link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@400;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="style.css">
<script src="https://cdn.jsdelivr.net/npm/dompurify@2.3.3/dist/purify.min.js"></script>
</head>
<body>
<div class="app-container">
<img src="V2T.svg" alt="Voice to Text" class="logo">
<div id="initialInstruction" class="initial-instruction">
Before we start, select output format and upload the Voice File (Max 350 Megabytes in size)
</div>
<div class="format-selection">
<label for="outputFormat">Output Format:</label>
<select id="outputFormat" name="outputFormat">
<option value="txt">Text Document</option>
<option value="vtt">VTT (WebVTT)</option>
<option value="srt">SRT (SubRip)</option>
</select>
</div>
<div class="translation-section">
<div class="translation-toggle">
<label class="toggle-label">
<input type="checkbox" id="enableTranslation" name="enableTranslation">
<span class="toggle-text">Translate with DeepL</span>
</label>
</div>
<div id="languageSelector" class="language-selector" style="display: none;">
<label for="targetLanguage">Translate to:</label>
<select id="targetLanguage" name="targetLanguage">
<option value="BG">Bulgarian</option>
<option value="CS">Czech</option>
<option value="DA">Danish</option>
<option value="DE">German</option>
<option value="EL">Greek</option>
<option value="EN-GB">English (British)</option>
<option value="EN-US" selected>English (American)</option>
<option value="ES">Spanish</option>
<option value="ET">Estonian</option>
<option value="FI">Finnish</option>
<option value="FR">French</option>
<option value="HU">Hungarian</option>
<option value="ID">Indonesian</option>
<option value="IT">Italian</option>
<option value="JA">Japanese</option>
<option value="KO">Korean</option>
<option value="LT">Lithuanian</option>
<option value="LV">Latvian</option>
<option value="NB">Norwegian (Bokmål)</option>
<option value="NL">Dutch</option>
<option value="PL">Polish</option>
<option value="PT-BR">Portuguese (Brazilian)</option>
<option value="PT-PT">Portuguese (European)</option>
<option value="RO">Romanian</option>
<option value="RU">Russian</option>
<option value="SK">Slovak</option>
<option value="SL">Slovenian</option>
<option value="SV">Swedish</option>
<option value="TR">Turkish</option>
<option value="UK">Ukrainian</option>
<option value="ZH">Chinese (simplified)</option>
</select>
</div>
</div>
<div class="file-upload-container">
<label for="fileUpload" class="file-upload-label">Upload Voice File</label>
<input type="file" id="fileUpload" name="voiceFile" hidden>
</div>
<div id="chatArea" class="chat-area"></div>
<button id="downloadButton" style="display: none;">Download Response</button>
</div>
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script>
$(document).ready(function() {
// Toggle language selector when translation is enabled/disabled
$('#enableTranslation').on('change', function() {
if ($(this).is(':checked')) {
$('#languageSelector').slideDown(300);
} else {
$('#languageSelector').slideUp(300);
}
});
$('#fileUpload').on('change', function() {
var file = this.files[0];
if (file) {
var formData = new FormData();
formData.append('voiceFile', file);
formData.append('outputFormat', $('#outputFormat').val());
formData.append('enableTranslation', $('#enableTranslation').is(':checked') ? '1' : '0');
formData.append('targetLanguage', $('#targetLanguage').val());
$('#chatArea').html('<div class="processing-container"><div class="processing-text">Processing audio file...</div><div class="progress-bar"><div class="progress-bar-fill"></div></div></div>');
$.ajax({
url: 'process.php',
type: 'POST',
data: formData,
processData: false,
contentType: false,
success: function(response) {
var data = JSON.parse(response);
if (data.success) {
if (data.fileUrl) {
var message = '<div class="message bot-message">Transcription complete!<br>';
message += '<a href="' + data.fileUrl + '" download>Download Original ' + data.format.toUpperCase() + ' file</a>';
if (data.translatedFileUrl) {
message += '<br><a href="' + data.translatedFileUrl + '" download>Download Translated ' + data.format.toUpperCase() + ' file</a>';
}
message += '</div>';
$('#chatArea').html(message);
} else {
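// process.php already HTML-escapes these values
var html = '<div class="message bot-message">' + data.response + '</div>';
if (data.translatedResponse) {
// Show the translated text as well when translation was requested
html += '<div class="message bot-message">' + data.translatedResponse + '</div>';
}
$('#chatArea').html(html);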
$('#downloadButton').show();
}
} else {
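// Error strings may echo user-controlled data, so sanitize before inserting as HTML
$('#chatArea').html('<div class="message error-message">' + DOMPurify.sanitize(data.error) + '</div>');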
}
},
error: function() {
$('#chatArea').html('<div class="message error-message">An error occurred while processing the file.</div>');
}
});
}
});
$('#downloadButton').on('click', function() {
var responseText = $('.bot-message').text();
var blob = new Blob([responseText], { type: 'text/plain' });
var url = URL.createObjectURL(blob);
var a = document.createElement('a');
a.href = url;
a.download = 'voice_to_text_response.txt';
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
});
});
</script>
</body>
</html>

86
process.php Normal file

@ -0,0 +1,86 @@
<?php
session_start();
require_once 'config.php';
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['voiceFile'])) {
$file = $_FILES['voiceFile'];
$outputFormat = isset($_POST['outputFormat']) ? $_POST['outputFormat'] : 'txt';
$enableTranslation = isset($_POST['enableTranslation']) ? $_POST['enableTranslation'] : '0';
$targetLanguage = isset($_POST['targetLanguage']) ? $_POST['targetLanguage'] : 'EN-US';
// Check file size
if ($file['size'] > 350 * 1024 * 1024) { // 350 MB limit
echo json_encode(['success' => false, 'error' => "File is too large. Maximum size is 350 MB."]);
exit;
}
// Prepare the file for sending to Python API
$formData = [
'audio' => new CURLFile($file['tmp_name'], $file['type'], $file['name']),
'format' => $outputFormat,
'translate' => $enableTranslation,
'target_lang' => $targetLanguage
];
// Send request to Python API
$ch = curl_init(PYTHON_API_URL . '/transcribe');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $formData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1200); // 20 minutes, matching max_execution_time in .htaccess
$response = curl_exec($ch);
if (curl_errno($ch)) {
echo json_encode(['success' => false, 'error' => "Error processing file: " . curl_error($ch)]);
} else {
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpCode === 200) {
$data = json_decode($response, true);
if (isset($data['success']) && $data['success']) {
// For text format, return the text directly
if ($outputFormat === 'txt' && isset($data['text'])) {
$response = [
'success' => true,
'response' => nl2br(htmlspecialchars($data['text'])),
'format' => $outputFormat
];
// Add translated text if available
if (isset($data['translated_text'])) {
$response['translatedResponse'] = nl2br(htmlspecialchars($data['translated_text']));
}
echo json_encode($response);
} else {
// For VTT/SRT, provide download links
$downloadUrl = 'download.php?file=' . urlencode($data['filename']);
$response = [
'success' => true,
'fileUrl' => $downloadUrl,
'filename' => $data['filename'],
'format' => $outputFormat
];
// Add translated file download link if available
if (isset($data['translated_filename'])) {
$response['translatedFileUrl'] = 'download.php?file=' . urlencode($data['translated_filename']);
$response['translatedFilename'] = $data['translated_filename'];
}
echo json_encode($response);
}
} else {
echo json_encode(['success' => false, 'error' => $data['error'] ?? "Unknown error occurred"]);
}
} else {
echo json_encode(['success' => false, 'error' => "Server error: HTTP $httpCode"]);
}
}
curl_close($ch);
} else {
echo json_encode(['success' => false, 'error' => "Invalid request."]);
}

6
requirements.txt Normal file

@ -0,0 +1,6 @@
flask>=3.0.0
flask-cors>=4.0.0
openai-whisper
numpy<2.0.0
ffmpeg-python
deepl

84
setup.sh Executable file

@ -0,0 +1,84 @@
#!/bin/bash
# Setup script for Voice to Text Whisper API
echo "==================================="
echo "Voice to Text - Setup Script"
echo "==================================="
echo ""
# Check if Python 3 is installed
if ! command -v python3 &> /dev/null; then
echo "Error: Python 3 is not installed. Please install Python 3.8 or higher."
exit 1
fi
PYTHON_VERSION=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
echo "Python 3 found: Python $PYTHON_VERSION"
# Check if Python version is too new (3.12+)
PYTHON_MAJOR=$(python3 -c 'import sys; print(sys.version_info[0])')
PYTHON_MINOR=$(python3 -c 'import sys; print(sys.version_info[1])')
if [ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -ge 12 ]; then
echo ""
echo "WARNING: Python 3.12+ detected. Some packages may have compatibility issues."
echo "Recommended: Use Python 3.10 or 3.11 for best compatibility."
echo ""
read -p "Continue anyway? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
echo ""
# Remove old venv if exists
if [ -d "venv" ]; then
echo "Removing old virtual environment..."
rm -rf venv
fi
# Determine which Python to use
if command -v python3.11 &> /dev/null; then
PYTHON_CMD=python3.11
echo "Using Python 3.11"
elif command -v python3.10 &> /dev/null; then
PYTHON_CMD=python3.10
echo "Using Python 3.10"
else
PYTHON_CMD=python3
echo "Using default Python 3"
fi
# Create virtual environment
echo "Creating virtual environment..."
$PYTHON_CMD -m venv venv
# Activate virtual environment
echo "Activating virtual environment..."
source venv/bin/activate
# Upgrade pip
echo "Upgrading pip..."
pip install --upgrade pip
# Install dependencies
echo "Installing dependencies (this may take a few minutes)..."
pip install -r requirements.txt
# Create outputs directory
echo "Creating outputs directory..."
mkdir -p outputs
echo ""
echo "==================================="
echo "Setup complete!"
echo "==================================="
echo ""
echo "To start the API server:"
echo " 1. Activate the virtual environment: source venv/bin/activate"
echo " 2. Run the API: python api.py"
echo ""
echo "The API will run on http://localhost:5010"
echo ""

9
start_api.sh Executable file

@ -0,0 +1,9 @@
#!/bin/bash
# Start the Whisper API server
# Activate virtual environment
source venv/bin/activate
# Start the API
echo "Starting Whisper API server on http://localhost:5010..."
python api.py

386
style.css Executable file

@ -0,0 +1,386 @@
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Montserrat', sans-serif;
background: #000000;
min-height: 100vh;
padding: 20px;
display: flex;
justify-content: center;
align-items: center;
}
input, button, textarea, select, label {
font-family: 'Montserrat', sans-serif;
}
.app-container {
background: #1a1a1a;
border-radius: 20px;
box-shadow: 0 20px 60px rgba(255, 196, 7, 0.2);
border: 1px solid #333;
padding: 40px;
max-width: 800px;
width: 100%;
animation: fadeIn 0.5s ease-in;
}
@keyframes fadeIn {
from {
opacity: 0;
transform: translateY(20px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.logo {
width: 400px;
height: auto;
display: block;
margin: 0 auto 30px;
filter: invert(1) brightness(2);
}
.initial-instruction {
text-align: center;
font-size: 16px;
color: #999;
margin-bottom: 30px;
font-weight: 400;
line-height: 1.6;
}
.format-selection {
background: #0a0a0a;
padding: 20px;
border-radius: 12px;
margin-bottom: 25px;
display: flex;
align-items: center;
justify-content: center;
gap: 15px;
border: 1px solid #333;
}
.format-selection label {
font-weight: 600;
color: #FFC407;
font-size: 15px;
}
.format-selection select {
padding: 10px 20px;
border: 2px solid #333;
border-radius: 8px;
font-size: 15px;
font-weight: 500;
color: #FFC407;
background: #000;
cursor: pointer;
transition: all 0.3s ease;
min-width: 200px;
}
.format-selection select:hover {
border-color: #FFC407;
}
.format-selection select:focus {
outline: none;
border-color: #FFC407;
box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.2);
}
.translation-section {
background: #0a0a0a;
padding: 20px;
border-radius: 12px;
margin-bottom: 25px;
border: 1px solid #333;
}
.translation-toggle {
margin-bottom: 15px;
}
.toggle-label {
display: flex;
align-items: center;
cursor: pointer;
user-select: none;
}
.toggle-label input[type="checkbox"] {
width: 20px;
height: 20px;
cursor: pointer;
margin-right: 10px;
accent-color: #FFC407;
}
.toggle-text {
color: #FFC407;
font-weight: 600;
font-size: 15px;
}
.language-selector {
display: flex;
align-items: center;
justify-content: center;
gap: 15px;
padding-top: 15px;
border-top: 1px solid #333;
}
.language-selector label {
font-weight: 600;
color: #999;
font-size: 14px;
}
.language-selector select {
padding: 10px 20px;
border: 2px solid #333;
border-radius: 8px;
font-size: 14px;
font-weight: 500;
color: #FFC407;
background: #000;
cursor: pointer;
transition: all 0.3s ease;
min-width: 220px;
}
.language-selector select:hover {
border-color: #FFC407;
}
.language-selector select:focus {
outline: none;
border-color: #FFC407;
box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.2);
}
.file-upload-container {
text-align: center;
margin-bottom: 25px;
}
.file-upload-label {
display: inline-block;
padding: 15px 40px;
cursor: pointer;
background: #FFC407;
color: #000;
border-radius: 50px;
font-weight: 700;
font-size: 16px;
transition: all 0.3s ease;
box-shadow: 0 4px 15px rgba(255, 196, 7, 0.4);
}
.file-upload-label:hover {
transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(255, 196, 7, 0.6);
background: #ffcd2e;
}
.file-upload-label:active {
transform: translateY(0);
}
.file-upload-label.disabled {
background: #333;
color: #666;
cursor: not-allowed;
box-shadow: none;
}
.chat-area {
min-height: 200px;
max-height: 400px;
border: 2px solid #333;
border-radius: 12px;
overflow-y: auto;
padding: 20px;
background: #0a0a0a;
margin-bottom: 20px;
}
.chat-area:empty {
display: none;
}
.message {
padding: 15px 20px;
margin-bottom: 15px;
border-radius: 12px;
line-height: 1.6;
animation: slideIn 0.3s ease;
}
@keyframes slideIn {
from {
opacity: 0;
transform: translateX(-20px);
}
to {
opacity: 1;
transform: translateX(0);
}
}
.bot-message {
background: rgba(255, 196, 7, 0.1);
border-left: 4px solid #FFC407;
color: #fff;
}
.bot-message a {
color: #FFC407;
font-weight: 600;
text-decoration: none;
border-bottom: 2px solid #FFC407;
transition: all 0.2s ease;
}
.bot-message a:hover {
color: #ffcd2e;
border-bottom-color: #ffcd2e;
}
.error-message {
background: rgba(255, 0, 0, 0.1);
border-left: 4px solid #ff3333;
color: #ff6666;
}
.processing-container {
padding: 30px;
text-align: center;
}
.processing-text {
color: #FFC407;
font-weight: 600;
font-size: 18px;
margin-bottom: 20px;
animation: pulseAnimation 2s infinite;
}
@keyframes pulseAnimation {
0%, 100% { opacity: 1; }
50% { opacity: 0.6; }
}
.progress-bar {
width: 100%;
height: 8px;
background: #333;
border-radius: 10px;
overflow: hidden;
position: relative;
}
.progress-bar-fill {
height: 100%;
background: linear-gradient(90deg, #FFC407, #ffcd2e, #FFC407);
background-size: 200% 100%;
border-radius: 10px;
animation: progressAnimation 1.5s ease-in-out infinite;
box-shadow: 0 0 10px rgba(255, 196, 7, 0.5);
}
@keyframes progressAnimation {
0% {
width: 0%;
background-position: 0% 0%;
}
50% {
width: 70%;
background-position: 100% 0%;
}
100% {
width: 100%;
background-position: 200% 0%;
}
}
button {
padding: 12px 30px;
cursor: pointer;
background: #FFC407;
color: #000;
border: none;
border-radius: 50px;
font-weight: 700;
font-size: 15px;
transition: all 0.3s ease;
box-shadow: 0 4px 15px rgba(255, 196, 7, 0.4);
}
button:hover {
transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(255, 196, 7, 0.6);
background: #ffcd2e;
}
button:active {
transform: translateY(0);
}
#downloadButton {
display: none;
margin: 0 auto;
}
/* Responsive design */
@media screen and (max-width: 768px) {
.app-container {
padding: 25px;
}
.logo {
width: 300px;
}
.initial-instruction {
font-size: 14px;
}
.format-selection {
flex-direction: column;
gap: 10px;
}
.format-selection select {
width: 100%;
}
.file-upload-label {
padding: 12px 30px;
font-size: 14px;
}
}
@media screen and (max-width: 480px) {
body {
padding: 10px;
}
.app-container {
padding: 20px;
}
.logo {
width: 250px;
}
}

41
test_download.php Normal file

@ -0,0 +1,41 @@
<?php
/**
* Test download functionality
*/
// List all files in outputs directory
$outputDir = __DIR__ . '/outputs/';
echo "<h2>Files in outputs directory:</h2>";
echo "<ul>";
if (is_dir($outputDir)) {
$files = scandir($outputDir);
foreach ($files as $file) {
if ($file !== '.' && $file !== '..' && $file !== '.DS_Store') {
$filepath = $outputDir . $file;
$size = filesize($filepath);
$readable = is_readable($filepath) ? 'Yes' : 'No';
$extension = pathinfo($file, PATHINFO_EXTENSION);
echo "<li>";
echo "<strong>" . htmlspecialchars($file) . "</strong><br>"; // escape filenames before output
echo "Size: " . number_format($size) . " bytes<br>";
echo "Readable: $readable<br>";
echo "Extension: $extension<br>";
echo "<a href='download.php?file=" . urlencode($file) . "' target='_blank'>Test Download</a>";
echo "</li><br>";
}
}
} else {
echo "<li>Directory not found</li>";
}
echo "</ul>";
// Test file operations
echo "<h2>Directory permissions:</h2>";
echo "Directory: $outputDir<br>";
echo "Exists: " . (is_dir($outputDir) ? 'Yes' : 'No') . "<br>";
echo "Readable: " . (is_readable($outputDir) ? 'Yes' : 'No') . "<br>";
echo "Writable: " . (is_writable($outputDir) ? 'Yes' : 'No') . "<br>";
?>