Initial commit: Voice to Text with Whisper & DeepL Translation

Features:
- OpenAI Whisper for audio transcription
- DeepL API for translation (30+ languages)
- Multiple output formats: TXT, VTT, SRT
- Flask Python API backend
- PHP frontend with black/gold theme
- Support for 350MB files
- Generates both original and translated files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 11:54:39 -04:00 · 2025-10-21 11:54:39 -04:00 · 846693b097
commit 846693b097
14 changed files with 1355 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,30 @@
+# Python
+venv/
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+
+# Output files
+outputs/*.txt
+outputs/*.vtt
+outputs/*.srt
+outputs/.DS_Store
+
+# Logs
+*.log
+
+# OS Files
+.DS_Store
+Thumbs.db
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# Temporary files
+temp_audio.wav
+*.tmp
--- a/.htaccess
+++ b/.htaccess
@ -0,0 +1,5 @@
+php_value upload_max_filesize 350M
+php_value post_max_size 350M
+php_value max_execution_time 1200
+php_value max_input_time 1200
+php_value memory_limit 512M
--- a/README.md
+++ b/README.md
@ -0,0 +1,208 @@
+# Voice to Text with Whisper & DeepL Translation
+
+A web application that converts audio files to text using OpenAI's Whisper model and optionally translates the transcript using the DeepL API. Supports three output formats: plain text, VTT (WebVTT), and SRT (SubRip).
+
+## Features
+
+- 🎤 Audio transcription using OpenAI Whisper
+- 🌍 Translation using DeepL API (30+ languages)
+- 📝 Multiple output formats: Text, VTT, SRT
+- 🚀 Python Flask API backend
+- 💻 PHP frontend (MAMP/Apache compatible)
+- 📦 350MB file size limit
+- 📄 Generates both original and translated files
+
+## Requirements
+
+- Python 3.8 or higher
+- PHP 7.4 or higher
+- MAMP or Apache server
+- FFmpeg (for audio processing)
+
+## Installation
+
+### 1. Install FFmpeg
+
+**macOS:**
+```bash
+brew install ffmpeg
+```
+
+**Linux (Ubuntu/Debian):**
+```bash
+sudo apt update
+sudo apt install ffmpeg
+```
+
+**Windows:**
+Download from https://ffmpeg.org/download.html
+
+### 2. Setup Python Environment
+
+Run the setup script:
+```bash
+chmod +x setup.sh
+./setup.sh
+```
+
+This will:
+- Create a Python virtual environment
+- Install all dependencies (Flask, Whisper, etc.)
+- Create the outputs directory
+
+### 3. Start the API Server
+
+```bash
+chmod +x start_api.sh
+./start_api.sh
+```
+
+Or manually:
+```bash
+source venv/bin/activate
+python api.py
+```
+
+The API will run on http://localhost:5010
+
+### 4. Configure Web Server
+
+Ensure your MAMP/Apache server points to this directory and PHP is enabled.
+
+## Usage
+
+1. Start the Python API server (see step 3 above)
+2. Open the web application in your browser
+3. Select output format (Text/VTT/SRT)
+4. (Optional) Enable translation and select target language
+5. Upload an audio file (max 350MB)
+6. Wait for processing
+7. Download original and/or translated transcription
+
+### Translation
+
+The app uses the DeepL API for translation. When translation is enabled:
+- The audio is first transcribed in its original language
+- The transcription is then translated to your selected target language
+- Both original and translated files are generated
+- Supports 30+ languages including English, Spanish, French, German, Portuguese, Japanese, Chinese, and more
+
+**Note:** PHP settings are configured via `.htaccess` for 350MB uploads. To allow larger files, raise these values in `.htaccess` or `php.ini`, and raise the 350 MB check in `process.php` to match:
+```
+upload_max_filesize = 350M
+post_max_size = 350M
+max_execution_time = 1200
+```
+
+## API Endpoints
+
+### POST /transcribe
+Transcribe audio file to text/VTT/SRT
+
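+**Parameters (multipart/form-data):**
+- `audio` (file): Audio file to transcribe
+- `format` (string): Output format (`txt`, `vtt`, or `srt`; default `txt`)
+- `translate` (string): `1` to also translate the output with DeepL (default `0`)
+- `target_lang` (string): DeepL target language code (default `EN-US`)
+
+**Response:**
+```json
+{
+  "success": true,
+  "text": "transcribed text...",
+  "filename": "recording_original.txt",
+  "format": "txt"
+}
+```
+
+`text` is `null` for `vtt` and `srt`. The filename is the uploaded file's base name plus `_original`. When `translate` is `1`, the response also includes `translated_filename` and `translated_text` (`null` for `vtt` and `srt`).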
+
+### GET /health
+Health check endpoint
+
+### GET /download/<filename>
+Download transcribed file
+
+## Whisper Models
+
+The default model is `base` which provides a good balance of speed and accuracy.
+
+Available models:
+- `tiny` - Fastest, least accurate
+- `base` - Good balance (default)
+- `small` - Better accuracy, slower
+- `medium` - High accuracy, much slower
+- `large` - Best accuracy, very slow
+
+To change the model, edit the `whisper.load_model` call near the top of `api.py`:
+```python
+model = whisper.load_model("base")  # Change to desired model
+```
+
+## File Structure
+
+```
+.
+├── api.py              # Python Flask API
+├── index.php           # Frontend interface
+├── process.php         # PHP request handler
+├── download.php        # File download handler
+├── config.php          # Configuration
+├── style.css           # Styles
+├── requirements.txt    # Python dependencies
+├── setup.sh           # Setup script
+├── start_api.sh       # API start script
+├── outputs/           # Transcribed files directory
+└── venv/              # Python virtual environment
+```
+
+## Production Deployment
+
+For Apache deployment:
+
+1. Ensure mod_php is enabled
+2. Point document root to this directory
+3. Run the API as a systemd service (see below)
+
+### Systemd Service (Linux)
+
+Create `/etc/systemd/system/whisper-api.service`:
+
+```ini
+[Unit]
+Description=Whisper API Service
+After=network.target
+
+[Service]
+Type=simple
+User=www-data
+WorkingDirectory=/path/to/your/app
+ExecStart=/path/to/your/app/venv/bin/python /path/to/your/app/api.py
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+```bash
+sudo systemctl enable whisper-api
+sudo systemctl start whisper-api
+```
+
+## Troubleshooting
+
+**API not connecting:**
+- Verify Python API is running on port 5010
+- Check `config.php` has correct API URL
+- Ensure firewall allows port 5010
+
+**Transcription fails:**
+- Verify FFmpeg is installed: `ffmpeg -version`
+- Check audio file format is supported
+- Review API logs for errors
+
+**Out of memory:**
+- Use a smaller Whisper model (tiny or base)
+- Reduce audio file size
+- Increase system memory
+
+## License
+
+MIT
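The `/transcribe` endpoint documented in the README can also be called without the PHP frontend. The following is a minimal client sketch using only the Python standard library. It assumes the API is reachable at the README's default address, and the helper names `build_multipart` and `transcribe` are made up for this illustration; the form field names (`audio`, `format`, `translate`, `target_lang`) are the ones `api.py` reads.

```python
import json
import os
import urllib.request
import uuid

API_URL = "http://localhost:5010"  # README default; adjust if PORT is overridden


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, fmt="txt", translate=False, target_lang="EN-US"):
    """POST an audio file to /transcribe and return the decoded JSON response."""
    if fmt not in ("txt", "vtt", "srt"):
        raise ValueError("fmt must be txt, vtt, or srt")
    fields = {
        "format": fmt,
        "translate": "1" if translate else "0",
        "target_lang": target_lang,
    }
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            fields, "audio", os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        f"{API_URL}/transcribe", data=body, headers={"Content-Type": content_type}
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.load(response)
```

With the API running, `transcribe("talk.mp3", fmt="srt", translate=True, target_lang="DE")` would return the JSON described under "API Endpoints"; `talk.mp3` is a placeholder file name.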
--- a/V2T.svg
+++ b/V2T.svg
@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Generator: Adobe Illustrator 28.1.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
+<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
+	 viewBox="0 0 400 250" style="enable-background:new 0 0 400 250;" xml:space="preserve">
+<style type="text/css">
+	.st0{font-family:'Montserrat-Light';}
+	.st1{font-size:14px;}
+	.st2{letter-spacing:23;}
+	.st3{letter-spacing:22;}
+	.st4{letter-spacing:19;}
+	.st5{letter-spacing:18;}
+	.st6{letter-spacing:28;}
+	.st7{letter-spacing:21;}
+	.st8{fill:none;stroke:#000000;stroke-width:2;stroke-miterlimit:10;}
+</style>
+<text transform="matrix(1 0 0 1 250.2509 47.3586)" class="st0 st1">T</text>
+<text transform="matrix(1 0 0 1 258.1509 47.3586)" class="st0 st1 st2"> </text>
+<text transform="matrix(1 0 0 1 286.2509 47.3586)" class="st0 st1">E</text>
+<text transform="matrix(1 0 0 1 295.6509 47.3586)" class="st0 st1 st3"> </text>
+<text transform="matrix(1 0 0 1 322.2509 47.3586)" class="st0 st1">X</text>
+<text transform="matrix(1 0 0 1 331.2509 47.3586)" class="st0 st1 st3"> </text>
+<text transform="matrix(1 0 0 1 358.2509 47.3586)" class="st0 st1">T</text>
+<text transform="matrix(1 0 0 1 366.1509 47.3586)" class="st0 st1 st2"> </text>
+<text transform="matrix(1 0 0 1 213.7628 47.3586)" class="st0 st1"> 2</text>
+<text transform="matrix(1 0 0 1 16.5853 47.2885)" class="st0 st1 st4"> </text>
+<text transform="matrix(1 0 0 1 41.0853 47.2885)" class="st0 st1"> V</text>
+<text transform="matrix(1 0 0 1 54.2853 47.2885)" class="st0 st1 st5"> </text>
+<text transform="matrix(1 0 0 1 77.0853 47.2885)" class="st0 st1">O</text>
+<text transform="matrix(1 0 0 1 88.7853 47.2885)" class="st0 st1 st4"> </text>
+<text transform="matrix(1 0 0 1 113.0852 47.2885)" class="st0 st1">I</text>
+<text transform="matrix(1 0 0 1 117.1853 47.2885)" class="st0 st1 st6"> </text>
+<text transform="matrix(1 0 0 1 149.0852 47.2885)" class="st0 st1">C</text>
+<text transform="matrix(1 0 0 1 159.0852 47.2885)" class="st0 st1 st7"> </text>
+<text transform="matrix(1 0 0 1 185.0852 47.2885)" class="st0 st1">E</text>
+<path class="st8" d="M193.5,88.1h72.9c8.1,0,15.3,3.2,20.6,8.5s8.5,12.6,8.5,20.6v59.7c0,8.1-3.2,15.3-8.5,20.6
+	c-5.3,5.3-12.6,8.5-20.6,8.5h-72.9c-8.1,0-15.3-3.2-20.6-8.5c-5.3-5.3-8.5-12.6-8.5-20.6v-5.6c0-1.1-0.2-2-0.6-2.9s-1-1.7-1.9-2.3
+	l-7.7-6c-2-1.7-3.7-3.6-4.7-5.8c-1.1-2.2-1.7-4.7-1.7-7.3s0.6-5.1,1.7-7.3c1.1-2.2,2.7-4.2,4.7-5.8l7.7-6c0.8-0.6,1.5-1.4,1.9-2.3
+	c0.4-0.8,0.6-1.9,0.6-2.9v-5.6c0-8.1,3.2-15.3,8.5-20.6C178.2,91.5,185.5,88.1,193.5,88.1L193.5,88.1z M232.3,178.3
+	c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h39.3c3.5,0,6.4,2.9,6.4,6.4c0,3.5-2.9,6.4-6.4,6.4H232.3z M196.8,153.4
+	c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h74.8c3.5,0,6.4,2.9,6.4,6.4c0,3.5-2.9,6.4-6.4,6.4H196.8z M127.8,144.4
+	c1.9,2,1.8,5.2-0.2,7.1c-2,1.9-5.2,1.8-7.1-0.2c-3.7-3.9-5.7-8.9-6.1-14c-0.4-5,0.9-10.2,4.1-14.6s7.6-7.4,12.5-8.8
+	c4.9-1.4,10.4-1.1,15.2,1c2.5,1.1,3.7,4.1,2.6,6.6c-1.1,2.5-4.1,3.7-6.6,2.6c-2.8-1.2-5.7-1.4-8.5-0.6c-2.7,0.7-5.2,2.4-6.9,4.9
+	c-1.8,2.5-2.5,5.4-2.3,8.2C124.5,139.5,125.7,142.3,127.8,144.4L127.8,144.4z M96.8,146.8c1.6,2.3,0.9,5.5-1.4,6.9
+	c-2.3,1.6-5.5,0.9-6.9-1.4c-5-7.6-7.6-16.4-7.4-25.4c0.1-8.7,2.8-17.4,8.1-25.1c5.4-7.6,12.8-13.1,20.9-16.1
+	c8.4-3.1,17.6-3.7,26.4-1.5c2.7,0.6,4.4,3.4,3.7,6.1s-3.4,4.4-6.1,3.7c-6.9-1.7-14-1.3-20.5,1.1c-6.3,2.3-12,6.6-16.2,12.5
+	c-4.2,5.9-6.3,12.7-6.4,19.4C90.9,134.2,92.9,141,96.8,146.8L96.8,146.8z"/>
+<path class="st8" d="M196.8,128.8c-3.5,0-6.4-2.9-6.4-6.4c0-3.5,2.9-6.4,6.4-6.4h74.8c3.5,0,6.4,2.9,6.4,6.4s-2.9,6.4-6.4,6.4H196.8
+	z"/>
+</svg>
--- a/api.py
+++ b/api.py
@ -0,0 +1,209 @@
+#!/usr/bin/env python3
+"""
+Voice to Text API using OpenAI Whisper with DeepL Translation
+Transcribes audio files to text, VTT, or SRT format and optionally translates them
+"""
+
+from flask import Flask, request, jsonify, send_file
+from flask_cors import CORS
+import whisper
+import deepl
+import os
+import tempfile
+from datetime import timedelta
+import logging
+
+app = Flask(__name__)
+CORS(app)
+
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Load Whisper model (using base model for balance of speed and accuracy)
+# Options: tiny, base, small, medium, large
+logger.info("Loading Whisper model...")
+model = whisper.load_model("base")
+logger.info("Whisper model loaded successfully")
+
+# Initialize DeepL translator
+DEEPL_API_KEY = "28743b40-d23f-416d-8223-9b868c9531dc"
+translator = deepl.Translator(DEEPL_API_KEY)
+logger.info("DeepL translator initialized")
+
+# Directory for output files
+OUTPUT_DIR = os.path.join(os.path.dirname(__file__), 'outputs')
+os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+
+def format_timestamp(seconds):
+    """Convert seconds to SRT timestamp format (HH:MM:SS,mmm)"""
+    td = timedelta(seconds=seconds)
+    hours = td.days * 24 + td.seconds // 3600  # include days so hours do not wrap after 24h
+    minutes = (td.seconds % 3600) // 60
+    secs = td.seconds % 60
+    millis = td.microseconds // 1000
+    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
+
+
+def format_timestamp_vtt(seconds):
+    """Convert seconds to VTT timestamp format (HH:MM:SS.mmm)"""
+    td = timedelta(seconds=seconds)
+    hours = td.days * 24 + td.seconds // 3600  # include days so hours do not wrap after 24h
+    minutes = (td.seconds % 3600) // 60
+    secs = td.seconds % 60
+    millis = td.microseconds // 1000
+    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
+
+
+def generate_srt(segments):
+    """Generate SRT format from Whisper segments"""
+    srt_content = []
+    for i, segment in enumerate(segments, 1):
+        start = format_timestamp(segment['start'])
+        end = format_timestamp(segment['end'])
+        text = segment['text'].strip()
+        srt_content.append(f"{i}\n{start} --> {end}\n{text}\n")
+    return "\n".join(srt_content)
+
+
+def generate_vtt(segments):
+    """Generate VTT format from Whisper segments"""
+    vtt_content = ["WEBVTT\n"]
+    for segment in segments:
+        start = format_timestamp_vtt(segment['start'])
+        end = format_timestamp_vtt(segment['end'])
+        text = segment['text'].strip()
+        vtt_content.append(f"{start} --> {end}\n{text}\n")
+    return "\n".join(vtt_content)
+
+
+def translate_text(text, target_lang):
+    """Translate text using DeepL API"""
+    try:
+        logger.info(f"Translating text to {target_lang}...")
+        result = translator.translate_text(text, target_lang=target_lang)
+        return result.text
+    except deepl.exceptions.DeepLException as e:
+        logger.error(f"DeepL translation error: {str(e)}")
+        raise Exception(f"Translation failed: {str(e)}")
+
+
+@app.route('/health', methods=['GET'])
+def health_check():
+    """Health check endpoint"""
+    return jsonify({"status": "healthy", "model": "whisper-base"})
+
+
+@app.route('/transcribe', methods=['POST'])
+def transcribe():
+    """
+    Transcribe audio file to text, VTT, or SRT format with optional translation
+    Expects: multipart/form-data with 'audio' file, 'format' (txt/vtt/srt),
+             'translate' (0/1), and 'target_lang' (e.g., 'EN-US')
+    """
+    try:
+        # Check if audio file is present
+        if 'audio' not in request.files:
+            return jsonify({"error": "No audio file provided"}), 400
+
+        audio_file = request.files['audio']
+        output_format = request.form.get('format', 'txt').lower()
+        enable_translation = request.form.get('translate', '0') == '1'
+        target_lang = request.form.get('target_lang', 'EN-US')
+
+        if audio_file.filename == '':
+            return jsonify({"error": "Empty filename"}), 400
+
+        # Validate format
+        if output_format not in ['txt', 'vtt', 'srt']:
+            return jsonify({"error": "Invalid format. Use txt, vtt, or srt"}), 400
+
+        logger.info(f"Processing {audio_file.filename} - Format: {output_format}, Translation: {enable_translation}, Target: {target_lang}")
+
+        # Save uploaded file temporarily
+        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(audio_file.filename)[1]) as temp_audio:
+            audio_file.save(temp_audio.name)
+            temp_audio_path = temp_audio.name
+
+        try:
+            # Transcribe with Whisper
+            logger.info(f"Transcribing {audio_file.filename}...")
+            result = model.transcribe(temp_audio_path, verbose=False)
+            logger.info("Transcription complete")
+
+            # Generate output based on format
+            if output_format == 'txt':
+                content = result['text']
+                mimetype = 'text/plain'
+                extension = 'txt'
+            elif output_format == 'vtt':
+                content = generate_vtt(result['segments'])
+                mimetype = 'text/vtt'
+                extension = 'vtt'
+            elif output_format == 'srt':
+                content = generate_srt(result['segments'])
+                mimetype = 'text/plain'
+                extension = 'srt'
+
+            # Save original output file (basename only, so the upload name cannot escape OUTPUT_DIR)
+            base_filename = os.path.splitext(os.path.basename(audio_file.filename))[0]
+            output_filename = f"{base_filename}_original.{extension}"
+            output_path = os.path.join(OUTPUT_DIR, output_filename)
+
+            with open(output_path, 'w', encoding='utf-8') as f:
+                f.write(content)
+
+            response_data = {
+                "success": True,
+                "text": result['text'] if output_format == 'txt' else None,
+                "filename": output_filename,
+                "format": output_format
+            }
+
+            # Handle translation if requested
+            if enable_translation:
+                logger.info(f"Translating to {target_lang}...")
+                translated_content = translate_text(content, target_lang)
+
+                # Save translated output file
+                translated_filename = f"{base_filename}_translated.{extension}"
+                translated_path = os.path.join(OUTPUT_DIR, translated_filename)
+
+                with open(translated_path, 'w', encoding='utf-8') as f:
+                    f.write(translated_content)
+
+                response_data["translated_filename"] = translated_filename
+                response_data["translated_text"] = translated_content if output_format == 'txt' else None
+                logger.info("Translation complete")
+
+            return jsonify(response_data)
+
+        finally:
+            # Clean up temporary audio file
+            if os.path.exists(temp_audio_path):
+                os.remove(temp_audio_path)
+
+    except Exception as e:
+        logger.error(f"Error during transcription: {str(e)}")
+        return jsonify({"error": f"Transcription failed: {str(e)}"}), 500
+
+
+@app.route('/download/<filename>', methods=['GET'])
+def download_file(filename):
+    """Download a transcribed file"""
+    try:
+        file_path = os.path.join(OUTPUT_DIR, os.path.basename(filename))
+        if not os.path.isfile(file_path):
+            return jsonify({"error": "File not found"}), 404
+
+        return send_file(file_path, as_attachment=True)
+    except Exception as e:
+        logger.error(f"Error downloading file: {str(e)}")
+        return jsonify({"error": str(e)}), 500
+
+
+if __name__ == '__main__':
+    # Run on port 5010 by default
+    port = int(os.environ.get('PORT', 5010))
+    app.run(host='0.0.0.0', port=port, debug=False)
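In `transcribe`, the translated VTT or SRT file is produced by passing the already formatted subtitle text to `translate_text`, so cue numbers and timing lines go through the translator along with the dialogue. One possible alternative, sketched below under assumptions and not what this commit does, is to translate each segment's text first and then run the existing `generate_srt` or `generate_vtt` on the result. The helper name `translate_segments` and the `translate_fn` callable are hypothetical; in `api.py` the callable might be `lambda t: translate_text(t, target_lang)`.

```python
def translate_segments(segments, translate_fn):
    """Return copies of Whisper-style segments with only 'text' translated.

    segments: list of dicts with at least 'start', 'end', and 'text' keys.
    translate_fn: any callable mapping a source string to a translated string.
    """
    translated = []
    for segment in segments:
        new_segment = dict(segment)  # shallow copy so the input list is untouched
        new_segment["text"] = translate_fn(segment["text"].strip())
        translated.append(new_segment)
    return translated


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.5, "text": " hello"},
        {"start": 1.5, "end": 3.0, "text": " world"},
    ]
    # str.upper stands in for a real translation call in this demo.
    for seg in translate_segments(demo, str.upper):
        print(seg["start"], seg["end"], seg["text"])
```

This makes one translation call per segment, which trades extra API requests for timestamps that are never altered.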
--- a/config.php
+++ b/config.php
@ -0,0 +1,13 @@
+<?php
+// Start session only if not already started
+if (session_status() === PHP_SESSION_NONE) {
+    session_start();
+}
+
+// Python API endpoint (adjust port if needed)
+define('PYTHON_API_URL', 'http://localhost:5010');
+
+// DeepL API Key
+define('DEEPL_API_KEY', '28743b40-d23f-416d-8223-9b868c9531dc');
+
+// Other configuration settings can be added here
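A DeepL key written as a string literal ends up in version control along with the rest of the file. A common alternative is to read it from the process environment at startup. The sketch below shows that idea in Python, for the API side; the function name `load_deepl_key` is hypothetical, and the environment variable name `DEEPL_API_KEY` is an assumption chosen to match the constant name used in this commit.

```python
import os


def load_deepl_key(environ=os.environ, name="DEEPL_API_KEY"):
    """Return the DeepL key from the environment, failing loudly if it is absent."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the API")
    return key


if __name__ == "__main__":
    try:
        load_deepl_key()
        print("DeepL key found in environment")
    except RuntimeError as exc:
        print(exc)
```

The returned value could then be passed to `deepl.Translator(...)` in place of the literal.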
--- a/download.php
+++ b/download.php
@ -0,0 +1,69 @@
+<?php
+/**
+ * Download handler for transcribed files
+ */
+
+// Prevent any output before headers
+ob_start();
+
+// Enable error reporting for debugging
+error_reporting(E_ALL);
+ini_set('display_errors', 0); // Don't display errors, log them
+
+if (!isset($_GET['file'])) {
+    http_response_code(400);
+    die('No file specified');
+}
+
+$filename = basename($_GET['file']); // Security: prevent directory traversal
+$filepath = __DIR__ . '/outputs/' . $filename;
+
+// Debug logging
+error_log("Download request for: " . $filename);
+error_log("Full path: " . $filepath);
+error_log("File exists: " . (file_exists($filepath) ? 'yes' : 'no'));
+
+if (!file_exists($filepath)) {
+    http_response_code(404);
+    error_log("File not found: " . $filepath);
+    die('File not found: ' . htmlspecialchars($filename));
+}
+
+// Check if file is readable
+if (!is_readable($filepath)) {
+    http_response_code(403);
+    error_log("File not readable: " . $filepath);
+    die('File not readable');
+}
+
+// Determine content type based on extension
+$extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
+$contentTypes = [
+    'txt' => 'text/plain; charset=utf-8',
+    'vtt' => 'text/vtt; charset=utf-8',
+    'srt' => 'text/plain; charset=utf-8'  // Changed to text/plain for better compatibility
+];
+
+$contentType = $contentTypes[$extension] ?? 'application/octet-stream';
+
+// Clear all output buffers
+while (ob_get_level()) {
+    ob_end_clean();
+}
+
+// Send file-transfer headers and disable caching
+header('Content-Description: File Transfer');
+header('Content-Type: ' . $contentType);
+header('Content-Disposition: attachment; filename="' . basename($filename) . '"');
+header('Content-Transfer-Encoding: binary');
+header('Content-Length: ' . filesize($filepath));
+header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
+header('Pragma: public');
+header('Expires: 0');
+
+// Flush system output buffer
+flush();
+
+// Output file
+readfile($filepath);
+exit;
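`download.php` relies on `basename()` to keep requests inside `outputs/`. The same guard can be expressed in Python, for example for a route like `/download/<filename>`. This is a minimal sketch under assumptions: the function name `resolve_download_path` is hypothetical, and the extra `realpath` containment check is an addition that the PHP handler does not perform.

```python
import os


def resolve_download_path(output_dir, requested_name):
    """Return the real path of requested_name inside output_dir, or None.

    Only the final path component of the request is used, and the resolved
    path must still live under output_dir (this also rejects symlink escapes).
    """
    name = os.path.basename(requested_name)
    if name in ("", ".", ".."):
        return None
    root = os.path.realpath(output_dir)
    candidate = os.path.realpath(os.path.join(root, name))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate if os.path.isfile(candidate) else None
```

A caller would return a 404 when the result is `None` and otherwise send the file at the returned path.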
--- a/index.php
+++ b/index.php
@ -0,0 +1,160 @@
+<?php
+require_once 'config.php';
+?>
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Voice to Text</title>
+    <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@400;700&display=swap" rel="stylesheet">
+    <link rel="stylesheet" href="style.css">
+    <script src="https://cdn.jsdelivr.net/npm/dompurify@2.3.3/dist/purify.min.js"></script>
+</head>
+<body>
+    <div class="app-container">
+        <img src="V2T.svg" alt="Voice to Text" class="logo">
+
+        <div id="initialInstruction" class="initial-instruction">
+            Before we start, select output format and upload the Voice File (Max 350 Megabytes in size)
+        </div>
+
+        <div class="format-selection">
+            <label for="outputFormat">Output Format:</label>
+            <select id="outputFormat" name="outputFormat">
+                <option value="txt">Text Document</option>
+                <option value="vtt">VTT (WebVTT)</option>
+                <option value="srt">SRT (SubRip)</option>
+            </select>
+        </div>
+
+        <div class="translation-section">
+            <div class="translation-toggle">
+                <label class="toggle-label">
+                    <input type="checkbox" id="enableTranslation" name="enableTranslation">
+                    <span class="toggle-text">Translate with DeepL</span>
+                </label>
+            </div>
+
+            <div id="languageSelector" class="language-selector" style="display: none;">
+                <label for="targetLanguage">Translate to:</label>
+                <select id="targetLanguage" name="targetLanguage">
+                    <option value="BG">Bulgarian</option>
+                    <option value="CS">Czech</option>
+                    <option value="DA">Danish</option>
+                    <option value="DE">German</option>
+                    <option value="EL">Greek</option>
+                    <option value="EN-GB">English (British)</option>
+                    <option value="EN-US" selected>English (American)</option>
+                    <option value="ES">Spanish</option>
+                    <option value="ET">Estonian</option>
+                    <option value="FI">Finnish</option>
+                    <option value="FR">French</option>
+                    <option value="HU">Hungarian</option>
+                    <option value="ID">Indonesian</option>
+                    <option value="IT">Italian</option>
+                    <option value="JA">Japanese</option>
+                    <option value="KO">Korean</option>
+                    <option value="LT">Lithuanian</option>
+                    <option value="LV">Latvian</option>
+                    <option value="NB">Norwegian (Bokmål)</option>
+                    <option value="NL">Dutch</option>
+                    <option value="PL">Polish</option>
+                    <option value="PT-BR">Portuguese (Brazilian)</option>
+                    <option value="PT-PT">Portuguese (European)</option>
+                    <option value="RO">Romanian</option>
+                    <option value="RU">Russian</option>
+                    <option value="SK">Slovak</option>
+                    <option value="SL">Slovenian</option>
+                    <option value="SV">Swedish</option>
+                    <option value="TR">Turkish</option>
+                    <option value="UK">Ukrainian</option>
+                    <option value="ZH">Chinese (simplified)</option>
+                </select>
+            </div>
+        </div>
+
+        <div class="file-upload-container">
+            <label for="fileUpload" class="file-upload-label">Upload Voice File</label>
+            <input type="file" id="fileUpload" name="voiceFile" hidden>
+        </div>
+
+        <div id="chatArea" class="chat-area"></div>
+
+        <button id="downloadButton" style="display: none;">Download Response</button>
+    </div>
+    
+    <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
+    <script>
+        $(document).ready(function() {
+            // Toggle language selector when translation is enabled/disabled
+            $('#enableTranslation').on('change', function() {
+                if ($(this).is(':checked')) {
+                    $('#languageSelector').slideDown(300);
+                } else {
+                    $('#languageSelector').slideUp(300);
+                }
+            });
+
+            $('#fileUpload').on('change', function() {
+                var file = this.files[0];
+                if (file) {
+                    var formData = new FormData();
+                    formData.append('voiceFile', file);
+                    formData.append('outputFormat', $('#outputFormat').val());
+                    formData.append('enableTranslation', $('#enableTranslation').is(':checked') ? '1' : '0');
+                    formData.append('targetLanguage', $('#targetLanguage').val());
+
+                    $('#chatArea').html('<div class="processing-container"><div class="processing-text">Processing audio file...</div><div class="progress-bar"><div class="progress-bar-fill"></div></div></div>');
+
+                    $.ajax({
+                        url: 'process.php',
+                        type: 'POST',
+                        data: formData,
+                        processData: false,
+                        contentType: false,
+                        success: function(response) {
+                            var data = JSON.parse(response);
+                            if (data.success) {
+                                if (data.fileUrl) {
+                                    var message = '<div class="message bot-message">Transcription complete!<br>';
+                                    message += '<a href="' + data.fileUrl + '" download>Download Original ' + data.format.toUpperCase() + ' file</a>';
+
+                                    if (data.translatedFileUrl) {
+                                        message += '<br><a href="' + data.translatedFileUrl + '" download>Download Translated ' + data.format.toUpperCase() + ' file</a>';
+                                    }
+
+                                    message += '</div>';
+                                    $('#chatArea').html(message);
+                                } else {
+                                    $('#chatArea').html('<div class="message bot-message">' + data.response + '</div>');
+                                    $('#downloadButton').show();
+                                }
+                            } else {
+                                $('#chatArea').html('<div class="message error-message">' + DOMPurify.sanitize(data.error) + '</div>');
+                            }
+                        },
+                        error: function() {
+                            $('#chatArea').html('<div class="message error-message">An error occurred while processing the file.</div>');
+                        }
+                    });
+                }
+            });
+
+            $('#downloadButton').on('click', function() {
+                var responseText = $('.bot-message').text();
+                var blob = new Blob([responseText], { type: 'text/plain' });
+                var url = URL.createObjectURL(blob);
+                
+                var a = document.createElement('a');
+                a.href = url;
+                a.download = 'voice_to_text_response.txt';
+                document.body.appendChild(a);
+                a.click();
+                document.body.removeChild(a);
+                URL.revokeObjectURL(url);
+            });
+        });
+    </script>
+</body>
+</html>
--- a/process.php
+++ b/process.php
@ -0,0 +1,86 @@
+<?php
+session_start();
+require_once 'config.php';
+
+if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_FILES['voiceFile'])) {
+    $file = $_FILES['voiceFile'];
+    $outputFormat = isset($_POST['outputFormat']) ? $_POST['outputFormat'] : 'txt';
+    $enableTranslation = isset($_POST['enableTranslation']) ? $_POST['enableTranslation'] : '0';
+    $targetLanguage = isset($_POST['targetLanguage']) ? $_POST['targetLanguage'] : 'EN-US';
+
+    // Check file size
+    if ($file['size'] > 350 * 1024 * 1024) { // 350 MB limit
+        echo json_encode(['success' => false, 'error' => "File is too large. Maximum size is 350 MB."]);
+        exit;
+    }
+
+    // Prepare the file for sending to Python API
+    $formData = [
+        'audio' => new CURLFile($file['tmp_name'], $file['type'], $file['name']),
+        'format' => $outputFormat,
+        'translate' => $enableTranslation,
+        'target_lang' => $targetLanguage
+    ];
+
+    // Send request to Python API
+    $ch = curl_init(PYTHON_API_URL . '/transcribe');
+    curl_setopt($ch, CURLOPT_POST, 1);
+    curl_setopt($ch, CURLOPT_POSTFIELDS, $formData);
+    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
+    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // 5 minutes timeout for large files
+
+    $response = curl_exec($ch);
+
+    if (curl_errno($ch)) {
+        echo json_encode(['success' => false, 'error' => "Error processing file: " . curl_error($ch)]);
+    } else {
+        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
+
+        if ($httpCode === 200) {
+            $data = json_decode($response, true);
+
+            if (isset($data['success']) && $data['success']) {
+                // For text format, return the text directly
+                if ($outputFormat === 'txt' && isset($data['text'])) {
+                    $response = [
+                        'success' => true,
+                        'response' => nl2br(htmlspecialchars($data['text'])),
+                        'format' => $outputFormat
+                    ];
+
+                    // Add translated text if available
+                    if (isset($data['translated_text'])) {
+                        $response['translatedResponse'] = nl2br(htmlspecialchars($data['translated_text']));
+                    }
+
+                    echo json_encode($response);
+                } else {
+                    // For VTT/SRT, provide download links
+                    $downloadUrl = 'download.php?file=' . urlencode($data['filename']);
+                    $response = [
+                        'success' => true,
+                        'fileUrl' => $downloadUrl,
+                        'filename' => $data['filename'],
+                        'format' => $outputFormat
+                    ];
+
+                    // Add translated file download link if available
+                    if (isset($data['translated_filename'])) {
+                        $response['translatedFileUrl'] = 'download.php?file=' . urlencode($data['translated_filename']);
+                        $response['translatedFilename'] = $data['translated_filename'];
+                    }
+
+                    echo json_encode($response);
+                }
+            } else {
+                echo json_encode(['success' => false, 'error' => $data['error'] ?? "Unknown error occurred"]);
+            }
+        } else {
+            echo json_encode(['success' => false, 'error' => "Server error: HTTP $httpCode"]);
+        }
+    }
+
+    curl_close($ch);
+} else {
+    echo json_encode(['success' => false, 'error' => "Invalid request."]);
+}
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,6 @@
+flask>=3.0.0
+flask-cors>=4.0.0
+openai-whisper
+numpy<2.0.0
+ffmpeg-python
+deepl
--- a/setup.sh
+++ b/setup.sh
@ -0,0 +1,84 @@
+#!/bin/bash
+# Setup script for Voice to Text Whisper API
+
+echo "==================================="
+echo "Voice to Text - Setup Script"
+echo "==================================="
+echo ""
+
+# Check if Python 3 is installed
+if ! command -v python3 &> /dev/null; then
+    echo "Error: Python 3 is not installed. Please install Python 3.8 or higher."
+    exit 1
+fi
+
+PYTHON_VERSION=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
+echo "Python 3 found: Python $PYTHON_VERSION"
+
+# Determine which Python to use (prefer 3.11, then 3.10, else the default)
+if command -v python3.11 &> /dev/null; then
+    PYTHON_CMD=python3.11
+    echo "Using Python 3.11"
+elif command -v python3.10 &> /dev/null; then
+    PYTHON_CMD=python3.10
+    echo "Using Python 3.10"
+else
+    PYTHON_CMD=python3
+    echo "Using default Python 3"
+fi
+
+# Check if the selected Python version is too new (3.12+)
+PYTHON_MAJOR=$($PYTHON_CMD -c 'import sys; print(sys.version_info[0])')
+PYTHON_MINOR=$($PYTHON_CMD -c 'import sys; print(sys.version_info[1])')
+
+if [ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -ge 12 ]; then
+    echo ""
+    echo "WARNING: Python 3.12+ detected. Some packages may have compatibility issues."
+    echo "Recommended: Use Python 3.10 or 3.11 for best compatibility."
+    echo ""
+    read -p "Continue anyway? (y/n) " -n 1 -r
+    echo
+    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
+        exit 1
+    fi
+fi
+
+echo ""
+
+# Remove old venv if exists
+if [ -d "venv" ]; then
+    echo "Removing old virtual environment..."
+    rm -rf venv
+fi
+
+# Create virtual environment
+echo "Creating virtual environment..."
+"$PYTHON_CMD" -m venv venv || { echo "Error: failed to create the virtual environment."; exit 1; }
+
+# Activate virtual environment
+echo "Activating virtual environment..."
+source venv/bin/activate
+
+# Upgrade pip
+echo "Upgrading pip..."
+pip install --upgrade pip
+
+# Install dependencies
+echo "Installing dependencies (this may take a few minutes)..."
+pip install -r requirements.txt || { echo "Error: dependency installation failed."; exit 1; }
+
+# Create outputs directory
+echo "Creating outputs directory..."
+mkdir -p outputs
+
+echo ""
+echo "==================================="
+echo "Setup complete!"
+echo "==================================="
+echo ""
+echo "To start the API server:"
+echo "  1. Activate the virtual environment: source venv/bin/activate"
+echo "  2. Run the API: python api.py"
+echo ""
+echo "The API will run on http://localhost:5010"
+echo ""
--- a/start_api.sh
+++ b/start_api.sh
@ -0,0 +1,9 @@
+#!/bin/bash
+# Start the Whisper API server
+
+# Activate virtual environment
+source venv/bin/activate || { echo "Error: virtual environment not found. Run setup.sh first."; exit 1; }
+
+# Start the API
+echo "Starting Whisper API server on http://localhost:5010..."
+python api.py
--- a/style.css
+++ b/style.css
@ -0,0 +1,386 @@
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+
+body {
+    font-family: 'Montserrat', sans-serif;
+    background: #000000;
+    min-height: 100vh;
+    padding: 20px;
+    display: flex;
+    justify-content: center;
+    align-items: center;
+}
+
+input, button, textarea, select, label {
+    font-family: 'Montserrat', sans-serif;
+}
+
+.app-container {
+    background: #1a1a1a;
+    border-radius: 20px;
+    box-shadow: 0 20px 60px rgba(255, 196, 7, 0.2);
+    border: 1px solid #333;
+    padding: 40px;
+    max-width: 800px;
+    width: 100%;
+    animation: fadeIn 0.5s ease-in;
+}
+
+@keyframes fadeIn {
+    from {
+        opacity: 0;
+        transform: translateY(20px);
+    }
+    to {
+        opacity: 1;
+        transform: translateY(0);
+    }
+}
+
+.logo {
+    width: 400px;
+    height: auto;
+    display: block;
+    margin: 0 auto 30px;
+    filter: invert(1) brightness(2);
+}
+
+.initial-instruction {
+    text-align: center;
+    font-size: 16px;
+    color: #999;
+    margin-bottom: 30px;
+    font-weight: 400;
+    line-height: 1.6;
+}
+
+.format-selection {
+    background: #0a0a0a;
+    padding: 20px;
+    border-radius: 12px;
+    margin-bottom: 25px;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    gap: 15px;
+    border: 1px solid #333;
+}
+
+.format-selection label {
+    font-weight: 600;
+    color: #FFC407;
+    font-size: 15px;
+}
+
+.format-selection select {
+    padding: 10px 20px;
+    border: 2px solid #333;
+    border-radius: 8px;
+    font-size: 15px;
+    font-weight: 500;
+    color: #FFC407;
+    background: #000;
+    cursor: pointer;
+    transition: all 0.3s ease;
+    min-width: 200px;
+}
+
+.format-selection select:hover {
+    border-color: #FFC407;
+}
+
+.format-selection select:focus {
+    outline: none;
+    border-color: #FFC407;
+    box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.2);
+}
+
+.translation-section {
+    background: #0a0a0a;
+    padding: 20px;
+    border-radius: 12px;
+    margin-bottom: 25px;
+    border: 1px solid #333;
+}
+
+.translation-toggle {
+    margin-bottom: 15px;
+}
+
+.toggle-label {
+    display: flex;
+    align-items: center;
+    cursor: pointer;
+    user-select: none;
+}
+
+.toggle-label input[type="checkbox"] {
+    width: 20px;
+    height: 20px;
+    cursor: pointer;
+    margin-right: 10px;
+    accent-color: #FFC407;
+}
+
+.toggle-text {
+    color: #FFC407;
+    font-weight: 600;
+    font-size: 15px;
+}
+
+.language-selector {
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    gap: 15px;
+    padding-top: 15px;
+    border-top: 1px solid #333;
+}
+
+.language-selector label {
+    font-weight: 600;
+    color: #999;
+    font-size: 14px;
+}
+
+.language-selector select {
+    padding: 10px 20px;
+    border: 2px solid #333;
+    border-radius: 8px;
+    font-size: 14px;
+    font-weight: 500;
+    color: #FFC407;
+    background: #000;
+    cursor: pointer;
+    transition: all 0.3s ease;
+    min-width: 220px;
+}
+
+.language-selector select:hover {
+    border-color: #FFC407;
+}
+
+.language-selector select:focus {
+    outline: none;
+    border-color: #FFC407;
+    box-shadow: 0 0 0 3px rgba(255, 196, 7, 0.2);
+}
+
+.file-upload-container {
+    text-align: center;
+    margin-bottom: 25px;
+}
+
+.file-upload-label {
+    display: inline-block;
+    padding: 15px 40px;
+    cursor: pointer;
+    background: #FFC407;
+    color: #000;
+    border-radius: 50px;
+    font-weight: 700;
+    font-size: 16px;
+    transition: all 0.3s ease;
+    box-shadow: 0 4px 15px rgba(255, 196, 7, 0.4);
+}
+
+.file-upload-label:hover {
+    transform: translateY(-2px);
+    box-shadow: 0 6px 20px rgba(255, 196, 7, 0.6);
+    background: #ffcd2e;
+}
+
+.file-upload-label:active {
+    transform: translateY(0);
+}
+
+.file-upload-label.disabled {
+    background: #333;
+    color: #666;
+    cursor: not-allowed;
+    box-shadow: none;
+}
+
+.chat-area {
+    min-height: 200px;
+    max-height: 400px;
+    border: 2px solid #333;
+    border-radius: 12px;
+    overflow-y: auto;
+    padding: 20px;
+    background: #0a0a0a;
+    margin-bottom: 20px;
+}
+
+.chat-area:empty {
+    display: none;
+}
+
+.message {
+    padding: 15px 20px;
+    margin-bottom: 15px;
+    border-radius: 12px;
+    line-height: 1.6;
+    animation: slideIn 0.3s ease;
+}
+
+@keyframes slideIn {
+    from {
+        opacity: 0;
+        transform: translateX(-20px);
+    }
+    to {
+        opacity: 1;
+        transform: translateX(0);
+    }
+}
+
+.bot-message {
+    background: rgba(255, 196, 7, 0.1);
+    border-left: 4px solid #FFC407;
+    color: #fff;
+}
+
+.bot-message a {
+    color: #FFC407;
+    font-weight: 600;
+    text-decoration: none;
+    border-bottom: 2px solid #FFC407;
+    transition: all 0.2s ease;
+}
+
+.bot-message a:hover {
+    color: #ffcd2e;
+    border-bottom-color: #ffcd2e;
+}
+
+.error-message {
+    background: rgba(255, 0, 0, 0.1);
+    border-left: 4px solid #ff3333;
+    color: #ff6666;
+}
+
+.processing-container {
+    padding: 30px;
+    text-align: center;
+}
+
+.processing-text {
+    color: #FFC407;
+    font-weight: 600;
+    font-size: 18px;
+    margin-bottom: 20px;
+    animation: pulseAnimation 2s infinite;
+}
+
+@keyframes pulseAnimation {
+    0%, 100% { opacity: 1; }
+    50% { opacity: 0.6; }
+}
+
+.progress-bar {
+    width: 100%;
+    height: 8px;
+    background: #333;
+    border-radius: 10px;
+    overflow: hidden;
+    position: relative;
+}
+
+.progress-bar-fill {
+    height: 100%;
+    background: linear-gradient(90deg, #FFC407, #ffcd2e, #FFC407);
+    background-size: 200% 100%;
+    border-radius: 10px;
+    animation: progressAnimation 1.5s ease-in-out infinite;
+    box-shadow: 0 0 10px rgba(255, 196, 7, 0.5);
+}
+
+@keyframes progressAnimation {
+    0% {
+        width: 0%;
+        background-position: 0% 0%;
+    }
+    50% {
+        width: 70%;
+        background-position: 100% 0%;
+    }
+    100% {
+        width: 100%;
+        background-position: 200% 0%;
+    }
+}
+
+button {
+    padding: 12px 30px;
+    cursor: pointer;
+    background: #FFC407;
+    color: #000;
+    border: none;
+    border-radius: 50px;
+    font-weight: 700;
+    font-size: 15px;
+    transition: all 0.3s ease;
+    box-shadow: 0 4px 15px rgba(255, 196, 7, 0.4);
+}
+
+button:hover {
+    transform: translateY(-2px);
+    box-shadow: 0 6px 20px rgba(255, 196, 7, 0.6);
+    background: #ffcd2e;
+}
+
+button:active {
+    transform: translateY(0);
+}
+
+#downloadButton {
+    display: none;
+    margin: 0 auto;
+}
+
+/* Responsive design */
+@media screen and (max-width: 768px) {
+    .app-container {
+        padding: 25px;
+    }
+
+    .logo {
+        width: 300px;
+    }
+
+    .initial-instruction {
+        font-size: 14px;
+    }
+
+    .format-selection {
+        flex-direction: column;
+        gap: 10px;
+    }
+
+    .format-selection select {
+        width: 100%;
+    }
+
+    .file-upload-label {
+        padding: 12px 30px;
+        font-size: 14px;
+    }
+}
+
+@media screen and (max-width: 480px) {
+    body {
+        padding: 10px;
+    }
+
+    .app-container {
+        padding: 20px;
+    }
+
+    .logo {
+        width: 250px;
+    }
+}
--- a/test_download.php
+++ b/test_download.php
@ -0,0 +1,41 @@
+<?php
+/**
+ * Test download functionality
+ */
+
+// List all files in outputs directory
+$outputDir = __DIR__ . '/outputs/';
+echo "<h2>Files in outputs directory:</h2>";
+echo "<ul>";
+
+if (is_dir($outputDir)) {
+    $files = scandir($outputDir);
+    foreach ($files as $file) {
+        if ($file !== '.' && $file !== '..' && $file !== '.DS_Store') {
+            $filepath = $outputDir . $file;
+            $size = filesize($filepath);
+            $readable = is_readable($filepath) ? 'Yes' : 'No';
+            $extension = pathinfo($file, PATHINFO_EXTENSION);
+
+            echo "<li>";
+            echo "<strong>" . htmlspecialchars($file) . "</strong><br>"; // escape: filenames are untrusted
+            echo "Size: " . number_format($size) . " bytes<br>";
+            echo "Readable: $readable<br>";
+            echo "Extension: " . htmlspecialchars($extension) . "<br>";
+            echo "<a href='download.php?file=" . urlencode($file) . "' target='_blank'>Test Download</a>";
+            echo "</li><br>";
+        }
+    }
+} else {
+    echo "<li>Directory not found</li>";
+}
+
+echo "</ul>";
+
+// Test file operations
+echo "<h2>Directory permissions:</h2>";
+echo "Directory: $outputDir<br>";
+echo "Exists: " . (is_dir($outputDir) ? 'Yes' : 'No') . "<br>";
+echo "Readable: " . (is_readable($outputDir) ? 'Yes' : 'No') . "<br>";
+echo "Writable: " . (is_writable($outputDir) ? 'Yes' : 'No') . "<br>";
+?>