
# FORGE AI: Unified Generative AI Platform

**FORGE AI** is an enterprise-grade, microservices-based platform that brings leading generative AI models for image, video, audio, and text into a single, cohesive workflow. It provides a robust backend for orchestration and a modern, responsive frontend for creative professionals.

---

## 🌟 Executive Summary

Instead of managing separate subscriptions and tools (Runway, Midjourney, Topaz, ElevenLabs), **FORGE AI** brings them together in one place. It supports multi-step workflows such as generating an image with **DALL-E 3**, upscaling it with **Topaz**, extending it into a video with **Google Veo**, and adding a voiceover via **ElevenLabs**, all within one interface.

---

## Comprehensive Feature Matrix

### 1. 🎬 Video Generation (Multi-Provider)
The video module abstracts complexity between different providers, handling authentication, file upload, polling, and result retrieval automatically.

| Provider | Model | Capabilities | Optimal Use Case |
| :--- | :--- | :--- | :--- |
| **Runway** | **Gen-4 Turbo** | Image-to-Video | **High-Fidelity Animation**. Best for animating static marketing assets. Features **Smart Cropping** (auto-resize to 1280x768). |
| **Runway** | **Veo 3 / 3.1** | Text/Image-to-Video | **Versatile Generation**. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
| **Google** | **Veo Native** | Text-to-Video | **Enterprise Scale**. Direct Vertex AI integration for scalable generation. |
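
The polling step that the video module automates can be pictured roughly as follows. This is a minimal sketch, not the platform's actual code: `poll_until_done`, the `fetch_status` callable, and the status names in `TERMINAL_STATES` are all hypothetical, and each real provider defines its own status vocabulary.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; real providers use their own status names.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status() until it reports a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        task = fetch_status()
        if task["status"] in TERMINAL_STATES:
            return task
        if waited >= timeout_s:
            raise TimeoutError(f"task still {task['status']!r} after {waited:.0f}s")
        # Wait before asking the provider again, and track the total wait.
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. A provider adapter would wrap its own status request in `fetch_status`.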

> **Key Feature**: **Smart Aspect Ratio Handling**. The backend automatically detects the required aspect ratio for the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes/crops input images on the fly to prevent API errors.

### 2. 🖼️ Image Generation (The "Omni-Model" Engine)
Access the latest models without switching tools.

*   **OpenAI**:
    *   **GPT-Image-1**: The latest efficient model. Supports transparent backgrounds and variable quality.
    *   **DALL-E 3**: HD quality, vivid/natural styles.
*   **Google Imagen**:
    *   **Imagen 4.0 (Standard/Ultra/Fast)**: Supports "Prompt Enhance" and "Person Generation" safety filters.
*   **Stability AI**:
    *   **SD3.5 / SDXL**: Advanced control with **Negative Prompts** and **Image-to-Image** strength sliders.
*   **Nano Banana (Gemini)**:
    *   **Gemini 2.5 Flash / 3 Pro**: High speed, supports up to 4K resolution and 21:9 aspect ratios.
*   **Flux**: Black Forest Labs Flux Pro integration for photorealistic outputs.
*   **Ideogram**: Version 2 integration, excellent for typography.

### 3. Image Utilities
*   **Professional Upscaling**:
    *   Integrated with **Topaz Photo AI SDK**.
    *   Capabilities: **Face Recovery**, **Denoising**, 2x/4x scaling.
*   **Background Removal**:
    *   **Clipping Magic**: High-precision removal.
    *   **Bria AI**: Fast, commercially safe removal.

### 4. 🔊 Audio Intelligence
*   **Text-to-Speech**:
    *   **ElevenLabs Integration**: Multilingual V2 model. High-quality voice synthesis.
    *   Configurable stability, similarity boost, and style.
*   **Voice-to-Text**:
    *   **OpenAI Whisper**: Industry-leading transcription accuracy.
    *   Supports multiple languages and timestamp generation.
*   **Sound Effects**:
    *   AI-generated SFX for video post-production.

### 5. 📝 Text & Utilities
*   **Subtitle Processor**:
    *   Automated subtitle generation (VTT/SRT).
    *   **"Burn-in" Capability**: Hardcodes subtitles directly onto video frames using FFmpeg.
*   **Prompt Studio**:
    *   Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
    *   Style presets: Cinematic, Anime, Photorealistic, etc.
*   **Markdown/Mermaid**:
    *   Renders technical diagrams (Flowcharts, Gantt) from text descriptions.
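
As an illustration of the subtitle workflow, the sketch below formats transcript segments as SRT and builds an FFmpeg command that burns them into a video. The function names and the `(start, end, text)` segment shape are assumptions made for this example, not the project's actual interface. The command relies on FFmpeg's `subtitles` video filter, which requires an FFmpeg build with libass.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def burn_in_command(video_in: str, srt_path: str, video_out: str) -> List[str]:
    """Build an FFmpeg argument list that hardcodes subtitles onto the frames."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,
        "-vf", f"subtitles={srt_path}",  # re-encodes video with the text drawn in
        "-c:a", "copy",                  # audio stream is copied unchanged
        video_out,
    ]
```

The argument list can be passed to `subprocess.run(..., check=True)`. Paths containing characters that are special to FFmpeg filter syntax (such as `:` or `'`) would need escaping, which this sketch does not handle.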

---

## 🏗️ Technical Architecture

FORGE AI is built on a **Microservices Architecture** using **Docker Compose**.

### Backend (`forge-backend`)
*   **Framework**: **FastAPI** (Python 3.11). High-performance, async-first API.
*   **Task Queue**: **Celery** with **Redis**. Handles long-running jobs (video gen can take minutes) cleanly, decoupling HTTP requests from processing.
*   **Database ORM**: **SQLAlchemy**. Interface for PostgreSQL.
*   **Validation**: **Pydantic**. Strict usage of data contracts ensures API reliability.
*   **File Handling**: Direct handling of binary assets, streaming uploads to disk storage.

### Frontend (`forge-frontend`)
*   **Framework**: **Next.js 14** (App Router). Server-Side Rendering (SSR) for performance.
*   **UI Library**: **React** + **Tailwind CSS**.
*   **Components**: Custom design system using **ShadCN/UI** primitives (Dialogs, Selects, Toasts).
*   **State Management**: React Hooks for polling job status and updating UI progress bars.

### Data Persistence
*   **PostgreSQL 16**:
    *   `jobs`: Stores parameters, status, and API metadata for every request.
    *   `assets`: Tracks file paths, MIME types, metadata (width/height/duration).
    *   `users`: Authentication and profile data.
*   **Redis**: In-memory message broker for Celery and caching layer.
*   **Docker Volumes**:
    *   `postgres_data`: Persistent DB storage.
    *   `assets_data`: Shared volume for storing generated media files.

---

## ⚡️ Scalability & Performance

FORGE AI is architected to handle high user concurrency (e.g., 200+ simultaneous users) without degrading API performance.

*   **Asynchronous Job Queue**: All heavy compute tasks (Video Generation, Upscaling) are offloaded to **Celery** workers via **Redis**. The API responds immediately with a `Queued` status, ensuring the interface remains snappy.
*   **Horizontal Scaling**:
    *   The `forge-worker` service can be scaled horizontally to process more jobs in parallel.
    *   Command: `docker-compose up -d --scale forge-worker=3` (starts 3 worker containers; the name passed to `--scale` must match the worker's service name in the Compose file).
*   **Fault Tolerance**: If a worker crashes or an API fails, the job status is tracked in Postgres, and broken jobs can be automatically retried or flagged.
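
The queue-and-retry flow described above can be sketched with the standard library alone. This is a stand-in for illustration only: an in-memory dict plays the role of the Postgres `jobs` table, `queue.Queue` plays the role of the Redis broker, and `submit_job`, `run_worker_once`, and `MAX_ATTEMPTS` are hypothetical names. Status values follow the `jobs` schema (`pending`, `failed`, `completed`).

```python
import queue
import uuid
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # assumed retry budget, not a documented setting

jobs: Dict[str, dict] = {}   # stand-in for the Postgres `jobs` table
broker = queue.Queue()       # stand-in for the Redis broker


def submit_job(action: str, input_data: dict) -> dict:
    """API side: record the job, enqueue its id, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {
        "id": job_id,
        "action": action,
        "input_data": input_data,
        "status": "pending",
        "attempts": 0,
    }
    broker.put(job_id)
    return {"id": job_id, "status": "pending"}


def run_worker_once(handler: Callable[[dict], None]) -> None:
    """Worker side: process one queued job, retrying failures up to MAX_ATTEMPTS."""
    job = jobs[broker.get_nowait()]
    job["attempts"] += 1
    try:
        handler(job)
        job["status"] = "completed"
    except Exception as exc:
        job["error"] = str(exc)
        if job["attempts"] < MAX_ATTEMPTS:
            broker.put(job["id"])      # re-queue for another attempt
        else:
            job["status"] = "failed"   # flag for inspection
```

Because job state lives outside the worker, a crashed worker loses nothing but the attempt in progress. In a Celery deployment, calling a task's `delay()` method and setting task retry options such as `max_retries` would presumably fill the same roles.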

---

## 💾 Database Schema Overview

| Table | Description | Key Fields |
| :--- | :--- | :--- |
| `jobs` | The central unit of work. | `id`, `module`, `action`, `status` (pending/failed/completed), `input_data` (JSON), `api_provider` |
| `assets` | Files managed by the system. | `id`, `file_path`, `thumbnail_path`, `mime_type`, `source_job_id` |
| `users` | User accounts. | `id`, `email`, `hashed_password`, `role` |
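
To make the relationships concrete, here is a rough sketch of the three tables using an in-memory SQLite database as a stand-in for PostgreSQL. Only the table and column names come from the overview above; the column types, constraints, and the helper functions `create_demo_db` and `assets_for_job` are assumptions for illustration.

```python
import sqlite3

# Column names follow the schema overview; types and constraints are assumed.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    hashed_password TEXT NOT NULL,
    role TEXT NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    module TEXT NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'failed', 'completed')),
    input_data TEXT NOT NULL,   -- JSON payload stored as text
    api_provider TEXT
);
CREATE TABLE assets (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    thumbnail_path TEXT,
    mime_type TEXT NOT NULL,
    source_job_id INTEGER REFERENCES jobs(id)
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def assets_for_job(conn: sqlite3.Connection, job_id: int) -> list:
    """Return (file_path, mime_type) for every asset produced by a job."""
    rows = conn.execute(
        "SELECT file_path, mime_type FROM assets WHERE source_job_id = ? ORDER BY id",
        (job_id,),
    )
    return rows.fetchall()
```

The `source_job_id` foreign key is what links a generated file back to the request that produced it.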

---

## Configuration (.env)

The system is highly configurable. Key variables include:

### API Keys (Critical)
*   `RUNWAY_API_KEY`: For Gen-4 Turbo / Veo 3.
*   `GOOGLE_API_KEY` / `GOOGLE_PROJECT_ID`: For Imagen / Vertex AI Veo.
*   `OPENAI_API_KEY`: For DALL-E, GPT, Whisper.
*   `ELEVENLABS_API_KEY`: For TTS.
*   `TOPAZ_API_KEY`: For Upscaling.

### Infrastructure
*   `DATABASE_URL`: Postgres connection string.
*   `REDIS_URL`: Redis connection string.
*   `CELERY_BROKER_URL`: Celery broker connection string. Usually the same value as `REDIS_URL`.
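
One way these variables might be read and checked at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical, and two behaviors are assumptions rather than documented facts: treating only `DATABASE_URL` and `REDIS_URL` as mandatory, and falling back to `REDIS_URL` when `CELERY_BROKER_URL` is unset.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping

# Assumption: infrastructure URLs are mandatory, provider keys are optional.
REQUIRED = ("DATABASE_URL", "REDIS_URL")
PROVIDER_KEYS = (
    "RUNWAY_API_KEY",
    "GOOGLE_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "TOPAZ_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    celery_broker_url: str
    provider_keys: Dict[str, str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        # Assumed fallback: reuse the Redis URL when no broker URL is given.
        celery_broker_url=env.get("CELERY_BROKER_URL") or env["REDIS_URL"],
        # Only providers with a configured key are considered enabled.
        provider_keys={name: env[name] for name in PROVIDER_KEYS if env.get(name)},
    )
```

Failing at startup with a list of missing names tends to be easier to diagnose than a provider error surfacing later inside a worker.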

---

## 🚀 Deployment & Operation

### Development Mode
```bash
# Start the full stack
docker-compose up -d --build

# View logs
docker-compose logs -f forge-backend
```

### Access Points
*   **UI**: `http://localhost:3000`
*   **API Documentation (Swagger UI)**: `http://localhost:8000/docs`
*   **Database (Internal)**: Port `5432`

---

## Troubleshooting Common Issues

### "Gen-4 Turbo - Validation Failed: Ratio"
*   **Cause**: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
*   **Solution**: The backend now includes a **Smart Crop** pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.
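
The geometry behind such a crop can be sketched as follows. This is not the backend's actual pre-processor: `center_crop_box` is a hypothetical helper that assumes a centered crop, and it only computes the region to keep before scaling to the target size.

```python
from typing import Tuple


def center_crop_box(
    src_w: int, src_h: int, target_w: int = 1280, target_h: int = 768
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    of the source image that has the target aspect ratio."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to avoid float error.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)
```

For a 1920x1080 input this yields a 1800x1080 region, which has the same 5:3 ratio as 1280x768. With Pillow, the box could be passed to `Image.crop` and the result to `Image.resize((1280, 768))`.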

### "422 Unprocessable Entity"
*   **Cause**: Usually one or more required fields are missing from the API payload.
*   **Solution**: The `prompt` field is now optional for Image-to-Video jobs. Make sure the frontend sends the current payload structure; a hard refresh of the browser clears a stale cached bundle.
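
The rule that `prompt` is optional only for Image-to-Video jobs could be expressed roughly as below. Everything except `prompt` is hypothetical: the field names `mode`, `model`, and `image_asset_id`, the mode values, and the function itself are invented for illustration, and the backend's real Pydantic models may differ.

```python
from typing import List


def validate_video_payload(payload: dict) -> List[str]:
    """Return validation errors; an empty list means the payload is acceptable."""
    errors = []
    mode = payload.get("mode")
    if mode not in ("text_to_video", "image_to_video"):
        errors.append("mode: must be 'text_to_video' or 'image_to_video'")
    if not payload.get("model"):
        errors.append("model: field required")
    # A prompt is mandatory only when there is no source image to animate.
    if mode == "text_to_video" and not payload.get("prompt"):
        errors.append("prompt: field required for text_to_video")
    if mode == "image_to_video" and not payload.get("image_asset_id"):
        errors.append("image_asset_id: field required for image_to_video")
    return errors
```

When a 422 appears, the response body normally names the offending fields, which is the quickest way to see which part of the payload is out of date.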

---

## © 2025 FORGE AI Platforms
*Unified Creativity.*