Update README with comprehensive documentation

2025-12-10 21:23:48 -05:00 · 2025-12-10 21:23:48 -05:00 · 5ee2082b88
commit 5ee2082b88
parent 5fdbf3c6cd
1 changed file with 145 additions and 49 deletions
--- a/README.md
+++ b/README.md
@@ -1,63 +1,159 @@
-# FORGE AI Platform
+# FORGE AI: Unified Generative AI Platform

-**FORGE AI** is an advanced, unified generative AI platform designed for creative professionals. It integrates state-of-the-art AI models for video generation, image upscaling, background removal, and audio processing into a single, cohesive interface.
-
-## 🚀 Key Features
-
-### 🎬 Video Generation
-*   **Runway Integration**:
-    *   **Gen-4 Turbo (Image-to-Video)**: High-fidelity generation with native auto-cropping and advanced camera controls.
-    *   **Veo 3 & 3.1 (Runway)**: Generation using text or image inputs with native 720p support.
-*   **Google Veo Integration (Native)**: Access Google's Veo models directly via Vertex AI.
-*   **Smart Processing**: Automatic aspect ratio handling and image resizing to meet strict model requirements.
-
-### 🖼️ Image Tools
-*   **Upscaling**: Professional-grade upscaling using **Topaz Photo AI** integration (Face Recovery, Denoising).
-*   **Background Removal**: Multi-provider support (**Clipping Magic**, **Bria AI**) for precise subject isolation.
-*   **Generation**: Multi-model image generation (OpenAI DALL-E 3, Stable Diffusion, etc.).
-
-### 🔊 Audio & Utilities
-*   **Voice-to-Text**: Transcription using OpenAI Whisper.
-*   **Text-to-Speech**: High-quality voice synthesis via ElevenLabs.
-*   **Subtitle Processor**: Automatic subtitle generation and burning for videos.
-*   **Prompt Studio**: AI-powered prompt enhancement and management.
+**FORGE AI** is an enterprise-grade, microservices-based platform designed to unify the world's most powerful generative AI models into a single, cohesive workflow for various media types. It provides a robust backend for orchestration and a modern, responsive frontend for creative professionals.

 ---

-## 🏗️ Architecture
+## 🌟 Executive Summary

-FORGE AI is built as a containerized microservices application using Docker Compose.
-
-### Tech Stack
-*   **Frontend**: Next.js 14 (React), TypeScript, Tailwind CSS. Served via `forge-frontend`.
-*   **Backend**: FastAPI (Python 3.11). Handles API orchestration, job management, and third-party integrations. Served via `forge-backend`.
-*   **Database**: PostgreSQL 16. Stores Jobs, Assets, Users, and Projects.
-*   **Cache/Queue**: Redis. Manages Celery background tasks and caching.
-*   **Reverse Proxy**: Nginx. Routes traffic and handles static assets.
-
-### Data Flow
-1.  **User Request**: User interacts with the Next.js UI.
-2.  **API Call**: Frontend sends request to `forge-backend` (FastAPI).
-3.  **Job Creation**: Backend validates input (Pydantic) and creates a `Job` record in PostgreSQL.
-4.  **Async Processing**: complex tasks (Video Gen, Upscaling) are queued in Redis/Celery.
-5.  **External APIs**: Worker nodes call APIs (Runway, Google, Topaz, etc.).
-6.  **Asset Storage**: Resulting files are stored in the `assets/` volume and indexed in the DB.
-7.  **Notification**: Frontend polls or receives socket updates (planned) for job completion.
+Instead of managing subscription islands (Runway, Midjourney, Topaz, ElevenLabs), **FORGE AI** brings them all together. It allows for complex workflows like generating an image with **DALL-E 3**, upscaling it with **Topaz**, extending it into a video with **Google Veo**, and adding a voiceover via **ElevenLabs**—all within one interface.

 ---

-## 🔒 Security & Configuration
-*   **Environment Variables**: extensive configuration via `.env` files.
-*   **Database Security**: User/Password authentication for Postgres.
-*   **Volume Management**: Persistent storage for Database (`postgres_data`) and Assets (`assets_data`).
+## Comprehensive Feature Matrix
+
+### 1. 🎬 Video Generation (Multi-Provider)
+The video module abstracts complexity between different providers, handling authentication, file upload, polling, and result retrieval automatically.
+
+| Provider | Model | Capabilities | Optimal Use Case |
+| :--- | :--- | :--- | :--- |
+| **Runway** | **Gen-4 Turbo** | Image-to-Video | **High-Fidelity Animation**. Best for animating static marketing assets. Features **Smart Cropping** (auto-resize to 1280x768). |
+| **Runway** | **Veo 3 / 3.1** | Text/Image-to-Video | **Versatile Generation**. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
+| **Google** | **Veo Native** | Text-to-Video | **Enterprise Scale**. Direct Vertex AI integration for scalable generation. |
+
+> **Key Feature**: **Smart Aspect Ratio Handling**. The backend automatically detects the required aspect ratio for the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes/crops input images on the fly to prevent API errors.
+
+### 2. 🖼️ Image Generation (The "Omni-Model" Engine)
+Access the latest models without switching tools.
+
+*   **OpenAI**:
+    *   **GPT-Image-1**: The latest efficient model. Supports transparent backgrounds and variable quality.
+    *   **DALL-E 3**: HD quality, vivid/natural styles.
+*   **Google Imagen**:
+    *   **Imagen 4.0 (Standard/Ultra/Fast)**: Supports "Prompt Enhance" and "Person Generation" safety filters.
+*   **Stability AI**:
+    *   **SD3.5 / SDXL**: Advanced control with **Negative Prompts** and **Image-to-Image** strength sliders.
+*   **Nano Banana (Gemini)**:
+    *   **Gemini 2.5 Flash / 3 Pro**: High speed, supports up to 4K resolution and 21:9 aspect ratios.
+*   **Flux**: Black Forest Labs Flux Pro integration for photorealistic outputs.
+*   **Ideogram**: Version 2 integration, excellent for typography.
+
+### 3. Image Utilities
+*   **Professional Upscaling**:
+    *   Integrated with **Topaz Photo AI SDK**.
+    *   Capabilities: **Face Recovery**, **Denoising**, 2x/4x scaling.
+*   **Background Removal**:
+    *   **Clipping Magic**: High-precision removal.
+    *   **Bria AI**: Fast, commercially safe removal.
+
+### 4. 🔊 Audio Intelligence
+*   **Text-to-Speech**:
+    *   **ElevenLabs Integration**: Multilingual V2 model. High-quality voice synthesis.
+    *   Configurable stability, similarity boost, and style.
+*   **Voice-to-Text**:
+    *   **OpenAI Whisper**: Industry-leading transcription accuracy.
+    *   Supports multiple languages and timestamp generation.
+*   **Sound Effects**:
+    *   AI-generated SFX for video post-production.
+
+### 5. 📝 Text & Utilities
+*   **Subtitle Processor**:
+    *   Automated subtitle generation (VTT/SRT).
+    *   **"Burn-in" Capability**: Hardcodes subtitles directly onto video frames using FFmpeg.
+*   **Prompt Studio**:
+    *   Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
+    *   Style presets: Cinematic, Anime, Photorealistic, etc.
+*   **Markdown/Mermaid**:
+    *   Renders technical diagrams (Flowcharts, Gantt) from text descriptions.

 ---

-## 📚 Documentation
-*   [Installation Guide](./INSTALL.md) - How to set up and run FORGE AI.
-*   [API Documentation](./backend/README.md) - Details on backend endpoints.
-*   [Frontend Guide](./frontend/README.md) - UI development/components.
+## 🏗️ Technical Architecture
+
+FORGE AI is built on a **Microservices Architecture** using **Docker Compose**.
+
+### Backend (`forge-backend`)
+*   **Framework**: **FastAPI** (Python 3.11). High-performance, async-first API.
+*   **Task Queue**: **Celery** with **Redis**. Handles long-running jobs (video gen can take minutes) cleanly, decoupling HTTP requests from processing.
+*   **Database ORM**: **SQLAlchemy**. Interface for PostgreSQL.
+*   **Validation**: **Pydantic**. Strict data contracts on request and response payloads keep the API reliable.
+*   **File Handling**: Direct handling of binary assets, streaming uploads to disk storage.
+
+### Frontend (`forge-frontend`)
+*   **Framework**: **Next.js 14** (App Router). Server-Side Rendering (SSR) for performance.
+*   **UI Library**: **React** + **Tailwind CSS**.
+*   **Components**: Custom design system using **ShadCN/UI** primitives (Dialogs, Selects, Toasts).
+*   **State Management**: React Hooks for polling job status and updating UI progress bars.
+
+### Data Persistence
+*   **PostgreSQL 16**:
+    *   `jobs`: Stores parameters, status, and API metadata for every request.
+    *   `assets`: Tracks file paths, MIME types, metadata (width/height/duration).
+    *   `users`: Authentication and profile data.
+*   **Redis**: In-memory message broker for Celery and caching layer.
+*   **Docker Volumes**:
+    *   `postgres_data`: Persistent DB storage.
+    *   `assets_data`: Shared volume for storing generated media files.

 ---

-## © 2025 BTG Unified Platform
+## 💾 Database Schema Overview
+
+| Table | Description | Key Fields |
+| :--- | :--- | :--- |
+| `jobs` | The central unit of work. | `id`, `module`, `action`, `status` (pending/failed/completed), `input_data` (JSON), `api_provider` |
+| `assets` | Files managed by the system. | `id`, `file_path`, `thumbnail_path`, `mime_type`, `source_job_id` |
+| `users` | User accounts. | `id`, `email`, `hashed_password`, `role` |
+
+---
+
+## Configuration (.env)
+
+The system is highly configurable. Key variables include:
+
+### API Keys (Critical)
+*   `RUNWAY_API_KEY`: For Gen-4 Turbo / Veo 3.
+*   `GOOGLE_API_KEY` / `GOOGLE_PROJECT_ID`: For Imagen / Vertex AI Veo.
+*   `OPENAI_API_KEY`: For DALL-E, GPT, Whisper.
+*   `ELEVENLABS_API_KEY`: For TTS.
+*   `TOPAZ_API_KEY`: For Upscaling.
+
+### Infrastructure
+*   `DATABASE_URL`: Postgres connection string.
+*   `REDIS_URL`: Redis connection string.
+*   `CELERY_BROKER_URL`: Usually the same value as `REDIS_URL`.
+
+---
+
+## 🚀 Deployment & Operation
+
+### Development Mode
+```bash
+# Start the full stack
+docker-compose up -d --build
+
+# View logs
+docker-compose logs -f forge-backend
+```
+
+### Access Points
+*   **UI**: `http://localhost:3000`
+*   **API Documentation (Swagger UI)**: `http://localhost:8000/docs`
+*   **Database (Internal)**: Port `5432`
+
+---
+
+## Troubleshooting Common Issues
+
+### "Gen-4 Turbo - Validation Failed: Ratio"
+*   **Cause**: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
+*   **Solution**: The backend now includes a **Smart Crop** pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.
+
+### "422 Unprocessable Entity"
+*   **Cause**: Usually missing required fields in the API payload.
+*   **Solution**: We recently relaxed the `prompt` requirement for Image-to-Video jobs. Ensure your frontend is sending the correct structure (refresh browser to clear cache).
+
+---
+
+## © 2025 FORGE AI Platforms
+*Unified Creativity.*
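
Note on the Smart Crop behavior described in the new README: the diff states that the backend resizes and crops input images to 1280x768 for Gen-4 Turbo, but it does not show how. The following is a minimal sketch of one way to compute such a crop, assuming a centered crop to the target aspect ratio followed by a resize. The function name `center_crop_box` and its parameters are hypothetical and are not taken from the FORGE AI codebase.

```python
from typing import Tuple

# Dimensions the README cites for Gen-4 Turbo input images.
TARGET_W, TARGET_H = 1280, 768


def center_crop_box(
    width: int,
    height: int,
    target_w: int = TARGET_W,
    target_h: int = TARGET_H,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for the largest centered region
    whose aspect ratio matches target_w:target_h.

    Hypothetical helper for illustration; not the project's actual code.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Compare width/height with target_w/target_h by cross-multiplying,
    # which avoids floating-point rounding.
    if width * target_h > height * target_w:
        # Image is too wide: keep the full height and trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)

    # Image is too tall (or already matches): keep the full width
    # and trim the top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    # A 1920x1080 frame loses 60 pixels on each side before resizing.
    print(center_crop_box(1920, 1080))
```

With an imaging library such as Pillow, the returned box could be passed to `Image.crop(box)` and the result resized with `Image.resize((1280, 768))`. Whether the backend crops from the center or uses some other anchor is not stated in the README.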