diff --git a/README.md b/README.md
index 4b88045..99730d1 100644
--- a/README.md
+++ b/README.md
@@ -1,63 +1,159 @@
-# FORGE AI Platform
+# FORGE AI: Unified Generative AI Platform
 
-**FORGE AI** is an advanced, unified generative AI platform designed for creative professionals. It integrates state-of-the-art AI models for video generation, image upscaling, background removal, and audio processing into a single, cohesive interface.
-
-## 🚀 Key Features
-
-### 🎬 Video Generation
-* **Runway Integration**:
-    * **Gen-4 Turbo (Image-to-Video)**: High-fidelity generation with native auto-cropping and advanced camera controls.
-    * **Veo 3 & 3.1 (Runway)**: Generation using text or image inputs with native 720p support.
-* **Google Veo Integration (Native)**: Access Google's Veo models directly via Vertex AI.
-* **Smart Processing**: Automatic aspect ratio handling and image resizing to meet strict model requirements.
-
-### 🖼️ Image Tools
-* **Upscaling**: Professional-grade upscaling using **Topaz Photo AI** integration (Face Recovery, Denoising).
-* **Background Removal**: Multi-provider support (**Clipping Magic**, **Bria AI**) for precise subject isolation.
-* **Generation**: Multi-model image generation (OpenAI DALL-E 3, Stable Diffusion, etc.).
-
-### 🔊 Audio & Utilities
-* **Voice-to-Text**: Transcription using OpenAI Whisper.
-* **Text-to-Speech**: High-quality voice synthesis via ElevenLabs.
-* **Subtitle Processor**: Automatic subtitle generation and burning for videos.
-* **Prompt Studio**: AI-powered prompt enhancement and management.
+**FORGE AI** is an enterprise-grade, microservices-based platform designed to unify the world's most powerful generative AI models into a single, cohesive workflow for various media types. It provides a robust backend for orchestration and a modern, responsive frontend for creative professionals.
 
 ---
 
-## 🏗️ Architecture
+## 🌟 Executive Summary
 
-FORGE AI is built as a containerized microservices application using Docker Compose.
-
-### Tech Stack
-* **Frontend**: Next.js 14 (React), TypeScript, Tailwind CSS. Served via `forge-frontend`.
-* **Backend**: FastAPI (Python 3.11). Handles API orchestration, job management, and third-party integrations. Served via `forge-backend`.
-* **Database**: PostgreSQL 16. Stores Jobs, Assets, Users, and Projects.
-* **Cache/Queue**: Redis. Manages Celery background tasks and caching.
-* **Reverse Proxy**: Nginx. Routes traffic and handles static assets.
-
-### Data Flow
-1. **User Request**: User interacts with the Next.js UI.
-2. **API Call**: Frontend sends request to `forge-backend` (FastAPI).
-3. **Job Creation**: Backend validates input (Pydantic) and creates a `Job` record in PostgreSQL.
-4. **Async Processing**: complex tasks (Video Gen, Upscaling) are queued in Redis/Celery.
-5. **External APIs**: Worker nodes call APIs (Runway, Google, Topaz, etc.).
-6. **Asset Storage**: Resulting files are stored in the `assets/` volume and indexed in the DB.
-7. **Notification**: Frontend polls or receives socket updates (planned) for job completion.
+Instead of managing subscription islands (Runway, Midjourney, Topaz, ElevenLabs), **FORGE AI** brings them all together. It allows for complex workflows like generating an image with **DALL-E 3**, upscaling it with **Topaz**, extending it into a video with **Google Veo**, and adding a voiceover via **ElevenLabs**, all within one interface.
 
 ---
 
-## 🔒 Security & Configuration
-* **Environment Variables**: extensive configuration via `.env` files.
-* **Database Security**: User/Password authentication for Postgres.
-* **Volume Management**: Persistent storage for Database (`postgres_data`) and Assets (`assets_data`).
+## Comprehensive Feature Matrix
+
+### 1. 🎬 Video Generation (Multi-Provider)
+The video module abstracts complexity between different providers, handling authentication, file upload, polling, and result retrieval automatically.
+
+| Provider | Model | Capabilities | Optimal Use Case |
+| :--- | :--- | :--- | :--- |
+| **Runway** | **Gen-4 Turbo** | Image-to-Video | **High-Fidelity Animation**. Best for animating static marketing assets. Features **Smart Cropping** (auto-resize to 1280x768). |
+| **Runway** | **Veo 3 / 3.1** | Text/Image-to-Video | **Versatile Generation**. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
+| **Google** | **Veo Native** | Text-to-Video | **Enterprise Scale**. Direct Vertex AI integration for scalable generation. |
+
+> **Key Feature**: **Smart Aspect Ratio Handling**. The backend automatically detects the required aspect ratio for the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes/crops input images on the fly to prevent API errors.
+
+### 2. 🖼️ Image Generation (The "Omni-Model" Engine)
+Access the latest models without switching tools.
+
+* **OpenAI**:
+    * **GPT-Image-1**: The latest efficient model. Supports transparent backgrounds and variable quality.
+    * **DALL-E 3**: HD quality, vivid/natural styles.
+* **Google Imagen**:
+    * **Imagen 4.0 (Standard/Ultra/Fast)**: Supports "Prompt Enhance" and "Person Generation" safety filters.
+* **Stability AI**:
+    * **SD3.5 / SDXL**: Advanced control with **Negative Prompts** and **Image-to-Image** strength sliders.
+* **Nano Banana (Gemini)**:
+    * **Gemini 2.5 Flash / 3 Pro**: High speed, supports up to 4K resolution and 21:9 aspect ratios.
+* **Flux**: Black Forest Labs Flux Pro integration for photorealistic outputs.
+* **Ideogram**: Version 2 integration, excellent for typography.
+
+### 3. Image Utilities
+* **Professional Upscaling**:
+    * Integrated with **Topaz Photo AI SDK**.
+    * Capabilities: **Face Recovery**, **Denoising**, 2x/4x scaling.
+* **Background Removal**:
+    * **Clipping Magic**: High-precision removal.
+    * **Bria AI**: Fast, commercially safe removal.
+
+### 4. 🔊 Audio Intelligence
+* **Text-to-Speech**:
+    * **ElevenLabs Integration**: Multilingual V2 model. High-quality voice synthesis.
+    * Configurable stability, similarity boost, and style.
+* **Voice-to-Text**:
+    * **OpenAI Whisper**: Industry-leading transcription accuracy.
+    * Supports multiple languages and timestamp generation.
+* **Sound Effects**:
+    * AI-generated SFX for video post-production.
+
+### 5. 📝 Text & Utilities
+* **Subtitle Processor**:
+    * Automated subtitle generation (VTT/SRT).
+    * **"Burn-in" Capability**: Hardcodes subtitles directly onto video frames using FFmpeg.
+* **Prompt Studio**:
+    * Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
+    * Style presets: Cinematic, Anime, Photorealistic, etc.
+* **Markdown/Mermaid**:
+    * Renders technical diagrams (Flowcharts, Gantt) from text descriptions.
 
 ---
 
-## 📚 Documentation
-* [Installation Guide](./INSTALL.md) - How to set up and run FORGE AI.
-* [API Documentation](./backend/README.md) - Details on backend endpoints.
-* [Frontend Guide](./frontend/README.md) - UI development/components.
+## 🏗️ Technical Architecture
+
+FORGE AI is built on a **Microservices Architecture** using **Docker Compose**.
+
+### Backend (`forge-backend`)
+* **Framework**: **FastAPI** (Python 3.11). High-performance, async-first API.
+* **Task Queue**: **Celery** with **Redis**. Handles long-running jobs (video gen can take minutes) cleanly, decoupling HTTP requests from processing.
+* **Database ORM**: **SQLAlchemy**. Interface for PostgreSQL.
+* **Validation**: **Pydantic**. Strict usage of data contracts ensures API reliability.
+* **File Handling**: Direct handling of binary assets, streaming uploads to disk storage.
+
+### Frontend (`forge-frontend`)
+* **Framework**: **Next.js 14** (App Router). Server-Side Rendering (SSR) for performance.
+* **UI Library**: **React** + **Tailwind CSS**.
+* **Components**: Custom design system using **ShadCN/UI** primitives (Dialogs, Selects, Toasts).
+* **State Management**: React Hooks for polling job status and updating UI progress bars.
+
+### Data Persistence
+* **PostgreSQL 16**:
+    * `jobs`: Stores parameters, status, and API metadata for every request.
+    * `assets`: Tracks file paths, MIME types, metadata (width/height/duration).
+    * `users`: Authentication and profile data.
+* **Redis**: In-memory message broker for Celery and caching layer.
+* **Docker Volumes**:
+    * `postgres_data`: Persistent DB storage.
+    * `assets_data`: Shared volume for storing generated media files.
 
 ---
 
-## © 2025 BTG Unified Platform
+## 💾 Database Schema Overview
+
+| Table | Description | Key Fields |
+| :--- | :--- | :--- |
+| `jobs` | The central unit of work. | `id`, `module`, `action`, `status` (pending/failed/completed), `input_data` (JSON), `api_provider` |
+| `assets` | Files managed by the system. | `id`, `file_path`, `thumbnail_path`, `mime_type`, `source_job_id` |
+| `users` | User accounts. | `id`, `email`, `hashed_password`, `role` |
+
+---
+
+## Configuration (.env)
+
+The system is highly configurable. Key variables include:
+
+### API Keys (Critical)
+* `RUNWAY_API_KEY`: For Gen-4 Turbo / Veo 3.
+* `GOOGLE_API_KEY` / `GOOGLE_PROJECT_ID`: For Imagen / Vertex AI Veo.
+* `OPENAI_API_KEY`: For DALL-E, GPT, Whisper.
+* `ELEVENLABS_API_KEY`: For TTS.
+* `TOPAZ_API_KEY`: For Upscaling.
+
+### Infrastructure
+* `DATABASE_URL`: Postgres connection string.
+* `REDIS_URL`: Redis connection string.
+* `CELERY_BROKER_URL`: Usually same as Redis.
+
+---
+
+## 🚀 Deployment & Operation
+
+### Development Mode
+```bash
+# Start the full stack
+docker-compose up -d --build
+
+# View logs
+docker-compose logs -f forge-backend
+```
+
+### Access Points
+* **UI**: `http://localhost:3000`
+* **API Documentation (Swagger UI)**: `http://localhost:8000/docs`
+* **Database (Internal)**: Port `5432`
+
+---
+
+## Troubleshooting Common Issues
+
+### "Gen-4 Turbo - Validation Failed: Ratio"
+* **Cause**: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
+* **Solution**: The backend now includes a **Smart Crop** pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.
+
+### "422 Unprocessable Entity"
+* **Cause**: Usually missing required fields in the API payload.
+* **Solution**: We recently relaxed the `prompt` requirement for Image-to-Video jobs. Ensure your frontend is sending the correct structure (refresh browser to clear cache).
+
+---
+
+## © 2025 FORGE AI Platforms
+*Unified Creativity.*
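
The troubleshooting entry in the new README describes a Smart Crop pre-processor that resizes and crops input images to 1280x768 for Gen-4 Turbo, but the diff does not show how it works. The sketch below is one plausible way to compute the crop region, assuming a centered crop to the target aspect ratio followed by a resize. The names `center_crop_box`, `CropBox`, `TARGET_W`, and `TARGET_H` are hypothetical and do not come from the FORGE AI codebase.

```python
from typing import NamedTuple

# Input size the README quotes for Gen-4 Turbo.
TARGET_W, TARGET_H = 1280, 768


class CropBox(NamedTuple):
    """Pixel box in source-image coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def center_crop_box(src_w: int, src_h: int,
                    target_w: int = TARGET_W,
                    target_h: int = TARGET_H) -> CropBox:
    """Return the largest centered box in the source that has the target aspect ratio.

    This is an assumed strategy (center crop); the real pre-processor may
    choose the region differently.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")

    # Compare aspect ratios by cross-multiplying to stay in integer arithmetic.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w

    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return CropBox(left, top, left + crop_w, top + crop_h)
```

For a 1920x1080 input this trims 60 pixels from each side, leaving an 1800x1080 region that can then be scaled to 1280x768. If the backend uses Pillow (the README does not say), the box could be passed to `Image.crop(box)` and the result to `Image.resize((1280, 768))`.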