FORGE AI: Unified Generative AI Platform

FORGE AI is an enterprise-grade, microservices-based platform that unifies leading generative AI models for video, image, audio, and text into a single, cohesive workflow. It pairs a backend that orchestrates jobs across providers with a modern, responsive frontend for creative professionals.


🌟 Executive Summary

Instead of juggling separate subscriptions and interfaces for each provider (Runway, Topaz, ElevenLabs, and others), FORGE AI brings them together. It enables multi-step workflows such as generating an image with DALL-E 3, upscaling it with Topaz, animating it into a video with Google Veo, and adding a voiceover via ElevenLabs, all within one interface.


Comprehensive Feature Matrix

1. 🎬 Video Generation (Multi-Provider)

The video module hides the differences between providers, handling authentication, file upload, status polling, and result retrieval automatically.
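The polling part of that flow can be sketched as a provider-agnostic loop with capped exponential backoff. This is a minimal illustration under stated assumptions, not the FORGE AI code: `poll_until_done`, the injected `fetch_status` callable (which would wrap a provider's task-status endpoint), and the terminal state names are all hypothetical.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; each provider uses its own status vocabulary.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status() until it reports a terminal state, backing off between calls."""
    waited = 0.0
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        if waited >= timeout:
            raise TimeoutError(f"job still {result.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Exponential backoff, capped so long video jobs are still checked regularly.
        interval = min(interval * 2, max_interval)


if __name__ == "__main__":
    # Simulated provider: two RUNNING responses, then SUCCEEDED.
    responses = iter([{"status": "RUNNING"}, {"status": "RUNNING"},
                      {"status": "SUCCEEDED", "output": "video.mp4"}])
    print(poll_until_done(lambda: next(responses), sleep=lambda s: None))
```

Injecting `sleep` keeps the loop testable. A Celery task could run the same loop, or re-schedule itself between checks so it does not hold a worker slot for several minutes.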

| Provider | Model | Capabilities | Optimal Use Case |
|---|---|---|---|
| Runway | Gen-4 Turbo | Image-to-Video | High-fidelity animation. Best for animating static marketing assets. Features Smart Cropping (auto-resize to 1280x768). |
| Runway | Veo 3 / 3.1 | Text/Image-to-Video | Versatile generation. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
| Google | Veo (Native) | Text-to-Video | Enterprise scale. Direct Vertex AI integration for scalable generation. |

Key Feature: Smart Aspect Ratio Handling. The backend automatically detects the aspect ratio required by the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes and crops input images on the fly so the provider does not reject them.
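A minimal sketch of the cropping geometry, assuming a centered crop (the backend's Smart Crop may position the crop differently). `center_crop_box` is a hypothetical helper that finds the largest box with the target aspect ratio and returns it in Pillow's `(left, upper, right, lower)` order, so the result could be passed to `Image.crop(...)` followed by `Image.resize((1280, 768))`.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), Pillow's crop-box order


def center_crop_box(width: int, height: int, target_w: int, target_h: int) -> Box:
    """Largest centered box in a width x height image with ratio target_w:target_h."""
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("dimensions must be positive")
    # Compare width/height with target_w/target_h by cross-multiplying (no float error).
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w = height * target_w // target_h
        left = (width - crop_w) // 2
        return (left, 0, left + crop_w, height)
    # Source is taller than (or equal to) the target: keep full width, trim top/bottom.
    crop_h = width * target_h // target_w
    top = (height - crop_h) // 2
    return (0, top, width, top + crop_h)


if __name__ == "__main__":
    # A 1920x1080 (16:9) frame cropped for a 1280x768 (5:3) target.
    print(center_crop_box(1920, 1080, 1280, 768))  # (60, 0, 1860, 1080)
```

Cropping before resizing avoids stretching: the 1800x1080 box already has the 5:3 shape, so scaling it to 1280x768 keeps proportions intact.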

2. 🖼️ Image Generation (The "Omni-Model" Engine)

Access the latest models without switching tools.

  • OpenAI:
    • GPT-Image-1: Supports transparent backgrounds and adjustable output quality.
    • DALL-E 3: HD quality, vivid/natural styles.
  • Google Imagen:
    • Imagen 4.0 (Standard/Ultra/Fast): Supports "Prompt Enhance" (automatic prompt rewriting) and "Person Generation" safety settings.
  • Stability AI:
    • SD3.5 / SDXL: Advanced control with Negative Prompts and Image-to-Image strength sliders.
  • Nano Banana (Gemini):
    • Gemini 2.5 Flash / 3 Pro: High speed, supports up to 4K resolution and 21:9 aspect ratios.
  • Flux: Black Forest Labs Flux Pro integration for photorealistic outputs.
  • Ideogram: Version 2 integration, excellent for typography.

3. Image Utilities

  • Professional Upscaling:
    • Integrated with Topaz Photo AI SDK.
    • Capabilities: Face Recovery, Denoising, 2x/4x scaling.
  • Background Removal:
    • Clipping Magic: High-precision removal.
    • Bria AI: Fast, commercially safe removal.

4. 🔊 Audio Intelligence

  • Text-to-Speech:
    • ElevenLabs Integration: Multilingual V2 model. High-quality voice synthesis.
    • Configurable stability, similarity boost, and style.
  • Voice-to-Text:
    • OpenAI Whisper: High-accuracy speech transcription.
    • Supports multiple languages and timestamp generation.
  • Sound Effects:
    • AI-generated SFX for video post-production.
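As an illustration of the configurable voice settings, the sketch below assembles an ElevenLabs text-to-speech request without sending it. The endpoint path, `xi-api-key` header, `eleven_multilingual_v2` model ID, and the `stability` / `similarity_boost` / `style` fields follow ElevenLabs' public API as documented at the time of writing; the helper name and default values are assumptions, so check the current API reference before relying on them.

```python
from typing import Any, Dict

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key: str, voice_id: str, text: str, stability: float = 0.5,
                      similarity_boost: float = 0.75, style: float = 0.0) -> Dict[str, Any]:
    """Assemble (but do not send) an ElevenLabs text-to-speech request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    settings = {"stability": stability, "similarity_boost": similarity_boost, "style": style}
    for name, value in settings.items():
        # Each voice setting is a fraction between 0.0 and 1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "method": "POST",
        "url": ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "json": {"text": text, "model_id": "eleven_multilingual_v2", "voice_settings": settings},
    }
```

An HTTP client such as `requests.post(req["url"], headers=req["headers"], json=req["json"])` would then return the synthesized audio bytes.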

5. 📝 Text & Utilities

  • Subtitle Processor:
    • Automated subtitle generation (VTT/SRT).
    • "Burn-in" Capability: Hardcodes subtitles directly onto video frames using FFmpeg.
  • Prompt Studio:
    • Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
    • Style presets: Cinematic, Anime, Photorealistic, etc.
  • Markdown/Mermaid:
    • Renders technical diagrams (flowcharts, Gantt charts) from Mermaid text definitions.
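The subtitle formats and the burn-in step can be sketched as follows. The helper names are hypothetical; the timestamp layouts follow the SRT (`00:00:01,500`) and WebVTT (`00:00:01.500`) conventions, and the command uses FFmpeg's standard `subtitles` video filter, which requires an FFmpeg build with libass. This is not necessarily how the Subtitle Processor invokes FFmpeg.

```python
import shlex
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def format_timestamp(seconds: float, sep: str) -> str:
    """HH:MM:SS<sep>mmm; SRT uses ',' before milliseconds, WebVTT uses '.'."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Numbered cues separated by blank lines."""
    return "\n".join(
        f"{i}\n{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, start=1)
    )


def to_vtt(cues: List[Cue]) -> str:
    """WebVTT needs a 'WEBVTT' header line; cue numbers are optional and omitted here."""
    body = "\n".join(
        f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text}\n"
        for start, end, text in cues
    )
    return "WEBVTT\n\n" + body


def burn_in_command(video: str, subtitles: str, output: str) -> List[str]:
    """FFmpeg argv that draws subtitles onto the frames (re-encodes video, copies audio)."""
    return ["ffmpeg", "-y", "-i", video, "-vf", f"subtitles={subtitles}",
            "-c:a", "copy", output]


if __name__ == "__main__":
    cues = [(0.0, 1.5, "Hello"), (1.5, 3.25, "World")]
    print(to_srt(cues))
    print(shlex.join(burn_in_command("input.mp4", "subs.srt", "output.mp4")))
```

Passing the command as an argument list (e.g., to `subprocess.run`) avoids shell quoting issues, although paths containing `:` or `'` still need escaping inside the filter argument.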

🏗️ Technical Architecture

FORGE AI is built as a set of microservices orchestrated with Docker Compose.

Backend (forge-backend)

  • Framework: FastAPI (Python 3.11). High-performance, async-first API.
  • Task Queue: Celery with Redis. Runs long-running jobs (video generation can take minutes) in background workers, decoupling HTTP requests from processing.
  • Database ORM: SQLAlchemy, used to access PostgreSQL.
  • Validation: Pydantic. Request and response payloads are validated against typed models, so malformed requests are rejected early (HTTP 422) with a clear error.
  • File Handling: Direct handling of binary assets, streaming uploads to disk storage.
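The request/worker split described above can be illustrated with a stdlib-only stand-in: a `queue.Queue` plays the Redis broker, a background thread plays the Celery worker, and a dict plays the `jobs` table. This is a simplified sketch of the pattern, not the FORGE AI code; `submit`, `worker`, and the in-memory structures are hypothetical.

```python
import queue
import threading
import uuid
from typing import Any, Callable, Dict

# Stand-ins: in FORGE AI, Redis is the broker and PostgreSQL holds job status.
jobs: Dict[str, Dict[str, Any]] = {}
broker: queue.Queue = queue.Queue()


def submit(action: Callable[[], Any]) -> Dict[str, Any]:
    """What the HTTP handler does: record the job, enqueue it, and return at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"id": job_id, "status": "pending"}
    receipt = dict(jobs[job_id])  # copy taken before the worker can touch the job
    broker.put((job_id, action))
    return receipt


def worker() -> None:
    """What a Celery worker does: take jobs off the queue and record the outcome."""
    while True:
        job_id, action = broker.get()
        try:
            jobs[job_id].update(status="completed", result=action())
        except Exception as exc:  # record the failure instead of killing the worker
            jobs[job_id].update(status="failed", error=str(exc))
        finally:
            broker.task_done()


threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    receipt = submit(lambda: "video.mp4")
    print(receipt["status"])  # pending: returned before the work ran
    broker.join()  # wait for the worker (a real client would poll instead)
    print(jobs[receipt["id"]]["status"])  # completed
```

In the real stack, `submit` corresponds to a FastAPI route calling a Celery task's `.delay(...)`, and the worker's status updates would go to PostgreSQL so that any API instance can report them.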

Frontend (forge-frontend)

  • Framework: Next.js 14 (App Router). Server-Side Rendering (SSR) for performance.
  • UI Library: React + Tailwind CSS.
  • Components: Custom design system built on shadcn/ui primitives (Dialogs, Selects, Toasts).
  • State Management: React Hooks for polling job status and updating UI progress bars.

Data Persistence

  • PostgreSQL 16:
    • jobs: Stores parameters, status, and API metadata for every request.
    • assets: Tracks file paths, MIME types, metadata (width/height/duration).
    • users: Authentication and profile data.
  • Redis: In-memory message broker for Celery, also used as a caching layer.
  • Docker Volumes:
    • postgres_data: Persistent DB storage.
    • assets_data: Shared volume for storing generated media files.

Scalability & Performance

FORGE AI is designed to keep the API responsive under high user concurrency (e.g., 200+ simultaneous users).

  • Asynchronous Job Queue: All heavy compute tasks (video generation, upscaling) are offloaded to Celery workers via Redis. The API responds immediately with a job ID and a pending (queued) status, so the interface stays responsive while the work runs in the background.
  • Horizontal Scaling:
    • The forge-worker service can be scaled horizontally to process more jobs in parallel.
    • Command: docker-compose up -d --scale worker=3 (starts 3 worker containers; the name passed to --scale must match the worker's service name in docker-compose.yml).
  • Fault Tolerance: Job status is persisted in PostgreSQL, so a job interrupted by a worker crash or a failed provider API call can be retried automatically or flagged as failed.
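One way to implement the retry-or-flag behavior is a periodic sweep over the jobs table. The sketch below assumes two columns that are not in the documented schema (`attempts` and `updated_at`) and treats a job that has stayed `pending` past a threshold as abandoned by a crashed worker; all names and thresholds are hypothetical. Celery also offers per-task retries (for example `autoretry_for`, `retry_backoff`, and `max_retries` on `@app.task`), which handle exceptions raised inside a task; a sweep like this additionally catches jobs whose worker disappeared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Job:
    id: int
    status: str            # "pending", "completed" or "failed", as in the jobs table
    updated_at: datetime   # hypothetical column: last time the job was touched
    attempts: int = 0      # hypothetical column: how many times it was re-enqueued


def sweep_stale_jobs(jobs: List[Job], now: datetime,
                     stale_after: timedelta = timedelta(minutes=30),
                     max_attempts: int = 3) -> List[int]:
    """Return IDs of stale jobs to re-enqueue; flag those out of attempts as failed."""
    to_retry: List[int] = []
    for job in jobs:
        if job.status != "pending" or now - job.updated_at < stale_after:
            continue  # finished, or still within its expected running time
        job.updated_at = now
        if job.attempts < max_attempts:
            job.attempts += 1
            to_retry.append(job.id)
        else:
            job.status = "failed"
    return to_retry


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    backlog = [Job(1, "pending", now - timedelta(hours=1)),
               Job(2, "pending", now - timedelta(hours=1), attempts=3),
               Job(3, "completed", now - timedelta(hours=1))]
    print(sweep_stale_jobs(backlog, now))    # [1]
    print([job.status for job in backlog])  # ['pending', 'failed', 'completed']
```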

💾 Database Schema Overview

| Table | Description | Key Fields |
|---|---|---|
| jobs | The central unit of work. | id, module, action, status (pending/failed/completed), input_data (JSON), api_provider |
| assets | Files managed by the system. | id, file_path, thumbnail_path, mime_type, source_job_id |
| users | User accounts. | id, email, hashed_password, role |
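For illustration, the tables above can be expressed as DDL. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL, so column types are simplified (a PostgreSQL deployment would more likely use `JSONB` for `input_data`), the `CHECK` constraint simply encodes the three documented statuses, and the sample values are made up; the actual SQLAlchemy models may differ.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    hashed_password TEXT NOT NULL,
    role TEXT NOT NULL DEFAULT 'user'
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    module TEXT NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'failed', 'completed')),
    input_data TEXT NOT NULL,  -- JSON text here; JSONB would be typical in PostgreSQL
    api_provider TEXT
);
CREATE TABLE assets (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    thumbnail_path TEXT,
    mime_type TEXT NOT NULL,
    source_job_id INTEGER REFERENCES jobs(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample job and the asset it produced.
job_id = conn.execute(
    "INSERT INTO jobs (module, action, input_data, api_provider) VALUES (?, ?, ?, ?)",
    ("video", "image_to_video", json.dumps({"ratio": "1280:768"}), "runway"),
).lastrowid
conn.execute(
    "INSERT INTO assets (file_path, mime_type, source_job_id) VALUES (?, ?, ?)",
    ("/assets/out.mp4", "video/mp4", job_id),
)

# Trace an asset back to the job (and provider) that created it.
row = conn.execute(
    "SELECT j.api_provider, j.status, a.mime_type FROM assets a "
    "JOIN jobs j ON j.id = a.source_job_id"
).fetchone()
print(row)  # ('runway', 'pending', 'video/mp4')
```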

Configuration (.env)

The system is configured through environment variables in .env. Key variables include:

API Keys (Critical)

  • RUNWAY_API_KEY: For Gen-4 Turbo / Veo 3.
  • GOOGLE_API_KEY / GOOGLE_PROJECT_ID: For Imagen / Vertex AI Veo.
  • OPENAI_API_KEY: For DALL-E, GPT, Whisper.
  • ELEVENLABS_API_KEY: For TTS.
  • TOPAZ_API_KEY: For Upscaling.

Infrastructure

  • DATABASE_URL: Postgres connection string.
  • REDIS_URL: Redis connection string.
  • CELERY_BROKER_URL: Usually the same value as REDIS_URL.
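A minimal sketch of startup validation for these variables, assuming the broker falls back to `REDIS_URL` as described above. `load_settings`, the provider grouping (including splitting `GOOGLE_API_KEY` and `GOOGLE_PROJECT_ID` into separate entries), and the choice to treat API keys as optional (disabling a provider rather than failing) are hypothetical; it reads `os.environ` by default but accepts any mapping for testing.

```python
import os
from typing import Any, Dict, Mapping, Optional

REQUIRED = ("DATABASE_URL", "REDIS_URL")
# Hypothetical grouping of the documented API keys by provider.
PROVIDER_KEYS = {
    "runway": ("RUNWAY_API_KEY",),
    "google": ("GOOGLE_API_KEY",),
    "vertex": ("GOOGLE_PROJECT_ID",),
    "openai": ("OPENAI_API_KEY",),
    "elevenlabs": ("ELEVENLABS_API_KEY",),
    "topaz": ("TOPAZ_API_KEY",),
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Validate infrastructure settings and report which providers have credentials."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {
        "database_url": env["DATABASE_URL"],
        "redis_url": env["REDIS_URL"],
        # Default the Celery broker to Redis when CELERY_BROKER_URL is unset or empty.
        "celery_broker_url": env.get("CELERY_BROKER_URL") or env["REDIS_URL"],
        "enabled_providers": [
            provider for provider, keys in PROVIDER_KEYS.items()
            if all(env.get(key) for key in keys)
        ],
    }
```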

🚀 Deployment & Operation

Development Mode

# Start the full stack
docker-compose up -d --build

# View logs
docker-compose logs -f forge-backend

Access Points

  • UI: http://localhost:3000
  • API Documentation (Swagger UI): http://localhost:8000/docs
  • Database (Internal): Port 5432

Troubleshooting Common Issues

"Gen-4 Turbo - Validation Failed: Ratio"

  • Cause: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
  • Solution: The backend now includes a Smart Crop pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.

"422 Unprocessable Entity"

  • Cause: Usually missing required fields in the API payload.
  • Solution: The prompt is no longer required for Image-to-Video jobs. If the error persists, hard-refresh the browser to clear the cached frontend, then confirm that the request payload contains all required fields.
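The relaxed rule can be sketched as a mode-dependent check. The field names (`mode`, `prompt`, `image_path`) and mode values are hypothetical stand-ins for whatever the real request model uses; in FastAPI the same logic would typically live in a Pydantic validator (for example `@model_validator(mode="after")` in Pydantic v2), and a failed check surfaces as the 422 response described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoJobRequest:
    mode: str                        # hypothetical: "text_to_video" or "image_to_video"
    prompt: Optional[str] = None
    image_path: Optional[str] = None


def validation_errors(req: VideoJobRequest) -> List[str]:
    """Return the problems that would make the API answer 422; empty means valid."""
    errors: List[str] = []
    if req.mode == "text_to_video":
        if not (req.prompt and req.prompt.strip()):
            errors.append("prompt is required for text_to_video")
    elif req.mode == "image_to_video":
        # The prompt is optional here; the input image is what must be present.
        if not req.image_path:
            errors.append("image_path is required for image_to_video")
    else:
        errors.append(f"unknown mode: {req.mode!r}")
    return errors
```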

© 2025 FORGE AI Platforms

Unified Creativity.