DJP 136f92f6f2 Docs: Add Scalability and Queuing section to README		2025-12-10 21:28:39 -05:00

FORGE AI: Unified Generative AI Platform

FORGE AI is an enterprise-grade, microservices-based platform that unifies leading generative AI models into a single workflow spanning video, image, audio, and text. It pairs a backend for orchestration with a modern, responsive frontend for creative professionals.

🌟 Executive Summary

Instead of managing subscription islands (Runway, Midjourney, Topaz, ElevenLabs), FORGE AI brings them all together. It allows for complex workflows like generating an image with DALL-E 3, upscaling it with Topaz, extending it into a video with Google Veo, and adding a voiceover via ElevenLabs—all within one interface.

Comprehensive Feature Matrix

1. 🎬 Video Generation (Multi-Provider)

The video module abstracts complexity between different providers, handling authentication, file upload, polling, and result retrieval automatically.
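The submit-then-poll flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `VideoProvider` interface, the `run_video_job` helper, the state names, and the `FakeProvider` stand-in are all hypothetical, and a real provider client would also handle authentication and file upload.

```python
import time
from typing import Callable, Protocol


class VideoProvider(Protocol):
    """Hypothetical minimal interface a provider adapter might expose."""

    def submit(self, prompt: str) -> str: ...
    def status(self, task_id: str) -> dict: ...


def run_video_job(
    provider: VideoProvider,
    prompt: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Submit a job, poll until it finishes, and return the result URL."""
    task_id = provider.submit(prompt)
    deadline = clock() + timeout
    while True:
        info = provider.status(task_id)
        if info["state"] == "succeeded":
            return info["url"]
        if info["state"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(poll_interval)


class FakeProvider:
    """In-memory stand-in that replays a fixed sequence of states."""

    def __init__(self, states):
        self._states = list(states)
        self.calls = 0

    def submit(self, prompt: str) -> str:
        return "task-1"

    def status(self, task_id: str) -> dict:
        self.calls += 1
        # Consume states in order; keep repeating the last one.
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return {"state": state, "url": "https://example.invalid/video.mp4"}
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in a worker process the defaults would be used.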

| Provider | Model | Capabilities | Optimal Use Case |
| --- | --- | --- | --- |
| Runway | Gen-4 Turbo | Image-to-Video | High-Fidelity Animation. Best for animating static marketing assets. Features Smart Cropping (auto-resize to 1280x768). |
| Runway | Veo 3 / 3.1 | Text/Image-to-Video | Versatile Generation. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
| Google | Veo Native | Text-to-Video | Enterprise Scale. Direct Vertex AI integration for scalable generation. |

Key Feature: Smart Aspect Ratio Handling. The backend automatically detects the required aspect ratio for the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes/crops input images on the fly to prevent API errors.
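One plausible way to implement the resize-and-crop step is "scale to cover, then center-crop". The sketch below only computes the geometry; the function name `plan_center_crop` is hypothetical, and whether the backend centers the crop or uses a smarter focal point is an assumption. The resulting size and box could be passed to an imaging library's resize and crop calls.

```python
def plan_center_crop(width: int, height: int,
                     target_w: int = 1280, target_h: int = 768):
    """Return (scaled_size, crop_box) for a cover-resize followed by a center crop.

    scaled_size is (w, h) to resize the input to; crop_box is
    (left, top, right, bottom) inside the resized image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so the image covers the target in both dimensions.
    scale = max(target_w / width, target_h / height)
    # Guard against rounding leaving the image one pixel short.
    scaled_w = max(target_w, round(width * scale))
    scaled_h = max(target_h, round(height * scale))
    # Center the crop window inside the scaled image.
    left = (scaled_w - target_w) // 2
    top = (scaled_h - target_h) // 2
    return (scaled_w, scaled_h), (left, top, left + target_w, top + target_h)
```

For a 1920x1080 input this trims the sides only: the image is scaled to 1365x768 and the crop box is (42, 0, 1322, 768).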

2. 🖼️ Image Generation (The "Omni-Model" Engine)

Access the latest models without switching tools.

OpenAI:
- GPT-Image-1: The latest efficient model. Supports transparent backgrounds and variable quality.
- DALL-E 3: HD quality, vivid/natural styles.
Google Imagen:
- Imagen 4.0 (Standard/Ultra/Fast): Supports "Prompt Enhance" and "Person Generation" safety filters.
Stability AI:
- SD3.5 / SDXL: Advanced control with Negative Prompts and Image-to-Image strength sliders.
Nano Banana (Gemini):
- Gemini 2.5 Flash / 3 Pro: High speed, supports up to 4K resolution and 21:9 aspect ratios.
Flux: Black Forest Labs Flux Pro integration for photorealistic outputs.
Ideogram: Version 2 integration, excellent for typography.

3. Image Utilities

Professional Upscaling:
- Integrated with Topaz Photo AI SDK.
- Capabilities: Face Recovery, Denoising, 2x/4x scaling.
Background Removal:
- Clipping Magic: High-precision removal.
- Bria AI: Fast, commercially safe removal.

4. 🔊 Audio Intelligence

Text-to-Speech:
- ElevenLabs Integration: Multilingual V2 model. High-quality voice synthesis.
- Configurable stability, similarity boost, and style.
Voice-to-Text:
- OpenAI Whisper: Industry-leading transcription accuracy.
- Supports multiple languages and timestamp generation.
Sound Effects:
- AI-generated SFX for video post-production.

5. 📝 Text & Utilities

Subtitle Processor:
- Automated subtitle generation (VTT/SRT).
- "Burn-in" Capability: Hardcodes subtitles directly onto video frames using FFmpeg.
Prompt Studio:
- Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
- Style presets: Cinematic, Anime, Photorealistic, etc.
Markdown/Mermaid:
- Renders technical diagrams (Flowcharts, Gantt) from text descriptions.
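As an illustration of the Subtitle Processor steps listed above, the sketch below turns timed segments into SRT text and builds an FFmpeg command for burn-in. The helper names are hypothetical and the segment shape `(start_seconds, end_seconds, text)` is an assumption; the `subtitles` video filter is a standard FFmpeg filter (it requires a build with libass), and paths containing special characters would need escaping.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def burn_in_command(video_path: str, srt_path: str, output_path: str) -> list:
    """Build an ffmpeg argument list that hardcodes subtitles into the frames."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"subtitles={srt_path}",  # re-encodes video with rendered text
        "-c:a", "copy",                  # audio is passed through unchanged
        output_path,
    ]
```

The argument list can be handed to `subprocess.run(..., check=True)`; building it as a list avoids shell quoting issues.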

🏗️ Technical Architecture

FORGE AI is built on a Microservices Architecture using Docker Compose.

Backend (`forge-backend`)

Framework: FastAPI (Python 3.11). High-performance, async-first API.
Task Queue: Celery with Redis. Handles long-running jobs (video gen can take minutes) cleanly, decoupling HTTP requests from processing.
Database ORM: SQLAlchemy. Interface for PostgreSQL.
Validation: Pydantic. Strict usage of data contracts ensures API reliability.
File Handling: Direct handling of binary assets, streaming uploads to disk storage.

Frontend (`forge-frontend`)

Framework: Next.js 14 (App Router). Server-Side Rendering (SSR) for performance.
UI Library: React + Tailwind CSS.
Components: Custom design system using ShadCN/UI primitives (Dialogs, Selects, Toasts).
State Management: React Hooks for polling job status and updating UI progress bars.

Data Persistence

PostgreSQL 16:
- jobs: Stores parameters, status, and API metadata for every request.
- assets: Tracks file paths, MIME types, metadata (width/height/duration).
- users: Authentication and profile data.
Redis: In-memory message broker for Celery and caching layer.
Docker Volumes:
- postgres_data: Persistent DB storage.
- assets_data: Shared volume for storing generated media files.

⚡️ Scalability & Performance

FORGE AI is architected to handle high user concurrency (e.g., 200+ simultaneous users) without degrading API performance.

Asynchronous Job Queue: All heavy compute tasks (Video Generation, Upscaling) are offloaded to Celery workers via Redis. The API responds immediately with a Queued status, ensuring the interface remains snappy.
Horizontal Scaling:
- The forge-worker service can be scaled horizontally to process more jobs in parallel.
- Command: docker-compose up -d --scale worker=3 (starts 3 worker containers; the name before the equals sign must match the worker's service name in docker-compose.yml).
Fault Tolerance: If a worker crashes or an API fails, the job status is tracked in Postgres, and broken jobs can be automatically retried or flagged.
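The queue-and-retry behaviour described above can be illustrated with a small in-memory model. This is deliberately not Celery or Redis: a `queue.Queue` and a dict stand in for the broker and the `jobs` table, and `submit_job`, `work_once`, and `MAX_ATTEMPTS` are hypothetical names. The status values follow the pending/failed/completed set listed in the schema overview.

```python
import queue
import uuid

JOBS: dict = {}                      # stand-in for the Postgres jobs table
QUEUE: "queue.Queue[str]" = queue.Queue()  # stand-in for the Redis broker
MAX_ATTEMPTS = 3                     # assumed retry budget


def submit_job(module: str, payload: dict) -> dict:
    """What the API handler does: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"module": module, "payload": payload,
                    "status": "pending", "attempts": 0, "result": None}
    QUEUE.put(job_id)
    return {"job_id": job_id, "status": "pending"}


def work_once(handler) -> str:
    """What a worker does: take one job, run it, retry or flag on failure."""
    job_id = QUEUE.get()
    job = JOBS[job_id]
    job["attempts"] += 1
    try:
        job["result"] = handler(job["payload"])
        job["status"] = "completed"
    except Exception as exc:
        job["error"] = str(exc)
        if job["attempts"] < MAX_ATTEMPTS:
            QUEUE.put(job_id)        # stays pending and is retried later
        else:
            job["status"] = "failed"  # flagged for inspection
    finally:
        QUEUE.task_done()
    return job_id
```

Because the job record lives outside the worker, a crashed worker loses no state: any other worker that picks the job up again sees its attempt count and status.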

💾 Database Schema Overview

| Table | Description | Key Fields |
| --- | --- | --- |
| `jobs` | The central unit of work. | `id`, `module`, `action`, `status` (pending/failed/completed), `input_data` (JSON), `api_provider` |
| `assets` | Files managed by the system. | `id`, `file_path`, `thumbnail_path`, `mime_type`, `source_job_id` |
| `users` | User accounts. | `id`, `email`, `hashed_password`, `role` |
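To make the relationships concrete, here is a sketch of the three tables using Python's built-in `sqlite3` as a stand-in for PostgreSQL. Only the column names come from the table above; the column types, nullability, the `CHECK` constraint, and the helper functions are assumptions, and the real schema is defined through SQLAlchemy models that may differ.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    hashed_password TEXT NOT NULL,
    role TEXT NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    module TEXT NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'failed', 'completed')),
    input_data TEXT NOT NULL,          -- JSON stored as text in this sketch
    api_provider TEXT
);
CREATE TABLE assets (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    thumbnail_path TEXT,
    mime_type TEXT NOT NULL,
    source_job_id INTEGER REFERENCES jobs(id)
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_job(conn, module, action, input_data, api_provider) -> int:
    """Insert a job with JSON-encoded parameters and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (module, action, input_data, api_provider) "
        "VALUES (?, ?, ?, ?)",
        (module, action, json.dumps(input_data), api_provider),
    )
    return cur.lastrowid


def assets_for_job(conn, job_id: int) -> list:
    """List (file_path, mime_type) for every asset a job produced."""
    rows = conn.execute(
        "SELECT file_path, mime_type FROM assets "
        "WHERE source_job_id = ? ORDER BY id",
        (job_id,),
    )
    return rows.fetchall()
```

The `source_job_id` foreign key is what lets the UI trace any generated file back to the request that produced it.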

Configuration (.env)

The system is highly configurable. Key variables include:

API Keys (Critical)

RUNWAY_API_KEY: For Gen-4 Turbo / Veo 3.
GOOGLE_API_KEY / GOOGLE_PROJECT_ID: For Imagen / Vertex AI Veo.
OPENAI_API_KEY: For DALL-E, GPT, Whisper.
ELEVENLABS_API_KEY: For TTS.
TOPAZ_API_KEY: For Upscaling.

Infrastructure

DATABASE_URL: Postgres connection string.
REDIS_URL: Redis connection string.
CELERY_BROKER_URL: Usually same as Redis.
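A startup check for these variables might look like the sketch below. The `Settings` class and `load_settings` function are hypothetical; treating `DATABASE_URL` and `REDIS_URL` as mandatory, the provider keys as optional, and falling back to `REDIS_URL` when `CELERY_BROKER_URL` is unset are assumptions drawn from the descriptions above, not confirmed backend behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PROVIDER_KEYS = (
    "RUNWAY_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY", "TOPAZ_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    celery_broker_url: str
    provider_keys: dict


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "REDIS_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        # Assumed fallback: the broker defaults to the Redis URL.
        celery_broker_url=env.get("CELERY_BROKER_URL") or env["REDIS_URL"],
        # Only providers with a configured key are considered enabled.
        provider_keys={name: env[name] for name in PROVIDER_KEYS if env.get(name)},
    )
```

Failing fast on missing infrastructure settings surfaces a misconfigured `.env` at startup rather than on the first job.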

🚀 Deployment & Operation

Development Mode

# Start the full stack
docker-compose up -d --build

# View logs
docker-compose logs -f forge-backend

Access Points

UI: http://localhost:3000
API Documentation (Swagger UI): http://localhost:8000/docs
Database (Internal): Port 5432

Troubleshooting Common Issues

"Gen-4 Turbo - Validation Failed: Ratio"

Cause: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
Solution: The backend now includes a Smart Crop pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.

"422 Unprocessable Entity"

Cause: Usually missing required fields in the API payload.
Solution: The prompt requirement for Image-to-Video jobs was recently relaxed, so an older frontend bundle may still send an outdated payload. Make sure the frontend sends the current structure; a hard browser refresh clears a stale cached bundle.
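The relaxed rule can be pictured as a conditional requirement: a prompt is mandatory for text-to-video but optional when an input image is supplied. The sketch below uses plain Python rather than the backend's Pydantic models, and the field names `mode`, `prompt`, and `image_asset_id` are hypothetical; check the Swagger UI for the actual request schema.

```python
def validate_video_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means the payload is OK."""
    errors = []
    mode = payload.get("mode")
    if mode not in ("text_to_video", "image_to_video"):
        errors.append("mode: must be 'text_to_video' or 'image_to_video'")
    # A prompt is only mandatory when there is no image to animate.
    if mode == "text_to_video" and not payload.get("prompt"):
        errors.append("prompt: required for text_to_video")
    if mode == "image_to_video" and not payload.get("image_asset_id"):
        errors.append("image_asset_id: required for image_to_video")
    return errors
```

A non-empty error list corresponds to the situation in which the API would answer with 422.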

Unified Creativity.
