FORGE AI: Unified Generative AI Platform
FORGE AI is an enterprise-grade, microservices-based platform designed to unify the world's most powerful generative AI models into a single, cohesive workflow across video, image, audio, and text. It provides a robust backend for orchestration and a modern, responsive frontend for creative professionals.
🌟 Executive Summary
Instead of managing subscription islands (Runway, Midjourney, Topaz, ElevenLabs), FORGE AI brings them all together. It allows for complex workflows like generating an image with DALL-E 3, upscaling it with Topaz, extending it into a video with Google Veo, and adding a voiceover via ElevenLabs—all within one interface.
Comprehensive Feature Matrix
1. 🎬 Video Generation (Multi-Provider)
The video module abstracts complexity between different providers, handling authentication, file upload, polling, and result retrieval automatically.
| Provider | Model | Capabilities | Optimal Use Case |
|---|---|---|---|
| Runway | Gen-4 Turbo | Image-to-Video | High-Fidelity Animation. Best for animating static marketing assets. Features Smart Cropping (auto-resize to 1280x768). |
| Runway | Veo 3 / 3.1 | Text/Image-to-Video | Versatile Generation. Native 720p/1080p, 8-second clips. Good for general-purpose stock footage. |
| Google | Veo Native | Text-to-Video | Enterprise Scale. Direct Vertex AI integration for scalable generation. |
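The polling step that the video module hides from callers can be sketched as follows. This is a minimal illustration, not the project's code: `poll_until_done`, the `fetch_status` callable, the status names, and the default interval and timeout are all assumptions.

```python
import time
from typing import Callable, Dict

# Assumed status names; each provider reports its own values.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status() until it reports a terminal state or time runs out."""
    waited = 0.0
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        if waited >= timeout_s:
            raise TimeoutError(
                f"job still {result.get('status')!r} after {waited:.0f}s"
            )
        # Wait before asking the provider again.
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Each provider adapter would supply its own `fetch_status`.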
Key Feature: Smart Aspect Ratio Handling. The backend automatically detects the required aspect ratio for the selected model (e.g., Gen-4's strict 1280:768 requirement) and resizes/crops input images on the fly to prevent API errors.
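The arithmetic behind such a crop can be sketched in a few lines. This is an illustration under assumptions, not the backend's actual pre-processor: `center_crop_box` is a hypothetical helper, and centering the crop is an assumed policy.

```python
from typing import Tuple


def center_crop_box(
    src_w: int, src_h: int, target_w: int = 1280, target_h: int = 768
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    of the source image that has the target aspect ratio."""
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller (or already matches): keep full width, trim top/bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# A 1920x1080 frame loses 60 px on each side before being scaled to 1280x768.
print(center_crop_box(1920, 1080))  # (60, 0, 1860, 1080)
```

With an imaging library such as Pillow, the returned box could be passed to `Image.crop` and the result to `Image.resize((1280, 768))`.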
2. 🖼️ Image Generation (The "Omni-Model" Engine)
Access the latest models without switching tools.
- OpenAI:
  - GPT-Image-1: The latest efficient model. Supports transparent backgrounds and variable quality.
  - DALL-E 3: HD quality, vivid/natural styles.
- Google Imagen:
  - Imagen 4.0 (Standard/Ultra/Fast): Supports "Prompt Enhance" and "Person Generation" safety filters.
- Stability AI:
  - SD3.5 / SDXL: Advanced control with Negative Prompts and Image-to-Image strength sliders.
- Nano Banana (Gemini):
  - Gemini 2.5 Flash / 3 Pro: High speed, supports up to 4K resolution and 21:9 aspect ratios.
- Flux: Black Forest Labs Flux Pro integration for photorealistic outputs.
- Ideogram: Version 2 integration, excellent for typography.
3. Image Utilities
- Professional Upscaling:
  - Integrated with Topaz Photo AI SDK.
  - Capabilities: Face Recovery, Denoising, 2x/4x scaling.
- Background Removal:
  - Clipping Magic: High-precision removal.
  - Bria AI: Fast, commercially safe removal.
4. 🔊 Audio Intelligence
- Text-to-Speech:
  - ElevenLabs Integration: Multilingual V2 model. High-quality voice synthesis.
  - Configurable stability, similarity boost, and style.
- Voice-to-Text:
  - OpenAI Whisper: Industry-leading transcription accuracy.
  - Supports multiple languages and timestamp generation.
- Sound Effects:
  - AI-generated SFX for video post-production.
5. 📝 Text & Utilities
- Subtitle Processor:
  - Automated subtitle generation (VTT/SRT).
  - "Burn-in" Capability: Hardcodes subtitles directly onto video frames using FFmpeg.
- Prompt Studio:
  - Uses LLMs (Gemini/GPT-4) to refine simple prompts into detailed, artistic descriptions.
  - Style presets: Cinematic, Anime, Photorealistic, etc.
- Markdown/Mermaid:
  - Renders technical diagrams (Flowcharts, Gantt) from text descriptions.
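As an illustration of the subtitle generation step, timed transcript segments can be turned into SRT text like this. It is a sketch only: the `(start, end, text)` tuple shape and both function names are assumptions, and the project's actual processor may differ.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: sequence number, time range, text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 3.0, "World")]))
```

A file written this way is the kind of input FFmpeg's `subtitles` filter accepts for burn-in.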
🏗️ Technical Architecture
FORGE AI is built on a Microservices Architecture using Docker Compose.
Backend (forge-backend)
- Framework: FastAPI (Python 3.11). High-performance, async-first API.
- Task Queue: Celery with Redis. Handles long-running jobs (video gen can take minutes) cleanly, decoupling HTTP requests from processing.
- Database ORM: SQLAlchemy. Interface for PostgreSQL.
- Validation: Pydantic. Strict usage of data contracts ensures API reliability.
- File Handling: Direct handling of binary assets, streaming uploads to disk storage.
Frontend (forge-frontend)
- Framework: Next.js 14 (App Router). Server-Side Rendering (SSR) for performance.
- UI Library: React + Tailwind CSS.
- Components: Custom design system using ShadCN/UI primitives (Dialogs, Selects, Toasts).
- State Management: React Hooks for polling job status and updating UI progress bars.
Data Persistence
- PostgreSQL 16:
  - `jobs`: Stores parameters, status, and API metadata for every request.
  - `assets`: Tracks file paths, MIME types, metadata (width/height/duration).
  - `users`: Authentication and profile data.
- Redis: In-memory message broker for Celery and caching layer.
- Docker Volumes:
  - `postgres_data`: Persistent DB storage.
  - `assets_data`: Shared volume for storing generated media files.
⚡️ Scalability & Performance
FORGE AI is architected to handle high user concurrency (e.g., 200+ simultaneous users) without degrading API performance.
- Asynchronous Job Queue: All heavy compute tasks (Video Generation, Upscaling) are offloaded to Celery workers via Redis. The API responds immediately with a `Queued` status, ensuring the interface remains snappy.
- Horizontal Scaling:
  - The `forge-worker` service can be scaled horizontally to process more jobs in parallel.
  - Command: `docker-compose up -d --scale worker=3` (starts 3 worker containers).
- Fault Tolerance: If a worker crashes or an API fails, the job status is tracked in Postgres, and broken jobs can be automatically retried or flagged.
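The submit-then-process pattern with retries can be sketched with the standard library alone. This is not the project's Celery code: the `JobQueue` class stands in for Redis (the queue) and Postgres (the job records), and the status names and `MAX_ATTEMPTS` limit are assumptions.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # assumed retry limit


@dataclass
class Job:
    action: str
    payload: dict
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "queued"
    attempts: int = 0
    error: str = ""


class JobQueue:
    """In-memory stand-in for the broker and the jobs table."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[str]" = queue.Queue()
        self.jobs: Dict[str, Job] = {}

    def submit(self, action: str, payload: dict) -> Job:
        """API side: record the job, enqueue it, and return immediately."""
        job = Job(action=action, payload=payload)
        self.jobs[job.id] = job
        self._pending.put(job.id)
        return job

    def work_once(self, handler: Callable[[Job], None]) -> bool:
        """Worker side: run one job; requeue on error until the limit is hit."""
        try:
            job_id = self._pending.get_nowait()
        except queue.Empty:
            return False
        job = self.jobs[job_id]
        job.status = "processing"
        job.attempts += 1
        try:
            handler(job)
            job.status = "completed"
        except Exception as exc:
            job.error = str(exc)
            if job.attempts < MAX_ATTEMPTS:
                job.status = "queued"
                self._pending.put(job.id)
            else:
                job.status = "failed"  # flagged for manual inspection
        return True
```

Because the job record outlives any single attempt, a second worker could pick up a requeued job, which is what makes scaling the worker service out straightforward.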
💾 Database Schema Overview
| Table | Description | Key Fields |
|---|---|---|
| jobs | The central unit of work. | id, module, action, status (pending/failed/completed), input_data (JSON), api_provider |
| assets | Files managed by the system. | id, file_path, thumbnail_path, mime_type, source_job_id |
| users | User accounts. | id, email, hashed_password, role |
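To make the relationships concrete, the tables above can be sketched as DDL. The real system uses PostgreSQL through SQLAlchemy; this sketch uses the standard library's `sqlite3` so it runs anywhere, and every column type and constraint shown is an assumption.

```python
import sqlite3

# Column names come from the table above; types and constraints are assumed.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    hashed_password TEXT NOT NULL,
    role TEXT NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    module TEXT NOT NULL,
    action TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('pending', 'failed', 'completed')),
    input_data TEXT NOT NULL,  -- JSON stored as text in this sketch
    api_provider TEXT
);
CREATE TABLE assets (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    thumbnail_path TEXT,
    mime_type TEXT NOT NULL,
    source_job_id INTEGER REFERENCES jobs(id)
);
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The `source_job_id` column is what links a generated file back to the job that produced it.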
Configuration (.env)
The system is highly configurable. Key variables include:
API Keys (Critical)
- `RUNWAY_API_KEY`: For Gen-4 Turbo / Veo 3.
- `GOOGLE_API_KEY` / `GOOGLE_PROJECT_ID`: For Imagen / Vertex AI Veo.
- `OPENAI_API_KEY`: For DALL-E, GPT, Whisper.
- `ELEVENLABS_API_KEY`: For TTS.
- `TOPAZ_API_KEY`: For Upscaling.
Infrastructure
- `DATABASE_URL`: Postgres connection string.
- `REDIS_URL`: Redis connection string.
- `CELERY_BROKER_URL`: Usually same as Redis.
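A startup check over these variables might look like the sketch below. The `load_settings` helper is hypothetical, and so is its policy: it treats the infrastructure URLs as mandatory, falls back from `CELERY_BROKER_URL` to `REDIS_URL`, and only reports missing provider keys rather than failing.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

REQUIRED_INFRA = ("DATABASE_URL", "REDIS_URL")
PROVIDER_KEYS = (
    "RUNWAY_API_KEY",
    "GOOGLE_API_KEY",
    "GOOGLE_PROJECT_ID",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "TOPAZ_API_KEY",
)


def load_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (settings, missing_provider_keys); raise if infrastructure is unset."""
    env = os.environ if env is None else env
    missing_infra = [name for name in REQUIRED_INFRA if not env.get(name)]
    if missing_infra:
        raise RuntimeError("missing required settings: " + ", ".join(missing_infra))
    settings = {name: env[name] for name in REQUIRED_INFRA}
    # The broker usually points at the same Redis instance, so default to it.
    settings["CELERY_BROKER_URL"] = env.get("CELERY_BROKER_URL") or env["REDIS_URL"]
    missing_keys = []
    for name in PROVIDER_KEYS:
        if env.get(name):
            settings[name] = env[name]
        else:
            missing_keys.append(name)
    return settings, missing_keys
```

Reporting missing provider keys instead of raising lets a deployment run with only the providers it has credentials for; a stricter policy would raise for those as well.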
🚀 Deployment & Operation
Development Mode
# Start the full stack
docker-compose up -d --build
# View logs
docker-compose logs -f forge-backend
Access Points
- UI: `http://localhost:3000`
- API Documentation (Swagger UI): `http://localhost:8000/docs`
- Database (Internal): Port `5432`
💾 Database Storage & Access
PostgreSQL Database Storage
The database files are stored in a Docker volume for persistence across container restarts.
Docker Volume: forge-ai_postgres_data
Physical Location: /var/lib/docker/volumes/forge-ai_postgres_data/_data
Note: On macOS, this is inside Docker Desktop's Linux VM and managed automatically.
Database Connection Info
- Host: `localhost` (from your Mac)
- Port: `5452` (mapped from container's 5432)
- Database: `forge_ai`
- User: `forge_user`
- Password: `forge_secure_password_2024`
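These values can be combined into a standard PostgreSQL connection URI. The helper below is a hypothetical sketch; whether the project's `DATABASE_URL` uses exactly this scheme (rather than a driver-qualified one) is an assumption.

```python
from urllib.parse import quote, urlsplit


def build_database_url(
    user: str, password: str, host: str, port: int, database: str
) -> str:
    """Assemble a postgresql:// URI, percent-encoding user, password, and name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


url = build_database_url(
    "forge_user", "forge_secure_password_2024", "localhost", 5452, "forge_ai"
)
print(url)
# Parsing the URI back confirms the host-side port mapping.
print(urlsplit(url).port)  # 5452
```

Percent-encoding matters once a password contains characters such as `@` or `/`, which would otherwise break the URI.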
Accessing the Database
1. Via Docker exec (Interactive SQL):
docker exec -it forge-postgres psql -U forge_user -d forge_ai
2. Backup the Database:
docker exec forge-postgres pg_dump -U forge_user forge_ai > backup.sql
3. Restore from Backup:
cat backup.sql | docker exec -i forge-postgres psql -U forge_user -d forge_ai
4. View Volume Information:
docker volume inspect forge-ai_postgres_data
Other Storage Locations
- Uploaded Files/Assets: `./storage` (in your project directory)
- Redis Data: `forge-ai_redis_data` Docker volume
- Celery Task Queue: Managed by Redis (in-memory)
⚠️ Troubleshooting Common Issues
"Gen-4 Turbo - Validation Failed: Ratio"
- Cause: Runway's Gen-4 Turbo API is extremely strict about input image resolution (must be 1280x768).
- Solution: The backend now includes a Smart Crop pre-processor. It automatically resizes and crops your input to the exact pixel dimensions required. You do not need to manually edit images.
"422 Unprocessable Entity"
- Cause: Usually missing required fields in the API payload.
- Solution: The `prompt` field is no longer required for Image-to-Video jobs. Ensure your frontend sends the current payload structure (refresh the browser to clear any cached version).
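The relaxed rule can be pictured as a validation function. The backend validates with Pydantic; this plain-Python sketch only illustrates the logic, and the field names `mode` and `image_asset_id`, along with the mode values, are hypothetical.

```python
from typing import Any, Dict, List


def validate_video_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload is acceptable."""
    errors: List[str] = []
    mode = payload.get("mode")
    if mode not in ("text_to_video", "image_to_video"):
        return ["mode must be 'text_to_video' or 'image_to_video'"]
    # A prompt is mandatory only when there is no source image to animate.
    if mode == "text_to_video" and not payload.get("prompt"):
        errors.append("prompt is required for text_to_video")
    if mode == "image_to_video" and not payload.get("image_asset_id"):
        errors.append("image_asset_id is required for image_to_video")
    return errors


print(validate_video_payload({"mode": "image_to_video", "image_asset_id": 42}))  # []
```

A non-empty result corresponds to the case where the API would answer with a 422 response.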
© 2025 FORGE AI Platforms
Unified Creativity.