DJP 4deed84ba0 Initial commit: Nano AI Image Generator

- Complete working image generation app using Imagen 3
- PHP backend with Gemini API integration
- Dark themed UI with prompt enhancement
- Session management and logging system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

2025-12-16 08:35:02 -05:00


Nano Banana Pro - AI Implementation Guide

How to Build an Iterative Image Generation & Editing System with Google Gemini

🎯 CRITICAL CONCEPT: This is NOT a standard image generation API

⚠️ WATCHOUT #1: Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.

Key Differences:

Uses generateContent endpoint (not a dedicated image API)
Images returned as base64 in JSON (embedded in response)
Editing = Sending previous image back (as base64 in request)
Very aggressive content filters (IMAGE_RECITATION errors)
No direct image URLs (everything is base64)

📐 SYSTEM ARCHITECTURE

User Input (Prompt/Upload)
    ↓
JavaScript Frontend (converts file to base64)
    ↓
PHP Backend API (api.php)
    ↓
Session Storage (stores base64 + MIME type)
    ↓
Google Gemini API (processes with previous image if editing)
    ↓
Extract base64 from response
    ↓
Store in session
    ↓
Display in browser (data URI)

🔑 CRITICAL IMPLEMENTATION DETAILS

1. THE REQUEST FORMAT (MOST IMPORTANT!)

⚠️ WATCHOUT #2: The request structure is VERY specific. Get this wrong and the API rejects the request (400 INVALID_ARGUMENT) or fails with a 500 error.

For NEW image generation:

{
    "contents": [
        {
            "parts": [
                {"text": "Your detailed creative prompt here"}
            ]
        }
    ],
    "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "2K"
        }
    }
}

For EDITING existing image:

{
    "contents": [
        {
            "parts": [
                {
                    "inline_data": {
                        "mime_type": "image/jpeg",
                        "data": "base64_string_here"
                    }
                },
                {"text": "Edit instruction prompt"}
            ]
        }
    ],
    "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "2K"
        }
    }
}

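⚠️ CRITICAL:

Put the image BEFORE the text in the parts array (the ordering this implementation relies on)
Request examples here use inline_data (snake_case); do not assume the response uses the same casing
mime_type must describe the actual image data: image/jpeg for images Gemini returned in this implementation, the file's own type for uploads
Base64 must be clean (no whitespace, no data URI prefix)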
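
Both request shapes above differ only in whether an image part precedes the text part, so one helper can build either. A minimal JavaScript sketch, assuming a hypothetical helper named buildPayload (not part of the original code); the field names are taken from the JSON above:

```javascript
// Hypothetical helper: builds the generateContent request body shown above.
// `image` is optional: { base64: string, mimeType: string } for edit requests.
function buildPayload(prompt, aspectRatio, imageSize, image) {
  const parts = [];
  if (image) {
    // Image part goes first, matching the editing example.
    parts.push({
      inline_data: { mime_type: image.mimeType, data: image.base64 },
    });
  }
  parts.push({ text: prompt });

  return {
    contents: [{ parts }],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: { aspectRatio, imageSize },
    },
  };
}

// New image: a single text part.
const fresh = buildPayload("a detailed creative prompt", "16:9", "2K");
// Edit: image part followed by the instruction.
const edit = buildPayload("add rain", "16:9", "2K", {
  base64: "/9j/4AAQ",
  mimeType: "image/jpeg",
});
console.log(fresh.contents[0].parts.length); // 1
console.log(Object.keys(edit.contents[0].parts[0])[0]); // inline_data
```

Keeping the ordering in one place means a new-image request and an edit request cannot drift apart in structure.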

2. THE RESPONSE FORMAT

⚠️ WATCHOUT #3: The response structure has TWO possible formats!

Success Response (with image):

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "inlineData": {
                            "mimeType": "image/jpeg",
                            "data": "base64_image_data"
                        }
                    }
                ]
            },
            "finishReason": "STOP"
        }
    ]
}

⚠️ NOTICE: Response uses inlineData (camelCase), but request uses inline_data (snake_case)!

Blocked Response (IMAGE_RECITATION):

{
    "candidates": [
        {
            "content": [],
            "finishReason": "IMAGE_RECITATION",
            "finishMessage": "Unable to show the generated image..."
        }
    ]
}

⚠️ CRITICAL: ALWAYS check finishReason BEFORE trying to extract image data!
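
The two response shapes above can be handled by one parser that inspects finishReason before it touches content.parts. A JavaScript sketch, assuming a hypothetical helper named extractImage; the sample objects mirror the JSON above:

```javascript
// Hypothetical helper: returns { base64, mimeType } or throws with the reason.
function extractImage(response) {
  const candidate = response && response.candidates && response.candidates[0];
  if (!candidate) {
    throw new Error("No candidates in response");
  }
  // Check finishReason before touching content.parts.
  if (candidate.finishReason !== "STOP") {
    const detail = candidate.finishMessage ? ": " + candidate.finishMessage : "";
    throw new Error("Generation stopped with " + candidate.finishReason + detail);
  }
  // On blocked responses content may be an empty array or lack parts.
  const parts = (candidate.content && candidate.content.parts) || [];
  for (const part of parts) {
    if (part.inlineData && part.inlineData.data) {
      return { base64: part.inlineData.data, mimeType: part.inlineData.mimeType };
    }
  }
  throw new Error("No image data in response");
}

const success = {
  candidates: [{
    content: { parts: [{ inlineData: { mimeType: "image/jpeg", data: "base64_image_data" } }] },
    finishReason: "STOP",
  }],
};
const blocked = {
  candidates: [{
    content: [],
    finishReason: "IMAGE_RECITATION",
    finishMessage: "Unable to show the generated image...",
  }],
};

console.log(extractImage(success).mimeType); // image/jpeg
try {
  extractImage(blocked);
} catch (err) {
  console.log(err.message);
}
```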

3. MIME TYPE HANDLING

⚠️ WATCHOUT #4: MIME type mismatches break image display!

// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';

// RIGHT - store and use actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g., "image/jpeg"
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">';

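Why this matters:

Gemini returned image/jpeg in this implementation; read mimeType from each response instead of assuming it
If you label JPEG data as image/png, the browser may fail to render it
Store BOTH the base64 data AND its MIME type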
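
When the stored MIME type is missing or came from an untrusted source (for example the browser's file.type), the first base64 characters reveal the format, because each image format starts with fixed signature bytes. A JavaScript sketch with hypothetical helper names sniffMime and toDataUri:

```javascript
// Base64 encodings of well-known file signatures:
//   JPEG  FF D8 FF           -> "/9j/"
//   PNG   89 50 4E 47 0D 0A  -> "iVBORw0KGgo"
//   GIF   "GIF8"             -> "R0lGOD"
//   WebP  "RIFF" container   -> "UklGR" (RIFF is also used by other formats)
function sniffMime(base64) {
  if (base64.startsWith("/9j/")) return "image/jpeg";
  if (base64.startsWith("iVBORw0KGgo")) return "image/png";
  if (base64.startsWith("R0lGOD")) return "image/gif";
  if (base64.startsWith("UklGR")) return "image/webp"; // assumes the RIFF payload is WebP
  return null;
}

// Prefer the sniffed type; fall back to the stored one.
function toDataUri(base64, storedMime) {
  const mime = sniffMime(base64) || storedMime;
  if (!mime) throw new Error("Unknown image type");
  return "data:" + mime + ";base64," + base64;
}

// JPEG data mislabeled as PNG still gets the right data URI.
console.log(toDataUri("/9j/4AAQSkZJRg==", "image/png"));
// data:image/jpeg;base64,/9j/4AAQSkZJRg==
```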

4. BASE64 DATA HANDLING

⚠️ WATCHOUT #5: Base64 data must be CLEAN!

// WRONG - includes data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."

// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."

Validation:

// Clean whitespace
$inputImage = preg_replace('/\s+/', '', $inputImage);

// Validate format
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage)) {
    throw new Exception("Invalid base64 format");
}
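
The same cleaning can run in the browser before the upload is sent. A JavaScript sketch, assuming a hypothetical helper named cleanBase64; the length check (padded base64 is always a multiple of 4 characters) is an extra safeguard that the PHP regex above does not include:

```javascript
// Hypothetical helper: strips a data URI prefix and whitespace, then validates.
function cleanBase64(input) {
  // Drop a "data:<mime>;base64," prefix if present.
  const comma = input.indexOf(",");
  let data = input.startsWith("data:") && comma !== -1 ? input.slice(comma + 1) : input;
  // Remove line breaks and spaces.
  data = data.replace(/\s+/g, "");
  // Same character rule as the PHP check, plus a length check.
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(data) || data.length % 4 !== 0) {
    throw new Error("Invalid base64 format");
  }
  return data;
}

console.log(cleanBase64("data:image/jpeg;base64,/9j/\n4AAQ")); // /9j/4AAQ
```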

5. THE EDITING FLOW

⚠️ WATCHOUT #6: Session management is CRITICAL for editing to work!

// Step 1: Generate first image (detailed prompt, see section 7)
$response = $api->generateImage("a sprawling neon-lit cyberpunk city skyline at night with flying cars", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

// Step 2: Edit existing image
$previousImage = $_SESSION['current_image']; // Get from session
$response = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64']; // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep MIME type in sync

Flow:

Store generated image in session
On edit request, retrieve from session
Send as inline_data in request
Store new result back to session
Repeat for each edit
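
The flow above is a small state machine: each turn reads the current image, sends it along, and replaces it with the result. A JavaScript sketch using a plain object in place of the PHP session; runTurn, fakeGenerate and the session field names are hypothetical, and the real generate call would be asynchronous:

```javascript
// `generate` stands in for the backend call. It receives the prompt and the
// previous image (or null) and returns { base64, mimeType }.
function runTurn(session, prompt, generate) {
  const previous = session.currentImage
    ? { base64: session.currentImage, mimeType: session.currentImageMime }
    : null;
  const result = generate(prompt, previous);
  if (previous) {
    session.imageHistory.push(previous); // keep the image being replaced
  }
  session.currentImage = result.base64;
  session.currentImageMime = result.mimeType;
  session.conversationHistory.push(prompt);
  return session;
}

// Fake generator for demonstration; the strings are placeholders, not real base64.
const fakeGenerate = (prompt, previous) => ({
  base64: previous ? previous.base64 + "+edit" : "v1",
  mimeType: "image/jpeg",
});

const session = {
  currentImage: null,
  currentImageMime: null,
  conversationHistory: [],
  imageHistory: [],
};
runTurn(session, "a sprawling neon-lit cyberpunk city skyline at night", fakeGenerate);
runTurn(session, "add rain", fakeGenerate);
console.log(session.currentImage); // v1+edit
console.log(session.imageHistory.length); // 1
```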

6. ERROR HANDLING - THE TRICKY PART

⚠️ WATCHOUT #7: Multiple error types, each needs specific handling!

// Check finishReason FIRST
$candidate = $response['candidates'][0] ?? null;
if ($candidate === null) {
    throw new Exception('No candidates in response.');
}

$reason = $candidate['finishReason'] ?? 'UNKNOWN';

if ($reason === 'IMAGE_RECITATION') {
    throw new Exception('Blocked by content filter. Use more creative prompts.');
}

if ($reason === 'SAFETY') {
    throw new Exception('Blocked by safety filters.');
}

// Only proceed if STOP
if ($reason !== 'STOP') {
    throw new Exception('Generation failed: ' . $reason);
}

// Then extract image (content/parts can be missing on blocked responses)
foreach ($candidate['content']['parts'] ?? [] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64' => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType'] ?? 'image/jpeg', // fallback only
        ];
    }
}

throw new Exception('Response contained no image data.');

Common Errors:

| Error              | HTTP Code | Cause               | Solution                       |
|--------------------|-----------|---------------------|--------------------------------|
| IMAGE_RECITATION   | 200       | Prompt too generic  | Use creative, detailed prompts |
| Internal error     | 500       | API temporary issue | Retry with exponential backoff |
| RESOURCE_EXHAUSTED | 429       | Rate limit          | Wait 30s between requests      |
| INVALID_ARGUMENT   | 400       | Bad request format  | Check base64 encoding          |
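
The table above maps to a small decision function plus a backoff schedule. A JavaScript sketch; classify, backoffDelays and the delay values are illustrative assumptions, not taken from the original code:

```javascript
// Decide what to do with a response, following the table above.
function classify(httpCode, finishReason) {
  if (httpCode === 429) return "wait";   // rate limit: pause before the next request
  if (httpCode >= 500) return "retry";   // temporary API issue
  if (httpCode !== 200) return "fail";   // e.g. 400 INVALID_ARGUMENT: fix the request
  return finishReason === "STOP" ? "ok" : "blocked"; // 200 can still carry a block
}

// Exponential backoff delays in ms: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelays(retries, baseMs, maxMs) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

console.log(backoffDelays(4, 1000, 5000)); // [ 1000, 2000, 4000, 5000 ]
console.log(classify(200, "IMAGE_RECITATION")); // blocked
console.log(classify(500, undefined)); // retry
```

Only the "retry" outcome should loop; retrying a "blocked" or "fail" result with the same request is unlikely to help.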

7. PROMPT ENGINEERING

⚠️ WATCHOUT #8: Simple prompts WILL fail!

// ❌ WILL FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"

// ✅ WILL WORK:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"

Rules:

Minimum 10 words
Include adjectives (vintage, glowing, futuristic)
Add context (at night, in rain, during sunset)
Avoid single objects
Be creative and specific
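
Of the rules above, only the word count is easy to check mechanically, so a pre-flight check can warn the user before a request is spent. A JavaScript sketch with a hypothetical helper named checkPrompt; it does not judge adjectives or context:

```javascript
// Counts whitespace-separated words and compares against the 10-word rule.
function checkPrompt(prompt, minWords = 10) {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  return { ok: words.length >= minWords, wordCount: words.length };
}

console.log(checkPrompt("a red circle"));
// { ok: false, wordCount: 3 }
console.log(
  checkPrompt("a vintage red sports car racing through a neon-lit cyberpunk city at night").ok
); // true
```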

8. FILE UPLOAD HANDLING

⚠️ WATCHOUT #9: In this implementation, files are converted to base64 client-side, before they reach the PHP backend!

// Convert file to base64 (client-side)
function fileToBase64(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            // CRITICAL: Remove data URI prefix!
            const base64 = reader.result.split(',')[1];
            resolve(base64);
        };
        reader.onerror = reject;
        reader.readAsDataURL(file);
    });
}

// Usage (inside an async function, e.g. the form's submit handler)
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
const formData = new FormData();
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);

Backend handling:

if ($uploadedImage) {
    // Store uploaded image
    $_SESSION['current_image'] = $uploadedImage;
    $_SESSION['current_image_mime'] = $uploadedImageType;

    // If prompt provided, apply it
    if ($prompt) {
        $response = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
        // Update with edited version
        $imageData = extractImageData($response);
        $_SESSION['current_image'] = $imageData['base64'];
        $_SESSION['current_image_mime'] = $imageData['mime_type'];
    }
}
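
Because the uploaded base64 and its MIME type both come from the client, it is worth checking them before they are stored or forwarded. A JavaScript sketch of the checks; the allowlist, the 10 MiB cap and the helper names are assumptions, not limits documented by the API:

```javascript
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"]; // assumed allowlist
const MAX_BYTES = 10 * 1024 * 1024; // assumed 10 MiB cap

// Decoded size of padded base64: 3 bytes per 4 characters, minus padding.
function decodedSize(base64) {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// Returns an error message, or null when the upload passes these checks.
function validateUpload(base64, mimeType) {
  if (!ALLOWED_TYPES.includes(mimeType)) return "Unsupported image type: " + mimeType;
  if (base64.length === 0 || base64.length % 4 !== 0) return "Malformed base64 data";
  if (decodedSize(base64) > MAX_BYTES) return "Image too large";
  return null;
}

console.log(decodedSize("/9j/4A==")); // 4
console.log(validateUpload("/9j/4A==", "image/jpeg")); // null
console.log(validateUpload("/9j/4A==", "text/html")); // Unsupported image type: text/html
```

The same checks would need to be repeated server-side, since client-side validation can be bypassed.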

9. SESSION MANAGEMENT

⚠️ WATCHOUT #10: Session structure is critical!

// Initialize (MUST be done before any output)
session_start();

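// Required session variables
// (??= only sets a default when the key is missing, so a stored image
// survives across requests; requires PHP 7.4+)
$_SESSION['current_image'] ??= null;             // Base64 string
$_SESSION['current_image_mime'] ??= 'image/png'; // Placeholder; overwritten when an image is stored
$_SESSION['conversation_history'] ??= [];        // Array of prompts
$_SESSION['image_history'] ??= [];               // Array of previous images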

// Reset (clear everything)
$_SESSION['conversation_history'] = [];
$_SESSION['current_image'] = null;
$_SESSION['current_image_mime'] = 'image/png';
$_SESSION['image_history'] = [];

10. API CONFIGURATION

⚠️ WATCHOUT #11: Endpoint and model name are specific!

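// CORRECT endpoint (the model name is part of the URL):
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

// $apiKey and $payload are assumed to be defined by the caller
$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'x-goog-api-key: ' . $apiKey, // NOT 'Authorization: Bearer'
    ],
    CURLOPT_POSTFIELDS     => json_encode($payload),
    CURLOPT_TIMEOUT        => 120, // 2 minutes - image generation is SLOW
]);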

Model name: gemini-3-pro-image-preview

This is a preview model, so the name may change in future
Check Google's docs for the current model name if errors persist
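
Since the model name may change, it helps to keep it as a single parameter rather than hardcoding the full URL in several places. A JavaScript sketch that assembles the request for fetch; buildRequest and its return shape are hypothetical, while the URL, header name and 120-second timeout come from this section:

```javascript
const API_BASE = "https://generativelanguage.googleapis.com/v1beta/models/";

// Hypothetical helper: returns everything needed for a fetch() call.
function buildRequest(apiKey, payload, model = "gemini-3-pro-image-preview") {
  return {
    url: API_BASE + model + ":generateContent",
    timeoutMs: 120000, // 2 minutes, as recommended above
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // API key header, not a Bearer token
      },
      body: JSON.stringify(payload),
    },
  };
}

const req = buildRequest("YOUR_API_KEY", { contents: [{ parts: [{ text: "prompt" }] }] });
console.log(req.url);
// https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent

// Possible usage where fetch and AbortSignal.timeout are available:
// fetch(req.url, { ...req.options, signal: AbortSignal.timeout(req.timeoutMs) })
```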

🐛 DEBUGGING CHECKLIST

When things don't work, check IN THIS ORDER:

1. Is the request format correct?

error_log("Request payload: " . json_encode($payload));

2. Is the response structure what you expect?

error_log("Response structure: " . json_encode($response));

3. Check finishReason:

$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);

4. Verify base64 data:

error_log("Base64 length: " . strlen($base64));
error_log("First 50 chars: " . substr($base64, 0, 50));

5. Check MIME type matching:

error_log("Stored MIME: " . ($_SESSION['current_image_mime'] ?? 'not set'));
error_log("Response MIME: " . ($response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType'] ?? 'not present'));
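
Logging a full request or response also logs the entire base64 image, which can bury the useful fields. A JavaScript sketch that shortens any long string stored under a "data" key before logging; redactBase64 and summarize are hypothetical names, and the 50-character preview matches the check in step 4:

```javascript
// JSON.stringify replacer: shortens long strings stored under a "data" key,
// which covers both inline_data.data (request) and inlineData.data (response).
function redactBase64(key, value) {
  if (key === "data" && typeof value === "string" && value.length > 50) {
    return value.slice(0, 50) + "... (" + value.length + " chars)";
  }
  return value;
}

function summarize(obj) {
  return JSON.stringify(obj, redactBase64);
}

const bigResponse = {
  candidates: [{
    content: { parts: [{ inlineData: { mimeType: "image/jpeg", data: "A".repeat(200) } }] },
    finishReason: "STOP",
  }],
};
console.log(summarize(bigResponse));
```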

🎓 COMMON MISTAKES TO AVOID

Mistake #1: Wrong request structure

// ❌ WRONG - text before image:
{"parts": [{"text": "..."}, {"inline_data": {...}}]}

// ✅ RIGHT - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}

Mistake #2: Not checking finishReason

// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];

// ✅ RIGHT - check finishReason first:
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
if ($reason !== 'STOP') {
    // Handle blocked or failed generation (e.g. IMAGE_RECITATION, SAFETY)
}

Mistake #3: Hardcoded MIME types

// ❌ WRONG:
echo '<img src="data:image/png;base64,...">'; // Assumes PNG

// ✅ RIGHT:
echo '<img src="data:' . $mimeType . ';base64,...">'; // Uses actual type

Mistake #4: Not cleaning base64

// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"

// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only base64 part

Mistake #5: Missing error handling

// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);

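// ✅ RIGHT:
$response = curl_exec($ch);
if ($response === false) {
    // Handle transport errors (timeout, DNS, TLS): see curl_error($ch)
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpCode !== 200) {
    // Handle errors
}
return json_decode($response, true); // true = associative arrays, as used throughout this guide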

📊 DATA FLOW DIAGRAM

┌─────────────────┐
│  User Action    │
│  (Prompt/Upload)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   JavaScript    │
│ - Validate      │
│ - Convert file  │
│ - Build FormData│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   api.php       │
│ - Get session   │
│ - Build request │
│ - Call Gemini   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Gemini API     │
│ - Process       │
│ - Check filters │
│ - Generate      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Response│
│ - Check finish  │
│ - Get base64    │
│ - Get MIME type │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Store Session  │
│ - current_image │
│ - image_mime    │
│ - history       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Return to JS   │
│ - Success flag  │
│ - Reload page   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Display Image  │
│ - Data URI      │
│ - Correct MIME  │
└─────────────────┘

🔧 TESTING STRATEGY

Test 1: Basic Generation

Prompt: "A futuristic motorcycle parked in a rain-soaked, neon-lit city street at night"
Expected: Image generated successfully

Test 2: Simple Edit

1. Generate: "A vintage red sports car parked on a quiet city street at dusk"
2. Edit: "add rain and reflections"
Expected: Car now has rain

Test 3: Upload

1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing

Test 4: Upload + Edit

1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image

Test 5: Error Handling

Prompt: "a blue square"
Expected: IMAGE_RECITATION error with helpful message

🚨 CRITICAL SUCCESS FACTORS

You MUST get these right or the system will NOT work:

✅ Request format - Image before text, correct structure
✅ Response parsing - Check finishReason first
✅ MIME type handling - Store and use dynamically
✅ Base64 cleaning - No whitespace, no prefixes
✅ Session management - Store both data and MIME type
✅ Error handling - Different errors need different responses
✅ Prompt quality - Detailed, creative prompts only
✅ File upload - Client-side base64 conversion
✅ API timeout - 120 seconds minimum
✅ Retry logic - For temporary API failures

📝 QUICK REFERENCE

Essential Code Patterns

Check finishReason:

$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle error
}

Extract image:

foreach ($response['candidates'][0]['content']['parts'] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64' => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType']
        ];
    }
}

Store in session:

$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

Display image:

<img src="data:<?php echo htmlspecialchars($_SESSION['current_image_mime']); ?>;base64,<?php echo htmlspecialchars($_SESSION['current_image']); ?>" alt="Generated image">

🎯 IMPLEMENTATION CHECKLIST

Before considering the implementation complete:

Image generation works with detailed prompts
Image editing works (sends previous image)
IMAGE_RECITATION errors handled gracefully
MIME type stored and used correctly
File upload converts to base64 properly
Session persists across requests
Error messages are helpful
Debug panel shows request/response
Simple prompts show helpful error
Retry logic works for 500 errors
Rate limiting handled
Base64 data validated
Conversation history tracked
Reset clears session properly

💡 TIPS FOR AI ASSISTANTS

When helping users implement this:

Show the request JSON first - Most problems are here
Emphasize finishReason checking - Critical for error handling
Explain MIME type importance - Common source of display issues
Warn about simple prompts - Will trigger IMAGE_RECITATION
Test with detailed prompts - "red circle" will fail
Check session management - Editing requires proper storage
Validate base64 format - Clean data is essential
Add debug logging - Makes troubleshooting easier
Handle all error types - Different errors need different solutions
Test the full flow - Generate → Edit → Edit

📚 REFERENCES

API Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent
Model: gemini-3-pro-image-preview
Auth Header: x-goog-api-key: YOUR_KEY
Response Format: JSON with base64 in inlineData
Request Format: JSON with inline_data for editing

⚡ FINAL NOTES

This implementation is working and stable when these rules are followed:

Use creative, detailed prompts (10+ words)
Check finishReason before extracting image
Store and use correct MIME types
Clean base64 data (no whitespace/prefixes)
Manage session properly for editing
Handle all error types specifically
Implement retry logic for temporary failures
Validate uploaded files before processing

The system works reliably when these patterns are followed exactly.

Generated from working implementation - December 2025
