nano-pro/AI_IMPLEMENTATION_GUIDE.md
DJP 4deed84ba0 Initial commit: Nano AI Image Generator
- Complete working image generation app using Imagen 3
- PHP backend with Gemini API integration
- Dark themed UI with prompt enhancement
- Session management and logging system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2025-12-16 08:35:02 -05:00


Nano Banana Pro - AI Implementation Guide

How to Build an Iterative Image Generation & Editing System with Google Gemini


🎯 CRITICAL CONCEPT: This is NOT a standard image generation API

⚠️ WATCHOUT #1: Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.

Key Differences:

  1. Uses generateContent endpoint (not a dedicated image API)
  2. Images returned as base64 in JSON (embedded in response)
  3. Editing = Sending previous image back (as base64 in request)
  4. Very aggressive content filters (IMAGE_RECITATION errors)
  5. No direct image URLs (everything is base64)

📐 SYSTEM ARCHITECTURE

User Input (Prompt/Upload)
    ↓
JavaScript Frontend (converts file to base64)
    ↓
PHP Backend API (api.php)
    ↓
Session Storage (stores base64 + MIME type)
    ↓
Google Gemini API (processes with previous image if editing)
    ↓
Extract base64 from response
    ↓
Store in session
    ↓
Display in browser (data URI)

🔑 CRITICAL IMPLEMENTATION DETAILS

1. THE REQUEST FORMAT (MOST IMPORTANT!)

⚠️ WATCHOUT #2: The request structure is VERY specific. Get it wrong and the API rejects the request, typically with HTTP 400 (INVALID_ARGUMENT).

For NEW image generation:

{
    "contents": [
        {
            "parts": [
                {"text": "Your detailed creative prompt here"}
            ]
        }
    ],
    "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "2K"
        }
    }
}

For EDITING existing image:

{
    "contents": [
        {
            "parts": [
                {
                    "inline_data": {
                        "mime_type": "image/jpeg",
                        "data": "base64_string_here"
                    }
                },
                {"text": "Edit instruction prompt"}
            ]
        }
    ],
    "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
            "aspectRatio": "16:9",
            "imageSize": "2K"
        }
    }
}

⚠️ CRITICAL:

  • Image MUST come BEFORE text in the parts array
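  • Use inline_data / mime_type (snake_case) as shown (the REST API also accepts the camelCase inlineData / mimeType in requests)
  • MIME type must match the actual image bytes: image/jpeg for images Gemini returned in this implementation, the file's own type for uploads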
  • Base64 must be clean (no whitespace, no data URI prefix)
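
The two payloads differ only in the leading image part, so one builder can produce both. Below is a JavaScript sketch, for example for a Node.js backend. `buildImageRequest` and the `{ base64, mimeType }` image shape are hypothetical names, not part of a Google SDK.

```javascript
// Hypothetical helper: builds a generateContent payload for Gemini image output.
// previousImage = null for a new image, or { base64, mimeType } to edit one.
function buildImageRequest(prompt, aspectRatio, imageSize, previousImage = null) {
  const parts = [];
  if (previousImage) {
    // Image part first, then the text instruction (the order this guide uses)
    parts.push({
      inline_data: {
        mime_type: previousImage.mimeType, // must match the actual image bytes
        data: previousImage.base64,        // clean base64: no prefix, no whitespace
      },
    });
  }
  parts.push({ text: prompt });

  return {
    contents: [{ parts }],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: { aspectRatio, imageSize },
    },
  };
}

// Usage: a new image, then an edit of a (truncated, illustrative) JPEG
const fresh = buildImageRequest("a magical forest with glowing blue mushrooms and fireflies at twilight", "16:9", "2K");
const edit = buildImageRequest("add fireflies", "16:9", "2K", { base64: "/9j/4AAQ", mimeType: "image/jpeg" });
console.log(JSON.stringify(edit.contents[0].parts.map(Object.keys))); // [["inline_data"],["text"]]
```

The builder writes exactly the JSON shown above. Only the presence of `previousImage` decides whether the request is a generation or an edit.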

2. THE RESPONSE FORMAT

⚠️ WATCHOUT #3: The response structure has TWO possible formats!

Success Response (with image):

{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "inlineData": {
                            "mimeType": "image/jpeg",
                            "data": "base64_image_data"
                        }
                    }
                ]
            },
            "finishReason": "STOP"
        }
    ]
}

⚠️ NOTICE: The response uses inlineData / mimeType (camelCase), while the request examples above use inline_data / mime_type (snake_case). Read the response with the camelCase keys!

Blocked Response (IMAGE_RECITATION):

{
    "candidates": [
        {
            "content": [],
            "finishReason": "IMAGE_RECITATION",
            "finishMessage": "Unable to show the generated image..."
        }
    ]
}

⚠️ CRITICAL: ALWAYS check finishReason BEFORE trying to extract image data!
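
The same order of checks in JavaScript, as a sketch that mirrors the PHP extractImageData() used in the editing flow below. `response` is assumed to be the already-parsed JSON body. The "no candidates" branch is a defensive assumption for responses where the prompt itself was blocked.

```javascript
// Hypothetical JS counterpart of extractImageData(); `response` is the parsed
// JSON body of a generateContent call.
function extractImageData(response) {
  const candidate = response?.candidates?.[0];
  if (!candidate) {
    // No candidates at all, e.g. the prompt itself was blocked (see promptFeedback)
    throw new Error("No candidates in response");
  }
  // 1. Check finishReason FIRST: anything but STOP means there is no image
  const reason = candidate.finishReason ?? "UNKNOWN";
  if (reason !== "STOP") {
    throw new Error(`Generation blocked or failed: ${reason}`);
  }
  // 2. Only then look for the image part (camelCase keys in responses)
  for (const part of candidate.content?.parts ?? []) {
    if (part.inlineData?.data) {
      return { base64: part.inlineData.data, mimeType: part.inlineData.mimeType };
    }
  }
  throw new Error("finishReason was STOP but no image part was found");
}

// Usage with the blocked response shown above
const blocked = { candidates: [{ content: [], finishReason: "IMAGE_RECITATION" }] };
try {
  extractImageData(blocked);
} catch (err) {
  console.log(err.message); // Generation blocked or failed: IMAGE_RECITATION
}
```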


3. MIME TYPE HANDLING

⚠️ WATCHOUT #4: MIME type mismatches break image display!

// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';

// RIGHT - store and use actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g., "image/jpeg"
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">';

Why this matters:

  • Gemini returned image/jpeg in this implementation, but uploaded files may be PNG, WebP, etc.
  • If you display as image/png, browser may fail to render
  • Store BOTH base64 data AND MIME type
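
An image sometimes arrives without a trustworthy type, for example when the client supplies the upload type. In that case, one option is to read the file signature ("magic bytes") instead. This is a Node.js sketch: it uses `Buffer`, so it does not run in the browser as-is. `sniffImageMime` and `toDataUri` are hypothetical helper names.

```javascript
// Hypothetical helpers (Node.js). sniffImageMime() reads the file signature;
// toDataUri() builds the <img> src value with the real type.
function sniffImageMime(base64) {
  // The first 16 base64 characters decode to at most 12 bytes: enough for every check below
  const bytes = Buffer.from(base64.slice(0, 16), "base64");
  const ascii = (start, end) => bytes.subarray(start, end).toString("latin1");
  if (bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "image/jpeg";
  if (bytes[0] === 0x89 && ascii(1, 4) === "PNG") return "image/png";
  if (ascii(0, 4) === "GIF8") return "image/gif";
  if (ascii(0, 4) === "RIFF" && ascii(8, 12) === "WEBP") return "image/webp";
  return null; // unknown signature: fall back to the type Gemini or the upload reported
}

function toDataUri(base64, mimeType) {
  return `data:${mimeType};base64,${base64}`;
}

// Usage: the 8-byte PNG signature, base64-encoded
const pngBase64 = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]).toString("base64");
console.log(toDataUri(pngBase64, sniffImageMime(pngBase64))); // data:image/png;base64,iVBORw0KGgo=
```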

4. BASE64 DATA HANDLING

⚠️ WATCHOUT #5: Base64 data must be CLEAN!

// WRONG - includes data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."

// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."

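Server-side validation (PHP):

// Clean whitespace
$inputImage = preg_replace('/\s+/', '', $inputImage);

// Validate format: base64 alphabet, at most two "=" padding chars, length a multiple of 4
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage) || strlen($inputImage) % 4 !== 0) {
    throw new Exception("Invalid base64 format");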
}
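
The same cleaning and validation can also run in the browser before upload. Here is a minimal sketch with a hypothetical `cleanBase64` helper that mirrors the PHP checks. It uses only strings and regular expressions, so it runs in both browsers and Node.js.

```javascript
// Hypothetical helper mirroring the PHP checks: strip an optional data URI
// prefix, remove whitespace, then validate alphabet, padding and length.
function cleanBase64(input) {
  let s = String(input);
  if (s.startsWith("data:")) {
    const comma = s.indexOf(",");
    if (comma === -1) throw new Error("Malformed data URI");
    s = s.slice(comma + 1); // drop "data:<mime>;base64,"
  }
  s = s.replace(/\s+/g, "");
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(s) || s.length % 4 !== 0) {
    throw new Error("Invalid base64 format");
  }
  return s;
}

console.log(cleanBase64("data:image/jpeg;base64,/9j/\n4AAQ")); // /9j/4AAQ
```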

5. THE EDITING FLOW

⚠️ WATCHOUT #6: Session management is CRITICAL for editing to work!

// Step 1: Generate first image
$response = $api->generateImage("a neon-lit cyberpunk city at night with flying cars and holographic billboards", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

// Step 2: Edit existing image
$previousImage = $_SESSION['current_image']; // Get from session
$response = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];         // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep MIME type in sync

Flow:

  1. Store generated image in session
  2. On edit request, retrieve from session
  3. Send as inline_data in request
  4. Store new result back to session
  5. Repeat for each edit
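
The five steps form a loop in which each result becomes the input of the next edit. The JavaScript sketch below shows that loop. `generate` is a hypothetical stand-in for the call to api.php (or to Gemini), and the session is modeled as a plain object rather than a PHP session.

```javascript
// Hypothetical sketch of the generate → edit → edit loop.
// `generate(prompt, previous)` must resolve to { base64, mimeType };
// `previous` is null for the first prompt and the stored image afterwards.
async function runEditChain(generate, prompts) {
  // Plain-object stand-in for the PHP session variables
  const session = { current: null, imageHistory: [], conversationHistory: [] };
  for (const prompt of prompts) {
    const result = await generate(prompt, session.current); // steps 2-3
    if (session.current) session.imageHistory.push(session.current);
    session.current = result; // steps 1 and 4: keep BOTH base64 and mimeType
    session.conversationHistory.push(prompt);
  }
  return session;
}

// Usage with a fake generator (no API call) that just tags each result
let calls = 0;
const fakeGenerate = async (prompt, previous) => {
  calls += 1;
  return { base64: `fake${calls}`, mimeType: "image/jpeg", editedFrom: previous?.base64 ?? null };
};
runEditChain(fakeGenerate, [
  "a neon-lit cyberpunk city at night with flying cars and holographic billboards",
  "add rain",
  "add neon reflections on the wet streets",
]).then((s) => console.log(s.conversationHistory.length, s.imageHistory.length)); // 3 2
```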

6. ERROR HANDLING - THE TRICKY PART

⚠️ WATCHOUT #7: Multiple error types, each needs specific handling!

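// Used as extractImageData() in the editing flow above
function extractImageData(array $response): array
{
    $candidate = $response['candidates'][0] ?? null;
    if ($candidate === null) {
        // No candidates at all, e.g. the prompt itself was blocked (see promptFeedback)
        throw new Exception('No candidates returned.');
    }

    // Check finishReason FIRST
    $reason = $candidate['finishReason'] ?? 'UNKNOWN';

    if ($reason === 'IMAGE_RECITATION') {
        throw new Exception('Blocked by content filter. Use more creative prompts.');
    }

    if ($reason === 'SAFETY' || $reason === 'IMAGE_SAFETY') {
        throw new Exception('Blocked by safety filters.');
    }

    // Only proceed if STOP
    if ($reason !== 'STOP') {
        throw new Exception('Generation failed: ' . $reason);
    }

    // Then extract image (camelCase keys in the response)
    foreach ($candidate['content']['parts'] ?? [] as $part) {
        if (isset($part['inlineData']['data'])) {
            return [
                'base64'    => $part['inlineData']['data'],
                'mime_type' => $part['inlineData']['mimeType'],
            ];
        }
    }

    throw new Exception('No image data in response.');
}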

Common Errors:

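| Error                           | HTTP code | Cause               | Solution                                   |
|---------------------------------|-----------|---------------------|--------------------------------------------|
| IMAGE_RECITATION (finishReason) | 200       | Prompt too generic  | Use creative, detailed prompts             |
| Internal error                  | 500       | Temporary API issue | Retry with exponential backoff             |
| RESOURCE_EXHAUSTED              | 429       | Rate limit          | Wait 30 s between requests                 |
| INVALID_ARGUMENT                | 400       | Bad request format  | Check the request JSON and base64 encoding |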

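The table splits errors into retryable (500, 429) and non-retryable (400, IMAGE_RECITATION). Below is a sketch of exponential backoff under that split. `withRetry`, `backoffDelayMs`, the 1 s base delay, the 30 s cap and the 4-attempt limit are illustrative choices, not values published by Google.

```javascript
// Hypothetical retry policy for the table above. Only HTTP 500 and 429 are
// retried; 400 and IMAGE_RECITATION (which arrives with HTTP 200) are returned
// immediately because repeating the same request will not fix them.
const RETRYABLE_STATUS = new Set([429, 500]);

// attempt 0 → 1 s, 1 → 2 s, 2 → 4 s, ... capped at 30 s (the table's rate-limit wait)
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// callApi() must resolve to an object with an HTTP `status` field
async function withRetry(callApi, { maxAttempts = 4, sleep = (ms) => new Promise((r) => setTimeout(r, ms)) } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await callApi();
    const outOfAttempts = attempt + 1 >= maxAttempts;
    if (!RETRYABLE_STATUS.has(res.status) || outOfAttempts) {
      return res; // success, non-retryable error, or last attempt
    }
    await sleep(backoffDelayMs(attempt));
  }
}

// Usage with a fake API that fails twice, then succeeds (sleep skipped for the demo)
const statuses = [500, 429, 200];
withRetry(async () => ({ status: statuses.shift() }), { sleep: async () => {} })
  .then((res) => console.log(res.status)); // 200
```

Injecting `sleep` keeps the policy testable without real delays. In production the default `setTimeout`-based sleep applies.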
7. PROMPT ENGINEERING

⚠️ WATCHOUT #8: Simple prompts WILL fail!

// ❌ WILL FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"

// ✅ WILL WORK:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"

Rules:

  • Minimum 10 words
  • Include adjectives (vintage, glowing, futuristic)
  • Add context (at night, in rain, during sunset)
  • Avoid single objects
  • Be creative and specific
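
The word-count rule is easy to enforce before spending an API call. The sketch below uses a hypothetical `checkPrompt` helper that only checks the 10-word minimum from the rules above. A prompt that passes can still be blocked by Gemini.

```javascript
// Hypothetical client-side pre-check. It only warns; Gemini's filters make the
// real decision, so a passing prompt can still come back as IMAGE_RECITATION.
function checkPrompt(prompt, minWords = 10) {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  const warnings = [];
  if (words.length < minWords) {
    warnings.push(`Only ${words.length} words; aim for ${minWords}+ with adjectives and context.`);
  }
  return { ok: warnings.length === 0, wordCount: words.length, warnings };
}

console.log(checkPrompt("a red circle").wordCount); // 3
console.log(checkPrompt("a magical forest with glowing blue mushrooms and fireflies at twilight").ok); // true
```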

8. FILE UPLOAD HANDLING

⚠️ WATCHOUT #9: File conversion must be done client-side!

// Convert file to base64 (client-side)
function fileToBase64(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            // CRITICAL: Remove data URI prefix!
            const base64 = reader.result.split(',')[1];
            resolve(base64);
        };
        reader.onerror = reject;
        reader.readAsDataURL(file);
    });
}

// Usage (uploadInput is your <input type="file"> element)
const formData = new FormData();
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);

Backend handling:

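if ($uploadedImage) {
    // The MIME type comes from the client: accept only types you expect
    $allowed = ['image/jpeg', 'image/png', 'image/webp']; // adjust to your needs
    if (!in_array($uploadedImageType, $allowed, true)) {
        throw new Exception('Unsupported image type');
    }

    // Store uploaded image
    $_SESSION['current_image'] = $uploadedImage;
    $_SESSION['current_image_mime'] = $uploadedImageType;

    // If prompt provided, apply it
    if ($prompt) {
        $response = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
        // Update the session with the edited version (data AND MIME type)
        $imageData = extractImageData($response);
        $_SESSION['current_image'] = $imageData['base64'];
        $_SESSION['current_image_mime'] = $imageData['mime_type'];
    }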
}
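
It is worth rejecting uploads the backend will not accept before reading the file at all. The sketch below uses a hypothetical `validateUpload` helper. The allowed types and the 7 MB cap are assumptions to adjust, not documented Gemini limits; check Google's docs for the current inline-data request limits.

```javascript
// Hypothetical pre-upload check. The allow-list and size cap are assumptions.
const ALLOWED_UPLOAD_TYPES = ["image/jpeg", "image/png", "image/webp"];
const MAX_UPLOAD_BYTES = 7 * 1024 * 1024; // assumed cap; base64 adds ~33% on top

function validateUpload(file) {
  // `file` only needs { type, size }, so a browser File object works as-is
  if (!ALLOWED_UPLOAD_TYPES.includes(file.type)) {
    return { ok: false, error: `Unsupported type: ${file.type || "unknown"}` };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, error: `File too large: ${file.size} bytes` };
  }
  return { ok: true, error: null };
}

console.log(validateUpload({ type: "image/gif", size: 1000 }).error); // Unsupported type: image/gif
```

Running this before fileToBase64() avoids encoding and sending files the server would reject anyway.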

9. SESSION MANAGEMENT

⚠️ WATCHOUT #10: Session structure is critical!

// Start the session (MUST be called before any output)
session_start();

// Initialize the required variables ONCE: re-running this on every
// request would wipe the stored image and break editing
if (!isset($_SESSION['conversation_history'])) {
    $_SESSION['current_image'] = null;             // Base64 string
    $_SESSION['current_image_mime'] = 'image/png'; // MIME type
    $_SESSION['conversation_history'] = [];        // Array of prompts
    $_SESSION['image_history'] = [];               // Array of previous images
}

// Reset (clear everything)
$_SESSION['conversation_history'] = [];
$_SESSION['current_image'] = null;
$_SESSION['current_image_mime'] = 'image/png';
$_SESSION['image_history'] = [];

10. API CONFIGURATION

⚠️ WATCHOUT #11: Endpoint and model name are specific!

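// CORRECT endpoint:
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS     => json_encode($payload),
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'x-goog-api-key: ' . $apiKey, // NOT 'Authorization: Bearer'
    ],
    CURLOPT_TIMEOUT        => 120, // 2 minutes: image generation is SLOW
]);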

Model name: gemini-3-pro-image-preview

  • May change in future
  • Check Google's docs if errors persist

🐛 DEBUGGING CHECKLIST

When things don't work, check IN THIS ORDER:

1. Is the request format correct?

error_log("Request payload: " . json_encode($payload));

2. Is the response structure what you expect?

error_log("Response structure: " . json_encode($response));

3. Check finishReason:

$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);

4. Verify base64 data:

error_log("Base64 length: " . strlen($base64));
error_log("First 50 chars: " . substr($base64, 0, 50));

5. Check MIME type matching:

error_log("Stored MIME: " . $_SESSION['current_image_mime']);
error_log("Response MIME: " . $response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType']);
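
Logging the raw payload or response (steps 1 and 2) writes megabytes of base64 to the log. Below is a sketch of a JSON.stringify replacer that keeps the structure but summarizes every `data` field. `redactForLog` is a hypothetical name, and the 12-character preview is an arbitrary choice.

```javascript
// Hypothetical logging helper: serializes a request/response object but
// replaces each string under a "data" key (inline_data.data or
// inlineData.data) with its length and first 12 characters.
function redactForLog(value) {
  return JSON.stringify(value, (key, v) =>
    key === "data" && typeof v === "string"
      ? `<base64 ${v.length} chars: ${v.slice(0, 12)}...>`
      : v
  );
}

// Usage: finishReason and mimeType stay readable, the image data is summarized
const sample = {
  candidates: [{
    content: { parts: [{ inlineData: { mimeType: "image/jpeg", data: "/9j/4AAQSkZJRgABAQ" } }] },
    finishReason: "STOP",
  }],
};
console.log(redactForLog(sample));
```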

🎓 COMMON MISTAKES TO AVOID

Mistake #1: Wrong request structure

// ❌ WRONG - text before image:
{"parts": [{"text": "..."}, {"inline_data": {...}}]}

// ✅ RIGHT - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}

Mistake #2: Not checking finishReason

// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];

// ✅ RIGHT - check finishReason first:
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
if ($reason !== 'STOP') {
    // Handle blocked or failed generation (IMAGE_RECITATION, SAFETY, ...)
}

Mistake #3: Hardcoded MIME types

// ❌ WRONG:
echo '<img src="data:image/png;base64,...">'; // Assumes PNG

// ✅ RIGHT:
echo '<img src="data:' . $mimeType . ';base64,...">'; // Uses actual type

Mistake #4: Not cleaning base64

// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"

// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only base64 part

Mistake #5: Missing error handling

// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);

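// ✅ RIGHT:
$response = curl_exec($ch);
if ($response === false) {
    throw new Exception('cURL error: ' . curl_error($ch));
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$data = json_decode($response, true);
if ($httpCode !== 200) {
    // Surface the API's own error (e.g. INVALID_ARGUMENT, RESOURCE_EXHAUSTED)
    throw new Exception("HTTP $httpCode: " . ($data['error']['message'] ?? $response));
}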

📊 DATA FLOW DIAGRAM

┌─────────────────┐
│  User Action    │
│  (Prompt/Upload)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   JavaScript    │
│ - Validate      │
│ - Convert file  │
│ - Build FormData│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   api.php       │
│ - Get session   │
│ - Build request │
│ - Call Gemini   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Gemini API     │
│ - Process       │
│ - Check filters │
│ - Generate      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Response│
│ - Check finish  │
│ - Get base64    │
│ - Get MIME type │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Store Session  │
│ - current_image │
│ - image_mime    │
│ - history       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Return to JS   │
│ - Success flag  │
│ - Reload page   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Display Image  │
│ - Data URI      │
│ - Correct MIME  │
└─────────────────┘

🔧 TESTING STRATEGY

Test 1: Basic Generation

Prompt: "A futuristic motorcycle in a neon-lit city"
Expected: Image generated successfully

Test 2: Simple Edit

1. Generate: "A vintage red sports car racing through a neon-lit cyberpunk city at night"
2. Edit: "add rain and reflections"
Expected: Car now has rain

Test 3: Upload

1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing

Test 4: Upload + Edit

1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image

Test 5: Error Handling

Prompt: "a blue square"
Expected: IMAGE_RECITATION error with helpful message

🚨 CRITICAL SUCCESS FACTORS

You MUST get these right or the system will NOT work:

  1. Request format - Image before text, correct structure
  2. Response parsing - Check finishReason first
  3. MIME type handling - Store and use dynamically
  4. Base64 cleaning - No whitespace, no prefixes
  5. Session management - Store both data and MIME type
  6. Error handling - Different errors need different responses
  7. Prompt quality - Detailed, creative prompts only
  8. File upload - Client-side base64 conversion
  9. API timeout - 120 seconds minimum
  10. Retry logic - For temporary API failures

📝 QUICK REFERENCE

Essential Code Patterns

Check finishReason:

$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle error
}

Extract image:

foreach ($response['candidates'][0]['content']['parts'] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64' => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType']
        ];
    }
}

Store in session:

$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

Display image:

<img src="data:<?php echo htmlspecialchars($_SESSION['current_image_mime']); ?>;base64,<?php echo $_SESSION['current_image']; ?>">

🎯 IMPLEMENTATION CHECKLIST

Before considering the implementation complete:

  • Image generation works with detailed prompts
  • Image editing works (sends previous image)
  • IMAGE_RECITATION errors handled gracefully
  • MIME type stored and used correctly
  • File upload converts to base64 properly
  • Session persists across requests
  • Error messages are helpful
  • Debug panel shows request/response
  • Simple prompts show helpful error
  • Retry logic works for 500 errors
  • Rate limiting handled
  • Base64 data validated
  • Conversation history tracked
  • Reset clears session properly

💡 TIPS FOR AI ASSISTANTS

When helping users implement this:

  1. Show the request JSON first - Most problems are here
  2. Emphasize finishReason checking - Critical for error handling
  3. Explain MIME type importance - Common source of display issues
  4. Warn about simple prompts - Will trigger IMAGE_RECITATION
  5. Test with detailed prompts - "red circle" will fail
  6. Check session management - Editing requires proper storage
  7. Validate base64 format - Clean data is essential
  8. Add debug logging - Makes troubleshooting easier
  9. Handle all error types - Different errors need different solutions
  10. Test the full flow - Generate → Edit → Edit

📚 REFERENCES

  • API Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent
  • Model: gemini-3-pro-image-preview
  • Auth Header: x-goog-api-key: YOUR_KEY
  • Response Format: JSON with base64 in inlineData
  • Request Format: JSON with inline_data for editing

FINAL NOTES

This implementation is working and stable when these rules are followed:

  1. Use creative, detailed prompts (10+ words)
  2. Check finishReason before extracting image
  3. Store and use correct MIME types
  4. Clean base64 data (no whitespace/prefixes)
  5. Manage session properly for editing
  6. Handle all error types specifically
  7. Implement retry logic for temporary failures
  8. Validate uploaded files before processing

The system works reliably when these patterns are followed exactly.


Generated from working implementation - December 2025