# Nano Banana Pro - AI Implementation Guide

## How to Build an Iterative Image Generation & Editing System with Google Gemini

---

## 🎯 CRITICAL CONCEPT: This is NOT a standard image generation API

**⚠️ WATCHOUT #1:** Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.

### Key Differences:

1. **Uses the `generateContent` endpoint** (not a dedicated image API)
2. **Images are returned as base64 in JSON** (embedded in the response)
3. **Editing = sending the previous image back** (as base64 in the request)
4. **Very aggressive content filters** (IMAGE_RECITATION errors)
5. **No direct image URLs** (everything is base64)

---

## 📐 SYSTEM ARCHITECTURE

```
User Input (Prompt/Upload)
    ↓
JavaScript Frontend (converts file to base64)
    ↓
PHP Backend API (api.php)
    ↓
Session Storage (stores base64 + MIME type)
    ↓
Google Gemini API (processes with previous image if editing)
    ↓
Extract base64 from response
    ↓
Store in session
    ↓
Display in browser (data URI)
```

---

## 🔑 CRITICAL IMPLEMENTATION DETAILS

### 1. THE REQUEST FORMAT (MOST IMPORTANT!)

**⚠️ WATCHOUT #2:** The request structure is VERY specific. Get it wrong and the request fails with 4xx/5xx errors.

#### For NEW image generation:

```json
{
  "contents": [
    {
      "parts": [
        {"text": "Your detailed creative prompt here"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```

#### For EDITING an existing image:

```json
{
  "contents": [
    {
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "base64_string_here"
          }
        },
        {"text": "Edit instruction prompt"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```

**⚠️ CRITICAL:**
- Put the image BEFORE the text in the `parts` array. This is the order this implementation was tested with; the API accepts parts in either order, so treat it as a convention rather than a hard rule
- Use `inline_data` (snake_case) as shown, which is the spelling Google's REST examples use. The API's JSON parser also accepts `inlineData`, but pick one spelling and use it consistently
- `mime_type` must describe the bytes you actually send: use the MIME type stored with the image (Gemini returned `image/jpeg` in this implementation, but an uploaded PNG must be sent as `image/png`)
- Base64 must be clean (no whitespace, no data URI prefix)

---

### 2. THE RESPONSE FORMAT

**⚠️ WATCHOUT #3:** The response comes in TWO possible shapes!

#### Success Response (with image):

```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/jpeg",
              "data": "base64_image_data"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ]
}
```

**⚠️ NOTICE:** The response uses `inlineData` (camelCase), while the request above uses `inline_data` (snake_case)!

#### Blocked Response (IMAGE_RECITATION):

```json
{
  "candidates": [
    {
      "content": [],
      "finishReason": "IMAGE_RECITATION",
      "finishMessage": "Unable to show the generated image..."
    }
  ]
}
```

**⚠️ CRITICAL:** ALWAYS check `finishReason` BEFORE trying to extract image data! A blocked candidate carries no image part, so indexing straight into `parts` fails.

---

### 3. MIME TYPE HANDLING

**⚠️ WATCHOUT #4:** MIME type mismatches break image display!

```php
// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';

// RIGHT - store and use the actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g. "image/jpeg"
// htmlspecialchars() because the type may come from a client upload
echo '<img src="data:' . htmlspecialchars($mimeType) . ';base64,' . $base64 . '">';
```

**Why this matters:**
- Gemini returned `image/jpeg` in this implementation
- If you label JPEG data as `image/png`, the browser may fail to render it
- Store BOTH the base64 data AND the MIME type

---

### 4. BASE64 DATA HANDLING

**⚠️ WATCHOUT #5:** Base64 data must be CLEAN!

```javascript
// WRONG - includes the data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."

// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."
```

**Validation:**

```php
// Strip whitespace (spaces, line breaks)
$inputImage = preg_replace('/\s+/', '', $inputImage);

// Validate the format: base64 alphabet plus up to two '=' padding characters
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage)) {
    throw new Exception("Invalid base64 format");
}
```

---

### 5. THE EDITING FLOW

**⚠️ WATCHOUT #6:** Session management is CRITICAL for editing to work!
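
Every call in this flow goes through `$api->generateImage()`, whose body this guide does not show. As a minimal sketch, here is how such a method might assemble the two request shapes from section 1 before POSTing them. `buildPayload()` is a hypothetical name, and taking the stored MIME type as a parameter is an assumption (the session code that follows passes only the base64 string).

```php
<?php
// Hypothetical helper: builds the generateContent request body for either
// a new image (no input image) or an edit (input image + its MIME type).
function buildPayload(
    string $prompt,
    string $aspectRatio = '16:9',
    string $imageSize = '2K',
    ?string $imageBase64 = null,
    string $mimeType = 'image/jpeg'
): array {
    $parts = [];

    if ($imageBase64 !== null) {
        // Editing: image part first, labelled with the stored MIME type
        $parts[] = [
            'inline_data' => [
                'mime_type' => $mimeType,
                // Defensive: strip any whitespace that slipped into the base64
                'data'      => preg_replace('/\s+/', '', $imageBase64),
            ],
        ];
    }

    // The text instruction comes after the (optional) image
    $parts[] = ['text' => $prompt];

    return [
        'contents' => [['parts' => $parts]],
        'generationConfig' => [
            'responseModalities' => ['IMAGE'],
            'imageConfig' => [
                'aspectRatio' => $aspectRatio,
                'imageSize'   => $imageSize,
            ],
        ],
    ];
}

// Usage: an edit turn that reuses the image stored in the session
$payload = buildPayload(
    'add rain and reflections on the streets',
    '16:9',
    '2K',
    '/9j/4AAQSkZJRg==', // placeholder; normally $_SESSION['current_image']
    'image/jpeg'        // normally $_SESSION['current_image_mime']
);
echo json_encode($payload, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Passing the MIME type explicitly keeps an uploaded PNG from being mislabelled as JPEG on its first edit.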
```php
// Step 1: Generate the first image
$response  = $api->generateImage("cyberpunk city", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image']      = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

// Step 2: Edit the existing image
$previousImage = $_SESSION['current_image']; // Get from session
// generateImage() must send $_SESSION['current_image_mime'] as mime_type
$response  = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
$_SESSION['current_image']      = $imageData['base64'];    // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep the MIME type in sync
```

**Flow:**
1. Store the generated image (and its MIME type) in the session
2. On an edit request, retrieve it from the session
3. Send it as `inline_data` in the request
4. Store the new result back in the session
5. Repeat for each edit

---

### 6. ERROR HANDLING - THE TRICKY PART

**⚠️ WATCHOUT #7:** There are multiple error types, and each needs specific handling!

```php
function extractImageData(array $response): array
{
    // A blocked PROMPT returns no candidates at all, only promptFeedback
    if (isset($response['promptFeedback']['blockReason'])) {
        throw new Exception('Prompt blocked: ' . $response['promptFeedback']['blockReason']);
    }

    $candidate = $response['candidates'][0] ?? null;
    if ($candidate === null) {
        throw new Exception('No candidates in response');
    }

    // Check finishReason BEFORE touching the parts
    $reason = $candidate['finishReason'] ?? 'UNKNOWN';

    if ($reason === 'IMAGE_RECITATION') {
        throw new Exception('Blocked by content filter. Use more creative prompts.');
    }

    if ($reason === 'SAFETY' || $reason === 'IMAGE_SAFETY') {
        throw new Exception('Blocked by safety filters.');
    }

    // Only proceed if STOP
    if ($reason !== 'STOP') {
        throw new Exception('Generation failed: ' . $reason);
    }

    // Then extract the image
    foreach ($candidate['content']['parts'] ?? [] as $part) {
        if (isset($part['inlineData']['data'])) {
            return [
                'base64'    => $part['inlineData']['data'],
                'mime_type' => $part['inlineData']['mimeType'],
            ];
        }
    }

    throw new Exception('Finished with STOP but no image part was returned');
}
```

**Common Errors:**

| Error | HTTP Code | Cause | Solution |
|-------|-----------|-------|----------|
| IMAGE_RECITATION | 200 | Output judged too close to existing material; generic prompts trigger it most often | Use creative, detailed prompts |
| Internal error | 500 | Temporary API issue | Retry with exponential backoff |
| RESOURCE_EXHAUSTED | 429 | Rate limit or quota exceeded | Back off before retrying (this implementation waited 30 s between requests) |
| INVALID_ARGUMENT | 400 | Bad request format | Check the JSON structure and base64 encoding |

---

### 7. PROMPT ENGINEERING

**⚠️ WATCHOUT #8:** Simple prompts very often fail!
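
A cheap pre-flight check can catch the thinnest prompts before they cost an API call. The sketch below is a hypothetical `promptWarning()` helper that applies only the "aim for 10+ words" rule of thumb from this section. It returns a warning rather than rejecting the prompt, because the rule is a heuristic (one of the working examples in this section has nine words) and passing it does not guarantee the request avoids IMAGE_RECITATION.

```php
<?php
// Hypothetical pre-flight check based on this guide's 10-word rule of thumb.
// Returns a warning string, or null if the prompt passes the check.
const MIN_PROMPT_WORDS = 10;

function promptWarning(string $prompt): ?string
{
    // Split on runs of whitespace, dropping empty pieces
    $words = preg_split('/\s+/', trim($prompt), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);

    if ($count < MIN_PROMPT_WORDS) {
        return sprintf(
            'Prompt has %d word(s); short prompts often trigger IMAGE_RECITATION. '
            . 'Add adjectives and context (setting, lighting, time of day).',
            $count
        );
    }
    return null;
}

// Usage: show the warning in the UI instead of hard-rejecting the request
var_dump(promptWarning('a red circle'));
var_dump(promptWarning('a magical forest with glowing blue mushrooms and fireflies at twilight'));
```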
```javascript
// ❌ LIKELY TO FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"

// ✅ WORKED in testing:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"
```

**Rules (rules of thumb, not guarantees):**
- Aim for at least 10 words
- Include adjectives (vintage, glowing, futuristic)
- Add context (at night, in rain, during sunset)
- Avoid single isolated objects
- Be creative and specific

---

### 8. FILE UPLOAD HANDLING

**⚠️ WATCHOUT #9:** In this architecture, file conversion happens client-side! (PHP could instead read `$_FILES` and call `base64_encode()`, but then the frontend must send the raw file.)

```javascript
// Convert a File to base64 (client-side)
function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      // CRITICAL: Remove the data URI prefix!
      const base64 = reader.result.split(',')[1];
      resolve(base64);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Usage (inside an async function)
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);
```

**Backend handling:**

```php
$allowedTypes = ['image/jpeg', 'image/png', 'image/webp']; // example whitelist

if ($uploadedImage) {
    // Don't trust the client-supplied type blindly
    if (!in_array($uploadedImageType, $allowedTypes, true)) {
        throw new Exception('Unsupported image type');
    }
    // Store the uploaded image
    $_SESSION['current_image']      = $uploadedImage;
    $_SESSION['current_image_mime'] = $uploadedImageType;

    // If a prompt was provided, apply it (send $uploadedImageType as mime_type)
    if ($prompt) {
        $response  = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
        $imageData = extractImageData($response);
        // Replace the upload with the edited version
        $_SESSION['current_image']      = $imageData['base64'];
        $_SESSION['current_image_mime'] = $imageData['mime_type'];
    }
}
```

---

### 9. SESSION MANAGEMENT

**⚠️ WATCHOUT #10:** The session structure is critical!
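
The keys initialized in the next snippet include `image_history` and `conversation_history`, but the editing flow in section 5 only ever overwrites `current_image`. One possible way to fill them, sketched under assumptions: a hypothetical `commitImage()` helper that archives the outgoing image before replacing it and caps the history length (`MAX_IMAGE_HISTORY` is an arbitrary choice). The cap matters because base64 is about 4/3 the size of the raw bytes, and every stored image is read and written on each request that opens the session.

```php
<?php
// Hypothetical helper: archive the current image, then replace it.
// Works on any array passed by reference, including $_SESSION.
const MAX_IMAGE_HISTORY = 5; // assumption: tune for your storage limits

function commitImage(array &$session, string $base64, string $mimeType, string $prompt): void
{
    // Push the outgoing image (if any) onto the history
    if (!empty($session['current_image'])) {
        $session['image_history'][] = [
            'base64'    => $session['current_image'],
            'mime_type' => $session['current_image_mime'] ?? 'image/png',
        ];
        // Keep only the most recent MAX_IMAGE_HISTORY entries
        $session['image_history'] = array_slice($session['image_history'], -MAX_IMAGE_HISTORY);
    }

    $session['current_image']          = $base64;
    $session['current_image_mime']     = $mimeType;
    $session['conversation_history'][] = $prompt;
}

// In api.php (after session_start()):
// commitImage($_SESSION, $imageData['base64'], $imageData['mime_type'], $prompt);

// Standalone demo with a plain array standing in for $_SESSION
$demo = ['current_image' => null, 'current_image_mime' => 'image/png',
         'conversation_history' => [], 'image_history' => []];
for ($i = 1; $i <= 7; $i++) {
    commitImage($demo, "img$i", 'image/jpeg', "edit $i");
}
echo count($demo['image_history']), ' ', $demo['current_image'], PHP_EOL; // prints "5 img7"
```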
```php
// Initialize (MUST run before any output is sent)
session_start();

// Required session variables: set defaults only if missing, so the
// current image survives across requests (??= needs PHP 7.4+)
$_SESSION['current_image']        ??= null;        // Base64 string
$_SESSION['current_image_mime']   ??= 'image/png'; // MIME type
$_SESSION['conversation_history'] ??= [];          // Array of prompts
$_SESSION['image_history']        ??= [];          // Array of previous images

// Reset (clear everything)
$_SESSION['conversation_history'] = [];
$_SESSION['current_image']        = null;
$_SESSION['current_image_mime']   = 'image/png';
$_SESSION['image_history']        = [];
```

---

### 10. API CONFIGURATION

**⚠️ WATCHOUT #11:** The endpoint and model name are specific!

```php
// CORRECT endpoint:
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'x-goog-api-key: ' . $apiKey, // NOT 'Authorization: Bearer' (that is for OAuth tokens)
    ],
    CURLOPT_POSTFIELDS     => json_encode($payload),
    CURLOPT_TIMEOUT        => 120, // 2 minutes - image generation is SLOW
]);
```

**Model name:** `gemini-3-pro-image-preview`
- It is a preview model, so the name may change
- Check Google's docs if errors persist

---

## 🐛 DEBUGGING CHECKLIST

When things don't work, check IN THIS ORDER:

### 1. Is the request format correct?

```php
// Note: an edit payload embeds the whole base64 image, so this line can be megabytes long
error_log("Request payload: " . json_encode($payload));
```

### 2. Is the response structure what you expect?

```php
error_log("Response structure: " . json_encode($response));
```

### 3. Check finishReason:

```php
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);
```

### 4. Verify base64 data:

```php
error_log("Base64 length: " . strlen($base64));
// JPEG data starts with "/9j/", PNG data with "iVBORw0KGgo"
error_log("First 50 chars: " . substr($base64, 0, 50));
```

### 5. Check MIME type matching:

```php
error_log("Stored MIME: " . $_SESSION['current_image_mime']);
error_log("Response MIME: " .
    ($response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType'] ?? 'none'));
```

---

## 🎓 COMMON MISTAKES TO AVOID

### Mistake #1: Wrong request structure

```json
// ❌ AVOID - text before image (not the order this implementation was tested with):
{"parts": [{"text": "..."}, {"inline_data": {...}}]}

// ✅ RECOMMENDED - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}
```

### Mistake #2: Not checking finishReason

```php
// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];

// ✅ RIGHT - check finishReason first:
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle blocked content (IMAGE_RECITATION, SAFETY, ...) before touching parts
}
```

### Mistake #3: Hardcoded MIME types

```php
// ❌ WRONG:
echo '<img src="data:image/png;base64,' . $base64 . '">'; // Assumes PNG

// ✅ RIGHT:
echo '<img src="data:' . htmlspecialchars($mimeType) . ';base64,' . $base64 . '">'; // Uses actual type
```

### Mistake #4: Not cleaning base64

```javascript
// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"

// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only the base64 part
```

### Mistake #5: Missing error handling

```php
// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);

// ✅ RIGHT:
$response = curl_exec($ch);
if ($response === false) {
    throw new Exception('cURL error: ' . curl_error($ch));
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpCode !== 200) {
    // Handle errors (see the Common Errors table in section 6)
}
return json_decode($response, true); // true = associative arrays, as this guide assumes
```

---

## 📊 DATA FLOW DIAGRAM

```
┌──────────────────┐
│   User Action    │
│ (Prompt/Upload)  │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   JavaScript     │
│ - Validate       │
│ - Convert file   │
│ - Build FormData │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│     api.php      │
│ - Get session    │
│ - Build request  │
│ - Call Gemini    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Gemini API     │
│ - Process        │
│ - Check filters  │
│ - Generate       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Extract Response │
│ - Check finish   │
│ - Get base64     │
│ - Get MIME type  │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Store Session   │
│ - current_image  │
│ - image_mime     │
│ - history        │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Return to JS   │
│ - Success flag   │
│ - Reload page    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Display Image   │
│ - Data URI       │
│ - Correct MIME   │
└──────────────────┘
```

---

## 🔧 TESTING STRATEGY

### Test 1: Basic Generation

```
Prompt: "A futuristic motorcycle in a neon-lit city"
Expected: Image generated successfully
```

### Test 2: Simple Edit

```
1. Generate: "A red sports car"
2. Edit: "add rain and reflections"
Expected: Car now has rain
```

### Test 3: Upload

```
1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing
```

### Test 4: Upload + Edit

```
1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image
```

### Test 5: Error Handling

```
Prompt: "a blue square"
Expected: IMAGE_RECITATION error with a helpful message (likely, but not guaranteed)
```

---

## 🚨 CRITICAL SUCCESS FACTORS

**You MUST get these right or the system will NOT work:**

1. ✅ **Request format** - Correct structure (image before text recommended)
2. ✅ **Response parsing** - Check finishReason first
3. ✅ **MIME type handling** - Store and use it dynamically
4. ✅ **Base64 cleaning** - No whitespace, no prefixes
5. ✅ **Session management** - Store both data and MIME type
6. ✅ **Error handling** - Different errors need different responses
7. ✅ **Prompt quality** - Detailed, creative prompts only
8. ✅ **File upload** - Client-side base64 conversion
9. ✅ **API timeout** - 120 seconds minimum
10. ✅ **Retry logic** - For temporary API failures

---

## 📝 QUICK REFERENCE

### Essential Code Patterns

**Check finishReason:**

```php
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle error
}
```

**Extract image:**

```php
foreach ($response['candidates'][0]['content']['parts'] ?? [] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64'    => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType']
        ];
    }
}
```

**Store in session:**

```php
$_SESSION['current_image']      = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
```

**Display image:**

```php
<img src="data:<?= htmlspecialchars($_SESSION['current_image_mime']) ?>;base64,<?= $_SESSION['current_image'] ?>">
```

---

## 🎯 IMPLEMENTATION CHECKLIST

Before considering the implementation complete:

- [ ] Image generation works with detailed prompts
- [ ] Image editing works (sends previous image)
- [ ] IMAGE_RECITATION errors handled gracefully
- [ ] MIME type stored and used correctly
- [ ] File upload converts to base64 properly
- [ ] Session persists across requests
- [ ] Error messages are helpful
- [ ] Debug panel shows request/response
- [ ] Simple prompts show helpful error
- [ ] Retry logic works for 500 errors
- [ ] Rate limiting handled
- [ ] Base64 data validated
- [ ] Conversation history tracked
- [ ] Reset clears session properly

---

## 💡 TIPS FOR AI ASSISTANTS

When helping users implement this:

1. **Show the request JSON first** - Most problems are here
2. **Emphasize finishReason checking** - Critical for error handling
3. **Explain MIME type importance** - Common source of display issues
4. **Warn about simple prompts** - They often trigger IMAGE_RECITATION
5. **Test with detailed prompts** - "red circle" is likely to fail
6. **Check session management** - Editing requires proper storage
7. **Validate base64 format** - Clean data is essential
8. **Add debug logging** - Makes troubleshooting easier
9. **Handle all error types** - Different errors need different solutions
10. **Test the full flow** - Generate → Edit → Edit

---

## 📚 REFERENCES

- **API Endpoint:** `https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent`
- **Model:** `gemini-3-pro-image-preview`
- **Auth Header:** `x-goog-api-key: YOUR_KEY`
- **Response Format:** JSON with base64 in `inlineData`
- **Request Format:** JSON with `inline_data` for editing

---

## ⚡ FINAL NOTES

This implementation is **working and stable** when these rules are followed:

1. Use creative, detailed prompts (aim for 10+ words)
2. Check `finishReason` before extracting the image
3. Store and use the correct MIME types
4. Clean base64 data (no whitespace/prefixes)
5. Manage the session properly for editing
6. Handle each error type specifically
7. Implement retry logic for temporary failures
8. Validate uploaded files before processing

**The system works reliably when these patterns are followed exactly.**

---

*Generated from working implementation - December 2024*