Nano Banana Pro - AI Implementation Guide
How to Build an Iterative Image Generation & Editing System with Google Gemini
🎯 CRITICAL CONCEPT: This is NOT a standard image generation API
⚠️ WATCHOUT #1: Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.
Key Differences:
- Uses the generateContent endpoint (not a dedicated image API)
- Images returned as base64 in JSON (embedded in response)
- Editing = Sending previous image back (as base64 in request)
- Very aggressive content filters (IMAGE_RECITATION errors)
- No direct image URLs (everything is base64)
📐 SYSTEM ARCHITECTURE
User Input (Prompt/Upload)
↓
JavaScript Frontend (converts file to base64)
↓
PHP Backend API (api.php)
↓
Session Storage (stores base64 + MIME type)
↓
Google Gemini API (processes with previous image if editing)
↓
Extract base64 from response
↓
Store in session
↓
Display in browser (data URI)
🔑 CRITICAL IMPLEMENTATION DETAILS
1. THE REQUEST FORMAT (MOST IMPORTANT!)
⚠️ WATCHOUT #2: The request structure is VERY specific. Get this wrong and the API rejects the request (400 INVALID_ARGUMENT) or fails with a 500 error.
For NEW image generation:
{
"contents": [
{
"parts": [
{"text": "Your detailed creative prompt here"}
]
}
],
"generationConfig": {
"responseModalities": ["IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
}
For EDITING existing image:
{
"contents": [
{
"parts": [
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "base64_string_here"
}
},
{"text": "Edit instruction prompt"}
]
}
],
"generationConfig": {
"responseModalities": ["IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
}
⚠️ CRITICAL:
- Image MUST come BEFORE text in the parts array
- Use inline_data (snake_case), not inlineData
- MIME type must match the image bytes: image/jpeg for images Gemini returned, the uploaded file's type for uploads
- Base64 must be clean (no whitespace, no data URI prefix)
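The two request shapes above differ only in whether an image part precedes the text part, so one builder can cover both. The following is a minimal sketch in JavaScript (the language of this guide's client-side code); the function name buildRequestPayload and its option names are hypothetical, not part of the original implementation.

```javascript
// Hypothetical helper: builds the generateContent body described above.
// Pass `image` ({ base64, mimeType }) to edit; omit it for a new image.
function buildRequestPayload(prompt, { aspectRatio = "16:9", imageSize = "2K", image = null } = {}) {
  const parts = [];
  if (image) {
    // Image part goes first, as this guide requires for edits.
    parts.push({
      inline_data: {
        mime_type: image.mimeType,
        // Strip any data URI prefix and whitespace so only raw base64 is sent.
        data: image.base64.replace(/^data:[^,]*,/, "").replace(/\s+/g, ""),
      },
    });
  }
  parts.push({ text: prompt });
  return {
    contents: [{ parts }],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: { aspectRatio, imageSize },
    },
  };
}
```

Calling buildRequestPayload("add rain", { image: { base64, mimeType: "image/jpeg" } }) yields the editing shape; calling it with only a prompt yields the generation shape.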
2. THE RESPONSE FORMAT
⚠️ WATCHOUT #3: The response structure has TWO possible formats!
Success Response (with image):
{
"candidates": [
{
"content": {
"parts": [
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "base64_image_data"
}
}
]
},
"finishReason": "STOP"
}
]
}
⚠️ NOTICE: Response uses inlineData (camelCase), but request uses inline_data (snake_case)!
Blocked Response (IMAGE_RECITATION):
{
"candidates": [
{
"content": [],
"finishReason": "IMAGE_RECITATION",
"finishMessage": "Unable to show the generated image..."
}
]
}
⚠️ CRITICAL: ALWAYS check finishReason BEFORE trying to extract image data!
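A parser that handles both response shapes can return a tagged result instead of assuming the image is present. This is a sketch only, written in JavaScript; parseImageResponse and the NO_CANDIDATES / NO_IMAGE labels are hypothetical names introduced here, not values the API returns.

```javascript
// Hypothetical helper: inspects finishReason before touching image data.
function parseImageResponse(response) {
  const candidate = response?.candidates?.[0];
  if (!candidate) {
    return { ok: false, reason: "NO_CANDIDATES", message: "Response contained no candidates" };
  }
  const reason = candidate.finishReason ?? "UNKNOWN";
  if (reason !== "STOP") {
    // Covers IMAGE_RECITATION, SAFETY, and anything else that is not STOP.
    return { ok: false, reason, message: candidate.finishMessage ?? "" };
  }
  // In the blocked sample above "content" is an empty array, so "parts" may be missing.
  const parts = Array.isArray(candidate.content?.parts) ? candidate.content.parts : [];
  const imagePart = parts.find((part) => part.inlineData?.data);
  if (!imagePart) {
    return { ok: false, reason: "NO_IMAGE", message: "No inlineData part in response" };
  }
  return { ok: true, base64: imagePart.inlineData.data, mimeType: imagePart.inlineData.mimeType };
}
```

Returning a result object rather than throwing keeps the blocked case an ordinary branch, which suits it since the HTTP status is still 200.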
3. MIME TYPE HANDLING
⚠️ WATCHOUT #4: MIME type mismatches break image display!
// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';
// RIGHT - store and use actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g., "image/jpeg"
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">';
Why this matters:
- Gemini returns image/jpeg
- If you display as image/png, browser may fail to render
- Store BOTH base64 data AND MIME type
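When no MIME type was stored, the type can often be recovered from the first few base64 characters, because common image formats start with fixed signature bytes. The sketch below assumes clean base64 input; sniffMimeType and toDataUri are hypothetical helper names, and the signature table is deliberately small.

```javascript
// Base64 prefixes that correspond to well-known file signatures.
const SIGNATURES = [
  { prefix: "/9j/", mime: "image/jpeg" },       // bytes FF D8 FF
  { prefix: "iVBORw0KGgo", mime: "image/png" }, // bytes 89 50 4E 47 0D 0A 1A 0A
  { prefix: "R0lGOD", mime: "image/gif" },      // ASCII "GIF8"
];

// Returns the MIME type implied by the data, or null if unrecognized.
function sniffMimeType(base64) {
  const match = SIGNATURES.find((signature) => base64.startsWith(signature.prefix));
  return match ? match.mime : null;
}

// Prefers the stored MIME type and falls back to sniffing the data.
function toDataUri(base64, storedMimeType) {
  const mimeType = storedMimeType ?? sniffMimeType(base64);
  if (!mimeType) throw new Error("Unknown image MIME type");
  return `data:${mimeType};base64,${base64}`;
}
```

Comparing sniffMimeType(base64) against the stored value is also a cheap way to spot the mismatch described above before the image reaches the page.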
4. BASE64 DATA HANDLING
⚠️ WATCHOUT #5: Base64 data must be CLEAN!
// WRONG - includes data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."
// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."
Validation:
// Clean whitespace
$inputImage = preg_replace('/\s+/', '', $inputImage);
// Validate format
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage)) {
throw new Exception("Invalid base64 format");
}
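The same cleaning can run in the browser before upload, so bad data never reaches the backend. This is a sketch mirroring the PHP check above; cleanBase64 is a hypothetical name, and the length check (a multiple of 4) is an extra assumption that holds for padded base64 such as FileReader output.

```javascript
// Hypothetical client-side counterpart to the PHP validation above.
function cleanBase64(input) {
  // Drop a leading "data:<mime>;base64," prefix if present.
  const withoutPrefix = input.replace(/^data:[^,]*,/, "");
  // Remove line breaks and other whitespace.
  const compact = withoutPrefix.replace(/\s+/g, "");
  // Same character rule as the PHP regex, plus a padded-length check.
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(compact) || compact.length % 4 !== 0) {
    throw new Error("Invalid base64 format");
  }
  return compact;
}
```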
5. THE EDITING FLOW
⚠️ WATCHOUT #6: Session management is CRITICAL for editing to work!
// Step 1: Generate first image
$response = $api->generateImage("cyberpunk city", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
// Step 2: Edit existing image
$previousImage = $_SESSION['current_image']; // Get from session
$response = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
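$_SESSION['current_image'] = $imageData['base64']; // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep MIME type in sync with the new image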
Flow:
- Store generated image in session
- On edit request, retrieve from session
- Send as inline_data in request
- Store new result back to session
- Repeat for each edit
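The loop above can be modeled without PHP sessions or network calls, which makes the Generate → Edit → Edit sequence easy to reason about. In this sketch the session is a plain object and generate is a hypothetical stand-in for the backend call (in a real system it would be asynchronous); none of these names come from the original code.

```javascript
// Hypothetical in-memory stand-in for the PHP session.
function createSession() {
  return { currentImage: null, currentImageMime: null, imageHistory: [], conversationHistory: [] };
}

// Runs one turn: sends the previous image (if any) with the prompt,
// then stores the result so the next turn edits it.
function runTurn(session, prompt, generate) {
  const previous = session.currentImage
    ? { base64: session.currentImage, mimeType: session.currentImageMime }
    : null;
  const result = generate(prompt, previous); // { base64, mimeType }
  if (previous) session.imageHistory.push(previous);
  session.currentImage = result.base64;
  session.currentImageMime = result.mimeType;
  session.conversationHistory.push(prompt);
  return result;
}
```

Both the data and the MIME type are replaced on every turn, so the stored pair always describes the same image.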
6. ERROR HANDLING - THE TRICKY PART
⚠️ WATCHOUT #7: Multiple error types, each needs specific handling!
// Check finishReason FIRST
if (isset($response['candidates'][0]['finishReason'])) {
$reason = $response['candidates'][0]['finishReason'];
if ($reason === 'IMAGE_RECITATION') {
throw new Exception('Blocked by content filter. Use more creative prompts.');
}
if ($reason === 'SAFETY') {
throw new Exception('Blocked by safety filters.');
}
// Only proceed if STOP
if ($reason !== 'STOP') {
throw new Exception('Generation failed: ' . $reason);
}
}
// Then extract image
foreach ($response['candidates'][0]['content']['parts'] as $part) {
if (isset($part['inlineData']['data'])) {
return $part['inlineData'];
}
}
Common Errors:
| Error | HTTP Code | Cause | Solution |
|---|---|---|---|
| IMAGE_RECITATION | 200 | Prompt too generic | Use creative, detailed prompts |
| Internal error | 500 | Temporary API issue | Retry with exponential backoff |
| RESOURCE_EXHAUSTED | 429 | Rate limit | Wait 30s between requests |
| INVALID_ARGUMENT | 400 | Bad request format | Check base64 encoding |
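The table calls for retries on 500 and a pause on 429, but not on 400 or on blocked prompts. One way to sketch that policy follows; it assumes the wrapped operation throws an Error carrying a numeric status property, and the names withRetry and backoffDelayMs, the 1 second base delay, and the 4 attempt limit are all illustrative choices rather than values from the original implementation.

```javascript
const RETRYABLE_STATUS = new Set([429, 500]);

// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries `operation` on 429/500; rethrows everything else immediately.
async function withRetry(operation, options = {}) {
  const maxAttempts = options.maxAttempts ?? 4;
  const sleep = options.sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable = RETRYABLE_STATUS.has(err.status);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      // Rate limits get the fixed 30s wait from the table above.
      const delay = err.status === 429 ? 30000 : backoffDelayMs(attempt);
      await sleep(delay);
    }
  }
}
```

The sleep function is injectable so the policy can be exercised in tests without real delays.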
7. PROMPT ENGINEERING
⚠️ WATCHOUT #8: Simple prompts WILL fail!
// ❌ WILL FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"
// ✅ WILL WORK:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"
Rules:
- Aim for roughly 10 words or more
- Include adjectives (vintage, glowing, futuristic)
- Add context (at night, in rain, during sunset)
- Avoid single objects
- Be creative and specific
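Of these rules, only word count is easy to check mechanically, so a pre-flight check can warn before a request is spent. This is a heuristic sketch; checkPrompt is a hypothetical name, and since the third working example above has nine words, the threshold is best treated as a warning rather than a hard rule.

```javascript
// Hypothetical pre-flight check: counts words and reports short prompts.
function checkPrompt(prompt, minWords = 10) {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  const problems = [];
  if (words.length < minWords) {
    problems.push(`Prompt has ${words.length} words; aim for at least ${minWords}.`);
  }
  return { ok: problems.length === 0, wordCount: words.length, problems };
}
```

The result's problems array can feed the "helpful error message" the guide asks for when a simple prompt is submitted.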
8. FILE UPLOAD HANDLING
⚠️ WATCHOUT #9: File conversion must be done client-side!
// Convert file to base64 (client-side)
function fileToBase64(file) {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.onload = () => {
// CRITICAL: Remove data URI prefix!
const base64 = reader.result.split(',')[1];
resolve(base64);
};
reader.onerror = reject;
reader.readAsDataURL(file);
});
}
// Usage
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);
Backend handling:
if ($uploadedImage) {
// Store uploaded image
$_SESSION['current_image'] = $uploadedImage;
$_SESSION['current_image_mime'] = $uploadedImageType;
// If prompt provided, apply it
if ($prompt) {
$response = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
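// Update with edited version
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];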
}
}
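Before an upload is stored or forwarded, its declared type and size can be checked from the base64 string alone. The sketch below assumes clean, padded base64; the allowed type list and the 10 MiB limit are illustrative assumptions, not limits stated by this guide or by the API, and the helper names are hypothetical.

```javascript
// Assumed limits for illustration only.
const ALLOWED_TYPES = new Set(["image/jpeg", "image/png", "image/webp"]);
const MAX_BYTES = 10 * 1024 * 1024;

// Size of the decoded data: every 4 base64 characters encode 3 bytes,
// minus one byte per "=" padding character.
function decodedByteLength(base64) {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

function validateUpload(base64, mimeType) {
  if (!ALLOWED_TYPES.has(mimeType)) {
    return { ok: false, error: `Unsupported type: ${mimeType}` };
  }
  const size = decodedByteLength(base64);
  if (size > MAX_BYTES) {
    return { ok: false, error: `Image is ${size} bytes; limit is ${MAX_BYTES}` };
  }
  return { ok: true, size };
}
```

Because base64 inflates data by about a third, the request body the server receives is larger than the decoded size this function reports.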
9. SESSION MANAGEMENT
⚠️ WATCHOUT #10: Session structure is critical!
// Initialize (MUST be done before any output)
session_start();
// Required session variables (PHP 7.4+: set defaults only when missing, so a stored image survives the next request)
$_SESSION['current_image'] ??= null; // Base64 string
$_SESSION['current_image_mime'] ??= 'image/png'; // MIME type
$_SESSION['conversation_history'] ??= []; // Array of prompts
$_SESSION['image_history'] ??= []; // Array of previous images
// Reset (clear everything)
$_SESSION['conversation_history'] = [];
$_SESSION['current_image'] = null;
$_SESSION['current_image_mime'] = 'image/png';
$_SESSION['image_history'] = [];
10. API CONFIGURATION
⚠️ WATCHOUT #11: Endpoint and model name are specific!
// CORRECT endpoint:
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";
// Header format:
'x-goog-api-key: YOUR_API_KEY' // NOT 'Authorization: Bearer'
// Timeout:
CURLOPT_TIMEOUT => 120 // 2 minutes - image generation is SLOW
Model name: gemini-3-pro-image-preview
- May change in future
- Check Google's docs if errors persist
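For comparison with the cURL settings above, the same endpoint, header, and timeout can be expressed with fetch. This sketch assumes a server-side runtime with global fetch and AbortSignal.timeout (Node 18 or later), since the API key should never ship to the browser; buildFetchOptions and callGemini are hypothetical names.

```javascript
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

// Builds the request options: API key header (not a Bearer token) and a 120s timeout.
function buildFetchOptions(apiKey, payload, timeoutMs = 120000) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(timeoutMs),
  };
}

// Sends the request and throws an Error with a numeric `status` on non-2xx replies.
async function callGemini(apiKey, payload) {
  const res = await fetch(GEMINI_URL, buildFetchOptions(apiKey, payload));
  const body = await res.json().catch(() => null); // error bodies may not be JSON
  if (!res.ok) {
    const err = new Error(body?.error?.message ?? `HTTP ${res.status}`);
    err.status = res.status;
    throw err;
  }
  return body;
}
```

Attaching the HTTP status to the thrown error lets calling code distinguish 429 and 500 from 400 when deciding whether to retry.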
🐛 DEBUGGING CHECKLIST
When things don't work, check IN THIS ORDER:
1. Is the request format correct?
error_log("Request payload: " . json_encode($payload));
2. Is the response structure what you expect?
error_log("Response structure: " . json_encode($response));
3. Check finishReason:
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);
4. Verify base64 data:
error_log("Base64 length: " . strlen($base64));
error_log("First 50 chars: " . substr($base64, 0, 50));
5. Check MIME type matching:
error_log("Stored MIME: " . $_SESSION['current_image_mime']);
error_log("Response MIME: " . $response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType']);
🎓 COMMON MISTAKES TO AVOID
Mistake #1: Wrong request structure
// ❌ WRONG - text before image:
{"parts": [{"text": "..."}, {"inline_data": {...}}]}
// ✅ RIGHT - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}
Mistake #2: Not checking finishReason
// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];
// ✅ RIGHT - check finishReason first:
if ($response['candidates'][0]['finishReason'] === 'IMAGE_RECITATION') {
// Handle blocked content
}
Mistake #3: Hardcoded MIME types
// ❌ WRONG:
echo '<img src="data:image/png;base64,...">'; // Assumes PNG
// ✅ RIGHT:
echo '<img src="data:' . $mimeType . ';base64,...">'; // Uses actual type
Mistake #4: Not cleaning base64
// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"
// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only base64 part
Mistake #5: Missing error handling
// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);
// ✅ RIGHT:
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpCode !== 200) {
// Handle errors
}
📊 DATA FLOW DIAGRAM
┌─────────────────┐
│ User Action │
│ (Prompt/Upload)│
└────────┬────────┘
│
▼
┌─────────────────┐
│ JavaScript │
│ - Validate │
│ - Convert file │
│ - Build FormData│
└────────┬────────┘
│
▼
┌─────────────────┐
│ api.php │
│ - Get session │
│ - Build request │
│ - Call Gemini │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Gemini API │
│ - Process │
│ - Check filters │
│ - Generate │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Extract Response│
│ - Check finish │
│ - Get base64 │
│ - Get MIME type │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Store Session │
│ - current_image │
│ - image_mime │
│ - history │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Return to JS │
│ - Success flag │
│ - Reload page │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Display Image │
│ - Data URI │
│ - Correct MIME │
└─────────────────┘
🔧 TESTING STRATEGY
Test 1: Basic Generation
Prompt: "A futuristic motorcycle in a neon-lit city"
Expected: Image generated successfully
Test 2: Simple Edit
1. Generate: "A red sports car"
2. Edit: "add rain and reflections"
Expected: Car now has rain
Test 3: Upload
1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing
Test 4: Upload + Edit
1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image
Test 5: Error Handling
Prompt: "a blue square"
Expected: IMAGE_RECITATION error with helpful message
🚨 CRITICAL SUCCESS FACTORS
You MUST get these right or the system will NOT work:
- ✅ Request format - Image before text, correct structure
- ✅ Response parsing - Check finishReason first
- ✅ MIME type handling - Store and use dynamically
- ✅ Base64 cleaning - No whitespace, no prefixes
- ✅ Session management - Store both data and MIME type
- ✅ Error handling - Different errors need different responses
- ✅ Prompt quality - Detailed, creative prompts only
- ✅ File upload - Client-side base64 conversion
- ✅ API timeout - 120 seconds minimum
- ✅ Retry logic - For temporary API failures
📝 QUICK REFERENCE
Essential Code Patterns
Check finishReason:
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
// Handle error
}
Extract image:
foreach ($response['candidates'][0]['content']['parts'] as $part) {
if (isset($part['inlineData']['data'])) {
return [
'base64' => $part['inlineData']['data'],
'mime_type' => $part['inlineData']['mimeType']
];
}
}
Store in session:
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
Display image:
<img src="data:<?php echo $_SESSION['current_image_mime']; ?>;base64,<?php echo $_SESSION['current_image']; ?>">
🎯 IMPLEMENTATION CHECKLIST
Before considering the implementation complete:
- Image generation works with detailed prompts
- Image editing works (sends previous image)
- IMAGE_RECITATION errors handled gracefully
- MIME type stored and used correctly
- File upload converts to base64 properly
- Session persists across requests
- Error messages are helpful
- Debug panel shows request/response
- Simple prompts show helpful error
- Retry logic works for 500 errors
- Rate limiting handled
- Base64 data validated
- Conversation history tracked
- Reset clears session properly
💡 TIPS FOR AI ASSISTANTS
When helping users implement this:
- Show the request JSON first - Most problems are here
- Emphasize finishReason checking - Critical for error handling
- Explain MIME type importance - Common source of display issues
- Warn about simple prompts - Will trigger IMAGE_RECITATION
- Test with detailed prompts - "red circle" will fail
- Check session management - Editing requires proper storage
- Validate base64 format - Clean data is essential
- Add debug logging - Makes troubleshooting easier
- Handle all error types - Different errors need different solutions
- Test the full flow - Generate → Edit → Edit
📚 REFERENCES
- API Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent
- Model: gemini-3-pro-image-preview
- Auth Header: x-goog-api-key: YOUR_KEY
- Response Format: JSON with base64 in inlineData
- Request Format: JSON with inline_data for editing
⚡ FINAL NOTES
This implementation is working and stable when these rules are followed:
- Use creative, detailed prompts (10+ words)
- Check finishReason before extracting image
- Store and use correct MIME types
- Clean base64 data (no whitespace/prefixes)
- Manage session properly for editing
- Handle all error types specifically
- Implement retry logic for temporary failures
- Validate uploaded files before processing
The system works reliably when these patterns are followed exactly.
Generated from working implementation - December 2024