# Nano Banana Pro - AI Implementation Guide
## How to Build an Iterative Image Generation & Editing System with Google Gemini
---
## 🎯 CRITICAL CONCEPT: This is NOT a standard image generation API
**⚠️ WATCHOUT #1:** Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.
### Key Differences:
1. **Uses `generateContent` endpoint** (not a dedicated image API)
2. **Images returned as base64 in JSON** (embedded in response)
3. **Editing = Sending previous image back** (as base64 in request)
4. **Very aggressive content filters** (IMAGE_RECITATION errors)
5. **No direct image URLs** (everything is base64)
---
## SYSTEM ARCHITECTURE
```
User Input (Prompt/Upload)
↓
JavaScript Frontend (converts file to base64)
↓
PHP Backend API (api.php)
↓
Session Storage (stores base64 + MIME type)
↓
Google Gemini API (processes with previous image if editing)
↓
Extract base64 from response
↓
Store in session
↓
Display in browser (data URI)
```
---
## CRITICAL IMPLEMENTATION DETAILS
### 1. THE REQUEST FORMAT (MOST IMPORTANT!)
**⚠️ WATCHOUT #2:** The request structure is VERY specific. Get this wrong and the API rejects the request (a 400 `INVALID_ARGUMENT`, or in some cases a 500).
#### For NEW image generation:
```json
{
  "contents": [
    {
      "parts": [
        {"text": "Your detailed creative prompt here"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```
#### For EDITING existing image:
```json
{
  "contents": [
    {
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "base64_string_here"
          }
        },
        {"text": "Edit instruction prompt"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```
**⚠️ CRITICAL:**
- Image MUST come BEFORE text in the parts array
- Use `inline_data` and `mime_type` (snake_case) in the request, as in the examples above
- `mime_type` must describe the image you are sending: `image/jpeg` for images Gemini returned, the file's own type for uploads
- Base64 must be clean (no whitespace, no data URI prefix)
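
The two request shapes above differ only in whether an image part precedes the text part. As a rough JavaScript sketch, a single builder can produce both. The helper name `buildPayload` and its default arguments are assumptions made for illustration (the defaults are copied from the JSON examples), not part of any Gemini SDK.

```javascript
// Hypothetical helper (not an SDK function): builds the JSON body shown above.
// `image` is optional and shaped like { base64: "...", mimeType: "image/jpeg" }.
function buildPayload(prompt, image = null, aspectRatio = "16:9", imageSize = "2K") {
  const parts = [];
  if (image) {
    // The image part goes first, as this guide requires.
    parts.push({ inline_data: { mime_type: image.mimeType, data: image.base64 } });
  }
  parts.push({ text: prompt });
  return {
    contents: [{ parts }],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: { aspectRatio, imageSize },
    },
  };
}

const editPayload = buildPayload("add rain", { base64: "AAAA", mimeType: "image/jpeg" });
console.log(JSON.stringify(editPayload.contents[0].parts.map((p) => Object.keys(p)[0])));
// prints ["inline_data","text"]
```

Calling `buildPayload(prompt)` without an image yields the new-generation shape with a single text part.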
---
### 2. THE RESPONSE FORMAT
**⚠️ WATCHOUT #3:** The response structure has TWO possible formats!
#### Success Response (with image):
```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/jpeg",
              "data": "base64_image_data"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ]
}
```
**⚠️ NOTICE:** Response uses `inlineData` (camelCase), but request uses `inline_data` (snake_case)!
#### Blocked Response (IMAGE_RECITATION):
```json
{
  "candidates": [
    {
      "content": [],
      "finishReason": "IMAGE_RECITATION",
      "finishMessage": "Unable to show the generated image..."
    }
  ]
}
```
**⚠️ CRITICAL:** ALWAYS check `finishReason` BEFORE trying to extract image data!
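
A minimal JavaScript sketch of that check-then-extract order, written against the two response shapes shown above. The function name `extractImage` and the error messages are illustrative assumptions; only the field names come from the examples.

```javascript
// Hypothetical helper: returns { base64, mimeType } or throws with the finish reason.
function extractImage(response) {
  const candidate = response?.candidates?.[0];
  if (!candidate) throw new Error("No candidates in response");
  // Check finishReason before touching the parts.
  if (candidate.finishReason !== "STOP") {
    throw new Error(`Generation blocked or failed: ${candidate.finishReason ?? "UNKNOWN"}`);
  }
  // `content` can be an empty array in blocked responses, so guard the lookup.
  const parts = candidate.content?.parts ?? [];
  const imagePart = parts.find((part) => part.inlineData?.data);
  if (!imagePart) throw new Error("Response contained no image data");
  return { base64: imagePart.inlineData.data, mimeType: imagePart.inlineData.mimeType };
}

const blocked = { candidates: [{ content: [], finishReason: "IMAGE_RECITATION" }] };
try {
  extractImage(blocked);
} catch (err) {
  console.log(err.message); // prints Generation blocked or failed: IMAGE_RECITATION
}
```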
---
### 3. MIME TYPE HANDLING
**⚠️ WATCHOUT #4:** MIME type mismatches break image display!
```php
// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';
// RIGHT - store and use actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g., "image/jpeg"
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">';
```
**Why this matters:**
- Gemini returns `image/jpeg`
- If you display as `image/png`, browser may fail to render
- Store BOTH base64 data AND MIME type
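
One way to guard against a mismatched MIME type is to compare the stored type with the first characters of the base64 data, since JPEG and PNG files start with fixed magic bytes. The sketch below is an illustrative assumption rather than part of the original implementation; `sniffMimeType` and `toDataUri` are hypothetical names, and only JPEG and PNG are recognised.

```javascript
// Hypothetical helpers for illustration; not part of any Gemini SDK.

// Guess the MIME type from the leading base64 characters (file magic bytes).
// Only JPEG and PNG are recognised; anything else falls back to `declared`.
function sniffMimeType(base64, declared = "application/octet-stream") {
  if (base64.startsWith("/9j/")) return "image/jpeg"; // bytes FF D8 FF
  if (base64.startsWith("iVBORw0KGgo")) return "image/png"; // PNG signature
  return declared;
}

// Build the data URI used in <img src="..."> from base64 data plus MIME type.
function toDataUri(base64, mimeType) {
  return `data:${mimeType};base64,${base64}`;
}

const stored = { base64: "/9j/4AAQSkZJRg==", mimeType: "image/png" }; // mismatched on purpose
const mime = sniffMimeType(stored.base64, stored.mimeType);
console.log(toDataUri(stored.base64, mime).slice(0, 23));
// prints data:image/jpeg;base64,
```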
---
### 4. BASE64 DATA HANDLING
**⚠️ WATCHOUT #5:** Base64 data must be CLEAN!
```javascript
// WRONG - includes data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."
// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."
```
**Validation:**
```php
// Clean whitespace
$inputImage = preg_replace('/\s+/', '', $inputImage);
// Validate format
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage)) {
    throw new Exception("Invalid base64 format");
}
```
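
The same cleaning steps can also run in JavaScript before the data leaves the browser. This is a sketch under assumptions: `cleanBase64` is a hypothetical name, and the length check assumes standard padded base64 (which is what `FileReader.readAsDataURL` produces).

```javascript
// Hypothetical helper: strips a data URI prefix and whitespace, then validates.
function cleanBase64(input) {
  const withoutPrefix = input.replace(/^data:[^;,]+;base64,/, "");
  const compact = withoutPrefix.replace(/\s+/g, "");
  const looksValid =
    compact.length > 0 &&
    compact.length % 4 === 0 && // padded base64 comes in groups of 4 characters
    /^[A-Za-z0-9+\/]+={0,2}$/.test(compact);
  if (!looksValid) throw new Error("Invalid base64 format");
  return compact;
}

console.log(cleanBase64("data:image/jpeg;base64,/9j/4AAQ")); // prints /9j/4AAQ
```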
---
### 5. THE EDITING FLOW
**⚠️ WATCHOUT #6:** Session management is CRITICAL for editing to work!
```php
// Step 1: Generate first image
$response = $api->generateImage("cyberpunk city", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
// Step 2: Edit existing image
$previousImage = $_SESSION['current_image']; // Get from session
$response = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64']; // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep MIME type in sync
```
**Flow:**
1. Store generated image in session
2. On edit request, retrieve from session
3. Send as `inline_data` in request
4. Store new result back to session
5. Repeat for each edit
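
The flow above can be modelled as a small state update that is independent of PHP sessions. The following JavaScript sketch uses a plain object as a stand-in for the session; all function names are hypothetical, and the field names merely mirror the session keys used in this guide.

```javascript
// Hypothetical in-memory stand-in for the PHP session used in this guide.
function createSession() {
  return { currentImage: null, currentImageMime: null, conversationHistory: [], imageHistory: [] };
}

// Pure state update: archive the previous image, then store the new one
// together with its MIME type.
function storeResult(session, prompt, image) {
  const imageHistory = session.currentImage
    ? [...session.imageHistory, { base64: session.currentImage, mimeType: session.currentImageMime }]
    : session.imageHistory;
  return {
    currentImage: image.base64,
    currentImageMime: image.mimeType,
    conversationHistory: [...session.conversationHistory, prompt],
    imageHistory,
  };
}

// The image to send with the next request, or null for a fresh generation.
function imageForNextRequest(session) {
  return session.currentImage
    ? { base64: session.currentImage, mimeType: session.currentImageMime }
    : null;
}

let session = createSession();
session = storeResult(session, "cyberpunk city", { base64: "AAAA", mimeType: "image/jpeg" });
session = storeResult(session, "add rain", { base64: "BBBB", mimeType: "image/jpeg" });
console.log(session.conversationHistory.length, session.imageHistory.length); // prints 2 1
```

Keeping the update pure makes the generate, edit, edit sequence easy to test without calling the API.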
---
### 6. ERROR HANDLING - THE TRICKY PART
**⚠️ WATCHOUT #7:** Multiple error types, each needs specific handling!
```php
// Check finishReason FIRST
if (isset($response['candidates'][0]['finishReason'])) {
    $reason = $response['candidates'][0]['finishReason'];
    if ($reason === 'IMAGE_RECITATION') {
        throw new Exception('Blocked by content filter. Use more creative prompts.');
    }
    if ($reason === 'SAFETY') {
        throw new Exception('Blocked by safety filters.');
    }
    // Only proceed if STOP
    if ($reason !== 'STOP') {
        throw new Exception('Generation failed: ' . $reason);
    }
}
// Then extract image (content may be empty, so default to no parts)
foreach ($response['candidates'][0]['content']['parts'] ?? [] as $part) {
    if (isset($part['inlineData']['data'])) {
        return $part['inlineData'];
    }
}
// No part carried image data
throw new Exception('No image data in response.');
```
**Common Errors:**
| Error | HTTP Code | Cause | Solution |
|-------|-----------|-------|----------|
| IMAGE_RECITATION | 200 | Prompt too generic | Use creative, detailed prompts |
| Internal error | 500 | API temporary issue | Retry with exponential backoff |
| RESOURCE_EXHAUSTED | 429 | Rate limit | Wait 30s between requests |
| INVALID_ARGUMENT | 400 | Bad request format | Check base64 encoding |
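
The table can be turned into a small decision function plus a backoff schedule. This JavaScript sketch is illustrative only: `classifyResult`, `backoffDelaysMs`, the action labels, and the 1-second base delay are assumptions, while the 30-second wait for 429 is taken from the table.

```javascript
// Hypothetical helper: maps an HTTP status plus finishReason to a next action,
// following the "Common Errors" table above.
function classifyResult(httpStatus, finishReason) {
  if (httpStatus === 429) return { action: "wait", waitMs: 30_000 };
  if (httpStatus >= 500) return { action: "retry" };
  if (httpStatus === 400) return { action: "fix_request" };
  if (httpStatus !== 200) return { action: "fail" };
  if (finishReason === "STOP") return { action: "ok" };
  if (finishReason === "IMAGE_RECITATION") return { action: "rewrite_prompt" };
  return { action: "fail" };
}

// Exponential backoff schedule: baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelaysMs(retries, baseMs = 1000) {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

console.log(classifyResult(200, "IMAGE_RECITATION").action); // prints rewrite_prompt
console.log(backoffDelaysMs(3)); // prints [ 1000, 2000, 4000 ]
```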
---
### 7. PROMPT ENGINEERING
**⚠️ WATCHOUT #8:** Simple prompts WILL fail!
```javascript
// ❌ WILL FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"
// ✅ WILL WORK:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"
```
**Rules:**
- Minimum 10 words
- Include adjectives (vintage, glowing, futuristic)
- Add context (at night, in rain, during sunset)
- Avoid single objects
- Be creative and specific
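
The word-count rule is easy to check before a request is sent, so the user gets a hint instead of an `IMAGE_RECITATION` error. The sketch below is a heuristic only: `checkPrompt` is a hypothetical name, the 10-word threshold comes from the rules above, and passing the check does not guarantee the API will accept the prompt.

```javascript
// Hypothetical pre-flight check based on the "Minimum 10 words" rule above.
function checkPrompt(prompt, minWords = 10) {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  const ok = words.length >= minWords;
  return {
    ok,
    wordCount: words.length,
    hint: ok ? null : `Add detail and context: ${words.length} of ${minWords} words`,
  };
}

console.log(checkPrompt("a red circle").wordCount); // prints 3
console.log(
  checkPrompt("a vintage red sports car racing through a neon-lit cyberpunk city at night").ok
); // prints true
```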
---
### 8. FILE UPLOAD HANDLING
**⚠️ WATCHOUT #9:** File conversion must be done client-side!
```javascript
// Convert file to base64 (client-side)
function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      // CRITICAL: Remove data URI prefix!
      const base64 = reader.result.split(',')[1];
      resolve(base64);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}
// Usage (inside an async function; formData is your existing FormData instance)
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);
**Backend handling:**
```php
if ($uploadedImage) {
    // Store uploaded image
    $_SESSION['current_image'] = $uploadedImage;
    $_SESSION['current_image_mime'] = $uploadedImageType;
    // If prompt provided, apply it
    if ($prompt) {
        $response = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
        // Update with edited version
        $imageData = extractImageData($response);
        $_SESSION['current_image'] = $imageData['base64'];
        $_SESSION['current_image_mime'] = $imageData['mime_type'];
    }
}
```
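
Before converting a file, it can help to reject obviously unsuitable uploads on the client. The following sketch is an assumption layered on top of the flow above: the allowed types and the 10 MiB cap are hypothetical values chosen for illustration, not limits documented by the API.

```javascript
// Hypothetical limits for illustration; adjust to your own requirements.
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"];
const MAX_BYTES = 10 * 1024 * 1024; // assumed 10 MiB cap, not an API limit

// Accepts anything shaped like a browser File: { type, size }.
function validateUpload(file) {
  if (!file) return { ok: false, error: "No file selected" };
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { ok: false, error: `Unsupported type: ${file.type || "unknown"}` };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, error: `File too large: ${file.size} bytes` };
  }
  return { ok: true, error: null };
}

console.log(validateUpload({ type: "image/gif", size: 1024 }).error);
// prints Unsupported type: image/gif
```

In the browser this would run on `uploadInput.files[0]` before `fileToBase64` is called.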
---
### 9. SESSION MANAGEMENT
**⚠️ WATCHOUT #10:** Session structure is critical!
```php
// Initialize (MUST be done before any output)
session_start();
// Required session variables (defaults only if missing, so a stored image survives; PHP 7.4+)
$_SESSION['current_image'] ??= null; // Base64 string
$_SESSION['current_image_mime'] ??= 'image/png'; // MIME type
$_SESSION['conversation_history'] ??= []; // Array of prompts
$_SESSION['image_history'] ??= []; // Array of previous images
// Reset (clear everything) - run only when the user requests a reset
$_SESSION['conversation_history'] = [];
$_SESSION['current_image'] = null;
$_SESSION['current_image_mime'] = 'image/png';
$_SESSION['image_history'] = [];
---
### 10. API CONFIGURATION
**⚠️ WATCHOUT #11:** Endpoint and model name are specific!
```php
// CORRECT endpoint:
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";
// Header format ($apiKey holds your key):
$headers = [
    'Content-Type: application/json',
    'x-goog-api-key: ' . $apiKey, // NOT 'Authorization: Bearer'
];
// Timeout:
$curlOptions = [
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_TIMEOUT => 120, // 2 minutes - image generation is SLOW
];
```
**Model name:** `gemini-3-pro-image-preview`
- May change in future
- Check Google's docs if errors persist
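
For comparison, the same configuration can be sketched in JavaScript for a server-side runtime with a global `fetch` (for example Node 18+); the key should stay on the server, as it does in the PHP backend. The names `buildFetchRequest` and `callGemini` are hypothetical, and the endpoint, header, and 120-second timeout are the values given above.

```javascript
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

// Hypothetical helper: assembles the URL and fetch options without sending anything.
function buildFetchRequest(apiKey, payload) {
  return {
    url: GEMINI_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // API key header, not "Authorization: Bearer"
      },
      body: JSON.stringify(payload),
    },
  };
}

// Hypothetical caller: sends the request with a 120 s timeout (generation is slow).
async function callGemini(apiKey, payload, timeoutMs = 120_000) {
  const { url, options } = buildFetchRequest(apiKey, payload);
  const res = await fetch(url, { ...options, signal: AbortSignal.timeout(timeoutMs) });
  return { httpStatus: res.status, body: await res.json() };
}

const request = buildFetchRequest("YOUR_API_KEY", { contents: [] });
console.log(request.options.method, Object.keys(request.options.headers).length); // prints POST 2
```

Separating request assembly from sending keeps the headers and body inspectable in logs and tests without a network call.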
---
## DEBUGGING CHECKLIST
When things don't work, check IN THIS ORDER:
### 1. Is the request format correct?
```php
error_log("Request payload: " . json_encode($payload));
```
### 2. Is the response structure what you expect?
```php
error_log("Response structure: " . json_encode($response));
```
### 3. Check finishReason:
```php
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);
```
### 4. Verify base64 data:
```php
error_log("Base64 length: " . strlen($base64));
error_log("First 50 chars: " . substr($base64, 0, 50));
```
### 5. Check MIME type matching:
```php
error_log("Stored MIME: " . $_SESSION['current_image_mime']);
error_log("Response MIME: " . ($response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType'] ?? 'n/a'));
```
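
Because responses embed megabytes of base64, logging them whole can flood the log. A possible alternative, sketched here in JavaScript with a hypothetical `summarizeResponse` helper, is to log only the finish reason and a short description of each part.

```javascript
// Hypothetical helper: summarises a response for logging without dumping base64.
function summarizeResponse(response) {
  const candidate = response?.candidates?.[0] ?? {};
  const parts = candidate.content?.parts ?? [];
  return {
    finishReason: candidate.finishReason ?? "UNKNOWN",
    parts: parts.map((part) => {
      if (part.inlineData) {
        const data = part.inlineData.data ?? "";
        return {
          kind: "image",
          mimeType: part.inlineData.mimeType,
          base64Length: data.length,
          head: data.slice(0, 16), // enough to recognise the file signature
        };
      }
      return { kind: "text", length: (part.text ?? "").length };
    }),
  };
}

const sample = {
  candidates: [
    { content: { parts: [{ inlineData: { mimeType: "image/jpeg", data: "/9j/4AAQSkZJRgABAQAAAQ" } }] }, finishReason: "STOP" },
  ],
};
console.log(JSON.stringify(summarizeResponse(sample)));
```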
---
## COMMON MISTAKES TO AVOID
### Mistake #1: Wrong request structure
```json
// ❌ WRONG - text before image:
{"parts": [{"text": "..."}, {"inline_data": {...}}]}
// ✅ RIGHT - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}
```
### Mistake #2: Not checking finishReason
```php
// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];
// ✅ RIGHT - check finishReason first:
if ($response['candidates'][0]['finishReason'] === 'IMAGE_RECITATION') {
    // Handle blocked content
}
```
### Mistake #3: Hardcoded MIME types
```php
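// ❌ WRONG:
echo '<img src="data:image/png;base64,' . $base64 . '">'; // Assumes PNG
// ✅ RIGHT:
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">'; // Uses actual type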
```
### Mistake #4: Not cleaning base64
```javascript
// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"
// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only base64 part
```
### Mistake #5: Missing error handling
```php
// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);
// ✅ RIGHT:
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpCode !== 200) {
    // Handle errors
}
```
---
## DATA FLOW DIAGRAM
```
┌─────────────────┐
│ User Action     │
│  (Prompt/Upload)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ JavaScript      │
│ - Validate      │
│ - Convert file  │
│ - Build FormData│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ api.php         │
│ - Get session   │
│ - Build request │
│ - Call Gemini   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Gemini API      │
│ - Process       │
│ - Check filters │
│ - Generate      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Response│
│ - Check finish  │
│ - Get base64    │
│ - Get MIME type │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Store Session   │
│ - current_image │
│ - image_mime    │
│ - history       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Return to JS    │
│ - Success flag  │
│ - Reload page   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Display Image   │
│ - Data URI      │
│ - Correct MIME  │
└─────────────────┘
```
---
## TESTING STRATEGY
### Test 1: Basic Generation
```
Prompt: "A futuristic motorcycle in a neon-lit city"
Expected: Image generated successfully
```
### Test 2: Simple Edit
```
1. Generate: "a vintage red sports car racing through a neon-lit cyberpunk city at night"
2. Edit: "add rain and reflections"
Expected: Car now has rain
```
### Test 3: Upload
```
1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing
```
### Test 4: Upload + Edit
```
1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image
```
### Test 5: Error Handling
```
Prompt: "a blue square"
Expected: IMAGE_RECITATION error with helpful message
```
---
## 🚨 CRITICAL SUCCESS FACTORS
**You MUST get these right or the system will NOT work:**
1. ✅ **Request format** - Image before text, correct structure
2. ✅ **Response parsing** - Check finishReason first
3. ✅ **MIME type handling** - Store and use dynamically
4. ✅ **Base64 cleaning** - No whitespace, no prefixes
5. ✅ **Session management** - Store both data and MIME type
6. ✅ **Error handling** - Different errors need different responses
7. ✅ **Prompt quality** - Detailed, creative prompts only
8. ✅ **File upload** - Client-side base64 conversion
9. ✅ **API timeout** - 120 seconds minimum
10. ✅ **Retry logic** - For temporary API failures
---
## QUICK REFERENCE
### Essential Code Patterns
**Check finishReason:**
```php
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle error
}
```
**Extract image:**
```php
foreach ($response['candidates'][0]['content']['parts'] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64' => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType']
        ];
    }
}
```
**Store in session:**
```php
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
```
**Display image:**
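```php
echo '<img src="data:' . $_SESSION['current_image_mime'] . ';base64,' . $_SESSION['current_image'] . '">';
```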
---
## 🎯 IMPLEMENTATION CHECKLIST
Before considering the implementation complete:
- [ ] Image generation works with detailed prompts
- [ ] Image editing works (sends previous image)
- [ ] IMAGE_RECITATION errors handled gracefully
- [ ] MIME type stored and used correctly
- [ ] File upload converts to base64 properly
- [ ] Session persists across requests
- [ ] Error messages are helpful
- [ ] Debug panel shows request/response
- [ ] Simple prompts show helpful error
- [ ] Retry logic works for 500 errors
- [ ] Rate limiting handled
- [ ] Base64 data validated
- [ ] Conversation history tracked
- [ ] Reset clears session properly
---
## 💡 TIPS FOR AI ASSISTANTS
When helping users implement this:
1. **Show the request JSON first** - Most problems are here
2. **Emphasize finishReason checking** - Critical for error handling
3. **Explain MIME type importance** - Common source of display issues
4. **Warn about simple prompts** - Will trigger IMAGE_RECITATION
5. **Test with detailed prompts** - "red circle" will fail
6. **Check session management** - Editing requires proper storage
7. **Validate base64 format** - Clean data is essential
8. **Add debug logging** - Makes troubleshooting easier
9. **Handle all error types** - Different errors need different solutions
10. **Test the full flow** - Generate → Edit → Edit
---
## REFERENCES
- **API Endpoint:** `https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent`
- **Model:** `gemini-3-pro-image-preview`
- **Auth Header:** `x-goog-api-key: YOUR_KEY`
- **Response Format:** JSON with base64 in `inlineData`
- **Request Format:** JSON with `inline_data` for editing
---
## ⚡ FINAL NOTES
This implementation is **working and stable** when these rules are followed:
1. Use creative, detailed prompts (10+ words)
2. Check `finishReason` before extracting image
3. Store and use correct MIME types
4. Clean base64 data (no whitespace/prefixes)
5. Manage session properly for editing
6. Handle all error types specifically
7. Implement retry logic for temporary failures
8. Validate uploaded files before processing
**The system works reliably when these patterns are followed exactly.**
---
*Generated from working implementation - December 2024*