# Nano Banana Pro - AI Implementation Guide

## How to Build an Iterative Image Generation & Editing System with Google Gemini

---

## 🎯 CRITICAL CONCEPT: This is NOT a standard image generation API

**⚠️ WATCHOUT #1:** Google Gemini's image generation works COMPLETELY differently from DALL-E, Stable Diffusion, or Midjourney.

### Key Differences:
1. **Uses `generateContent` endpoint** (not a dedicated image API)
2. **Images returned as base64 in JSON** (embedded in response)
3. **Editing = Sending previous image back** (as base64 in request)
4. **Very aggressive content filters** (IMAGE_RECITATION errors)
5. **No direct image URLs** (everything is base64)

---

## 📐 SYSTEM ARCHITECTURE

```
User Input (Prompt/Upload)
    ↓
JavaScript Frontend (converts file to base64)
    ↓
PHP Backend API (api.php)
    ↓
Session Storage (stores base64 + MIME type)
    ↓
Google Gemini API (processes with previous image if editing)
    ↓
Extract base64 from response
    ↓
Store in session
    ↓
Display in browser (data URI)
```

---

## 🔑 CRITICAL IMPLEMENTATION DETAILS

### 1. THE REQUEST FORMAT (MOST IMPORTANT!)

**⚠️ WATCHOUT #2:** The request structure is VERY specific. Get this wrong and you get 400 or 500 errors.

#### For NEW image generation:
```json
{
  "contents": [
    {
      "parts": [
        {"text": "Your detailed creative prompt here"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```

#### For EDITING existing image:
```json
{
  "contents": [
    {
      "parts": [
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "base64_string_here"
          }
        },
        {"text": "Edit instruction prompt"}
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}
```

**⚠️ CRITICAL:**
- Image MUST come BEFORE text in the parts array
- Use `inline_data` and `mime_type` (snake_case) in the request, exactly as shown above
- MIME type must match the actual image data: `image/jpeg` for images Gemini returned in this implementation, the real file type for uploads
- Base64 must be clean (no whitespace, no data URI prefix)
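
The two request shapes above differ only in whether an image part precedes the text part, so one builder can produce both. The following is a minimal sketch in JavaScript, not the guide's actual backend code: the function name `buildGeminiPayload` and the `previousImage` object shape (`{ base64, mimeType }`) are assumptions made for illustration, while the JSON field names are taken from the examples above.

```javascript
// Hypothetical helper: builds the generateContent payload shown above.
// `previousImage` is optional; pass { base64, mimeType } to request an edit.
function buildGeminiPayload(prompt, { aspectRatio = "16:9", imageSize = "2K", previousImage = null } = {}) {
  const parts = [];

  if (previousImage) {
    // Editing: the image part goes first, then the instruction text.
    parts.push({
      inline_data: {
        mime_type: previousImage.mimeType,
        data: previousImage.base64,
      },
    });
  }
  parts.push({ text: prompt });

  return {
    contents: [{ parts }],
    generationConfig: {
      responseModalities: ["IMAGE"],
      imageConfig: { aspectRatio, imageSize },
    },
  };
}

// Usage: a new image, then an edit of a previously stored image.
const fresh = buildGeminiPayload("a magical forest with glowing blue mushrooms and fireflies at twilight");
const edit = buildGeminiPayload("add rain", {
  previousImage: { base64: "AAAA", mimeType: "image/jpeg" },
});
```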

---

### 2. THE RESPONSE FORMAT

**⚠️ WATCHOUT #3:** The response structure has TWO possible formats!

#### Success Response (with image):
```json
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/jpeg",
              "data": "base64_image_data"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ]
}
```

**⚠️ NOTICE:** Response uses `inlineData` (camelCase), but request uses `inline_data` (snake_case)!

#### Blocked Response (IMAGE_RECITATION):
```json
{
  "candidates": [
    {
      "content": [],
      "finishReason": "IMAGE_RECITATION",
      "finishMessage": "Unable to show the generated image..."
    }
  ]
}
```

**⚠️ CRITICAL:** ALWAYS check `finishReason` BEFORE trying to extract image data!
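
Both response shapes can be handled by one parser that looks at `finishReason` first and only then walks the parts. This is a sketch under assumptions: `parseGeminiResponse` is a hypothetical name, and the `NO_CANDIDATE` and `NO_IMAGE` labels are local inventions for this example, not values the API returns. It returns a tagged result instead of throwing, so the caller decides how to report each case.

```javascript
// Hypothetical parser for the two response shapes shown above.
function parseGeminiResponse(response) {
  const candidate = response?.candidates?.[0];
  if (!candidate) {
    return { ok: false, reason: "NO_CANDIDATE", message: "Response contained no candidates" };
  }

  // Check finishReason before touching any image data.
  const reason = candidate.finishReason ?? "UNKNOWN";
  if (reason !== "STOP") {
    return { ok: false, reason, message: candidate.finishMessage ?? "" };
  }

  // `content` can be an empty array on blocked responses, so guard the parts lookup.
  const parts = Array.isArray(candidate.content?.parts) ? candidate.content.parts : [];
  const imagePart = parts.find((part) => part.inlineData?.data);
  if (!imagePart) {
    return { ok: false, reason: "NO_IMAGE", message: "No inlineData part in response" };
  }

  return {
    ok: true,
    base64: imagePart.inlineData.data,
    mimeType: imagePart.inlineData.mimeType,
  };
}

// Usage with the two sample responses from this section.
const success = parseGeminiResponse({
  candidates: [{
    content: { parts: [{ inlineData: { mimeType: "image/jpeg", data: "base64_image_data" } }] },
    finishReason: "STOP",
  }],
});
const blocked = parseGeminiResponse({
  candidates: [{ content: [], finishReason: "IMAGE_RECITATION", finishMessage: "Unable to show the generated image..." }],
});
```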

---

### 3. MIME TYPE HANDLING

**⚠️ WATCHOUT #4:** MIME type mismatches break image display!

```php
// WRONG - hardcoded PNG:
$_SESSION['current_image'] = $base64;
echo '<img src="data:image/png;base64,' . $base64 . '">';

// RIGHT - store and use actual MIME type:
$_SESSION['current_image'] = $base64;
$_SESSION['current_image_mime'] = $mimeType; // e.g., "image/jpeg"
echo '<img src="data:' . $mimeType . ';base64,' . $base64 . '">';
```

**Why this matters:**
- Gemini returned `image/jpeg` in this implementation, while uploaded files can be any image type
- If you display as `image/png`, browser may fail to render
- Store BOTH base64 data AND MIME type
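
When the stored MIME type is missing or suspect, the image type can usually be recovered from the first characters of the base64 string, because common image formats start with fixed magic bytes (JPEG data encodes to a leading `/9j/`, PNG to `iVBORw0KGgo`). The sketch below is an optional safety net, not part of the original implementation; `sniffMimeType` and `toDataUri` are hypothetical names, and letting the sniffed type override the declared one is a design choice made for this example.

```javascript
// Base64 prefixes produced by the magic bytes of common image formats.
const BASE64_SIGNATURES = [
  ["/9j/", "image/jpeg"],        // FF D8 FF
  ["iVBORw0KGgo", "image/png"],  // 89 50 4E 47 0D 0A 1A 0A
  ["R0lGOD", "image/gif"],       // "GIF8"
  ["UklGR", "image/webp"],       // "RIFF" container; assumed to be WebP here
];

// Returns the detected MIME type, or null when the prefix is not recognised.
function sniffMimeType(base64) {
  const match = BASE64_SIGNATURES.find(([prefix]) => base64.startsWith(prefix));
  return match ? match[1] : null;
}

// Builds a data URI, preferring the sniffed type over the declared one.
function toDataUri(base64, declaredMime) {
  const mime = sniffMimeType(base64) ?? declaredMime ?? "application/octet-stream";
  return `data:${mime};base64,${base64}`;
}

// Usage: a PNG stored with the wrong MIME type still gets a correct data URI.
const uri = toDataUri("iVBORw0KGgoAAAANSUhEUg", "image/jpeg");
```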

---

### 4. BASE64 DATA HANDLING

**⚠️ WATCHOUT #5:** Base64 data must be CLEAN!

```javascript
// WRONG - includes data URI prefix:
const base64 = reader.result; // "data:image/jpeg;base64,/9j/4AAQ..."

// RIGHT - strip the prefix:
const base64 = reader.result.split(',')[1]; // "/9j/4AAQ..."
```

**Validation:**
```php
// Clean whitespace
$inputImage = preg_replace('/\s+/', '', $inputImage);

// Validate format
if (!preg_match('/^[A-Za-z0-9+\/]+={0,2}$/', $inputImage)) {
    throw new Exception("Invalid base64 format");
}
```
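
The prefix stripping and the validation above can be combined into one normalisation step that also reports the decoded size, which is useful because every image is held in the session as base64. This is a sketch only: `normalizeBase64` is a hypothetical name, and the length-multiple-of-4 check is an extra strictness that the PHP validation above does not perform.

```javascript
// Hypothetical helper: strips a data URI prefix and whitespace, validates the
// result, and computes the decoded byte length from the base64 length.
function normalizeBase64(input) {
  const stripped = input
    .replace(/^data:[^,]*,/, "") // optional "data:<mime>;base64," prefix
    .replace(/\s+/g, "");        // line breaks and other whitespace

  const wellFormed = /^[A-Za-z0-9+\/]+={0,2}$/.test(stripped);
  if (!wellFormed || stripped.length % 4 !== 0) {
    throw new Error("Invalid base64 format");
  }

  // Every 4 base64 characters decode to 3 bytes, minus one byte per "=" pad.
  const padding = stripped.endsWith("==") ? 2 : stripped.endsWith("=") ? 1 : 0;
  return { base64: stripped, byteLength: (stripped.length / 4) * 3 - padding };
}

// Usage: a data URI with an embedded line break becomes clean base64.
const cleaned = normalizeBase64("data:image/jpeg;base64,/9j/\n4AAQ");
```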

---

### 5. THE EDITING FLOW

**⚠️ WATCHOUT #6:** Session management is CRITICAL for editing to work!

```php
// Step 1: Generate first image
$response = $api->generateImage("cyberpunk city", "16:9", "2K", null);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];

// Step 2: Edit existing image
$previousImage = $_SESSION['current_image']; // Get from session
$response = $api->generateImage("add rain", "16:9", "2K", $previousImage);
$imageData = extractImageData($response);
$_SESSION['current_image'] = $imageData['base64']; // Update session
$_SESSION['current_image_mime'] = $imageData['mime_type']; // Keep MIME type in sync
```

**Flow:**
1. Store generated image in session
2. On edit request, retrieve from session
3. Send as `inline_data` in request
4. Store new result back to session
5. Repeat for each edit
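
The flow above is a repeated state update: the current image moves into history, the new result takes its place, and the prompt is appended to the conversation. The sketch below models that update as a pure function in JavaScript so it can be reasoned about in isolation; the PHP backend would apply the same steps to `$_SESSION`. The key names mirror the session keys this guide uses (`current_image`, `current_image_mime`, `conversation_history`, `image_history`), while `applyEditResult` and the `maxHistory` cap are assumptions added here because base64 images are large.

```javascript
// Hypothetical model of one generate/edit step applied to the session state.
function applyEditResult(session, prompt, imageData, maxHistory = 5) {
  // Push the image being replaced (if any) onto the history.
  const imageHistory = session.current_image
    ? [
        ...session.image_history,
        { base64: session.current_image, mime_type: session.current_image_mime },
      ]
    : session.image_history;

  return {
    ...session,
    current_image: imageData.base64,
    current_image_mime: imageData.mime_type,
    conversation_history: [...session.conversation_history, prompt],
    image_history: imageHistory.slice(-maxHistory), // keep only the newest entries
  };
}

// Usage: generate, then edit.
let session = {
  current_image: null,
  current_image_mime: "image/png",
  conversation_history: [],
  image_history: [],
};
session = applyEditResult(session, "cyberpunk city", { base64: "AAAA", mime_type: "image/jpeg" });
session = applyEditResult(session, "add rain", { base64: "BBBB", mime_type: "image/jpeg" });
```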

---

### 6. ERROR HANDLING - THE TRICKY PART

**⚠️ WATCHOUT #7:** Multiple error types, each needs specific handling!

```php
function extractImageData(array $response): array
{
    $candidate = $response['candidates'][0] ?? [];

    // Check finishReason FIRST
    if (isset($candidate['finishReason'])) {
        $reason = $candidate['finishReason'];

        if ($reason === 'IMAGE_RECITATION') {
            throw new Exception('Blocked by content filter. Use more creative prompts.');
        }

        if ($reason === 'SAFETY') {
            throw new Exception('Blocked by safety filters.');
        }

        // Only proceed if STOP
        if ($reason !== 'STOP') {
            throw new Exception('Generation failed: ' . $reason);
        }
    }

    // Then extract image ("content" can be empty, so default to no parts)
    foreach ($candidate['content']['parts'] ?? [] as $part) {
        if (isset($part['inlineData']['data'])) {
            return [
                'base64'    => $part['inlineData']['data'],
                'mime_type' => $part['inlineData']['mimeType'],
            ];
        }
    }

    throw new Exception('No image data in response.');
}
```

**Common Errors:**

| Error              | HTTP Code | Cause               | Solution                       |
|--------------------|-----------|---------------------|--------------------------------|
| IMAGE_RECITATION   | 200       | Prompt too generic  | Use creative, detailed prompts |
| Internal error     | 500       | API temporary issue | Retry with exponential backoff |
| RESOURCE_EXHAUSTED | 429       | Rate limit          | Wait 30s between requests      |
| INVALID_ARGUMENT   | 400       | Bad request format  | Check base64 encoding          |
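
The table above implies a retry policy: back off exponentially on 500, wait a fixed 30 seconds on 429, and never retry a 400 or a content block, since resending the same request cannot succeed. The sketch below expresses that policy in JavaScript. All names are hypothetical, and the 1-second base delay and 3-attempt limit are assumptions, as the guide does not specify them. `sendRequest` stands for any function that resolves to `{ httpCode, body }`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the next attempt, or null when the request should not be retried.
function retryDelayMs(httpCode, attempt) {
  if (httpCode === 429) return 30000; // rate limit: fixed 30s wait
  if (httpCode === 500) return Math.min(30000, 1000 * 2 ** attempt); // 1s, 2s, 4s, ...
  return null; // 200 (including IMAGE_RECITATION) and 400: retrying will not help
}

// Calls sendRequest until it returns a non-retryable result or attempts run out.
async function callWithRetry(sendRequest, maxAttempts = 3, wait = sleep) {
  for (let attempt = 0; ; attempt++) {
    const result = await sendRequest();
    const delay = retryDelayMs(result.httpCode, attempt);
    if (delay === null || attempt + 1 >= maxAttempts) {
      return result;
    }
    await wait(delay);
  }
}
```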

---

### 7. PROMPT ENGINEERING

**⚠️ WATCHOUT #8:** Simple prompts WILL fail!

```javascript
// ❌ WILL FAIL (IMAGE_RECITATION):
"a red circle"
"a blue square"
"a tree"
"a car"

// ✅ WILL WORK:
"a vintage red sports car racing through a neon-lit cyberpunk city at night"
"a magical forest with glowing blue mushrooms and fireflies at twilight"
"a futuristic cityscape with flying vehicles and holographic billboards"
```

**Rules:**
- Aim for 10+ words
- Include adjectives (vintage, glowing, futuristic)
- Add context (at night, in rain, during sunset)
- Avoid single objects
- Be creative and specific
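
Of the rules above, only the word count can be checked mechanically, but even that catches prompts like "a red circle" before a request is spent on them. The following is a rough pre-flight heuristic, not a predictor of what the content filter will do; `checkPrompt` and its return shape are hypothetical.

```javascript
// Hypothetical pre-flight check based on the word-count rule above.
function checkPrompt(prompt, minWords = 10) {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  const problems = [];
  if (words.length < minWords) {
    problems.push(`only ${words.length} words; aim for ${minWords}+`);
  }
  return { ok: problems.length === 0, wordCount: words.length, problems };
}

// Usage with one failing and one working prompt from this section.
const tooShort = checkPrompt("a red circle");
const detailed = checkPrompt("a vintage red sports car racing through a neon-lit cyberpunk city at night");
```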

---

### 8. FILE UPLOAD HANDLING

**⚠️ WATCHOUT #9:** In this implementation, file conversion is done client-side!

```javascript
// Convert file to base64 (client-side)
function fileToBase64(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            // CRITICAL: Remove data URI prefix!
            const base64 = reader.result.split(',')[1];
            resolve(base64);
        };
        reader.onerror = reject;
        reader.readAsDataURL(file);
    });
}

// Usage (inside an async function, e.g. the form's submit handler)
const file = uploadInput.files[0];
const base64 = await fileToBase64(file);
const formData = new FormData();
formData.append('uploadedImage', base64);
formData.append('uploadedImageType', file.type);
```

**Backend handling:**
```php
$uploadedImage     = $_POST['uploadedImage'] ?? null;
$uploadedImageType = $_POST['uploadedImageType'] ?? null;

if ($uploadedImage) {
    // Store uploaded image
    $_SESSION['current_image'] = $uploadedImage;
    $_SESSION['current_image_mime'] = $uploadedImageType;

    // If prompt provided, apply it
    if ($prompt) {
        $response = $api->generateImage($prompt, $aspectRatio, $imageSize, $uploadedImage);
        // Update with edited version
        $imageData = extractImageData($response);
        $_SESSION['current_image'] = $imageData['base64'];
        $_SESSION['current_image_mime'] = $imageData['mime_type'];
    }
}
```

---

### 9. SESSION MANAGEMENT

**⚠️ WATCHOUT #10:** Session structure is critical!

```php
// Initialize (MUST be done before any output)
session_start();

// Required session variables (??= sets a default only when the key is
// missing, so existing state survives across requests; needs PHP 7.4+)
$_SESSION['current_image'] ??= null;             // Base64 string
$_SESSION['current_image_mime'] ??= 'image/png'; // MIME type
$_SESSION['conversation_history'] ??= [];        // Array of prompts
$_SESSION['image_history'] ??= [];               // Array of previous images

// Reset (clear everything)
$_SESSION['conversation_history'] = [];
$_SESSION['current_image'] = null;
$_SESSION['current_image_mime'] = 'image/png';
$_SESSION['image_history'] = [];
```

---

### 10. API CONFIGURATION

**⚠️ WATCHOUT #11:** Endpoint and model name are specific!

```php
// CORRECT endpoint:
$url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

// Header format (pass via CURLOPT_HTTPHEADER):
$headers = ['x-goog-api-key: YOUR_API_KEY']; // NOT 'Authorization: Bearer'

// Timeout (pass via curl_setopt_array):
$curlOptions = [CURLOPT_TIMEOUT => 120]; // 2 minutes - image generation is SLOW
```

**Model name:** `gemini-3-pro-image-preview`
- May change in future
- Check Google's docs if errors persist
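
For a quick smoke test of the endpoint outside the PHP backend, the same three settings (URL, `x-goog-api-key` header, 120-second timeout) can be expressed with `fetch` in Node 18 or later. This is a sketch under assumptions: `buildFetchOptions` and `callGemini` are hypothetical names, and the code assumes the API answers with a JSON body even on errors.

```javascript
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent";

// Builds the fetch options: JSON body, API key header, and a hard timeout.
function buildFetchOptions(payload, apiKey, timeoutMs = 120000) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey, // not "Authorization: Bearer"
    },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(timeoutMs), // aborts the request after timeoutMs
  };
}

// Sends the payload and returns the HTTP status alongside the decoded body.
async function callGemini(payload, apiKey) {
  const res = await fetch(GEMINI_URL, buildFetchOptions(payload, apiKey));
  return { httpCode: res.status, body: await res.json() };
}
```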

---

## 🐛 DEBUGGING CHECKLIST

When things don't work, check IN THIS ORDER:

### 1. Is the request format correct?
```php
error_log("Request payload: " . json_encode($payload));
```

### 2. Is the response structure what you expect?
```php
error_log("Response structure: " . json_encode($response));
```

### 3. Check finishReason:
```php
$reason = $response['candidates'][0]['finishReason'] ?? 'UNKNOWN';
error_log("Finish reason: " . $reason);
```

### 4. Verify base64 data:
```php
error_log("Base64 length: " . strlen($base64));
error_log("First 50 chars: " . substr($base64, 0, 50));
```

### 5. Check MIME type matching:
```php
error_log("Stored MIME: " . ($_SESSION['current_image_mime'] ?? 'n/a'));
error_log("Response MIME: " . ($response['candidates'][0]['content']['parts'][0]['inlineData']['mimeType'] ?? 'n/a'));
```

---

## 🎓 COMMON MISTAKES TO AVOID

### Mistake #1: Wrong request structure
```json
// ❌ WRONG - text before image:
{"parts": [{"text": "..."}, {"inline_data": {...}}]}

// ✅ RIGHT - image before text:
{"parts": [{"inline_data": {...}}, {"text": "..."}]}
```

### Mistake #2: Not checking finishReason
```php
// ❌ WRONG - directly accessing parts:
$image = $response['candidates'][0]['content']['parts'][0]['inlineData']['data'];

// ✅ RIGHT - check finishReason first:
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle blocked content (e.g. IMAGE_RECITATION) before touching parts
}
```

### Mistake #3: Hardcoded MIME types
```php
// ❌ WRONG:
echo '<img src="data:image/png;base64,...">'; // Assumes PNG

// ✅ RIGHT:
echo '<img src="data:' . $mimeType . ';base64,...">'; // Uses actual type
```

### Mistake #4: Not cleaning base64
```javascript
// ❌ WRONG:
const base64 = reader.result; // Includes "data:image/png;base64,"

// ✅ RIGHT:
const base64 = reader.result.split(',')[1]; // Only base64 part
```

### Mistake #5: Missing error handling
```php
// ❌ WRONG:
$response = curl_exec($ch);
return json_decode($response);

// ✅ RIGHT:
$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($response === false || $httpCode !== 200) {
    // Handle errors
}
return json_decode($response, true); // Associative array, as used throughout this guide
```

---

## 📊 DATA FLOW DIAGRAM

```
┌─────────────────┐
│   User Action   │
│  (Prompt/Upload)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   JavaScript    │
│ - Validate      │
│ - Convert file  │
│ - Build FormData│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    api.php      │
│ - Get session   │
│ - Build request │
│ - Call Gemini   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Gemini API    │
│ - Process       │
│ - Check filters │
│ - Generate      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Response│
│ - Check finish  │
│ - Get base64    │
│ - Get MIME type │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Store Session  │
│ - current_image │
│ - image_mime    │
│ - history       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Return to JS   │
│ - Success flag  │
│ - Reload page   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Display Image  │
│ - Data URI      │
│ - Correct MIME  │
└─────────────────┘
```

---

## 🔧 TESTING STRATEGY

### Test 1: Basic Generation
```
Prompt: "A futuristic motorcycle in a neon-lit city"
Expected: Image generated successfully
```

### Test 2: Simple Edit
```
1. Generate: "a vintage red sports car racing through a neon-lit cyberpunk city at night"
2. Edit: "add rain and reflections"
Expected: Same scene, now with rain and reflections
```

### Test 3: Upload
```
1. Upload: photo.jpg
2. No prompt
Expected: Photo stored, ready for editing
```

### Test 4: Upload + Edit
```
1. Upload: landscape.jpg
2. Prompt: "make it look like a watercolor painting"
Expected: Transformed image
```

### Test 5: Error Handling
```
Prompt: "a blue square"
Expected: IMAGE_RECITATION error with helpful message
```

---

## 🚨 CRITICAL SUCCESS FACTORS

**You MUST get these right or the system will NOT work:**

1. ✅ **Request format** - Image before text, correct structure
2. ✅ **Response parsing** - Check finishReason first
3. ✅ **MIME type handling** - Store and use dynamically
4. ✅ **Base64 cleaning** - No whitespace, no prefixes
5. ✅ **Session management** - Store both data and MIME type
6. ✅ **Error handling** - Different errors need different responses
7. ✅ **Prompt quality** - Detailed, creative prompts only
8. ✅ **File upload** - Client-side base64 conversion
9. ✅ **API timeout** - 120 seconds minimum
10. ✅ **Retry logic** - For temporary API failures

---

## 📝 QUICK REFERENCE

### Essential Code Patterns

**Check finishReason:**
```php
$reason = $response['candidates'][0]['finishReason'] ?? null;
if ($reason !== 'STOP') {
    // Handle error
}
```

**Extract image:**
```php
foreach ($response['candidates'][0]['content']['parts'] as $part) {
    if (isset($part['inlineData']['data'])) {
        return [
            'base64' => $part['inlineData']['data'],
            'mime_type' => $part['inlineData']['mimeType']
        ];
    }
}
```

**Store in session:**
```php
$_SESSION['current_image'] = $imageData['base64'];
$_SESSION['current_image_mime'] = $imageData['mime_type'];
```

**Display image:**
```php
<img src="data:<?php echo $_SESSION['current_image_mime']; ?>;base64,<?php echo $_SESSION['current_image']; ?>">
```

---

## 🎯 IMPLEMENTATION CHECKLIST

Before considering the implementation complete:

- [ ] Image generation works with detailed prompts
- [ ] Image editing works (sends previous image)
- [ ] IMAGE_RECITATION errors handled gracefully
- [ ] MIME type stored and used correctly
- [ ] File upload converts to base64 properly
- [ ] Session persists across requests
- [ ] Error messages are helpful
- [ ] Debug panel shows request/response
- [ ] Simple prompts show helpful error
- [ ] Retry logic works for 500 errors
- [ ] Rate limiting handled
- [ ] Base64 data validated
- [ ] Conversation history tracked
- [ ] Reset clears session properly

---

## 💡 TIPS FOR AI ASSISTANTS

When helping users implement this:

1. **Show the request JSON first** - Most problems are here
2. **Emphasize finishReason checking** - Critical for error handling
3. **Explain MIME type importance** - Common source of display issues
4. **Warn about simple prompts** - Will trigger IMAGE_RECITATION
5. **Test with detailed prompts** - "red circle" will fail
6. **Check session management** - Editing requires proper storage
7. **Validate base64 format** - Clean data is essential
8. **Add debug logging** - Makes troubleshooting easier
9. **Handle all error types** - Different errors need different solutions
10. **Test the full flow** - Generate → Edit → Edit

---

## 📚 REFERENCES

- **API Endpoint:** `https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent`
- **Model:** `gemini-3-pro-image-preview`
- **Auth Header:** `x-goog-api-key: YOUR_KEY`
- **Response Format:** JSON with base64 in `inlineData`
- **Request Format:** JSON with `inline_data` for editing

---

## ⚡ FINAL NOTES

This implementation is **working and stable** when these rules are followed:

1. Use creative, detailed prompts (10+ words)
2. Check `finishReason` before extracting image
3. Store and use correct MIME types
4. Clean base64 data (no whitespace/prefixes)
5. Manage session properly for editing
6. Handle all error types specifically
7. Implement retry logic for temporary failures
8. Validate uploaded files before processing

**The system works reliably when these patterns are followed exactly.**

---

*Generated from working implementation - December 2024*