# Master Adapt Detect
An AI-powered image detection system that identifies which master images appear in multi-panel layout images. It offers multiple detection strategies, panel splitting, and cost optimization features.
## Overview
This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:
1. **Hybrid Mode** (Recommended) - Combines OpenAI o3 for panel analysis with local computer vision
2. **OpenAI Mode** - Full AI-powered detection using OpenAI o3-mini
3. **Vector Mode** - Google Vertex AI multimodal embeddings for similarity search
4. **Gemini Mode** - Google Gemini 2.5 Pro for visual analysis
## Key Features
### Detection Capabilities
- **Multi-strategy detection** - Choose from 4 different detection engines
- **Panel counting** - Automatic detection of number of panels in layouts
- **Censorship detection** - Identifies censored vs uncensored content with CEN refinement
- **Smart matching** - Handles cropped, scaled, rotated, and transformed images
- **Confidence scoring** - Provides match confidence based on panel count and detected matches
### Hybrid Mode (Primary Feature)
- **Cost optimization** - 97.6% reduction in API costs vs one-at-a-time detection
- **Intelligent routing** - Uses local analysis for simple layouts (≤2 panels), split method for complex
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, simple division
- **Local inlier analysis** - OpenCV AKAZE features with multiprocessing for fast matching
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
- **Fallback support** - Automatic fallback to OpenAI one-at-a-time when needed
### Processing Options
- **Parallel processing** - Concurrent layout processing with serial inlier analysis coordination
- **Memory management** - Dynamic worker adjustment based on system resources
- **Cost tracking** - Comprehensive OpenAI API usage and cost monitoring
- **Batch processing** - Process hundreds of layouts efficiently
- **Progress tracking** - Real-time progress updates with ETA
## Installation
### Prerequisites
- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode)
- OpenAI API key (for OpenAI/Hybrid modes)
- Google AI API key (for Gemini mode)
### Setup
```bash
# Clone the repository
git clone <repository-url>
cd master_adapt_detect
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# GOOGLE_API_KEY=your_google_ai_key
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```
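
Before the first run, it can help to confirm that the keys are visible to Python. The sketch below is not part of the repository: the helper name `missing_keys` is hypothetical, and the mapping of modes to variables is an assumption based on the Prerequisites list. It reads the process environment only, so export the variables (or load `.env`) first.

```python
import os

# Variables each mode is assumed to need, based on the Prerequisites list.
# Adjust if the project reads different names.
REQUIRED_KEYS = {
    "hybrid": ["OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vector": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "gemini": ["GOOGLE_API_KEY"],
}


def missing_keys(mode, env=None):
    """Return the variables required for `mode` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED_KEYS:
        missing = missing_keys(mode)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{mode}: {status}")
```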
## Usage
### Command Line Interface
The main entry point is `cli.py`, which provides a CLI for all detection modes.
```bash
# Basic usage - hybrid mode with test
python cli.py --test --hybrid
# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid
# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts
# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time
# Vector mode with similarity search
python cli.py --all --vector
# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
```
### Detection Modes
#### Hybrid Mode (Recommended)
Best balance of speed, cost, and accuracy.
```bash
# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2
# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple
# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced
# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode
# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time
```
#### OpenAI Mode
Full AI-powered detection with optional refinement.
```bash
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai
# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time
# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
```
#### Vector Mode
Semantic similarity using embeddings.
```bash
# Process with vector embeddings
python cli.py --all --vector
# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
```
#### Gemini Mode
Google Gemini 2.5 Pro detection.
```bash
# Standard Gemini detection
python cli.py --limit 10 --gemini
```
### Key Options
**Detection Mode:**
- `--hybrid` - Hybrid detection mode (default)
- `--openai` - OpenAI detection mode
- `--vector` - Vector similarity mode
- `--gemini` - Gemini detection mode
**Processing:**
- `--test` - Test with 1 layout
- `--limit N` - Process first N layouts
- `--all` - Process all layouts
- `--specific-file FILE` - Process specific file
**Hybrid Options:**
- `--panel-threshold N` - Panel threshold for routing (default: 2)
- `--split-simple` - Use simple even division splitting
- `--split-advanced` - Use advanced edge detection splitting
- `--vector-mode` - Use vector similarity instead of inlier analysis
- `--fallback-one-at-a-time` - Enable OpenAI fallback
- `--parallel-layouts` - Enable parallel layout processing
- `--no-truncation` - Disable match truncation to panel count
**Cost Tracking:**
- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
- `--cost-report` - Generate detailed cost report
- `--cost-estimate N` - Estimate monthly cost for N layouts
**Worker Configuration:**
- `--openai-workers N` - OpenAI worker count (default: auto)
- `--local-workers N` - Local analysis workers (default: auto)
- `--layout-workers N` - Parallel layout workers (default: auto)
**Other:**
- `--output NAME` - Custom output filename
- `--help` - Show all options
## Architecture
### Core Components
#### Detection Engines
1. **HybridImageDetector** (`hybrid_detector.py`)
   - Main hybrid detection implementation
   - Routes layouts based on panel count
   - Integrates OpenAI, local analysis, and splitting
   - Handles parallel processing coordination
2. **OpenAIImageDetector** (`openai_detector.py`)
   - OpenAI o3-mini integration
   - Panel counting and censorship detection
   - One-at-a-time and batch detection modes
   - CEN refinement for censored content
3. **VectorDetector** (`vector_detector.py`)
   - Google Vertex AI multimodal embeddings
   - Cosine similarity matching
   - Embedding caching for performance
4. **GeminiDetector** (`gemini_detector.py`)
   - Google Gemini 2.5 Pro integration
   - Visual reasoning and analysis
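
The cosine similarity matching listed for `VectorDetector` can be illustrated in a few lines. This is a minimal sketch and not the project's code: the function names are hypothetical, embeddings are plain Python lists, and the default threshold of 0.8 is borrowed from the `--similarity-threshold 0.8` example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_matches(layout_vec, master_vecs, threshold=0.8):
    """Return (master_id, score) pairs at or above `threshold`, best first."""
    scored = [
        (master_id, cosine_similarity(layout_vec, vec))
        for master_id, vec in master_vecs.items()
    ]
    hits = [(master_id, score) for master_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold trades recall for precision: fewer masters pass, but those that do are closer to the layout embedding.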
#### Panel Splitting
1. **PanelSplitter** (`panel_splitter.py`)
   - Multi-method panel splitting
   - Optimized Canny edge detection
   - Hough line transform for separators
   - Tuned for 14-panel detection
2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
   - Edge detection and gutter analysis
   - Sobel gradient detection
   - Configurable percentile thresholds
3. **SimplePanelSplitter** (`simple_splitter.py`)
   - Simple even division
   - Fast horizontal splitting
   - Grid layout support
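
Even division is the simplest of the three strategies. The sketch below shows one way to compute the crop boxes; it is an illustration, not the code in `simple_splitter.py`. The function name is hypothetical, and it assumes that "horizontal splitting" means horizontal cuts that produce vertically stacked panels. Boxes use the `(left, top, right, bottom)` convention that Pillow's `Image.crop` accepts.

```python
def even_split_boxes(width, height, rows, cols=1):
    """Return (left, top, right, bottom) crop boxes for a rows x cols grid.

    Boundaries use integer division so the boxes tile the image exactly,
    even when the size is not divisible by the grid.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    boxes = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes
```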
#### Supporting Systems
1. **Cost Calculator** (`cost_calculator.py`)
   - Tracks OpenAI API usage
   - Per-layout and session cost tracking
   - Monthly cost estimation
   - Detailed JSON reports
2. **Memory Manager** (`memory_manager.py`)
   - Prevents memory exhaustion
   - Dynamic worker adjustment
   - System resource monitoring
3. **Logging Config** (`logging_config.py`)
   - Dual output (terminal + file)
   - Crash tracking
   - System diagnostics
4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
   - Serial execution of inlier analysis
   - Task queue management
   - Prevents system overload
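
The idea behind serial coordination is that many layout workers may request inlier analysis at once, but only one analysis runs at a time. A minimal sketch of that pattern, assuming a single-worker executor as the task queue (the class name `SerialCoordinator` is hypothetical and the real `InlierAnalysisCoordinator` may be built differently):

```python
from concurrent.futures import ThreadPoolExecutor


class SerialCoordinator:
    """Run submitted tasks one at a time, in submission order."""

    def __init__(self):
        # One worker thread means queued tasks can never overlap.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def run(self, func, *args, **kwargs):
        """Queue a task and block the caller until it has finished."""
        return self._executor.submit(func, *args, **kwargs).result()

    def shutdown(self):
        """Wait for queued tasks to finish, then stop the worker."""
        self._executor.shutdown(wait=True)
```

Callers on different threads can share one coordinator; each `run` call waits its turn, which keeps a CPU-heavy step from being multiplied by the number of parallel layouts.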
### Workflow
#### Hybrid Mode Workflow
1. **OpenAI Analysis** (1 API call)
   - Count panels in layout
   - Detect censorship status
   - Consolidated analysis
2. **Detection Routing**
   - `≤ panel_threshold`: Direct local/vector analysis
   - `> panel_threshold`: Split + local/vector analysis
3. **Local Analysis** (no API calls)
   - OpenCV AKAZE feature detection
   - Multiprocessing for speed
   - RANSAC homography estimation
   - Inlier-based confidence scoring
4. **Post-Processing**
   - CEN refinement (if enabled)
   - Deduplication
   - Truncation to panel count
   - Confidence scoring
5. **Optional Fallback** (if enabled)
   - Triggers when matches < panels
   - OpenAI one-at-a-time detection
   - Additional API calls only when needed
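
The routing, post-processing, and fallback rules above can be sketched as three small functions. This is an illustration under stated assumptions, not the logic in `hybrid_detector.py`: all function names are hypothetical, matches are `(master_id, score)` pairs where a higher score is better, and deduplication keeps the best score per master.

```python
def choose_route(panel_count, panel_threshold=2):
    """Pick the analysis path from the panel count reported in step 1."""
    return "direct" if panel_count <= panel_threshold else "split"


def post_process(matches, panel_count, truncate=True):
    """Deduplicate by master ID (keeping the best score), rank, and truncate."""
    best = {}
    for master_id, score in matches:
        if master_id not in best or score > best[master_id]:
            best[master_id] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:panel_count] if truncate else ranked


def needs_fallback(matches, panel_count, fallback_enabled):
    """Fallback triggers only when enabled and matches < panels."""
    return fallback_enabled and len(matches) < panel_count
```

Passing `truncate=False` mirrors the effect described for `--no-truncation`: every deduplicated match is kept, even if there are more matches than panels.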
## Directory Structure
```
master_adapt_detect/
├── cli.py # Main command-line interface
├── hybrid_detector.py # Hybrid detection engine
├── openai_detector.py # OpenAI detection engine
├── vector_detector.py # Vector similarity engine
├── gemini_detector.py # Gemini detection engine
├── panel_splitter.py # Traditional panel splitter
├── advanced_splitter.py # Advanced edge detection splitter
├── simple_splitter.py # Simple even division splitter
├── cost_calculator.py # Cost tracking system
├── memory_manager.py # Memory management
├── logging_config.py # Logging configuration
├── requirements.txt # Python dependencies
├── .env # API keys (not in git)
├── master_images/ # Master images to detect (41 images)
├── layouts/ # Layout images to process (299+ images)
├── results/ # JSON output files
└── embeddings_cache/ # Cached vector embeddings
```
## Output Format
Results are saved as JSON files with detailed metadata.
### Example Output
```json
{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
```
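
A results file in this shape is straightforward to post-process. The sketch below relies only on the keys shown in the example output; the function names are hypothetical. It counts layouts per detection method and lists layouts where fewer masters were detected than panels counted.

```python
import json
from collections import Counter


def load_results(path):
    """Read a results JSON file produced by a run."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(results_doc):
    """Return (layouts per detection method, IDs of under-matched layouts)."""
    methods = Counter()
    under_matched = []
    for layout_id, entry in results_doc["results"].items():
        methods[entry.get("detection_method", "unknown")] += 1
        detected = len(entry.get("detected_master_ids", []))
        if detected < entry.get("panel_count", 0):
            under_matched.append(layout_id)
    return dict(methods), under_matched
```

Under-matched layouts are natural candidates for a second pass with `--fallback-one-at-a-time` or a different detection mode.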
## Cost Tracking
Cost tracking monitors OpenAI API usage and provides detailed reports.
### Enable Cost Tracking
```bash
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking
# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```
### Cost Report Output
- **Session summary** - Total cost, tokens, API calls
- **Per-layout breakdown** - Cost for each layout
- **Operation analysis** - Cost by operation type
- **Monthly estimates** - Projected monthly/annual costs
- **JSON reports** - Detailed cost data in `results/`
See `COST_TRACKING_README.md` for complete documentation.
## Performance
### Hybrid Mode Benefits
- **97.6% cost reduction** vs OpenAI one-at-a-time mode
- **1 API call per layout** for panel analysis
- **Zero API calls** for matching (local analysis)
- **Parallel processing** for throughput
- **Memory-safe** with dynamic adjustment
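
This README does not say how the 97.6% figure was measured. One reading that happens to reproduce it assumes cost is proportional to the number of API calls and that one-at-a-time mode makes one call per master image (41 in `master_images/`), against a single panel-analysis call in hybrid mode. Treat the sketch below as an illustration of that arithmetic, not as the authors' derivation.

```python
def call_reduction_percent(calls_before, calls_after):
    """Percentage reduction in API calls, assuming equal cost per call."""
    if calls_before <= 0:
        raise ValueError("calls_before must be positive")
    return 100.0 * (1.0 - calls_after / calls_before)


# 41 calls per layout (one per master) versus 1 call for panel analysis.
print(round(call_reduction_percent(41, 1), 1))  # 97.6
```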
### Benchmarks
- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
- **Memory usage**: Dynamic adjustment prevents exhaustion
## Advanced Features
### Parallel Layout Processing
Process multiple layouts concurrently with coordinated inlier analysis.
```bash
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
```
### CEN Refinement
Automatically switch between censored (CEN) and uncensored versions.
```bash
python cli.py --all --hybrid --cen-refinement
```
### Custom Splitting Parameters
Fine-tune panel splitting behavior.
```bash
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10
# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
```
### Image Preprocessing
Enhance detection accuracy with preprocessing.
```bash
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale
# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
```
## Troubleshooting
### Common Issues
**"Cost tracking is disabled"**
- Add `--enable-cost-tracking` flag to enable cost monitoring
**"Memory usage too high"**
- System will auto-adjust workers
- Reduce `--local-workers` or `--layout-workers` manually
**"Too many open files"**
- Reduce concurrent workers
- System will auto-recover and limit workers
**"No matches found"**
- Try different detection modes
- Adjust inlier thresholds
- Enable fallback mode
### Memory Management
The system includes automatic memory management:
- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Prevents system crashes
- Logs resource usage
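
Dynamic worker adjustment can be as simple as a threshold rule. The sketch below is hypothetical: the function name, the 85%/60% thresholds, and the halve-or-increment policy are assumptions for illustration, not the behavior of `memory_manager.py`. In practice the `memory_percent` reading could come from `psutil.virtual_memory().percent`.

```python
def adjust_workers(current, memory_percent, high=85.0, low=60.0, minimum=1, maximum=8):
    """Halve workers under memory pressure, add one when there is headroom."""
    if memory_percent >= high:
        return max(minimum, current // 2)  # back off quickly
    if memory_percent <= low:
        return min(maximum, current + 1)  # recover slowly
    return current  # in the comfortable band: leave as-is
```

Backing off faster than recovering is a common choice because running out of memory is far more costly than running briefly with too few workers.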
### Logging
All processing is logged to both terminal and file:
- Log files: `master_adapt_detect_TIMESTAMP.log`
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time
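
Dual output with crash tracking can be reproduced with the standard `logging` module. This is a minimal sketch rather than the contents of `logging_config.py`: the function name, the logger name, and the `%Y%m%d_%H%M%S` timestamp format are assumptions.

```python
import logging
import os
import sys
from datetime import datetime


def setup_logging(log_dir="."):
    """Log to the terminal and to a timestamped file; return (logger, path)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # format is an assumption
    path = os.path.join(log_dir, f"master_adapt_detect_{stamp}.log")

    logger = logging.getLogger("master_adapt_detect")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    def log_crash(exc_type, exc_value, exc_tb):
        # Record uncaught exceptions with a full traceback in the log,
        # then fall through to Python's default handler.
        logger.critical("Unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = log_crash
    return logger, path
```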
## Development
### Running Tests
```bash
# Test hybrid mode
python test_hybrid.py
# Test cost tracking
python test_cost_calculator.py
# Test panel splitting
python test_split_mode.py
```
### Adding New Detection Modes
1. Create a new detector class that inherits from the base class
2. Implement the required methods:
   - `detect_images_in_layout()`
   - `process_all_layouts()`
3. Add CLI integration in `cli.py`
4. Update documentation
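
The steps above can be sketched as follows. The base class name `BaseDetector`, the method signatures, and the toy `FilenameDetector` are all hypothetical; check the existing detectors for the real base class and argument lists. Here `process_all_layouts()` gets a default implementation for brevity, whereas the steps above ask each detector to provide its own.

```python
from abc import ABC, abstractmethod


class BaseDetector(ABC):
    """Hypothetical base class; the real one may differ in name and signature."""

    @abstractmethod
    def detect_images_in_layout(self, layout_path):
        """Return the master IDs detected in one layout image."""

    def process_all_layouts(self, layout_paths):
        """Default batch loop: map each layout path to its detections."""
        return {path: self.detect_images_in_layout(path) for path in layout_paths}


class FilenameDetector(BaseDetector):
    """Toy detector for illustration: 'detects' masters named in the filename."""

    def __init__(self, master_ids):
        self.master_ids = list(master_ids)

    def detect_images_in_layout(self, layout_path):
        return [mid for mid in self.master_ids if mid in layout_path]
```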
## OpenAI Pricing (2025)
- **Input tokens**: $2.00 per million tokens
- **Cached input**: $0.50 per million tokens
- **Output tokens**: $8.00 per million tokens
Hybrid mode achieves significant cost savings by minimizing API calls.
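
To see what these rates mean per call, the arithmetic can be written out. The helper below is a sketch, not the project's `cost_calculator.py`; the function name is hypothetical, and `input_tokens` is assumed to count only uncached input.

```python
# Dollar rates per million tokens, taken from the list above.
PRICES_PER_MILLION = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def call_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Dollar cost of one API call at the per-million-token rates above.

    `input_tokens` counts only uncached input; cached tokens are billed
    separately at the lower rate.
    """
    return (
        input_tokens * PRICES_PER_MILLION["input"]
        + cached_input_tokens * PRICES_PER_MILLION["cached_input"]
        + output_tokens * PRICES_PER_MILLION["output"]
    ) / 1_000_000
```

For example, a call with 1,500 input tokens and 300 output tokens (made-up numbers) costs (1,500 × 2.00 + 300 × 8.00) / 1,000,000 = $0.0054.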
## License
[License information]
## Credits
Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.