# Master Adapt Detect

An AI-powered image detection system that identifies which master images appear in multi-panel layout images. It combines multiple detection strategies with panel splitting and API cost optimization.

## Overview

This application detects which master images appear in layout images, such as marketing materials, comic and manga pages, and other multi-panel graphics. It supports four detection modes:

1. **Hybrid Mode** (Recommended) - Combines OpenAI O3 panel analysis with local computer vision
2. **OpenAI Mode** - Fully AI-powered detection using OpenAI O3 mini
3. **Vector Mode** - Similarity search over Google Vertex AI multimodal embeddings
4. **Gemini Mode** - Visual analysis with Google Gemini 2.5 Pro

## Key Features

### Detection Capabilities

- **Multi-strategy detection** - Choose from four detection engines
- **Panel counting** - Automatically counts the panels in each layout
- **Censorship detection** - Distinguishes censored from uncensored content, with optional CEN refinement
- **Smart matching** - Handles cropped, scaled, rotated, and otherwise transformed images
- **Confidence scoring** - Scores each result from the panel count and the number of detected matches

### Hybrid Mode (Primary Feature)

- **Cost optimization** - 97.6% lower API costs than one-at-a-time detection
- **Intelligent routing** - Uses direct local analysis for simple layouts (≤2 panels by default) and panel splitting for more complex ones
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, and simple even division
- **Local inlier analysis** - OpenCV AKAZE feature matching, parallelized with multiprocessing
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
- **Fallback support** - Optional automatic fallback to OpenAI one-at-a-time detection when fewer matches than panels are found

### Processing Options

- **Parallel processing** - Processes layouts concurrently while coordinating inlier analysis serially
- **Memory management** - Adjusts worker counts dynamically based on system resources
- **Cost tracking** - Monitors OpenAI API usage and cost
- **Batch processing** - Processes hundreds of layouts in one run
- **Progress tracking** - Real-time progress updates with ETA

## Installation

### Prerequisites

- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode and Hybrid `--vector-mode`)
- OpenAI API key (for OpenAI and Hybrid modes)
- Google AI API key (for Gemini mode)

### Setup

```bash
# Clone the repository
git clone <repository-url>
cd master_adapt_detect

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# GOOGLE_API_KEY=your_google_ai_key
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```

## Usage

### Command Line Interface

The main entry point is `cli.py`, which provides a single CLI for all detection modes.

```bash
# Quick test: hybrid mode on a single layout
python cli.py --test --hybrid

# Process the first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid

# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts

# OpenAI mode, one API call per master
python cli.py --limit 10 --openai --one-at-a-time

# Vector mode with similarity search
python cli.py --all --vector

# Enable cost tracking and print a detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
```

### Detection Modes

#### Hybrid Mode (Recommended)

Best balance of speed, cost, and accuracy.

```bash
# Layouts with ≤2 panels go straight to local analysis (2 is the default threshold)
python cli.py --all --hybrid --panel-threshold 2

# Simple even-division splitting for complex layouts
python cli.py --all --hybrid --split-simple

# Advanced edge-detection splitting
python cli.py --all --hybrid --split-advanced

# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode

# Fall back to OpenAI one-at-a-time detection when needed
python cli.py --all --hybrid --fallback-one-at-a-time
```

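The routing and fallback rules behind these options can be sketched as two small pure functions. This is a minimal illustration, not the code in `hybrid_detector.py`. The names `route_layout`, `needs_fallback`, and `Route` are hypothetical. Only two rules come from this README: at most `panel_threshold` panels means direct analysis, and the fallback triggers when there are fewer matches than panels.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical routing outcomes for one layout."""

    DIRECT = "direct_analysis"  # compare the whole layout against the masters
    SPLIT = "split_then_analyze"  # split into panels first, then compare each panel


def route_layout(panel_count: int, panel_threshold: int = 2) -> Route:
    """Layouts with at most `panel_threshold` panels skip splitting."""
    if panel_count < 0:
        raise ValueError("panel_count cannot be negative")
    return Route.DIRECT if panel_count <= panel_threshold else Route.SPLIT


def needs_fallback(match_count: int, panel_count: int, fallback_enabled: bool) -> bool:
    """Run one-at-a-time OpenAI detection only if enabled and some panels
    were left without a matching master."""
    return fallback_enabled and match_count < panel_count
```

With the default threshold of 2, a four-panel layout takes the split path. If only three masters are then matched and `--fallback-one-at-a-time` is set, the fallback would run.
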
#### OpenAI Mode

Fully AI-powered detection with optional refinement.

```bash
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai

# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time

# CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
```

#### Vector Mode

Semantic similarity using embeddings.

```bash
# Process with vector embeddings
python cli.py --all --vector

# Adjust the similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
```

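Vector mode ranks masters by the cosine similarity between embeddings. The stdlib sketch below shows that comparison. It assumes the embeddings have already been fetched (for example from the Vertex AI multimodal embedding model) as plain lists of floats. `rank_masters` and its return shape are hypothetical, and the real `VectorDetector` may normalize, cache, or threshold differently.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_masters(
    layout_embedding: List[float],
    master_embeddings: Dict[str, List[float]],
    similarity_threshold: float = 0.8,  # hypothetical default, mirrors the command above
) -> List[Tuple[str, float]]:
    """Return (master_id, score) pairs at or above the threshold, best first."""
    scored = [
        (master_id, cosine_similarity(layout_embedding, emb))
        for master_id, emb in master_embeddings.items()
    ]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold trades recall for precision: fewer masters pass, but those that do are closer matches.
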
#### Gemini Mode

Detection with Google Gemini 2.5 Pro.

```bash
# Standard Gemini detection
python cli.py --limit 10 --gemini
```

### Key Options

**Detection Mode:**

- `--hybrid` - Hybrid detection mode (default)
- `--openai` - OpenAI detection mode
- `--vector` - Vector similarity mode
- `--gemini` - Gemini detection mode

**Processing:**

- `--test` - Run on a single layout as a quick test
- `--limit N` - Process the first N layouts
- `--all` - Process all layouts
- `--specific-file FILE` - Process one specific layout file

**Hybrid Options:**

- `--panel-threshold N` - Panel count at or below which layouts skip splitting (default: 2)
- `--split-simple` - Use simple even-division splitting
- `--split-advanced` - Use advanced edge-detection splitting
- `--vector-mode` - Use vector similarity instead of inlier analysis
- `--fallback-one-at-a-time` - Enable the OpenAI one-at-a-time fallback
- `--parallel-layouts` - Enable parallel layout processing
- `--no-truncation` - Do not truncate matches to the panel count

**Cost Tracking:**

- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
- `--cost-report` - Generate a detailed cost report
- `--cost-estimate N` - Estimate monthly cost for N layouts

**Worker Configuration:**

- `--openai-workers N` - Number of OpenAI workers (default: auto)
- `--local-workers N` - Number of local analysis workers (default: auto)
- `--layout-workers N` - Number of parallel layout workers (default: auto)

**Other:**

- `--output NAME` - Custom output filename
- `--help` - Show all options

## Architecture

### Core Components

#### Detection Engines

1. **HybridImageDetector** (`hybrid_detector.py`)
   - Main hybrid detection implementation
   - Routes layouts based on panel count
   - Integrates OpenAI, local analysis, and splitting
   - Coordinates parallel processing

2. **OpenAIImageDetector** (`openai_detector.py`)
   - OpenAI O3 mini integration
   - Panel counting and censorship detection
   - One-at-a-time and batch detection modes
   - CEN refinement for censored content

3. **VectorDetector** (`vector_detector.py`)
   - Google Vertex AI multimodal embeddings
   - Cosine similarity matching
   - Embedding caching for performance

4. **GeminiDetector** (`gemini_detector.py`)
   - Google Gemini 2.5 Pro integration
   - Visual reasoning and analysis

#### Panel Splitting

1. **PanelSplitter** (`panel_splitter.py`)
   - Multi-method panel splitting
   - Optimized Canny edge detection
   - Hough line transform to find separators
   - Tuned for 14-panel detection

2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
   - Edge detection and gutter analysis
   - Sobel gradient detection
   - Configurable percentile thresholds

3. **SimplePanelSplitter** (`simple_splitter.py`)
   - Simple even division
   - Fast horizontal splitting
   - Grid layout support

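Simple even division needs no image analysis: it cuts the layout into equal strips or grid cells. The sketch below computes only the crop boxes, as `(left, top, right, bottom)` pixel tuples in the convention used by Pillow's `Image.crop`. `even_grid_boxes` is a hypothetical helper, not the `SimplePanelSplitter` API. It assumes the grid shape (rows × columns) is already known, for example derived from the OpenAI panel count.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def even_grid_boxes(width: int, height: int, rows: int, cols: int = 1) -> List[Box]:
    """Split a width x height image into rows x cols equal cells, row-major.

    Boundaries are computed from the full size, so neighbouring cells share
    an edge (no gaps or overlaps) and the last cell ends exactly at the border.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    xs = [round(c * width / cols) for c in range(cols + 1)]
    ys = [round(r * height / rows) for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]
```

With `cols=1` this produces horizontal strips. Each box could then be cropped out and passed to the local inlier analysis as a separate panel.
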
#### Supporting Systems

1. **Cost Calculator** (`cost_calculator.py`)
   - Tracks OpenAI API usage
   - Per-layout and per-session cost tracking
   - Monthly cost estimation
   - Detailed JSON reports

2. **Memory Manager** (`memory_manager.py`)
   - Prevents memory exhaustion
   - Dynamic worker adjustment
   - System resource monitoring

3. **Logging Config** (`logging_config.py`)
   - Dual output (terminal + file)
   - Crash tracking
   - System diagnostics

4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
   - Serial execution of inlier analysis
   - Task queue management
   - Prevents system overload

### Workflow

#### Hybrid Mode Workflow

1. **OpenAI Analysis** (1 API call)
   - Count the panels in the layout
   - Detect censorship status
   - Both results come from one consolidated call

2. **Detection Routing**
   - At most `panel_threshold` panels: direct local/vector analysis
   - More than `panel_threshold` panels: split into panels, then local/vector analysis

3. **Local Analysis** (no API calls)
   - OpenCV AKAZE feature detection
   - Multiprocessing for speed
   - RANSAC homography estimation
   - Inlier-based confidence scoring

4. **Post-Processing**
   - CEN refinement (if enabled)
   - Deduplication
   - Truncation to the panel count
   - Confidence scoring

5. **Optional Fallback** (if enabled)
   - Triggers when there are fewer matches than panels
   - Runs OpenAI one-at-a-time detection
   - Makes additional API calls only when needed

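Once feature matching has run, steps 3 and 4 can be illustrated with plain data. The sketch makes these assumptions:

- Each candidate already carries a count of good AKAZE matches and RANSAC inliers. In OpenCV, the inlier mask is the second value returned by `cv2.findHomography(..., cv2.RANSAC, ...)`.
- The acceptance rule, its `min_inliers` and `min_inlier_ratio` parameters, and the confidence formula `min(matches / panels, 1) × 100` are hypothetical. They were chosen to agree with the example in the Output Format section (2 matches in 2 panels gives 100.0).
- How these parameters map onto `--inlier-threshold` and `--inlier-ratio-threshold` is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One master compared against a layout or one of its panels."""

    master_id: str
    good_matches: int  # AKAZE matches that survived filtering (e.g. a ratio test)
    inliers: int  # matches consistent with the RANSAC homography


def accept(c: Candidate, min_inliers: int = 15, min_inlier_ratio: float = 0.5) -> bool:
    """Hypothetical rule: enough inliers, and a large enough share of all matches."""
    if c.good_matches == 0:
        return False
    return c.inliers >= min_inliers and c.inliers / c.good_matches >= min_inlier_ratio


def post_process(candidates: List[Candidate], panel_count: int, truncate: bool = True) -> List[str]:
    """Filter, deduplicate (best candidate per master), rank by inliers, truncate."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        if not accept(c):
            continue
        current = best.get(c.master_id)
        if current is None or c.inliers > current.inliers:
            best[c.master_id] = c
    ranked = sorted(best.values(), key=lambda c: c.inliers, reverse=True)
    ids = [c.master_id for c in ranked]
    return ids[:panel_count] if truncate else ids


def confidence_score(match_count: int, panel_count: int) -> float:
    """Matches as a percentage of panels, capped at 100."""
    if panel_count <= 0:
        return 0.0
    return min(match_count / panel_count, 1.0) * 100.0
```

Passing `truncate=False` mirrors what `--no-truncation` is described as doing: every accepted master is kept, even if there are more than panels.
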
## Directory Structure

```
master_adapt_detect/
├── cli.py                # Main command-line interface
├── hybrid_detector.py    # Hybrid detection engine
├── openai_detector.py    # OpenAI detection engine
├── vector_detector.py    # Vector similarity engine
├── gemini_detector.py    # Gemini detection engine
├── panel_splitter.py     # Traditional panel splitter
├── advanced_splitter.py  # Advanced edge detection splitter
├── simple_splitter.py    # Simple even division splitter
├── cost_calculator.py    # Cost tracking system
├── memory_manager.py     # Memory management
├── logging_config.py     # Logging configuration
├── requirements.txt      # Python dependencies
├── .env                  # API keys (not in git)
├── master_images/        # Master images to detect (41 images)
├── layouts/              # Layout images to process (299+ images)
├── results/              # JSON output files
└── embeddings_cache/     # Cached vector embeddings
```

## Output Format

Results are saved as JSON files with detailed metadata.

### Example Output

```json
{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
```

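Because results are plain JSON, they are easy to post-process. The sketch below loads a results file and summarizes it. It relies only on the keys shown in the example output (`results`, `detected_master_ids`, `panel_count`). The `summarize` helper and its choice of what to flag are illustrative.

```python
import json
from typing import Any, Dict, List


def summarize(results_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Count layouts and detections, and list layouts with fewer matches than panels."""
    results = results_doc.get("results", {})
    under_matched: List[str] = []
    total_detections = 0
    for layout_id, entry in results.items():
        ids = entry.get("detected_master_ids", [])
        total_detections += len(ids)
        if len(ids) < entry.get("panel_count", 0):
            under_matched.append(layout_id)
    return {
        "layouts": len(results),
        "detections": total_detections,
        "under_matched": sorted(under_matched),
    }


def load_and_summarize(path: str) -> Dict[str, Any]:
    """Read one results file from results/ and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Layouts listed in `under_matched` are natural candidates for a rerun with `--fallback-one-at-a-time`.
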
## Cost Tracking

Cost tracking monitors OpenAI API usage and produces detailed reports. It is disabled by default.

### Enable Cost Tracking

```bash
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking

# With a detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

# With a monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```

### Cost Report Output

- **Session summary** - Total cost, tokens, and API calls
- **Per-layout breakdown** - Cost for each layout
- **Operation analysis** - Cost by operation type
- **Monthly estimates** - Projected monthly and annual costs
- **JSON reports** - Detailed cost data in `results/`

See `COST_TRACKING_README.md` for complete documentation.

## Performance

### Hybrid Mode Benefits

- **97.6% cost reduction** compared with OpenAI one-at-a-time mode
- **1 API call per layout** for panel analysis
- **Zero API calls** for matching when local inlier analysis is used
- **Parallel processing** for higher throughput
- **Memory-safe** operation through dynamic worker adjustment

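The 97.6% figure is consistent with a simple call-count ratio. This rests on three assumptions:

- Every API call costs roughly the same.
- One-at-a-time mode makes one call per master, and `master_images/` holds 41 masters.
- Hybrid mode makes one panel-analysis call per layout.

This is a back-of-the-envelope check, not the project's measurement method. It ignores fallback calls and differences in token counts between calls.

```math
\text{reduction} = 1 - \frac{\text{calls}_{\text{hybrid}}}{\text{calls}_{\text{one-at-a-time}}} = 1 - \frac{1}{41} \approx 0.9756 \approx 97.6\%
```
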
### Benchmarks

- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
- **Memory usage**: Dynamic adjustment prevents exhaustion

## Advanced Features

### Parallel Layout Processing

Process multiple layouts concurrently while inlier analysis is coordinated so that it runs serially.

```bash
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
```

### CEN Refinement

Automatically switches matches between the censored (CEN) and uncensored versions of a master image, according to the censorship status detected in the layout.

```bash
python cli.py --all --hybrid --cen-refinement
```

### Custom Splitting Parameters

Fine-tune panel splitting and inlier matching behavior.

```bash
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10

# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
```

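One common way to find gutters, in the spirit of the `--percentile` and `--min-gap` options, is to project edge strength onto one axis and look for long runs of low values. The sketch works on a precomputed 1-D profile, for example the per-row sum of Sobel gradient magnitudes, so it needs only the standard library. The function names, the nearest-rank percentile, and the exact meaning given to `pct` and `min_gap` are assumptions. This is not a description of `AdvancedPanelSplitter`.

```python
import math
from typing import List, Optional, Tuple


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def find_gutters(profile: List[float], pct: float = 15, min_gap: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, where the profile stays
    at or below its `pct` percentile for at least `min_gap` consecutive samples."""
    if not profile:
        return []
    cutoff = nearest_rank_percentile(profile, pct)
    gutters: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing +inf sentinel closes a run that reaches the end of the profile.
    for i, value in enumerate(list(profile) + [math.inf]):
        if value <= cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap:
                gutters.append((start, i))
            start = None
    return gutters


def cut_positions(gutters: List[Tuple[int, int]]) -> List[int]:
    """Cut through the middle of each gutter."""
    return [(start + end) // 2 for start, end in gutters]
```

A larger `min_gap` ignores thin low-edge bands inside artwork, at the risk of missing narrow real gutters.
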
### Image Preprocessing

Optional preprocessing steps can improve detection accuracy.

```bash
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale

# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
```

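Both preprocessing steps are simple per-pixel transforms. The sketch below applies them to lists of pixel values using only the standard library:

- **Greyscale** uses the ITU-R BT.601 luma weights, the same weights Pillow uses for `convert("L")`.
- **Contrast** scales each value's distance from the mean grey level. This is the same idea as Pillow's `ImageEnhance.Contrast`, which blends the image with a flat grey image at its mean.

This README does not say which library backs `--enable-greyscale` and `--contrast-factor`, so treat these functions as illustrative.

```python
from typing import List, Tuple


def to_grey(pixels: List[Tuple[int, int, int]]) -> List[int]:
    """BT.601 luma: 0.299 R + 0.587 G + 0.114 B, rounded to an int."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def enhance_contrast(grey: List[int], factor: float) -> List[int]:
    """Move each value away from (factor > 1) or toward (factor < 1) the mean,
    clamping the result to the 0-255 range."""
    if not grey:
        return []
    mean = sum(grey) / len(grey)
    return [min(255, max(0, round(mean + factor * (v - mean)))) for v in grey]
```

A factor of 1.5, as in the command above, stretches values away from the mean by half again. A factor of 1.0 leaves the image unchanged.
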
## Troubleshooting

### Common Issues

**"Cost tracking is disabled"**

- Add the `--enable-cost-tracking` flag to enable cost monitoring

**"Memory usage too high"**

- The system adjusts worker counts automatically
- Reduce `--local-workers` or `--layout-workers` manually

**"Too many open files"**

- Reduce the number of concurrent workers
- The system will auto-recover and limit workers

**"No matches found"**

- Try a different detection mode
- Adjust the inlier thresholds (`--inlier-threshold`, `--inlier-ratio-threshold`)
- Enable fallback mode (`--fallback-one-at-a-time`)

### Memory Management

The system includes automatic memory management that:

- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Helps prevent system crashes
- Logs resource usage

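Dynamic worker adjustment usually reduces to a budget calculation. The sketch below assumes a fixed per-worker memory estimate and a safety reserve. The function name and the 500 MB per-worker and 2048 MB reserve defaults are hypothetical. On a real system, `available_mb` could be derived from `psutil.virtual_memory().available`, which is reported in bytes.

```python
import os
from typing import Optional


def safe_worker_count(
    available_mb: float,
    per_worker_mb: float = 500.0,  # hypothetical memory footprint of one worker
    reserve_mb: float = 2048.0,  # hypothetical headroom left for the OS and main process
    requested: Optional[int] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Largest worker count that fits in the memory budget, never below 1.

    An explicit `requested` value (e.g. from --local-workers) caps the result;
    otherwise the CPU count is the cap.
    """
    cap = requested if requested is not None else (cpu_count or os.cpu_count() or 1)
    budget = max(0.0, available_mb - reserve_mb)
    fits = int(budget // per_worker_mb)
    return max(1, min(cap, fits))
```

Re-evaluating the count between batches, rather than once at startup, lets the pool shrink as memory pressure rises.
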
### Logging

All processing is logged to both the terminal and a file:

- Log files: `master_adapt_detect_TIMESTAMP.log`
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time

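Python's standard `logging` module provides dual terminal-plus-file output through two handlers. The sketch below makes these assumptions:

- The timestamp format is `%Y%m%d_%H%M%S`; the actual format in `logging_config.py` is not shown in this README.
- `setup_logging` is a hypothetical helper.
- Crash tracking is approximated with `sys.excepthook`, which logs uncaught exceptions with their full traceback.

```python
import logging
import sys
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = ".", level: int = logging.INFO) -> Path:
    """Log to stderr and to master_adapt_detect_<timestamp>.log; return the file path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    log_path = Path(log_dir) / f"master_adapt_detect_{stamp}.log"
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    console = logging.StreamHandler()  # terminal output (stderr)
    console.setFormatter(fmt)
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(fmt)

    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(console)
    root.addHandler(file_handler)

    def log_uncaught(exc_type, exc, tb):
        # Record the full traceback in both outputs before the interpreter exits.
        logging.getLogger("crash").critical("Uncaught exception", exc_info=(exc_type, exc, tb))

    sys.excepthook = log_uncaught
    return log_path
```

Attaching both handlers to the root logger means every module's `logging.getLogger(__name__)` output reaches both destinations without extra setup.
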
## Development

### Running Tests

```bash
# Test hybrid mode
python test_hybrid.py

# Test cost tracking
python test_cost_calculator.py

# Test panel splitting
python test_split_mode.py
```

### Adding New Detection Modes

1. Create a new detector class that inherits from the base detector class
2. Implement the required methods:
   - `detect_images_in_layout()`
   - `process_all_layouts()`
3. Add CLI integration in `cli.py`
4. Update the documentation

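This README does not show the base class itself. The sketch below gives the shape the steps above imply, using `abc`. The two method names come from the list. `BaseDetector`, `NullDetector`, the parameter lists, and the return types are assumptions, so check the existing detectors before relying on them.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class BaseDetector(ABC):
    """Hypothetical interface shared by the detection engines."""

    @abstractmethod
    def detect_images_in_layout(self, layout_path: str) -> Dict[str, Any]:
        """Return the result entry for one layout image."""

    @abstractmethod
    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, Dict[str, Any]]:
        """Return result entries keyed by layout ID."""


class NullDetector(BaseDetector):
    """Toy engine that reports no matches; a real one would call a model here."""

    def detect_images_in_layout(self, layout_path: str) -> Dict[str, Any]:
        return {
            "layout_filename": Path(layout_path).name,
            "detected_master_ids": [],
            "panel_count": 0,
        }

    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, Dict[str, Any]]:
        # Key by file stem, matching the "6814786" -> "6814786.jpg" shape of the example output.
        return {Path(p).stem: self.detect_images_in_layout(p) for p in layout_paths}
```

Declaring both methods abstract makes Python refuse to instantiate a detector that forgets one of them. This catches integration mistakes before any layouts are processed.
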
## OpenAI Pricing (2025)

- **Input tokens**: $2.00 per million
- **Cached input tokens**: $0.50 per million
- **Output tokens**: $8.00 per million

Hybrid mode keeps costs low by making a single panel-analysis API call per layout instead of one call per master.

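Per-call cost follows directly from the rates above. The sketch assumes the accounting used in OpenAI's usage data:

- Cached input tokens (`prompt_tokens_details.cached_tokens`) are counted within `prompt_tokens`.
- Reasoning tokens are included in the completion (output) token count.

The helper names and the token counts in the usage note are hypothetical, not measured values.

```python
from dataclasses import dataclass

# Rates from the list above, in USD per million tokens.
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.50
OUTPUT_PER_M = 8.00


@dataclass
class Usage:
    input_tokens: int  # all prompt tokens, cached ones included
    cached_input_tokens: int  # the subset of input_tokens billed at the cached rate
    output_tokens: int  # completion tokens, reasoning tokens included


def call_cost(u: Usage) -> float:
    """USD cost of one API call."""
    if not 0 <= u.cached_input_tokens <= u.input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = u.input_tokens - u.cached_input_tokens
    return (
        uncached * INPUT_PER_M
        + u.cached_input_tokens * CACHED_INPUT_PER_M
        + u.output_tokens * OUTPUT_PER_M
    ) / 1_000_000


def monthly_estimate(cost_per_layout: float, layouts_per_month: int) -> float:
    """Linear projection: average cost per layout times monthly volume."""
    return cost_per_layout * layouts_per_month
```

Consider a hypothetical call with 3,000 input tokens (1,000 of them cached) and 1,500 output tokens. It costs 2,000 × $2 + 1,000 × $0.50 + 1,500 × $8 per million tokens, which is $0.0165. At one such call per layout, 300 layouts a month would come to about $4.95.
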
## License

[License information]

## Credits

Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.