Master Adapt Detect
A sophisticated AI-powered image detection system that identifies master images within multi-panel layout images using multiple detection strategies, with advanced panel splitting and cost optimization features.
Overview
This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:
- Hybrid Mode (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
- OpenAI Mode - Full AI-powered detection using OpenAI O3 mini
- Vector Mode - Google Vertex AI multimodal embeddings for similarity search
- Gemini Mode - Google Gemini 2.5 Pro for visual analysis
Key Features
Detection Capabilities
- Multi-strategy detection - Choose from 4 different detection engines
- Panel counting - Automatic detection of number of panels in layouts
- Censorship detection - Identifies censored vs uncensored content with CEN refinement
- Smart matching - Handles cropped, scaled, rotated, and transformed images
- Confidence scoring - Provides match confidence based on panel count and detected matches
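The confidence formula is not spelled out here. As a minimal sketch, assuming the score is simply the percentage of detected panels that received a match (the function name and the formula are illustrative assumptions, not the project's implementation):

```python
def confidence_score(panel_count: int, match_count: int) -> float:
    """Hypothetical score: percentage of panels covered by a detected match."""
    if panel_count <= 0:
        return 0.0  # no panels detected, so there is nothing to score against
    covered = min(match_count, panel_count)  # surplus matches do not raise the score
    return round(100.0 * covered / panel_count, 1)
```

Under this assumption, a 2-panel layout with 2 matches scores 100.0, which agrees with the example output shown later in this README.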
Hybrid Mode (Primary Feature)
- Cost optimization - 97.6% reduction in API costs vs one-at-a-time detection
- Intelligent routing - Uses local analysis for simple layouts (≤2 panels), split method for complex
- Panel splitting - Three splitting strategies: traditional, advanced edge detection, simple division
- Local inlier analysis - OpenCV AKAZE features with multiprocessing for fast matching
- Vector similarity - Optional Google Vertex AI embeddings for semantic matching
- Fallback support - Automatic fallback to OpenAI one-at-a-time when needed
Processing Options
- Parallel processing - Concurrent layout processing with serial inlier analysis coordination
- Memory management - Dynamic worker adjustment based on system resources
- Cost tracking - Comprehensive OpenAI API usage and cost monitoring
- Batch processing - Process hundreds of layouts efficiently
- Progress tracking - Real-time progress updates with ETA
Installation
Prerequisites
- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode)
- OpenAI API key (for OpenAI/Hybrid modes)
- Google AI API key (for Gemini mode)
Setup
# Clone the repository
git clone <repository-url>
cd master_adapt_detect
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# GOOGLE_API_KEY=your_google_ai_key
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
Usage
Command Line Interface
The main entry point is cli.py which provides a comprehensive CLI for all detection modes.
# Basic usage - hybrid mode with test
python cli.py --test --hybrid
# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid
# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts
# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time
# Vector mode with similarity search
python cli.py --all --vector
# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
Detection Modes
Hybrid Mode (Recommended)
Best balance of speed, cost, and accuracy.
# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2
# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple
# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced
# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode
# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time
OpenAI Mode
Full AI-powered detection with optional refinement.
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai
# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time
# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
Vector Mode
Semantic similarity using embeddings.
# Process with vector embeddings
python cli.py --all --vector
# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
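To illustrate what the similarity threshold controls, here is a small pure-Python sketch of cosine-similarity matching. The function names and the dictionary of master embeddings are hypothetical; the real detector obtains its embeddings from Vertex AI and may rank matches differently.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as no match
    return dot / (norm_a * norm_b)


def match_masters(
    layout_vec: Sequence[float],
    master_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Return master IDs scoring at or above the threshold, best first."""
    scored = [(cosine_similarity(layout_vec, vec), master_id)
              for master_id, vec in master_vecs.items()]
    return [master_id for score, master_id in sorted(scored, reverse=True)
            if score >= threshold]
```

Raising the threshold trades recall for precision: fewer masters pass, but those that do are closer to the layout embedding.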
Gemini Mode
Google Gemini 2.5 Pro detection.
# Standard Gemini detection
python cli.py --limit 10 --gemini
Key Options
Detection Mode:
- --hybrid - Hybrid detection mode (default)
- --openai - OpenAI detection mode
- --vector - Vector similarity mode
- --gemini - Gemini detection mode
Processing:
- --test - Test with 1 layout
- --limit N - Process first N layouts
- --all - Process all layouts
- --specific-file FILE - Process specific file
Hybrid Options:
- --panel-threshold N - Panel threshold for routing (default: 2)
- --split-simple - Use simple even division splitting
- --split-advanced - Use advanced edge detection splitting
- --vector-mode - Use vector similarity instead of inlier analysis
- --fallback-one-at-a-time - Enable OpenAI fallback
- --parallel-layouts - Enable parallel layout processing
- --no-truncation - Disable match truncation to panel count
Cost Tracking:
- --enable-cost-tracking - Enable cost tracking (disabled by default)
- --cost-report - Generate detailed cost report
- --cost-estimate N - Estimate monthly cost for N layouts
Worker Configuration:
- --openai-workers N - OpenAI worker count (default: auto)
- --local-workers N - Local analysis workers (default: auto)
- --layout-workers N - Parallel layout workers (default: auto)
Other:
- --output NAME - Custom output filename
- --help - Show all options
Architecture
Core Components
Detection Engines
- HybridImageDetector (hybrid_detector.py)
  - Main hybrid detection implementation
  - Routes layouts based on panel count
  - Integrates OpenAI, local analysis, and splitting
  - Handles parallel processing coordination
- OpenAIImageDetector (openai_detector.py)
  - OpenAI O3 mini integration
  - Panel counting and censorship detection
  - One-at-a-time and batch detection modes
  - CEN refinement for censored content
- VectorDetector (vector_detector.py)
  - Google Vertex AI multimodal embeddings
  - Cosine similarity matching
  - Embedding caching for performance
- GeminiDetector (gemini_detector.py)
  - Google Gemini 2.5 Pro integration
  - Visual reasoning and analysis
Panel Splitting
- PanelSplitter (panel_splitter.py)
  - Multi-method panel splitting
  - Optimized Canny edge detection
  - Hough line transform for separators
  - Tuned for 14-panel detection
- AdvancedPanelSplitter (advanced_splitter.py)
  - Edge detection and gutter analysis
  - Sobel gradient detection
  - Configurable percentile thresholds
- SimplePanelSplitter (simple_splitter.py)
  - Simple even division
  - Fast horizontal splitting
  - Grid layout support
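The simplest of the three strategies, even division, can be sketched without any image library: given one dimension of the layout in pixels and a panel count, compute the strip boundaries. This is an illustrative sketch only; `even_split_bounds` is a hypothetical helper, not a function from simple_splitter.py.

```python
from typing import List, Tuple


def even_split_bounds(length: int, panels: int) -> List[Tuple[int, int]]:
    """Divide `length` pixels into `panels` contiguous, near-equal (start, end) ranges."""
    if panels <= 0 or length < panels:
        raise ValueError("need at least one pixel per panel")
    base, extra = divmod(length, panels)
    bounds = []
    start = 0
    for index in range(panels):
        # Spread the remainder over the first `extra` strips so sizes differ by at most 1.
        size = base + (1 if index < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds
```

The helper is axis-agnostic: pass the image width to cut side-by-side panels or the height to cut stacked ones, then crop each range from the image array.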
Supporting Systems
- Cost Calculator (cost_calculator.py)
  - Tracks OpenAI API usage
  - Per-layout and session cost tracking
  - Monthly cost estimation
  - Detailed JSON reports
- Memory Manager (memory_manager.py)
  - Prevents memory exhaustion
  - Dynamic worker adjustment
  - System resource monitoring
- Logging Config (logging_config.py)
  - Dual output (terminal + file)
  - Crash tracking
  - System diagnostics
- InlierAnalysisCoordinator (in hybrid_detector.py)
  - Serial execution of inlier analysis
  - Task queue management
  - Prevents system overload
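The idea behind serial coordination can be sketched with a single-worker executor: many layout workers may submit tasks concurrently, but the tasks run one at a time in submission order. `SerialCoordinator` below is a hypothetical stand-in written for illustration; the actual InlierAnalysisCoordinator in hybrid_detector.py may be built differently.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class SerialCoordinator:
    """Hypothetical stand-in: queues tasks and runs them one at a time."""

    def __init__(self) -> None:
        # A pool with one worker keeps an internal FIFO queue, so tasks never overlap.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, func: Callable[..., Any], *args: Any) -> Future:
        """Enqueue a task and return a Future; call .result() to wait for it."""
        return self._executor.submit(func, *args)

    def shutdown(self) -> None:
        """Wait for queued tasks to finish, then stop the worker."""
        self._executor.shutdown(wait=True)
```

Running only one inlier analysis at a time leaves that analysis free to use its own multiprocessing pool without several pools competing for cores and memory.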
Workflow
Hybrid Mode Workflow
- OpenAI Analysis (1 API call)
  - Count panels in layout
  - Detect censorship status
  - Consolidated analysis
- Detection Routing
  - ≤ panel_threshold: Direct local/vector analysis
  - > panel_threshold: Split + local/vector analysis
- Local Analysis (no API calls)
  - OpenCV AKAZE feature detection
  - Multiprocessing for speed
  - RANSAC homography estimation
  - Inlier-based confidence scoring
- Post-Processing
  - CEN refinement (if enabled)
  - Deduplication
  - Truncation to panel count
  - Confidence scoring
- Optional Fallback (if enabled)
  - Triggers when matches < panels
  - OpenAI one-at-a-time detection
  - Additional API calls only when needed
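The routing, post-processing, and fallback decisions in this workflow can be sketched as three small functions. The names are hypothetical and the sketch assumes matches arrive ordered best-first; it shows the decision logic only, not the project's code.

```python
from typing import List


def choose_strategy(panel_count: int, panel_threshold: int = 2) -> str:
    """Route simple layouts to direct analysis and complex ones to splitting."""
    return "direct" if panel_count <= panel_threshold else "split"


def post_process(matches: List[str], panel_count: int, truncate: bool = True) -> List[str]:
    """Deduplicate while keeping order, then optionally cap at the panel count."""
    unique = list(dict.fromkeys(matches))  # dict preserves first-seen order
    return unique[:panel_count] if truncate else unique


def needs_fallback(matches: List[str], panel_count: int, fallback_enabled: bool) -> bool:
    """Fallback triggers only when enabled and some panels are still unmatched."""
    return fallback_enabled and len(matches) < panel_count
```

For example, a 3-panel layout is routed to "split"; if only two unique masters survive post-processing and fallback is enabled, the one-at-a-time OpenAI pass would run for that layout.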
Directory Structure
master_adapt_detect/
├── cli.py # Main command-line interface
├── hybrid_detector.py # Hybrid detection engine
├── openai_detector.py # OpenAI detection engine
├── vector_detector.py # Vector similarity engine
├── gemini_detector.py # Gemini detection engine
├── panel_splitter.py # Traditional panel splitter
├── advanced_splitter.py # Advanced edge detection splitter
├── simple_splitter.py # Simple even division splitter
├── cost_calculator.py # Cost tracking system
├── memory_manager.py # Memory management
├── logging_config.py # Logging configuration
├── requirements.txt # Python dependencies
├── .env # API keys (not in git)
├── master_images/ # Master images to detect (41 images)
├── layouts/ # Layout images to process (299+ images)
├── results/ # JSON output files
└── embeddings_cache/ # Cached vector embeddings
Output Format
Results are saved as JSON files with detailed metadata.
Example Output
{
"metadata": {
"total_layouts_processed": 10,
"total_master_images": 41,
"provider": "hybrid",
"model": "openai_o3_plus_local_analysis",
"panel_threshold": 2,
"processing_mode": "hybrid"
},
"results": {
"6814786": {
"layout_filename": "6814786.jpg",
"detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
"detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
"detection_method": "local_inlier_analysis",
"panel_count": 2,
"confidence_score": 100.0,
"panel_analysis": {
"panel_count": 2,
"confidence": "high"
},
"censorship_analysis": {
"is_censored": false,
"confidence": "high"
}
}
}
}
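A results file in this shape is easy to post-process. The sketch below, using a hypothetical helper name, flags layouts where fewer masters were detected than panels counted; it relies only on the `results`, `detected_master_ids`, and `panel_count` keys shown above.

```python
import json
from typing import List


def under_matched_layouts(raw_json: str) -> List[str]:
    """Return IDs of layouts with fewer detected masters than panels."""
    data = json.loads(raw_json)
    flagged = []
    for layout_id, entry in data.get("results", {}).items():
        matches = entry.get("detected_master_ids", [])
        if len(matches) < entry.get("panel_count", 0):
            flagged.append(layout_id)
    return flagged
```

To use it on a real run, read the text of a JSON file from results/ and pass it in; the flagged layouts are natural candidates for a second pass with fallback enabled.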
Cost Tracking
Cost tracking monitors OpenAI API usage and provides detailed reports.
Enable Cost Tracking
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking
# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
Cost Report Output
- Session summary - Total cost, tokens, API calls
- Per-layout breakdown - Cost for each layout
- Operation analysis - Cost by operation type
- Monthly estimates - Projected monthly/annual costs
- JSON reports - Detailed cost data in results/
See COST_TRACKING_README.md for complete documentation.
Performance
Hybrid Mode Benefits
- 97.6% cost reduction vs OpenAI one-at-a-time mode
- 1 API call per layout for panel analysis
- Zero API calls for matching (local analysis)
- Parallel processing for throughput
- Memory-safe with dynamic adjustment
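One way to arrive at a figure of this size is to compare API call counts. With 41 master images, one-at-a-time mode makes one call per master for each layout, while hybrid mode makes one call per layout. The sketch below assumes every call costs roughly the same, which is a simplification and may not be how the 97.6% figure was actually measured.

```python
def call_reduction_percent(calls_before: int, calls_after: int) -> float:
    """Percentage reduction in API calls, assuming equal cost per call."""
    if calls_before <= 0:
        raise ValueError("calls_before must be positive")
    return round(100.0 * (1 - calls_after / calls_before), 1)


# 41 calls per layout (one per master) versus 1 call per layout.
reduction = call_reduction_percent(41, 1)
```

Under that assumption the reduction is 1 - 1/41, or about 97.6%.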
Benchmarks
- Simple layouts (≤2 panels): ~2-3 seconds per layout
- Complex layouts (>2 panels): ~5-7 seconds per layout
- Parallel mode: ~50-100 layouts per minute (system dependent)
- Memory usage: Dynamic adjustment prevents exhaustion
Advanced Features
Parallel Layout Processing
Process multiple layouts concurrently with coordinated inlier analysis.
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
CEN Refinement
Automatically switch between censored (CEN) and uncensored versions.
python cli.py --all --hybrid --cen-refinement
Custom Splitting Parameters
Fine-tune panel splitting behavior.
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10
# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
Image Preprocessing
Enhance detection accuracy with preprocessing.
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale
# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
Troubleshooting
Common Issues
"Cost tracking is disabled"
- Add the --enable-cost-tracking flag to enable cost monitoring
"Memory usage too high"
- System will auto-adjust workers
- Reduce --local-workers or --layout-workers manually
"Too many open files"
- Reduce concurrent workers
- System will auto-recover and limit workers
"No matches found"
- Try different detection modes
- Adjust inlier thresholds
- Enable fallback mode
Memory Management
The system includes automatic memory management:
- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Prevents system crashes
- Logs resource usage
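Dynamic worker adjustment can be sketched as a pure function: cap the requested worker count by how many workers fit in the memory that is currently free. The function name, the reserve, and the per-worker estimate are illustrative assumptions; memory_manager.py may use different inputs and limits.

```python
def safe_worker_count(
    requested: int,
    available_mb: float,
    per_worker_mb: float,
    reserve_mb: float = 1024.0,
) -> int:
    """Return how many workers fit in free memory, always allowing at least one."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = max(available_mb - reserve_mb, 0.0)  # keep headroom for the OS
    fits = int(usable_mb // per_worker_mb)
    return max(1, min(requested, fits))
```

The available-memory reading would come from a system monitor (for instance `psutil.virtual_memory().available`, if psutil is installed); re-evaluating the function between batches lets the worker count shrink under pressure and grow back afterwards.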
Logging
All processing is logged to both terminal and file:
- Log files: master_adapt_detect_TIMESTAMP.log
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time
Development
Running Tests
# Test hybrid mode
python test_hybrid.py
# Test cost tracking
python test_cost_calculator.py
# Test panel splitting
python test_split_mode.py
Adding New Detection Modes
- Create a new detector class inheriting from the base class
- Implement the required methods: detect_images_in_layout() and process_all_layouts()
- Add CLI integration in cli.py
- Update documentation
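As a rough sketch of these steps, a new detector might look like the following. Only the two method names come from this README; the base class name `BaseDetector`, the method signatures, and the return types are assumptions made for illustration, so check the existing detectors for the real interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseDetector(ABC):
    """Hypothetical base class; the real one may differ in name and signature."""

    @abstractmethod
    def detect_images_in_layout(self, layout_path: str) -> List[str]:
        """Return the IDs of master images found in one layout."""

    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, List[str]]:
        """Run detection over every layout and collect results by path."""
        return {path: self.detect_images_in_layout(path) for path in layout_paths}


class FilenameDetector(BaseDetector):
    """Toy detector: 'detects' masters whose ID appears in the layout filename."""

    def __init__(self, master_ids: List[str]) -> None:
        self.master_ids = master_ids

    def detect_images_in_layout(self, layout_path: str) -> List[str]:
        return [mid for mid in self.master_ids if mid in layout_path]
```

A real detector would replace the filename check with its own matching logic and then be wired into cli.py behind a new mode flag.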
OpenAI Pricing (2025)
- Input tokens: $2.00 per million
- Cached input: $0.50 per million
- Output tokens: $8.00 per million
Hybrid mode achieves significant cost savings by minimizing API calls.
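Using the prices listed above, the cost of a call or a batch can be estimated from token counts. This is a minimal sketch with hypothetical function names; it treats uncached and cached input tokens as separate counts and is not the code in cost_calculator.py.

```python
# Prices in USD per million tokens, as listed in this README.
PRICE_PER_MILLION = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost; input_tokens counts only uncached input."""
    return (
        input_tokens * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    ) / 1_000_000


def estimate_monthly_cost(cost_per_layout: float, layouts_per_month: int) -> float:
    """Scale an average per-layout cost to a monthly volume."""
    return cost_per_layout * layouts_per_month
```

For instance, a call with 500,000 uncached input tokens, 200,000 cached input tokens, and 100,000 output tokens comes to about $1.90 (1.00 + 0.10 + 0.80).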
License
[License information]
Credits
Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.