# Master Adapt Detect

A sophisticated AI-powered image detection system that identifies master images within multi-panel layout images using multiple detection strategies, with advanced panel splitting and cost optimization features.

## Overview

This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:

1. **Hybrid Mode** (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
2. **OpenAI Mode** - Full AI-powered detection using OpenAI O3 mini
3. **Vector Mode** - Google Vertex AI multimodal embeddings for similarity search
4. **Gemini Mode** - Google Gemini 2.5 Pro for visual analysis

## Key Features

### Detection Capabilities

- **Multi-strategy detection** - Choose from 4 different detection engines
- **Panel counting** - Automatic detection of the number of panels in layouts
- **Censorship detection** - Identifies censored vs uncensored content with CEN refinement
- **Smart matching** - Handles cropped, scaled, rotated, and transformed images
- **Confidence scoring** - Provides match confidence based on panel count and detected matches

### Hybrid Mode (Primary Feature)

- **Cost optimization** - 97.6% reduction in API costs vs one-at-a-time detection
- **Intelligent routing** - Uses local analysis for simple layouts (≤2 panels) and the split method for complex ones
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, simple division
- **Local inlier analysis** - OpenCV AKAZE features with multiprocessing for fast matching
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
- **Fallback support** - Automatic fallback to OpenAI one-at-a-time when needed

### Processing Options

- **Parallel processing** - Concurrent layout processing with serial inlier analysis coordination
- **Memory management** - Dynamic worker adjustment based on system resources
- **Cost tracking** - Comprehensive OpenAI API usage and cost monitoring
- **Batch processing** - Process hundreds of layouts efficiently
- **Progress tracking** - Real-time progress updates with ETA

## Installation

### Prerequisites

- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode)
- OpenAI API key (for OpenAI/Hybrid modes)
- Google AI API key (for Gemini mode)

### Setup

```bash
# Clone the repository
git clone <repository-url>
cd master_adapt_detect

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# GOOGLE_API_KEY=your_google_ai_key
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```

## Usage

### Command Line Interface

The main entry point is `cli.py`, which provides a comprehensive CLI for all detection modes.

```bash
# Basic usage - hybrid mode with test
python cli.py --test --hybrid

# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid

# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts

# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time

# Vector mode with similarity search
python cli.py --all --vector

# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
```

### Detection Modes

#### Hybrid Mode (Recommended)

Best balance of speed, cost, and accuracy.

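
The routing decision in hybrid mode can be sketched roughly as follows. This is a minimal illustration that assumes the rule stated under Key Features (layouts at or below the panel threshold go straight to local analysis, larger ones are split first). The function `choose_route` and its return labels are hypothetical and are not taken from `hybrid_detector.py`.

```python
def choose_route(panel_count: int, panel_threshold: int = 2,
                 vector_mode: bool = False) -> str:
    """Return a label for the processing path a layout would take.

    Hypothetical helper: the labels are illustrative only.
    """
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")

    # Matching backend: local inlier analysis by default, vector similarity if requested.
    matcher = "vector" if vector_mode else "inlier"

    # Simple layouts are matched directly; complex ones are split into panels first.
    if panel_count <= panel_threshold:
        return f"direct_{matcher}"
    return f"split_{matcher}"


if __name__ == "__main__":
    for panels in (1, 2, 5):
        print(panels, choose_route(panels))
```

On the command line, `--panel-threshold` sets the cutoff and `--vector-mode` swaps the matcher.
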
```bash
# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2

# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple

# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced

# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode

# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time
```

#### OpenAI Mode

Full AI-powered detection with optional refinement.

```bash
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai

# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time

# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
```

#### Vector Mode

Semantic similarity using embeddings.

```bash
# Process with vector embeddings
python cli.py --all --vector

# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
```

#### Gemini Mode

Google Gemini 2.5 Pro detection.

```bash
# Standard Gemini detection
python cli.py --limit 10 --gemini
```

### Key Options

**Detection Mode:**

- `--hybrid` - Hybrid detection mode (default)
- `--openai` - OpenAI detection mode
- `--vector` - Vector similarity mode
- `--gemini` - Gemini detection mode

**Processing:**

- `--test` - Test with 1 layout
- `--limit N` - Process first N layouts
- `--all` - Process all layouts
- `--specific-file FILE` - Process specific file

**Hybrid Options:**

- `--panel-threshold N` - Panel threshold for routing (default: 2)
- `--split-simple` - Use simple even division splitting
- `--split-advanced` - Use advanced edge detection splitting
- `--vector-mode` - Use vector similarity instead of inlier analysis
- `--fallback-one-at-a-time` - Enable OpenAI fallback
- `--parallel-layouts` - Enable parallel layout processing
- `--no-truncation` - Disable match truncation to panel count

**Cost Tracking:**

- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
- `--cost-report` - Generate detailed cost report
- `--cost-estimate N` - Estimate monthly cost for N layouts

**Worker Configuration:**

- `--openai-workers N` - OpenAI worker count (default: auto)
- `--local-workers N` - Local analysis workers (default: auto)
- `--layout-workers N` - Parallel layout workers (default: auto)

**Other:**

- `--output NAME` - Custom output filename
- `--help` - Show all options

## Architecture

### Core Components

#### Detection Engines

1. **HybridImageDetector** (`hybrid_detector.py`)
   - Main hybrid detection implementation
   - Routes layouts based on panel count
   - Integrates OpenAI, local analysis, and splitting
   - Handles parallel processing coordination

2. **OpenAIImageDetector** (`openai_detector.py`)
   - OpenAI O3 mini integration
   - Panel counting and censorship detection
   - One-at-a-time and batch detection modes
   - CEN refinement for censored content

3. **VectorDetector** (`vector_detector.py`)
   - Google Vertex AI multimodal embeddings
   - Cosine similarity matching
   - Embedding caching for performance

4. **GeminiDetector** (`gemini_detector.py`)
   - Google Gemini 2.5 Pro integration
   - Visual reasoning and analysis

#### Panel Splitting

1. **PanelSplitter** (`panel_splitter.py`)
   - Multi-method panel splitting
   - Optimized Canny edge detection
   - Hough line transform for separators
   - Tuned for 14-panel detection

2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
   - Edge detection and gutter analysis
   - Sobel gradient detection
   - Configurable percentile thresholds

3. **SimplePanelSplitter** (`simple_splitter.py`)
   - Simple even division
   - Fast horizontal splitting
   - Grid layout support

#### Supporting Systems

1. **Cost Calculator** (`cost_calculator.py`)
   - Tracks OpenAI API usage
   - Per-layout and session cost tracking
   - Monthly cost estimation
   - Detailed JSON reports

2. **Memory Manager** (`memory_manager.py`)
   - Prevents memory exhaustion
   - Dynamic worker adjustment
   - System resource monitoring

3. **Logging Config** (`logging_config.py`)
   - Dual output (terminal + file)
   - Crash tracking
   - System diagnostics

4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
   - Serial execution of inlier analysis
   - Task queue management
   - Prevents system overload

### Workflow

#### Hybrid Mode Workflow

1. **OpenAI Analysis** (1 API call)
   - Count panels in layout
   - Detect censorship status
   - Consolidated analysis

2. **Detection Routing**
   - Panel count ≤ `panel_threshold`: direct local/vector analysis
   - Panel count > `panel_threshold`: split + local/vector analysis

3. **Local Analysis** (no API calls)
   - OpenCV AKAZE feature detection
   - Multiprocessing for speed
   - RANSAC homography estimation
   - Inlier-based confidence scoring

4. **Post-Processing**
   - CEN refinement (if enabled)
   - Deduplication
   - Truncation to panel count
   - Confidence scoring

5. **Optional Fallback** (if enabled)
   - Triggers when matches < panels
   - OpenAI one-at-a-time detection
   - Additional API calls only when needed

## Directory Structure

```
master_adapt_detect/
├── cli.py                 # Main command-line interface
├── hybrid_detector.py     # Hybrid detection engine
├── openai_detector.py     # OpenAI detection engine
├── vector_detector.py     # Vector similarity engine
├── gemini_detector.py     # Gemini detection engine
├── panel_splitter.py      # Traditional panel splitter
├── advanced_splitter.py   # Advanced edge detection splitter
├── simple_splitter.py     # Simple even division splitter
├── cost_calculator.py     # Cost tracking system
├── memory_manager.py      # Memory management
├── logging_config.py      # Logging configuration
├── requirements.txt       # Python dependencies
├── .env                   # API keys (not in git)
├── master_images/         # Master images to detect (41 images)
├── layouts/               # Layout images to process (299+ images)
├── results/               # JSON output files
└── embeddings_cache/      # Cached vector embeddings
```

## Output Format

Results are saved as JSON files with detailed metadata.

### Example Output

```json
{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
```

## Cost Tracking

Cost tracking monitors OpenAI API usage and provides detailed reports.

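
As a rough illustration of the arithmetic behind such a report, the sketch below prices one API call from its token counts, using the per-million-token rates listed under OpenAI Pricing (2025) in this README. The `Usage` class and the functions `call_cost`, `estimate_monthly`, and `call_reduction` are hypothetical names for illustration, not the API of `cost_calculator.py`. It also assumes that cached tokens are counted as a subset of input tokens, and the token counts in the demo are made up.

```python
from dataclasses import dataclass

# USD per one million tokens, copied from the pricing section of this README.
INPUT_RATE = 2.00
CACHED_INPUT_RATE = 0.50
OUTPUT_RATE = 8.00


@dataclass
class Usage:
    """Token counts for one API call (hypothetical structure)."""
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0  # assumed to be a subset of input_tokens


def call_cost(usage: Usage) -> float:
    """Return the cost in USD of a single call."""
    uncached = usage.input_tokens - usage.cached_tokens
    if uncached < 0:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    total = (
        uncached * INPUT_RATE
        + usage.cached_tokens * CACHED_INPUT_RATE
        + usage.output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


def estimate_monthly(cost_per_layout: float, layouts_per_month: int) -> float:
    """Linear projection: every layout is assumed to cost the same."""
    return cost_per_layout * layouts_per_month


def call_reduction(master_count: int) -> float:
    """Fraction of API calls saved when one call replaces one call per master."""
    return 1 - 1 / master_count


if __name__ == "__main__":
    per_layout = call_cost(Usage(input_tokens=3000, output_tokens=400))
    print(f"per layout: ${per_layout:.4f}")
    print(f"300 layouts/month: ${estimate_monthly(per_layout, 300):.2f}")
    print(f"call reduction with 41 masters: {call_reduction(41):.1%}")
```

With the 41 master images listed in the directory structure, replacing one call per master with a single call per layout removes about 97.6% of calls. That matches the cost reduction quoted in this README, although the README does not state how its figure was derived.
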
### Enable Cost Tracking

```bash
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking

# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```

### Cost Report Output

- **Session summary** - Total cost, tokens, API calls
- **Per-layout breakdown** - Cost for each layout
- **Operation analysis** - Cost by operation type
- **Monthly estimates** - Projected monthly/annual costs
- **JSON reports** - Detailed cost data in `results/`

See `COST_TRACKING_README.md` for complete documentation.

## Performance

### Hybrid Mode Benefits

- **97.6% cost reduction** vs OpenAI one-at-a-time mode
- **1 API call per layout** for panel analysis
- **Zero API calls** for matching (local analysis)
- **Parallel processing** for throughput
- **Memory-safe** with dynamic adjustment

### Benchmarks

- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
- **Memory usage**: Dynamic adjustment prevents exhaustion

## Advanced Features

### Parallel Layout Processing

Process multiple layouts concurrently with coordinated inlier analysis.

```bash
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
```

### CEN Refinement

Automatically switch between censored (CEN) and uncensored versions.

```bash
python cli.py --all --hybrid --cen-refinement
```

### Custom Splitting Parameters

Fine-tune panel splitting behavior.

```bash
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10

# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
```

### Image Preprocessing

Enhance detection accuracy with preprocessing.

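
To show what these two steps do to pixel values, here is a dependency-free sketch of greyscale conversion and contrast enhancement. It is an illustration under assumptions: the README does not show how `--enable-greyscale` and `--enable-contrast` are implemented, and the real code most likely uses OpenCV. The functions `to_greyscale` and `enhance_contrast`, the luma weights, and the choice of the mean as the contrast pivot are all choices made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_greyscale(pixels: List[Pixel]) -> List[int]:
    """Convert RGB pixels to luma values using the common 0.299/0.587/0.114 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def enhance_contrast(grey: List[int], factor: float = 1.5) -> List[int]:
    """Scale each value's distance from the mean by `factor`, clamped to 0..255.

    A factor of 1.0 leaves the values unchanged; larger factors push dark
    values darker and bright values brighter.
    """
    if not grey:
        return []
    mean = sum(grey) / len(grey)
    return [max(0, min(255, round(mean + factor * (v - mean)))) for v in grey]


if __name__ == "__main__":
    row = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
    grey = to_greyscale(row)
    print(grey)
    print(enhance_contrast(grey, factor=1.5))
```

The CLI exposes the corresponding switches, including `--contrast-factor`, as follows.
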
```bash
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale

# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
```

## Troubleshooting

### Common Issues

**"Cost tracking is disabled"**

- Add the `--enable-cost-tracking` flag to enable cost monitoring

**"Memory usage too high"**

- System will auto-adjust workers
- Reduce `--local-workers` or `--layout-workers` manually

**"Too many open files"**

- Reduce concurrent workers
- System will auto-recover and limit workers

**"No matches found"**

- Try different detection modes
- Adjust inlier thresholds
- Enable fallback mode

### Memory Management

The system includes automatic memory management:

- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Prevents system crashes
- Logs resource usage

### Logging

All processing is logged to both terminal and file:

- Log files: `master_adapt_detect_TIMESTAMP.log`
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time

## Development

### Running Tests

```bash
# Test hybrid mode
python test_hybrid.py

# Test cost tracking
python test_cost_calculator.py

# Test panel splitting
python test_split_mode.py
```

### Adding New Detection Modes

1. Create a new detector class inheriting from the base class
2. Implement the required methods:
   - `detect_images_in_layout()`
   - `process_all_layouts()`
3. Add CLI integration in `cli.py`
4. Update documentation

## OpenAI Pricing (2025)

- **Input tokens**: $2.00 per million
- **Cached input**: $0.50 per million
- **Output tokens**: $8.00 per million

Hybrid mode achieves significant cost savings by minimizing API calls.

## License

[License information]

## Credits

Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.