# Master Adapt Detect - Developer Documentation ## For AI Assistants and Developers This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project. ## Project Purpose Master Adapt Detect is a sophisticated image detection system designed to identify which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials but is generalizable to any multi-panel image detection task. ## Core Architecture ### System Design Philosophy The system follows a **multi-strategy detection** approach with these design principles: 1. **Cost Optimization** - Minimize API costs while maintaining accuracy 2. **Flexibility** - Support multiple detection engines for different use cases 3. **Performance** - Parallel processing with memory management 4. **Robustness** - Automatic fallbacks and error recovery ### Detection Modes The system provides 4 detection modes, each with specific use cases: #### 1. Hybrid Mode (Primary/Recommended) - **File**: `hybrid_detector.py` (2939 lines) - **Purpose**: Balance speed, cost, and accuracy - **Strategy**: OpenAI O3 for panel analysis + local CV for matching - **Cost**: ~1 API call per layout (97.6% reduction vs one-at-a-time) **How it works:** 1. Single OpenAI API call to count panels and detect censorship 2. Route based on panel count: - ≤ threshold: Direct local inlier analysis - > threshold: Split layout first, then inlier analysis on each panel 3. Post-process with deduplication, CEN refinement, truncation 4. Optional fallback to OpenAI one-at-a-time if insufficient matches **Key classes:** - `HybridImageDetector` - Main orchestrator - `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode - `ProgressTracker` - Thread-safe progress monitoring #### 2. OpenAI Mode - **File**: `openai_detector.py` - **Purpose**: Pure AI-powered detection - **Strategy**: GPT-4 vision for direct image comparison - **Cost**: 1-41 API calls per layout depending on mode **Modes:** - Standard: All masters in one API call - One-at-a-time: Separate API call per master (expensive but thorough) #### 3. Vector Mode - **File**: `vector_detector.py` - **Purpose**: Semantic similarity matching - **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions) - **Cost**: No OpenAI costs, uses Google Cloud **Features:** - Embedding caching for performance - Cosine similarity matching - Threshold-based filtering #### 4. Gemini Mode - **File**: `gemini_detector.py` - **Purpose**: Alternative AI detection - **Strategy**: Google Gemini 2.5 Pro visual reasoning - **Cost**: Google AI API (not OpenAI) ### Panel Splitting Strategies The system provides 3 panel splitting approaches for complex multi-panel layouts: #### 1. Traditional Multi-Method Splitter - **File**: `panel_splitter.py` (857 lines) - **Strategy**: Optimized Canny edge detection + Hough transform - **Tuning**: Specifically tuned for 14-panel detection - **Parameters**: Thresholds, kernel sizes, line detection params #### 2. Advanced Edge Detection Splitter - **File**: `advanced_splitter.py` (200+ lines) - **Strategy**: Sobel gradient analysis + gutter detection - **Parameters**: - `percentile`: Low-energy column threshold (default: 10) - `min_gap`: Minimum gutter width (default: 5) #### 3. Simple Even Division Splitter - **File**: `simple_splitter.py` (132 lines) - **Strategy**: Equal division based on panel count - **Use case**: Fast processing when layout is regular grid ### Supporting Systems #### Cost Calculator - **File**: `cost_calculator.py` (440 lines) - **Purpose**: Track OpenAI API usage and costs - **Features**: - Per-layout cost breakdown - Session summaries - Monthly estimation - JSON report generation - **Important**: Disabled by default, requires `--enable-cost-tracking` flag **Data structures:** - `TokenUsage` - Track token counts for single API call - `ApiCallCost` - Cost info for single API call - `LayoutCostSummary` - Aggregated cost for one layout - `CostCalculator` - Main tracking class #### Memory Manager - **File**: `memory_manager.py` (119 lines) - **Purpose**: Prevent system crashes from memory exhaustion - **Features**: - RAM and swap monitoring - Dynamic worker adjustment - Safe execution decorators - Feature count limiting **Thresholds:** - Max memory: 80% (configurable) - Max swap: 80% (warning only, doesn't throttle) #### Logging Configuration - **File**: `logging_config.py` (128 lines) - **Purpose**: Dual output (terminal + file) for debugging crashes - **Features**: - Timestamped log files - Exception tracking with resource usage - System diagnostics on startup ### Command-Line Interface - **File**: `cli.py` - **Purpose**: Unified interface for all detection modes - **Features**: - Argument parsing for all modes - Mode-specific configuration - Results aggregation - Cost reporting **Key command patterns:** ```bash # Detection mode selection --hybrid / --openai / --vector / --gemini # Processing scope --test / --limit N / --all / --specific-file FILE # Hybrid-specific --panel-threshold N --split-simple / --split-advanced --vector-mode --fallback-one-at-a-time --parallel-layouts # Cost tracking --enable-cost-tracking --cost-report --cost-estimate N ``` ## Key Algorithms ### Local Inlier Analysis (Hybrid Mode) **Algorithm**: OpenCV AKAZE features + RANSAC homography estimation **Process**: 1. Detect AKAZE keypoints in layout and master images 2. Match descriptors using brute-force matcher with Hamming distance 3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches 4. Estimate homography using RANSAC (threshold: 7.0) 5. Count inliers and calculate confidence **Thresholds:** - `min_good_matches`: 10 (minimum matches before RANSAC) - `inlier_threshold`: 0.65 (relative to best match) - `inlier_ratio_threshold`: 0.4 (minimum inlier ratio) **Confidence levels:** - High: ≥30 inliers, ≥50% ratio - Medium: ≥15 inliers, ≥30% ratio - Low: Below medium thresholds **Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing) ### Vector Similarity Analysis **Algorithm**: Cosine similarity on 1408-dimensional embeddings **Process**: 1. Generate embedding for layout using Vertex AI 2. Compare against cached master embeddings 3. Calculate cosine similarity for each master 4. Filter by threshold (default: 0.75) 5. Sort by similarity descending **Formula**: ``` similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2)) ``` **Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl` ### Panel Splitting (Canny Detection) **Algorithm**: Multi-threshold Canny + Hough line transform **Process**: 1. Apply Canny edge detection at multiple thresholds: - (50, 150), (100, 200), (150, 250) 2. Morphological closing with (3, 1) kernel 3. Combine edge maps with maximum operation 4. Hough line transform for horizontal lines: - Threshold: 1324 - Min length: 3530 - Max gap: 1059 5. Filter for nearly horizontal lines (< 5% slope) 6. Create panel bounds from separator positions **Tuning**: Parameters specifically optimized for 14-panel detection accuracy ### CEN Refinement **Algorithm**: Censorship-aware master image selection **Process**: 1. Detect if layout is censored (OpenAI analysis) 2. For each detected CEN (censored) master: - If layout is uncensored and non-CEN version exists: Switch to non-CEN - If layout is censored or no alternative: Keep CEN version 3. Update results with refinement metadata **Naming convention**: `*CEN*` in master ID indicates censored version ## Parallel Processing Architecture ### Serial Inlier Analysis Coordinator **Problem**: Parallel inlier analysis causes memory exhaustion and crashes **Solution**: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing **Architecture:** ``` Multiple Layout Workers (parallel) ↓ Task Queue ↓ Single Inlier Worker (serial) ↓ Results back to layout workers ``` **Components:** - `InlierAnalysisCoordinator` - Manages serial execution - Task queue - Queues inlier analysis requests - Worker thread - Processes tasks one at a time - Futures - Async communication between layout and inlier workers **Benefits:** - Prevents memory explosion from too many concurrent inlier analyses - Allows multiple layouts to be processed in parallel - Coordinates resource usage across system ### Dynamic Worker Adjustment **Monitoring:** - Memory usage (RAM percentage) - Swap usage (swap percentage) - Queue size (backlog of inlier tasks) - Open file descriptors **Adjustment triggers:** - Memory > 85%: Reduce workers - Swap > 95% AND Memory > 80%: Reduce workers - Queue size ≥ 3: Reduce layout workers (producers) - Memory < 75% AND Swap < 80%: Increase workers **Auto-scaling:** - Layout workers: Start at min(4, CPU/2), adjust dynamically - Local workers: Start at CPU-2, adjust dynamically - OpenAI workers: Set to number of master images ## Important Implementation Details ### Multiprocessing Considerations **Challenge**: Python multiprocessing requires pickleable functions **Solutions:** 1. `process_single_master_inlier_analysis()` - Standalone function (not class method) 2. All imports inside function to ensure worker processes have dependencies 3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors) **Memory safety:** - Feature limiting: Max 10,000-15,000 features per image - Dynamic worker reduction based on feature count - Forced garbage collection after processing ### Cost Tracking Integration **Important**: Cost tracking is DISABLED by default **Reason**: Avoid repetitive initialization messages from multiprocessing workers **Integration points:** - `openai_detector.py`: After every OpenAI API call - `hybrid_detector.py`: Track all OpenAI operations - Results JSON: Cost breakdown per layout **Data flow:** 1. API call made → Extract token usage from response 2. Call `cost_calculator.track_api_call()` 3. Update session totals 4. Generate reports on demand ### Error Handling Patterns **OpenAI API errors:** ```python try: response = openai_call() except Exception as e: # Automatic retry logic # Fallback to alternative method # Return error result dict ``` **Memory errors:** ```python try: result = memory_intensive_operation() except MemoryError: # Reduce worker count # Force garbage collection # Retry with lower concurrency ``` **File descriptor exhaustion:** ```python except OSError as e: if "Too many open files" in str(e): # Limit concurrent workers # Clean up temp files # Force resource release ``` ## File Organization ### Core Detection Files - `hybrid_detector.py` - Hybrid detection (2939 lines) - `openai_detector.py` - OpenAI detection - `vector_detector.py` - Vector similarity - `gemini_detector.py` - Gemini detection ### Panel Splitting Files - `panel_splitter.py` - Traditional multi-method - `advanced_splitter.py` - Edge detection - `simple_splitter.py` - Even division ### Supporting Files - `cost_calculator.py` - Cost tracking - `memory_manager.py` - Memory management - `logging_config.py` - Logging configuration - `cli.py` - Command-line interface ### Test Files - `test_hybrid.py` - Hybrid mode tests - `test_cost_calculator.py` - Cost tracking tests - `test_split_mode.py` - Panel splitting tests - `test_panel_accuracy.py` - Panel detection accuracy - Various tuning and debug scripts ### Data Directories - `master_images/` - 41 master images to detect - `layouts/` - 299+ layout images to process - `results/` - JSON output files - `embeddings_cache/` - Cached vector embeddings ## Development Guidelines ### Adding New Features 1. **New Detection Mode:** - Create new detector class - Inherit from base detector if applicable - Implement `detect_images_in_layout()` method - Add CLI integration in `cli.py` - Update tests and documentation 2. **New Panel Splitting Method:** - Create new splitter class - Implement `split_panels(image_path, panel_count)` method - Return list of dicts with keys: `image`, `bounds`, `confidence`, `method` - Add CLI flag for selection - Test with various panel counts 3. **Cost Tracking for New API:** - Add extraction function for token usage - Track calls with `cost_calculator.track_api_call()` - Update operation types - Add to cost reports ### Testing Strategy 1. **Unit tests** - Individual components 2. **Integration tests** - Full detection pipeline 3. **Performance tests** - Memory and speed benchmarks 4. **Accuracy tests** - Panel detection accuracy 5. **Cost tests** - Verify tracking accuracy ### Performance Optimization Tips 1. **Reduce API calls** - Primary cost driver 2. **Cache embeddings** - Avoid regenerating 3. **Limit features** - Prevent memory explosion 4. **Use multiprocessing** - Parallel CPU work 5. **Monitor memory** - Dynamic adjustment 6. **Profile bottlenecks** - Optimize hot paths ### Common Pitfalls 1. **Multiprocessing pickle errors** - Use standalone functions 2. **Memory exhaustion** - Limit concurrent workers 3. **File descriptor limits** - Close files properly 4. **Cost calculator in workers** - Keep in main process only 5. **Swap as error condition** - Swap usage is OK, not error ## Configuration Reference ### Environment Variables ``` OPENAI_API_KEY - OpenAI API authentication GOOGLE_API_KEY - Google AI API authentication GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON ``` ### Default Values ```python # Hybrid mode panel_threshold = 2 inlier_threshold = 0.65 inlier_ratio_threshold = 0.4 min_good_matches = 10 similarity_threshold = 0.75 # vector mode # Workers (auto-detected) openai_workers = len(master_images) local_workers = max(1, cpu_count - 2) layout_workers = min(4, cpu_count // 2) # Memory management max_memory_percent = 75 max_swap_percent = 80 # Cost tracking enable_tracking = False # Must explicitly enable ``` ## Output Format Specification ### Results JSON Structure ```json { "metadata": { "total_layouts_processed": int, "total_master_images": int, "master_images_available": [str], "provider": str, "model": str, "panel_threshold": int, "inlier_threshold": float, "processing_mode": str, "cost_tracking": {dict} | null }, "results": { "layout_id": { "layout_filename": str, "detected_master_ids": [str], "detected_master_filenames": [str], "detection_method": str, "panel_count": int, "confidence_score": float, "panel_analysis": {dict}, "censorship_analysis": {dict}, "truncation_applied": bool, "deduplication_applied": bool, "cost_breakdown": {dict} | null } } } ``` ## Debugging Tips ### Enable Debug Logging ```python # In code import logging logging.basicConfig(level=logging.DEBUG) # Via environment export LOG_LEVEL=DEBUG ``` ### Memory Issues ```bash # Check current memory python check_system_resources.py # Test with memory fix python test_memory_fix.py # Run with reduced workers python cli.py --all --hybrid --local-workers 1 --layout-workers 1 ``` ### Cost Tracking Issues ```bash # Verify cost tracking works python test_cost_calculator.py # Test integration python test_cost_tracking_integration.py # Run with tracking enabled python cli.py --test --hybrid --enable-cost-tracking ``` ### Panel Splitting Issues ```bash # Test splitting accuracy python test_panel_accuracy.py # Tune parameters python tune_14_panel_split.py # Debug specific layout python test_6786505_cli.py ``` ## API Costs (Current Pricing) ### OpenAI O3 (2025) - Input tokens: $2.00 / million - Cached input: $0.50 / million - Output tokens: $8.00 / million ### Typical Usage - Hybrid mode: ~$0.01-0.02 per layout - OpenAI mode: ~$0.02-0.05 per layout - One-at-a-time: ~$0.50-1.00 per layout ### Cost Optimization - Hybrid mode: 97.6% reduction vs one-at-a-time - Caching: Reduces input token costs - Batch processing: Amortizes overhead ## Future Enhancement Ideas 1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration 2. **Incremental processing** - Resume from saved progress 3. **Web interface** - Browser-based detection and visualization 4. **Active learning** - Use detection results to improve models 5. **Custom training** - Fine-tune models on domain-specific data 6. **Real-time processing** - Stream processing for live detection 7. **Distributed processing** - Multi-machine coordination 8. **Advanced caching** - Persistent result caching across runs ## Contact and Support For questions or issues: 1. Check logs in `master_adapt_detect_*.log` 2. Review cost reports in `results/` 3. Run diagnostic scripts 4. Check system resources 5. Review error messages carefully ## Version History Current implementation includes: - Multiple detection modes (Hybrid, OpenAI, Vector, Gemini) - Three panel splitting strategies - Cost tracking and reporting - Memory management and safety - Parallel processing with coordination - Dynamic worker adjustment - Comprehensive logging and debugging - Extensive configuration options Last major update: January 2025