Master Adapt Detect - Developer Documentation
For AI Assistants and Developers
This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.
Project Purpose
Master Adapt Detect is an image detection system that identifies which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials, but it generalizes to any multi-panel image detection task.
Core Architecture
System Design Philosophy
The system follows a multi-strategy detection approach with these design principles:
- Cost Optimization - Minimize API costs while maintaining accuracy
- Flexibility - Support multiple detection engines for different use cases
- Performance - Parallel processing with memory management
- Robustness - Automatic fallbacks and error recovery
Detection Modes
The system provides 4 detection modes, each with specific use cases:
1. Hybrid Mode (Primary/Recommended)
- File: hybrid_detector.py (2939 lines)
- Purpose: Balance speed, cost, and accuracy
- Strategy: OpenAI O3 for panel analysis + local CV for matching
- Cost: ~1 API call per layout (97.6% fewer calls than one-at-a-time mode, which makes one call per master image, i.e. 41)
How it works:
- Single OpenAI API call to count panels and detect censorship
- Route based on panel count:
- ≤ threshold: Direct local inlier analysis
- > threshold: Split layout first, then inlier analysis on each panel
- Post-process with deduplication, CEN refinement, truncation
- Optional fallback to OpenAI one-at-a-time if insufficient matches
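The routing step above can be sketched as a small pure function. This is a minimal illustration, not the code in hybrid_detector.py: the function name and the step labels are hypothetical, and the default threshold of 2 is taken from the Default Values section.

```python
from typing import List


def plan_hybrid_steps(panel_count: int, panel_threshold: int = 2) -> List[str]:
    """Return the ordered local steps for one layout.

    panel_count is assumed to come from the single OpenAI analysis call.
    The step names are hypothetical labels used only for this sketch.
    """
    if panel_count <= panel_threshold:
        # Few panels: match masters against the whole layout directly.
        steps = ["inlier_analysis_on_full_layout"]
    else:
        # Many panels: split first, then match each panel separately.
        steps = ["split_layout_into_panels", "inlier_analysis_per_panel"]
    # Deduplication, CEN refinement, and truncation run in both cases.
    steps.append("post_process")
    return steps
```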
Key classes:
- HybridImageDetector - Main orchestrator
- InlierAnalysisCoordinator - Serial execution coordinator for parallel mode
- ProgressTracker - Thread-safe progress monitoring
2. OpenAI Mode
- File: openai_detector.py
- Purpose: Pure AI-powered detection
- Strategy: GPT-4 vision for direct image comparison
- Cost: 1-41 API calls per layout depending on mode
Modes:
- Standard: All masters in one API call
- One-at-a-time: Separate API call per master (expensive but thorough)
3. Vector Mode
- File: vector_detector.py
- Purpose: Semantic similarity matching
- Strategy: Google Vertex AI multimodal embeddings (1408 dimensions)
- Cost: No OpenAI costs, uses Google Cloud
Features:
- Embedding caching for performance
- Cosine similarity matching
- Threshold-based filtering
4. Gemini Mode
- File: gemini_detector.py
- Purpose: Alternative AI detection
- Strategy: Google Gemini 2.5 Pro visual reasoning
- Cost: Google AI API (not OpenAI)
Panel Splitting Strategies
The system provides 3 panel splitting approaches for complex multi-panel layouts:
1. Traditional Multi-Method Splitter
- File: panel_splitter.py (857 lines)
- Strategy: Optimized Canny edge detection + Hough transform
- Tuning: Specifically tuned for 14-panel detection
- Parameters: Thresholds, kernel sizes, line detection params
2. Advanced Edge Detection Splitter
- File: advanced_splitter.py (200+ lines)
- Strategy: Sobel gradient analysis + gutter detection
- Parameters:
- percentile: Low-energy column threshold (default: 10)
- min_gap: Minimum gutter width (default: 5)
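To show how the two parameters could interact, here is a minimal sketch of gutter detection on a one-dimensional energy profile. It assumes the per-column gradient energy (for example, summed Sobel magnitude) has already been computed; the function name, the nearest-rank percentile, and the run-grouping logic are illustrative assumptions, not the contents of advanced_splitter.py.

```python
from typing import List, Tuple


def find_gutters(
    column_energy: List[float], percentile: float = 10, min_gap: int = 5
) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that look like gutters.

    A column counts as low-energy when its value is at or below the given
    percentile of all column energies. Runs shorter than min_gap are ignored.
    """
    if not column_energy:
        return []
    ordered = sorted(column_energy)
    # Nearest-rank percentile (an assumption; other definitions exist).
    rank = max(0, min(len(ordered) - 1, int(len(ordered) * percentile / 100)))
    cutoff = ordered[rank]

    gutters: List[Tuple[int, int]] = []
    start = None
    for x, energy in enumerate(column_energy):
        if energy <= cutoff:
            if start is None:
                start = x  # a low-energy run begins here
        else:
            if start is not None and x - start >= min_gap:
                gutters.append((start, x))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(column_energy) - start >= min_gap:
        gutters.append((start, len(column_energy)))
    return gutters
```

Raising min_gap rejects thin low-energy streaks inside a panel; raising percentile makes more columns eligible as gutter candidates.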
3. Simple Even Division Splitter
- File: simple_splitter.py (132 lines)
- Strategy: Equal division based on panel count
- Use case: Fast processing when layout is regular grid
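Equal division reduces to integer arithmetic on the image height. The sketch below assumes panels are stacked vertically (consistent with the horizontal separators the Canny splitter looks for); the function name is hypothetical and simple_splitter.py may differ.

```python
from typing import List, Tuple


def even_panel_bounds(height: int, panel_count: int) -> List[Tuple[int, int]]:
    """Split the row range [0, height) into panel_count near-equal bands.

    Returns (top, bottom) pairs with bottom exclusive. Integer division
    keeps the bands contiguous and makes them cover the full height.
    """
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    edges = [i * height // panel_count for i in range(panel_count + 1)]
    return list(zip(edges[:-1], edges[1:]))
```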
Supporting Systems
Cost Calculator
- File: cost_calculator.py (440 lines)
- Purpose: Track OpenAI API usage and costs
- Features:
- Per-layout cost breakdown
- Session summaries
- Monthly estimation
- JSON report generation
- Important: Disabled by default, requires the --enable-cost-tracking flag
Data structures:
- TokenUsage - Track token counts for single API call
- ApiCallCost - Cost info for single API call
- LayoutCostSummary - Aggregated cost for one layout
- CostCalculator - Main tracking class
Memory Manager
- File: memory_manager.py (119 lines)
- Purpose: Prevent system crashes from memory exhaustion
- Features:
- RAM and swap monitoring
- Dynamic worker adjustment
- Safe execution decorators
- Feature count limiting
Thresholds:
- Max memory: 80% (configurable)
- Max swap: 80% (warning only, doesn't throttle)
Logging Configuration
- File: logging_config.py (128 lines)
- Purpose: Dual output (terminal + file) for debugging crashes
- Features:
- Timestamped log files
- Exception tracking with resource usage
- System diagnostics on startup
Command-Line Interface
- File: cli.py
- Purpose: Unified interface for all detection modes
- Features:
- Argument parsing for all modes
- Mode-specific configuration
- Results aggregation
- Cost reporting
Key command patterns:
# Detection mode selection
--hybrid / --openai / --vector / --gemini
# Processing scope
--test / --limit N / --all / --specific-file FILE
# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts
# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
Key Algorithms
Local Inlier Analysis (Hybrid Mode)
Algorithm: OpenCV AKAZE features + RANSAC homography estimation
Process:
- Detect AKAZE keypoints in layout and master images
- Match descriptors using brute-force matcher with Hamming distance
- Apply Lowe's ratio test (threshold: 0.80) to filter good matches
- Estimate homography using RANSAC (threshold: 7.0)
- Count inliers and calculate confidence
Thresholds:
- min_good_matches: 10 (minimum matches before RANSAC)
- inlier_threshold: 0.65 (relative to best match)
- inlier_ratio_threshold: 0.4 (minimum inlier ratio)
Confidence levels:
- High: ≥30 inliers, ≥50% ratio
- Medium: ≥15 inliers, ≥30% ratio
- Low: Below medium thresholds
Implementation: process_single_master_inlier_analysis() function (standalone for multiprocessing)
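The ratio test and the confidence levels can be illustrated without OpenCV. The sketch below makes two assumptions that the text does not state: the inlier ratio is inliers divided by good matches, and both conditions of a level must hold. The function names are hypothetical.

```python
from typing import List, Tuple


def lowe_ratio_filter(
    distance_pairs: List[Tuple[float, float]], ratio: float = 0.80
) -> List[int]:
    """Return indices of matches that pass Lowe's ratio test.

    Each pair holds the descriptor distances to the best and second-best
    candidate. A match is kept when the best is clearly closer.
    """
    return [
        i for i, (best, second) in enumerate(distance_pairs)
        if best < ratio * second
    ]


def classify_confidence(inliers: int, good_matches: int) -> str:
    """Map RANSAC inlier statistics to 'high', 'medium', or 'low'."""
    if good_matches <= 0:
        return "low"
    inlier_ratio = inliers / good_matches  # assumed definition of the ratio
    if inliers >= 30 and inlier_ratio >= 0.50:
        return "high"
    if inliers >= 15 and inlier_ratio >= 0.30:
        return "medium"
    return "low"
```

Under these assumptions, a master with many inliers but a poor ratio (for example 35 inliers out of 200 good matches) still lands in the low bucket.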
Vector Similarity Analysis
Algorithm: Cosine similarity on 1408-dimensional embeddings
Process:
- Generate embedding for layout using Vertex AI
- Compare against cached master embeddings
- Calculate cosine similarity for each master
- Filter by threshold (default: 0.75)
- Sort by similarity descending
Formula:
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
Caching: Embeddings stored in embeddings_cache/master_embeddings.pkl
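The process and formula above translate directly into code. The sketch below uses only the standard library so it runs anywhere; for 1408-dimensional vectors NumPy would be the usual choice. The function names and the dict-based cache shape are assumptions for illustration, not the vector_detector.py API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """similarity = dot(a, b) / (norm(a) * norm(b)); 0.0 if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_masters(
    layout_embedding: Sequence[float],
    master_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Score every cached master, drop those below threshold, sort descending."""
    scored = [
        (master_id, cosine_similarity(layout_embedding, embedding))
        for master_id, embedding in master_embeddings.items()
    ]
    kept = [(master_id, score) for master_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```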
Panel Splitting (Canny Detection)
Algorithm: Multi-threshold Canny + Hough line transform
Process:
- Apply Canny edge detection at multiple thresholds:
- (50, 150), (100, 200), (150, 250)
- Morphological closing with (3, 1) kernel
- Combine edge maps with maximum operation
- Hough line transform for horizontal lines:
- Threshold: 1324
- Min length: 3530
- Max gap: 1059
- Filter for nearly horizontal lines (< 5% slope)
- Create panel bounds from separator positions
Tuning: Parameters specifically optimized for 14-panel detection accuracy
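The last two steps (slope filtering and building panel bounds) can be sketched independently of OpenCV. The code assumes line segments arrive as (x1, y1, x2, y2) tuples, the shape a probabilistic Hough transform typically yields. The function name and the merge_distance parameter are hypothetical additions for illustration.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def panel_bounds_from_segments(
    segments: List[Segment],
    image_height: int,
    max_slope: float = 0.05,
    merge_distance: int = 10,
) -> List[Tuple[int, int]]:
    """Turn detected line segments into (top, bottom) panel bounds."""
    ys: List[int] = []
    for x1, y1, x2, y2 in segments:
        dx = abs(x2 - x1)
        if dx == 0:
            continue  # vertical segment, cannot be a horizontal separator
        if abs(y2 - y1) / dx < max_slope:  # keep lines with < 5% slope
            ys.append((y1 + y2) // 2)
    ys.sort()

    # Collapse detections that belong to the same gutter (assumed behavior).
    separators: List[int] = []
    for y in ys:
        if separators and y - separators[-1] <= merge_distance:
            continue
        separators.append(y)

    inner = [y for y in separators if 0 < y < image_height]
    edges = [0] + inner + [image_height]
    return list(zip(edges[:-1], edges[1:]))
```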
CEN Refinement
Algorithm: Censorship-aware master image selection
Process:
- Detect if layout is censored (OpenAI analysis)
- For each detected CEN (censored) master:
- If layout is uncensored and non-CEN version exists: Switch to non-CEN
- If layout is censored or no alternative: Keep CEN version
- Update results with refinement metadata
Naming convention: *CEN* in master ID indicates censored version
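A minimal sketch of the refinement rule follows. It assumes the uncensored counterpart of a master is found by removing a "_CEN" marker from the ID; that naming scheme, the function name, and the "cen_refinement" metadata key are all hypothetical and should be adapted to the real master IDs.

```python
from typing import Dict, Iterable, List


def refine_cen(
    detected_ids: List[str],
    available_ids: Iterable[str],
    layout_is_censored: bool,
) -> Dict[str, object]:
    """Swap CEN masters for their non-CEN versions on uncensored layouts."""
    available = set(available_ids)
    refined: List[str] = []
    switched: Dict[str, str] = {}
    for master_id in detected_ids:
        replacement = master_id
        if "CEN" in master_id and not layout_is_censored:
            # Hypothetical mapping from censored to uncensored ID.
            candidate = master_id.replace("_CEN", "")
            if candidate != master_id and candidate in available:
                replacement = candidate
                switched[master_id] = candidate
        # Avoid listing the same master twice after a switch.
        if replacement not in refined:
            refined.append(replacement)
    return {"detected_master_ids": refined, "cen_refinement": switched}
```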
Parallel Processing Architecture
Serial Inlier Analysis Coordinator
Problem: Parallel inlier analysis causes memory exhaustion and crashes
Solution: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing
Architecture:
Multiple Layout Workers (parallel)
↓
Task Queue
↓
Single Inlier Worker (serial)
↓
Results back to layout workers
Components:
- InlierAnalysisCoordinator - Manages serial execution
- Task queue - Queues inlier analysis requests
- Worker thread - Processes tasks one at a time
- Futures - Async communication between layout and inlier workers
Benefits:
- Prevents memory explosion from too many concurrent inlier analyses
- Allows multiple layouts to be processed in parallel
- Coordinates resource usage across system
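The queue, worker thread, and futures described above fit together roughly as follows. This is a simplified sketch of the pattern, not the actual InlierAnalysisCoordinator: the class name SerialCoordinator and its methods are hypothetical.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SerialCoordinator:
    """Runs submitted tasks one at a time on a single worker thread."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        """Queue a task and return a Future the caller can wait on."""
        future: Future = Future()
        self._tasks.put((future, fn, args))
        return future

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            future, fn, args = item
            if not future.set_running_or_notify_cancel():
                continue  # the caller cancelled before the task started
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                # Deliver the failure to the waiting layout worker.
                future.set_exception(exc)

    def shutdown(self) -> None:
        """Stop the worker after all queued tasks have finished."""
        self._tasks.put(None)
        self._worker.join()
```

Layout workers running in parallel would each call submit() and block on future.result(), so only one memory-heavy analysis runs at any moment while layout-level work still overlaps.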
Dynamic Worker Adjustment
Monitoring:
- Memory usage (RAM percentage)
- Swap usage (swap percentage)
- Queue size (backlog of inlier tasks)
- Open file descriptors
Adjustment triggers:
- Memory > 85%: Reduce workers
- Swap > 95% AND Memory > 80%: Reduce workers
- Queue size ≥ 3: Reduce layout workers (producers)
- Memory < 75% AND Swap < 80%: Increase workers
Auto-scaling:
- Layout workers: Start at min(4, CPU/2), adjust dynamically
- Local workers: Start at CPU-2, adjust dynamically
- OpenAI workers: Set to number of master images
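The adjustment triggers can be expressed as two small decision functions. The thresholds come from the list above; the step size of one worker, the min/max bounds, and the function names are assumptions made for this sketch.

```python
def adjust_workers(
    current: int,
    memory_percent: float,
    swap_percent: float,
    min_workers: int = 1,
    max_workers: int = 8,
) -> int:
    """Return the next worker count given RAM and swap usage percentages."""
    under_pressure = memory_percent > 85 or (
        swap_percent > 95 and memory_percent > 80
    )
    if under_pressure:
        return max(min_workers, current - 1)
    if memory_percent < 75 and swap_percent < 80:
        return min(max_workers, current + 1)
    return current  # between thresholds: hold steady


def adjust_layout_workers(current: int, queue_size: int, min_workers: int = 1) -> int:
    """Throttle producers when the inlier task queue backs up."""
    if queue_size >= 3:
        return max(min_workers, current - 1)
    return current
```

The gap between the reduce and increase thresholds acts as a hysteresis band, which keeps the worker count from oscillating when memory usage hovers near a single cutoff.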
Important Implementation Details
Multiprocessing Considerations
Challenge: Python multiprocessing requires pickleable functions
Solutions:
- process_single_master_inlier_analysis() - Standalone function (not class method)
- All imports inside function to ensure worker processes have dependencies
- Cost calculator NOT imported in multiprocessing functions (causes pickle errors)
Memory safety:
- Feature limiting: Max 10,000-15,000 features per image
- Dynamic worker reduction based on feature count
- Forced garbage collection after processing
Cost Tracking Integration
Important: Cost tracking is DISABLED by default
Reason: Avoid repetitive initialization messages from multiprocessing workers
Integration points:
- openai_detector.py: After every OpenAI API call
- hybrid_detector.py: Track all OpenAI operations
- Results JSON: Cost breakdown per layout
Data flow:
- API call made → Extract token usage from response
- Call cost_calculator.track_api_call()
- Update session totals
- Generate reports on demand
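The data flow can be sketched with two dataclasses. Only the names TokenUsage and track_api_call appear in this document; the field names, the CostTracker class, and the assumption that input_tokens excludes cached tokens are illustrative and may not match cost_calculator.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TokenUsage:
    """Token counts extracted from one API response (field names assumed)."""
    input_tokens: int = 0          # uncached input tokens only (assumption)
    cached_input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class CostTracker:
    """Accumulates per-call costs; prices are USD per million tokens."""
    input_price: float
    cached_input_price: float
    output_price: float
    call_costs: List[float] = field(default_factory=list)

    def track_api_call(self, usage: TokenUsage) -> float:
        """Price one call and add it to the session totals."""
        cost = (
            usage.input_tokens * self.input_price
            + usage.cached_input_tokens * self.cached_input_price
            + usage.output_tokens * self.output_price
        ) / 1_000_000
        self.call_costs.append(cost)
        return cost

    def session_summary(self) -> Dict[str, float]:
        """Report generated on demand from the accumulated calls."""
        return {
            "api_calls": len(self.call_costs),
            "total_cost_usd": sum(self.call_costs),
        }
```

Keeping the tracker a plain object owned by the main process matches the guidance elsewhere in this document to keep cost tracking out of multiprocessing workers.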
Error Handling Patterns
OpenAI API errors:
try:
    response = openai_call()
except Exception as e:
    # Automatic retry logic
    # Fallback to alternative method
    # Return error result dict
Memory errors:
try:
    result = memory_intensive_operation()
except MemoryError:
    # Reduce worker count
    # Force garbage collection
    # Retry with lower concurrency
File descriptor exhaustion:
try:
    ...  # operation that opens many files
except OSError as e:
    if "Too many open files" in str(e):
        # Limit concurrent workers
        # Clean up temp files
        # Force resource release
File Organization
Core Detection Files
- hybrid_detector.py - Hybrid detection (2939 lines)
- openai_detector.py - OpenAI detection
- vector_detector.py - Vector similarity
- gemini_detector.py - Gemini detection
Panel Splitting Files
- panel_splitter.py - Traditional multi-method
- advanced_splitter.py - Edge detection
- simple_splitter.py - Even division
Supporting Files
- cost_calculator.py - Cost tracking
- memory_manager.py - Memory management
- logging_config.py - Logging configuration
- cli.py - Command-line interface
Test Files
- test_hybrid.py - Hybrid mode tests
- test_cost_calculator.py - Cost tracking tests
- test_split_mode.py - Panel splitting tests
- test_panel_accuracy.py - Panel detection accuracy
- Various tuning and debug scripts
Data Directories
- master_images/ - 41 master images to detect
- layouts/ - 299+ layout images to process
- results/ - JSON output files
- embeddings_cache/ - Cached vector embeddings
Development Guidelines
Adding New Features
1. New Detection Mode:
- Create new detector class
- Inherit from base detector if applicable
- Implement detect_images_in_layout() method
- Add CLI integration in cli.py
- Update tests and documentation
2. New Panel Splitting Method:
- Create new splitter class
- Implement split_panels(image_path, panel_count) method
- Return list of dicts with keys: image, bounds, confidence, method
- Add CLI flag for selection
- Test with various panel counts
3. Cost Tracking for New API:
- Add extraction function for token usage
- Track calls with cost_calculator.track_api_call()
- Update operation types
- Add to cost reports
Testing Strategy
- Unit tests - Individual components
- Integration tests - Full detection pipeline
- Performance tests - Memory and speed benchmarks
- Accuracy tests - Panel detection accuracy
- Cost tests - Verify tracking accuracy
Performance Optimization Tips
- Reduce API calls - Primary cost driver
- Cache embeddings - Avoid regenerating
- Limit features - Prevent memory explosion
- Use multiprocessing - Parallel CPU work
- Monitor memory - Dynamic adjustment
- Profile bottlenecks - Optimize hot paths
Common Pitfalls
- Multiprocessing pickle errors - Use standalone functions
- Memory exhaustion - Limit concurrent workers
- File descriptor limits - Close files properly
- Cost calculator in workers - Keep in main process only
- Treating swap as an error condition - Swap usage on its own is expected and is not an error
Configuration Reference
Environment Variables
OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
Default Values
# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75 # vector mode
# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)
# Memory management
max_memory_percent = 75
max_swap_percent = 80
# Cost tracking
enable_tracking = False # Must explicitly enable
Output Format Specification
Results JSON Structure
{
  "metadata": {
    "total_layouts_processed": int,
    "total_master_images": int,
    "master_images_available": [str],
    "provider": str,
    "model": str,
    "panel_threshold": int,
    "inlier_threshold": float,
    "processing_mode": str,
    "cost_tracking": {dict} | null
  },
  "results": {
    "layout_id": {
      "layout_filename": str,
      "detected_master_ids": [str],
      "detected_master_filenames": [str],
      "detection_method": str,
      "panel_count": int,
      "confidence_score": float,
      "panel_analysis": {dict},
      "censorship_analysis": {dict},
      "truncation_applied": bool,
      "deduplication_applied": bool,
      "cost_breakdown": {dict} | null
    }
  }
}
Debugging Tips
Enable Debug Logging
# In code
import logging
logging.basicConfig(level=logging.DEBUG)
# Via environment
export LOG_LEVEL=DEBUG
Memory Issues
# Check current memory
python check_system_resources.py
# Test with memory fix
python test_memory_fix.py
# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
Cost Tracking Issues
# Verify cost tracking works
python test_cost_calculator.py
# Test integration
python test_cost_tracking_integration.py
# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
Panel Splitting Issues
# Test splitting accuracy
python test_panel_accuracy.py
# Tune parameters
python tune_14_panel_split.py
# Debug specific layout
python test_6786505_cli.py
API Costs (Current Pricing)
OpenAI O3 (2025)
- Input tokens: $2.00 / million
- Cached input: $0.50 / million
- Output tokens: $8.00 / million
Typical Usage
- Hybrid mode: ~$0.01-0.02 per layout
- OpenAI mode: ~$0.02-0.05 per layout
- One-at-a-time: ~$0.50-1.00 per layout
Cost Optimization
- Hybrid mode: 97.6% reduction vs one-at-a-time
- Caching: Reduces input token costs
- Batch processing: Amortizes overhead
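The 97.6% figure follows from the call counts: one-at-a-time mode makes one call per master image (41 in the current master set), while hybrid mode makes roughly one call per layout. A short check, with a hypothetical helper name:

```python
def call_reduction_percent(calls_before: int, calls_after: int) -> float:
    """Percentage of API calls saved when going from calls_before to calls_after."""
    if calls_before <= 0:
        raise ValueError("calls_before must be positive")
    return (1 - calls_after / calls_before) * 100


# 41 calls per layout (one per master) versus 1 call per layout in hybrid mode.
hybrid_saving = round(call_reduction_percent(41, 1), 1)
```

This measures the reduction in call count, not in dollars; the dollar saving also depends on how many tokens each call consumes.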
Future Enhancement Ideas
- Multi-GPU support - Parallel inlier analysis with GPU acceleration
- Incremental processing - Resume from saved progress
- Web interface - Browser-based detection and visualization
- Active learning - Use detection results to improve models
- Custom training - Fine-tune models on domain-specific data
- Real-time processing - Stream processing for live detection
- Distributed processing - Multi-machine coordination
- Advanced caching - Persistent result caching across runs
Contact and Support
For questions or issues:
- Check logs in master_adapt_detect_*.log
- Review cost reports in results/
- Run diagnostic scripts
- Check system resources
- Review error messages carefully
Version History
Current implementation includes:
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
- Three panel splitting strategies
- Cost tracking and reporting
- Memory management and safety
- Parallel processing with coordination
- Dynamic worker adjustment
- Comprehensive logging and debugging
- Extensive configuration options
Last major update: January 2025