594 lines
17 KiB
Markdown
594 lines
17 KiB
Markdown
# Master Adapt Detect - Developer Documentation
|
|
|
|
## For AI Assistants and Developers
|
|
|
|
This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.
|
|
|
|
## Project Purpose
|
|
|
|
Master Adapt Detect is a sophisticated image detection system designed to identify which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials but is generalizable to any multi-panel image detection task.
|
|
|
|
## Core Architecture
|
|
|
|
### System Design Philosophy
|
|
|
|
The system follows a **multi-strategy detection** approach with these design principles:
|
|
|
|
1. **Cost Optimization** - Minimize API costs while maintaining accuracy
|
|
2. **Flexibility** - Support multiple detection engines for different use cases
|
|
3. **Performance** - Parallel processing with memory management
|
|
4. **Robustness** - Automatic fallbacks and error recovery
|
|
|
|
### Detection Modes
|
|
|
|
The system provides 4 detection modes, each with specific use cases:
|
|
|
|
#### 1. Hybrid Mode (Primary/Recommended)
|
|
- **File**: `hybrid_detector.py` (2939 lines)
|
|
- **Purpose**: Balance speed, cost, and accuracy
|
|
- **Strategy**: OpenAI O3 for panel analysis + local CV for matching
|
|
- **Cost**: ~1 API call per layout (97.6% reduction vs one-at-a-time)
|
|
|
|
**How it works:**
|
|
1. Single OpenAI API call to count panels and detect censorship
|
|
2. Route based on panel count:
|
|
- ≤ threshold: Direct local inlier analysis
|
|
- > threshold: Split layout first, then inlier analysis on each panel
|
|
3. Post-process with deduplication, CEN refinement, truncation
|
|
4. Optional fallback to OpenAI one-at-a-time if insufficient matches
|
|
|
|
**Key classes:**
|
|
- `HybridImageDetector` - Main orchestrator
|
|
- `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode
|
|
- `ProgressTracker` - Thread-safe progress monitoring
|
|
|
|
#### 2. OpenAI Mode
|
|
- **File**: `openai_detector.py`
|
|
- **Purpose**: Pure AI-powered detection
|
|
- **Strategy**: GPT-4 vision for direct image comparison
|
|
- **Cost**: 1-41 API calls per layout depending on mode
|
|
|
|
**Modes:**
|
|
- Standard: All masters in one API call
|
|
- One-at-a-time: Separate API call per master (expensive but thorough)
|
|
|
|
#### 3. Vector Mode
|
|
- **File**: `vector_detector.py`
|
|
- **Purpose**: Semantic similarity matching
|
|
- **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions)
|
|
- **Cost**: No OpenAI costs, uses Google Cloud
|
|
|
|
**Features:**
|
|
- Embedding caching for performance
|
|
- Cosine similarity matching
|
|
- Threshold-based filtering
|
|
|
|
#### 4. Gemini Mode
|
|
- **File**: `gemini_detector.py`
|
|
- **Purpose**: Alternative AI detection
|
|
- **Strategy**: Google Gemini 2.5 Pro visual reasoning
|
|
- **Cost**: Google AI API (not OpenAI)
|
|
|
|
### Panel Splitting Strategies
|
|
|
|
The system provides 3 panel splitting approaches for complex multi-panel layouts:
|
|
|
|
#### 1. Traditional Multi-Method Splitter
|
|
- **File**: `panel_splitter.py` (857 lines)
|
|
- **Strategy**: Optimized Canny edge detection + Hough transform
|
|
- **Tuning**: Specifically tuned for 14-panel detection
|
|
- **Parameters**: Thresholds, kernel sizes, line detection params
|
|
|
|
#### 2. Advanced Edge Detection Splitter
|
|
- **File**: `advanced_splitter.py` (200+ lines)
|
|
- **Strategy**: Sobel gradient analysis + gutter detection
|
|
- **Parameters**:
|
|
- `percentile`: Low-energy column threshold (default: 10)
|
|
- `min_gap`: Minimum gutter width (default: 5)
|
|
|
|
#### 3. Simple Even Division Splitter
|
|
- **File**: `simple_splitter.py` (132 lines)
|
|
- **Strategy**: Equal division based on panel count
|
|
- **Use case**: Fast processing when layout is regular grid
|
|
|
|
### Supporting Systems
|
|
|
|
#### Cost Calculator
|
|
- **File**: `cost_calculator.py` (440 lines)
|
|
- **Purpose**: Track OpenAI API usage and costs
|
|
- **Features**:
|
|
- Per-layout cost breakdown
|
|
- Session summaries
|
|
- Monthly estimation
|
|
- JSON report generation
|
|
- **Important**: Disabled by default, requires `--enable-cost-tracking` flag
|
|
|
|
**Data structures:**
|
|
- `TokenUsage` - Track token counts for single API call
|
|
- `ApiCallCost` - Cost info for single API call
|
|
- `LayoutCostSummary` - Aggregated cost for one layout
|
|
- `CostCalculator` - Main tracking class
|
|
|
|
#### Memory Manager
|
|
- **File**: `memory_manager.py` (119 lines)
|
|
- **Purpose**: Prevent system crashes from memory exhaustion
|
|
- **Features**:
|
|
- RAM and swap monitoring
|
|
- Dynamic worker adjustment
|
|
- Safe execution decorators
|
|
- Feature count limiting
|
|
|
|
**Thresholds:**
|
|
- Max memory: 80% (configurable)
|
|
- Max swap: 80% (warning only, doesn't throttle)
|
|
|
|
#### Logging Configuration
|
|
- **File**: `logging_config.py` (128 lines)
|
|
- **Purpose**: Dual output (terminal + file) for debugging crashes
|
|
- **Features**:
|
|
- Timestamped log files
|
|
- Exception tracking with resource usage
|
|
- System diagnostics on startup
|
|
|
|
### Command-Line Interface
|
|
|
|
- **File**: `cli.py`
|
|
- **Purpose**: Unified interface for all detection modes
|
|
- **Features**:
|
|
- Argument parsing for all modes
|
|
- Mode-specific configuration
|
|
- Results aggregation
|
|
- Cost reporting
|
|
|
|
**Key command patterns:**
|
|
```bash
|
|
# Detection mode selection
|
|
--hybrid / --openai / --vector / --gemini
|
|
|
|
# Processing scope
|
|
--test / --limit N / --all / --specific-file FILE
|
|
|
|
# Hybrid-specific
|
|
--panel-threshold N
|
|
--split-simple / --split-advanced
|
|
--vector-mode
|
|
--fallback-one-at-a-time
|
|
--parallel-layouts
|
|
|
|
# Cost tracking
|
|
--enable-cost-tracking
|
|
--cost-report
|
|
--cost-estimate N
|
|
```
|
|
|
|
## Key Algorithms
|
|
|
|
### Local Inlier Analysis (Hybrid Mode)
|
|
|
|
**Algorithm**: OpenCV AKAZE features + RANSAC homography estimation
|
|
|
|
**Process**:
|
|
1. Detect AKAZE keypoints in layout and master images
|
|
2. Match descriptors using brute-force matcher with Hamming distance
|
|
3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
|
|
4. Estimate homography using RANSAC (threshold: 7.0)
|
|
5. Count inliers and calculate confidence
|
|
|
|
**Thresholds:**
|
|
- `min_good_matches`: 10 (minimum matches before RANSAC)
|
|
- `inlier_threshold`: 0.65 (relative to best match)
|
|
- `inlier_ratio_threshold`: 0.4 (minimum inlier ratio)
|
|
|
|
**Confidence levels:**
|
|
- High: ≥30 inliers, ≥50% ratio
|
|
- Medium: ≥15 inliers, ≥30% ratio
|
|
- Low: Below medium thresholds
|
|
|
|
**Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing)
|
|
|
|
### Vector Similarity Analysis
|
|
|
|
**Algorithm**: Cosine similarity on 1408-dimensional embeddings
|
|
|
|
**Process**:
|
|
1. Generate embedding for layout using Vertex AI
|
|
2. Compare against cached master embeddings
|
|
3. Calculate cosine similarity for each master
|
|
4. Filter by threshold (default: 0.75)
|
|
5. Sort by similarity descending
|
|
|
|
**Formula**:
|
|
```
|
|
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
|
|
```
|
|
|
|
**Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl`
|
|
|
|
### Panel Splitting (Canny Detection)
|
|
|
|
**Algorithm**: Multi-threshold Canny + Hough line transform
|
|
|
|
**Process**:
|
|
1. Apply Canny edge detection at multiple thresholds:
|
|
- (50, 150), (100, 200), (150, 250)
|
|
2. Morphological closing with (3, 1) kernel
|
|
3. Combine edge maps with maximum operation
|
|
4. Hough line transform for horizontal lines:
|
|
- Threshold: 1324
|
|
- Min length: 3530
|
|
- Max gap: 1059
|
|
5. Filter for nearly horizontal lines (< 5% slope)
|
|
6. Create panel bounds from separator positions
|
|
|
|
**Tuning**: Parameters specifically optimized for 14-panel detection accuracy
|
|
|
|
### CEN Refinement
|
|
|
|
**Algorithm**: Censorship-aware master image selection
|
|
|
|
**Process**:
|
|
1. Detect if layout is censored (OpenAI analysis)
|
|
2. For each detected CEN (censored) master:
|
|
- If layout is uncensored and non-CEN version exists: Switch to non-CEN
|
|
- If layout is censored or no alternative: Keep CEN version
|
|
3. Update results with refinement metadata
|
|
|
|
**Naming convention**: `*CEN*` in master ID indicates censored version
|
|
|
|
## Parallel Processing Architecture
|
|
|
|
### Serial Inlier Analysis Coordinator
|
|
|
|
**Problem**: Parallel inlier analysis causes memory exhaustion and crashes
|
|
|
|
**Solution**: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing
|
|
|
|
**Architecture:**
|
|
```
|
|
Multiple Layout Workers (parallel)
|
|
↓
|
|
Task Queue
|
|
↓
|
|
Single Inlier Worker (serial)
|
|
↓
|
|
Results back to layout workers
|
|
```
|
|
|
|
**Components:**
|
|
- `InlierAnalysisCoordinator` - Manages serial execution
|
|
- Task queue - Queues inlier analysis requests
|
|
- Worker thread - Processes tasks one at a time
|
|
- Futures - Async communication between layout and inlier workers
|
|
|
|
**Benefits:**
|
|
- Prevents memory explosion from too many concurrent inlier analyses
|
|
- Allows multiple layouts to be processed in parallel
|
|
- Coordinates resource usage across system
|
|
|
|
### Dynamic Worker Adjustment
|
|
|
|
**Monitoring:**
|
|
- Memory usage (RAM percentage)
|
|
- Swap usage (swap percentage)
|
|
- Queue size (backlog of inlier tasks)
|
|
- Open file descriptors
|
|
|
|
**Adjustment triggers:**
|
|
- Memory > 85%: Reduce workers
|
|
- Swap > 95% AND Memory > 80%: Reduce workers
|
|
- Queue size ≥ 3: Reduce layout workers (producers)
|
|
- Memory < 75% AND Swap < 80%: Increase workers
|
|
|
|
**Auto-scaling:**
|
|
- Layout workers: Start at min(4, CPU/2), adjust dynamically
|
|
- Local workers: Start at CPU-2, adjust dynamically
|
|
- OpenAI workers: Set to number of master images
|
|
|
|
## Important Implementation Details
|
|
|
|
### Multiprocessing Considerations
|
|
|
|
**Challenge**: Python multiprocessing requires pickleable functions
|
|
|
|
**Solutions:**
|
|
1. `process_single_master_inlier_analysis()` - Standalone function (not class method)
|
|
2. All imports inside function to ensure worker processes have dependencies
|
|
3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors)
|
|
|
|
**Memory safety:**
|
|
- Feature limiting: Max 10,000-15,000 features per image
|
|
- Dynamic worker reduction based on feature count
|
|
- Forced garbage collection after processing
|
|
|
|
### Cost Tracking Integration
|
|
|
|
**Important**: Cost tracking is DISABLED by default
|
|
|
|
**Reason**: Avoid repetitive initialization messages from multiprocessing workers
|
|
|
|
**Integration points:**
|
|
- `openai_detector.py`: After every OpenAI API call
|
|
- `hybrid_detector.py`: Track all OpenAI operations
|
|
- Results JSON: Cost breakdown per layout
|
|
|
|
**Data flow:**
|
|
1. API call made → Extract token usage from response
|
|
2. Call `cost_calculator.track_api_call()`
|
|
3. Update session totals
|
|
4. Generate reports on demand
|
|
|
|
### Error Handling Patterns
|
|
|
|
**OpenAI API errors:**
|
|
```python
|
|
try:
|
|
response = openai_call()
|
|
except Exception as e:
|
|
# Automatic retry logic
|
|
# Fallback to alternative method
|
|
# Return error result dict
|
|
```
|
|
|
|
**Memory errors:**
|
|
```python
|
|
try:
|
|
result = memory_intensive_operation()
|
|
except MemoryError:
|
|
# Reduce worker count
|
|
# Force garbage collection
|
|
# Retry with lower concurrency
|
|
```
|
|
|
|
**File descriptor exhaustion:**
|
|
```python
|
|
except OSError as e:
|
|
if "Too many open files" in str(e):
|
|
# Limit concurrent workers
|
|
# Clean up temp files
|
|
# Force resource release
|
|
```
|
|
|
|
## File Organization
|
|
|
|
### Core Detection Files
|
|
- `hybrid_detector.py` - Hybrid detection (2939 lines)
|
|
- `openai_detector.py` - OpenAI detection
|
|
- `vector_detector.py` - Vector similarity
|
|
- `gemini_detector.py` - Gemini detection
|
|
|
|
### Panel Splitting Files
|
|
- `panel_splitter.py` - Traditional multi-method
|
|
- `advanced_splitter.py` - Edge detection
|
|
- `simple_splitter.py` - Even division
|
|
|
|
### Supporting Files
|
|
- `cost_calculator.py` - Cost tracking
|
|
- `memory_manager.py` - Memory management
|
|
- `logging_config.py` - Logging configuration
|
|
- `cli.py` - Command-line interface
|
|
|
|
### Test Files
|
|
- `test_hybrid.py` - Hybrid mode tests
|
|
- `test_cost_calculator.py` - Cost tracking tests
|
|
- `test_split_mode.py` - Panel splitting tests
|
|
- `test_panel_accuracy.py` - Panel detection accuracy
|
|
- Various tuning and debug scripts
|
|
|
|
### Data Directories
|
|
- `master_images/` - 41 master images to detect
|
|
- `layouts/` - 299+ layout images to process
|
|
- `results/` - JSON output files
|
|
- `embeddings_cache/` - Cached vector embeddings
|
|
|
|
## Development Guidelines
|
|
|
|
### Adding New Features
|
|
|
|
1. **New Detection Mode:**
|
|
- Create new detector class
|
|
- Inherit from base detector if applicable
|
|
- Implement `detect_images_in_layout()` method
|
|
- Add CLI integration in `cli.py`
|
|
- Update tests and documentation
|
|
|
|
2. **New Panel Splitting Method:**
|
|
- Create new splitter class
|
|
- Implement `split_panels(image_path, panel_count)` method
|
|
- Return list of dicts with keys: `image`, `bounds`, `confidence`, `method`
|
|
- Add CLI flag for selection
|
|
- Test with various panel counts
|
|
|
|
3. **Cost Tracking for New API:**
|
|
- Add extraction function for token usage
|
|
- Track calls with `cost_calculator.track_api_call()`
|
|
- Update operation types
|
|
- Add to cost reports
|
|
|
|
### Testing Strategy
|
|
|
|
1. **Unit tests** - Individual components
|
|
2. **Integration tests** - Full detection pipeline
|
|
3. **Performance tests** - Memory and speed benchmarks
|
|
4. **Accuracy tests** - Panel detection accuracy
|
|
5. **Cost tests** - Verify tracking accuracy
|
|
|
|
### Performance Optimization Tips
|
|
|
|
1. **Reduce API calls** - Primary cost driver
|
|
2. **Cache embeddings** - Avoid regenerating
|
|
3. **Limit features** - Prevent memory explosion
|
|
4. **Use multiprocessing** - Parallel CPU work
|
|
5. **Monitor memory** - Dynamic adjustment
|
|
6. **Profile bottlenecks** - Optimize hot paths
|
|
|
|
### Common Pitfalls
|
|
|
|
1. **Multiprocessing pickle errors** - Use standalone functions
|
|
2. **Memory exhaustion** - Limit concurrent workers
|
|
3. **File descriptor limits** - Close files properly
|
|
4. **Cost calculator in workers** - Keep in main process only
|
|
5. **Swap as error condition** - Swap usage is OK, not error
|
|
|
|
## Configuration Reference
|
|
|
|
### Environment Variables
|
|
```
|
|
OPENAI_API_KEY - OpenAI API authentication
|
|
GOOGLE_API_KEY - Google AI API authentication
|
|
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
|
|
```
|
|
|
|
### Default Values
|
|
```python
|
|
# Hybrid mode
|
|
panel_threshold = 2
|
|
inlier_threshold = 0.65
|
|
inlier_ratio_threshold = 0.4
|
|
min_good_matches = 10
|
|
similarity_threshold = 0.75 # vector mode
|
|
|
|
# Workers (auto-detected)
|
|
openai_workers = len(master_images)
|
|
local_workers = max(1, cpu_count - 2)
|
|
layout_workers = min(4, cpu_count // 2)
|
|
|
|
# Memory management
|
|
max_memory_percent = 75
|
|
max_swap_percent = 80
|
|
|
|
# Cost tracking
|
|
enable_tracking = False # Must explicitly enable
|
|
```
|
|
|
|
## Output Format Specification
|
|
|
|
### Results JSON Structure
|
|
```json
|
|
{
|
|
"metadata": {
|
|
"total_layouts_processed": int,
|
|
"total_master_images": int,
|
|
"master_images_available": [str],
|
|
"provider": str,
|
|
"model": str,
|
|
"panel_threshold": int,
|
|
"inlier_threshold": float,
|
|
"processing_mode": str,
|
|
"cost_tracking": {dict} | null
|
|
},
|
|
"results": {
|
|
"layout_id": {
|
|
"layout_filename": str,
|
|
"detected_master_ids": [str],
|
|
"detected_master_filenames": [str],
|
|
"detection_method": str,
|
|
"panel_count": int,
|
|
"confidence_score": float,
|
|
"panel_analysis": {dict},
|
|
"censorship_analysis": {dict},
|
|
"truncation_applied": bool,
|
|
"deduplication_applied": bool,
|
|
"cost_breakdown": {dict} | null
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Debugging Tips
|
|
|
|
### Enable Debug Logging
|
|
```python
|
|
# In code
|
|
import logging
|
|
logging.basicConfig(level=logging.DEBUG)
|
|
|
|
# Via environment
|
|
export LOG_LEVEL=DEBUG
|
|
```
|
|
|
|
### Memory Issues
|
|
```bash
|
|
# Check current memory
|
|
python check_system_resources.py
|
|
|
|
# Test with memory fix
|
|
python test_memory_fix.py
|
|
|
|
# Run with reduced workers
|
|
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
|
|
```
|
|
|
|
### Cost Tracking Issues
|
|
```bash
|
|
# Verify cost tracking works
|
|
python test_cost_calculator.py
|
|
|
|
# Test integration
|
|
python test_cost_tracking_integration.py
|
|
|
|
# Run with tracking enabled
|
|
python cli.py --test --hybrid --enable-cost-tracking
|
|
```
|
|
|
|
### Panel Splitting Issues
|
|
```bash
|
|
# Test splitting accuracy
|
|
python test_panel_accuracy.py
|
|
|
|
# Tune parameters
|
|
python tune_14_panel_split.py
|
|
|
|
# Debug specific layout
|
|
python test_6786505_cli.py
|
|
```
|
|
|
|
## API Costs (Current Pricing)
|
|
|
|
### OpenAI O3 (2025)
|
|
- Input tokens: $2.00 / million
|
|
- Cached input: $0.50 / million
|
|
- Output tokens: $8.00 / million
|
|
|
|
### Typical Usage
|
|
- Hybrid mode: ~$0.01-0.02 per layout
|
|
- OpenAI mode: ~$0.02-0.05 per layout
|
|
- One-at-a-time: ~$0.50-1.00 per layout
|
|
|
|
### Cost Optimization
|
|
- Hybrid mode: 97.6% reduction vs one-at-a-time
|
|
- Caching: Reduces input token costs
|
|
- Batch processing: Amortizes overhead
|
|
|
|
## Future Enhancement Ideas
|
|
|
|
1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration
|
|
2. **Incremental processing** - Resume from saved progress
|
|
3. **Web interface** - Browser-based detection and visualization
|
|
4. **Active learning** - Use detection results to improve models
|
|
5. **Custom training** - Fine-tune models on domain-specific data
|
|
6. **Real-time processing** - Stream processing for live detection
|
|
7. **Distributed processing** - Multi-machine coordination
|
|
8. **Advanced caching** - Persistent result caching across runs
|
|
|
|
## Contact and Support
|
|
|
|
For questions or issues:
|
|
1. Check logs in `master_adapt_detect_*.log`
|
|
2. Review cost reports in `results/`
|
|
3. Run diagnostic scripts
|
|
4. Check system resources
|
|
5. Review error messages carefully
|
|
|
|
## Version History
|
|
|
|
Current implementation includes:
|
|
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
|
|
- Three panel splitting strategies
|
|
- Cost tracking and reporting
|
|
- Memory management and safety
|
|
- Parallel processing with coordination
|
|
- Dynamic worker adjustment
|
|
- Comprehensive logging and debugging
|
|
- Extensive configuration options
|
|
|
|
Last major update: January 2025
|