master_adapt_detect/claude.md

594 lines
17 KiB
Markdown

# Master Adapt Detect - Developer Documentation
## For AI Assistants and Developers
This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.
## Project Purpose
Master Adapt Detect is a sophisticated image detection system designed to identify which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials but is generalizable to any multi-panel image detection task.
## Core Architecture
### System Design Philosophy
The system follows a **multi-strategy detection** approach with these design principles:
1. **Cost Optimization** - Minimize API costs while maintaining accuracy
2. **Flexibility** - Support multiple detection engines for different use cases
3. **Performance** - Parallel processing with memory management
4. **Robustness** - Automatic fallbacks and error recovery
### Detection Modes
The system provides 4 detection modes, each with specific use cases:
#### 1. Hybrid Mode (Primary/Recommended)
- **File**: `hybrid_detector.py` (2939 lines)
- **Purpose**: Balance speed, cost, and accuracy
- **Strategy**: OpenAI O3 for panel analysis + local CV for matching
- **Cost**: ~1 API call per layout (97.6% reduction vs one-at-a-time)
**How it works:**
1. Single OpenAI API call to count panels and detect censorship
2. Route based on panel count:
- ≤ threshold: Direct local inlier analysis
- > threshold: Split layout first, then inlier analysis on each panel
3. Post-process with deduplication, CEN refinement, truncation
4. Optional fallback to OpenAI one-at-a-time if insufficient matches
**Key classes:**
- `HybridImageDetector` - Main orchestrator
- `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode
- `ProgressTracker` - Thread-safe progress monitoring
#### 2. OpenAI Mode
- **File**: `openai_detector.py`
- **Purpose**: Pure AI-powered detection
- **Strategy**: GPT-4 vision for direct image comparison
- **Cost**: 1-41 API calls per layout depending on mode
**Modes:**
- Standard: All masters in one API call
- One-at-a-time: Separate API call per master (expensive but thorough)
#### 3. Vector Mode
- **File**: `vector_detector.py`
- **Purpose**: Semantic similarity matching
- **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions)
- **Cost**: No OpenAI costs, uses Google Cloud
**Features:**
- Embedding caching for performance
- Cosine similarity matching
- Threshold-based filtering
#### 4. Gemini Mode
- **File**: `gemini_detector.py`
- **Purpose**: Alternative AI detection
- **Strategy**: Google Gemini 2.5 Pro visual reasoning
- **Cost**: Google AI API (not OpenAI)
### Panel Splitting Strategies
The system provides 3 panel splitting approaches for complex multi-panel layouts:
#### 1. Traditional Multi-Method Splitter
- **File**: `panel_splitter.py` (857 lines)
- **Strategy**: Optimized Canny edge detection + Hough transform
- **Tuning**: Specifically tuned for 14-panel detection
- **Parameters**: Thresholds, kernel sizes, line detection params
#### 2. Advanced Edge Detection Splitter
- **File**: `advanced_splitter.py` (200+ lines)
- **Strategy**: Sobel gradient analysis + gutter detection
- **Parameters**:
- `percentile`: Low-energy column threshold (default: 10)
- `min_gap`: Minimum gutter width (default: 5)
#### 3. Simple Even Division Splitter
- **File**: `simple_splitter.py` (132 lines)
- **Strategy**: Equal division based on panel count
- **Use case**: Fast processing when layout is regular grid
### Supporting Systems
#### Cost Calculator
- **File**: `cost_calculator.py` (440 lines)
- **Purpose**: Track OpenAI API usage and costs
- **Features**:
- Per-layout cost breakdown
- Session summaries
- Monthly estimation
- JSON report generation
- **Important**: Disabled by default, requires `--enable-cost-tracking` flag
**Data structures:**
- `TokenUsage` - Track token counts for single API call
- `ApiCallCost` - Cost info for single API call
- `LayoutCostSummary` - Aggregated cost for one layout
- `CostCalculator` - Main tracking class
#### Memory Manager
- **File**: `memory_manager.py` (119 lines)
- **Purpose**: Prevent system crashes from memory exhaustion
- **Features**:
- RAM and swap monitoring
- Dynamic worker adjustment
- Safe execution decorators
- Feature count limiting
**Thresholds:**
- Max memory: 80% (configurable)
- Max swap: 80% (warning only, doesn't throttle)
#### Logging Configuration
- **File**: `logging_config.py` (128 lines)
- **Purpose**: Dual output (terminal + file) for debugging crashes
- **Features**:
- Timestamped log files
- Exception tracking with resource usage
- System diagnostics on startup
### Command-Line Interface
- **File**: `cli.py`
- **Purpose**: Unified interface for all detection modes
- **Features**:
- Argument parsing for all modes
- Mode-specific configuration
- Results aggregation
- Cost reporting
**Key command patterns:**
```bash
# Detection mode selection
--hybrid / --openai / --vector / --gemini
# Processing scope
--test / --limit N / --all / --specific-file FILE
# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts
# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
```
## Key Algorithms
### Local Inlier Analysis (Hybrid Mode)
**Algorithm**: OpenCV AKAZE features + RANSAC homography estimation
**Process**:
1. Detect AKAZE keypoints in layout and master images
2. Match descriptors using brute-force matcher with Hamming distance
3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
4. Estimate homography using RANSAC (threshold: 7.0)
5. Count inliers and calculate confidence
**Thresholds:**
- `min_good_matches`: 10 (minimum matches before RANSAC)
- `inlier_threshold`: 0.65 (relative to best match)
- `inlier_ratio_threshold`: 0.4 (minimum inlier ratio)
**Confidence levels:**
- High: ≥30 inliers, ≥50% ratio
- Medium: ≥15 inliers, ≥30% ratio
- Low: Below medium thresholds
**Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing)
### Vector Similarity Analysis
**Algorithm**: Cosine similarity on 1408-dimensional embeddings
**Process**:
1. Generate embedding for layout using Vertex AI
2. Compare against cached master embeddings
3. Calculate cosine similarity for each master
4. Filter by threshold (default: 0.75)
5. Sort by similarity descending
**Formula**:
```
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
```
**Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl`
### Panel Splitting (Canny Detection)
**Algorithm**: Multi-threshold Canny + Hough line transform
**Process**:
1. Apply Canny edge detection at multiple thresholds:
- (50, 150), (100, 200), (150, 250)
2. Morphological closing with (3, 1) kernel
3. Combine edge maps with maximum operation
4. Hough line transform for horizontal lines:
- Threshold: 1324
- Min length: 3530
- Max gap: 1059
5. Filter for nearly horizontal lines (< 5% slope)
6. Create panel bounds from separator positions
**Tuning**: Parameters specifically optimized for 14-panel detection accuracy
### CEN Refinement
**Algorithm**: Censorship-aware master image selection
**Process**:
1. Detect if layout is censored (OpenAI analysis)
2. For each detected CEN (censored) master:
- If layout is uncensored and non-CEN version exists: Switch to non-CEN
- If layout is censored or no alternative: Keep CEN version
3. Update results with refinement metadata
**Naming convention**: `*CEN*` in master ID indicates censored version
## Parallel Processing Architecture
### Serial Inlier Analysis Coordinator
**Problem**: Parallel inlier analysis causes memory exhaustion and crashes
**Solution**: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing
**Architecture:**
```
Multiple Layout Workers (parallel)
Task Queue
Single Inlier Worker (serial)
Results back to layout workers
```
**Components:**
- `InlierAnalysisCoordinator` - Manages serial execution
- Task queue - Queues inlier analysis requests
- Worker thread - Processes tasks one at a time
- Futures - Async communication between layout and inlier workers
**Benefits:**
- Prevents memory explosion from too many concurrent inlier analyses
- Allows multiple layouts to be processed in parallel
- Coordinates resource usage across system
### Dynamic Worker Adjustment
**Monitoring:**
- Memory usage (RAM percentage)
- Swap usage (swap percentage)
- Queue size (backlog of inlier tasks)
- Open file descriptors
**Adjustment triggers:**
- Memory > 85%: Reduce workers
- Swap > 95% AND Memory > 80%: Reduce workers
- Queue size ≥ 3: Reduce layout workers (producers)
- Memory < 75% AND Swap < 80%: Increase workers
**Auto-scaling:**
- Layout workers: Start at min(4, CPU/2), adjust dynamically
- Local workers: Start at CPU-2, adjust dynamically
- OpenAI workers: Set to number of master images
## Important Implementation Details
### Multiprocessing Considerations
**Challenge**: Python multiprocessing requires pickleable functions
**Solutions:**
1. `process_single_master_inlier_analysis()` - Standalone function (not class method)
2. All imports inside function to ensure worker processes have dependencies
3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors)
**Memory safety:**
- Feature limiting: Max 10,000-15,000 features per image
- Dynamic worker reduction based on feature count
- Forced garbage collection after processing
### Cost Tracking Integration
**Important**: Cost tracking is DISABLED by default
**Reason**: Avoid repetitive initialization messages from multiprocessing workers
**Integration points:**
- `openai_detector.py`: After every OpenAI API call
- `hybrid_detector.py`: Track all OpenAI operations
- Results JSON: Cost breakdown per layout
**Data flow:**
1. API call made Extract token usage from response
2. Call `cost_calculator.track_api_call()`
3. Update session totals
4. Generate reports on demand
### Error Handling Patterns
**OpenAI API errors:**
```python
try:
response = openai_call()
except Exception as e:
# Automatic retry logic
# Fallback to alternative method
# Return error result dict
```
**Memory errors:**
```python
try:
result = memory_intensive_operation()
except MemoryError:
# Reduce worker count
# Force garbage collection
# Retry with lower concurrency
```
**File descriptor exhaustion:**
```python
except OSError as e:
if "Too many open files" in str(e):
# Limit concurrent workers
# Clean up temp files
# Force resource release
```
## File Organization
### Core Detection Files
- `hybrid_detector.py` - Hybrid detection (2939 lines)
- `openai_detector.py` - OpenAI detection
- `vector_detector.py` - Vector similarity
- `gemini_detector.py` - Gemini detection
### Panel Splitting Files
- `panel_splitter.py` - Traditional multi-method
- `advanced_splitter.py` - Edge detection
- `simple_splitter.py` - Even division
### Supporting Files
- `cost_calculator.py` - Cost tracking
- `memory_manager.py` - Memory management
- `logging_config.py` - Logging configuration
- `cli.py` - Command-line interface
### Test Files
- `test_hybrid.py` - Hybrid mode tests
- `test_cost_calculator.py` - Cost tracking tests
- `test_split_mode.py` - Panel splitting tests
- `test_panel_accuracy.py` - Panel detection accuracy
- Various tuning and debug scripts
### Data Directories
- `master_images/` - 41 master images to detect
- `layouts/` - 299+ layout images to process
- `results/` - JSON output files
- `embeddings_cache/` - Cached vector embeddings
## Development Guidelines
### Adding New Features
1. **New Detection Mode:**
- Create new detector class
- Inherit from base detector if applicable
- Implement `detect_images_in_layout()` method
- Add CLI integration in `cli.py`
- Update tests and documentation
2. **New Panel Splitting Method:**
- Create new splitter class
- Implement `split_panels(image_path, panel_count)` method
- Return list of dicts with keys: `image`, `bounds`, `confidence`, `method`
- Add CLI flag for selection
- Test with various panel counts
3. **Cost Tracking for New API:**
- Add extraction function for token usage
- Track calls with `cost_calculator.track_api_call()`
- Update operation types
- Add to cost reports
### Testing Strategy
1. **Unit tests** - Individual components
2. **Integration tests** - Full detection pipeline
3. **Performance tests** - Memory and speed benchmarks
4. **Accuracy tests** - Panel detection accuracy
5. **Cost tests** - Verify tracking accuracy
### Performance Optimization Tips
1. **Reduce API calls** - Primary cost driver
2. **Cache embeddings** - Avoid regenerating
3. **Limit features** - Prevent memory explosion
4. **Use multiprocessing** - Parallel CPU work
5. **Monitor memory** - Dynamic adjustment
6. **Profile bottlenecks** - Optimize hot paths
### Common Pitfalls
1. **Multiprocessing pickle errors** - Use standalone functions
2. **Memory exhaustion** - Limit concurrent workers
3. **File descriptor limits** - Close files properly
4. **Cost calculator in workers** - Keep in main process only
5. **Swap as error condition** - Swap usage is OK, not error
## Configuration Reference
### Environment Variables
```
OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
```
### Default Values
```python
# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75 # vector mode
# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)
# Memory management
max_memory_percent = 75
max_swap_percent = 80
# Cost tracking
enable_tracking = False # Must explicitly enable
```
## Output Format Specification
### Results JSON Structure
```json
{
"metadata": {
"total_layouts_processed": int,
"total_master_images": int,
"master_images_available": [str],
"provider": str,
"model": str,
"panel_threshold": int,
"inlier_threshold": float,
"processing_mode": str,
"cost_tracking": {dict} | null
},
"results": {
"layout_id": {
"layout_filename": str,
"detected_master_ids": [str],
"detected_master_filenames": [str],
"detection_method": str,
"panel_count": int,
"confidence_score": float,
"panel_analysis": {dict},
"censorship_analysis": {dict},
"truncation_applied": bool,
"deduplication_applied": bool,
"cost_breakdown": {dict} | null
}
}
}
```
## Debugging Tips
### Enable Debug Logging
```python
# In code
import logging
logging.basicConfig(level=logging.DEBUG)
# Via environment
export LOG_LEVEL=DEBUG
```
### Memory Issues
```bash
# Check current memory
python check_system_resources.py
# Test with memory fix
python test_memory_fix.py
# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
```
### Cost Tracking Issues
```bash
# Verify cost tracking works
python test_cost_calculator.py
# Test integration
python test_cost_tracking_integration.py
# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
```
### Panel Splitting Issues
```bash
# Test splitting accuracy
python test_panel_accuracy.py
# Tune parameters
python tune_14_panel_split.py
# Debug specific layout
python test_6786505_cli.py
```
## API Costs (Current Pricing)
### OpenAI O3 (2025)
- Input tokens: $2.00 / million
- Cached input: $0.50 / million
- Output tokens: $8.00 / million
### Typical Usage
- Hybrid mode: ~$0.01-0.02 per layout
- OpenAI mode: ~$0.02-0.05 per layout
- One-at-a-time: ~$0.50-1.00 per layout
### Cost Optimization
- Hybrid mode: 97.6% reduction vs one-at-a-time
- Caching: Reduces input token costs
- Batch processing: Amortizes overhead
## Future Enhancement Ideas
1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration
2. **Incremental processing** - Resume from saved progress
3. **Web interface** - Browser-based detection and visualization
4. **Active learning** - Use detection results to improve models
5. **Custom training** - Fine-tune models on domain-specific data
6. **Real-time processing** - Stream processing for live detection
7. **Distributed processing** - Multi-machine coordination
8. **Advanced caching** - Persistent result caching across runs
## Contact and Support
For questions or issues:
1. Check logs in `master_adapt_detect_*.log`
2. Review cost reports in `results/`
3. Run diagnostic scripts
4. Check system resources
5. Review error messages carefully
## Version History
Current implementation includes:
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
- Three panel splitting strategies
- Cost tracking and reporting
- Memory management and safety
- Parallel processing with coordination
- Dynamic worker adjustment
- Comprehensive logging and debugging
- Extensive configuration options
Last major update: January 2025