master_adapt_detect/claude.md

# Master Adapt Detect - Developer Documentation

## For AI Assistants and Developers

This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.

## Project Purpose

Master Adapt Detect is an image detection system that identifies which master images appear in multi-panel layout images. It was originally developed to detect comic/manga page layouts in marketing materials, but the approach generalizes to any multi-panel image detection task.

## Core Architecture

### System Design Philosophy

The system follows a **multi-strategy detection** approach with these design principles:

1. **Cost Optimization** - Minimize API costs while maintaining accuracy
2. **Flexibility** - Support multiple detection engines for different use cases
3. **Performance** - Parallel processing with memory management
4. **Robustness** - Automatic fallbacks and error recovery

### Detection Modes

The system provides 4 detection modes, each with specific use cases:

#### 1. Hybrid Mode (Primary/Recommended)
- **File**: `hybrid_detector.py` (2939 lines)
- **Purpose**: Balance speed, cost, and accuracy
- **Strategy**: OpenAI O3 for panel analysis + local CV for matching
- **Cost**: ~1 API call per layout, a 97.6% reduction vs. one-at-a-time mode (one call per master, i.e. 41 calls)

**How it works:**
1. Single OpenAI API call to count panels and detect censorship
2. Route based on panel count:
   - ≤ `panel_threshold`: Direct local inlier analysis on the whole layout
   - > `panel_threshold`: Split the layout first, then run inlier analysis on each panel
3. Post-process with deduplication, CEN refinement, and truncation
4. Optional fallback to OpenAI one-at-a-time mode if too few matches are found
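The routing in steps 1-4 can be sketched as below. This is a minimal illustration, not the code in `hybrid_detector.py`: the OpenAI analysis, the splitter, the inlier matcher, and the fallback are passed in as plain callables (`analyze_layout`, `split_layout`, `match_inliers`, and `fallback` are hypothetical names). CEN refinement is omitted, and truncation is assumed to mean "keep at most one master per detected panel".

```python
from typing import Callable, Dict, List, Optional


def detect_layout(
    layout: str,
    analyze_layout: Callable[[str], Dict],          # one OpenAI call -> {"panel_count": int, ...}
    split_layout: Callable[[str, int], List[str]],  # layout -> panel crops (ids here)
    match_inliers: Callable[[str], List[str]],      # image -> matched master ids, best first
    fallback: Optional[Callable[[str], List[str]]] = None,
    panel_threshold: int = 2,
    min_matches: int = 1,
) -> List[str]:
    analysis = analyze_layout(layout)
    panel_count = analysis["panel_count"]
    if panel_count <= panel_threshold:
        # Few panels: run local inlier analysis on the whole layout.
        detected = match_inliers(layout)
    else:
        # Many panels: split first, then match each panel separately.
        detected = [m for panel in split_layout(layout, panel_count) for m in match_inliers(panel)]
    detected = list(dict.fromkeys(detected))  # deduplicate, keep first-seen order
    detected = detected[:panel_count]         # truncation (assumed: <= one master per panel)
    if len(detected) < min_matches and fallback is not None:
        detected = fallback(layout)           # e.g. OpenAI one-at-a-time
    return detected
```

Passing the stages in as callables keeps the routing testable without network access or real images.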

**Key classes:**
- `HybridImageDetector` - Main orchestrator
- `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode
- `ProgressTracker` - Thread-safe progress monitoring

#### 2. OpenAI Mode
- **File**: `openai_detector.py`
- **Purpose**: Pure AI-powered detection
- **Strategy**: GPT-4 vision for direct image comparison
- **Cost**: 1 API call per layout (standard) up to 41 (one-at-a-time, one call per master)

**Modes:**
- Standard: All masters in one API call
- One-at-a-time: Separate API call per master (expensive but thorough)

#### 3. Vector Mode
- **File**: `vector_detector.py`
- **Purpose**: Semantic similarity matching
- **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions)
- **Cost**: No OpenAI costs, uses Google Cloud

**Features:**
- Embedding caching for performance
- Cosine similarity matching
- Threshold-based filtering

#### 4. Gemini Mode
- **File**: `gemini_detector.py`
- **Purpose**: Alternative AI detection
- **Strategy**: Google Gemini 2.5 Pro visual reasoning
- **Cost**: Google AI API (not OpenAI)

### Panel Splitting Strategies

The system provides 3 panel splitting approaches for complex multi-panel layouts:

#### 1. Traditional Multi-Method Splitter
- **File**: `panel_splitter.py` (857 lines)
- **Strategy**: Optimized Canny edge detection + Hough transform
- **Tuning**: Specifically tuned for 14-panel detection
- **Parameters**: Thresholds, kernel sizes, line detection params

#### 2. Advanced Edge Detection Splitter
- **File**: `advanced_splitter.py` (200+ lines)
- **Strategy**: Sobel gradient analysis + gutter detection
- **Parameters**:
  - `percentile`: Low-energy column threshold (default: 10)
  - `min_gap`: Minimum gutter width (default: 5)
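The gutter-detection step can be sketched on a 1-D energy profile. This assumes the profile has already been computed, for example as per-column sums of absolute Sobel gradients (`np.abs(cv2.Sobel(gray, cv2.CV_64F, 1, 0)).sum(axis=0)`). Any run of low-energy columns at least `min_gap` wide is treated as a gutter, and panels are cut at gutter centres. The percentile rule and the helper names are assumptions, not the code in `advanced_splitter.py`.

```python
def percentile_value(values, p):
    """Nearest-rank style percentile (p in 0..100) without numpy."""
    ordered = sorted(values)
    k = round(p / 100 * (len(ordered) - 1))
    return ordered[k]


def find_gutters(energy, percentile=10, min_gap=5):
    """Return (start, end) index runs whose energy is at or below the percentile cutoff."""
    cutoff = percentile_value(energy, percentile)
    gutters, start = [], None
    for i, e in enumerate(list(energy) + [float("inf")]):  # sentinel closes a trailing run
        if e <= cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap:  # ignore runs narrower than min_gap
                gutters.append((start, i))
            start = None
    return gutters


def split_points(gutters, length):
    """Cut at gutter centres; return (start, end) spans for the panels between them."""
    cuts = [0] + [(s + e) // 2 for s, e in gutters] + [length]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]
```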

#### 3. Simple Even Division Splitter
- **File**: `simple_splitter.py` (132 lines)
- **Strategy**: Equal division based on panel count
- **Use case**: Fast processing when layout is regular grid
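Even division is short enough to show in full. The sketch below assumes that panels are stacked vertically, which matches the horizontal-separator search of the Canny splitter. It accepts any PIL-style image (an object with `.size` and `.crop(box)`) and returns the dict shape listed under "New Panel Splitting Method". The fixed `confidence` of 1.0 and the `"simple"` method label are illustrative choices.

```python
def split_evenly(image, panel_count):
    """Split a PIL-style image into `panel_count` equal horizontal strips."""
    if panel_count < 1:
        raise ValueError("panel_count must be >= 1")
    width, height = image.size
    panels = []
    for i in range(panel_count):
        # Integer arithmetic spreads any remainder pixels across the strips.
        top = i * height // panel_count
        bottom = (i + 1) * height // panel_count
        box = (0, top, width, bottom)  # (left, upper, right, lower), as PIL's crop expects
        panels.append({"image": image.crop(box), "bounds": box, "confidence": 1.0, "method": "simple"})
    return panels
```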

### Supporting Systems

#### Cost Calculator
- **File**: `cost_calculator.py` (440 lines)
- **Purpose**: Track OpenAI API usage and costs
- **Features**:
  - Per-layout cost breakdown
  - Session summaries
  - Monthly estimation
  - JSON report generation
- **Important**: Disabled by default, requires `--enable-cost-tracking` flag

**Data structures:**
- `TokenUsage` - Track token counts for single API call
- `ApiCallCost` - Cost info for single API call
- `LayoutCostSummary` - Aggregated cost for one layout
- `CostCalculator` - Main tracking class
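The four data structures might fit together as follows. This is a sketch, not the contents of `cost_calculator.py`: the field names and the `track_api_call(layout_id, operation, usage)` signature are assumptions. Prices are the O3 rates listed under "API Costs", and cached input tokens are assumed to be a subset of the input tokens, as OpenAI reports them.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# USD per million tokens (O3 pricing listed in this document)
PRICE_INPUT, PRICE_CACHED, PRICE_OUTPUT = 2.00, 0.50, 8.00


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0  # assumed to be included in input_tokens


@dataclass
class ApiCallCost:
    operation: str
    usage: TokenUsage

    @property
    def cost_usd(self) -> float:
        u = self.usage
        uncached = u.input_tokens - u.cached_input_tokens
        return (uncached * PRICE_INPUT
                + u.cached_input_tokens * PRICE_CACHED
                + u.output_tokens * PRICE_OUTPUT) / 1_000_000


@dataclass
class LayoutCostSummary:
    layout_id: str
    calls: List[ApiCallCost] = field(default_factory=list)

    @property
    def total_usd(self) -> float:
        return sum(c.cost_usd for c in self.calls)


class CostCalculator:
    def __init__(self) -> None:
        self.layouts: Dict[str, LayoutCostSummary] = {}

    def track_api_call(self, layout_id: str, operation: str, usage: TokenUsage) -> None:
        summary = self.layouts.setdefault(layout_id, LayoutCostSummary(layout_id))
        summary.calls.append(ApiCallCost(operation, usage))

    def session_total(self) -> float:
        return sum(s.total_usd for s in self.layouts.values())
```

For example, a call with 10,000 input tokens (2,000 of them cached) and 500 output tokens costs (8,000 × 2 + 2,000 × 0.5 + 500 × 8) / 10⁶ = $0.021.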

#### Memory Manager
- **File**: `memory_manager.py` (119 lines)
- **Purpose**: Prevent system crashes from memory exhaustion
- **Features**:
  - RAM and swap monitoring
  - Dynamic worker adjustment
  - Safe execution decorators
  - Feature count limiting

**Thresholds:**
- Max memory: 80% (configurable)
- Max swap: 80% (warning only, doesn't throttle)

#### Logging Configuration
- **File**: `logging_config.py` (128 lines)
- **Purpose**: Dual output (terminal + file) for debugging crashes
- **Features**:
  - Timestamped log files
  - Exception tracking with resource usage
  - System diagnostics on startup
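Dual terminal + file output takes only a few lines with the standard `logging` module. The sketch below writes to stdout and to a timestamped file whose name matches the `master_adapt_detect_*.log` pattern mentioned under "Contact and Support". The timestamp format and the function name are assumptions, and the function should be called once at startup to avoid duplicate handlers.

```python
import logging
import sys
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir=".", level=logging.INFO):
    """Attach a stdout handler and a timestamped file handler to the root logger."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"master_adapt_detect_{stamp}.log"
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    root = logging.getLogger()
    root.setLevel(level)
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path, encoding="utf-8")):
        handler.setFormatter(fmt)
        root.addHandler(handler)
    return log_path
```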

### Command-Line Interface

- **File**: `cli.py`
- **Purpose**: Unified interface for all detection modes
- **Features**:
  - Argument parsing for all modes
  - Mode-specific configuration
  - Results aggregation
  - Cost reporting

**Key command patterns:**
```bash
# Detection mode selection
--hybrid / --openai / --vector / --gemini

# Processing scope
--test / --limit N / --all / --specific-file FILE

# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts

# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
```
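The flag groups above could be wired up with `argparse` roughly as follows. This sketch is not `cli.py` itself. It assumes that the `/`-separated alternatives are mutually exclusive, that the listed boolean-looking flags take no value, and that `--panel-threshold` defaults to 2 (the hybrid default under "Default Values").

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="cli.py", description="Master Adapt Detect (sketch)")
    mode = p.add_mutually_exclusive_group()  # one detection mode at a time
    for flag in ("--hybrid", "--openai", "--vector", "--gemini"):
        mode.add_argument(flag, action="store_true")
    scope = p.add_mutually_exclusive_group()  # processing scope
    scope.add_argument("--test", action="store_true")
    scope.add_argument("--limit", type=int, metavar="N")
    scope.add_argument("--all", action="store_true")
    scope.add_argument("--specific-file", metavar="FILE")
    p.add_argument("--panel-threshold", type=int, default=2)
    split = p.add_mutually_exclusive_group()
    split.add_argument("--split-simple", action="store_true")
    split.add_argument("--split-advanced", action="store_true")
    for flag in ("--vector-mode", "--fallback-one-at-a-time", "--parallel-layouts",
                 "--enable-cost-tracking", "--cost-report"):
        p.add_argument(flag, action="store_true")
    p.add_argument("--cost-estimate", type=int, metavar="N")
    return p
```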

## Key Algorithms

### Local Inlier Analysis (Hybrid Mode)

**Algorithm**: OpenCV AKAZE features + RANSAC homography estimation

**Process**:
1. Detect AKAZE keypoints in layout and master images
2. Match descriptors using brute-force matcher with Hamming distance
3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
4. Estimate homography using RANSAC (threshold: 7.0)
5. Count inliers and calculate confidence

**Thresholds:**
- `min_good_matches`: 10 (minimum matches before RANSAC)
- `inlier_threshold`: 0.65 (relative to best match)
- `inlier_ratio_threshold`: 0.4 (minimum inlier ratio)

**Confidence levels:**
- High: ≥30 inliers, ≥50% ratio
- Medium: ≥15 inliers, ≥30% ratio
- Low: Below medium thresholds

**Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing)
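The filtering and scoring rules can be sketched in pure Python, independent of OpenCV. In a real pipeline, each `(d1, d2)` pair would be the `.distance` of the two `DMatch` objects that `cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des1, des2, k=2)` returns per keypoint. Inlier counts would come from the mask returned by `cv2.findHomography(src, dst, cv2.RANSAC, 7.0)`. The sketch rests on two assumptions: the inlier ratio is inliers / good matches, and `inlier_threshold` means "at least 65% of the best master's inlier count".

```python
def lowe_ratio_filter(knn_distances, ratio=0.80):
    """Indices of matches whose best distance is < ratio * second-best distance."""
    return [i for i, (d1, d2) in enumerate(knn_distances) if d1 < ratio * d2]


def classify_confidence(inliers, good_matches):
    inlier_ratio = inliers / good_matches if good_matches else 0.0
    if inliers >= 30 and inlier_ratio >= 0.50:
        return "high"
    if inliers >= 15 and inlier_ratio >= 0.30:
        return "medium"
    return "low"


def select_masters(stats, inlier_threshold=0.65, inlier_ratio_threshold=0.4, min_good_matches=10):
    """stats maps master_id -> (good_matches, inliers); returns accepted ids, best first."""
    eligible = {
        mid: (good, inl)
        for mid, (good, inl) in stats.items()
        if good >= min_good_matches and good > 0 and inl / good >= inlier_ratio_threshold
    }
    if not eligible:
        return []
    best = max(inl for _, inl in eligible.values())
    keep = [mid for mid, (_, inl) in eligible.items() if inl >= inlier_threshold * best]
    return sorted(keep, key=lambda mid: eligible[mid][1], reverse=True)
```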

### Vector Similarity Analysis

**Algorithm**: Cosine similarity on 1408-dimensional embeddings

**Process**:
1. Generate embedding for layout using Vertex AI
2. Compare against cached master embeddings
3. Calculate cosine similarity for each master
4. Filter by threshold (default: 0.75)
5. Sort by similarity descending

**Formula**:
```
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
```

**Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl`
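The similarity, filtering, and caching steps fit in a short stdlib-only sketch. The `embed` callable is a hypothetical stand-in for whatever wraps the Vertex AI embedding request. The cache layout (a pickled `dict` of master ID to vector) is an assumption about what `master_embeddings.pkl` holds.

```python
import math
import pickle
from pathlib import Path


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_masters(layout_emb, master_embs, threshold=0.75):
    """Masters at or above the threshold, sorted by similarity descending."""
    scored = ((mid, cosine_similarity(layout_emb, emb)) for mid, emb in master_embs.items())
    return sorted(((m, s) for m, s in scored if s >= threshold), key=lambda ms: ms[1], reverse=True)


def load_or_build_cache(cache_path, master_ids, embed):
    """Load cached embeddings and embed only the masters not yet cached."""
    path = Path(cache_path)
    # Only unpickle cache files this tool wrote itself; pickle is unsafe on untrusted input.
    cache = pickle.loads(path.read_bytes()) if path.exists() else {}
    missing = [m for m in master_ids if m not in cache]
    for m in missing:
        cache[m] = embed(m)
    if missing:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(pickle.dumps(cache))
    return cache
```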

### Panel Splitting (Canny Detection)

**Algorithm**: Multi-threshold Canny + Hough line transform

**Process**:
1. Apply Canny edge detection at multiple thresholds:
   - (50, 150), (100, 200), (150, 250)
2. Morphological closing with (3, 1) kernel
3. Combine edge maps with maximum operation
4. Hough line transform for horizontal lines:
   - Threshold: 1324
   - Min length: 3530
   - Max gap: 1059
5. Filter for nearly horizontal lines (< 5% slope)
6. Create panel bounds from separator positions

**Tuning**: Parameters specifically optimized for 14-panel detection accuracy
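Steps 5-6 (slope filtering and panel bounds) can be sketched without OpenCV. The input is line segments as returned by `cv2.HoughLinesP` after its `(N, 1, 4)` output has been flattened to `(x1, y1, x2, y2)` tuples. Merging nearby lines within `merge_px` into a single separator is an assumed detail, and both function names are hypothetical.

```python
def horizontal_separators(lines, max_slope=0.05, merge_px=20):
    """Y positions of nearly horizontal segments, merging ones within merge_px."""
    ys = []
    for x1, y1, x2, y2 in lines:
        dx = abs(x2 - x1)
        if dx and abs(y2 - y1) / dx < max_slope:  # keep lines with < 5% slope
            ys.append((y1 + y2) / 2)
    ys.sort()
    groups = []
    for y in ys:
        if groups and y - groups[-1][-1] <= merge_px:
            groups[-1].append(y)  # same separator, detected more than once
        else:
            groups.append([y])
    return [round(sum(g) / len(g)) for g in groups]


def panel_bounds(separators, width, height, min_height=1):
    """Turn separator y positions into (left, top, right, bottom) panel boxes."""
    edges = [0] + [s for s in separators if 0 < s < height] + [height]
    return [(0, top, width, bottom) for top, bottom in zip(edges, edges[1:])
            if bottom - top >= min_height]
```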

### CEN Refinement

**Algorithm**: Censorship-aware master image selection

**Process**:
1. Detect if layout is censored (OpenAI analysis)
2. For each detected CEN (censored) master:
   - If layout is uncensored and non-CEN version exists: Switch to non-CEN
   - If layout is censored or no alternative: Keep CEN version
3. Update results with refinement metadata

**Naming convention**: `*CEN*` in master ID indicates censored version
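The refinement rule can be expressed as a small pure function. The document does not say how a CEN ID maps to its non-CEN counterpart. The default `to_uncensored` below assumes a `_CEN` suffix is simply dropped, so swap in the project's real naming rule. Duplicates introduced by a switch are collapsed.

```python
def refine_cen(detected_ids, available_ids, layout_censored,
               to_uncensored=lambda mid: mid.replace("_CEN", "")):  # hypothetical mapping
    """Swap CEN masters for non-CEN versions when the layout is uncensored."""
    available = set(available_ids)
    refined, changes = [], []
    for mid in detected_ids:
        if "CEN" in mid and not layout_censored:
            alt = to_uncensored(mid)
            if alt != mid and alt in available:
                refined.append(alt)
                changes.append({"from": mid, "to": alt})
                continue
        refined.append(mid)  # censored layout, or no non-CEN alternative: keep as-is
    return list(dict.fromkeys(refined)), changes
```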

## Parallel Processing Architecture

### Serial Inlier Analysis Coordinator

**Problem**: Parallel inlier analysis causes memory exhaustion and crashes

**Solution**: `InlierAnalysisCoordinator` runs inlier analysis serially while still allowing layouts to be processed in parallel

**Architecture:**
```
Multiple Layout Workers (parallel)
         ↓
    Task Queue
         ↓
Single Inlier Worker (serial)
         ↓
    Results back to layout workers
```

**Components:**
- `InlierAnalysisCoordinator` - Manages serial execution
- Task queue - Queues inlier analysis requests
- Worker thread - Processes tasks one at a time
- Futures - Async communication between layout and inlier workers

**Benefits:**
- Prevents memory explosion from too many concurrent inlier analyses
- Allows multiple layouts to be processed in parallel
- Coordinates resource usage across system
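A `ThreadPoolExecutor` with `max_workers=1` is itself a FIFO task queue drained by a single thread, and its `submit` returns a `Future`. That makes it a compact stand-in for the queue + worker-thread + futures design above. The class below is a hypothetical sketch, not the project's `InlierAnalysisCoordinator`. It also exposes a pending-task count, which the "queue size ≥ 3" trigger could read.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialInlierCoordinator:
    """Queue inlier-analysis jobs from many layout workers; run them one at a time."""

    def __init__(self):
        # One worker thread draining the executor's internal work queue = serial execution.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="inlier")
        self._pending = 0
        self._lock = threading.Lock()

    def submit(self, analyze, *args):
        with self._lock:
            self._pending += 1
        future = self._executor.submit(analyze, *args)
        future.add_done_callback(self._on_done)
        return future  # layout workers block on future.result()

    def _on_done(self, _future):
        with self._lock:
            self._pending -= 1

    @property
    def queue_size(self):
        with self._lock:
            return self._pending

    def shutdown(self):
        self._executor.shutdown(wait=True)
```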

### Dynamic Worker Adjustment

**Monitoring:**
- Memory usage (RAM percentage)
- Swap usage (swap percentage)
- Queue size (backlog of inlier tasks)
- Open file descriptors

**Adjustment triggers:**
- Memory > 85%: Reduce workers
- Swap > 95% AND Memory > 80%: Reduce workers
- Queue size ≥ 3: Reduce layout workers (producers)
- Memory < 75% AND Swap < 80%: Increase workers

**Auto-scaling:**
- Layout workers: Start at min(4, CPU/2), adjust dynamically
- Local workers: Start at CPU-2, adjust dynamically
- OpenAI workers: Set to number of master images
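The triggers and starting points above map onto two small functions. In practice the percentages would come from `psutil.virtual_memory().percent` and `psutil.swap_memory().percent`. The sketch makes three assumptions: each adjustment changes the layout-worker count by one, the worker count is capped at 4, and a `max(1, ...)` guard prevents a zero starting count on single-CPU machines. The ordering makes pressure and backlog checks win over scale-up.

```python
def adjust_layout_workers(current, mem_pct, swap_pct, queue_size, min_workers=1, max_workers=4):
    """Return the new layout-worker count (assumed step size: 1)."""
    if mem_pct > 85 or (swap_pct > 95 and mem_pct > 80):
        return max(min_workers, current - 1)  # memory pressure
    if queue_size >= 3:
        return max(min_workers, current - 1)  # producers outpacing the serial inlier worker
    if mem_pct < 75 and swap_pct < 80:
        return min(max_workers, current + 1)  # headroom available
    return current


def initial_workers(cpu_count, master_count):
    return {
        "layout": min(4, max(1, cpu_count // 2)),
        "local": max(1, cpu_count - 2),
        "openai": master_count,
    }
```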

## Important Implementation Details

### Multiprocessing Considerations

**Challenge**: Python multiprocessing pickles the function it sends to worker processes, so that function must be picklable

**Solutions:**
1. `process_single_master_inlier_analysis()` is a standalone module-level function (not a class method)
2. All imports happen inside the function, so each worker process loads its own dependencies
3. The cost calculator is NOT imported in multiprocessing functions (it causes pickle errors)

**Memory safety:**
- Feature limiting: Max 10,000-15,000 features per image
- Dynamic worker reduction based on feature count
- Forced garbage collection after processing
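The pattern combines a module-level worker function, imports inside it, feature limiting, and a forced `gc.collect()`. Here is a minimal sketch of it. Keypoints are modeled as `(response, x, y)` tuples for illustration; with OpenCV, `cv2.KeyPoint` objects would be sorted by their `.response` attribute. `analyze_master` and `run_parallel` are hypothetical names, and the 10,000 cap is the low end of the documented range.

```python
from concurrent.futures import ProcessPoolExecutor

MAX_FEATURES = 10_000  # lower end of the documented 10,000-15,000 range


def limit_features(keypoints, max_features=MAX_FEATURES):
    """Keep only the strongest keypoints (here: (response, x, y) tuples)."""
    if len(keypoints) <= max_features:
        return keypoints
    return sorted(keypoints, key=lambda kp: kp[0], reverse=True)[:max_features]


def analyze_master(task):
    """Module-level (hence picklable) worker; task = (master_id, keypoints)."""
    import gc  # imported inside so every worker process resolves its own dependencies
    master_id, keypoints = task
    kept = limit_features(keypoints)
    result = {"master_id": master_id, "features_used": len(kept)}
    del kept
    gc.collect()  # release per-task memory before the worker takes the next task
    return result


def run_parallel(tasks, workers=2):
    # Call from under `if __name__ == "__main__":` so spawn-based platforms work.
    # No cost calculator or other unpicklable state is sent to the workers.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_master, tasks))
```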

### Cost Tracking Integration

**Important**: Cost tracking is DISABLED by default

**Reason**: Avoid repetitive initialization messages from multiprocessing workers

**Integration points:**
- `openai_detector.py`: After every OpenAI API call
- `hybrid_detector.py`: Track all OpenAI operations
- Results JSON: Cost breakdown per layout

**Data flow:**
1. API call made → Extract token usage from response
2. Call `cost_calculator.track_api_call()`
3. Update session totals
4. Generate reports on demand

### Error Handling Patterns

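**OpenAI API errors** (the retry count and backoff policy shown are illustrative):
```python
import time
def call_with_retry(openai_call, fallback, retries=3):
    last_error = None
    for attempt in range(retries):
        if attempt:
            time.sleep(2 ** attempt)  # exponential backoff before each retry
        try:
            return openai_call()
        except Exception as e:  # rate limits, timeouts, API errors
            last_error = e
    try:
        return fallback()  # alternative detection method
    except Exception as e:
        return {"error": str(last_error or e), "detected_master_ids": []}
```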

**Memory errors:**
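```python
import gc
def run_with_backoff(operation, workers):
    while True:
        try:
            return operation(workers)
        except MemoryError:
            if workers <= 1:
                raise  # cannot reduce concurrency any further
            workers = max(1, workers // 2)  # reduce worker count
            gc.collect()  # force garbage collection, then retry
```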

**File descriptor exhaustion:**
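```python
import errno
import gc
def run_guarded(operation, workers):
    try:
        return operation(workers)
    except OSError as e:
        if e.errno != errno.EMFILE:  # EMFILE = "Too many open files"
            raise
        gc.collect()  # release handles held by unreachable objects; clean temp files here too
        return operation(max(1, workers // 2))  # retry with fewer concurrent workers
```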

## File Organization

### Core Detection Files
- `hybrid_detector.py` - Hybrid detection (2939 lines)
- `openai_detector.py` - OpenAI detection
- `vector_detector.py` - Vector similarity
- `gemini_detector.py` - Gemini detection

### Panel Splitting Files
- `panel_splitter.py` - Traditional multi-method
- `advanced_splitter.py` - Edge detection
- `simple_splitter.py` - Even division

### Supporting Files
- `cost_calculator.py` - Cost tracking
- `memory_manager.py` - Memory management
- `logging_config.py` - Logging configuration
- `cli.py` - Command-line interface

### Test Files
- `test_hybrid.py` - Hybrid mode tests
- `test_cost_calculator.py` - Cost tracking tests
- `test_split_mode.py` - Panel splitting tests
- `test_panel_accuracy.py` - Panel detection accuracy
- Various tuning and debug scripts

### Data Directories
- `master_images/` - 41 master images to detect
- `layouts/` - 299+ layout images to process
- `results/` - JSON output files
- `embeddings_cache/` - Cached vector embeddings

## Development Guidelines

### Adding New Features

1. **New Detection Mode:**
   - Create new detector class
   - Inherit from base detector if applicable
   - Implement `detect_images_in_layout()` method
   - Add CLI integration in `cli.py`
   - Update tests and documentation

2. **New Panel Splitting Method:**
   - Create new splitter class
   - Implement `split_panels(image_path, panel_count)` method
   - Return list of dicts with keys: `image`, `bounds`, `confidence`, `method`
   - Add CLI flag for selection
   - Test with various panel counts

3. **Cost Tracking for New API:**
   - Add extraction function for token usage
   - Track calls with `cost_calculator.track_api_call()`
   - Update operation types
   - Add to cost reports
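The splitter contract from item 2 can be written down as a typed interface plus a validator. This is a sketch, not an existing module. The `(left, top, right, bottom)` reading of `bounds` is an assumption, while the four keys and the `split_panels(image_path, panel_count)` signature come from the list above.

```python
from typing import Any, List, Protocol, Tuple, TypedDict, runtime_checkable


class PanelResult(TypedDict):
    image: Any                         # cropped panel image (type depends on the splitter)
    bounds: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in layout pixels
    confidence: float
    method: str


@runtime_checkable
class PanelSplitter(Protocol):
    def split_panels(self, image_path: str, panel_count: int) -> List[PanelResult]: ...


REQUIRED_KEYS = {"image", "bounds", "confidence", "method"}


def check_panels(panels: List[PanelResult]) -> None:
    """Raise ValueError if a splitter's output breaks the documented contract."""
    for panel in panels:
        missing = REQUIRED_KEYS - set(panel)
        if missing:
            raise ValueError(f"panel missing keys: {sorted(missing)}")
        left, top, right, bottom = panel["bounds"]
        if not (left < right and top < bottom):
            raise ValueError(f"degenerate bounds: {panel['bounds']}")
```

Running `check_panels` in a splitter's tests catches shape mistakes before they reach the inlier analysis.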

### Testing Strategy

1. **Unit tests** - Individual components
2. **Integration tests** - Full detection pipeline
3. **Performance tests** - Memory and speed benchmarks
4. **Accuracy tests** - Panel detection accuracy
5. **Cost tests** - Verify tracking accuracy

### Performance Optimization Tips

1. **Reduce API calls** - Primary cost driver
2. **Cache embeddings** - Avoid regenerating
3. **Limit features** - Prevent memory explosion
4. **Use multiprocessing** - Parallel CPU work
5. **Monitor memory** - Dynamic adjustment
6. **Profile bottlenecks** - Optimize hot paths

### Common Pitfalls

1. **Multiprocessing pickle errors** - Use standalone functions
2. **Memory exhaustion** - Limit concurrent workers
3. **File descriptor limits** - Close files properly
4. **Cost calculator in workers** - Keep in main process only
5. **Treating swap as an error** - Swap usage by itself is normal, not an error condition

## Configuration Reference

### Environment Variables
```
OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
```

### Default Values
```python
# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75  # vector mode

# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)

# Memory management
max_memory_percent = 75
max_swap_percent = 80

# Cost tracking
enable_tracking = False  # Must explicitly enable
```

## Output Format Specification

### Results JSON Structure
```json
{
  "metadata": {
    "total_layouts_processed": int,
    "total_master_images": int,
    "master_images_available": [str],
    "provider": str,
    "model": str,
    "panel_threshold": int,
    "inlier_threshold": float,
    "processing_mode": str,
    "cost_tracking": {dict} | null
  },
  "results": {
    "layout_id": {
      "layout_filename": str,
      "detected_master_ids": [str],
      "detected_master_filenames": [str],
      "detection_method": str,
      "panel_count": int,
      "confidence_score": float,
      "panel_analysis": {dict},
      "censorship_analysis": {dict},
      "truncation_applied": bool,
      "deduplication_applied": bool,
      "cost_breakdown": {dict} | null
    }
  }
}
```
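Because `results` is keyed by layout ID, a summary is a short walk over the file. The sketch below reads only fields shown in the schema and counts how often each master was detected and which detection methods were used. The function name and the returned summary shape are illustrative.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_results(path):
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    results = data["results"].values()
    return {
        "layouts": len(data["results"]),
        "master_counts": Counter(mid for r in results for mid in r.get("detected_master_ids", [])),
        "methods": Counter(r.get("detection_method", "unknown") for r in data["results"].values()),
        "provider": data["metadata"].get("provider"),
    }
```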

## Debugging Tips

### Enable Debug Logging
```python
import logging

# In code
logging.basicConfig(level=logging.DEBUG)

# Via environment (run in the shell before launching):
#   export LOG_LEVEL=DEBUG
```

### Memory Issues
```bash
# Check current memory
python check_system_resources.py

# Test with memory fix
python test_memory_fix.py

# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
```

### Cost Tracking Issues
```bash
# Verify cost tracking works
python test_cost_calculator.py

# Test integration
python test_cost_tracking_integration.py

# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
```

### Panel Splitting Issues
```bash
# Test splitting accuracy
python test_panel_accuracy.py

# Tune parameters
python tune_14_panel_split.py

# Debug specific layout
python test_6786505_cli.py
```

## API Costs (Current Pricing)

### OpenAI O3 (2025)
- Input tokens: $2.00 / million
- Cached input: $0.50 / million
- Output tokens: $8.00 / million

### Typical Usage
- Hybrid mode: ~$0.01-0.02 per layout
- OpenAI mode: ~$0.02-0.05 per layout
- One-at-a-time: ~$0.50-1.00 per layout

### Cost Optimization
- Hybrid mode: 97.6% reduction vs one-at-a-time
- Caching: Reduces input token costs
- Batch processing: Amortizes overhead

## Future Enhancement Ideas

1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration
2. **Incremental processing** - Resume from saved progress
3. **Web interface** - Browser-based detection and visualization
4. **Active learning** - Use detection results to improve models
5. **Custom training** - Fine-tune models on domain-specific data
6. **Real-time processing** - Stream processing for live detection
7. **Distributed processing** - Multi-machine coordination
8. **Advanced caching** - Persistent result caching across runs

## Contact and Support

For questions or issues:
1. Check logs in `master_adapt_detect_*.log`
2. Review cost reports in `results/`
3. Run diagnostic scripts
4. Check system resources
5. Review error messages carefully

## Version History

Current implementation includes:
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
- Three panel splitting strategies
- Cost tracking and reporting
- Memory management and safety
- Parallel processing with coordination
- Dynamic worker adjustment
- Comprehensive logging and debugging
- Extensive configuration options

Last major update: January 2025