Master Adapt Detect - Developer Documentation

For AI Assistants and Developers

This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.

Project Purpose

Master Adapt Detect is an image detection system that identifies which master images appear in multi-panel layout images. It was originally developed to detect comic/manga page layouts in marketing materials, but the approach generalizes to any multi-panel image detection task.

Core Architecture

System Design Philosophy

The system follows a multi-strategy detection approach with these design principles:

  1. Cost Optimization - Minimize API costs while maintaining accuracy
  2. Flexibility - Support multiple detection engines for different use cases
  3. Performance - Parallel processing with memory management
  4. Robustness - Automatic fallbacks and error recovery

Detection Modes

The system provides 4 detection modes, each with specific use cases:

1. Hybrid Mode

  • File: hybrid_detector.py (2939 lines)
  • Purpose: Balance speed, cost, and accuracy
  • Strategy: OpenAI O3 for panel analysis + local CV for matching
  • Cost: ~1 API call per layout (97.6% reduction vs one-at-a-time)

How it works:

  1. Single OpenAI API call to count panels and detect censorship
  2. Route based on panel count:
    • ≤ threshold: Direct local inlier analysis
    • > threshold: Split layout first, then inlier analysis on each panel

  3. Post-process with deduplication, CEN refinement, truncation
  4. Optional fallback to OpenAI one-at-a-time if insufficient matches
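The routing in step 2, plus the deduplication part of step 3, can be sketched as below. This is an illustration of the control flow, not the code in hybrid_detector.py: `route_layout`, the `split_fn`/`analyze_fn` callables, and the `detection_method` strings are hypothetical stand-ins, so the sketch runs without OpenCV or the OpenAI client.

```python
from typing import Callable, Dict, List

# Hypothetical callables standing in for the real splitter and inlier analysis.
SplitFn = Callable[[str, int], List[str]]  # (layout_path, panel_count) -> panel image paths
AnalyzeFn = Callable[[str], List[str]]     # image path -> matched master IDs


def route_layout(
    layout_path: str,
    panel_count: int,
    panel_threshold: int,
    split_fn: SplitFn,
    analyze_fn: AnalyzeFn,
) -> Dict[str, object]:
    """Route a layout to direct or split-first inlier analysis based on panel count."""
    if panel_count <= panel_threshold:
        # Few panels: run inlier analysis on the whole layout at once.
        matches = analyze_fn(layout_path)
        method = "direct_inlier"  # hypothetical label
    else:
        # Many panels: split first, then analyze each panel separately.
        matches = []
        for panel_path in split_fn(layout_path, panel_count):
            matches.extend(analyze_fn(panel_path))
        method = "split_then_inlier"  # hypothetical label

    # Deduplicate while preserving first-seen order (one of the post-processing steps).
    deduplicated = list(dict.fromkeys(matches))
    return {
        "detection_method": method,
        "detected_master_ids": deduplicated,
        "deduplication_applied": len(deduplicated) < len(matches),
    }
```

Passing the splitter and analyzer in as callables keeps the routing decision testable on its own.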

Key classes:

  • HybridImageDetector - Main orchestrator
  • InlierAnalysisCoordinator - Serial execution coordinator for parallel mode
  • ProgressTracker - Thread-safe progress monitoring

2. OpenAI Mode

  • File: openai_detector.py
  • Purpose: Pure AI-powered detection
  • Strategy: GPT-4 vision for direct image comparison
  • Cost: 1-41 API calls per layout depending on mode

Modes:

  • Standard: All masters in one API call
  • One-at-a-time: Separate API call per master (expensive but thorough)

3. Vector Mode

  • File: vector_detector.py
  • Purpose: Semantic similarity matching
  • Strategy: Google Vertex AI multimodal embeddings (1408 dimensions)
  • Cost: No OpenAI costs, uses Google Cloud

Features:

  • Embedding caching for performance
  • Cosine similarity matching
  • Threshold-based filtering

4. Gemini Mode

  • File: gemini_detector.py
  • Purpose: Alternative AI detection
  • Strategy: Google Gemini 2.5 Pro visual reasoning
  • Cost: Google AI API (not OpenAI)

Panel Splitting Strategies

The system provides 3 panel splitting approaches for complex multi-panel layouts:

1. Traditional Multi-Method Splitter

  • File: panel_splitter.py (857 lines)
  • Strategy: Optimized Canny edge detection + Hough transform
  • Tuning: Specifically tuned for 14-panel detection
  • Parameters: Thresholds, kernel sizes, line detection params

2. Advanced Edge Detection Splitter

  • File: advanced_splitter.py (200+ lines)
  • Strategy: Sobel gradient analysis + gutter detection
  • Parameters:
    • percentile: Low-energy column threshold (default: 10)
    • min_gap: Minimum gutter width (default: 5)
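A gutter search over a 1-D energy profile might look like the sketch below. It assumes the profile is already computed (for example, mean absolute Sobel gradient per column), that `percentile` is taken over that profile, and it uses a simple nearest-rank percentile; `find_gutters` is a hypothetical name and advanced_splitter.py may differ.

```python
import math
from typing import List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], percentile: float) -> float:
    """Nearest-rank percentile (no interpolation) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def find_gutters(energy: Sequence[float], percentile: float = 10,
                 min_gap: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, of low-energy runs >= min_gap long."""
    if not energy:
        return []
    cutoff = nearest_rank_percentile(energy, percentile)
    gutters: List[Tuple[int, int]] = []
    run_start = None
    for i, value in enumerate(energy):
        if value <= cutoff:
            if run_start is None:
                run_start = i  # entering a low-energy run
        elif run_start is not None:
            # Leaving a run: keep it only if it is wide enough to be a gutter.
            if i - run_start >= min_gap:
                gutters.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the profile.
    if run_start is not None and len(energy) - run_start >= min_gap:
        gutters.append((run_start, len(energy)))
    return gutters
```

Runs narrower than `min_gap` (thin dark lines inside artwork, for instance) are discarded even if their energy is low.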

3. Simple Even Division Splitter

  • File: simple_splitter.py (132 lines)
  • Strategy: Equal division based on panel count
  • Use case: Fast processing when layout is regular grid
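Even division reduces to integer arithmetic. The sketch assumes panels are stacked vertically (full-width horizontal strips), matching the horizontal-separator search used by the Canny splitter; `even_panel_bounds` and the `(x, y, width, height)` tuple order are hypothetical.

```python
from typing import List, Tuple

Bounds = Tuple[int, int, int, int]  # (x, y, width, height), assumed order


def even_panel_bounds(image_width: int, image_height: int, panel_count: int) -> List[Bounds]:
    """Split an image into panel_count horizontal strips of near-equal height."""
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    bounds: List[Bounds] = []
    for i in range(panel_count):
        # Integer boundaries; remainder pixels are spread across strips, none are lost.
        top = i * image_height // panel_count
        bottom = (i + 1) * image_height // panel_count
        bounds.append((0, top, image_width, bottom - top))
    return bounds
```

For a 100×10 image and 3 panels this yields strip heights 3, 3 and 4, which always sum to the full image height.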

Supporting Systems

Cost Calculator

  • File: cost_calculator.py (440 lines)
  • Purpose: Track OpenAI API usage and costs
  • Features:
    • Per-layout cost breakdown
    • Session summaries
    • Monthly estimation
    • JSON report generation
  • Important: Disabled by default, requires --enable-cost-tracking flag

Data structures:

  • TokenUsage - Track token counts for single API call
  • ApiCallCost - Cost info for single API call
  • LayoutCostSummary - Aggregated cost for one layout
  • CostCalculator - Main tracking class
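The first three data structures might fit together as plain dataclasses like the ones below. The field names and the precomputed `cost_usd` value are assumptions for illustration; the actual classes in cost_calculator.py may carry different fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenUsage:
    """Token counts for a single API call (field names are illustrative)."""
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0


@dataclass
class ApiCallCost:
    """Cost information for a single API call."""
    operation: str  # e.g. "panel_analysis" (hypothetical label)
    usage: TokenUsage
    cost_usd: float


@dataclass
class LayoutCostSummary:
    """Aggregated cost for one layout across all of its API calls."""
    layout_id: str
    calls: List[ApiCallCost] = field(default_factory=list)

    @property
    def total_cost_usd(self) -> float:
        return sum(call.cost_usd for call in self.calls)

    @property
    def total_tokens(self) -> int:
        return sum(c.usage.input_tokens + c.usage.output_tokens for c in self.calls)
```

Keeping totals as computed properties avoids stale sums when calls are appended after a summary is created.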

Memory Manager

  • File: memory_manager.py (119 lines)
  • Purpose: Prevent system crashes from memory exhaustion
  • Features:
    • RAM and swap monitoring
    • Dynamic worker adjustment
    • Safe execution decorators
    • Feature count limiting

Thresholds:

  • Max memory: 80% (configurable)
  • Max swap: 80% (warning only, doesn't throttle)

Logging Configuration

  • File: logging_config.py (128 lines)
  • Purpose: Dual output (terminal + file) for debugging crashes
  • Features:
    • Timestamped log files
    • Exception tracking with resource usage
    • System diagnostics on startup
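Dual terminal-plus-file output is a standard `logging` setup. A minimal sketch, assuming a `master_adapt_detect_<timestamp>.log` filename (inferred from the log glob under Contact and Support) and a hypothetical `setup_logging` helper:

```python
import logging
import sys
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = ".", level: int = logging.INFO) -> Path:
    """Send log records to both stderr and a timestamped file; return the file path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"master_adapt_detect_{timestamp}.log"

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler(sys.stderr)
    console.setFormatter(formatter)
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(formatter)

    # force=True (Python 3.8+) replaces handlers installed by any earlier basicConfig call.
    logging.basicConfig(level=level, handlers=[console, file_handler], force=True)
    return log_path
```

Because both handlers hang off the root logger, every module's `logging.getLogger(__name__)` output reaches the terminal and the file without extra wiring.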

Command-Line Interface

  • File: cli.py
  • Purpose: Unified interface for all detection modes
  • Features:
    • Argument parsing for all modes
    • Mode-specific configuration
    • Results aggregation
    • Cost reporting

Key command patterns:

# Detection mode selection
--hybrid / --openai / --vector / --gemini

# Processing scope
--test / --limit N / --all / --specific-file FILE

# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts

# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
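These flag groups map naturally onto `argparse` mutually exclusive groups. The sketch covers only the flags listed above; which flags actually conflict, the integer types for `--limit`/`--cost-estimate`, and the helper names are assumptions, and cli.py may be organized differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Master Adapt Detect (sketch)")

    # Detection mode selection: one engine per run (assumed).
    mode = parser.add_mutually_exclusive_group()
    for flag in ("--hybrid", "--openai", "--vector", "--gemini"):
        mode.add_argument(flag, action="store_true")

    # Processing scope (assumed mutually exclusive).
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--test", action="store_true")
    scope.add_argument("--limit", type=int, metavar="N")
    scope.add_argument("--all", action="store_true")
    scope.add_argument("--specific-file", metavar="FILE")

    # Hybrid-specific options (panel_threshold default taken from Default Values).
    parser.add_argument("--panel-threshold", type=int, default=2)
    split = parser.add_mutually_exclusive_group()
    split.add_argument("--split-simple", action="store_true")
    split.add_argument("--split-advanced", action="store_true")
    parser.add_argument("--vector-mode", action="store_true")
    parser.add_argument("--fallback-one-at-a-time", action="store_true")
    parser.add_argument("--parallel-layouts", action="store_true")

    # Cost tracking: off unless explicitly enabled.
    parser.add_argument("--enable-cost-tracking", action="store_true")
    parser.add_argument("--cost-report", action="store_true")
    parser.add_argument("--cost-estimate", type=int, metavar="N")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Mutually exclusive groups make argparse reject combinations such as `--hybrid --openai` with a usage error instead of silently picking one.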

Key Algorithms

Local Inlier Analysis (Hybrid Mode)

Algorithm: OpenCV AKAZE features + RANSAC homography estimation

Process:

  1. Detect AKAZE keypoints in layout and master images
  2. Match descriptors using brute-force matcher with Hamming distance
  3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
  4. Estimate homography using RANSAC (threshold: 7.0)
  5. Count inliers and calculate confidence
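These steps map onto standard OpenCV calls. The sketch below is an assumed reconstruction, not process_single_master_inlier_analysis() itself. It uses the ratio (0.80) and RANSAC (7.0) values from this section, caps keypoints at an assumed 10,000 per image by detector response, and imports cv2/numpy inside the function in the spirit of the multiprocessing notes further down. `ratio_test` and `count_inliers` are hypothetical names.

```python
from typing import Any, List, Sequence, Tuple


def ratio_test(knn_matches: Sequence[Sequence[Any]], ratio: float = 0.80) -> List[Any]:
    """Lowe's ratio test: keep the best match only if clearly better than the runner-up."""
    good = []
    for pair in knn_matches:
        # knnMatch can return fewer than two neighbours for some descriptors.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good


def count_inliers(layout_path: str, master_path: str, max_features: int = 10_000,
                  min_good_matches: int = 10, ransac_threshold: float = 7.0) -> Tuple[int, int]:
    """Return (inlier_count, good_match_count) for one master against one layout."""
    import cv2  # imported here so each worker process loads its own copy
    import numpy as np

    akaze = cv2.AKAZE_create()
    keypoints, descriptors = [], []
    for path in (layout_path, master_path):
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if image is None:
            raise FileNotFoundError(path)
        kps = akaze.detect(image, None)
        # Keep only the strongest keypoints to bound memory use.
        kps = sorted(kps, key=lambda kp: kp.response, reverse=True)[:max_features]
        kps, des = akaze.compute(image, kps)
        keypoints.append(kps)
        descriptors.append(des)

    if descriptors[0] is None or descriptors[1] is None:
        return 0, 0

    # AKAZE descriptors are binary, so Hamming distance is the matching metric.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    # Query = master (index 1), train = layout (index 0).
    good = ratio_test(matcher.knnMatch(descriptors[1], descriptors[0], k=2))
    if len(good) < min_good_matches:
        return 0, len(good)

    src = np.float32([keypoints[1][m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([keypoints[0][m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_threshold)
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers, len(good)
```

The `min_good_matches` gate also guarantees `findHomography` gets well over the four point pairs it needs.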

Thresholds:

  • min_good_matches: 10 (minimum matches before RANSAC)
  • inlier_threshold: 0.65 (relative to best match)
  • inlier_ratio_threshold: 0.4 (minimum inlier ratio)

Confidence levels:

  • High: ≥30 inliers, ≥50% ratio
  • Medium: ≥15 inliers, ≥30% ratio
  • Low: Below medium thresholds

Implementation: process_single_master_inlier_analysis() function (standalone for multiprocessing)
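The thresholds and confidence tiers reduce to a few comparisons. This sketch rests on two interpretations that the section does not spell out: the inlier ratio is inliers divided by good matches, and `inlier_threshold` keeps masters whose inlier count is at least 65% of the best master's. Both helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def confidence_level(inliers: int, good_matches: int) -> str:
    """Map an inlier count and inlier ratio to the High/Medium/Low tiers."""
    ratio = inliers / good_matches if good_matches else 0.0
    if inliers >= 30 and ratio >= 0.50:
        return "high"
    if inliers >= 15 and ratio >= 0.30:
        return "medium"
    return "low"


def select_masters(scores: Dict[str, Tuple[int, int]], inlier_threshold: float = 0.65,
                   inlier_ratio_threshold: float = 0.4, min_good_matches: int = 10) -> List[str]:
    """scores maps master_id -> (inliers, good_matches); returns kept IDs, strongest first."""
    eligible = {
        master_id: (inliers, good)
        for master_id, (inliers, good) in scores.items()
        # good >= min_good_matches also guarantees the division below is safe.
        if good >= min_good_matches and inliers / good >= inlier_ratio_threshold
    }
    if not eligible:
        return []
    best = max(inliers for inliers, _ in eligible.values())
    selected = [m for m, (inliers, _) in eligible.items() if inliers >= inlier_threshold * best]
    return sorted(selected, key=lambda m: eligible[m][0], reverse=True)
```

Making the cutoff relative to the best match lets the filter adapt to layouts where every match is weak (small or heavily edited panels).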

Vector Similarity Analysis

Algorithm: Cosine similarity on 1408-dimensional embeddings

Process:

  1. Generate embedding for layout using Vertex AI
  2. Compare against cached master embeddings
  3. Calculate cosine similarity for each master
  4. Filter by threshold (default: 0.75)
  5. Sort by similarity descending

Formula:

similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))

Caching: Embeddings stored in embeddings_cache/master_embeddings.pkl
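Steps 2 to 5 can be sketched in pure Python (numpy would be the practical choice for 1408-dimensional vectors). The pickle layout, a dict mapping master ID to vector, and all helper names are assumptions; generating the layout embedding with Vertex AI (step 1) is left out.

```python
import math
import pickle
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (norm(a) * norm(b)); returns 0.0 if either vector is all zeros."""
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm = math.sqrt(math.fsum(x * x for x in a)) * math.sqrt(math.fsum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_masters(layout_emb: Sequence[float], master_embs: Dict[str, Sequence[float]],
                 threshold: float = 0.75) -> List[Tuple[str, float]]:
    """Score every master, keep those at or above threshold, sort by similarity descending."""
    scored = [(mid, cosine_similarity(layout_emb, emb)) for mid, emb in master_embs.items()]
    kept = [(mid, s) for mid, s in scored if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def load_cache(path: Path) -> Dict[str, List[float]]:
    """Load cached master embeddings, or an empty dict if no cache exists yet."""
    if not path.exists():
        return {}
    with path.open("rb") as f:
        return pickle.load(f)


def save_cache(path: Path, embeddings: Dict[str, List[float]]) -> None:
    """Write embeddings to the cache file, creating the cache directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(embeddings, f)
```

Master embeddings only change when the master set changes, so caching them means each run pays for one embedding call per layout rather than one per master.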

Panel Splitting (Canny Detection)

Algorithm: Multi-threshold Canny + Hough line transform

Process:

  1. Apply Canny edge detection at multiple thresholds:
    • (50, 150), (100, 200), (150, 250)
  2. Morphological closing with (3, 1) kernel
  3. Combine edge maps with maximum operation
  4. Hough line transform for horizontal lines:
    • Threshold: 1324
    • Min length: 3530
    • Max gap: 1059
  5. Filter for nearly horizontal lines (< 5% slope)
  6. Create panel bounds from separator positions

Tuning: Parameters specifically optimized for 14-panel detection accuracy
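Steps 5 and 6 can be sketched without OpenCV, given segments in the `(x1, y1, x2, y2)` form that `cv2.HoughLinesP` returns once its `(N, 1, 4)` array is flattened. Reading "< 5% slope" as |dy/dx| < 0.05, and merging separators closer than an assumed `merge_distance`, are interpretations; both function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def horizontal_separators(segments: Iterable[Segment], max_slope: float = 0.05,
                          merge_distance: int = 10) -> List[int]:
    """Return sorted y positions of nearly horizontal segments, merging near-duplicates."""
    ys = []
    for x1, y1, x2, y2 in segments:
        dx = abs(x2 - x1)
        # Skip vertical segments (dx == 0) and anything steeper than max_slope.
        if dx and abs(y2 - y1) / dx < max_slope:
            ys.append((y1 + y2) // 2)
    merged: List[int] = []
    for y in sorted(ys):
        if merged and y - merged[-1] < merge_distance:
            continue  # same gutter detected more than once
        merged.append(y)
    return merged


def separators_to_bounds(separators: List[int], width: int,
                         height: int) -> List[Tuple[int, int, int, int]]:
    """Turn separator y positions into (x, y, width, height) panel bounds."""
    edges = [0] + [y for y in separators if 0 < y < height] + [height]
    return [(0, top, width, bottom - top) for top, bottom in zip(edges, edges[1:]) if bottom > top]
```

Merging matters because running Canny at several thresholds tends to report the same gutter more than once.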

CEN Refinement

Algorithm: Censorship-aware master image selection

Process:

  1. Detect if layout is censored (OpenAI analysis)
  2. For each detected CEN (censored) master:
    • If layout is uncensored and non-CEN version exists: Switch to non-CEN
    • If layout is censored or no alternative: Keep CEN version
  3. Update results with refinement metadata

Naming convention: *CEN* in master ID indicates censored version
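The refinement rule is a small lookup. The sketch assumes the non-CEN counterpart's ID is the CEN ID with a `_CEN` suffix removed (for example `M07_CEN` → `M07`). The real pairing rule is not documented here, so `default_counterpart` is hypothetical and is passed in as a parameter. The `cen_refinement` metadata key is also an assumption.

```python
from typing import Callable, Dict, Iterable, List


def default_counterpart(master_id: str) -> str:
    """Hypothetical pairing rule: 'M07_CEN' -> 'M07'."""
    return master_id.replace("_CEN", "")


def refine_cen(detected_ids: Iterable[str], available_ids: Iterable[str], layout_censored: bool,
               counterpart: Callable[[str], str] = default_counterpart) -> Dict[str, object]:
    """Swap CEN masters for their non-CEN versions when the layout is uncensored."""
    available = set(available_ids)
    refined: List[str] = []
    switches: Dict[str, str] = {}
    for master_id in detected_ids:
        alternative = counterpart(master_id)
        if ("CEN" in master_id and not layout_censored
                and alternative != master_id and alternative in available):
            refined.append(alternative)
            switches[master_id] = alternative
        else:
            # Censored layout, non-CEN master, or no alternative: keep as detected.
            refined.append(master_id)
    return {"detected_master_ids": refined, "cen_refinement": switches}
```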

Parallel Processing Architecture

Serial Inlier Analysis Coordinator

Problem: Parallel inlier analysis causes memory exhaustion and crashes

Solution: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing

Architecture:

Multiple Layout Workers (parallel)
         ↓
    Task Queue
         ↓
Single Inlier Worker (serial)
         ↓
    Results back to layout workers

Components:

  • InlierAnalysisCoordinator - Manages serial execution
  • Task queue - Queues inlier analysis requests
  • Worker thread - Processes tasks one at a time
  • Futures - Async communication between layout and inlier workers

Benefits:

  • Prevents memory explosion from too many concurrent inlier analyses
  • Allows multiple layouts to be processed in parallel
  • Coordinates resource usage across system
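The queue-plus-futures pattern can be sketched with the standard library alone. `SerialCoordinator` is a hypothetical stand-in for InlierAnalysisCoordinator. One daemon thread drains a queue, so at most one analysis runs at a time, while any number of layout threads submit work and block on futures. A `ThreadPoolExecutor(max_workers=1)` gives the same guarantee; the explicit queue just makes the structure visible.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable, Optional, Tuple

Task = Tuple[Callable[..., Any], tuple, Future]


class SerialCoordinator:
    """Runs submitted tasks one at a time on a single worker thread."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue[Optional[Task]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        """Queue a task from any thread; the returned future resolves once it has run."""
        future: Future = Future()
        self._tasks.put((fn, args, future))
        return future

    def queue_size(self) -> int:
        """Backlog of pending tasks, usable as a worker-adjustment signal."""
        return self._tasks.qsize()

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            fn, args, future = item
            if not future.set_running_or_notify_cancel():
                continue  # cancelled before it started
            try:
                future.set_result(fn(*args))
            except Exception as exc:  # hand the failure back to the waiting layout worker
                future.set_exception(exc)

    def shutdown(self) -> None:
        """Finish queued tasks, then stop the worker thread."""
        self._tasks.put(None)
        self._worker.join()
```

Because the sentinel is queued behind existing work, `shutdown()` lets pending analyses finish instead of dropping them.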

Dynamic Worker Adjustment

Monitoring:

  • Memory usage (RAM percentage)
  • Swap usage (swap percentage)
  • Queue size (backlog of inlier tasks)
  • Open file descriptors

Adjustment triggers:

  • Memory > 85%: Reduce workers
  • Swap > 95% AND Memory > 80%: Reduce workers
  • Queue size ≥ 3: Reduce layout workers (producers)
  • Memory < 75% AND Swap < 80%: Increase workers

Auto-scaling:

  • Layout workers: Start at min(4, CPU/2), adjust dynamically
  • Local workers: Start at CPU-2, adjust dynamically
  • OpenAI workers: Set to number of master images
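The triggers and starting points can be written as two pure functions. The percentages and starting formulas are the ones listed above. The precedence (reduce checks before increase), the one-worker step size, the `max_workers` cap, and the floor of one worker are assumptions.

```python
import os
from typing import Dict, Optional


def initial_workers(n_masters: int, cpu_count: Optional[int] = None) -> Dict[str, int]:
    """Starting worker counts from the auto-scaling rules (floored at 1, an assumption)."""
    cpus = cpu_count or os.cpu_count() or 1
    return {
        "layout": max(1, min(4, cpus // 2)),
        "local": max(1, cpus - 2),
        "openai": n_masters,
    }


def adjust_workers(current: int, memory_pct: float, swap_pct: float, queue_size: int = 0,
                   is_producer: bool = False, max_workers: int = 16) -> int:
    """Apply the adjustment triggers; returns the new worker count (never below 1)."""
    if memory_pct > 85:
        return max(1, current - 1)
    if swap_pct > 95 and memory_pct > 80:
        return max(1, current - 1)
    if is_producer and queue_size >= 3:
        # Inlier backlog is growing: slow down the layout workers feeding it.
        return max(1, current - 1)
    if memory_pct < 75 and swap_pct < 80:
        return min(max_workers, current + 1)
    return current  # between thresholds: hold steady
```

The gap between the 75% increase threshold and the 85% reduce threshold acts as hysteresis, so the worker count does not oscillate around a single cutoff.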

Important Implementation Details

Multiprocessing Considerations

Challenge: Python multiprocessing requires picklable functions

Solutions:

  1. process_single_master_inlier_analysis() - Standalone function (not class method)
  2. All imports inside function to ensure worker processes have dependencies
  3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors)

Memory safety:

  • Feature limiting: Max 10,000-15,000 features per image
  • Dynamic worker reduction based on feature count
  • Forced garbage collection after processing
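The pattern behind these rules looks roughly like the sketch below. `analyze_one` is a hypothetical stand-in for process_single_master_inlier_analysis(). It is defined at module level so `pickle` can reference it by name, does its imports inside the body, receives only plain picklable arguments (paths and numbers, never a live cost calculator), and forces a garbage collection when done. Call `run_all()` only under an `if __name__ == "__main__":` guard, because spawn-based start methods re-import the main module in each worker.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Tuple


def analyze_one(args: Tuple[str, str, int]) -> Dict[str, object]:
    """Module-level worker: picklable by reference, dependencies imported inside."""
    import gc  # real code would also import cv2/numpy here

    layout_path, master_path, max_features = args
    try:
        # Placeholder for the real feature matching; returns plain data only.
        return {"layout": layout_path, "master": master_path, "max_features": max_features}
    finally:
        gc.collect()  # forced collection after each task


def run_all(layout_path: str, master_paths: List[str], workers: int = 2) -> List[Dict[str, object]]:
    """Fan one layout out across masters in worker processes (10,000-feature cap assumed)."""
    tasks = [(layout_path, m, 10_000) for m in master_paths]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_one, tasks))


def is_picklable(obj: object) -> bool:
    """True if obj survives pickle.dumps, which ProcessPoolExecutor needs for its tasks."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

A lambda or a method bound to an object holding open handles fails `is_picklable`, which is the same failure `ProcessPoolExecutor` reports when it tries to ship the task.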

Cost Tracking Integration

Important: Cost tracking is DISABLED by default

Reason: Avoid repetitive initialization messages from multiprocessing workers

Integration points:

  • openai_detector.py: After every OpenAI API call
  • hybrid_detector.py: Track all OpenAI operations
  • Results JSON: Cost breakdown per layout

Data flow:

  1. API call made → Extract token usage from response
  2. Call cost_calculator.track_api_call()
  3. Update session totals
  4. Generate reports on demand

Error Handling Patterns

OpenAI API errors:

try:
    response = openai_call()
except Exception as e:
    # Automatic retry logic
    # Fallback to alternative method
    # Return error result dict
    result = {"error": str(e)}

Memory errors:

import gc

try:
    result = memory_intensive_operation()
except MemoryError:
    # Reduce worker count
    # Force garbage collection
    # Retry with lower concurrency
    gc.collect()

File descriptor exhaustion:

import errno

try:
    result = file_heavy_operation()
except OSError as e:
    if e.errno in (errno.EMFILE, errno.ENFILE):  # "Too many open files"
        # Limit concurrent workers
        # Clean up temp files
        # Force resource release
        ...
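The OpenAI error-handling pattern can be filled in with a bounded retry and exponential backoff. This is a generic sketch, not the project's retry logic: `call_with_fallback`, the attempt count, the delays, and the error-dict keys are assumptions, and both the primary call and the fallback are passed in as callables.

```python
import logging
import time
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def call_with_fallback(primary: Callable[[], Any],
                       fallback: Optional[Callable[[], Any]] = None,
                       attempts: int = 3,
                       base_delay: float = 1.0) -> Dict[str, Any]:
    """Retry primary with exponential backoff, then try fallback, then return an error dict."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return {"ok": True, "result": primary(), "method": "primary"}
        except Exception as exc:  # broad on purpose: network, rate-limit, and parse errors
            last_error = exc
            logger.warning("attempt %d/%d failed: %s", attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    if fallback is not None:
        try:
            return {"ok": True, "result": fallback(), "method": "fallback"}
        except Exception as exc:
            last_error = exc
    return {"ok": False, "error": repr(last_error), "method": "none"}
```

Returning a dict instead of raising keeps one bad layout from aborting a batch run; the caller records the error and moves on.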

File Organization

Core Detection Files

  • hybrid_detector.py - Hybrid detection (2939 lines)
  • openai_detector.py - OpenAI detection
  • vector_detector.py - Vector similarity
  • gemini_detector.py - Gemini detection

Panel Splitting Files

  • panel_splitter.py - Traditional multi-method
  • advanced_splitter.py - Edge detection
  • simple_splitter.py - Even division

Supporting Files

  • cost_calculator.py - Cost tracking
  • memory_manager.py - Memory management
  • logging_config.py - Logging configuration
  • cli.py - Command-line interface

Test Files

  • test_hybrid.py - Hybrid mode tests
  • test_cost_calculator.py - Cost tracking tests
  • test_split_mode.py - Panel splitting tests
  • test_panel_accuracy.py - Panel detection accuracy
  • Various tuning and debug scripts

Data Directories

  • master_images/ - 41 master images to detect
  • layouts/ - 299+ layout images to process
  • results/ - JSON output files
  • embeddings_cache/ - Cached vector embeddings

Development Guidelines

Adding New Features

  1. New Detection Mode:

    • Create new detector class
    • Inherit from base detector if applicable
    • Implement detect_images_in_layout() method
    • Add CLI integration in cli.py
    • Update tests and documentation
  2. New Panel Splitting Method:

    • Create new splitter class
    • Implement split_panels(image_path, panel_count) method
    • Return list of dicts with keys: image, bounds, confidence, method
    • Add CLI flag for selection
    • Test with various panel counts
  3. Cost Tracking for New API:

    • Add extraction function for token usage
    • Track calls with cost_calculator.track_api_call()
    • Update operation types
    • Add to cost reports
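The contracts in items 1 and 2 can be written down as `typing.Protocol` classes, which let a type checker confirm that a new detector or splitter fits without forcing inheritance. The method names and the four panel keys come from this section. The argument of `detect_images_in_layout`, the `Panel` field types, and the toy `EvenSplitter` are assumptions.

```python
from typing import Any, Dict, List, Protocol, Tuple, TypedDict, runtime_checkable


class Panel(TypedDict):
    image: Any                         # e.g. a numpy array or PIL image (assumed)
    bounds: Tuple[int, int, int, int]  # (x, y, width, height), assumed order
    confidence: float
    method: str


@runtime_checkable
class PanelSplitter(Protocol):
    def split_panels(self, image_path: str, panel_count: int) -> List[Panel]: ...


@runtime_checkable
class Detector(Protocol):
    def detect_images_in_layout(self, layout_path: str) -> Dict[str, Any]: ...


class EvenSplitter:
    """Toy splitter that reports bounds only; a real one would also crop the image."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height

    def split_panels(self, image_path: str, panel_count: int) -> List[Panel]:
        panels: List[Panel] = []
        for i in range(panel_count):
            top = i * self.height // panel_count
            bottom = (i + 1) * self.height // panel_count
            panels.append({"image": None, "bounds": (0, top, self.width, bottom - top),
                           "confidence": 1.0, "method": "even_division"})
        return panels
```

`runtime_checkable` only checks that the method exists, not its signature, so static type checking remains the stronger guarantee.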

Testing Strategy

  1. Unit tests - Individual components
  2. Integration tests - Full detection pipeline
  3. Performance tests - Memory and speed benchmarks
  4. Accuracy tests - Panel detection accuracy
  5. Cost tests - Verify tracking accuracy

Performance Optimization Tips

  1. Reduce API calls - Primary cost driver
  2. Cache embeddings - Avoid regenerating
  3. Limit features - Prevent memory explosion
  4. Use multiprocessing - Parallel CPU work
  5. Monitor memory - Dynamic adjustment
  6. Profile bottlenecks - Optimize hot paths

Common Pitfalls

  1. Multiprocessing pickle errors - Use standalone functions
  2. Memory exhaustion - Limit concurrent workers
  3. File descriptor limits - Close files properly
  4. Cost calculator in workers - Keep in main process only
  5. Treating swap as an error condition - Swap usage on its own is normal, not an error

Configuration Reference

Environment Variables

OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON

Default Values

# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75  # vector mode

# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)

# Memory management
max_memory_percent = 75
max_swap_percent = 80

# Cost tracking
enable_tracking = False  # Must explicitly enable

Output Format Specification

Results JSON Structure

{
  "metadata": {
    "total_layouts_processed": int,
    "total_master_images": int,
    "master_images_available": [str],
    "provider": str,
    "model": str,
    "panel_threshold": int,
    "inlier_threshold": float,
    "processing_mode": str,
    "cost_tracking": {dict} | null
  },
  "results": {
    "layout_id": {
      "layout_filename": str,
      "detected_master_ids": [str],
      "detected_master_filenames": [str],
      "detection_method": str,
      "panel_count": int,
      "confidence_score": float,
      "panel_analysis": {dict},
      "censorship_analysis": {dict},
      "truncation_applied": bool,
      "deduplication_applied": bool,
      "cost_breakdown": {dict} | null
    }
  }
}
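Because every layout entry carries `detected_master_ids` and `detection_method`, simple reports need only the standard library. A sketch, assuming a results file shaped like the structure above; `summarize_results` and its output keys are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Union


def summarize_results(source: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Count detections per master and layouts per detection method."""
    if isinstance(source, dict):
        data = source
    else:
        data = json.loads(Path(source).read_text(encoding="utf-8"))
    results = data.get("results", {})
    master_counts: Counter = Counter()
    method_counts: Counter = Counter()
    empty_layouts = []
    for layout_id, entry in results.items():
        ids = entry.get("detected_master_ids", [])
        master_counts.update(ids)
        method_counts[entry.get("detection_method", "unknown")] += 1
        if not ids:
            empty_layouts.append(layout_id)  # candidates for manual review
    return {
        "layouts": len(results),
        "master_counts": dict(master_counts),
        "method_counts": dict(method_counts),
        "layouts_without_detections": empty_layouts,
    }
```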

Debugging Tips

Enable Debug Logging

# In code
import logging
logging.basicConfig(level=logging.DEBUG)

# Via environment
export LOG_LEVEL=DEBUG

Memory Issues

# Check current memory
python check_system_resources.py

# Test with memory fix
python test_memory_fix.py

# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1

Cost Tracking Issues

# Verify cost tracking works
python test_cost_calculator.py

# Test integration
python test_cost_tracking_integration.py

# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking

Panel Splitting Issues

# Test splitting accuracy
python test_panel_accuracy.py

# Tune parameters
python tune_14_panel_split.py

# Debug specific layout
python test_6786505_cli.py

API Costs (Current Pricing)

OpenAI O3 (2025)

  • Input tokens: $2.00 / million
  • Cached input: $0.50 / million
  • Output tokens: $8.00 / million
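Per-call cost follows directly from these rates. The sketch assumes, as in OpenAI's usage reporting, that cached tokens are a subset of input tokens and that output tokens include any reasoning tokens. `o3_call_cost` is a hypothetical helper, and the token counts in the check below are made up for illustration.

```python
# O3 prices from the table above, in USD per million tokens.
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.50
OUTPUT_PER_M = 8.00


def o3_call_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """USD cost of one call: cached input at the cached rate, the rest at the full rate."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    return (uncached * INPUT_PER_M
            + cached_input_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000
```

For example, 5,000 input tokens (2,000 of them cached) and 1,000 output tokens cost 3,000 × $2 + 2,000 × $0.50 + 1,000 × $8, divided by one million, or $0.015.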

Typical Usage

  • Hybrid mode: ~$0.01-0.02 per layout
  • OpenAI mode: ~$0.02-0.05 per layout
  • One-at-a-time: ~$0.50-1.00 per layout

Cost Optimization

  • Hybrid mode: 97.6% fewer API calls than one-at-a-time (1 call vs. 41 per layout)
  • Caching: Reduces input token costs
  • Batch processing: Amortizes overhead

Future Enhancement Ideas

  1. Multi-GPU support - Parallel inlier analysis with GPU acceleration
  2. Incremental processing - Resume from saved progress
  3. Web interface - Browser-based detection and visualization
  4. Active learning - Use detection results to improve models
  5. Custom training - Fine-tune models on domain-specific data
  6. Real-time processing - Stream processing for live detection
  7. Distributed processing - Multi-machine coordination
  8. Advanced caching - Persistent result caching across runs

Contact and Support

For questions or issues:

  1. Check logs in master_adapt_detect_*.log
  2. Review cost reports in results/
  3. Run diagnostic scripts
  4. Check system resources
  5. Review error messages carefully

Version History

Current implementation includes:

  • Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
  • Three panel splitting strategies
  • Cost tracking and reporting
  • Memory management and safety
  • Parallel processing with coordination
  • Dynamic worker adjustment
  • Comprehensive logging and debugging
  • Extensive configuration options

Last major update: January 2025