michael 380020b8a2 revised documentation, added technical overview

2025-10-01 16:02:40 -05:00

Master Adapt Detect - Developer Documentation

For AI Assistants and Developers

This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.

Project Purpose

Master Adapt Detect is an image detection system that identifies which master images appear in multi-panel layout images. It was originally developed to detect comic/manga page layouts in marketing materials, but it generalizes to any multi-panel image detection task.

Core Architecture

System Design Philosophy

The system follows a multi-strategy detection approach with these design principles:

Cost Optimization - Minimize API costs while maintaining accuracy
Flexibility - Support multiple detection engines for different use cases
Performance - Parallel processing with memory management
Robustness - Automatic fallbacks and error recovery

Detection Modes

The system provides 4 detection modes, each with specific use cases:

1. Hybrid Mode (Primary/Recommended)

File: hybrid_detector.py (2939 lines)
Purpose: Balance speed, cost, and accuracy
Strategy: OpenAI O3 for panel analysis + local CV for matching
Cost: ~1 API call per layout (97.6% reduction vs one-at-a-time)

How it works:

Single OpenAI API call to count panels and detect censorship
Route based on panel count:
- ≤ threshold: Direct local inlier analysis
- > threshold: Split layout first, then inlier analysis on each panel
Post-process with deduplication, CEN refinement, truncation
Optional fallback to OpenAI one-at-a-time if insufficient matches
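
The routing decision can be sketched as below. This is a minimal illustration rather than the code in hybrid_detector.py: the function name `choose_route` and the returned labels are hypothetical, and the default of 2 mirrors the panel_threshold default listed under Default Values.

```python
def choose_route(panel_count: int, panel_threshold: int = 2) -> str:
    """Pick a processing path from the panel count reported by the API call."""
    if panel_count <= panel_threshold:
        # Few panels: run local inlier analysis on the whole layout.
        return "direct_inlier_analysis"
    # Many panels: split the layout first, then analyse each panel.
    return "split_then_inlier_analysis"
```

Raising the threshold sends more layouts down the direct path, which skips the splitting step.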

Key classes:

HybridImageDetector - Main orchestrator
InlierAnalysisCoordinator - Serial execution coordinator for parallel mode
ProgressTracker - Thread-safe progress monitoring

2. OpenAI Mode

File: openai_detector.py
Purpose: Pure AI-powered detection
Strategy: GPT-4 vision for direct image comparison
Cost: 1-41 API calls per layout depending on mode

Modes:

Standard: All masters in one API call
One-at-a-time: Separate API call per master (expensive but thorough)

3. Vector Mode

File: vector_detector.py
Purpose: Semantic similarity matching
Strategy: Google Vertex AI multimodal embeddings (1408 dimensions)
Cost: No OpenAI costs, uses Google Cloud

Features:

Embedding caching for performance
Cosine similarity matching
Threshold-based filtering

4. Gemini Mode

File: gemini_detector.py
Purpose: Alternative AI detection
Strategy: Google Gemini 2.5 Pro visual reasoning
Cost: Google AI API (not OpenAI)

Panel Splitting Strategies

The system provides 3 panel splitting approaches for complex multi-panel layouts:

1. Traditional Multi-Method Splitter

File: panel_splitter.py (857 lines)
Strategy: Optimized Canny edge detection + Hough transform
Tuning: Specifically tuned for 14-panel detection
Parameters: Thresholds, kernel sizes, line detection params

2. Advanced Edge Detection Splitter

File: advanced_splitter.py (200+ lines)
Strategy: Sobel gradient analysis + gutter detection
Parameters:
- percentile: Low-energy column threshold (default: 10)
- min_gap: Minimum gutter width (default: 5)
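
One plausible reading of the two parameters is sketched below. It assumes each image column has already been reduced to a single energy value (for example, summed Sobel gradient magnitude); the function name `find_gutters` and the nearest-rank percentile are illustrative choices, not taken from advanced_splitter.py.

```python
import math

def find_gutters(column_energy: list[float], percentile: float = 10,
                 min_gap: int = 5) -> list[tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that look like gutters."""
    if not column_energy:
        return []
    ordered = sorted(column_energy)
    # Nearest-rank percentile: the energy at or below which `percentile` % of columns fall.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    threshold = ordered[rank - 1]

    gutters = []
    run_start = None
    # The trailing sentinel closes a run that reaches the right edge.
    for x, energy in enumerate(column_energy + [float("inf")]):
        if energy <= threshold:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            if x - run_start >= min_gap:  # keep only runs at least min_gap wide
                gutters.append((run_start, x))
            run_start = None
    return gutters
```

Under this reading, a larger min_gap rejects thin low-energy streaks inside a panel, while a larger percentile admits noisier gutters.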

3. Simple Even Division Splitter

File: simple_splitter.py (132 lines)
Strategy: Equal division based on panel count
Use case: Fast processing when layout is regular grid
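
A sketch of even division, assuming panels are stacked vertically as full-width horizontal bands (an assumption; simple_splitter.py may divide differently). The function name `even_panel_bounds` is hypothetical.

```python
def even_panel_bounds(height: int, width: int,
                      panel_count: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes for equal-height horizontal bands."""
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    bounds = []
    for i in range(panel_count):
        # Integer arithmetic keeps bands contiguous; the last band ends at `height`.
        top = i * height // panel_count
        bottom = (i + 1) * height // panel_count
        bounds.append((0, top, width, bottom))
    return bounds
```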

Supporting Systems

Cost Calculator

File: cost_calculator.py (440 lines)
Purpose: Track OpenAI API usage and costs
Features:
- Per-layout cost breakdown
- Session summaries
- Monthly estimation
- JSON report generation
Important: Disabled by default, requires --enable-cost-tracking flag

Data structures:

TokenUsage - Track token counts for single API call
ApiCallCost - Cost info for single API call
LayoutCostSummary - Aggregated cost for one layout
CostCalculator - Main tracking class

Memory Manager

File: memory_manager.py (119 lines)
Purpose: Prevent system crashes from memory exhaustion
Features:
- RAM and swap monitoring
- Dynamic worker adjustment
- Safe execution decorators
- Feature count limiting

Thresholds:

Max memory: 80% (configurable)
Max swap: 80% (warning only, doesn't throttle)

Logging Configuration

File: logging_config.py (128 lines)
Purpose: Dual output (terminal + file) for debugging crashes
Features:
- Timestamped log files
- Exception tracking with resource usage
- System diagnostics on startup

Command-Line Interface

File: cli.py
Purpose: Unified interface for all detection modes
Features:
- Argument parsing for all modes
- Mode-specific configuration
- Results aggregation
- Cost reporting

Key command patterns:

# Detection mode selection
--hybrid / --openai / --vector / --gemini

# Processing scope
--test / --limit N / --all / --specific-file FILE

# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts

# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N

Key Algorithms

Local Inlier Analysis (Hybrid Mode)

Algorithm: OpenCV AKAZE features + RANSAC homography estimation

Process:

Detect AKAZE keypoints in layout and master images
Match descriptors using brute-force matcher with Hamming distance
Apply Lowe's ratio test (threshold: 0.80) to filter good matches
Estimate homography using RANSAC (threshold: 7.0)
Count inliers and calculate confidence

Thresholds:

min_good_matches: 10 (minimum matches before RANSAC)
inlier_threshold: 0.65 (relative to best match)
inlier_ratio_threshold: 0.4 (minimum inlier ratio)

Confidence levels:

High: ≥30 inliers, ≥50% ratio
Medium: ≥15 inliers, ≥30% ratio
Low: Below medium thresholds

Implementation: process_single_master_inlier_analysis() function (standalone for multiprocessing)
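
The ratio test and the confidence levels can be illustrated without OpenCV. The sketch below assumes each match is given as a (best, second-best) distance pair and that the inlier ratio is inliers divided by good matches; both function names are hypothetical, and this is not the body of process_single_master_inlier_analysis().

```python
def ratio_test(knn_distances: list[tuple[float, float]], ratio: float = 0.80) -> list[int]:
    """Return indices of matches whose best distance beats ratio * second-best."""
    return [i for i, (best, second) in enumerate(knn_distances) if best < ratio * second]

def confidence_level(inliers: int, good_matches: int) -> str:
    """Map an inlier count and inlier ratio onto the High/Medium/Low levels."""
    inlier_ratio = inliers / good_matches if good_matches else 0.0
    if inliers >= 30 and inlier_ratio >= 0.50:
        return "high"
    if inliers >= 15 and inlier_ratio >= 0.30:
        return "medium"
    return "low"
```

Note that a match with many inliers but a low ratio, such as 40 inliers out of 100 good matches, lands in the medium band under this reading.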

Vector Similarity Analysis

Algorithm: Cosine similarity on 1408-dimensional embeddings

Process:

Generate embedding for layout using Vertex AI
Compare against cached master embeddings
Calculate cosine similarity for each master
Filter by threshold (default: 0.75)
Sort by similarity descending

Formula:

similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))

Caching: Embeddings stored in embeddings_cache/master_embeddings.pkl
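
The formula and the filter-then-sort steps can be sketched in plain Python. The function names and the dict-of-lists input are assumptions for illustration; vector_detector.py may use NumPy arrays and a different interface.

```python
import math

def cosine_similarity(emb1: list[float], emb2: list[float]) -> float:
    """dot(emb1, emb2) / (norm(emb1) * norm(emb2)), with 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(emb1, emb2))
    norm1 = math.sqrt(sum(x * x for x in emb1))
    norm2 = math.sqrt(sum(y * y for y in emb2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

def rank_masters(layout_emb: list[float], master_embs: dict[str, list[float]],
                 threshold: float = 0.75) -> list[tuple[str, float]]:
    """Score every master, drop those below threshold, sort by similarity descending."""
    scored = [(master_id, cosine_similarity(layout_emb, emb))
              for master_id, emb in master_embs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```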

Panel Splitting (Canny Detection)

Algorithm: Multi-threshold Canny + Hough line transform

Process:

Apply Canny edge detection at multiple thresholds:
- (50, 150), (100, 200), (150, 250)
Morphological closing with (3, 1) kernel
Combine edge maps with maximum operation
Hough line transform for horizontal lines:
- Threshold: 1324
- Min length: 3530
- Max gap: 1059
Filter for nearly horizontal lines (< 5% slope)
Create panel bounds from separator positions

Tuning: Parameters specifically optimized for 14-panel detection accuracy
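
The last two steps can be sketched without OpenCV, assuming the Hough stage has already produced line segments as (x1, y1, x2, y2) endpoints. Both function names are hypothetical, and the real panel_splitter.py may merge nearby separators or apply extra filtering.

```python
def is_nearly_horizontal(x1: int, y1: int, x2: int, y2: int,
                         max_slope: float = 0.05) -> bool:
    """True when the segment's slope magnitude is below max_slope (5%)."""
    dx = abs(x2 - x1)
    return dx > 0 and abs(y2 - y1) / dx < max_slope

def bounds_from_separators(separator_ys: list[int], image_height: int,
                           min_height: int = 1) -> list[tuple[int, int]]:
    """Turn horizontal separator y-positions into (top, bottom) panel ranges."""
    # Add the image edges, drop duplicates and out-of-range separators.
    edges = sorted({0, image_height,
                    *(y for y in separator_ys if 0 < y < image_height)})
    return [(top, bottom) for top, bottom in zip(edges, edges[1:])
            if bottom - top >= min_height]
```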

CEN Refinement

Algorithm: Censorship-aware master image selection

Process:

Detect if layout is censored (OpenAI analysis)
For each detected CEN (censored) master:
- If layout is uncensored and non-CEN version exists: Switch to non-CEN
- If layout is censored or no alternative: Keep CEN version
Update results with refinement metadata

Naming convention: *CEN* in master ID indicates censored version
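
A sketch of the refinement rule follows. The pairing rule is an assumption: it treats the uncensored twin as the same ID with "_CEN" removed, which may not match the project's actual naming. The function name and the metadata keys are also hypothetical.

```python
def refine_cen(detected_ids: list[str], layout_is_censored: bool,
               available_ids: set[str]) -> list[dict]:
    """Swap CEN masters for their uncensored twin when the layout is uncensored."""
    refined = []
    for master_id in detected_ids:
        chosen = master_id
        if "CEN" in master_id and not layout_is_censored:
            # Assumed pairing rule: the twin's ID is the same ID without "_CEN".
            candidate = master_id.replace("_CEN", "")
            if candidate != master_id and candidate in available_ids:
                chosen = candidate
        # Record what happened so the result can carry refinement metadata.
        refined.append({"master_id": chosen,
                        "original_id": master_id,
                        "refined": chosen != master_id})
    return refined
```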

Parallel Processing Architecture

Serial Inlier Analysis Coordinator

Problem: Parallel inlier analysis causes memory exhaustion and crashes

Solution: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing

Architecture:

Multiple Layout Workers (parallel)
         ↓
    Task Queue
         ↓
Single Inlier Worker (serial)
         ↓
    Results back to layout workers

Components:

InlierAnalysisCoordinator - Manages serial execution
Task queue - Queues inlier analysis requests
Worker thread - Processes tasks one at a time
Futures - Async communication between layout and inlier workers

Benefits:

Prevents memory explosion from too many concurrent inlier analyses
Allows multiple layouts to be processed in parallel
Coordinates resource usage across system
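
The queue-plus-futures pattern described above can be sketched with the standard library. `SerialCoordinator` is a hypothetical stand-in, not the real InlierAnalysisCoordinator, and it omits the memory monitoring the project layers on top.

```python
import queue
import threading
from concurrent.futures import Future

class SerialCoordinator:
    """Runs submitted tasks one at a time on a single worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn, *args) -> Future:
        """Queue fn(*args); callers block on the returned Future for the result."""
        future = Future()
        self._tasks.put((future, fn, args))
        return future

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            future, fn, args = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                # Deliver the failure to the caller instead of killing the worker.
                future.set_exception(exc)

    def shutdown(self):
        self._tasks.put(None)
        self._worker.join()
```

Layout workers on any thread can call `submit()` concurrently; because a single thread drains the queue, at most one analysis is in flight at a time.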

Dynamic Worker Adjustment

Monitoring:

Memory usage (RAM percentage)
Swap usage (swap percentage)
Queue size (backlog of inlier tasks)
Open file descriptors

Adjustment triggers:

Memory > 85%: Reduce workers
Swap > 95% AND Memory > 80%: Reduce workers
Queue size ≥ 3: Reduce layout workers (producers)
Memory < 75% AND Swap < 80%: Increase workers
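
The triggers can be expressed as one decision function. The evaluation order (reductions checked before increases) and the return labels are assumptions made for this sketch; the name `adjust_workers` is hypothetical.

```python
def adjust_workers(memory_pct: float, swap_pct: float, queue_size: int) -> str:
    """Return 'reduce', 'reduce_layout', 'increase', or 'hold' per the listed triggers."""
    if memory_pct > 85:
        return "reduce"
    if swap_pct > 95 and memory_pct > 80:
        return "reduce"
    if queue_size >= 3:
        # Backlog of inlier tasks: slow the producers (layout workers).
        return "reduce_layout"
    if memory_pct < 75 and swap_pct < 80:
        return "increase"
    return "hold"
```

Memory between 75% and 85% with a short queue falls through to "hold", which avoids oscillating between scale-up and scale-down.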

Auto-scaling:

Layout workers: Start at min(4, CPU/2), adjust dynamically
Local workers: Start at CPU-2, adjust dynamically
OpenAI workers: Set to number of master images

Important Implementation Details

Multiprocessing Considerations

Challenge: Python multiprocessing requires pickleable functions

Solutions:

process_single_master_inlier_analysis() - Standalone function (not class method)
All imports inside function to ensure worker processes have dependencies
Cost calculator NOT imported in multiprocessing functions (causes pickle errors)

Memory safety:

Feature limiting: Max 10,000-15,000 features per image
Dynamic worker reduction based on feature count
Forced garbage collection after processing

Cost Tracking Integration

Important: Cost tracking is DISABLED by default

Reason: Avoid repetitive initialization messages from multiprocessing workers

Integration points:

openai_detector.py: After every OpenAI API call
hybrid_detector.py: Track all OpenAI operations
Results JSON: Cost breakdown per layout

Data flow:

API call made → Extract token usage from response
Call cost_calculator.track_api_call()
Update session totals
Generate reports on demand
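
The data flow can be sketched with two small dataclasses. The field names, the `SessionTotals` class, and the `track_api_call()` signature are assumptions for illustration, not the interface of cost_calculator.py. Prices are the O3 rates quoted in this document, and `input_tokens` is assumed to exclude cached tokens.

```python
from dataclasses import dataclass, field

# USD per million tokens, from the O3 pricing quoted in this document.
PRICE_INPUT = 2.00
PRICE_CACHED_INPUT = 0.50
PRICE_OUTPUT = 8.00

@dataclass
class TokenUsage:
    input_tokens: int = 0          # assumed to exclude cached tokens
    cached_input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_INPUT
                + self.cached_input_tokens * PRICE_CACHED_INPUT
                + self.output_tokens * PRICE_OUTPUT) / 1_000_000

@dataclass
class SessionTotals:
    calls: int = 0
    cost_usd: float = 0.0
    per_layout: dict = field(default_factory=dict)

    def track_api_call(self, layout_id: str, usage: TokenUsage) -> float:
        """Record one call and update the session and per-layout totals."""
        cost = usage.cost_usd()
        self.calls += 1
        self.cost_usd += cost
        self.per_layout[layout_id] = self.per_layout.get(layout_id, 0.0) + cost
        return cost
```

For example, a call with 3,000 input, 1,000 cached, and 500 output tokens costs (6,000 + 500 + 4,000) / 1,000,000 = $0.0105.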

Error Handling Patterns

OpenAI API errors:

try:
    response = openai_call()
except Exception as e:
    # Automatic retry logic
    # Fallback to alternative method
    # Return error result dict

Memory errors:

try:
    result = memory_intensive_operation()
except MemoryError:
    # Reduce worker count
    # Force garbage collection
    # Retry with lower concurrency

File descriptor exhaustion:

try:
    result = file_intensive_operation()  # placeholder for any call that opens many files
except OSError as e:
    if "Too many open files" in str(e):
        # Limit concurrent workers
        # Clean up temp files
        # Force resource release
        ...
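
The retry-then-fallback pattern for API errors can be made concrete as follows. This is a sketch under assumptions: the helper name `call_with_retry`, the exponential backoff, and the shape of the error dict are illustrative, not the project's actual retry logic.

```python
import time

def call_with_retry(primary, fallback=None, max_attempts=3,
                    base_delay=1.0, sleep=time.sleep):
    """Try primary() up to max_attempts times, then fallback(), else return an error dict."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception as exc:  # broad on purpose, mirroring the pattern above
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, ...
    if fallback is not None:
        try:
            return fallback()
        except Exception as exc:
            last_error = exc
    # Nothing worked: return a result dict instead of raising.
    return {"error": str(last_error), "detected_master_ids": []}
```

Passing `sleep` as a parameter lets tests replace the real delay with a no-op.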

File Organization

Core Detection Files

hybrid_detector.py - Hybrid detection (2939 lines)
openai_detector.py - OpenAI detection
vector_detector.py - Vector similarity
gemini_detector.py - Gemini detection

Panel Splitting Files

panel_splitter.py - Traditional multi-method
advanced_splitter.py - Edge detection
simple_splitter.py - Even division

Supporting Files

cost_calculator.py - Cost tracking
memory_manager.py - Memory management
logging_config.py - Logging configuration
cli.py - Command-line interface

Test Files

test_hybrid.py - Hybrid mode tests
test_cost_calculator.py - Cost tracking tests
test_split_mode.py - Panel splitting tests
test_panel_accuracy.py - Panel detection accuracy
Various tuning and debug scripts

Data Directories

master_images/ - 41 master images to detect
layouts/ - 299+ layout images to process
results/ - JSON output files
embeddings_cache/ - Cached vector embeddings

Development Guidelines

Adding New Features

New Detection Mode:
- Create new detector class
- Inherit from base detector if applicable
- Implement detect_images_in_layout() method
- Add CLI integration in cli.py
- Update tests and documentation
New Panel Splitting Method:
- Create new splitter class
- Implement split_panels(image_path, panel_count) method
- Return list of dicts with keys: image, bounds, confidence, method
- Add CLI flag for selection
- Test with various panel counts
Cost Tracking for New API:
- Add extraction function for token usage
- Track calls with cost_calculator.track_api_call()
- Update operation types
- Add to cost reports

Testing Strategy

Unit tests - Individual components
Integration tests - Full detection pipeline
Performance tests - Memory and speed benchmarks
Accuracy tests - Panel detection accuracy
Cost tests - Verify tracking accuracy

Performance Optimization Tips

Reduce API calls - Primary cost driver
Cache embeddings - Avoid regenerating
Limit features - Prevent memory explosion
Use multiprocessing - Parallel CPU work
Monitor memory - Dynamic adjustment
Profile bottlenecks - Optimize hot paths

Common Pitfalls

Multiprocessing pickle errors - Use standalone functions
Memory exhaustion - Limit concurrent workers
File descriptor limits - Close files properly
Cost calculator in workers - Keep in main process only
Treating swap as an error condition - Swap usage alone is acceptable and should not be handled as an error

Configuration Reference

Environment Variables

OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON

Default Values

# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75  # vector mode

# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)

# Memory management
max_memory_percent = 75
max_swap_percent = 80

# Cost tracking
enable_tracking = False  # Must explicitly enable

Output Format Specification

Results JSON Structure

{
  "metadata": {
    "total_layouts_processed": int,
    "total_master_images": int,
    "master_images_available": [str],
    "provider": str,
    "model": str,
    "panel_threshold": int,
    "inlier_threshold": float,
    "processing_mode": str,
    "cost_tracking": {dict} | null
  },
  "results": {
    "layout_id": {
      "layout_filename": str,
      "detected_master_ids": [str],
      "detected_master_filenames": [str],
      "detection_method": str,
      "panel_count": int,
      "confidence_score": float,
      "panel_analysis": {dict},
      "censorship_analysis": {dict},
      "truncation_applied": bool,
      "deduplication_applied": bool,
      "cost_breakdown": {dict} | null
    }
  }
}
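
A short sketch of consuming this structure: it counts how often each master ID appears across layouts. Only keys shown in the structure above are used; the function name `master_frequencies` is hypothetical.

```python
import json
from collections import Counter

def master_frequencies(results_json: str) -> Counter:
    """Count occurrences of each detected master ID across all layouts."""
    data = json.loads(results_json)
    counts = Counter()
    for layout in data["results"].values():
        counts.update(layout["detected_master_ids"])
    return counts
```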

Debugging Tips

Enable Debug Logging

# In code
import logging
logging.basicConfig(level=logging.DEBUG)

# Via environment
export LOG_LEVEL=DEBUG

Memory Issues

# Check current memory
python check_system_resources.py

# Test with memory fix
python test_memory_fix.py

# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1

Cost Tracking Issues

# Verify cost tracking works
python test_cost_calculator.py

# Test integration
python test_cost_tracking_integration.py

# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking

Panel Splitting Issues

# Test splitting accuracy
python test_panel_accuracy.py

# Tune parameters
python tune_14_panel_split.py

# Debug specific layout
python test_6786505_cli.py

API Costs (Current Pricing)

OpenAI O3 (2025)

Input tokens: $2.00 / million
Cached input: $0.50 / million
Output tokens: $8.00 / million

Typical Usage

Hybrid mode: ~$0.01-0.02 per layout
OpenAI mode: ~$0.02-0.05 per layout
One-at-a-time: ~$0.50-1.00 per layout

Cost Optimization

Hybrid mode: 97.6% reduction vs one-at-a-time
Caching: Reduces input token costs
Batch processing: Amortizes overhead
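
The 97.6% figure is consistent with replacing 41 one-at-a-time calls (one per master image) with a single call per layout. This is an inference from the numbers quoted in this document rather than a figure taken from a cost report; it measures call count, not dollar cost.

```python
def call_reduction_percent(calls_before: int, calls_after: int) -> float:
    """Percentage reduction in API calls, rounded to one decimal place."""
    return round((1 - calls_after / calls_before) * 100, 1)

print(call_reduction_percent(41, 1))  # 97.6
```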

Future Enhancement Ideas

Multi-GPU support - Parallel inlier analysis with GPU acceleration
Incremental processing - Resume from saved progress
Web interface - Browser-based detection and visualization
Active learning - Use detection results to improve models
Custom training - Fine-tune models on domain-specific data
Real-time processing - Stream processing for live detection
Distributed processing - Multi-machine coordination
Advanced caching - Persistent result caching across runs

Contact and Support

For questions or issues:

Check logs in master_adapt_detect_*.log
Review cost reports in results/
Run diagnostic scripts
Check system resources
Review error messages carefully

Version History

Current implementation includes:

Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
Three panel splitting strategies
Cost tracking and reporting
Memory management and safety
Parallel processing with coordination
Dynamic worker adjustment
Comprehensive logging and debugging
Extensive configuration options

Last major update: January 2025
