Master Adapt Detect

An AI-powered image detection system that identifies which master images appear within multi-panel layout images. It combines multiple detection strategies with panel splitting and cost-optimization features.

Overview

This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:

  1. Hybrid Mode (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
  2. OpenAI Mode - Full AI-powered detection using OpenAI O3 mini
  3. Vector Mode - Google Vertex AI multimodal embeddings for similarity search
  4. Gemini Mode - Google Gemini 2.5 Pro for visual analysis

Key Features

Detection Capabilities

  • Multi-strategy detection - Choose from 4 different detection engines
  • Panel counting - Automatic detection of number of panels in layouts
  • Censorship detection - Identifies censored vs uncensored content with CEN refinement
  • Smart matching - Handles cropped, scaled, rotated, and transformed images
  • Confidence scoring - Provides match confidence based on panel count and detected matches
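The README does not give the confidence formula. The following is a minimal sketch that assumes the score is the share of panels covered by detected matches, capped at 100. This assumption is consistent with the example in Output Format, where 2 matches in a 2-panel layout score 100.0. `confidence_score` is a hypothetical helper, not the project's code.

```python
def confidence_score(num_matches: int, panel_count: int) -> float:
    """Hypothetical confidence: percentage of panels accounted for by matches.

    Assumed formula for illustration only; the real scoring may differ.
    """
    if panel_count <= 0:
        # No usable panel count: report zero confidence instead of dividing by zero.
        return 0.0
    covered = min(num_matches, panel_count)  # surplus matches cannot exceed 100%
    return round(100.0 * covered / panel_count, 1)


print(confidence_score(2, 2))  # 100.0
print(confidence_score(1, 3))  # 33.3
```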

Hybrid Mode (Primary Feature)

  • Cost optimization - 97.6% reduction in API costs vs one-at-a-time detection
  • Intelligent routing - Uses direct local analysis for simple layouts (≤2 panels by default) and the split method for more complex ones
  • Panel splitting - Three splitting strategies: traditional, advanced edge detection, simple division
  • Local inlier analysis - OpenCV AKAZE features with multiprocessing for fast matching
  • Vector similarity - Optional Google Vertex AI embeddings for semantic matching
  • Fallback support - Automatic fallback to OpenAI one-at-a-time when needed

Processing Options

  • Parallel processing - Processes layouts concurrently while a coordinator runs inlier analysis serially
  • Memory management - Dynamic worker adjustment based on system resources
  • Cost tracking - Comprehensive OpenAI API usage and cost monitoring
  • Batch processing - Process hundreds of layouts efficiently
  • Progress tracking - Real-time progress updates with ETA
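The README does not say how the ETA is computed. The following is a minimal sketch that assumes linear extrapolation from the average time per completed layout. Both function names are hypothetical.

```python
from typing import Optional


def eta_seconds(done: int, total: int, elapsed: float) -> Optional[float]:
    """Remaining time = average seconds per finished layout x layouts left."""
    if done <= 0:
        return None  # nothing finished yet, so there is no rate to extrapolate
    return (elapsed / done) * (total - done)


def format_progress(done: int, total: int, elapsed: float) -> str:
    """One-line progress message such as '10/40 layouts (25%), ETA 90s'."""
    eta = eta_seconds(done, total, elapsed)
    pct = 100 * done / total if total else 100.0
    eta_text = "unknown" if eta is None else f"{eta:.0f}s"
    return f"{done}/{total} layouts ({pct:.0f}%), ETA {eta_text}"
```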

Installation

Prerequisites

  • Python 3.8+
  • OpenCV
  • Google Cloud credentials (for Vector mode)
  • OpenAI API key (for OpenAI/Hybrid modes)
  • Google AI API key (for Gemini mode)

Setup

# Clone the repository
git clone <repository-url>
cd master_adapt_detect

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
#   OPENAI_API_KEY=your_openai_key
#   GOOGLE_API_KEY=your_google_ai_key
#   GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
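The mapping below is taken from the Prerequisites list. It offers a quick way to check which keys a mode needs before a long run. The sketch assumes the variables are already in the process environment (for example, loaded from .env by the application). `missing_keys` is a hypothetical helper and is not part of cli.py. The extra Google credentials requirement for hybrid `--vector-mode` is an inference from the Vertex AI note under Hybrid Mode.

```python
import os
from typing import Dict, List, Mapping, Optional

# Required environment variables per detection mode (from Prerequisites).
REQUIRED_KEYS: Dict[str, List[str]] = {
    "hybrid": ["OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vector": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "gemini": ["GOOGLE_API_KEY"],
}


def missing_keys(mode: str, vector_mode: bool = False,
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables that are unset or empty for the chosen mode."""
    env = os.environ if env is None else env
    needed = list(REQUIRED_KEYS[mode])
    if mode == "hybrid" and vector_mode:
        # Assumed: hybrid --vector-mode uses Vertex AI embeddings as well.
        needed.append("GOOGLE_APPLICATION_CREDENTIALS")
    return [key for key in needed if not env.get(key)]
```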

Usage

Command Line Interface

The main entry point is cli.py, which provides a comprehensive CLI for all detection modes.

# Quick test - hybrid mode on a single layout
python cli.py --test --hybrid

# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid

# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts

# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time

# Vector mode with similarity search
python cli.py --all --vector

# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

Detection Modes

Hybrid Mode (Recommended)

Best balance of speed, cost, and accuracy.

# Layouts with ≤2 panels use direct local analysis; larger layouts are split
python cli.py --all --hybrid --panel-threshold 2

# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple

# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced

# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode

# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time

OpenAI Mode

Full AI-powered detection with optional refinement.

# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai

# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time

# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement

Vector Mode

Semantic similarity using embeddings.

# Process with vector embeddings
python cli.py --all --vector

# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
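Vector mode matches by cosine similarity against an adjustable `--similarity-threshold`. The following is a minimal sketch of thresholded cosine matching. Plain lists stand in for the Vertex AI embeddings. The helper names are hypothetical, and the inclusive `>=` comparison is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def matches_above_threshold(layout_vec: Sequence[float],
                            masters: Dict[str, Sequence[float]],
                            threshold: float) -> List[Tuple[str, float]]:
    """(master_id, similarity) pairs at or above the threshold, best first."""
    scored = [(mid, cosine_similarity(layout_vec, vec)) for mid, vec in masters.items()]
    hits = [(mid, s) for mid, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold, for example to 0.8 as in the command above, trades recall for precision.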

Gemini Mode

Google Gemini 2.5 Pro detection.

# Standard Gemini detection
python cli.py --limit 10 --gemini

Key Options

Detection Mode:

  • --hybrid - Hybrid detection mode (default)
  • --openai - OpenAI detection mode
  • --vector - Vector similarity mode
  • --gemini - Gemini detection mode

Processing:

  • --test - Test with 1 layout
  • --limit N - Process first N layouts
  • --all - Process all layouts
  • --specific-file FILE - Process specific file

Hybrid Options:

  • --panel-threshold N - Panel threshold for routing (default: 2)
  • --split-simple - Use simple even division splitting
  • --split-advanced - Use advanced edge detection splitting
  • --vector-mode - Use vector similarity instead of inlier analysis
  • --fallback-one-at-a-time - Enable OpenAI fallback
  • --parallel-layouts - Enable parallel layout processing
  • --no-truncation - Disable truncating matches to the panel count

Cost Tracking:

  • --enable-cost-tracking - Enable cost tracking (disabled by default)
  • --cost-report - Generate detailed cost report
  • --cost-estimate N - Estimate monthly cost for N layouts

Worker Configuration:

  • --openai-workers N - OpenAI worker count (default: auto)
  • --local-workers N - Local analysis workers (default: auto)
  • --layout-workers N - Parallel layout workers (default: auto)

Other:

  • --output NAME - Custom output filename
  • --help - Show all options

Architecture

Core Components

Detection Engines

  1. HybridImageDetector (hybrid_detector.py)

    • Main hybrid detection implementation
    • Routes layouts based on panel count
    • Integrates OpenAI, local analysis, and splitting
    • Handles parallel processing coordination
  2. OpenAIImageDetector (openai_detector.py)

    • OpenAI O3 mini integration
    • Panel counting and censorship detection
    • One-at-a-time and batch detection modes
    • CEN refinement for censored content
  3. VectorDetector (vector_detector.py)

    • Google Vertex AI multimodal embeddings
    • Cosine similarity matching
    • Embedding caching for performance
  4. GeminiDetector (gemini_detector.py)

    • Google Gemini 2.5 Pro integration
    • Visual reasoning and analysis
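VectorDetector caches embeddings to avoid recomputing them. The cache format is not documented. The following is a minimal sketch that assumes one JSON file per image in embeddings_cache/, keyed by the SHA-256 of the image bytes. The `embed` callable stands in for the Vertex AI request, and `cached_embedding` is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def cached_embedding(image_path: Path, embed: Callable[[bytes], List[float]],
                     cache_dir: Path = Path("embeddings_cache")) -> List[float]:
    """Return the embedding for image_path, calling `embed` only on a cache miss.

    Keying by content hash means renamed files still hit the cache, while
    edited files get a fresh embedding.
    """
    data = image_path.read_bytes()
    key = hashlib.sha256(data).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    vector = embed(data)  # assumed: the expensive embedding API call
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(vector))
    return vector
```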

Panel Splitting

  1. PanelSplitter (panel_splitter.py)

    • Multi-method panel splitting
    • Optimized Canny edge detection
    • Hough line transform for separators
    • Tuned for 14-panel detection
  2. AdvancedPanelSplitter (advanced_splitter.py)

    • Edge detection and gutter analysis
    • Sobel gradient detection
    • Configurable percentile thresholds
  3. SimplePanelSplitter (simple_splitter.py)

    • Simple even division
    • Fast horizontal splitting
    • Grid layout support
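Even division is the simplest splitting strategy. The following is a minimal sketch of how SimplePanelSplitter's even division might compute panel boxes. It assumes "horizontal splitting" means horizontal cuts that produce strips stacked top to bottom, with a rows x cols grid for grid layouts. `even_split` is hypothetical. The returned (left, top, right, bottom) tuples use the same convention as PIL's `Image.crop`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def even_split(width: int, height: int, rows: int, cols: int = 1) -> List[Box]:
    """Divide an image into a rows x cols grid of near-equal boxes.

    cols=1 gives horizontal strips. Integer-division boundaries make the boxes
    tile the image exactly, with no gaps or overlaps.
    """
    boxes = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes
```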

Supporting Systems

  1. Cost Calculator (cost_calculator.py)

    • Tracks OpenAI API usage
    • Per-layout and session cost tracking
    • Monthly cost estimation
    • Detailed JSON reports
  2. Memory Manager (memory_manager.py)

    • Prevents memory exhaustion
    • Dynamic worker adjustment
    • System resource monitoring
  3. Logging Config (logging_config.py)

    • Dual output (terminal + file)
    • Crash tracking
    • System diagnostics
  4. InlierAnalysisCoordinator (in hybrid_detector.py)

    • Serial execution of inlier analysis
    • Task queue management
    • Prevents system overload
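InlierAnalysisCoordinator runs inlier analysis serially from a task queue. A `ThreadPoolExecutor` with `max_workers=1` provides exactly that: an internal queue drained by a single thread. The following is a minimal, thread-based sketch, not the project's implementation. `SerialInlierQueue` and `run_demo` are hypothetical, and the real coordinator may use processes.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Tuple


class SerialInlierQueue:
    """Hypothetical stand-in for InlierAnalysisCoordinator: one task at a time."""

    def __init__(self) -> None:
        # One worker thread drains the executor's queue in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        return self._executor.submit(fn, *args)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def run_demo(num_layouts: int = 8, layout_workers: int = 4) -> Tuple[List[int], int]:
    """Parallel layout threads share one serial queue; return results and peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_inlier_analysis(layout_id: int) -> int:
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stand-in for AKAZE matching + RANSAC
        with lock:
            state["active"] -= 1
        return layout_id

    coordinator = SerialInlierQueue()
    try:
        with ThreadPoolExecutor(max_workers=layout_workers) as layouts:
            # Each layout thread blocks until its task has run on the serial queue.
            results = list(layouts.map(
                lambda i: coordinator.submit(fake_inlier_analysis, i).result(),
                range(num_layouts)))
    finally:
        coordinator.shutdown()
    return results, state["peak"]
```

Layouts still progress in parallel through their other stages, but only one heavy matching job is active at a time.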

Workflow

Hybrid Mode Workflow

  1. OpenAI Analysis (1 API call)

    • Count panels in layout
    • Detect censorship status
    • Consolidated analysis
  2. Detection Routing

    • ≤ panel_threshold panels: direct local/vector analysis
    • > panel_threshold panels: split into panels, then local/vector analysis

  3. Local Analysis (no API calls)

    • OpenCV AKAZE feature detection
    • Multiprocessing for speed
    • RANSAC homography estimation
    • Inlier-based confidence scoring
  4. Post-Processing

    • CEN refinement (if enabled)
    • Deduplication
    • Truncation to panel count
    • Confidence scoring
  5. Optional Fallback (if enabled)

    • Triggers when fewer matches than panels are found
    • OpenAI one-at-a-time detection
    • Additional API calls only when needed
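The following sketch covers the routing, deduplication, truncation, and fallback-trigger rules from the workflow above. It uses the documented default `--panel-threshold` of 2 and `--no-truncation` switch. It assumes matches arrive ordered best-first, so truncation drops the weakest. All function names are hypothetical.

```python
from typing import List, Sequence


def route(panel_count: int, panel_threshold: int = 2) -> str:
    """Step 2: small layouts are analysed whole, larger ones are split first."""
    return "direct" if panel_count <= panel_threshold else "split"


def post_process(matches: Sequence[str], panel_count: int,
                 truncate: bool = True) -> List[str]:
    """Step 4: drop duplicate IDs (keeping first-seen order), then cap at panel count."""
    unique = list(dict.fromkeys(matches))  # dicts preserve insertion order
    return unique[:panel_count] if truncate else unique


def needs_fallback(matches: Sequence[str], panel_count: int) -> bool:
    """Step 5: fall back to one-at-a-time when fewer matches than panels were found."""
    return len(matches) < panel_count
```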

Directory Structure

master_adapt_detect/
├── cli.py                      # Main command-line interface
├── hybrid_detector.py          # Hybrid detection engine
├── openai_detector.py          # OpenAI detection engine
├── vector_detector.py          # Vector similarity engine
├── gemini_detector.py          # Gemini detection engine
├── panel_splitter.py           # Traditional panel splitter
├── advanced_splitter.py        # Advanced edge detection splitter
├── simple_splitter.py          # Simple even division splitter
├── cost_calculator.py          # Cost tracking system
├── memory_manager.py           # Memory management
├── logging_config.py           # Logging configuration
├── requirements.txt            # Python dependencies
├── .env                        # API keys (not in git)
├── master_images/              # Master images to detect (41 images)
├── layouts/                    # Layout images to process (299+ images)
├── results/                    # JSON output files
└── embeddings_cache/           # Cached vector embeddings

Output Format

Results are saved as JSON files with detailed metadata.

Example Output

{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
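Because results are plain JSON, they are easy to post-process. The following is a minimal sketch that lists layouts with fewer detected masters than panels, a natural set to re-run with fallback enabled. It relies only on the field names shown in the example above. The helper names are hypothetical, and `results/your_output.json` is a placeholder path.

```python
import json
from typing import Any, Dict, List


def load_results(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def under_detected(results_doc: Dict[str, Any]) -> List[str]:
    """Layout IDs whose detected_master_ids are fewer than their panel_count."""
    return [layout_id
            for layout_id, entry in results_doc["results"].items()
            if len(entry["detected_master_ids"]) < entry.get("panel_count", 0)]


# Usage: under_detected(load_results("results/your_output.json"))
```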

Cost Tracking

Cost tracking monitors OpenAI API usage and provides detailed reports.

Enable Cost Tracking

# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking

# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

Cost Report Output

  • Session summary - Total cost, tokens, API calls
  • Per-layout breakdown - Cost for each layout
  • Operation analysis - Cost by operation type
  • Monthly estimates - Projected monthly/annual costs
  • JSON reports - Detailed cost data in results/

See COST_TRACKING_README.md for complete documentation.

Performance

Hybrid Mode Benefits

  • 97.6% cost reduction vs OpenAI one-at-a-time mode
  • 1 API call per layout for panel analysis
  • Zero API calls for matching (local analysis)
  • Parallel processing for throughput
  • Memory-safe with dynamic adjustment
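The README does not show how the 97.6% figure was derived. One reading reproduces it exactly, though it is an interpretation rather than a documented derivation. With 41 master images, one-at-a-time detection makes 41 API calls per layout while hybrid mode makes 1. If calls cost roughly the same, the saving is 1 - 1/41.

```python
def call_reduction_pct(masters: int, hybrid_calls: int = 1) -> float:
    """Percent fewer API calls than one call per master, assuming equal cost per call."""
    return round(100 * (1 - hybrid_calls / masters), 1)


print(call_reduction_pct(41))  # 97.6
```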

Benchmarks

  • Simple layouts (≤2 panels): ~2-3 seconds per layout
  • Complex layouts (>2 panels): ~5-7 seconds per layout
  • Parallel mode: ~50-100 layouts per minute (system dependent)
  • Memory usage: Dynamic adjustment prevents exhaustion

Advanced Features

Parallel Layout Processing

Process multiple layouts concurrently with coordinated inlier analysis.

python cli.py --all --hybrid --parallel-layouts --layout-workers 4

CEN Refinement

Automatically switches between censored (CEN) and uncensored versions of matched master images.

python cli.py --all --hybrid --cen-refinement

Custom Splitting Parameters

Fine-tune panel splitting behavior.

# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10

# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
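The README does not define `--percentile` or `--min-gap`. The following sketch is for illustration only and makes two assumptions. First, it assumes AdvancedPanelSplitter treats rows whose edge energy (for example, the mean Sobel gradient magnitude per row) falls at or below the given percentile as gutter candidates. Second, it assumes `--min-gap` is the minimum gutter height in pixels. `percentile` and `find_cuts` are hypothetical helpers that take a precomputed per-row profile.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile (the same convention as numpy's default)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty profile")
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def find_cuts(row_energy: Sequence[float], pct: float, min_gap: int) -> List[int]:
    """Cut rows at the centre of each quiet run at least min_gap rows tall.

    Runs touching the top or bottom edge are margins, not gutters, so they are skipped.
    """
    thresh = percentile(row_energy, pct)
    n = len(row_energy)
    cuts: List[int] = []
    start = None
    # An infinite sentinel closes a run that reaches the last row.
    for i, e in enumerate(list(row_energy) + [float("inf")]):
        if e <= thresh:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap and start > 0 and i < n:
                cuts.append((start + i - 1) // 2)
            start = None
    return cuts
```

With the values from the example command, the call would be `find_cuts(profile, pct=15, min_gap=10)`.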

Image Preprocessing

Enhance detection accuracy with preprocessing.

# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale

# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
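The README does not describe how `--enable-greyscale` and `--contrast-factor` are implemented. The following is a minimal sketch on flat pixel lists. It uses the ITU-R BT.601 luma weights that PIL documents for `convert("L")`, and a contrast stretch around the mean grey level, the same idea PIL's `ImageEnhance.Contrast` uses. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def to_grey(pixels: Sequence[RGB]) -> List[int]:
    """BT.601 luma: L = 0.299 R + 0.587 G + 0.114 B."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def enhance_contrast(grey: Sequence[int], factor: float) -> List[int]:
    """Scale each pixel's distance from the mean by `factor`, clipped to 0..255.

    factor > 1 increases contrast, factor < 1 flattens it, and 1.0 is a no-op.
    """
    mean = sum(grey) / len(grey)
    return [max(0, min(255, round(mean + factor * (v - mean)))) for v in grey]
```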

Troubleshooting

Common Issues

"Cost tracking is disabled"

  • Add the --enable-cost-tracking flag to enable cost monitoring

"Memory usage too high"

  • System will auto-adjust workers
  • Reduce --local-workers or --layout-workers manually

"Too many open files"

  • Reduce concurrent workers
  • System will auto-recover and limit workers

"No matches found"

  • Try different detection modes
  • Adjust inlier thresholds
  • Enable fallback mode

Memory Management

The system includes automatic memory management:

  • Monitors RAM and swap usage
  • Dynamically adjusts worker counts
  • Prevents system crashes
  • Logs resource usage
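The following is a minimal sketch of memory-aware worker sizing of the kind memory_manager.py performs. The per-worker memory estimate and the reserve are hypothetical parameters, not values from the project. `available_mb` could come from `psutil.virtual_memory().available / 2**20` if psutil is installed.

```python
import os
from typing import Optional


def safe_worker_count(available_mb: float, per_worker_mb: float,
                      requested: Optional[int] = None,
                      reserve_mb: float = 1024) -> int:
    """Largest worker count that fits in free RAM after keeping a safety reserve.

    The result is never below 1 and never above the CPU count or `requested`
    (for example, a value passed via --local-workers).
    """
    budget = max(0.0, available_mb - reserve_mb)
    by_memory = int(budget // per_worker_mb)
    cap = os.cpu_count() or 1
    if requested is not None:
        cap = min(cap, requested)
    return max(1, min(by_memory, cap))
```

Re-evaluating this between batches lets the worker count shrink when memory gets tight instead of letting the system start swapping.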

Logging

All processing is logged to both terminal and file:

  • Log files: master_adapt_detect_TIMESTAMP.log
  • Includes system diagnostics
  • Crash tracking with full traceback
  • Resource usage at crash time
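The following is a minimal sketch of the dual terminal-and-file setup using only the standard `logging` module. It also installs `sys.excepthook` so uncaught exceptions are logged with their full traceback. The `%Y%m%d_%H%M%S` timestamp format and the `setup_logging` name are assumptions, not taken from logging_config.py.

```python
import logging
import sys
import time
from pathlib import Path


def setup_logging(log_dir: Path = Path(".")) -> Path:
    """Log to both the terminal and a timestamped file; return the file path."""
    stamp = time.strftime("%Y%m%d_%H%M%S")  # assumed TIMESTAMP format
    log_path = log_dir / f"master_adapt_detect_{stamp}.log"
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path, encoding="utf-8")):
        handler.setFormatter(formatter)
        root.addHandler(handler)

    def log_uncaught(exc_type, exc_value, exc_tb):
        # Record crashes with the full traceback before the interpreter exits.
        logging.getLogger("crash").critical(
            "Uncaught exception", exc_info=(exc_type, exc_value, exc_tb))

    sys.excepthook = log_uncaught
    return log_path
```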

Development

Running Tests

# Test hybrid mode
python test_hybrid.py

# Test cost tracking
python test_cost_calculator.py

# Test panel splitting
python test_split_mode.py

Adding New Detection Modes

  1. Create a new detector class that inherits from the base detector class
  2. Implement required methods:
    • detect_images_in_layout()
    • process_all_layouts()
  3. Add CLI integration in cli.py
  4. Update documentation
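The following sketch shows what steps 1 and 2 could look like with Python's `abc` module. `BaseDetector`, the method signatures, and `EchoDetector` are hypothetical. The repository's actual base class may differ in name and parameters. Only the two method names come from the list above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseDetector(ABC):
    """Hypothetical base class; subclasses must implement both methods."""

    @abstractmethod
    def detect_images_in_layout(self, layout_path: str) -> Dict[str, Any]:
        """Return one result entry (e.g. detected_master_ids) for a single layout."""

    @abstractmethod
    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, Any]:
        """Run detection over many layouts and return the full results document."""


class EchoDetector(BaseDetector):
    """Toy detector that detects nothing, showing the shape a new mode takes."""

    def detect_images_in_layout(self, layout_path: str) -> Dict[str, Any]:
        return {"layout_filename": layout_path, "detected_master_ids": []}

    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, Any]:
        # Key results by filename stem, matching the "results" layout in Output Format.
        results = {p.rsplit(".", 1)[0]: self.detect_images_in_layout(p) for p in layout_paths}
        return {"metadata": {"total_layouts_processed": len(results)}, "results": results}
```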

OpenAI Pricing (2025)

  • Input tokens: $2.00 per million
  • Cached input: $0.50 per million
  • Output tokens: $8.00 per million
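The following sketch turns the rates above into a per-call cost and a monthly estimate. It assumes OpenAI's usage convention, where cached tokens are a subset of the reported input tokens and reasoning tokens are billed as output. The `Usage` class and the function names are hypothetical, not cost_calculator.py's API.

```python
from dataclasses import dataclass

# USD per million tokens, from the pricing table above.
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.50
OUTPUT_PER_M = 8.00


@dataclass
class Usage:
    input_tokens: int         # all prompt tokens, including cached ones
    cached_input_tokens: int  # the part of input_tokens served from cache
    output_tokens: int        # completion tokens, including reasoning tokens


def call_cost(u: Usage) -> float:
    """USD cost of one API call: uncached, cached and output tokens at their rates."""
    uncached = u.input_tokens - u.cached_input_tokens
    return (uncached * INPUT_PER_M
            + u.cached_input_tokens * CACHED_INPUT_PER_M
            + u.output_tokens * OUTPUT_PER_M) / 1_000_000


def monthly_estimate(cost_per_layout: float, layouts_per_month: int) -> float:
    """Projected monthly spend, as --cost-estimate N is described (N layouts)."""
    return cost_per_layout * layouts_per_month
```

For example, a call with 10,000 input tokens (4,000 of them cached) and 2,000 output tokens costs $0.03, so 300 such layouts a month come to about $9.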

Hybrid mode achieves significant cost savings by minimizing API calls.

License

[License information]

Credits

Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.