michael 380020b8a2 revised documentation, added technical overview

2025-10-01 16:02:40 -05:00


Master Adapt Detect

A sophisticated AI-powered image detection system that identifies master images within multi-panel layout images using multiple detection strategies, with advanced panel splitting and cost optimization features.

Overview

This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:

Hybrid Mode (Recommended) - Combines OpenAI o3 for panel analysis with local computer vision
OpenAI Mode - Full AI-powered detection using OpenAI o3-mini
Vector Mode - Google Vertex AI multimodal embeddings for similarity search
Gemini Mode - Google Gemini 2.5 Pro for visual analysis

Key Features

Detection Capabilities

Multi-strategy detection - Choose from 4 different detection engines
Panel counting - Automatic detection of number of panels in layouts
Censorship detection - Identifies censored vs uncensored content with CEN refinement
Smart matching - Handles cropped, scaled, rotated, and transformed images
Confidence scoring - Provides match confidence based on panel count and detected matches

Hybrid Mode (Primary Feature)

Cost optimization - 97.6% reduction in API costs vs one-at-a-time detection
Intelligent routing - Uses local analysis for simple layouts (≤2 panels) and the split method for complex layouts (>2 panels)
Panel splitting - Three splitting strategies: traditional, advanced edge detection, simple division
Local inlier analysis - OpenCV AKAZE features with multiprocessing for fast matching
Vector similarity - Optional Google Vertex AI embeddings for semantic matching
Fallback support - Automatic fallback to OpenAI one-at-a-time when needed

Processing Options

Parallel processing - Concurrent layout processing with serial inlier analysis coordination
Memory management - Dynamic worker adjustment based on system resources
Cost tracking - Comprehensive OpenAI API usage and cost monitoring
Batch processing - Process hundreds of layouts efficiently
Progress tracking - Real-time progress updates with ETA

Installation

Prerequisites

Python 3.8+
OpenCV
Google Cloud credentials (for Vector mode)
OpenAI API key (for OpenAI/Hybrid modes)
Google AI API key (for Gemini mode)

Setup

# Clone the repository
git clone <repository-url>
cd master_adapt_detect

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
#   OPENAI_API_KEY=your_openai_key
#   GOOGLE_API_KEY=your_google_ai_key
#   GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
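
As a quick pre-flight check, the sketch below maps each detection mode to the credentials listed under Prerequisites and reports which variables are missing. The REQUIRED_KEYS table and the missing_keys helper are illustrative and not part of the project. It assumes the variables are already exported (or loaded from .env) and that Hybrid mode needs only the OpenAI key when vector similarity is not enabled.

```python
import os

# Credentials per mode, taken from the Prerequisites list above.
# The mapping itself is an illustrative assumption, not project code.
REQUIRED_KEYS = {
    "hybrid": ["OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vector": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "gemini": ["GOOGLE_API_KEY"],
}


def missing_keys(mode, environ=None):
    """Return the variables required for `mode` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED_KEYS:
        missing = missing_keys(mode)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{mode}: {status}")
```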

Usage

Command Line Interface

The main entry point is cli.py, which provides a CLI for all detection modes.

# Basic usage: run hybrid mode on a single test layout
python cli.py --test --hybrid

# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid

# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts

# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time

# Vector mode with similarity search
python cli.py --all --vector

# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

Detection Modes

Hybrid Mode (Recommended)

Best balance of speed, cost, and accuracy.

# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2

# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple

# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced

# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode

# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time

OpenAI Mode

Full AI-powered detection with optional refinement.

# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai

# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time

# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement

Vector Mode

Semantic similarity using embeddings.

# Process with vector embeddings
python cli.py --all --vector

# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
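
To make the similarity threshold concrete, here is a minimal sketch of cosine-similarity matching in plain Python. It is not the code in vector_detector.py. The function names and the dictionary of master vectors are hypothetical, and real embeddings would come from Vertex AI rather than hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def matches_above_threshold(layout_vec, master_vecs, threshold=0.8):
    """Return (master_id, score) pairs scoring >= threshold, best first."""
    scored = [
        (master_id, cosine_similarity(layout_vec, vec))
        for master_id, vec in master_vecs.items()
    ]
    kept = [(master_id, score) for master_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold trades recall for precision: fewer masters pass, but the ones that do are closer to the layout in embedding space.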

Gemini Mode

Google Gemini 2.5 Pro detection.

# Standard Gemini detection
python cli.py --limit 10 --gemini

Key Options

Detection Mode:

--hybrid - Hybrid detection mode (default)
--openai - OpenAI detection mode
--vector - Vector similarity mode
--gemini - Gemini detection mode

Processing:

--test - Test with 1 layout
--limit N - Process first N layouts
--all - Process all layouts
--specific-file FILE - Process specific file

Hybrid Options:

--panel-threshold N - Panel threshold for routing (default: 2)
--split-simple - Use simple even division splitting
--split-advanced - Use advanced edge detection splitting
--vector-mode - Use vector similarity instead of inlier analysis
--fallback-one-at-a-time - Enable OpenAI fallback
--parallel-layouts - Enable parallel layout processing
--no-truncation - Disable match truncation to panel count

Cost Tracking:

--enable-cost-tracking - Enable cost tracking (disabled by default)
--cost-report - Generate detailed cost report
--cost-estimate N - Estimate monthly cost for N layouts

Worker Configuration:

--openai-workers N - OpenAI worker count (default: auto)
--local-workers N - Local analysis workers (default: auto)
--layout-workers N - Parallel layout workers (default: auto)

Other:

--output NAME - Custom output filename
--help - Show all options

Architecture

Core Components

Detection Engines

HybridImageDetector (hybrid_detector.py)
- Main hybrid detection implementation
- Routes layouts based on panel count
- Integrates OpenAI, local analysis, and splitting
- Handles parallel processing coordination
OpenAIImageDetector (openai_detector.py)
- OpenAI o3-mini integration
- Panel counting and censorship detection
- One-at-a-time and batch detection modes
- CEN refinement for censored content
VectorDetector (vector_detector.py)
- Google Vertex AI multimodal embeddings
- Cosine similarity matching
- Embedding caching for performance
GeminiDetector (gemini_detector.py)
- Google Gemini 2.5 Pro integration
- Visual reasoning and analysis

Panel Splitting

PanelSplitter (panel_splitter.py)
- Multi-method panel splitting
- Optimized Canny edge detection
- Hough line transform for separators
- Tuned for 14-panel detection
AdvancedPanelSplitter (advanced_splitter.py)
- Edge detection and gutter analysis
- Sobel gradient detection
- Configurable percentile thresholds
SimplePanelSplitter (simple_splitter.py)
- Simple even division
- Fast horizontal splitting
- Grid layout support
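
The even-division idea behind the simple splitter can be sketched as follows. This is an illustration under assumptions, not the code in simple_splitter.py: even_split_boxes is a hypothetical helper, and it assumes that "horizontal splitting" means cutting the image into stacked rows, with an optional column count for grid layouts.

```python
def even_split_boxes(width, height, rows, cols=1):
    """Divide a width x height image into rows x cols equal boxes.

    Returns (left, top, right, bottom) tuples in row-major order. Integer
    boundaries are computed from the running index, so the boxes tile the
    image exactly even when the size is not divisible by the count.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    boxes = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes
```

The tuples use the same (left, upper, right, lower) order that Pillow's Image.crop expects, so each box can be passed to it directly.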

Supporting Systems

Cost Calculator (cost_calculator.py)
- Tracks OpenAI API usage
- Per-layout and session cost tracking
- Monthly cost estimation
- Detailed JSON reports
Memory Manager (memory_manager.py)
- Prevents memory exhaustion
- Dynamic worker adjustment
- System resource monitoring
Logging Config (logging_config.py)
- Dual output (terminal + file)
- Crash tracking
- System diagnostics
InlierAnalysisCoordinator (in hybrid_detector.py)
- Serial execution of inlier analysis
- Task queue management
- Prevents system overload
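
One way to get serial execution with a task queue, as described for the InlierAnalysisCoordinator, is a single-worker executor. The class below is a hypothetical sketch of that pattern and makes no claim about how hybrid_detector.py implements it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialCoordinator:
    """Run submitted tasks one at a time, whichever thread submits them."""

    def __init__(self):
        # A single worker thread gives a FIFO queue with serial execution.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def run(self, fn, *args, **kwargs):
        """Queue fn and block the caller until its result is ready."""
        return self._executor.submit(fn, *args, **kwargs).result()

    def shutdown(self):
        """Wait for queued tasks to finish and release the worker thread."""
        self._executor.shutdown(wait=True)
```

Layout workers can call run() concurrently. Only one heavy task executes at any moment, which caps peak CPU and memory use at the cost of some waiting.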

Workflow

Hybrid Mode Workflow

OpenAI Analysis (1 API call)
- Count panels in layout
- Detect censorship status
- Consolidated analysis
Detection Routing
- ≤ panel_threshold: Direct local/vector analysis
- > panel_threshold: Split + local/vector analysis
Local Analysis (no API calls)
- OpenCV AKAZE feature detection
- Multiprocessing for speed
- RANSAC homography estimation
- Inlier-based confidence scoring
Post-Processing
- CEN refinement (if enabled)
- Deduplication
- Truncation to panel count
- Confidence scoring
Optional Fallback (if enabled)
- Triggers when matches < panels
- OpenAI one-at-a-time detection
- Additional API calls only when needed
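
The routing, post-processing, and fallback rules above can be sketched as three small functions. The names, the (master_id, score) pair format, and the "keep the best score per master" deduplication rule are assumptions made for illustration; the actual detector may differ.

```python
def choose_route(panel_count, panel_threshold=2):
    """Pick the analysis path for a layout from its panel count."""
    return "direct" if panel_count <= panel_threshold else "split"


def post_process(matches, panel_count, truncate=True):
    """Deduplicate scored matches and keep at most panel_count of them.

    matches is a list of (master_id, score) pairs; higher scores are better.
    """
    best = {}
    for master_id, score in matches:
        if master_id not in best or score > best[master_id]:
            best[master_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:panel_count] if truncate else ranked


def needs_fallback(matches, panel_count, fallback_enabled):
    """Fallback triggers only when enabled and matches < panels."""
    return bool(fallback_enabled) and len(matches) < panel_count
```

Passing truncate=False corresponds to the behavior described for --no-truncation: every deduplicated match is kept, even if there are more matches than panels.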

Directory Structure

master_adapt_detect/
├── cli.py                      # Main command-line interface
├── hybrid_detector.py          # Hybrid detection engine
├── openai_detector.py          # OpenAI detection engine
├── vector_detector.py          # Vector similarity engine
├── gemini_detector.py          # Gemini detection engine
├── panel_splitter.py           # Traditional panel splitter
├── advanced_splitter.py        # Advanced edge detection splitter
├── simple_splitter.py          # Simple even division splitter
├── cost_calculator.py          # Cost tracking system
├── memory_manager.py           # Memory management
├── logging_config.py           # Logging configuration
├── requirements.txt            # Python dependencies
├── .env                        # API keys (not in git)
├── master_images/              # Master images to detect (41 images)
├── layouts/                    # Layout images to process (299+ images)
├── results/                    # JSON output files
└── embeddings_cache/           # Cached vector embeddings

Output Format

Results are saved as JSON files with detailed metadata.

Example Output

{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
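
A results file with this shape can be inspected with the standard json module. The sketch below lists layouts where fewer masters were detected than panels counted, which are natural candidates for review. The helper names are hypothetical; only the field names come from the example above.

```python
import json


def load_results(path):
    """Load one results file, for example one written to results/."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def underfilled_layouts(results_doc):
    """Return {layout_id: (found, panel_count)} where found < panel_count."""
    flagged = {}
    for layout_id, entry in results_doc["results"].items():
        found = len(entry.get("detected_master_ids", []))
        panels = entry.get("panel_count", 0)
        if found < panels:
            flagged[layout_id] = (found, panels)
    return flagged
```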

Cost Tracking

Cost tracking monitors OpenAI API usage and provides detailed reports.

Enable Cost Tracking

# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking

# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

Cost Report Output

Session summary - Total cost, tokens, API calls
Per-layout breakdown - Cost for each layout
Operation analysis - Cost by operation type
Monthly estimates - Projected monthly/annual costs
JSON reports - Detailed cost data in results/

See COST_TRACKING_README.md for complete documentation.

Performance

Hybrid Mode Benefits

97.6% cost reduction vs OpenAI one-at-a-time mode
1 API call per layout for panel analysis
Zero API calls for matching (local analysis)
Parallel processing for throughput
Memory-safe with dynamic adjustment
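
The 97.6% figure is consistent with simple call counting, although this README does not say how it was derived. Assuming one-at-a-time mode makes one API call per master (41 images in master_images/), hybrid mode makes one call per layout, and every call costs about the same, the reduction works out as follows:

```python
def call_reduction(masters, hybrid_calls_per_layout=1):
    """Fractional drop in API calls per layout versus one call per master."""
    return 1 - hybrid_calls_per_layout / masters


print(f"{call_reduction(41):.1%}")  # prints 97.6%
```

Under the same assumptions the saving grows with the number of masters and shrinks whenever the optional fallback adds calls.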

Benchmarks

Simple layouts (≤2 panels): ~2-3 seconds per layout
Complex layouts (>2 panels): ~5-7 seconds per layout
Parallel mode: ~50-100 layouts per minute (system dependent)
Memory usage: Dynamic adjustment prevents exhaustion

Advanced Features

Parallel Layout Processing

Process multiple layouts concurrently with coordinated inlier analysis.

python cli.py --all --hybrid --parallel-layouts --layout-workers 4

CEN Refinement

Automatically switches between the censored (CEN) and uncensored versions of a master image.

python cli.py --all --hybrid --cen-refinement

Custom Splitting Parameters

Fine-tune panel splitting behavior.

# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10

# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5

Image Preprocessing

Enhance detection accuracy with preprocessing.

# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale

# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5

Troubleshooting

Common Issues

"Cost tracking is disabled"

Add the --enable-cost-tracking flag to enable cost monitoring

"Memory usage too high"

System will auto-adjust workers
Reduce --local-workers or --layout-workers manually

"Too many open files"

Reduce concurrent workers
System will auto-recover and limit workers

"No matches found"

Try different detection modes
Adjust inlier thresholds
Enable fallback mode

Memory Management

The system includes automatic memory management:

Monitors RAM and swap usage
Dynamically adjusts worker counts
Prevents system crashes
Logs resource usage
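
Dynamic worker adjustment can be pictured as a small policy function. The thresholds, limits, and the adjust_workers name below are invented for illustration and are not the values used by memory_manager.py. The memory percentage could come from a call such as psutil.virtual_memory().percent if psutil is installed.

```python
def adjust_workers(current, memory_percent, high=85.0, low=60.0, minimum=1, maximum=8):
    """Return a new worker count for the observed memory usage (0-100).

    Halve the pool at or above `high`, add one worker at or below `low`,
    otherwise keep the current count. All defaults are illustrative.
    """
    if memory_percent >= high:
        return max(minimum, current // 2)
    if memory_percent <= low:
        return min(maximum, current + 1)
    return current
```

Halving on pressure and growing by one when idle backs off quickly and recovers slowly, which helps avoid oscillating around the limit.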

Logging

All processing is logged to both terminal and file:

Log files: master_adapt_detect_TIMESTAMP.log
Includes system diagnostics
Crash tracking with full traceback
Resource usage at crash time
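
Dual terminal and file output of this kind can be set up with the standard logging module. The sketch below is an assumption about the general approach, not the contents of logging_config.py. The setup_logging name and the timestamp format are hypothetical.

```python
import logging
import sys
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir=".", name="master_adapt_detect"):
    """Log to the terminal and to <name>_<timestamp>.log; return (logger, path)."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"{name}_{stamp}.log"
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_path, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_path
```

Calling logger.exception("...") inside an except block writes the message together with the full traceback to both outputs, which is one way to get the crash tracking described above.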

Development

Running Tests

# Test hybrid mode
python test_hybrid.py

# Test cost tracking
python test_cost_calculator.py

# Test panel splitting
python test_split_mode.py

Adding New Detection Modes

Create a new detector class that inherits from the base class
Implement required methods:
- detect_images_in_layout()
- process_all_layouts()
Add CLI integration in cli.py
Update documentation
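
The steps above might look like the following skeleton. Only the two method names come from this README. The BaseDetector name, the argument lists, the return types, and the toy FilenameDetector are hypothetical, so check the existing detectors for the real signatures.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseDetector(ABC):
    """Hypothetical base class; the real one may differ in name and signature."""

    @abstractmethod
    def detect_images_in_layout(self, layout_path, master_paths):
        """Return the IDs of the masters found in one layout."""

    @abstractmethod
    def process_all_layouts(self, layout_paths, master_paths):
        """Return {layout_filename: [master_id, ...]} for every layout."""


class FilenameDetector(BaseDetector):
    """Toy detector: reports masters whose file stem appears in the layout name."""

    def detect_images_in_layout(self, layout_path, master_paths):
        layout_stem = Path(layout_path).stem
        return [Path(m).stem for m in master_paths if Path(m).stem in layout_stem]

    def process_all_layouts(self, layout_paths, master_paths):
        return {
            Path(p).name: self.detect_images_in_layout(p, master_paths)
            for p in layout_paths
        }
```

Declaring both methods abstract means that a detector missing either one fails at instantiation rather than halfway through a batch run.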

OpenAI Pricing (2025)

Input tokens: $2.00 per million
Cached input: $0.50 per million
Output tokens: $8.00 per million

Hybrid mode achieves significant cost savings by minimizing API calls.
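
As a worked example of these rates, the sketch below estimates the cost of a single API call from its token counts. The call_cost helper is hypothetical and is not the logic in cost_calculator.py. It assumes cached input tokens are a subset of the input tokens and are billed at the cached rate.

```python
# USD per million tokens, copied from the pricing list above.
PRICE_PER_MILLION = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def call_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost in USD of one API call.

    cached_input_tokens is the part of input_tokens billed at the cached rate.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    fresh = input_tokens - cached_input_tokens
    total = (
        fresh * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000
```

For instance, a call with 10,000 input tokens and 1,000 output tokens comes to $0.028 at these rates. If 4,000 of those input tokens are cached, it drops to $0.022.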

License

[License information]

Credits

Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.
