
# Master Adapt Detect

An image detection system that identifies which master images appear in multi-panel layout images. It offers four detection strategies, three panel-splitting methods, and optional OpenAI cost tracking.

## Overview

This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:

1. **Hybrid Mode** (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
2. **OpenAI Mode** - Full AI-powered detection using OpenAI O3 mini
3. **Vector Mode** - Google Vertex AI multimodal embeddings for similarity search
4. **Gemini Mode** - Google Gemini 2.5 Pro for visual analysis

## Key Features

### Detection Capabilities
- **Multi-strategy detection** - Choose from 4 different detection engines
- **Panel counting** - Automatic detection of number of panels in layouts
- **Censorship detection** - Identifies censored vs uncensored content with CEN refinement
- **Smart matching** - Handles cropped, scaled, rotated, and transformed images
- **Confidence scoring** - Provides match confidence based on panel count and detected matches
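
The exact confidence formula is not documented here. A minimal sketch, assuming the score is the share of panels covered by a distinct match, expressed as a percentage (this reproduces the `confidence_score` of `100.0` for 2 panels and 2 matches shown under Output Format). The function name `confidence_score` is hypothetical:

```python
from typing import Sequence


def confidence_score(panel_count: int, matched_ids: Sequence[str]) -> float:
    """Percentage of panels covered by a distinct match (assumed formula)."""
    if panel_count <= 0:
        return 0.0
    distinct = len(set(matched_ids))
    # Cap at the panel count so surplus matches cannot push the score past 100.
    return round(min(distinct, panel_count) / panel_count * 100.0, 1)


print(confidence_score(2, ["1011A_1011_05", "1011A_1011_06"]))  # 100.0
print(confidence_score(4, ["1011A_1011_05"]))  # 25.0
```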

### Hybrid Mode (Primary Feature)
- **Cost optimization** - 97.6% reduction in API costs vs one-at-a-time detection
- **Intelligent routing** - Uses local analysis for simple layouts (≤2 panels), split method for complex
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, simple division
- **Local inlier analysis** - OpenCV AKAZE features with multiprocessing for fast matching
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
- **Fallback support** - Optional fallback to OpenAI one-at-a-time detection when fewer matches than panels are found (`--fallback-one-at-a-time`)

### Processing Options
- **Parallel processing** - Concurrent layout processing with serial inlier analysis coordination
- **Memory management** - Dynamic worker adjustment based on system resources
- **Cost tracking** - Comprehensive OpenAI API usage and cost monitoring
- **Batch processing** - Process hundreds of layouts efficiently
- **Progress tracking** - Real-time progress updates with ETA

## Installation

### Prerequisites
- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode)
- OpenAI API key (for OpenAI/Hybrid modes)
- Google AI API key (for Gemini mode)

### Setup

```bash
# Clone the repository
git clone <repository-url>
cd master_adapt_detect

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
#   OPENAI_API_KEY=your_openai_key
#   GOOGLE_API_KEY=your_google_ai_key
#   GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```
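
A quick preflight check can catch a missing key before a long run starts. The sketch below is not part of `cli.py`: the mapping is an assumption derived from the Prerequisites list, and `REQUIRED_ENV` and `missing_env` are hypothetical names. It reads the process environment, so it assumes the values from `.env` have already been exported or loaded.

```python
import os
from typing import Dict, List, Mapping

# Assumed mapping of detection mode to required variables,
# derived from the Prerequisites list (not read from the project code).
REQUIRED_ENV: Dict[str, List[str]] = {
    "hybrid": ["OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "vector": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "gemini": ["GOOGLE_API_KEY"],
}


def missing_env(mode: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables required by `mode` that are unset or empty."""
    return [name for name in REQUIRED_ENV[mode] if not env.get(name)]


print(missing_env("gemini", {"GOOGLE_API_KEY": "your_google_ai_key"}))  # []
print(missing_env("hybrid", {}))  # ['OPENAI_API_KEY']
```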

## Usage

### Command Line Interface

The main entry point is `cli.py`, which provides a CLI for all detection modes.

```bash
# Basic usage - hybrid mode with test
python cli.py --test --hybrid

# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid

# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts

# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time

# Vector mode with similarity search
python cli.py --all --vector

# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
```

### Detection Modes

#### Hybrid Mode (Recommended)
Best balance of speed, cost, and accuracy.

```bash
# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2

# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple

# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced

# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode

# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time
```

#### OpenAI Mode
Full AI-powered detection with optional refinement.

```bash
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai

# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time

# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
```

#### Vector Mode
Semantic similarity using embeddings.

```bash
# Process with vector embeddings
python cli.py --all --vector

# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
```
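
To illustrate what the similarity threshold does, here is a minimal pure-Python sketch of cosine similarity matching. It uses toy 3-dimensional vectors in place of real Vertex AI embeddings, and `match_masters` is a hypothetical helper. Whether the real detector compares with `>=` or `>` is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_masters(
    layout_vec: Sequence[float],
    master_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Masters whose similarity reaches the threshold, best match first."""
    scored = [(mid, cosine_similarity(layout_vec, vec)) for mid, vec in master_vecs.items()]
    hits = [(mid, score) for mid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


masters = {
    "1011A_1011_05": [1.0, 0.0, 0.0],  # same direction as the layout vector
    "1011A_1011_06": [1.0, 1.0, 0.0],  # 45 degrees away, similarity about 0.707
}
print(match_masters([1.0, 0.0, 0.0], masters, threshold=0.8))
# [('1011A_1011_05', 1.0)]
```

Raising the threshold trades recall for precision: at 0.8 the second master is rejected, while at 0.7 it would be accepted.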

#### Gemini Mode
Google Gemini 2.5 Pro detection.

```bash
# Standard Gemini detection
python cli.py --limit 10 --gemini
```

### Key Options

**Detection Mode:**
- `--hybrid` - Hybrid detection mode (default)
- `--openai` - OpenAI detection mode
- `--vector` - Vector similarity mode
- `--gemini` - Gemini detection mode

**Processing:**
- `--test` - Test with 1 layout
- `--limit N` - Process first N layouts
- `--all` - Process all layouts
- `--specific-file FILE` - Process specific file

**Hybrid Options:**
- `--panel-threshold N` - Panel threshold for routing (default: 2)
- `--split-simple` - Use simple even division splitting
- `--split-advanced` - Use advanced edge detection splitting
- `--vector-mode` - Use vector similarity instead of inlier analysis
- `--fallback-one-at-a-time` - Enable OpenAI fallback
- `--parallel-layouts` - Enable parallel layout processing
- `--no-truncation` - Disable match truncation to panel count

**Cost Tracking:**
- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
- `--cost-report` - Generate detailed cost report
- `--cost-estimate N` - Estimate monthly cost for N layouts

**Worker Configuration:**
- `--openai-workers N` - OpenAI worker count (default: auto)
- `--local-workers N` - Local analysis workers (default: auto)
- `--layout-workers N` - Parallel layout workers (default: auto)

**Other:**
- `--output NAME` - Custom output filename
- `--help` - Show all options

## Architecture

### Core Components

#### Detection Engines

1. **HybridImageDetector** (`hybrid_detector.py`)
   - Main hybrid detection implementation
   - Routes layouts based on panel count
   - Integrates OpenAI, local analysis, and splitting
   - Handles parallel processing coordination

2. **OpenAIImageDetector** (`openai_detector.py`)
   - OpenAI O3 mini integration
   - Panel counting and censorship detection
   - One-at-a-time and batch detection modes
   - CEN refinement for censored content

3. **VectorDetector** (`vector_detector.py`)
   - Google Vertex AI multimodal embeddings
   - Cosine similarity matching
   - Embedding caching for performance

4. **GeminiDetector** (`gemini_detector.py`)
   - Google Gemini 2.5 Pro integration
   - Visual reasoning and analysis

#### Panel Splitting

1. **PanelSplitter** (`panel_splitter.py`)
   - Multi-method panel splitting
   - Optimized Canny edge detection
   - Hough line transform for separators
   - Tuned for 14-panel detection

2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
   - Edge detection and gutter analysis
   - Sobel gradient detection
   - Configurable percentile thresholds

3. **SimplePanelSplitter** (`simple_splitter.py`)
   - Simple even division
   - Fast horizontal splitting
   - Grid layout support
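
As an illustration of the simplest strategy, even division can be sketched as pure box arithmetic. This is not the code in `simple_splitter.py`: `even_split_boxes` and its `rows`/`cols` parameters are hypothetical, and the real splitter may choose the split direction differently.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def even_split_boxes(width: int, height: int, rows: int, cols: int = 1) -> List[Box]:
    """Divide a width x height image into a rows x cols grid of equal cells."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be >= 1")
    # Integer boundaries; any remainder pixels are spread across the cells.
    xs = [c * width // cols for c in range(cols + 1)]
    ys = [r * height // rows for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]


print(even_split_boxes(1000, 900, rows=3))
# [(0, 0, 1000, 300), (0, 300, 1000, 600), (0, 600, 1000, 900)]
```

The boxes use the `(left, upper, right, lower)` order accepted by Pillow's `Image.crop`, so each one can be cropped and matched independently.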

#### Supporting Systems

1. **Cost Calculator** (`cost_calculator.py`)
   - Tracks OpenAI API usage
   - Per-layout and session cost tracking
   - Monthly cost estimation
   - Detailed JSON reports

2. **Memory Manager** (`memory_manager.py`)
   - Prevents memory exhaustion
   - Dynamic worker adjustment
   - System resource monitoring

3. **Logging Config** (`logging_config.py`)
   - Dual output (terminal + file)
   - Crash tracking
   - System diagnostics

4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
   - Serial execution of inlier analysis
   - Task queue management
   - Prevents system overload
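
The coordination idea (many layout workers, one inlier analysis at a time) can be sketched with a single-worker executor. This is an assumption about the design, not the actual `InlierAnalysisCoordinator`; `SerialCoordinator` and `fake_inlier_analysis` are hypothetical names.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class SerialCoordinator:
    """Runs submitted tasks one at a time, in submission order."""

    def __init__(self) -> None:
        # A single worker thread guarantees at most one task runs at a time;
        # the executor's internal queue holds the pending tasks.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        return self._executor.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


def demo(num_layouts: int = 4) -> int:
    """Submit from several layout threads; return the peak task concurrency."""
    coordinator = SerialCoordinator()
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def fake_inlier_analysis(layout_id: int) -> int:
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for the expensive matching step
        with lock:
            state["running"] -= 1
        return layout_id

    def layout_worker(layout_id: int) -> None:
        # Each layout thread blocks until its analysis has had its turn.
        coordinator.submit(fake_inlier_analysis, layout_id).result()

    threads = [threading.Thread(target=layout_worker, args=(i,)) for i in range(num_layouts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    coordinator.shutdown()
    return state["peak"]


print(demo())  # 1
```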

### Workflow

#### Hybrid Mode Workflow

1. **OpenAI Analysis** (1 API call)
   - Count panels in layout
   - Detect censorship status
   - Consolidated analysis

2. **Detection Routing**
   - Panel count ≤ `panel_threshold`: Direct local/vector analysis
   - Panel count > `panel_threshold`: Split + local/vector analysis

3. **Local Analysis** (no API calls)
   - OpenCV AKAZE feature detection
   - Multiprocessing for speed
   - RANSAC homography estimation
   - Inlier-based confidence scoring

4. **Post-Processing**
   - CEN refinement (if enabled)
   - Deduplication
   - Truncation to panel count
   - Confidence scoring

5. **Optional Fallback** (if enabled)
   - Triggers when matches < panels
   - OpenAI one-at-a-time detection
   - Additional API calls only when needed
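
The decision points in this workflow (routing, post-processing, and the fallback trigger) can be sketched as three small functions. The function names are hypothetical, and the sketch assumes matches arrive sorted best first, so truncation keeps the strongest ones.

```python
from typing import List


def choose_route(panel_count: int, panel_threshold: int = 2) -> str:
    """Step 2: layouts at or below the threshold skip panel splitting."""
    return "direct" if panel_count <= panel_threshold else "split"


def post_process(matches: List[str], panel_count: int, truncate: bool = True) -> List[str]:
    """Step 4: deduplicate, then optionally truncate to the panel count."""
    # dict.fromkeys keeps the first occurrence of each ID in order.
    unique = list(dict.fromkeys(matches))
    # truncate=False mirrors the --no-truncation flag.
    return unique[:panel_count] if truncate else unique


def needs_fallback(matches: List[str], panel_count: int, fallback_enabled: bool) -> bool:
    """Step 5: fall back only when enabled and matches < panels."""
    return fallback_enabled and len(matches) < panel_count


print(choose_route(2), choose_route(5))  # direct split
print(post_process(["a", "b", "a", "c"], panel_count=2))  # ['a', 'b']
print(needs_fallback(["a"], panel_count=3, fallback_enabled=True))  # True
```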

## Directory Structure

```
master_adapt_detect/
├── cli.py                      # Main command-line interface
├── hybrid_detector.py          # Hybrid detection engine
├── openai_detector.py          # OpenAI detection engine
├── vector_detector.py          # Vector similarity engine
├── gemini_detector.py          # Gemini detection engine
├── panel_splitter.py           # Traditional panel splitter
├── advanced_splitter.py        # Advanced edge detection splitter
├── simple_splitter.py          # Simple even division splitter
├── cost_calculator.py          # Cost tracking system
├── memory_manager.py           # Memory management
├── logging_config.py           # Logging configuration
├── requirements.txt            # Python dependencies
├── .env                        # API keys (not in git)
├── master_images/              # Master images to detect (41 images)
├── layouts/                    # Layout images to process (299+ images)
├── results/                    # JSON output files
└── embeddings_cache/           # Cached vector embeddings
```

## Output Format

Results are saved as JSON files with detailed metadata.

### Example Output

```json
{
  "metadata": {
    "total_layouts_processed": 10,
    "total_master_images": 41,
    "provider": "hybrid",
    "model": "openai_o3_plus_local_analysis",
    "panel_threshold": 2,
    "processing_mode": "hybrid"
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0,
      "panel_analysis": {
        "panel_count": 2,
        "confidence": "high"
      },
      "censorship_analysis": {
        "is_censored": false,
        "confidence": "high"
      }
    }
  }
}
```
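
A results file in this shape can be summarized with a few lines of standard-library Python. The sketch below parses a trimmed copy of the example above. `summarize` and `incomplete_layouts` are hypothetical helpers, not part of the project.

```python
import json
from typing import List

SAMPLE = """
{
  "metadata": {"total_layouts_processed": 1, "provider": "hybrid"},
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
      "detection_method": "local_inlier_analysis",
      "panel_count": 2,
      "confidence_score": 100.0
    }
  }
}
"""


def summarize(report: dict) -> List[str]:
    """One line per layout: matches versus panels, method, and confidence."""
    lines = []
    for layout_id, entry in sorted(report["results"].items()):
        found = len(entry["detected_master_ids"])
        lines.append(
            f"{layout_id}: {found}/{entry['panel_count']} panels matched "
            f"via {entry['detection_method']} "
            f"(confidence {entry['confidence_score']:.1f})"
        )
    return lines


def incomplete_layouts(report: dict) -> List[str]:
    """Layout IDs with fewer detected masters than panels."""
    return [
        layout_id
        for layout_id, entry in report["results"].items()
        if len(entry["detected_master_ids"]) < entry["panel_count"]
    ]


# For a real run, replace json.loads(SAMPLE) with json.load() on a file in results/.
report = json.loads(SAMPLE)
print("\n".join(summarize(report)))
# 6814786: 2/2 panels matched via local_inlier_analysis (confidence 100.0)
print(incomplete_layouts(report))  # []
```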

## Cost Tracking

Cost tracking monitors OpenAI API usage and provides detailed reports.

### Enable Cost Tracking

```bash
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking

# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report

# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```

### Cost Report Output

- **Session summary** - Total cost, tokens, API calls
- **Per-layout breakdown** - Cost for each layout
- **Operation analysis** - Cost by operation type
- **Monthly estimates** - Projected monthly/annual costs
- **JSON reports** - Detailed cost data in `results/`

See `COST_TRACKING_README.md` for complete documentation.

## Performance

### Hybrid Mode Benefits

- **97.6% cost reduction** vs OpenAI one-at-a-time mode
- **1 API call per layout** for panel analysis
- **Zero API calls** for matching (local analysis)
- **Parallel processing** for throughput
- **Memory-safe** with dynamic adjustment
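
This README does not state how the 97.6% figure was measured. One back-of-the-envelope reading, assuming cost scales with the number of API calls and that one-at-a-time mode issues one call per master image (41 in `master_images/`), arrives at the same number:

```python
def call_reduction_pct(calls_before: int, calls_after: int) -> float:
    """Percentage reduction in API calls, rounded to one decimal place."""
    return round((1 - calls_after / calls_before) * 100, 1)


MASTER_IMAGES = 41  # count listed under Directory Structure

# Assumed: 41 calls per layout in one-at-a-time mode versus 1 in hybrid mode.
print(call_reduction_pct(MASTER_IMAGES, 1))  # 97.6
```

This ignores differences in tokens per call, so treat it as a consistency check rather than the measured derivation.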

### Benchmarks

- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
- **Memory usage**: Dynamic adjustment prevents exhaustion

## Advanced Features

### Parallel Layout Processing

Process multiple layouts concurrently with coordinated inlier analysis.

```bash
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
```

### CEN Refinement

Automatically switch between censored (CEN) and uncensored versions.

```bash
python cli.py --all --hybrid --cen-refinement
```

### Custom Splitting Parameters

Fine-tune panel splitting behavior.

```bash
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10

# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
```

### Image Preprocessing

Enhance detection accuracy with preprocessing.

```bash
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale

# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
```

## Troubleshooting

### Common Issues

**"Cost tracking is disabled"**
- Add `--enable-cost-tracking` flag to enable cost monitoring

**"Memory usage too high"**
- The system adjusts worker counts automatically
- To reduce load further, lower `--local-workers` or `--layout-workers` manually

**"Too many open files"**
- Reduce the number of concurrent workers
- The system recovers automatically and caps the worker count

**"No matches found"**
- Try different detection modes
- Adjust inlier thresholds
- Enable fallback mode

### Memory Management

The system includes automatic memory management:
- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Prevents system crashes
- Logs resource usage
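
The adjustment policy itself is not documented here. A minimal sketch of one plausible policy, assuming the worker count is halved under memory pressure and grown by one when there is headroom. The thresholds (85% and 60%) and the name `adjust_workers` are hypothetical:

```python
def adjust_workers(
    current: int,
    mem_used_pct: float,
    max_workers: int,
    high: float = 85.0,  # assumed pressure threshold, not taken from the project
    low: float = 60.0,   # assumed headroom threshold, not taken from the project
) -> int:
    """Return the worker count to use for the next batch."""
    if mem_used_pct >= high:
        # Back off quickly under pressure, but never drop below one worker.
        return max(1, current // 2)
    if mem_used_pct <= low:
        # Recover slowly to avoid oscillating around the limit.
        return min(max_workers, current + 1)
    return current


print(adjust_workers(8, mem_used_pct=90.0, max_workers=8))  # 4
print(adjust_workers(3, mem_used_pct=50.0, max_workers=4))  # 4
print(adjust_workers(4, mem_used_pct=70.0, max_workers=8))  # 4
```

In a running system the usage percentage would come from a resource monitor, for example `psutil.virtual_memory().percent` if psutil is available.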

### Logging

All processing is logged to both terminal and file:
- Log files: `master_adapt_detect_TIMESTAMP.log`
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time

## Development

### Running Tests

```bash
# Test hybrid mode
python test_hybrid.py

# Test cost tracking
python test_cost_calculator.py

# Test panel splitting
python test_split_mode.py
```

### Adding New Detection Modes

1. Create a new detector class that inherits from the detector base class
2. Implement required methods:
   - `detect_images_in_layout()`
   - `process_all_layouts()`
3. Add CLI integration in `cli.py`
4. Update documentation
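
The steps above can be sketched as follows. `BaseDetector` is a hypothetical stand-in for the project's base class, and the method signatures are assumptions; only the two method names come from this README.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseDetector(ABC):
    """Hypothetical stand-in for the project's detector base class."""

    def __init__(self, master_ids: List[str]) -> None:
        self.master_ids = master_ids

    @abstractmethod
    def detect_images_in_layout(self, layout_path: str) -> List[str]:
        """Return the IDs of master images found in one layout."""

    def process_all_layouts(self, layout_paths: List[str]) -> Dict[str, List[str]]:
        """Run detection on every layout and collect the results by path."""
        return {path: self.detect_images_in_layout(path) for path in layout_paths}


class FilenameDetector(BaseDetector):
    """Toy detector: matches masters whose ID appears in the layout filename."""

    def detect_images_in_layout(self, layout_path: str) -> List[str]:
        return [mid for mid in self.master_ids if mid in layout_path]


detector = FilenameDetector(["1011A_1011_05", "1011A_1011_06"])
print(detector.process_all_layouts(["layouts/1011A_1011_05_promo.jpg"]))
# {'layouts/1011A_1011_05_promo.jpg': ['1011A_1011_05']}
```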

## OpenAI Pricing (2025)

- **Input tokens**: $2.00 per million
- **Cached input**: $0.50 per million
- **Output tokens**: $8.00 per million

Hybrid mode keeps costs low by making one API call per layout for panel analysis and matching locally; additional calls occur only if the optional fallback is triggered.
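
The per-million rates listed above translate into per-call cost as sketched below. The token counts are made-up illustration values, not measurements, and the sketch assumes cached tokens are counted as a subset of the input tokens. `call_cost` and `monthly_estimate` are hypothetical helpers, not the API of `cost_calculator.py`.

```python
INPUT_PER_M = 2.00    # USD per million input tokens
CACHED_PER_M = 0.50   # USD per million cached input tokens
OUTPUT_PER_M = 8.00   # USD per million output tokens


def call_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost in USD of one API call; cached tokens are billed at the lower rate."""
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_PER_M
        + cached_tokens * CACHED_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


def monthly_estimate(cost_per_layout: float, layouts_per_month: int) -> float:
    """Projected monthly cost, assuming every layout costs the same."""
    return cost_per_layout * layouts_per_month


# Illustrative numbers only: 3,000 input and 500 output tokens per layout.
per_layout = call_cost(input_tokens=3000, output_tokens=500)
print(f"${per_layout:.4f} per layout")  # $0.0100 per layout
print(f"${monthly_estimate(per_layout, 300):.2f} per month")  # $3.00 per month
```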

## License

[License information]

## Credits

Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.