revised documentation, added technical overview
parent
69f2f4cbe9
commit
380020b8a2
4 changed files with 2666 additions and 67 deletions
521
README.md
|
|
@@ -1,118 +1,505 @@
|
|||
# Master Image Detection Application
|
||||
# Master Adapt Detect
|
||||
|
||||
This application uses Google Gemini 2.5 Pro API to detect which master images appear in layout images.
|
||||
A sophisticated AI-powered image detection system that identifies master images within multi-panel layout images using multiple detection strategies, with advanced panel splitting and cost optimization features.
|
||||
|
||||
## Features
|
||||
## Overview
|
||||
|
||||
- **Filename-based IDs**: Master images are identified by their filenames (without .jpg extension)
|
||||
- **Comprehensive Detection**: Finds exact matches, cropped versions, scaled/rotated images
|
||||
- **Detailed Results**: JSON output with layout filenames and detected master filenames
|
||||
- **Optimized Processing**: Sequential processing with master images uploaded only once
|
||||
- **Progress Tracking**: Real-time progress updates and periodic saves during batch processing
|
||||
- **Error Handling**: Automatic retries and graceful error recovery
|
||||
This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:
|
||||
|
||||
## Setup
|
||||
1. **Hybrid Mode** (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
|
||||
2. **OpenAI Mode** - Full AI-powered detection using OpenAI O3 mini
|
||||
3. **Vector Mode** - Google Vertex AI multimodal embeddings for similarity search
|
||||
4. **Gemini Mode** - Google Gemini 2.5 Pro for visual analysis
|
||||
|
||||
1. **Install Dependencies**:
|
||||
```bash
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
## Key Features
|
||||
|
||||
2. **Configure API Key**:
|
||||
- API key is already set in `.env` file
|
||||
- Ensure `.env` file exists with your Gemini API key
|
||||
### Detection Capabilities
|
||||
- **Multi-strategy detection** - Choose from 4 different detection engines
|
||||
- **Panel counting** - Automatic detection of number of panels in layouts
|
||||
- **Censorship detection** - Identifies censored vs uncensored content with CEN refinement
|
||||
- **Smart matching** - Handles cropped, scaled, rotated, and transformed images
|
||||
- **Confidence scoring** - Provides match confidence based on panel count and detected matches
|
||||
|
||||
### Hybrid Mode (Primary Feature)
|
||||
- **Cost optimization** - 97.6% reduction in API costs vs one-at-a-time detection
|
||||
- **Intelligent routing** - Uses local analysis for simple layouts (≤2 panels), split method for complex
|
||||
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, simple division
|
||||
- **Local inlier analysis** - OpenCV AKAZE features with multiprocessing for fast matching
|
||||
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
|
||||
- **Fallback support** - Automatic fallback to OpenAI one-at-a-time when needed
|
||||
|
||||
### Processing Options
|
||||
- **Parallel processing** - Concurrent layout processing with serial inlier analysis coordination
|
||||
- **Memory management** - Dynamic worker adjustment based on system resources
|
||||
- **Cost tracking** - Comprehensive OpenAI API usage and cost monitoring
|
||||
- **Batch processing** - Process hundreds of layouts efficiently
|
||||
- **Progress tracking** - Real-time progress updates with ETA
|
||||
|
||||
## Installation
|
||||
|
||||
### Prerequisites
|
||||
- Python 3.8+
|
||||
- OpenCV
|
||||
- Google Cloud credentials (for Vector mode)
|
||||
- OpenAI API key (for OpenAI/Hybrid modes)
|
||||
- Google AI API key (for Gemini mode)
|
||||
|
||||
### Setup
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone <repository-url>
|
||||
cd master_adapt_detect
|
||||
|
||||
# Create virtual environment
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate # On Windows: venv\Scripts\activate
|
||||
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Configure API keys
|
||||
cp .env.example .env
|
||||
# Edit .env and add your API keys:
|
||||
# OPENAI_API_KEY=your_openai_key
|
||||
# GOOGLE_API_KEY=your_google_ai_key
|
||||
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
Activate the virtual environment first:
|
||||
### Command Line Interface
|
||||
|
||||
The main entry point is `cli.py` which provides a comprehensive CLI for all detection modes.
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
# Basic usage - hybrid mode with test
|
||||
python cli.py --test --hybrid
|
||||
|
||||
# Process first 10 layouts in hybrid mode
|
||||
python cli.py --limit 10 --hybrid
|
||||
|
||||
# Process all layouts with parallel processing
|
||||
python cli.py --all --hybrid --parallel-layouts
|
||||
|
||||
# OpenAI mode with one-at-a-time comparison
|
||||
python cli.py --limit 10 --openai --one-at-a-time
|
||||
|
||||
# Vector mode with similarity search
|
||||
python cli.py --all --vector
|
||||
|
||||
# Enable cost tracking
|
||||
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
|
||||
```
|
||||
|
||||
### Command Line Options
|
||||
### Detection Modes
|
||||
|
||||
#### Hybrid Mode (Recommended)
|
||||
Best balance of speed, cost, and accuracy.
|
||||
|
||||
```bash
|
||||
# Test with 1 layout
|
||||
python image_detector.py --test
|
||||
# Simple layouts (≤2 panels) use local analysis
|
||||
python cli.py --all --hybrid --panel-threshold 2
|
||||
|
||||
# Process first 10 layouts
|
||||
python image_detector.py --limit 10
|
||||
# With panel splitting for complex layouts
|
||||
python cli.py --all --hybrid --split-simple
|
||||
|
||||
# Process all layouts
|
||||
python image_detector.py --all
|
||||
# Advanced edge detection splitting
|
||||
python cli.py --all --hybrid --split-advanced
|
||||
|
||||
# Custom output filename
|
||||
python image_detector.py --limit 50 --output my_batch_results
|
||||
# Vector similarity instead of inlier analysis
|
||||
python cli.py --all --hybrid --vector-mode
|
||||
|
||||
# Process all layouts (sequential but optimized)
|
||||
python image_detector.py --all
|
||||
|
||||
# Custom paths
|
||||
python image_detector.py --all --master-path /path/to/masters --layout-path /path/to/layouts
|
||||
# With fallback to OpenAI if needed
|
||||
python cli.py --all --hybrid --fallback-one-at-a-time
|
||||
```
|
||||
|
||||
### Help
|
||||
#### OpenAI Mode
|
||||
Full AI-powered detection with optional refinement.
|
||||
|
||||
```bash
|
||||
python image_detector.py --help
|
||||
# Standard mode (all masters in one API call)
|
||||
python cli.py --limit 10 --openai
|
||||
|
||||
# One-at-a-time mode (one API call per master)
|
||||
python cli.py --limit 10 --openai --one-at-a-time
|
||||
|
||||
# With CEN refinement for censorship handling
|
||||
python cli.py --limit 10 --openai --cen-refinement
|
||||
```
|
||||
|
||||
### Common Commands
|
||||
#### Vector Mode
|
||||
Semantic similarity using embeddings.
|
||||
|
||||
```bash
|
||||
# Quick test
|
||||
python image_detector.py --test
|
||||
# Process with vector embeddings
|
||||
python cli.py --all --vector
|
||||
|
||||
# Small batch
|
||||
python image_detector.py --limit 10
|
||||
# Adjust similarity threshold
|
||||
python cli.py --all --vector --similarity-threshold 0.8
|
||||
```
|
||||
|
||||
# Full processing (all 306 layouts) - optimized sequential
|
||||
python image_detector.py --all
|
||||
#### Gemini Mode
|
||||
Google Gemini 2.5 Pro detection.
|
||||
|
||||
```bash
|
||||
# Standard Gemini detection
|
||||
python cli.py --limit 10 --gemini
|
||||
```
|
||||
|
||||
### Key Options
|
||||
|
||||
**Detection Mode:**
|
||||
- `--hybrid` - Hybrid detection mode (default)
|
||||
- `--openai` - OpenAI detection mode
|
||||
- `--vector` - Vector similarity mode
|
||||
- `--gemini` - Gemini detection mode
|
||||
|
||||
**Processing:**
|
||||
- `--test` - Test with 1 layout
|
||||
- `--limit N` - Process first N layouts
|
||||
- `--all` - Process all layouts
|
||||
- `--specific-file FILE` - Process specific file
|
||||
|
||||
**Hybrid Options:**
|
||||
- `--panel-threshold N` - Panel threshold for routing (default: 2)
|
||||
- `--split-simple` - Use simple even division splitting
|
||||
- `--split-advanced` - Use advanced edge detection splitting
|
||||
- `--vector-mode` - Use vector similarity instead of inlier analysis
|
||||
- `--fallback-one-at-a-time` - Enable OpenAI fallback
|
||||
- `--parallel-layouts` - Enable parallel layout processing
|
||||
- `--no-truncation` - Disable match truncation to panel count
|
||||
|
||||
**Cost Tracking:**
|
||||
- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
|
||||
- `--cost-report` - Generate detailed cost report
|
||||
- `--cost-estimate N` - Estimate monthly cost for N layouts
|
||||
|
||||
**Worker Configuration:**
|
||||
- `--openai-workers N` - OpenAI worker count (default: auto)
|
||||
- `--local-workers N` - Local analysis workers (default: auto)
|
||||
- `--layout-workers N` - Parallel layout workers (default: auto)
|
||||
|
||||
**Other:**
|
||||
- `--output NAME` - Custom output filename
|
||||
- `--help` - Show all options
|
||||
|
||||
## Architecture
|
||||
|
||||
### Core Components
|
||||
|
||||
#### Detection Engines
|
||||
|
||||
1. **HybridImageDetector** (`hybrid_detector.py`)
|
||||
- Main hybrid detection implementation
|
||||
- Routes layouts based on panel count
|
||||
- Integrates OpenAI, local analysis, and splitting
|
||||
- Handles parallel processing coordination
|
||||
|
||||
2. **OpenAIImageDetector** (`openai_detector.py`)
|
||||
- OpenAI O3 mini integration
|
||||
- Panel counting and censorship detection
|
||||
- One-at-a-time and batch detection modes
|
||||
- CEN refinement for censored content
|
||||
|
||||
3. **VectorDetector** (`vector_detector.py`)
|
||||
- Google Vertex AI multimodal embeddings
|
||||
- Cosine similarity matching
|
||||
- Embedding caching for performance
|
||||
|
||||
4. **GeminiDetector** (`gemini_detector.py`)
|
||||
- Google Gemini 2.5 Pro integration
|
||||
- Visual reasoning and analysis
|
||||
|
||||
#### Panel Splitting
|
||||
|
||||
1. **PanelSplitter** (`panel_splitter.py`)
|
||||
- Multi-method panel splitting
|
||||
- Optimized Canny edge detection
|
||||
- Hough line transform for separators
|
||||
- Tuned for 14-panel detection
|
||||
|
||||
2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
|
||||
- Edge detection and gutter analysis
|
||||
- Sobel gradient detection
|
||||
- Configurable percentile thresholds
|
||||
|
||||
3. **SimplePanelSplitter** (`simple_splitter.py`)
|
||||
- Simple even division
|
||||
- Fast horizontal splitting
|
||||
- Grid layout support
|
||||
|
||||
#### Supporting Systems
|
||||
|
||||
1. **Cost Calculator** (`cost_calculator.py`)
|
||||
- Tracks OpenAI API usage
|
||||
- Per-layout and session cost tracking
|
||||
- Monthly cost estimation
|
||||
- Detailed JSON reports
|
||||
|
||||
2. **Memory Manager** (`memory_manager.py`)
|
||||
- Prevents memory exhaustion
|
||||
- Dynamic worker adjustment
|
||||
- System resource monitoring
|
||||
|
||||
3. **Logging Config** (`logging_config.py`)
|
||||
- Dual output (terminal + file)
|
||||
- Crash tracking
|
||||
- System diagnostics
|
||||
|
||||
4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
|
||||
- Serial execution of inlier analysis
|
||||
- Task queue management
|
||||
- Prevents system overload
|
||||
|
||||
### Workflow
|
||||
|
||||
#### Hybrid Mode Workflow
|
||||
|
||||
1. **OpenAI Analysis** (1 API call)
|
||||
- Count panels in layout
|
||||
- Detect censorship status
|
||||
- Consolidated analysis
|
||||
|
||||
2. **Detection Routing**
|
||||
- ≤ panel_threshold: Direct local/vector analysis
|
||||
- > panel_threshold: Split + local/vector analysis
|
||||
|
||||
3. **Local Analysis** (no API calls)
|
||||
- OpenCV AKAZE feature detection
|
||||
- Multiprocessing for speed
|
||||
- RANSAC homography estimation
|
||||
- Inlier-based confidence scoring
|
||||
|
||||
4. **Post-Processing**
|
||||
- CEN refinement (if enabled)
|
||||
- Deduplication
|
||||
- Truncation to panel count
|
||||
- Confidence scoring
|
||||
|
||||
5. **Optional Fallback** (if enabled)
|
||||
- Triggers when matches < panels
|
||||
- OpenAI one-at-a-time detection
|
||||
- Additional API calls only when needed
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
master_adapt_detect/
|
||||
├── cli.py # Main command-line interface
|
||||
├── hybrid_detector.py # Hybrid detection engine
|
||||
├── openai_detector.py # OpenAI detection engine
|
||||
├── vector_detector.py # Vector similarity engine
|
||||
├── gemini_detector.py # Gemini detection engine
|
||||
├── panel_splitter.py # Traditional panel splitter
|
||||
├── advanced_splitter.py # Advanced edge detection splitter
|
||||
├── simple_splitter.py # Simple even division splitter
|
||||
├── cost_calculator.py # Cost tracking system
|
||||
├── memory_manager.py # Memory management
|
||||
├── logging_config.py # Logging configuration
|
||||
├── requirements.txt # Python dependencies
|
||||
├── .env # API keys (not in git)
|
||||
├── master_images/ # Master images to detect (41 images)
|
||||
├── layouts/ # Layout images to process (299+ images)
|
||||
├── results/ # JSON output files
|
||||
└── embeddings_cache/ # Cached vector embeddings
|
||||
```
|
||||
|
||||
## Output Format
|
||||
|
||||
Results are saved as JSON with this structure:
|
||||
Results are saved as JSON files with detailed metadata.
|
||||
|
||||
### Example Output
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"total_layouts_processed": 1,
|
||||
"total_layouts_processed": 10,
|
||||
"total_master_images": 41,
|
||||
"master_images_available": ["1011A_1011_05", "1011A_1011_06", ...]
|
||||
"provider": "hybrid",
|
||||
"model": "openai_o3_plus_local_analysis",
|
||||
"panel_threshold": 2,
|
||||
"processing_mode": "hybrid"
|
||||
},
|
||||
"results": {
|
||||
"6814786": {
|
||||
"layout_filename": "6814786.jpg",
|
||||
"detected_master_ids": ["1011A_1011_05"],
|
||||
"detected_master_filenames": ["1011A_1011_05.jpg"],
|
||||
"analysis": "Detailed analysis of what was found..."
|
||||
"detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
|
||||
"detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
|
||||
"detection_method": "local_inlier_analysis",
|
||||
"panel_count": 2,
|
||||
"confidence_score": 100.0,
|
||||
"panel_analysis": {
|
||||
"panel_count": 2,
|
||||
"confidence": "high"
|
||||
},
|
||||
"censorship_analysis": {
|
||||
"is_censored": false,
|
||||
"confidence": "high"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Key Output Fields
|
||||
## Cost Tracking
|
||||
|
||||
- **layout_filename**: The layout image filename
|
||||
- **detected_master_ids**: Master image IDs (filenames without .jpg)
|
||||
- **detected_master_filenames**: Full master image filenames with .jpg extension
|
||||
- **analysis**: Gemini's detailed explanation of the detection
|
||||
Cost tracking monitors OpenAI API usage and provides detailed reports.
|
||||
|
||||
## Directory Structure
|
||||
### Enable Cost Tracking
|
||||
|
||||
```
|
||||
├── master_images/ # 41 master images to detect
|
||||
├── layouts/ # 299+ layout images to analyze
|
||||
├── results/ # JSON output files
|
||||
├── venv/ # Python virtual environment
|
||||
├── image_detector.py # Main application
|
||||
├── test_simple.py # API connection tester
|
||||
├── requirements.txt # Dependencies
|
||||
└── .env # API configuration
|
||||
```bash
|
||||
# Enable tracking
|
||||
python cli.py --test --hybrid --enable-cost-tracking
|
||||
|
||||
# With detailed report
|
||||
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
|
||||
|
||||
# With monthly estimate
|
||||
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
|
||||
```
|
||||
|
||||
## Example Results
|
||||
### Cost Report Output
|
||||
|
||||
Layout `6814786.jpg` contains master image `1011A_1011_05.jpg` (cropped version).
|
||||
- **Session summary** - Total cost, tokens, API calls
|
||||
- **Per-layout breakdown** - Cost for each layout
|
||||
- **Operation analysis** - Cost by operation type
|
||||
- **Monthly estimates** - Projected monthly/annual costs
|
||||
- **JSON reports** - Detailed cost data in `results/`
|
||||
|
||||
See `COST_TRACKING_README.md` for complete documentation.
|
||||
|
||||
## Performance
|
||||
|
||||
### Hybrid Mode Benefits
|
||||
|
||||
- **97.6% cost reduction** vs OpenAI one-at-a-time mode
|
||||
- **1 API call per layout** for panel analysis
|
||||
- **Zero API calls** for matching (local analysis)
|
||||
- **Parallel processing** for throughput
|
||||
- **Memory-safe** with dynamic adjustment
|
||||
|
||||
### Benchmarks
|
||||
|
||||
- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
|
||||
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
|
||||
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
|
||||
- **Memory usage**: Dynamic adjustment prevents exhaustion
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Parallel Layout Processing
|
||||
|
||||
Process multiple layouts concurrently with coordinated inlier analysis.
|
||||
|
||||
```bash
|
||||
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
|
||||
```
|
||||
|
||||
### CEN Refinement
|
||||
|
||||
Automatically switch between censored (CEN) and uncensored versions.
|
||||
|
||||
```bash
|
||||
python cli.py --all --hybrid --cen-refinement
|
||||
```
|
||||
|
||||
### Custom Splitting Parameters
|
||||
|
||||
Fine-tune panel splitting behavior.
|
||||
|
||||
```bash
|
||||
# Advanced splitter with custom thresholds
|
||||
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10
|
||||
|
||||
# Adjust inlier thresholds
|
||||
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
|
||||
```
|
||||
|
||||
### Image Preprocessing
|
||||
|
||||
Enhance detection accuracy with preprocessing.
|
||||
|
||||
```bash
|
||||
# Greyscale conversion
|
||||
python cli.py --all --hybrid --enable-greyscale
|
||||
|
||||
# Contrast enhancement
|
||||
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**"Cost tracking is disabled"**
|
||||
- Add `--enable-cost-tracking` flag to enable cost monitoring
|
||||
|
||||
**"Memory usage too high"**
|
||||
- System will auto-adjust workers
|
||||
- Reduce `--local-workers` or `--layout-workers` manually
|
||||
|
||||
**"Too many open files"**
|
||||
- Reduce concurrent workers
|
||||
- System will auto-recover and limit workers
|
||||
|
||||
**"No matches found"**
|
||||
- Try different detection modes
|
||||
- Adjust inlier thresholds
|
||||
- Enable fallback mode
|
||||
|
||||
### Memory Management
|
||||
|
||||
The system includes automatic memory management:
|
||||
- Monitors RAM and swap usage
|
||||
- Dynamically adjusts worker counts
|
||||
- Prevents system crashes
|
||||
- Logs resource usage
|
||||
|
||||
### Logging
|
||||
|
||||
All processing is logged to both terminal and file:
|
||||
- Log files: `master_adapt_detect_TIMESTAMP.log`
|
||||
- Includes system diagnostics
|
||||
- Crash tracking with full traceback
|
||||
- Resource usage at crash time
|
||||
|
||||
## Development
|
||||
|
||||
### Running Tests
|
||||
|
||||
```bash
|
||||
# Test hybrid mode
|
||||
python test_hybrid.py
|
||||
|
||||
# Test cost tracking
|
||||
python test_cost_calculator.py
|
||||
|
||||
# Test panel splitting
|
||||
python test_split_mode.py
|
||||
```
|
||||
|
||||
### Adding New Detection Modes
|
||||
|
||||
1. Create new detector class inheriting from base
|
||||
2. Implement required methods:
|
||||
- `detect_images_in_layout()`
|
||||
- `process_all_layouts()`
|
||||
3. Add CLI integration in `cli.py`
|
||||
4. Update documentation
|
||||
|
||||
## OpenAI Pricing (2025)
|
||||
|
||||
- **Input tokens**: $2.00 per million
|
||||
- **Cached input**: $0.50 per million
|
||||
- **Output tokens**: $8.00 per million
|
||||
|
||||
Hybrid mode achieves significant cost savings by minimizing API calls.
|
||||
|
||||
## License
|
||||
|
||||
[License information]
|
||||
|
||||
## Credits
|
||||
|
||||
Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.
|
||||
|
|
|
|||
594
claude.md
Normal file
@@ -0,0 +1,594 @@
# Master Adapt Detect - Developer Documentation

## For AI Assistants and Developers

This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.

## Project Purpose

Master Adapt Detect is a sophisticated image detection system designed to identify which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials but is generalizable to any multi-panel image detection task.

## Core Architecture

### System Design Philosophy

The system follows a **multi-strategy detection** approach with these design principles:

1. **Cost Optimization** - Minimize API costs while maintaining accuracy
2. **Flexibility** - Support multiple detection engines for different use cases
3. **Performance** - Parallel processing with memory management
4. **Robustness** - Automatic fallbacks and error recovery

### Detection Modes

The system provides 4 detection modes, each with specific use cases:

#### 1. Hybrid Mode (Primary/Recommended)
- **File**: `hybrid_detector.py` (2939 lines)
- **Purpose**: Balance speed, cost, and accuracy
- **Strategy**: OpenAI O3 for panel analysis + local CV for matching
- **Cost**: ~1 API call per layout (97.6% reduction vs one-at-a-time)

**How it works:**
1. Single OpenAI API call to count panels and detect censorship
2. Route based on panel count:
   - ≤ threshold: Direct local inlier analysis
   - > threshold: Split layout first, then inlier analysis on each panel
3. Post-process with deduplication, CEN refinement, truncation
4. Optional fallback to OpenAI one-at-a-time if insufficient matches
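
The routing, post-processing, and fallback decisions above can be sketched in a few lines. This is an illustration only, not the code in `hybrid_detector.py`: the function names `choose_route`, `post_process`, and `needs_fallback` are hypothetical, matches are assumed to arrive ordered best-first, and CEN refinement is left out. The default threshold of 2 and the "matches < panels" fallback trigger are taken from the README.

```python
from typing import List


def choose_route(panel_count: int, panel_threshold: int = 2) -> str:
    """Pick the matching path for a layout from its panel count."""
    if panel_count <= panel_threshold:
        return "direct"  # run inlier analysis on the whole layout
    return "split"       # split into panels, then analyse each panel


def post_process(matches: List[str], panel_count: int,
                 truncate: bool = True) -> List[str]:
    """Deduplicate matches (keeping order), then optionally truncate."""
    unique = list(dict.fromkeys(matches))
    return unique[:panel_count] if truncate else unique


def needs_fallback(matches: List[str], panel_count: int) -> bool:
    """Fallback is considered when fewer masters were found than panels."""
    return len(matches) < panel_count
```

Keeping the three decisions as separate pure functions makes each one easy to test without loading images or calling an API.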

**Key classes:**
- `HybridImageDetector` - Main orchestrator
- `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode
- `ProgressTracker` - Thread-safe progress monitoring

#### 2. OpenAI Mode
- **File**: `openai_detector.py`
- **Purpose**: Pure AI-powered detection
- **Strategy**: GPT-4 vision for direct image comparison
- **Cost**: 1-41 API calls per layout depending on mode

**Modes:**
- Standard: All masters in one API call
- One-at-a-time: Separate API call per master (expensive but thorough)

#### 3. Vector Mode
- **File**: `vector_detector.py`
- **Purpose**: Semantic similarity matching
- **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions)
- **Cost**: No OpenAI costs, uses Google Cloud

**Features:**
- Embedding caching for performance
- Cosine similarity matching
- Threshold-based filtering

#### 4. Gemini Mode
- **File**: `gemini_detector.py`
- **Purpose**: Alternative AI detection
- **Strategy**: Google Gemini 2.5 Pro visual reasoning
- **Cost**: Google AI API (not OpenAI)

### Panel Splitting Strategies

The system provides 3 panel splitting approaches for complex multi-panel layouts:

#### 1. Traditional Multi-Method Splitter
- **File**: `panel_splitter.py` (857 lines)
- **Strategy**: Optimized Canny edge detection + Hough transform
- **Tuning**: Specifically tuned for 14-panel detection
- **Parameters**: Thresholds, kernel sizes, line detection params

#### 2. Advanced Edge Detection Splitter
- **File**: `advanced_splitter.py` (200+ lines)
- **Strategy**: Sobel gradient analysis + gutter detection
- **Parameters**:
  - `percentile`: Low-energy column threshold (default: 10)
  - `min_gap`: Minimum gutter width (default: 5)
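
One plausible reading of how `percentile` and `min_gap` interact is sketched below. It assumes a per-column "energy" value (for example, summed Sobel gradient magnitude) has already been computed. The function name `find_gutters` and the nearest-rank percentile rule are assumptions for illustration, not the actual `advanced_splitter.py` implementation.

```python
import math
from typing import List, Sequence, Tuple


def find_gutters(column_energy: Sequence[float],
                 percentile: float = 10,
                 min_gap: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of candidate gutters."""
    if not column_energy:
        return []
    ordered = sorted(column_energy)
    # Nearest-rank percentile: columns at or below this value count as low-energy.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    cutoff = ordered[rank - 1]

    gutters: List[Tuple[int, int]] = []
    run_start = None
    for i, energy in enumerate(column_energy):
        if energy <= cutoff:
            if run_start is None:
                run_start = i  # a low-energy run begins here
        elif run_start is not None:
            if i - run_start >= min_gap:  # keep only sufficiently wide runs
                gutters.append((run_start, i))
            run_start = None
    # Close a run that extends to the last column.
    if run_start is not None and len(column_energy) - run_start >= min_gap:
        gutters.append((run_start, len(column_energy)))
    return gutters
```

Under this reading, raising `percentile` admits more columns as low-energy, while raising `min_gap` discards narrow runs that are more likely image detail than a real gutter.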

#### 3. Simple Even Division Splitter
- **File**: `simple_splitter.py` (132 lines)
- **Strategy**: Equal division based on panel count
- **Use case**: Fast processing when layout is regular grid
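
Equal division reduces to integer arithmetic on the image height. The sketch below assumes panels are stacked vertically, so the cuts are horizontal; the function name `split_evenly` is hypothetical and not taken from `simple_splitter.py`.

```python
from typing import List, Tuple


def split_evenly(height: int, panel_count: int) -> List[Tuple[int, int]]:
    """Divide [0, height) into panel_count bands of near-equal size."""
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    # Integer edges; any remainder pixels are spread across the later bands.
    edges = [i * height // panel_count for i in range(panel_count + 1)]
    return list(zip(edges[:-1], edges[1:]))
```

Each returned pair is a `(top, bottom)` row range, which could be used to slice an image array as `image[top:bottom]`.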

### Supporting Systems

#### Cost Calculator
- **File**: `cost_calculator.py` (440 lines)
- **Purpose**: Track OpenAI API usage and costs
- **Features**:
  - Per-layout cost breakdown
  - Session summaries
  - Monthly estimation
  - JSON report generation
- **Important**: Disabled by default, requires `--enable-cost-tracking` flag

**Data structures:**
- `TokenUsage` - Track token counts for single API call
- `ApiCallCost` - Cost info for single API call
- `LayoutCostSummary` - Aggregated cost for one layout
- `CostCalculator` - Main tracking class

#### Memory Manager
- **File**: `memory_manager.py` (119 lines)
- **Purpose**: Prevent system crashes from memory exhaustion
- **Features**:
  - RAM and swap monitoring
  - Dynamic worker adjustment
  - Safe execution decorators
  - Feature count limiting

**Thresholds:**
- Max memory: 80% (configurable)
- Max swap: 80% (warning only, doesn't throttle)

#### Logging Configuration
- **File**: `logging_config.py` (128 lines)
- **Purpose**: Dual output (terminal + file) for debugging crashes
- **Features**:
  - Timestamped log files
  - Exception tracking with resource usage
  - System diagnostics on startup

### Command-Line Interface

- **File**: `cli.py`
- **Purpose**: Unified interface for all detection modes
- **Features**:
  - Argument parsing for all modes
  - Mode-specific configuration
  - Results aggregation
  - Cost reporting

**Key command patterns:**
```bash
# Detection mode selection
--hybrid / --openai / --vector / --gemini

# Processing scope
--test / --limit N / --all / --specific-file FILE

# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts

# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
```

## Key Algorithms

### Local Inlier Analysis (Hybrid Mode)

**Algorithm**: OpenCV AKAZE features + RANSAC homography estimation

**Process**:
1. Detect AKAZE keypoints in layout and master images
2. Match descriptors using brute-force matcher with Hamming distance
3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
4. Estimate homography using RANSAC (threshold: 7.0)
5. Count inliers and calculate confidence

**Thresholds:**
- `min_good_matches`: 10 (minimum matches before RANSAC)
- `inlier_threshold`: 0.65 (relative to best match)
- `inlier_ratio_threshold`: 0.4 (minimum inlier ratio)

**Confidence levels:**
- High: ≥30 inliers, ≥50% ratio
- Medium: ≥15 inliers, ≥30% ratio
- Low: Below medium thresholds

**Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing)
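
The confidence levels and the relative `inlier_threshold` can be expressed as two small functions. This is a sketch under stated assumptions: the inlier ratio is taken to be inliers divided by good matches, and "relative to best match" is read as "keep masters whose inlier count is at least 0.65 of the best count". The names `confidence_level` and `keep_near_best` are hypothetical.

```python
from typing import Dict, List


def confidence_level(inliers: int, inlier_ratio: float) -> str:
    """Map an inlier count and inlier ratio to a confidence label."""
    if inliers >= 30 and inlier_ratio >= 0.50:
        return "high"
    if inliers >= 15 and inlier_ratio >= 0.30:
        return "medium"
    return "low"


def keep_near_best(inlier_counts: Dict[str, int],
                   inlier_threshold: float = 0.65) -> List[str]:
    """Keep master IDs whose inlier count is close to the best count."""
    if not inlier_counts:
        return []
    cutoff = inlier_threshold * max(inlier_counts.values())
    ranked = sorted(inlier_counts, key=inlier_counts.get, reverse=True)
    return [master_id for master_id in ranked if inlier_counts[master_id] >= cutoff]
```

A relative cutoff of this kind adapts to image quality: a blurry layout yields fewer inliers for every master, but the true match should still stand out against the rest.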

### Vector Similarity Analysis

**Algorithm**: Cosine similarity on 1408-dimensional embeddings

**Process**:
1. Generate embedding for layout using Vertex AI
2. Compare against cached master embeddings
3. Calculate cosine similarity for each master
4. Filter by threshold (default: 0.75)
5. Sort by similarity descending

**Formula**:
```
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
```

**Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl`
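
Steps 3 to 5 and the formula can be written out in plain Python. This sketch uses short lists in place of real 1408-dimensional embeddings and does not call Vertex AI. The function names `cosine_similarity` and `rank_masters` are illustrative and not taken from `vector_detector.py`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (norm(a) * norm(b)); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_masters(layout_emb: Sequence[float],
                 master_embs: Dict[str, Sequence[float]],
                 threshold: float = 0.75) -> List[Tuple[str, float]]:
    """Score every master, drop those below threshold, sort best-first."""
    scored = [(master_id, cosine_similarity(layout_emb, emb))
              for master_id, emb in master_embs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Because cosine similarity ignores vector length, two embeddings pointing in the same direction score 1.0 regardless of magnitude.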

### Panel Splitting (Canny Detection)

**Algorithm**: Multi-threshold Canny + Hough line transform

**Process**:
1. Apply Canny edge detection at multiple thresholds:
   - (50, 150), (100, 200), (150, 250)
2. Morphological closing with (3, 1) kernel
3. Combine edge maps with maximum operation
4. Hough line transform for horizontal lines:
   - Threshold: 1324
   - Min length: 3530
   - Max gap: 1059
5. Filter for nearly horizontal lines (< 5% slope)
6. Create panel bounds from separator positions

**Tuning**: Parameters specifically optimized for 14-panel detection accuracy
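
Steps 5 and 6 need no image library once line segments are available. The sketch below assumes segments arrive as `(x1, y1, x2, y2)` tuples, the per-segment shape produced by a probabilistic Hough transform. It interprets "< 5% slope" as `|dy| / |dx| < 0.05`. The names `is_nearly_horizontal` and `panel_bounds` are hypothetical.

```python
from typing import List, Tuple

Line = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def is_nearly_horizontal(line: Line, max_slope: float = 0.05) -> bool:
    """True when the segment rises less than max_slope per unit of width."""
    x1, y1, x2, y2 = line
    dx = abs(x2 - x1)
    if dx == 0:
        return False  # vertical segment
    return abs(y2 - y1) / dx < max_slope


def panel_bounds(separator_ys: List[int], height: int) -> List[Tuple[int, int]]:
    """Turn separator y-positions into (top, bottom) panel ranges."""
    # Add the image edges, drop duplicates and out-of-range values, then sort.
    edges = sorted({0, height, *(y for y in separator_ys if 0 < y < height)})
    return list(zip(edges[:-1], edges[1:]))
```

For example, separators at y = 300 and y = 600 in a 900-pixel-tall layout produce three panels of 300 rows each.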

### CEN Refinement

**Algorithm**: Censorship-aware master image selection

**Process**:
1. Detect if layout is censored (OpenAI analysis)
2. For each detected CEN (censored) master:
   - If layout is uncensored and non-CEN version exists: Switch to non-CEN
   - If layout is censored or no alternative: Keep CEN version
3. Update results with refinement metadata

**Naming convention**: `*CEN*` in master ID indicates censored version
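
The switching rule in step 2 can be sketched as a pure function. How a CEN master ID maps to its uncensored counterpart is not specified here, so this sketch takes that mapping as an explicit dictionary argument instead of guessing a naming scheme. The function name `refine_cen` and the example IDs in the usage note are hypothetical.

```python
from typing import Dict, List


def refine_cen(detected: List[str],
               layout_is_censored: bool,
               non_cen_for: Dict[str, str]) -> List[str]:
    """Swap CEN masters for their non-CEN versions on uncensored layouts."""
    refined = []
    for master_id in detected:
        is_cen = "CEN" in master_id  # naming convention: *CEN* marks censored
        if is_cen and not layout_is_censored and master_id in non_cen_for:
            refined.append(non_cen_for[master_id])  # switch to non-CEN version
        else:
            refined.append(master_id)  # censored layout or no alternative
    return refined
```

With a made-up mapping `{"A_CEN_01": "A_01"}`, an uncensored layout that matched `A_CEN_01` would be reported as `A_01`, while a censored layout would keep `A_CEN_01`.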

## Parallel Processing Architecture

### Serial Inlier Analysis Coordinator

**Problem**: Parallel inlier analysis causes memory exhaustion and crashes

**Solution**: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing

**Architecture:**
```
Multiple Layout Workers (parallel)
↓
Task Queue
↓
Single Inlier Worker (serial)
↓
Results back to layout workers
```

**Components:**
- `InlierAnalysisCoordinator` - Manages serial execution
- Task queue - Queues inlier analysis requests
- Worker thread - Processes tasks one at a time
- Futures - Async communication between layout and inlier workers

**Benefits:**
- Prevents memory explosion from too many concurrent inlier analyses
- Allows multiple layouts to be processed in parallel
- Coordinates resource usage across system
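
The queue, single worker thread, and futures arrangement described above can be sketched with the standard library alone. This is a generic illustration of the pattern and not the actual `InlierAnalysisCoordinator`: the class name `SerialCoordinator` and its methods are assumptions.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SerialCoordinator:
    """Runs submitted tasks one at a time on a single worker thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self) -> None:
        self._tasks: "queue.Queue[Any]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        """Queue a task; the caller waits on the returned future."""
        future: Future = Future()
        self._tasks.put((fn, args, future))
        return future

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is self._STOP:
                break
            fn, args, future = item
            try:
                future.set_result(fn(*args))  # only one task runs at a time
            except Exception as exc:
                future.set_exception(exc)  # surface the error to the caller

    def shutdown(self) -> None:
        """Finish queued tasks, then stop the worker thread."""
        self._tasks.put(self._STOP)
        self._worker.join()
```

Any number of layout workers can call `submit()` concurrently and block on `future.result()`. The heavy analysis itself never overlaps, which is what keeps peak memory bounded.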

### Dynamic Worker Adjustment

**Monitoring:**
- Memory usage (RAM percentage)
- Swap usage (swap percentage)
- Queue size (backlog of inlier tasks)
- Open file descriptors

**Adjustment triggers:**
- Memory > 85%: Reduce workers
- Swap > 95% AND Memory > 80%: Reduce workers
- Queue size ≥ 3: Reduce layout workers (producers)
- Memory < 75% AND Swap < 80%: Increase workers

**Auto-scaling:**
- Layout workers: Start at min(4, CPU/2), adjust dynamically
- Local workers: Start at CPU-2, adjust dynamically
- OpenAI workers: Set to number of master images
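
The triggers and starting values above translate into a small decision function. Several details are assumptions: the order of precedence when more than one trigger fires, integer division for `CPU/2`, and a floor of one worker on small machines. The names `initial_workers` and `decide` are hypothetical.

```python
from typing import Dict


def initial_workers(cpu_count: int) -> Dict[str, int]:
    """Starting worker counts before any dynamic adjustment."""
    return {
        "layout": max(1, min(4, cpu_count // 2)),
        "local": max(1, cpu_count - 2),
    }


def decide(memory_pct: float, swap_pct: float, queue_size: int) -> str:
    """Choose one adjustment from current resource readings."""
    # Memory pressure is checked first (assumed precedence).
    if memory_pct > 85 or (swap_pct > 95 and memory_pct > 80):
        return "reduce_workers"
    # A backlog of inlier tasks means producers are outrunning the consumer.
    if queue_size >= 3:
        return "reduce_layout_workers"
    if memory_pct < 75 and swap_pct < 80:
        return "increase_workers"
    return "hold"  # between thresholds: leave counts unchanged
```

The gap between the 75% and 85% memory thresholds acts as a dead band, so worker counts do not flip back and forth when usage hovers near a single limit.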
|
||||
|
||||
## Important Implementation Details

### Multiprocessing Considerations

**Challenge**: Python multiprocessing requires picklable functions

**Solutions:**
1. `process_single_master_inlier_analysis()` - Standalone function (not a class method)
2. All imports live inside the function so that worker processes have their dependencies
3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors)

**Memory safety:**
- Feature limiting: Max 10,000-15,000 features per image
- Dynamic worker reduction based on feature count
- Forced garbage collection after processing

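The memory-safety measures above can be illustrated with a small sketch. It assumes features are `(response, descriptor)` pairs and that the strongest responses are the ones worth keeping; neither assumption comes from the source, and the helper names are hypothetical.

```python
import gc


def limit_features(features, max_features=10_000):
    """Keep at most max_features items, preferring the highest response."""
    if len(features) <= max_features:
        return list(features)
    strongest_first = sorted(features, key=lambda item: item[0], reverse=True)
    return strongest_first[:max_features]


def analyze_with_cleanup(features, analyze, max_features=10_000):
    """Run analyze() on the capped feature list, then force a collection."""
    limited = limit_features(features, max_features)
    try:
        return analyze(limited)
    finally:
        del limited
        gc.collect()  # release large intermediates before the next task
```

Capping before the analysis bounds the worst-case memory per image, which is what makes a fixed worker count safe to reason about.
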
### Cost Tracking Integration

**Important**: Cost tracking is DISABLED by default

**Reason**: Avoid repetitive initialization messages from multiprocessing workers

**Integration points:**
- `openai_detector.py`: After every OpenAI API call
- `hybrid_detector.py`: Track all OpenAI operations
- Results JSON: Cost breakdown per layout

**Data flow:**
1. API call made → Extract token usage from response
2. Call `cost_calculator.track_api_call()`
3. Update session totals
4. Generate reports on demand

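The data flow above can be sketched with a stand-in tracker. Everything here is hypothetical: the class name, the `track_api_call(operation, response)` signature, and the `usage` field names are assumptions for illustration and do not describe the real `cost_calculator` module or any specific API response format.

```python
from collections import defaultdict


def extract_token_usage(response):
    """Step 1: pull token counts out of a response dict (assumed field names)."""
    usage = response.get("usage") or {}
    return {
        "input_tokens": int(usage.get("input_tokens", 0)),
        "output_tokens": int(usage.get("output_tokens", 0)),
    }


class SessionCostTracker:
    """Accumulates per-operation token totals; disabled unless enabled=True."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self._totals = defaultdict(
            lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0}
        )

    def track_api_call(self, operation, response):
        """Steps 2-3: record one call and update the session totals."""
        if not self.enabled:
            return
        usage = extract_token_usage(response)
        entry = self._totals[operation]
        entry["calls"] += 1
        entry["input_tokens"] += usage["input_tokens"]
        entry["output_tokens"] += usage["output_tokens"]

    def report(self):
        """Step 4: produce a plain-dict report on demand."""
        return {operation: dict(entry) for operation, entry in self._totals.items()}
```

Because the tracker does nothing when disabled, call sites can invoke it unconditionally, which matches the disabled-by-default behaviour described above.
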
### Error Handling Patterns

**OpenAI API errors:**
```python
try:
    response = openai_call()  # placeholder for the actual API call
except Exception as e:
    # 1. Automatic retry logic
    # 2. Fallback to alternative method
    # 3. Return error result dict (illustrative shape)
    result = {"error": str(e)}
```

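The retry, fallback, and error-dict steps can be combined into one helper. This is a sketch, not the project's retry logic: the function name, the exponential backoff schedule, and the result dict keys are assumptions.

```python
import time


def call_with_retry(call, max_attempts=3, base_delay=1.0, fallback=None,
                    sleep=time.sleep):
    """Try call() up to max_attempts times, then fallback(), then report failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return {"success": True, "result": call()}
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    if fallback is not None:
        try:
            return {"success": True, "result": fallback(), "used_fallback": True}
        except Exception as exc:
            last_error = exc
    return {"success": False, "error": str(last_error)}
```

Passing `sleep` as a parameter keeps the helper testable without real delays. In production code it is usually better to catch only the exception types that are known to be transient.
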
**Memory errors:**
```python
import gc

try:
    result = memory_intensive_operation()  # placeholder
except MemoryError:
    # Reduce worker count
    # Force garbage collection
    gc.collect()
    # Retry with lower concurrency
```

**File descriptor exhaustion:**
```python
import errno

try:
    result = file_intensive_operation()  # placeholder
except OSError as e:
    # EMFILE / ENFILE both report "Too many open files"
    if e.errno not in (errno.EMFILE, errno.ENFILE):
        raise
    # Limit concurrent workers
    # Clean up temp files
    # Force resource release
```

## File Organization

### Core Detection Files
- `hybrid_detector.py` - Hybrid detection (2939 lines)
- `openai_detector.py` - OpenAI detection
- `vector_detector.py` - Vector similarity
- `gemini_detector.py` - Gemini detection

### Panel Splitting Files
- `panel_splitter.py` - Traditional multi-method
- `advanced_splitter.py` - Edge detection
- `simple_splitter.py` - Even division

### Supporting Files
- `cost_calculator.py` - Cost tracking
- `memory_manager.py` - Memory management
- `logging_config.py` - Logging configuration
- `cli.py` - Command-line interface

### Test Files
- `test_hybrid.py` - Hybrid mode tests
- `test_cost_calculator.py` - Cost tracking tests
- `test_split_mode.py` - Panel splitting tests
- `test_panel_accuracy.py` - Panel detection accuracy
- Various tuning and debug scripts

### Data Directories
- `master_images/` - 41 master images to detect
- `layouts/` - 299+ layout images to process
- `results/` - JSON output files
- `embeddings_cache/` - Cached vector embeddings

## Development Guidelines

### Adding New Features

1. **New Detection Mode:**
   - Create new detector class
   - Inherit from base detector if applicable
   - Implement `detect_images_in_layout()` method
   - Add CLI integration in `cli.py`
   - Update tests and documentation

2. **New Panel Splitting Method:**
   - Create new splitter class
   - Implement `split_panels(image_path, panel_count)` method
   - Return list of dicts with keys: `image`, `bounds`, `confidence`, `method`
   - Add CLI flag for selection
   - Test with various panel counts

3. **Cost Tracking for New API:**
   - Add extraction function for token usage
   - Track calls with `cost_calculator.track_api_call()`
   - Update operation types
   - Add to cost reports

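As an illustration of the splitter contract in item 2, the sketch below returns dicts with the four documented keys. It is a toy under assumptions: the class name, the `(left, top, right, bottom)` bounds layout, the side-by-side strip geometry, the fixed confidence, and the injected `read_size` callable (used so the example needs no image library) are all hypothetical.

```python
def even_panel_bounds(width, height, panel_count):
    """Divide a width x height image into panel_count side-by-side strips."""
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    panels = []
    for index in range(panel_count):
        left = index * width // panel_count
        right = (index + 1) * width // panel_count
        panels.append({
            "image": None,  # a real splitter would store the cropped panel here
            "bounds": (left, 0, right, height),
            "confidence": 1.0,  # even division has no detection uncertainty
            "method": "even_division",
        })
    return panels


class EvenStripSplitter:
    """Toy splitter exposing the documented split_panels() signature."""

    def __init__(self, read_size):
        self._read_size = read_size  # callable: image_path -> (width, height)

    def split_panels(self, image_path, panel_count):
        width, height = self._read_size(image_path)
        return even_panel_bounds(width, height, panel_count)
```

Using integer division on both edges guarantees that adjacent strips share a boundary and that the last strip ends exactly at the image width, even when the width is not divisible by the panel count.
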
### Testing Strategy

1. **Unit tests** - Individual components
2. **Integration tests** - Full detection pipeline
3. **Performance tests** - Memory and speed benchmarks
4. **Accuracy tests** - Panel detection accuracy
5. **Cost tests** - Verify tracking accuracy

### Performance Optimization Tips

1. **Reduce API calls** - Primary cost driver
2. **Cache embeddings** - Avoid regenerating
3. **Limit features** - Prevent memory explosion
4. **Use multiprocessing** - Parallel CPU work
5. **Monitor memory** - Dynamic adjustment
6. **Profile bottlenecks** - Optimize hot paths

### Common Pitfalls

1. **Multiprocessing pickle errors** - Use standalone functions
2. **Memory exhaustion** - Limit concurrent workers
3. **File descriptor limits** - Close files properly
4. **Cost calculator in workers** - Keep in main process only
5. **Treating swap as an error condition** - Swap usage on its own is acceptable and should not be handled as an error

## Configuration Reference

### Environment Variables
```
OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
```

### Default Values
```python
# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75  # vector mode

# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)

# Memory management
max_memory_percent = 75
max_swap_percent = 80

# Cost tracking
enable_tracking = False  # Must explicitly enable
```

## Output Format Specification

### Results JSON Structure
```json
{
  "metadata": {
    "total_layouts_processed": int,
    "total_master_images": int,
    "master_images_available": [str],
    "provider": str,
    "model": str,
    "panel_threshold": int,
    "inlier_threshold": float,
    "processing_mode": str,
    "cost_tracking": {dict} | null
  },
  "results": {
    "layout_id": {
      "layout_filename": str,
      "detected_master_ids": [str],
      "detected_master_filenames": [str],
      "detection_method": str,
      "panel_count": int,
      "confidence_score": float,
      "panel_analysis": {dict},
      "censorship_analysis": {dict},
      "truncation_applied": bool,
      "deduplication_applied": bool,
      "cost_breakdown": {dict} | null
    }
  }
}
```

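To make the schema concrete, the sketch below builds a small results document and checks each layout entry for a handful of keys. All values are invented placeholders, and the choice of which keys to treat as required is an assumption; the helper `missing_result_keys` is hypothetical.

```python
import json

# Keys treated as required for this illustration only.
REQUIRED_RESULT_KEYS = {
    "layout_filename",
    "detected_master_ids",
    "detected_master_filenames",
    "detection_method",
    "panel_count",
    "confidence_score",
}


def missing_result_keys(results_doc):
    """Return {layout_id: [missing keys]} for entries lacking required keys."""
    problems = {}
    for layout_id, entry in results_doc.get("results", {}).items():
        missing = REQUIRED_RESULT_KEYS - entry.keys()
        if missing:
            problems[layout_id] = sorted(missing)
    return problems


example = {
    "metadata": {
        "total_layouts_processed": 1,
        "total_master_images": 2,
        "master_images_available": ["master_a", "master_b"],
        "cost_tracking": None,  # serialized as null when tracking is disabled
    },
    "results": {
        "layout_001": {
            "layout_filename": "layout_001.jpg",
            "detected_master_ids": ["master_a"],
            "detected_master_filenames": ["master_a.jpg"],
            "detection_method": "example",
            "panel_count": 1,
            "confidence_score": 0.9,
            "cost_breakdown": None,
        }
    },
}

serialized = json.dumps(example, indent=2)
```

A check like this is cheap to run after each batch and catches truncated or partially written result files before they are consumed downstream.
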
## Debugging Tips

### Enable Debug Logging
```python
# In code
import logging
logging.basicConfig(level=logging.DEBUG)

# Via environment (run in the shell before starting the CLI):
#   export LOG_LEVEL=DEBUG
```

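One way an application can honour a `LOG_LEVEL` variable is shown below. This is a sketch of the general technique, not a description of `logging_config.py`; the function name and the fallback to `INFO` for unknown values are assumptions.

```python
import logging
import os


def configure_logging(default="INFO"):
    """Configure the root logger from LOG_LEVEL, falling back to INFO."""
    level_name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):  # unknown name such as "VERBOSE"
        level = logging.INFO
    # force=True (Python 3.8+) replaces handlers installed by earlier calls
    logging.basicConfig(level=level, force=True)
    return level
```

Returning the resolved level makes it easy to log which level was actually applied at startup.
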
### Memory Issues
```bash
# Check current memory
python check_system_resources.py

# Test with memory fix
python test_memory_fix.py

# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
```

### Cost Tracking Issues
```bash
# Verify cost tracking works
python test_cost_calculator.py

# Test integration
python test_cost_tracking_integration.py

# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
```

### Panel Splitting Issues
```bash
# Test splitting accuracy
python test_panel_accuracy.py

# Tune parameters
python tune_14_panel_split.py

# Debug specific layout
python test_6786505_cli.py
```

## API Costs (Current Pricing)

### OpenAI O3 (2025)
- Input tokens: $2.00 / million
- Cached input: $0.50 / million
- Output tokens: $8.00 / million

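The per-million rates above translate into a per-call cost as follows. This is a sketch: the function name is hypothetical, and it assumes cached input tokens are counted as a subset of the total input tokens.

```python
# USD per one million tokens, copied from the list above.
O3_PRICE_PER_MILLION = {"input": 2.00, "cached_input": 0.50, "output": 8.00}


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one call from its token counts."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * O3_PRICE_PER_MILLION["input"]
        + cached_input_tokens * O3_PRICE_PER_MILLION["cached_input"]
        + output_tokens * O3_PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000
```

For example, a call with 4,000 input tokens and 500 output tokens (made-up counts) comes to (4,000 × 2.00 + 500 × 8.00) / 1,000,000 = $0.012.
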
### Typical Usage
- Hybrid mode: ~$0.01-0.02 per layout
- OpenAI mode: ~$0.02-0.05 per layout
- One-at-a-time: ~$0.50-1.00 per layout

### Cost Optimization
- Hybrid mode: 97.6% reduction vs one-at-a-time
- Caching: Reduces input token costs
- Batch processing: Amortizes overhead

## Future Enhancement Ideas

1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration
2. **Incremental processing** - Resume from saved progress
3. **Web interface** - Browser-based detection and visualization
4. **Active learning** - Use detection results to improve models
5. **Custom training** - Fine-tune models on domain-specific data
6. **Real-time processing** - Stream processing for live detection
7. **Distributed processing** - Multi-machine coordination
8. **Advanced caching** - Persistent result caching across runs

## Contact and Support

For questions or issues:
1. Check logs in `master_adapt_detect_*.log`
2. Review cost reports in `results/`
3. Run diagnostic scripts
4. Check system resources
5. Review error messages carefully

## Version History

Current implementation includes:
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
- Three panel splitting strategies
- Cost tracking and reporting
- Memory management and safety
- Parallel processing with coordination
- Dynamic worker adjustment
- Comprehensive logging and debugging
- Extensive configuration options

Last major update: January 2025

BIN
docs/master adapt detect technical overview.pdf
Normal file
Binary file not shown.
1618
docs/master_adapt_detection_technical_overview.md
Normal file
File diff suppressed because it is too large