revised documentation, added technical overview

michael 2025-10-01 16:02:40 -05:00
parent 69f2f4cbe9
commit 380020b8a2
4 changed files with 2666 additions and 67 deletions

521
README.md

@@ -1,118 +1,505 @@
# Master Image Detection Application
# Master Adapt Detect
This application uses Google Gemini 2.5 Pro API to detect which master images appear in layout images.
An AI-powered image detection system that identifies master images within multi-panel layout images. It combines multiple detection strategies with panel splitting and cost optimization features.
## Features
## Overview
- **Filename-based IDs**: Master images are identified by their filenames (without .jpg extension)
- **Comprehensive Detection**: Finds exact matches, cropped versions, scaled/rotated images
- **Detailed Results**: JSON output with layout filenames and detected master filenames
- **Optimized Processing**: Sequential processing with master images uploaded only once
- **Progress Tracking**: Real-time progress updates and periodic saves during batch processing
- **Error Handling**: Automatic retries and graceful error recovery
This application provides a flexible, multi-strategy approach to detecting which master images appear in layout images (such as marketing materials, comic/manga pages, or multi-panel graphics). It supports four detection modes:
## Setup
1. **Hybrid Mode** (Recommended) - Combines OpenAI O3 for panel analysis with local computer vision
2. **OpenAI Mode** - Full AI-powered detection using OpenAI O3 mini
3. **Vector Mode** - Google Vertex AI multimodal embeddings for similarity search
4. **Gemini Mode** - Google Gemini 2.5 Pro for visual analysis
1. **Install Dependencies**:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
## Key Features
2. **Configure API Key**:
- API key is already set in `.env` file
- Ensure `.env` file exists with your Gemini API key
### Detection Capabilities
- **Multi-strategy detection** - Choose from 4 different detection engines
- **Panel counting** - Automatic detection of number of panels in layouts
- **Censorship detection** - Identifies censored vs uncensored content with CEN refinement
- **Smart matching** - Handles cropped, scaled, rotated, and transformed images
- **Confidence scoring** - Provides match confidence based on panel count and detected matches
### Hybrid Mode (Primary Feature)
- **Cost optimization** - 97.6% reduction in API costs vs one-at-a-time detection
- **Intelligent routing** - Uses local analysis for simple layouts (≤2 panels), split method for complex
- **Panel splitting** - Three splitting strategies: traditional, advanced edge detection, simple division
- **Local inlier analysis** - OpenCV AKAZE features with multiprocessing for fast matching
- **Vector similarity** - Optional Google Vertex AI embeddings for semantic matching
- **Fallback support** - Automatic fallback to OpenAI one-at-a-time when needed
### Processing Options
- **Parallel processing** - Concurrent layout processing with serial inlier analysis coordination
- **Memory management** - Dynamic worker adjustment based on system resources
- **Cost tracking** - Comprehensive OpenAI API usage and cost monitoring
- **Batch processing** - Process hundreds of layouts efficiently
- **Progress tracking** - Real-time progress updates with ETA
## Installation
### Prerequisites
- Python 3.8+
- OpenCV
- Google Cloud credentials (for Vector mode)
- OpenAI API key (for OpenAI/Hybrid modes)
- Google AI API key (for Gemini mode)
### Setup
```bash
# Clone the repository
git clone <repository-url>
cd master_adapt_detect
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=your_openai_key
# GOOGLE_API_KEY=your_google_ai_key
# GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```
## Usage
Activate the virtual environment first:
### Command Line Interface
The main entry point is `cli.py` which provides a comprehensive CLI for all detection modes.
```bash
source venv/bin/activate
# Basic usage - hybrid mode with test
python cli.py --test --hybrid
# Process first 10 layouts in hybrid mode
python cli.py --limit 10 --hybrid
# Process all layouts with parallel processing
python cli.py --all --hybrid --parallel-layouts
# OpenAI mode with one-at-a-time comparison
python cli.py --limit 10 --openai --one-at-a-time
# Vector mode with similarity search
python cli.py --all --vector
# Enable cost tracking
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
```
### Command Line Options
### Detection Modes
#### Hybrid Mode (Recommended)
Best balance of speed, cost, and accuracy.
```bash
# Test with 1 layout
python image_detector.py --test
# Simple layouts (≤2 panels) use local analysis
python cli.py --all --hybrid --panel-threshold 2
# Process first 10 layouts
python image_detector.py --limit 10
# With panel splitting for complex layouts
python cli.py --all --hybrid --split-simple
# Process all layouts
python image_detector.py --all
# Advanced edge detection splitting
python cli.py --all --hybrid --split-advanced
# Custom output filename
python image_detector.py --limit 50 --output my_batch_results
# Vector similarity instead of inlier analysis
python cli.py --all --hybrid --vector-mode
# Process all layouts (sequential but optimized)
python image_detector.py --all
# Custom paths
python image_detector.py --all --master-path /path/to/masters --layout-path /path/to/layouts
# With fallback to OpenAI if needed
python cli.py --all --hybrid --fallback-one-at-a-time
```
### Help
#### OpenAI Mode
Full AI-powered detection with optional refinement.
```bash
python image_detector.py --help
# Standard mode (all masters in one API call)
python cli.py --limit 10 --openai
# One-at-a-time mode (one API call per master)
python cli.py --limit 10 --openai --one-at-a-time
# With CEN refinement for censorship handling
python cli.py --limit 10 --openai --cen-refinement
```
### Common Commands
#### Vector Mode
Semantic similarity using embeddings.
```bash
# Quick test
python image_detector.py --test
# Process with vector embeddings
python cli.py --all --vector
# Small batch
python image_detector.py --limit 10
# Adjust similarity threshold
python cli.py --all --vector --similarity-threshold 0.8
```
# Full processing (all 306 layouts) - optimized sequential
python image_detector.py --all
#### Gemini Mode
Google Gemini 2.5 Pro detection.
```bash
# Standard Gemini detection
python cli.py --limit 10 --gemini
```
### Key Options
**Detection Mode:**
- `--hybrid` - Hybrid detection mode (default)
- `--openai` - OpenAI detection mode
- `--vector` - Vector similarity mode
- `--gemini` - Gemini detection mode
**Processing:**
- `--test` - Test with 1 layout
- `--limit N` - Process first N layouts
- `--all` - Process all layouts
- `--specific-file FILE` - Process specific file
**Hybrid Options:**
- `--panel-threshold N` - Panel threshold for routing (default: 2)
- `--split-simple` - Use simple even division splitting
- `--split-advanced` - Use advanced edge detection splitting
- `--vector-mode` - Use vector similarity instead of inlier analysis
- `--fallback-one-at-a-time` - Enable OpenAI fallback
- `--parallel-layouts` - Enable parallel layout processing
- `--no-truncation` - Disable match truncation to panel count
**Cost Tracking:**
- `--enable-cost-tracking` - Enable cost tracking (disabled by default)
- `--cost-report` - Generate detailed cost report
- `--cost-estimate N` - Estimate monthly cost for N layouts
**Worker Configuration:**
- `--openai-workers N` - OpenAI worker count (default: auto)
- `--local-workers N` - Local analysis workers (default: auto)
- `--layout-workers N` - Parallel layout workers (default: auto)
**Other:**
- `--output NAME` - Custom output filename
- `--help` - Show all options
## Architecture
### Core Components
#### Detection Engines
1. **HybridImageDetector** (`hybrid_detector.py`)
- Main hybrid detection implementation
- Routes layouts based on panel count
- Integrates OpenAI, local analysis, and splitting
- Handles parallel processing coordination
2. **OpenAIImageDetector** (`openai_detector.py`)
- OpenAI O3 mini integration
- Panel counting and censorship detection
- One-at-a-time and batch detection modes
- CEN refinement for censored content
3. **VectorDetector** (`vector_detector.py`)
- Google Vertex AI multimodal embeddings
- Cosine similarity matching
- Embedding caching for performance
4. **GeminiDetector** (`gemini_detector.py`)
- Google Gemini 2.5 Pro integration
- Visual reasoning and analysis
#### Panel Splitting
1. **PanelSplitter** (`panel_splitter.py`)
- Multi-method panel splitting
- Optimized Canny edge detection
- Hough line transform for separators
- Tuned for 14-panel detection
2. **AdvancedPanelSplitter** (`advanced_splitter.py`)
- Edge detection and gutter analysis
- Sobel gradient detection
- Configurable percentile thresholds
3. **SimplePanelSplitter** (`simple_splitter.py`)
- Simple even division
- Fast horizontal splitting
- Grid layout support
#### Supporting Systems
1. **Cost Calculator** (`cost_calculator.py`)
- Tracks OpenAI API usage
- Per-layout and session cost tracking
- Monthly cost estimation
- Detailed JSON reports
2. **Memory Manager** (`memory_manager.py`)
- Prevents memory exhaustion
- Dynamic worker adjustment
- System resource monitoring
3. **Logging Config** (`logging_config.py`)
- Dual output (terminal + file)
- Crash tracking
- System diagnostics
4. **InlierAnalysisCoordinator** (in `hybrid_detector.py`)
- Serial execution of inlier analysis
- Task queue management
- Prevents system overload
### Workflow
#### Hybrid Mode Workflow
1. **OpenAI Analysis** (1 API call)
- Count panels in layout
- Detect censorship status
- Consolidated analysis
2. **Detection Routing**
- Panel count ≤ `panel_threshold`: Direct local/vector analysis
- Panel count > `panel_threshold`: Split + local/vector analysis
3. **Local Analysis** (no API calls)
- OpenCV AKAZE feature detection
- Multiprocessing for speed
- RANSAC homography estimation
- Inlier-based confidence scoring
4. **Post-Processing**
- CEN refinement (if enabled)
- Deduplication
- Truncation to panel count
- Confidence scoring
5. **Optional Fallback** (if enabled)
- Triggers when matches < panels
- OpenAI one-at-a-time detection
- Additional API calls only when needed
## Directory Structure
```
master_adapt_detect/
├── cli.py # Main command-line interface
├── hybrid_detector.py # Hybrid detection engine
├── openai_detector.py # OpenAI detection engine
├── vector_detector.py # Vector similarity engine
├── gemini_detector.py # Gemini detection engine
├── panel_splitter.py # Traditional panel splitter
├── advanced_splitter.py # Advanced edge detection splitter
├── simple_splitter.py # Simple even division splitter
├── cost_calculator.py # Cost tracking system
├── memory_manager.py # Memory management
├── logging_config.py # Logging configuration
├── requirements.txt # Python dependencies
├── .env # API keys (not in git)
├── master_images/ # Master images to detect (41 images)
├── layouts/ # Layout images to process (299+ images)
├── results/ # JSON output files
└── embeddings_cache/ # Cached vector embeddings
```
## Output Format
Results are saved as JSON with this structure:
Results are saved as JSON files with detailed metadata.
### Example Output
```json
{
"metadata": {
"total_layouts_processed": 1,
"total_layouts_processed": 10,
"total_master_images": 41,
"master_images_available": ["1011A_1011_05", "1011A_1011_06", ...]
"provider": "hybrid",
"model": "openai_o3_plus_local_analysis",
"panel_threshold": 2,
"processing_mode": "hybrid"
},
"results": {
"6814786": {
"layout_filename": "6814786.jpg",
"detected_master_ids": ["1011A_1011_05"],
"detected_master_filenames": ["1011A_1011_05.jpg"],
"analysis": "Detailed analysis of what was found..."
"detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
"detected_master_filenames": ["1011A_1011_05.jpg", "1011A_1011_06.jpg"],
"detection_method": "local_inlier_analysis",
"panel_count": 2,
"confidence_score": 100.0,
"panel_analysis": {
"panel_count": 2,
"confidence": "high"
},
"censorship_analysis": {
"is_censored": false,
"confidence": "high"
}
}
}
}
```
## Key Output Fields
## Cost Tracking
- **layout_filename**: The layout image filename
- **detected_master_ids**: Master image IDs (filenames without .jpg)
- **detected_master_filenames**: Full master image filenames with .jpg extension
- **analysis**: Gemini's detailed explanation of the detection
Cost tracking monitors OpenAI API usage and provides detailed reports.
## Directory Structure
### Enable Cost Tracking
```
├── master_images/ # 41 master images to detect
├── layouts/ # 299+ layout images to analyze
├── results/ # JSON output files
├── venv/ # Python virtual environment
├── image_detector.py # Main application
├── test_simple.py # API connection tester
├── requirements.txt # Dependencies
└── .env # API configuration
```bash
# Enable tracking
python cli.py --test --hybrid --enable-cost-tracking
# With detailed report
python cli.py --limit 10 --hybrid --enable-cost-tracking --cost-report
# With monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```
## Example Results
### Cost Report Output
Layout `6814786.jpg` contains master image `1011A_1011_05.jpg` (cropped version).
- **Session summary** - Total cost, tokens, API calls
- **Per-layout breakdown** - Cost for each layout
- **Operation analysis** - Cost by operation type
- **Monthly estimates** - Projected monthly/annual costs
- **JSON reports** - Detailed cost data in `results/`
See `COST_TRACKING_README.md` for complete documentation.
## Performance
### Hybrid Mode Benefits
- **97.6% cost reduction** vs OpenAI one-at-a-time mode
- **1 API call per layout** for panel analysis
- **Zero API calls** for matching (local analysis)
- **Parallel processing** for throughput
- **Memory-safe** with dynamic adjustment
### Benchmarks
- **Simple layouts (≤2 panels)**: ~2-3 seconds per layout
- **Complex layouts (>2 panels)**: ~5-7 seconds per layout
- **Parallel mode**: ~50-100 layouts per minute (system dependent)
- **Memory usage**: Dynamic adjustment prevents exhaustion
## Advanced Features
### Parallel Layout Processing
Process multiple layouts concurrently with coordinated inlier analysis.
```bash
python cli.py --all --hybrid --parallel-layouts --layout-workers 4
```
### CEN Refinement
Automatically switch between censored (CEN) and uncensored versions.
```bash
python cli.py --all --hybrid --cen-refinement
```
### Custom Splitting Parameters
Fine-tune panel splitting behavior.
```bash
# Advanced splitter with custom thresholds
python cli.py --all --hybrid --split-advanced --percentile 15 --min-gap 10
# Adjust inlier thresholds
python cli.py --all --hybrid --inlier-threshold 0.7 --inlier-ratio-threshold 0.5
```
### Image Preprocessing
Enhance detection accuracy with preprocessing.
```bash
# Greyscale conversion
python cli.py --all --hybrid --enable-greyscale
# Contrast enhancement
python cli.py --all --hybrid --enable-contrast --contrast-factor 1.5
```
## Troubleshooting
### Common Issues
**"Cost tracking is disabled"**
- Add `--enable-cost-tracking` flag to enable cost monitoring
**"Memory usage too high"**
- System will auto-adjust workers
- Reduce `--local-workers` or `--layout-workers` manually
**"Too many open files"**
- Reduce concurrent workers
- System will auto-recover and limit workers
**"No matches found"**
- Try different detection modes
- Adjust inlier thresholds
- Enable fallback mode
### Memory Management
The system includes automatic memory management:
- Monitors RAM and swap usage
- Dynamically adjusts worker counts
- Prevents system crashes
- Logs resource usage
### Logging
All processing is logged to both terminal and file:
- Log files: `master_adapt_detect_TIMESTAMP.log`
- Includes system diagnostics
- Crash tracking with full traceback
- Resource usage at crash time
## Development
### Running Tests
```bash
# Test hybrid mode
python test_hybrid.py
# Test cost tracking
python test_cost_calculator.py
# Test panel splitting
python test_split_mode.py
```
### Adding New Detection Modes
1. Create new detector class inheriting from base
2. Implement required methods:
- `detect_images_in_layout()`
- `process_all_layouts()`
3. Add CLI integration in `cli.py`
4. Update documentation
## OpenAI Pricing (2025)
- **Input tokens**: $2.00 per million
- **Cached input**: $0.50 per million
- **Output tokens**: $8.00 per million
Hybrid mode achieves significant cost savings by minimizing API calls.
## License
[License information]
## Credits
Developed for master image detection in marketing materials, comics, manga, and multi-panel layouts.

594
claude.md Normal file

@@ -0,0 +1,594 @@
# Master Adapt Detect - Developer Documentation
## For AI Assistants and Developers
This document provides a comprehensive technical overview of the Master Adapt Detect codebase for AI assistants (like Claude) and developers working on the project.
## Project Purpose
Master Adapt Detect is an image detection system that identifies which master images appear in multi-panel layout images. It was originally developed for detecting comic/manga page layouts in marketing materials, but it generalizes to any multi-panel image detection task.
## Core Architecture
### System Design Philosophy
The system follows a **multi-strategy detection** approach with these design principles:
1. **Cost Optimization** - Minimize API costs while maintaining accuracy
2. **Flexibility** - Support multiple detection engines for different use cases
3. **Performance** - Parallel processing with memory management
4. **Robustness** - Automatic fallbacks and error recovery
### Detection Modes
The system provides 4 detection modes, each with specific use cases:
#### 1. Hybrid Mode (Primary/Recommended)
- **File**: `hybrid_detector.py` (2939 lines)
- **Purpose**: Balance speed, cost, and accuracy
- **Strategy**: OpenAI O3 for panel analysis + local CV for matching
- **Cost**: ~1 API call per layout (97.6% reduction vs one-at-a-time)
**How it works:**
1. Single OpenAI API call to count panels and detect censorship
2. Route based on panel count:
   - Panel count ≤ threshold: Direct local inlier analysis
   - Panel count > threshold: Split layout first, then inlier analysis on each panel
3. Post-process with deduplication, CEN refinement, truncation
4. Optional fallback to OpenAI one-at-a-time if insufficient matches
**Key classes:**
- `HybridImageDetector` - Main orchestrator
- `InlierAnalysisCoordinator` - Serial execution coordinator for parallel mode
- `ProgressTracker` - Thread-safe progress monitoring
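The routing and fallback decisions described above can be sketched in a few lines. This is a minimal illustration, not the code in `hybrid_detector.py`: the function names and the returned route labels are hypothetical, and the fallback rule (fewer matches than panels) is taken from the README's workflow description.

```python
def choose_route(panel_count: int, panel_threshold: int = 2) -> str:
    """Pick the detection path for a layout from its panel count.

    The route labels are hypothetical names used only in this sketch.
    """
    if panel_count <= panel_threshold:
        return "direct_local_analysis"
    return "split_then_local_analysis"


def needs_fallback(match_count: int, panel_count: int, fallback_enabled: bool) -> bool:
    """Fallback applies only when enabled and matches fall short of the panel count."""
    return fallback_enabled and match_count < panel_count


print(choose_route(2))            # simple layout, analysed directly
print(choose_route(5))            # complex layout, split first
print(needs_fallback(1, 3, True))
```

Keeping the routing rule in a pure function like this makes the threshold behaviour easy to unit test without any API calls.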
#### 2. OpenAI Mode
- **File**: `openai_detector.py`
- **Purpose**: Pure AI-powered detection
- **Strategy**: OpenAI O3 mini vision for direct image comparison
- **Cost**: 1-41 API calls per layout depending on mode
**Modes:**
- Standard: All masters in one API call
- One-at-a-time: Separate API call per master (expensive but thorough)
#### 3. Vector Mode
- **File**: `vector_detector.py`
- **Purpose**: Semantic similarity matching
- **Strategy**: Google Vertex AI multimodal embeddings (1408 dimensions)
- **Cost**: No OpenAI costs, uses Google Cloud
**Features:**
- Embedding caching for performance
- Cosine similarity matching
- Threshold-based filtering
#### 4. Gemini Mode
- **File**: `gemini_detector.py`
- **Purpose**: Alternative AI detection
- **Strategy**: Google Gemini 2.5 Pro visual reasoning
- **Cost**: Google AI API (not OpenAI)
### Panel Splitting Strategies
The system provides 3 panel splitting approaches for complex multi-panel layouts:
#### 1. Traditional Multi-Method Splitter
- **File**: `panel_splitter.py` (857 lines)
- **Strategy**: Optimized Canny edge detection + Hough transform
- **Tuning**: Specifically tuned for 14-panel detection
- **Parameters**: Thresholds, kernel sizes, line detection params
#### 2. Advanced Edge Detection Splitter
- **File**: `advanced_splitter.py` (200+ lines)
- **Strategy**: Sobel gradient analysis + gutter detection
- **Parameters**:
- `percentile`: Low-energy column threshold (default: 10)
- `min_gap`: Minimum gutter width (default: 5)
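To illustrate how `percentile` and `min_gap` might interact, here is a minimal sketch of gutter detection over a per-column energy profile. It assumes the energy values (for example, summed Sobel gradient magnitudes per column) are already computed and passed in as a plain list. The function name `find_gutters` and the nearest-rank percentile are assumptions of this sketch, not the project's implementation.

```python
def find_gutters(column_energy, percentile=10, min_gap=5):
    """Return (start, end) column ranges of low-energy runs at least min_gap wide.

    `end` is exclusive. A column counts as low-energy when its value is at or
    below the nearest-rank `percentile` cutoff of all columns.
    """
    if not column_energy:
        return []
    ranked = sorted(column_energy)
    idx = max(0, min(len(ranked) - 1, int(len(ranked) * percentile / 100)))
    cutoff = ranked[idx]

    gutters, run_start = [], None
    for x, energy in enumerate(column_energy):
        if energy <= cutoff:
            if run_start is None:
                run_start = x  # a low-energy run begins here
        elif run_start is not None:
            if x - run_start >= min_gap:
                gutters.append((run_start, x))
            run_start = None
    # Close a run that reaches the right edge of the image
    if run_start is not None and len(column_energy) - run_start >= min_gap:
        gutters.append((run_start, len(column_energy)))
    return gutters


# A 6-column gutter is kept, a 2-column dip is ignored.
profile = [9] * 20 + [0] * 6 + [9] * 20 + [0] * 2 + [9] * 20
print(find_gutters(profile))
```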
#### 3. Simple Even Division Splitter
- **File**: `simple_splitter.py` (132 lines)
- **Strategy**: Equal division based on panel count
- **Use case**: Fast processing when layout is regular grid
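Even division needs only the image size and the panel count. The sketch below assumes panels are stacked vertically and split into horizontal bands; the function name is hypothetical and the real `simple_splitter.py` may also handle grids.

```python
def even_panel_bounds(image_height: int, panel_count: int):
    """Split [0, image_height) into panel_count bands of near-equal height.

    Returns a list of (top, bottom) pixel rows, with bottom exclusive.
    """
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")
    edges = [round(i * image_height / panel_count) for i in range(panel_count + 1)]
    return list(zip(edges[:-1], edges[1:]))


print(even_panel_bounds(900, 3))
```

Computing all edges from the full height, rather than adding a fixed band height repeatedly, keeps rounding error from accumulating in the last panel.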
### Supporting Systems
#### Cost Calculator
- **File**: `cost_calculator.py` (440 lines)
- **Purpose**: Track OpenAI API usage and costs
- **Features**:
- Per-layout cost breakdown
- Session summaries
- Monthly estimation
- JSON report generation
- **Important**: Disabled by default, requires `--enable-cost-tracking` flag
**Data structures:**
- `TokenUsage` - Token counts for a single API call
- `ApiCallCost` - Cost information for a single API call
- `LayoutCostSummary` - Aggregated cost for one layout
- `CostCalculator` - Main tracking class
#### Memory Manager
- **File**: `memory_manager.py` (119 lines)
- **Purpose**: Prevent system crashes from memory exhaustion
- **Features**:
- RAM and swap monitoring
- Dynamic worker adjustment
- Safe execution decorators
- Feature count limiting
**Thresholds:**
- Max memory: 80% (configurable)
- Max swap: 80% (warning only, doesn't throttle)
#### Logging Configuration
- **File**: `logging_config.py` (128 lines)
- **Purpose**: Dual output (terminal + file) for debugging crashes
- **Features**:
- Timestamped log files
- Exception tracking with resource usage
- System diagnostics on startup
### Command-Line Interface
- **File**: `cli.py`
- **Purpose**: Unified interface for all detection modes
- **Features**:
- Argument parsing for all modes
- Mode-specific configuration
- Results aggregation
- Cost reporting
**Key command patterns:**
```bash
# Detection mode selection
--hybrid / --openai / --vector / --gemini
# Processing scope
--test / --limit N / --all / --specific-file FILE
# Hybrid-specific
--panel-threshold N
--split-simple / --split-advanced
--vector-mode
--fallback-one-at-a-time
--parallel-layouts
# Cost tracking
--enable-cost-tracking
--cost-report
--cost-estimate N
```
## Key Algorithms
### Local Inlier Analysis (Hybrid Mode)
**Algorithm**: OpenCV AKAZE features + RANSAC homography estimation
**Process**:
1. Detect AKAZE keypoints in layout and master images
2. Match descriptors using brute-force matcher with Hamming distance
3. Apply Lowe's ratio test (threshold: 0.80) to filter good matches
4. Estimate homography using RANSAC (threshold: 7.0)
5. Count inliers and calculate confidence
**Thresholds:**
- `min_good_matches`: 10 (minimum matches before RANSAC)
- `inlier_threshold`: 0.65 (relative to best match)
- `inlier_ratio_threshold`: 0.4 (minimum inlier ratio)
**Confidence levels:**
- High: ≥30 inliers, ≥50% ratio
- Medium: ≥15 inliers, ≥30% ratio
- Low: Below medium thresholds
**Implementation**: `process_single_master_inlier_analysis()` function (standalone for multiprocessing)
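The thresholds and confidence levels above can be expressed as two small pure functions. This is a sketch under stated assumptions: the inlier ratio is taken to be inliers divided by good matches, and `inlier_threshold` is read as a fraction of the best master's inlier count. Both function names are hypothetical, and the OpenCV feature matching itself is not shown.

```python
def classify_confidence(inliers: int, good_matches: int) -> str:
    """Map an inlier count and ratio to the documented confidence levels."""
    if good_matches <= 0:
        return "low"
    ratio = inliers / good_matches  # assumed definition of the inlier ratio
    if inliers >= 30 and ratio >= 0.50:
        return "high"
    if inliers >= 15 and ratio >= 0.30:
        return "medium"
    return "low"


def select_masters(inlier_counts: dict, inlier_threshold: float = 0.65) -> list:
    """Keep masters whose inlier count is at least inlier_threshold x the best count."""
    if not inlier_counts:
        return []
    best = max(inlier_counts.values())
    if best <= 0:
        return []
    return sorted(m for m, n in inlier_counts.items() if n >= inlier_threshold * best)


print(classify_confidence(40, 60))
print(select_masters({"a": 100, "b": 70, "c": 30}))
```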
### Vector Similarity Analysis
**Algorithm**: Cosine similarity on 1408-dimensional embeddings
**Process**:
1. Generate embedding for layout using Vertex AI
2. Compare against cached master embeddings
3. Calculate cosine similarity for each master
4. Filter by threshold (default: 0.75)
5. Sort by similarity descending
**Formula**:
```
similarity = dot(emb1, emb2) / (norm(emb1) * norm(emb2))
```
**Caching**: Embeddings stored in `embeddings_cache/master_embeddings.pkl`
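The filter-and-sort steps can be sketched without any cloud dependency. The example below uses plain Python lists in place of the 1408-dimensional Vertex AI embeddings; the function names and the tiny two-dimensional vectors are illustrative assumptions only.

```python
import math


def cosine_similarity(a, b):
    """dot(a, b) / (norm(a) * norm(b)), returning 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_masters(layout_emb, master_embs, threshold=0.75):
    """Return (master_id, similarity) pairs at or above threshold, best first."""
    scored = [(mid, cosine_similarity(layout_emb, emb)) for mid, emb in master_embs.items()]
    kept = [(mid, score) for mid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


masters = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [2.0, 1.0]}
for master_id, score in rank_masters([1.0, 0.0], masters):
    print(master_id, round(score, 3))
```

In this toy data, master `b` scores about 0.707 and is dropped by the default 0.75 threshold.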
### Panel Splitting (Canny Detection)
**Algorithm**: Multi-threshold Canny + Hough line transform
**Process**:
1. Apply Canny edge detection at multiple thresholds:
- (50, 150), (100, 200), (150, 250)
2. Morphological closing with (3, 1) kernel
3. Combine edge maps with maximum operation
4. Hough line transform for horizontal lines:
- Threshold: 1324
- Min length: 3530
- Max gap: 1059
5. Filter for nearly horizontal lines (< 5% slope)
6. Create panel bounds from separator positions
**Tuning**: Parameters specifically optimized for 14-panel detection accuracy
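Steps 5 and 6 (slope filtering and turning separators into panel bounds) can be illustrated independently of OpenCV. The sketch assumes line segments arrive as `(x1, y1, x2, y2)` tuples, as a Hough transform would produce them. The function names and the `merge_distance` parameter are hypothetical additions for this example.

```python
def is_nearly_horizontal(x1, y1, x2, y2, max_slope=0.05):
    """True when the segment's slope is below max_slope (5% by default)."""
    dx = abs(x2 - x1)
    return dx > 0 and abs(y2 - y1) / dx < max_slope


def panel_bounds(lines, image_height, merge_distance=10):
    """Convert detected separator lines into (top, bottom) panel row ranges."""
    ys = sorted(
        (y1 + y2) // 2
        for x1, y1, x2, y2 in lines
        if is_nearly_horizontal(x1, y1, x2, y2)
    )
    separators = []
    for y in ys:
        # Merge detections that belong to the same physical separator
        if not separators or y - separators[-1] > merge_distance:
            separators.append(y)
    edges = [0] + [y for y in separators if 0 < y < image_height] + [image_height]
    return list(zip(edges[:-1], edges[1:]))


lines = [(0, 100, 1000, 102), (0, 105, 1000, 105), (0, 300, 1000, 300), (500, 0, 510, 400)]
print(panel_bounds(lines, image_height=400))
```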
### CEN Refinement
**Algorithm**: Censorship-aware master image selection
**Process**:
1. Detect if layout is censored (OpenAI analysis)
2. For each detected CEN (censored) master:
- If layout is uncensored and non-CEN version exists: Switch to non-CEN
- If layout is censored or no alternative: Keep CEN version
3. Update results with refinement metadata
**Naming convention**: `*CEN*` in master ID indicates censored version
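The refinement rule can be sketched as a pure function over master IDs. The source only says that `*CEN*` marks a censored master; how the uncensored counterpart is named is not documented, so this example assumes it is the same ID with the `_CEN` segment removed. That mapping, the sample IDs, and the function name are all hypothetical.

```python
def refine_cen(detected_ids, available_ids, layout_is_censored):
    """Swap CEN masters for their uncensored counterparts when the layout is uncensored."""
    available = set(available_ids)
    refined = []
    for master_id in detected_ids:
        alternative = master_id.replace("_CEN", "")  # assumed naming scheme
        can_switch = (
            "CEN" in master_id
            and not layout_is_censored
            and alternative != master_id
            and alternative in available
        )
        refined.append(alternative if can_switch else master_id)
    return refined


masters = ["1011A_05", "1011A_CEN_05", "1011A_06"]
print(refine_cen(["1011A_CEN_05", "1011A_06"], masters, layout_is_censored=False))
```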
## Parallel Processing Architecture
### Serial Inlier Analysis Coordinator
**Problem**: Parallel inlier analysis causes memory exhaustion and crashes
**Solution**: InlierAnalysisCoordinator provides serial execution while allowing parallel layout processing
**Architecture:**
```
Multiple Layout Workers (parallel)
        ↓
Task Queue
        ↓
Single Inlier Worker (serial)
        ↓
Results back to layout workers
```
**Components:**
- `InlierAnalysisCoordinator` - Manages serial execution
- Task queue - Queues inlier analysis requests
- Worker thread - Processes tasks one at a time
- Futures - Async communication between layout and inlier workers
**Benefits:**
- Prevents memory explosion from too many concurrent inlier analyses
- Allows multiple layouts to be processed in parallel
- Coordinates resource usage across system
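The queue, worker thread, and futures pattern described above can be sketched with the standard library alone. `SerialCoordinator` is a hypothetical stand-in written for this example. It is not the project's `InlierAnalysisCoordinator`, and it omits the memory checks and task metadata a real coordinator would need.

```python
import queue
import threading
from concurrent.futures import Future


class SerialCoordinator:
    """Runs submitted tasks one at a time on a single worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn, *args):
        """Queue fn(*args) and return a Future the calling layout worker can wait on."""
        future = Future()
        self._tasks.put((future, fn, args))
        return future

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            future, fn, args = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:  # deliver the failure to the waiting caller
                future.set_exception(exc)

    def shutdown(self):
        """Stop the worker after all previously queued tasks have run."""
        self._tasks.put(None)
        self._worker.join()


coordinator = SerialCoordinator()
futures = [coordinator.submit(pow, 2, n) for n in range(4)]
print([f.result() for f in futures])
coordinator.shutdown()
```

Any number of layout threads can call `submit()` concurrently, but only one task runs at a time, which is the property that bounds peak memory.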
### Dynamic Worker Adjustment
**Monitoring:**
- Memory usage (RAM percentage)
- Swap usage (swap percentage)
- Queue size (backlog of inlier tasks)
- Open file descriptors
**Adjustment triggers:**
- Memory > 85%: Reduce workers
- Swap > 95% AND Memory > 80%: Reduce workers
- Queue size ≥ 3: Reduce layout workers (producers)
- Memory < 75% AND Swap < 80%: Increase workers
**Auto-scaling:**
- Layout workers: Start at min(4, CPU/2), adjust dynamically
- Local workers: Start at CPU-2, adjust dynamically
- OpenAI workers: Set to number of master images
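The adjustment triggers translate directly into a decision function for layout workers. This is a sketch only: the step size of one worker per adjustment, the function names, and the `max(1, ...)` guard on the initial layout worker count are assumptions added for the example.

```python
def adjust_layout_workers(workers, memory_pct, swap_pct, queue_size, max_workers):
    """Return the next layout-worker count for the given resource readings."""
    under_pressure = (
        memory_pct > 85
        or (swap_pct > 95 and memory_pct > 80)
        or queue_size >= 3  # inlier backlog: slow down the producers
    )
    if under_pressure:
        return max(1, workers - 1)
    if memory_pct < 75 and swap_pct < 80:
        return min(max_workers, workers + 1)
    return workers  # between thresholds: hold steady


def initial_workers(cpu_count):
    """Starting worker counts from the documented auto-scaling rules."""
    return {
        "layout": max(1, min(4, cpu_count // 2)),
        "local": max(1, cpu_count - 2),
    }


print(initial_workers(8))
print(adjust_layout_workers(4, memory_pct=90, swap_pct=10, queue_size=0, max_workers=8))
```

The band between 75% and 85% memory, where the count is left unchanged, prevents the worker count from oscillating on every reading.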
## Important Implementation Details
### Multiprocessing Considerations
**Challenge**: Python multiprocessing requires pickleable functions
**Solutions:**
1. `process_single_master_inlier_analysis()` - Standalone function (not class method)
2. All imports inside function to ensure worker processes have dependencies
3. Cost calculator NOT imported in multiprocessing functions (causes pickle errors)
**Memory safety:**
- Feature limiting: Max 10,000-15,000 features per image
- Dynamic worker reduction based on feature count
- Forced garbage collection after processing
### Cost Tracking Integration
**Important**: Cost tracking is DISABLED by default
**Reason**: Avoid repetitive initialization messages from multiprocessing workers
**Integration points:**
- `openai_detector.py`: After every OpenAI API call
- `hybrid_detector.py`: Track all OpenAI operations
- Results JSON: Cost breakdown per layout
**Data flow:**
1. API call made → Extract token usage from response
2. Call `cost_calculator.track_api_call()`
3. Update session totals
4. Generate reports on demand
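The data flow can be illustrated with a small cost model using the per-million-token rates listed in this document's pricing section. The `Usage` dataclass, its field names, and the assumption that cached tokens are counted inside `input_tokens` are hypothetical. The project's `TokenUsage` and `track_api_call()` may differ.

```python
from dataclasses import dataclass

# USD per million tokens, as listed in this document's pricing section
INPUT_RATE, CACHED_RATE, OUTPUT_RATE = 2.00, 0.50, 8.00


@dataclass
class Usage:
    input_tokens: int   # total input tokens, cached ones included (assumption)
    cached_tokens: int
    output_tokens: int


def call_cost(usage: Usage) -> float:
    """Cost in USD of one API call, billing cached input at the cached rate."""
    uncached = usage.input_tokens - usage.cached_tokens
    total = (
        uncached * INPUT_RATE
        + usage.cached_tokens * CACHED_RATE
        + usage.output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


def session_cost(calls) -> float:
    """Sum the cost of every tracked call in a session."""
    return sum(call_cost(u) for u in calls)


print(f"${call_cost(Usage(3000, 1000, 500)):.4f}")
```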
### Error Handling Patterns
**OpenAI API errors:**
```python
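import time

def call_with_retry(openai_call, max_retries=3):
    """Illustrative helper: retry a failing API call, then return an error dict."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return openai_call()
        except Exception as e:  # narrow to the SDK's error types in real code
            last_error = e
            time.sleep(2 ** attempt)  # automatic retry with backoff
    return {"error": str(last_error)}  # caller may fall back to another method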
```
**Memory errors:**
```python
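import gc

def run_with_memory_guard(operation, workers):
    """Illustrative helper: retry operation(workers) with fewer workers."""
    while True:
        try:
            return operation(workers)
        except MemoryError:
            if workers <= 1:
                raise  # no lower concurrency left to try
            workers = max(1, workers // 2)  # reduce worker count
            gc.collect()  # force garbage collection before retrying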
```
**File descriptor exhaustion:**
```python
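import errno

def run_with_fd_guard(operation, cleanup):
    """Illustrative helper: clean up on "Too many open files", then re-raise."""
    try:
        return operation()
    except OSError as e:
        if e.errno == errno.EMFILE:  # "Too many open files"
            cleanup()  # limit workers, remove temp files, release resources
        raise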
```
## File Organization
### Core Detection Files
- `hybrid_detector.py` - Hybrid detection (2939 lines)
- `openai_detector.py` - OpenAI detection
- `vector_detector.py` - Vector similarity
- `gemini_detector.py` - Gemini detection
### Panel Splitting Files
- `panel_splitter.py` - Traditional multi-method
- `advanced_splitter.py` - Edge detection
- `simple_splitter.py` - Even division
### Supporting Files
- `cost_calculator.py` - Cost tracking
- `memory_manager.py` - Memory management
- `logging_config.py` - Logging configuration
- `cli.py` - Command-line interface
### Test Files
- `test_hybrid.py` - Hybrid mode tests
- `test_cost_calculator.py` - Cost tracking tests
- `test_split_mode.py` - Panel splitting tests
- `test_panel_accuracy.py` - Panel detection accuracy
- Various tuning and debug scripts
### Data Directories
- `master_images/` - 41 master images to detect
- `layouts/` - 299+ layout images to process
- `results/` - JSON output files
- `embeddings_cache/` - Cached vector embeddings
## Development Guidelines
### Adding New Features
1. **New Detection Mode:**
- Create new detector class
- Inherit from base detector if applicable
- Implement `detect_images_in_layout()` method
- Add CLI integration in `cli.py`
- Update tests and documentation
2. **New Panel Splitting Method:**
- Create new splitter class
- Implement `split_panels(image_path, panel_count)` method
- Return list of dicts with keys: `image`, `bounds`, `confidence`, `method`
- Add CLI flag for selection
- Test with various panel counts
3. **Cost Tracking for New API:**
- Add extraction function for token usage
- Track calls with `cost_calculator.track_api_call()`
- Update operation types
- Add to cost reports
### Testing Strategy
1. **Unit tests** - Individual components
2. **Integration tests** - Full detection pipeline
3. **Performance tests** - Memory and speed benchmarks
4. **Accuracy tests** - Panel detection accuracy
5. **Cost tests** - Verify tracking accuracy
### Performance Optimization Tips
1. **Reduce API calls** - Primary cost driver
2. **Cache embeddings** - Avoid regenerating
3. **Limit features** - Prevent memory explosion
4. **Use multiprocessing** - Parallel CPU work
5. **Monitor memory** - Dynamic adjustment
6. **Profile bottlenecks** - Optimize hot paths
### Common Pitfalls
1. **Multiprocessing pickle errors** - Use standalone functions
2. **Memory exhaustion** - Limit concurrent workers
3. **File descriptor limits** - Close files properly
4. **Cost calculator in workers** - Keep in main process only
5. **Treating swap usage as an error** - Swap usage on its own is acceptable and should only produce a warning
## Configuration Reference
### Environment Variables
```
OPENAI_API_KEY - OpenAI API authentication
GOOGLE_API_KEY - Google AI API authentication
GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account JSON
```
### Default Values
```python
# Hybrid mode
panel_threshold = 2
inlier_threshold = 0.65
inlier_ratio_threshold = 0.4
min_good_matches = 10
similarity_threshold = 0.75 # vector mode
# Workers (auto-detected)
openai_workers = len(master_images)
local_workers = max(1, cpu_count - 2)
layout_workers = min(4, cpu_count // 2)
# Memory management
max_memory_percent = 75
max_swap_percent = 80
# Cost tracking
enable_tracking = False # Must explicitly enable
```
## Output Format Specification
### Results JSON Structure
```json
{
"metadata": {
"total_layouts_processed": int,
"total_master_images": int,
"master_images_available": [str],
"provider": str,
"model": str,
"panel_threshold": int,
"inlier_threshold": float,
"processing_mode": str,
"cost_tracking": {dict} | null
},
"results": {
"layout_id": {
"layout_filename": str,
"detected_master_ids": [str],
"detected_master_filenames": [str],
"detection_method": str,
"panel_count": int,
"confidence_score": float,
"panel_analysis": {dict},
"censorship_analysis": {dict},
"truncation_applied": bool,
"deduplication_applied": bool,
"cost_breakdown": {dict} | null
}
}
}
```
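A results file with this structure can be consumed with the standard `json` module. The sketch below relies only on the `results`, `layout_filename`, and `detected_master_ids` keys shown above. The function names and the example path are hypothetical.

```python
import json


def summarize_results(data: dict) -> dict:
    """Map each layout filename to its list of detected master IDs."""
    return {
        entry["layout_filename"]: entry.get("detected_master_ids", [])
        for entry in data.get("results", {}).values()
    }


def load_results(path: str) -> dict:
    """Read a results JSON file and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_results(json.load(fh))


example = {
    "metadata": {"total_layouts_processed": 1},
    "results": {
        "6814786": {
            "layout_filename": "6814786.jpg",
            "detected_master_ids": ["1011A_1011_05", "1011A_1011_06"],
        }
    },
}
print(summarize_results(example))
```

For a file on disk, `load_results("results/my_batch_results.json")` would return the same mapping; that path is only an example.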
## Debugging Tips
### Enable Debug Logging
```python
# In code
import logging
logging.basicConfig(level=logging.DEBUG)
# Via environment (run in the shell, not in Python):
#   export LOG_LEVEL=DEBUG
```
### Memory Issues
```bash
# Check current memory
python check_system_resources.py
# Test with memory fix
python test_memory_fix.py
# Run with reduced workers
python cli.py --all --hybrid --local-workers 1 --layout-workers 1
```
### Cost Tracking Issues
```bash
# Verify cost tracking works
python test_cost_calculator.py
# Test integration
python test_cost_tracking_integration.py
# Run with tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
```
### Panel Splitting Issues
```bash
# Test splitting accuracy
python test_panel_accuracy.py
# Tune parameters
python tune_14_panel_split.py
# Debug specific layout
python test_6786505_cli.py
```
## API Costs (Current Pricing)
### OpenAI O3 (2025)
- Input tokens: $2.00 / million
- Cached input: $0.50 / million
- Output tokens: $8.00 / million
### Typical Usage
- Hybrid mode: ~$0.01-0.02 per layout
- OpenAI mode: ~$0.02-0.05 per layout
- One-at-a-time: ~$0.50-1.00 per layout
### Cost Optimization
- Hybrid mode: 97.6% reduction vs one-at-a-time
- Caching: Reduces input token costs
- Batch processing: Amortizes overhead
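The source does not say how the 97.6% figure was derived. One reading that reproduces it is a comparison of API call counts: with 41 master images, one-at-a-time mode makes 41 calls per layout, while hybrid mode makes 1. The short calculation below checks that arithmetic. It measures calls, not dollars, and the function name is hypothetical.

```python
def call_reduction(calls_before: int, calls_after: int) -> float:
    """Percentage reduction in API calls per layout."""
    return 100 * (1 - calls_after / calls_before)


# 41 one-at-a-time calls versus 1 hybrid call per layout
print(f"{call_reduction(41, 1):.1f}%")  # prints "97.6%"
```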
## Future Enhancement Ideas
1. **Multi-GPU support** - Parallel inlier analysis with GPU acceleration
2. **Incremental processing** - Resume from saved progress
3. **Web interface** - Browser-based detection and visualization
4. **Active learning** - Use detection results to improve models
5. **Custom training** - Fine-tune models on domain-specific data
6. **Real-time processing** - Stream processing for live detection
7. **Distributed processing** - Multi-machine coordination
8. **Advanced caching** - Persistent result caching across runs
## Contact and Support
For questions or issues:
1. Check logs in `master_adapt_detect_*.log`
2. Review cost reports in `results/`
3. Run diagnostic scripts
4. Check system resources
5. Review error messages carefully
## Version History
Current implementation includes:
- Multiple detection modes (Hybrid, OpenAI, Vector, Gemini)
- Three panel splitting strategies
- Cost tracking and reporting
- Memory management and safety
- Parallel processing with coordination
- Dynamic worker adjustment
- Comprehensive logging and debugging
- Extensive configuration options
Last major update: January 2025

Binary file not shown.

File diff suppressed because it is too large