initial commit

michael 2025-10-01 14:32:55 -05:00
commit 69f2f4cbe9
41 changed files with 14950 additions and 0 deletions

139
.gitignore vendored Normal file

@ -0,0 +1,139 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Virtual Environment
venv/
env/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# Environment variables and secrets
.env
.env.local
.env.*.local
service-account.json
# Project specific - Debug outputs
debug_*/
crops_debug/
debug_improved/
debug_splitting/
debug_canny_params.py
test_crops/
panel_test_crops/
improved_crops/
# Project specific - Results and reports
results/
reports/
tuning_results_*.json
# Project specific - Test outputs
test_splits/
test_layouts/
# Project specific - Temporary files
temp_*/
unused-layouts/
# Project specific - Data files
layouts/
layouts.zip
layouts_and_masters.zip
master_images/
# Project specific - Cache
embeddings_cache/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# pipenv
Pipfile.lock
# PEP 582
__pypackages__/
# Celery
celerybeat-schedule
celerybeat.pid
# SageMath
*.sage.py
# Environments
.spyderproject
.spyproject
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre
.pyre/
# pytest
.pytest_cache/
.tox/
# Coverage
htmlcov/
.coverage
.coverage.*
coverage.xml
*.cover
.hypothesis/
# macOS
.DS_Store
.AppleDouble
.LSOverride
._*
# Windows
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db
Desktop.ini
$RECYCLE.BIN/
# Linux
*~
.directory
.Trash-*

310
COST_TRACKING_README.md Normal file

@ -0,0 +1,310 @@
# Cost Tracking for Master Image Detection
This document describes the cost tracking features added to the master image detection application to monitor and report OpenAI API usage costs.
## Overview
The cost tracking system provides comprehensive monitoring of OpenAI o3 API usage, including:
- **Real-time cost calculation** for all API calls
- **Per-layout cost breakdown** with detailed token usage
- **Session summaries** with totals and averages
- **Monthly cost estimation** based on usage patterns
- **Detailed cost reports** in JSON format
- **Integration with all detection modes** (Gemini, OpenAI, Vector, Hybrid)
## Current OpenAI o3 Pricing
- **Input tokens**: $2.00 per million tokens
- **Cached input tokens**: $0.50 per million tokens
- **Output tokens**: $8.00 per million tokens
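As a rough sketch of how these rates turn into a per-call cost (this is not the code in `cost_calculator.py`; the function name is hypothetical, and it assumes cached tokens are counted within the prompt tokens and billed at the cached rate instead of the full input rate):

```python
# Illustrative only: the o3 rates listed above, in USD per million tokens.
INPUT_RATE = 2.00
CACHED_INPUT_RATE = 0.50
OUTPUT_RATE = 8.00


def estimate_call_cost(prompt_tokens: int, completion_tokens: int,
                       cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a single API call.

    Assumption: cached_tokens is a subset of prompt_tokens, so only the
    uncached remainder is billed at the full input rate.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    total = (
        uncached * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + completion_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# 1,200 input tokens and 150 output tokens, nothing cached
print(f"${estimate_call_cost(1200, 150):.4f}")  # prints $0.0036
```

If the provider bills cached tokens differently, only the `uncached` line needs to change.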
## CLI Usage
### Enable Cost Tracking
**Important**: Cost tracking is **disabled by default**. You must use the `--enable-cost-tracking` flag to enable it.
```bash
# Enable cost tracking for any detection mode
python cli.py --test --hybrid --enable-cost-tracking
# Enable cost tracking with detailed report generation
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report
# Enable cost tracking with monthly cost estimation
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```
### What "tracking: disabled" means
If you see "Cost Calculator initialized (tracking: disabled)" in the logs, it means:
1. **Cost tracking is turned off** - no costs are being calculated or stored
2. **You need to add the `--enable-cost-tracking` flag** to enable cost monitoring
3. **API calls are still being made** but their costs aren't being tracked
### Why the repetitive initialization messages?
The cost calculator may be initialized more than once when multiprocessing is used:
1. **Multiprocessing workers** - Each worker process loads the module, so each one prints its own initialization message
2. **Normal behavior** - The repeated messages do not affect functionality
3. **Main process vs. workers** - Only the main process shows full details; worker processes show minimal output
### Cost Tracking Options
- `--enable-cost-tracking`: Enable detailed cost tracking and real-time reporting
- `--cost-report`: Generate detailed JSON cost report after processing
- `--cost-estimate N`: Show monthly cost estimate based on N layouts per month
### Example Usage
```bash
# Test hybrid mode with cost tracking enabled
python cli.py --test --hybrid --enable-cost-tracking
# Process 10 layouts with OpenAI and generate cost report
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report
# Full hybrid run with cost tracking and monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300
```
## Cost Tracking Features
### 1. Real-time Cost Monitoring
- Tracks every OpenAI API call with token usage and cost
- Displays running totals during processing
- Shows cost per layout in progress updates
### 2. Detailed Cost Breakdown
Each processed layout includes cost information:
```json
{
  "layout_filename": "example.jpg",
  "detected_master_ids": ["1011A_1011_05"],
  "cost_breakdown": {
    "total_cost": 0.0234,
    "cost_breakdown": {
      "input_tokens": 1500,
      "output_tokens": 800,
      "cached_tokens": 200,
      "api_calls_made": 1,
      "operation_types": ["panel_counting_censorship"]
    }
  }
}
```
### 3. Session Summary
Displays comprehensive cost statistics:
```
COST TRACKING SUMMARY
============================================================
Total cost: $2.4567
Total tokens: 145,678
- Input tokens: 98,456
- Output tokens: 47,222
- Cached tokens: 12,345
Total API calls: 156
Layouts processed: 150
Averages:
- Cost per layout: $0.0164
- Tokens per layout: 971.2
- API calls per layout: 1.0
- Cost per 1K tokens: $0.0169
Operation breakdown:
- panel_counting_censorship: 150 calls
- detection: 0 calls
- one_at_a_time_detection: 0 calls
============================================================
```
### 4. Monthly Cost Estimation
Estimates monthly costs based on current usage patterns:
```
MONTHLY COST ESTIMATE
Based on 150 processed layouts:
Average cost per layout: $0.0164
Estimated monthly cost (300 layouts): $4.92
Estimated annual cost: $59.04
```
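The arithmetic behind this estimate is a straight multiplication. A minimal sketch, assuming the average cost per layout has already been computed (the function name and return keys are hypothetical, not taken from the application):

```python
def estimate_monthly_cost(avg_cost_per_layout: float, layouts_per_month: int) -> dict:
    """Project monthly and annual spend from an average per-layout cost."""
    monthly = avg_cost_per_layout * layouts_per_month
    return {
        "monthly_cost": monthly,
        "annual_cost": monthly * 12,
    }


estimate = estimate_monthly_cost(0.0164, 300)
print(f"Estimated monthly cost: ${estimate['monthly_cost']:.2f}")  # $4.92
print(f"Estimated annual cost: ${estimate['annual_cost']:.2f}")    # $59.04
```

The projection is only as good as the sample: an average taken over a handful of layouts may not reflect a full month of varied inputs.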
### 5. Cost Reports
Generates detailed JSON reports saved to `results/cost_report_[timestamp].json`:
```json
{
  "session_summary": {
    "total_cost": 2.4567,
    "total_input_tokens": 98456,
    "total_output_tokens": 47222,
    "layouts_processed": 150
  },
  "layout_costs": {
    "example.jpg": {
      "total_cost": 0.0234,
      "total_input_tokens": 1500,
      "total_output_tokens": 800,
      "api_calls_made": 1
    }
  },
  "detailed_api_calls": [
    {
      "operation_type": "panel_counting_censorship",
      "timestamp": "2025-01-15T10:30:45.123456",
      "token_usage": {
        "prompt_tokens": 1500,
        "completion_tokens": 800,
        "total_tokens": 2300,
        "cached_tokens": 200
      },
      "total_cost": 0.0234,
      "layout_name": "example.jpg"
    }
  ]
}
```
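A report with this shape can be post-processed with a few lines of Python. The sketch below groups the `detailed_api_calls` entries by `operation_type`; it assumes the field names shown in the example above, and the helper names are hypothetical:

```python
import json
from collections import defaultdict
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a cost report JSON file, e.g. one saved under results/."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_by_operation(report: dict) -> dict:
    """Total calls, cost and tokens per operation type."""
    totals = defaultdict(lambda: {"calls": 0, "cost": 0.0, "tokens": 0})
    for call in report.get("detailed_api_calls", []):
        entry = totals[call.get("operation_type", "unknown")]
        entry["calls"] += 1
        entry["cost"] += call.get("total_cost", 0.0)
        entry["tokens"] += call.get("token_usage", {}).get("total_tokens", 0)
    return dict(totals)


# Inline sample mirroring the structure above; in practice use load_report().
sample_report = {
    "detailed_api_calls": [
        {
            "operation_type": "panel_counting_censorship",
            "token_usage": {"total_tokens": 2300},
            "total_cost": 0.0234,
        }
    ]
}
print(summarize_by_operation(sample_report))
```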
## Integration with Detection Modes
### Hybrid Mode (Primary Focus)
Cost tracking is fully integrated with hybrid mode:
- **Panel counting + censorship detection**: 1 API call per layout
- **Local inlier analysis**: No API calls (zero cost)
- **Vector similarity**: No API calls (zero cost)
- **Fallback to OpenAI**: Additional API calls when needed
### OpenAI Mode
Tracks all OpenAI API usage patterns:
- **Regular detection**: 1 API call per layout (all masters compared)
- **One-at-a-time mode**: 41 API calls per layout (one per master)
- **Censorship detection**: Additional API calls for CEN refinement
### Vector Mode
No OpenAI API costs to track (uses Google Vertex AI, not OpenAI)
### Gemini Mode
No OpenAI API costs to track (uses Google Gemini, not OpenAI)
## Operation Types Tracked
The system tracks different types of API operations:
1. **`panel_counting_censorship`**: Combined panel counting and censorship detection
2. **`detection`**: Main master image detection
3. **`censorship_detection`**: Standalone censorship analysis
4. **`one_at_a_time_detection`**: Individual master comparisons
## Cost Optimization Benefits
The cost tracking system helps identify optimization opportunities:
### Hybrid Mode Savings
Hybrid mode significantly reduces costs compared to one-at-a-time processing:
- **One-at-a-time mode**: 41 API calls per layout
- **Hybrid mode**: 1 API call per layout (97.6% reduction)
- **Estimated savings**: Shows percentage savings in session summary
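The 97.6% figure is the reduction in the number of API calls, which can be checked directly. Note that this is a call-count ratio, not a dollar ratio: actual savings also depend on how many tokens each call uses. The helper name below is hypothetical.

```python
def call_reduction_percent(baseline_calls: int, new_calls: int) -> float:
    """Percentage reduction in API calls relative to a baseline."""
    return (1 - new_calls / baseline_calls) * 100


# 41 calls per layout (one-at-a-time) vs. 1 call per layout (hybrid)
print(f"{call_reduction_percent(41, 1):.1f}%")  # prints 97.6%
```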
### Usage Pattern Analysis
- Identify expensive operations
- Track token efficiency by operation type
- Monitor cost per detected master
- Analyze cost trends over time
## Testing
Run the cost calculator tests to verify functionality:
```bash
python test_cost_calculator.py
```
This will test all cost tracking features including:
- Basic cost calculation
- API call tracking
- Layout cost breakdown
- Session summaries
- Monthly cost estimation
- Cost report generation
## Technical Implementation
### Core Components
1. **`cost_calculator.py`**: Main cost tracking module
2. **Token extraction**: Automatic token usage extraction from API responses
3. **Integration points**: All OpenAI API calls instrumented
4. **Data structures**: Efficient tracking of costs and token usage
### Integration Points
Cost tracking is integrated at these key locations:
- `openai_detector.py`: All OpenAI API calls
- `hybrid_detector.py`: Hybrid mode processing
- `cli.py`: Command-line interface and reporting
- Results JSON: Cost breakdowns included in output
### Error Handling
- Graceful degradation when API responses lack usage data
- Optional feature (disabled by default)
- No impact on existing functionality when disabled
## Future Enhancements
Potential future improvements include:
- **Cost budgeting**: Set spending limits and alerts
- **Historical tracking**: Long-term cost trend analysis
- **Token optimization**: Automatic prompt and image optimization
- **Multi-provider support**: Track costs across different AI providers
- **Real-time alerts**: Notifications when costs exceed thresholds
## Troubleshooting
### Common Issues
1. **"No cost data available"**: Cost tracking is disabled (use `--enable-cost-tracking`)
2. **"API usage data missing"**: OpenAI response lacks usage information
3. **"Cost report empty"**: No API calls were made during processing
### Debugging
Enable cost tracking and run a test:
```bash
python cli.py --test --hybrid --enable-cost-tracking
```
This will show real-time cost information and help identify any issues.
## Support
For questions or issues with cost tracking:
1. Check the session summary output for diagnostic information
2. Review the cost report JSON for detailed API call information
3. Run the test suite to verify functionality
4. Ensure OpenAI API responses include usage data

103
MEMORY_FIX_SUMMARY.md Normal file

@ -0,0 +1,103 @@
# Memory Management Fix Summary
## Problem Analysis
The application was crashing due to memory exhaustion when processing images with high feature counts (64,509 features detected). The issue occurred in the hybrid detector's local inlier analysis when 14 concurrent processes were trying to process 41 masters simultaneously, causing massive memory usage and swap thrashing.
## Root Cause
- **High feature count**: 64,509 features in layout image
- **Concurrent processing**: 14 processes × 41 masters = 574 concurrent operations
- **Memory multiplication**: Each process holding large feature sets in memory
- **No memory limits**: No safeguards against memory exhaustion
## Solutions Implemented
### 1. Memory Manager (`memory_manager.py`)
- **Real-time monitoring**: Tracks memory and swap usage percentages
- **Safety checks**: Prevents execution when memory > 80% (swap usage only warns, does not block)
- **Dynamic process limiting**: Adjusts worker count based on available memory
- **Memory-safe execution decorator**: Ensures functions run only when memory is safe
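The decorator idea can be sketched as follows. This is not the `memory_safe_execution` implementation from `memory_manager.py`: the name `memory_safe`, its parameters, and the injectable `get_memory_percent` callable are assumptions made so the example runs without extra dependencies. In a real setup the callable could be something like `lambda: psutil.virtual_memory().percent`.

```python
import functools
import time
from typing import Callable


def memory_safe(get_memory_percent: Callable[[], float],
                max_percent: float = 80.0,
                poll_seconds: float = 1.0,
                max_wait_seconds: float = 30.0):
    """Decorator factory: delay a call until memory usage is at or below max_percent."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + max_wait_seconds
            while get_memory_percent() > max_percent:
                if time.monotonic() >= deadline:
                    raise MemoryError(f"memory usage stayed above {max_percent}%")
                time.sleep(poll_seconds)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Demo with fake readings: two unsafe samples, then a safe one.
readings = iter([91.0, 85.0, 60.0])


@memory_safe(lambda: next(readings), poll_seconds=0.0)
def analyse(master_id: str) -> str:
    return f"processed {master_id}"


print(analyse("1011A_1011_05"))
```

Passing the memory reader in as a callable keeps the guard easy to test; the threshold is a parameter because the appropriate limit depends on the machine.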
### 2. Feature Limiting
- **Maximum features per image**: Limited to 10,000 features max
- **Smart reduction**: Keeps best features based on response strength
- **Dynamic adjustment**: Reduces features based on total count (e.g., 64K → 32K → 10K)
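A minimal sketch of keeping only the strongest features, assuming each feature is represented as a `(response, descriptor)` pair. This is illustrative and not the `reduce_feature_count()` function itself; if the features are OpenCV keypoints, the `response` attribute would play the role of the strength value.

```python
import heapq
from typing import Any, List, Tuple

# Hypothetical representation: (response strength, descriptor payload)
Feature = Tuple[float, Any]


def keep_strongest(features: List[Feature], max_features: int = 10_000) -> List[Feature]:
    """Return at most max_features features, preferring the highest response."""
    if len(features) <= max_features:
        return list(features)
    # nlargest avoids sorting the full list when only a fraction is kept
    return heapq.nlargest(max_features, features, key=lambda f: f[0])


features = [(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.1, "d")]
print(keep_strongest(features, max_features=2))
```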
### 3. Dynamic Worker Adjustment
- **Feature-based scaling**:
  - >50,000 features: workers ÷ 2
  - >30,000 features: workers × 0.75
  - ≤30,000 features: normal workers
- **Memory-based limiting**: Further reduces based on available memory
- **Conservative defaults**: Assumes 2GB per process for safety
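The scaling rules above can be sketched as a single function. The name `adjust_workers`, its parameters, and the rounding choices (integer division, truncation) are assumptions for illustration; the actual logic in `hybrid_detector.py` may differ.

```python
def adjust_workers(base_workers: int, feature_count: int,
                   available_gb: float, gb_per_process: float = 2.0) -> int:
    """Pick a worker count from the feature count, capped by available memory."""
    if feature_count > 50_000:
        workers = base_workers // 2
    elif feature_count > 30_000:
        workers = int(base_workers * 0.75)
    else:
        workers = base_workers
    # Conservative cap: assume each process needs gb_per_process of memory
    memory_cap = int(available_gb // gb_per_process)
    return max(1, min(workers, memory_cap))


# 14 configured workers, 64,509 features, 8 GB free
print(adjust_workers(14, 64_509, available_gb=8.0))  # prints 4
```

In this example the feature-based rule alone would allow 7 workers, but the memory cap (8 GB at 2 GB per process) lowers it to 4.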
### 4. Enhanced Monitoring
- **Progress with memory**: Shows memory usage every 10 completed masters
- **Early warnings**: Alerts when memory > 80% or swap > 20%
- **Detailed crash logging**: Logs system and process memory at crash time
### 5. Memory Cleanup
- **Forced garbage collection**: Runs `gc.collect()` after processing
- **Process isolation**: Each master processed in separate process
- **Resource cleanup**: Proper cleanup of temporary files and objects
## Key Changes Made
### hybrid_detector.py
- Added memory manager initialization
- Modified `process_single_master_inlier_analysis()` to limit features
- Updated `detect_with_local_inlier_analysis()` for dynamic worker adjustment
- Added memory monitoring during processing
- Added memory cleanup after processing
### memory_manager.py (NEW)
- `MemoryManager` class for monitoring and control
- `memory_safe_execution` decorator
- `reduce_feature_count()` function for feature limiting
- Dynamic process count calculation
### logging_config.py
- Enhanced crash logging with system memory details
- Added memory warning logging function
- Improved resource usage reporting
## Memory Protection Features
### Before Processing
- Check if memory usage is safe (< 75%)
- Wait for memory to return to safe levels if needed
- Dynamically adjust worker count based on available memory
### During Processing
- Monitor memory usage every 10 completed masters
- Log warnings when memory > 80% or swap > 20%
- Limit features to prevent memory explosion
### After Processing
- Force garbage collection to free memory
- Clean up temporary files and objects
- Log final memory usage
## Expected Results
- **No more crashes**: Memory usage stays within safe limits
- **Better performance**: Reduced memory pressure = less swap usage
- **Graceful degradation**: Automatically reduces parallelism when needed
- **Better monitoring**: Real-time memory usage reporting
## Usage
The fixes are automatically applied when using the hybrid detector. No changes needed to command line usage:
```bash
python cli.py --all --hybrid # Will now use memory-safe processing
```
## Testing
Run the test suite to verify fixes:
```bash
python test_memory_fix.py
```
## Memory Thresholds
- **Maximum memory**: 75% (was unlimited)
- **Maximum swap**: 30% (was unlimited)
- **Feature limit**: 10,000 per image (was unlimited)
- **Dynamic workers**: Based on feature count and memory availability


@ -0,0 +1,253 @@
# One-at-a-Time Cost Tracking Implementation
This document describes the implementation of detailed cost tracking for the one-at-a-time detection mode, which makes individual API calls for each master image.
## Implementation Overview
The one-at-a-time mode now tracks the cost of each individual API call made to the OpenAI o3 model, providing detailed insights into the cost structure of this high-accuracy detection method.
## Key Features Implemented
### 1. **Token Usage Extraction in Multiprocessing**
Each worker process now extracts token usage data from the OpenAI API response:
```python
# In process_single_master_detection_openai()
token_usage_data = None
if hasattr(response, 'usage') and response.usage:
    token_usage_data = {
        'prompt_tokens': response.usage.prompt_tokens,
        'completion_tokens': response.usage.completion_tokens,
        'total_tokens': response.usage.total_tokens,
        'cached_tokens': getattr(response.usage, 'cached_tokens', 0)
    }
# Include in return value
result['token_usage'] = token_usage_data
```
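One detail worth checking against the installed SDK: in recent versions of the OpenAI Python library, Chat Completions responses appear to report cached tokens under `usage.prompt_tokens_details.cached_tokens`, not directly on `usage`. If so, a top-level `getattr(..., 'cached_tokens', 0)` would always yield 0. A defensive sketch that checks both locations (the helper name is hypothetical, and the `SimpleNamespace` object only stands in for a real response):

```python
from types import SimpleNamespace
from typing import Optional


def extract_usage_dict(response) -> Optional[dict]:
    """Build a picklable token-usage dict, or None if usage is missing."""
    usage = getattr(response, "usage", None)
    if not usage:
        return None
    # Prefer the nested details object; fall back to a top-level attribute.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None)
    if cached is None:
        cached = getattr(usage, "cached_tokens", 0)
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "cached_tokens": cached or 0,
    }


# Stand-in for an API response, for demonstration only
fake_response = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens=1200,
        completion_tokens=150,
        total_tokens=1350,
        prompt_tokens_details=SimpleNamespace(cached_tokens=200),
    )
)
print(extract_usage_dict(fake_response))
```

Returning a plain dict keeps the value easy to pass back from a worker process to the main process.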
### 2. **Cost Tracking in Main Process**
The main process collects token usage data from all worker processes and tracks costs:
```python
# Track cost for this API call if token usage data is available
if 'token_usage' in result and result['token_usage']:
    token_data = result['token_usage']
    api_call_cost = cost_calculator.track_api_call(
        operation_type="one_at_a_time_detection",
        prompt_tokens=token_data['prompt_tokens'],
        completion_tokens=token_data['completion_tokens'],
        cached_tokens=token_data['cached_tokens'],
        layout_name=layout_name,
        master_id=master_id
    )
```
### 3. **Real-time Cost Progress**
During processing, the system shows cost progress every 10 completed masters:
```
Processing 10/41 masters...
→ API call cost: $0.0234 (Running total: $0.2340)
Processing 20/41 masters...
→ API call cost: $0.0198 (Running total: $0.4538)
```
### 4. **Detailed Cost Analysis**
The final results include comprehensive cost information:
```python
'analysis': 'Process-based one-at-a-time analysis completed. Made 41 separate API calls (one per master). Found 2 exact matches out of 41 masters checked using 8 concurrent processes.',
'api_calls_made': 41, # One API call per master
```
## Cost Comparison: One-at-a-Time vs Hybrid Mode
### One-at-a-Time Mode
- **API Calls**: 41 calls (one per master image)
- **Typical Cost**: $0.50 - $2.00 per layout
- **Accuracy**: Highest (individual comparison)
- **Use Case**: When maximum accuracy is required
### Hybrid Mode
- **API Calls**: 1 call (panel counting + censorship)
- **Typical Cost**: $0.01 - $0.05 per layout
- **Accuracy**: Very good (local analysis for simple layouts)
- **Use Case**: Cost-efficient processing of large batches
### Cost Savings
Hybrid mode provides approximately **95-98% cost savings** compared to one-at-a-time mode while maintaining good accuracy for most layouts.
## Usage Examples
### Enable One-at-a-Time Cost Tracking
```bash
# Basic one-at-a-time with cost tracking
python cli.py --test --openai --one-at-a-time --enable-cost-tracking
# With detailed cost report
python cli.py --test --openai --one-at-a-time --enable-cost-tracking --cost-report
# With lower concurrency for better cost monitoring
python cli.py --test --openai --one-at-a-time --concurrent-workers 3 --enable-cost-tracking
```
### Hybrid Mode with Fallback
```bash
# Hybrid mode with fallback to one-at-a-time when needed
python cli.py --test --hybrid --fallback-one-at-a-time --enable-cost-tracking
```
## Cost Tracking Output
### Session Summary
```
COST TRACKING SUMMARY
============================================================
Total cost: $1.2345
Total tokens: 45,678
- Input tokens: 28,456
- Output tokens: 17,222
- Cached tokens: 3,456
Total API calls: 41
Layouts processed: 1
Averages:
- Cost per layout: $1.2345
- Tokens per layout: 45,678.0
- API calls per layout: 41.0
- Cost per 1K tokens: $0.0270
Operation breakdown:
- one_at_a_time_detection: 41 calls
============================================================
```
### Cost Report JSON
```json
{
  "session_summary": {
    "operation_breakdown": {
      "one_at_a_time_detection": 41
    }
  },
  "detailed_api_calls": [
    {
      "operation_type": "one_at_a_time_detection",
      "master_id": "1011A_1011_05",
      "token_usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 150,
        "total_tokens": 1350,
        "cached_tokens": 0
      },
      "total_cost": 0.0036,
      "layout_name": "test_layout.jpg"
    }
  ]
}
```
## Integration with Hybrid Mode
The one-at-a-time cost tracking also works with the hybrid mode's fallback mechanism:
### Hybrid Mode Fallback
When hybrid mode uses the fallback to one-at-a-time detection:
1. **Initial API call**: Panel counting + censorship detection
2. **Fallback API calls**: 41 individual master comparisons
3. **Total API calls**: 42 (1 + 41)
4. **Cost tracking**: Tracks both operation types separately
### Example Hybrid Fallback Cost Breakdown
```
Operation breakdown:
- panel_counting_censorship: 1 call
- one_at_a_time_detection: 41 calls
```
## Testing
Run the comprehensive test to see one-at-a-time cost tracking in action:
```bash
python test_one_at_a_time_cost_tracking.py
```
This test will:
1. Run one-at-a-time mode with cost tracking
2. Show real-time cost progress
3. Generate detailed cost report
4. Compare costs with hybrid mode
## Technical Details
### Multiprocessing Architecture
1. **Worker processes**: Extract token usage from API responses
2. **Main process**: Collects token data and tracks costs
3. **No shared state**: Each process handles its own API calls
4. **Thread-safe**: Cost tracking is done in the main process only
### Error Handling
- **Missing token data**: Warns when API response lacks usage information
- **API failures**: Handles cases where individual API calls fail
- **Graceful degradation**: Cost tracking failure doesn't break processing
### Performance Impact
- **Minimal overhead**: Token extraction adds negligible processing time
- **Memory efficient**: Token data is small and temporary
- **No API rate impact**: No additional API calls are made
## Benefits
### 1. **Detailed Cost Visibility**
- See exact cost per master image comparison
- Identify expensive vs. cheap operations
- Track cost trends over time
### 2. **Cost Optimization**
- Compare one-at-a-time vs. hybrid mode costs
- Make informed decisions about detection method
- Optimize concurrent workers for cost efficiency
### 3. **Budget Planning**
- Accurate cost estimates for large batches
- Understand cost implications of different modes
- Set appropriate spending limits
### 4. **Performance Analysis**
- Correlate cost with accuracy
- Identify optimal worker counts
- Monitor API efficiency
## Future Enhancements
Potential improvements include:
1. **Per-master cost optimization**: Identify which masters are most expensive
2. **Dynamic worker adjustment**: Reduce workers when costs are high
3. **Cost-based fallback**: Use cost thresholds to decide between modes
4. **Master image prioritization**: Process cheaper masters first
## Conclusion
The one-at-a-time cost tracking implementation provides complete visibility into the cost structure of the most accurate detection method. Combined with hybrid mode cost tracking, users can make informed decisions about the trade-offs between accuracy and cost.
The implementation maintains the existing performance characteristics while adding comprehensive cost monitoring capabilities that help optimize both accuracy and budget.


@ -0,0 +1,220 @@
# OpenAI API Cost Tracking Verification Report
## Executive Summary
✅ **All OpenAI API calls are properly instrumented with cost tracking**
After comprehensive code examination, I can confirm that **all 4 OpenAI API calls** in the codebase have been properly instrumented with token usage extraction and cost tracking.
## Complete API Call Inventory
### API Call 1: One-at-a-Time Detection (Multiprocessing)
- **Location**: `openai_detector.py:140`
- **Function**: `process_single_master_detection_openai()`
- **Operation Type**: `"one_at_a_time_detection"`
- **Cost Tracking**: ✅ **IMPLEMENTED**
- **Method**: Token usage extracted in worker process, cost tracked in main process
- **Usage**: Individual master image comparisons with multiprocessing
```python
# Line 140: API call in worker process
response = client.chat.completions.create(...)
# Lines 167-173: Token usage extraction
token_usage_data = {
    'prompt_tokens': response.usage.prompt_tokens,
    'completion_tokens': response.usage.completion_tokens,
    'total_tokens': response.usage.total_tokens,
    'cached_tokens': getattr(response.usage, 'cached_tokens', 0)
}
# Lines 617-626: Cost tracking in main process
cost_calculator.track_api_call(
    operation_type="one_at_a_time_detection",
    prompt_tokens=token_data['prompt_tokens'],
    completion_tokens=token_data['completion_tokens'],
    cached_tokens=token_data['cached_tokens'],
    layout_name=layout_name,
    master_id=master_id
)
```
### API Call 2: Regular Detection (Batch)
- **Location**: `openai_detector.py:424`
- **Function**: `make_robust_api_call()`
- **Operation Type**: `"detection"`
- **Cost Tracking**: ✅ **IMPLEMENTED**
- **Method**: Direct cost tracking in same process
- **Usage**: Batch comparison of all masters against layout
```python
# Line 424: API call
response = self.client.chat.completions.create(...)
# Lines 436-444: Cost tracking
if hasattr(response, 'usage') and response.usage:
    token_usage = extract_token_usage_from_response(response)
    cost_calculator.track_api_call(
        operation_type="detection",
        prompt_tokens=token_usage.prompt_tokens,
        completion_tokens=token_usage.completion_tokens,
        cached_tokens=token_usage.cached_tokens,
        layout_name=operation_name
    )
```
### API Call 3: Censorship Detection (Standalone)
- **Location**: `openai_detector.py:1012`
- **Function**: `detect_layout_censorship()`
- **Operation Type**: `"censorship_detection"`
- **Cost Tracking**: ✅ **IMPLEMENTED**
- **Method**: Direct cost tracking in same process
- **Usage**: Standalone censorship analysis
```python
# Line 1012: API call
response = self.client.chat.completions.create(...)
# Lines 1034-1041: Cost tracking
if hasattr(response, 'usage') and response.usage:
    token_usage = extract_token_usage_from_response(response)
    cost_calculator.track_api_call(
        operation_type="censorship_detection",
        prompt_tokens=token_usage.prompt_tokens,
        completion_tokens=token_usage.completion_tokens,
        cached_tokens=token_usage.cached_tokens,
        layout_name=Path(layout_path).name
    )
```
### API Call 4: Combined Panel Counting + Censorship
- **Location**: `openai_detector.py:1283`
- **Function**: `count_panels_and_detect_censorship()`
- **Operation Type**: `"panel_counting_censorship"`
- **Cost Tracking**: ✅ **IMPLEMENTED**
- **Method**: Direct cost tracking in same process
- **Usage**: Hybrid mode primary API call
```python
# Line 1283: API call
response = self.client.chat.completions.create(...)
# Lines 1304-1312: Cost tracking
if hasattr(response, 'usage') and response.usage:
    token_usage = extract_token_usage_from_response(response)
    cost_calculator.track_api_call(
        operation_type="panel_counting_censorship",
        prompt_tokens=token_usage.prompt_tokens,
        completion_tokens=token_usage.completion_tokens,
        cached_tokens=token_usage.cached_tokens,
        layout_name=layout_name
    )
```
## Cost Tracking Architecture
### Operation Types Tracked
1. **`one_at_a_time_detection`**: Individual master comparisons (41 calls per layout)
2. **`detection`**: Batch master comparisons (1 call per layout)
3. **`censorship_detection`**: Standalone censorship analysis (1 call per layout)
4. **`panel_counting_censorship`**: Combined analysis for hybrid mode (1 call per layout)
### Multiprocessing Handling
- **Worker processes**: Extract token usage data from API responses
- **Main process**: Collects token data and performs cost calculations
- **Thread-safe**: No shared state between processes
- **Error handling**: Graceful handling of missing token data
### Cost Tracking Features
- **Real-time tracking**: Cost calculated immediately after each API call
- **Per-layout breakdown**: Cost associated with specific layout files
- **Master-level granularity**: Individual costs for one-at-a-time mode
- **Session summaries**: Comprehensive cost reporting across all operations
## Verification Methods Used
### 1. **Code Search**
- Searched for all `client.chat.completions.create` calls
- Verified each call has corresponding cost tracking
- Confirmed no orphaned API calls exist
### 2. **Manual Code Review**
- Examined each API call location
- Verified token extraction implementation
- Confirmed cost tracking integration
### 3. **Architecture Analysis**
- Analyzed multiprocessing token data flow
- Verified main process cost collection
- Confirmed operation type categorization
## Cost Tracking Coverage Summary
| API Call Location | Function | Operation Type | Cost Tracking | Status |
|------------------|----------|----------------|---------------|---------|
| `openai_detector.py:140` | `process_single_master_detection_openai()` | `one_at_a_time_detection` | ✅ | Complete |
| `openai_detector.py:424` | `make_robust_api_call()` | `detection` | ✅ | Complete |
| `openai_detector.py:1012` | `detect_layout_censorship()` | `censorship_detection` | ✅ | Complete |
| `openai_detector.py:1283` | `count_panels_and_detect_censorship()` | `panel_counting_censorship` | ✅ | Complete |
## Usage Mode Coverage
### ✅ **OpenAI Mode (Regular)**
- **API Call**: `detection` (1 call per layout)
- **Cost Tracking**: Fully implemented
- **Usage**: `--openai`
### ✅ **OpenAI Mode (One-at-a-Time)**
- **API Call**: `one_at_a_time_detection` (41 calls per layout)
- **Cost Tracking**: Fully implemented with multiprocessing support
- **Usage**: `--openai --one-at-a-time`
### ✅ **Hybrid Mode**
- **API Call**: `panel_counting_censorship` (1 call per layout)
- **Cost Tracking**: Fully implemented
- **Usage**: `--hybrid`
### ✅ **Hybrid Mode with Fallback**
- **API Calls**: `panel_counting_censorship` + `one_at_a_time_detection` (1 + 41 calls)
- **Cost Tracking**: Both operation types tracked separately
- **Usage**: `--hybrid --fallback-one-at-a-time`
### ✅ **CEN Refinement**
- **API Call**: `censorship_detection` (additional call when needed)
- **Cost Tracking**: Fully implemented
- **Usage**: `--refinement-mode`
## Token Usage Data Captured
For each API call, the following token data is captured:
- **Prompt tokens**: Input tokens sent to the API
- **Completion tokens**: Output tokens generated by the API
- **Total tokens**: Sum of prompt and completion tokens
- **Cached tokens**: Tokens from cached input (if applicable)
## Cost Calculation
Using OpenAI o3 pricing:
- **Input tokens**: $2.00 per million tokens
- **Cached input**: $0.50 per million tokens
- **Output tokens**: $8.00 per million tokens
## Error Handling
All API calls include proper error handling for cost tracking:
- **Missing usage data**: Graceful handling when API response lacks token information
- **API failures**: Cost tracking doesn't interfere with error handling
- **Multiprocessing errors**: Worker process failures don't break cost tracking
## Testing Coverage
Cost tracking can be tested with:
- **Unit tests**: `test_cost_calculator.py`
- **Integration tests**: `test_cost_tracking_integration.py`
- **One-at-a-time tests**: `test_one_at_a_time_cost_tracking.py`
## Conclusion
**VERIFICATION COMPLETE**: All OpenAI API calls in the codebase are properly instrumented with comprehensive cost tracking. The implementation covers all usage modes, operation types, and edge cases including multiprocessing and error handling.
The cost tracking system provides complete visibility into OpenAI API usage costs across all detection modes and operational scenarios.

118
README.md Normal file

@ -0,0 +1,118 @@
# Master Image Detection Application
This application uses Google Gemini 2.5 Pro API to detect which master images appear in layout images.
## Features
- **Filename-based IDs**: Master images are identified by their filenames (without .jpg extension)
- **Comprehensive Detection**: Finds exact matches, cropped versions, scaled/rotated images
- **Detailed Results**: JSON output with layout filenames and detected master filenames
- **Optimized Processing**: Sequential processing with master images uploaded only once
- **Progress Tracking**: Real-time progress updates and periodic saves during batch processing
- **Error Handling**: Automatic retries and graceful error recovery
## Setup
1. **Install Dependencies**:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
2. **Configure API Key**:
- Create a `.env` file in the project root containing your Gemini API key
- The `.env` file is listed in `.gitignore`, so it is not included in the repository
## Usage
Activate the virtual environment first:
```bash
source venv/bin/activate
```
### Command Line Options
```bash
# Test with 1 layout
python image_detector.py --test
# Process first 10 layouts
python image_detector.py --limit 10
# Process all layouts
python image_detector.py --all
# Custom output filename
python image_detector.py --limit 50 --output my_batch_results
# Custom paths
python image_detector.py --all --master-path /path/to/masters --layout-path /path/to/layouts
```
### Help
```bash
python image_detector.py --help
```
### Common Commands
```bash
# Quick test
python image_detector.py --test
# Small batch
python image_detector.py --limit 10
# Full processing (all 306 layouts) - optimized sequential
python image_detector.py --all
```
## Output Format
Results are saved as JSON with this structure:
```json
{
  "metadata": {
    "total_layouts_processed": 1,
    "total_master_images": 41,
    "master_images_available": ["1011A_1011_05", "1011A_1011_06", ...]
  },
  "results": {
    "6814786": {
      "layout_filename": "6814786.jpg",
      "detected_master_ids": ["1011A_1011_05"],
      "detected_master_filenames": ["1011A_1011_05.jpg"],
      "analysis": "Detailed analysis of what was found..."
    }
  }
}
```
## Key Output Fields
- **layout_filename**: The layout image filename
- **detected_master_ids**: Master image IDs (filenames without .jpg)
- **detected_master_filenames**: Full master image filenames with .jpg extension
- **analysis**: Gemini's detailed explanation of the detection
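## Reading the Results (sketch)
The fields above can be consumed from Python with the standard library alone. The following is a minimal sketch, assuming the JSON structure shown in this README; the function name `summarize_results` is hypothetical and not part of the application.

```python
from collections import Counter


def summarize_results(data: dict) -> dict:
    """Summarize a results dictionary shaped like the JSON shown above."""
    results = data.get("results", {})
    master_counts = Counter()
    layouts_without_matches = []
    for layout_id, entry in results.items():
        ids = entry.get("detected_master_ids", [])
        if not ids:
            # Layouts where no master image was detected
            layouts_without_matches.append(layout_id)
        master_counts.update(ids)
    return {
        "layouts": len(results),
        "layouts_without_matches": sorted(layouts_without_matches),
        "master_counts": dict(master_counts),
    }
```

To use it, load a results file with `json.load` and pass the resulting dictionary to `summarize_results`.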
## Directory Structure
```
├── master_images/ # 41 master images to detect
├── layouts/ # 299+ layout images to analyze
├── results/ # JSON output files
├── venv/ # Python virtual environment
├── image_detector.py # Main application
├── test_simple.py # API connection tester
├── requirements.txt # Dependencies
└── .env # API configuration
```
## Example Results
Layout `6814786.jpg` contains master image `1011A_1011_05.jpg` (cropped version).

109
SPLITTING_TEST_RESULTS.md Normal file

@ -0,0 +1,109 @@
# Panel Splitting Test Results
## ✅ Test Summary: SUCCESS
The panel splitting functionality has been successfully implemented and tested with the multi-panel layout image `6786505.jpg`.
## 🧪 Test Results
### Image Details
- **Test Image**: `6786505.jpg`
- **Dimensions**: 10592 x 802 pixels (horizontal strip layout)
- **Type**: Multi-panel fashion advertisement layout
### Splitting Performance
| Target Count | Generated Splits | Status |
|-------------|------------------|--------|
| 5 | 4 | ✅ Good |
| 8 | 8 | ✅ Perfect |
| 10 | 7 | ✅ Good |
| 12 | 6 | ✅ Reasonable |
### Individual Method Performance
| Method | Boundaries Generated | Status |
|--------|---------------------|--------|
| Enhanced Gradient Analysis | 5 | ✅ Working |
| Advanced Canny Detection | 1 | ✅ Working |
| Template Matching | 15 | ✅ Working |
| Contour Analysis | 0 | ⚠️ No results |
| Texture Analysis | 3 | ✅ Working |
| Clustering Method | 1 | ✅ Working |
## 🔬 Technical Analysis
### Consensus System
- **Template Matching** performed best with 15 detailed boundaries
- **Enhanced Gradient Analysis** provided good 5-boundary results
- **Consensus system** successfully combined multiple methods
- **Confidence scoring** worked effectively (0.8-1.0 range)
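The idea of combining boundaries from several methods can be illustrated with a small sketch. This is not the project's implementation: the function name, the `tolerance` and `min_score` parameters, and the scoring formula (share of total method weight that voted for a cluster) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def consensus_boundaries(
    proposals: Dict[str, List[int]],
    weights: Dict[str, float],
    tolerance: int = 20,
    min_score: float = 0.5,
) -> List[Tuple[int, float]]:
    """Cluster nearby boundary proposals and score each cluster by vote weight."""
    # Flatten to (x, method) pairs sorted by x position
    points = sorted((x, m) for m, xs in proposals.items() for x in xs)
    clusters: List[List[Tuple[int, str]]] = []
    for point in points:
        # Join the current cluster if close enough to its last member
        if clusters and point[0] - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append(point)
        else:
            clusters.append([point])

    total_weight = sum(weights.get(m, 1.0) for m in proposals) or 1.0
    result = []
    for cluster in clusters:
        voters = {m for _, m in cluster}  # each method votes once per cluster
        score = sum(weights.get(m, 1.0) for m in voters) / total_weight
        if score >= min_score:
            centre = round(sum(x for x, _ in cluster) / len(cluster))
            result.append((centre, round(score, 3)))
    return result
```

Counting each method once per cluster keeps a method that emits many nearby boundaries (such as template matching with 15 boundaries) from outvoting the others on its own.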
### Split Quality
- **Coverage**: Good coverage of original image
- **No overlaps**: Clean boundary detection
- **Reasonable aspect ratios**: Splits maintain good proportions
- **Debug output**: Comprehensive visualization available
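Coverage and overlap can be checked mechanically. The sketch below assumes each split is described by `(x, y, width, height)` bounds and only looks at the horizontal axis, which fits a horizontal strip layout; `check_splits` is a hypothetical helper, not a function from this repository.

```python
from typing import List, Tuple

Bounds = Tuple[int, int, int, int]  # (x, y, width, height)


def check_splits(bounds: List[Bounds], image_width: int) -> dict:
    """Report horizontal coverage and overlapping spans for a set of splits."""
    spans = sorted((x, x + w) for x, _, w, _ in bounds)
    covered = 0
    overlaps = []
    cursor = 0  # right edge of the area covered so far
    for left, right in spans:
        if left < cursor:
            # This split starts before the previous coverage ends
            overlaps.append((left, min(cursor, right)))
        covered += max(0, right - max(left, cursor))
        cursor = max(cursor, right)
    return {
        "coverage": covered / image_width if image_width else 0.0,
        "overlaps": overlaps,
    }
```

A coverage of `1.0` with an empty `overlaps` list means the splits tile the full width without gaps or double-counted columns.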
## 📁 Generated Files
### Split Images Created:
- `6786505_target5_split_01.jpg` through `6786505_target5_split_04.jpg`
- `6786505_target8_split_01.jpg` through `6786505_target8_split_08.jpg`
- `6786505_target10_split_01.jpg` through `6786505_target10_split_07.jpg`
- `6786505_target12_split_01.jpg` through `6786505_target12_split_06.jpg`
### Debug Files:
- Debug visualization saved to `debug_splitting/` directory
- Individual method results analyzed and logged
## 🚀 Implementation Features
### ✅ Completed Features:
1. **Multi-Method Approach**: 6 different CV techniques
2. **Consensus System**: Weighted voting and boundary clustering
3. **Target Count Guidance**: Adaptive splitting based on expected panels
4. **Quality Validation**: Overlap detection and coverage analysis
5. **Debug Mode**: Comprehensive visualization and logging
6. **Fallback Mechanisms**: Graceful degradation when methods fail
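As an illustration of the fallback idea in item 6, the sketch below tries a sequence of boundary-finding callables and falls back to equal division when all of them fail or return no interior boundary. The names `split_with_fallback` and `equal_boundaries`, and the callable signature `(width, target_count)`, are assumptions for this example only.

```python
from typing import Callable, List, Sequence


def equal_boundaries(width: int, n_panels: int) -> List[int]:
    """Boundaries for n equal-width panels, from 0 to width inclusive."""
    return [round(width * k / n_panels) for k in range(n_panels + 1)]


def split_with_fallback(
    width: int,
    target_count: int,
    methods: Sequence[Callable[[int, int], List[int]]],
) -> List[int]:
    """Return boundaries from the first method that yields a usable split."""
    for method in methods:
        try:
            boundaries = sorted(set(method(width, target_count)))
        except Exception as exc:
            # A failing method must not abort the whole split
            print(f"{getattr(method, '__name__', method)} failed: {exc}")
            continue
        # At least one interior boundary is needed for a real split
        if len(boundaries) >= 3:
            return boundaries
    # Last resort: evenly spaced boundaries
    return equal_boundaries(width, max(1, target_count))
```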
### 🔧 Technical Implementation:
- **Gradient Peak Analysis**: Multi-scale processing with prominence detection
- **Canny Edge Detection**: Multi-threshold with morphological operations
- **Template Matching**: Common separator pattern detection
- **Contour Analysis**: Rectangular panel detection
- **Texture Analysis**: LBP-based separator identification
- **Clustering**: K-means based region segmentation
## 🎯 Next Steps
The panel splitting implementation is ready for:
1. **CLI Integration**: `--split` flag fully implemented
2. **OpenAI Guidance**: Panel count targeting system ready
3. **Detector Integration**: Works with all detector types
4. **Refinement Mode**: Compatible with existing CEN refinement
## 💡 Usage Examples
```bash
# Test basic splitting
python test_simple_split.py
# Test with CLI
python cli.py --test --split
# With OpenAI guidance
python cli.py --test --openai --split
# With refinement mode
python cli.py --test --split --refinement-mode
```
## 📊 Performance Notes
- **Processing Time**: Fast for most methods
- **Memory Usage**: Reasonable for large images
- **Accuracy**: Good boundary detection for horizontal strips
- **Reliability**: Multiple fallback mechanisms ensure results
The implementation successfully demonstrates robust multi-panel layout splitting with comprehensive testing and validation.

409
advanced_splitter.py Normal file

@ -0,0 +1,409 @@
#!/usr/bin/env python3
"""
Advanced Panel Splitter Module - Edge detection and gutter analysis for panel splitting
"""
import os
import cv2
import numpy as np
from typing import List, Dict, Tuple, Optional
from pathlib import Path
from PIL import Image
class AdvancedPanelSplitter:
    """
    Advanced panel splitter using edge detection and gutter analysis
    for more accurate splitting of horizontal multi-panel marketing layouts.
    """

    def __init__(self, percentile: float = 10, min_gap: int = 5, debug: bool = False):
        """
        Initialize the AdvancedPanelSplitter

        Args:
            percentile (float): Percentile threshold for detecting gutters (0-100)
            min_gap (int): Minimum consecutive low-energy columns for gutter detection
            debug (bool): Enable debug mode for visualization
        """
        self.percentile = percentile
        self.min_gap = min_gap
        self.debug = debug
        self.debug_dir = "debug_advanced_splitting"
        if self.debug and not os.path.exists(self.debug_dir):
            os.makedirs(self.debug_dir)
    def find_boundaries_auto(self, img_gray: np.ndarray) -> List[int]:
        """
        Locate column indices that represent gutters between panels.

        Parameters
        ----------
        img_gray : np.ndarray
            Grayscale image (H, W).

        Returns
        -------
        List[int]
            Sorted list of boundary x-coordinates (including 0 and width-1).
        """
        # Sobel x-derivative (dx=1, dy=0) highlights vertical edges
        sobelx = cv2.Sobel(img_gray, cv2.CV_64F, 1, 0, ksize=3)
        col_energy = np.abs(sobelx).sum(axis=0)  # 1D edge energy profile
        thresh = np.percentile(col_energy, self.percentile)
        low_energy_cols = np.where(col_energy < thresh)[0]
        if low_energy_cols.size == 0:
            # No gutter candidates: treat the whole image as a single panel
            return [0, img_gray.shape[1] - 1]
        # Group consecutive columns
        clusters, current = [], [low_energy_cols[0]]
        for c in low_energy_cols[1:]:
            if c == current[-1] + 1:
                current.append(c)
            else:
                clusters.append(current)
                current = [c]
        clusters.append(current)
        # Keep clusters that are wide enough (filter noise)
        clusters = [cl for cl in clusters if len(cl) >= self.min_gap]
        # Use the centre of each cluster as the boundary position
        boundaries = [0] + [int(np.mean(cl)) for cl in clusters] + [img_gray.shape[1] - 1]
        boundaries = sorted(list(set(boundaries)))  # deduplicate & sort
        return boundaries
    def split_image(self, img: Image.Image, boundaries: List[int], out_dir: Path, stem: str) -> List[Dict]:
        """
        Crop and save each panel, returning split information.

        Parameters
        ----------
        img : PIL.Image.Image
        boundaries : List[int]
            Sorted x positions of panel borders.
        out_dir : Path
            Where to write files.
        stem : str
            Base name for panel files.

        Returns
        -------
        List[Dict]
            List of split information with image data and metadata
        """
        out_dir.mkdir(parents=True, exist_ok=True)
        splits = []
        for i in range(len(boundaries) - 1):
            left = boundaries[i]
            right = boundaries[i + 1]
            if right - left < 5:  # skip spurious near-zero-width crops
                continue
            panel = img.crop((left, 0, right, img.height))
            # Convert PIL image to OpenCV format for consistency
            panel_cv = cv2.cvtColor(np.array(panel), cv2.COLOR_RGB2BGR)
            # Save panel file if debug mode
            if self.debug:
                panel_path = out_dir / f"{stem}_panel_{i+1:02d}.png"
                panel.save(panel_path, "PNG")
                print(f"Saved panel {i+1} -> {stem}_panel_{i+1:02d}.png")
            splits.append({
                'image': panel_cv,
                'bounds': (left, 0, right - left, img.height),  # (x, y, width, height)
                'confidence': 0.9,  # High confidence for advanced method
                'method': 'advanced_edge_detection'
            })
        return splits
    def split_layout_and_match(self, layout_path: str, master_images: List[str],
                               detector_instance=None, n_panels: Optional[int] = None) -> Dict:
        """
        Main method to split a layout using advanced edge detection and match splits to master images

        Args:
            layout_path (str): Path to the layout image
            master_images (List[str]): List of master image IDs (filenames without .jpg)
            detector_instance: The detector instance to use for matching
            n_panels (int, optional): If provided, split into this many equal-width panels

        Returns:
            Dict: Detection results with matches from all splits
        """
        # Load image
        img = Image.open(layout_path).convert("RGB")
        img_gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
        print(f"Processing {os.path.basename(layout_path)} with advanced splitting")
        print(f"Image dimensions: {img.width}x{img.height}")
        print(f"Percentile threshold: {self.percentile}, Min gap: {self.min_gap}")
        # Determine split boundaries
        if n_panels:
            # Equally spaced boundaries
            w = img.width
            step = w / n_panels
            boundaries = [0] + [int(round(step * k)) for k in range(1, n_panels)] + [w - 1]
            print(f"Using fixed {n_panels} panels with equal spacing")
        else:
            boundaries = self.find_boundaries_auto(img_gray)
            print(f"Auto-detected {len(boundaries) - 1} panels")
        # Create output directory for splits if debug mode
        out_dir = Path(self.debug_dir) if self.debug else Path("/tmp/advanced_splits")
        stem = Path(layout_path).stem
        # Split the image
        splits = self.split_image(img, boundaries, out_dir, stem)
        if not splits:
            print("No splits detected, returning empty results")
            return {
                'layout_path': layout_path,
                'detected_masters': [],
                'panel_count': 0,
                'split_mode': 'advanced',
                'splits_generated': 0,
                'percentile': self.percentile,
                'min_gap': self.min_gap
            }
        print(f"Generated {len(splits)} splits using advanced method")
        # Match each split to master images
        all_matches = []
        split_results = []
        for i, split_info in enumerate(splits):
            print(f"Processing split {i+1}/{len(splits)}")
            # Save split image temporarily for matching
            split_image = split_info['image']
            temp_split_path = f"/tmp/advanced_split_{i}.jpg"
            cv2.imwrite(temp_split_path, split_image)
            # Match this split to master images using existing inlier analysis
            if hasattr(detector_instance, 'match_split_to_masters'):
                split_matches = detector_instance.match_split_to_masters(
                    temp_split_path, master_images
                )
            else:
                # Use basic inlier analysis if method doesn't exist
                split_matches = self._match_split_basic(temp_split_path, master_images)
            # Add split metadata to matches
            for match in split_matches:
                match['split_index'] = i
                match['split_bounds'] = split_info['bounds']
                match['split_confidence'] = split_info['confidence']
                match['split_method'] = 'advanced_edge_detection'
                all_matches.append(match)
            split_results.append({
                'split_index': i,
                'bounds': split_info['bounds'],
                'confidence': split_info['confidence'],
                'method': 'advanced_edge_detection',
                'matches': split_matches
            })
            # Clean up temporary file
            if os.path.exists(temp_split_path):
                os.remove(temp_split_path)
        # Aggregate results
        result = {
            'layout_path': layout_path,
            'detected_masters': [match['master_id'] for match in all_matches],
            'panel_count': len(splits),
            'split_mode': 'advanced',
            'splits_generated': len(splits),
            'split_results': split_results,
            'all_matches': all_matches,
            'percentile': self.percentile,
            'min_gap': self.min_gap,
            'boundaries': boundaries
        }
        # Remove duplicates while preserving highest confidence matches
        result = self._deduplicate_matches(result)
        return result
    def split_panels(self, image_path: str, target_panel_count: int) -> List[Dict]:
        """
        Split a layout image into individual panels (compatibility method for hybrid detector)

        Args:
            image_path (str): Path to the layout image
            target_panel_count (int): Target number of panels to split into

        Returns:
            List[Dict]: List of split information with image data and metadata
        """
        # Load image
        img = Image.open(image_path).convert("RGB")
        img_gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
        print(f"Advanced splitting: Processing {os.path.basename(image_path)}")
        print(f"Image dimensions: {img.width}x{img.height}")
        print(f"Target panels: {target_panel_count}, Percentile: {self.percentile}, Min gap: {self.min_gap}")
        # Determine split boundaries
        boundaries = self.find_boundaries_auto(img_gray)
        print(f"Auto-detected {len(boundaries) - 1} panels using advanced method")
        # Create output directory for splits if debug mode
        out_dir = Path(self.debug_dir) if self.debug else Path("/tmp/advanced_splits")
        stem = Path(image_path).stem
        # Split the image
        splits = self.split_image(img, boundaries, out_dir, stem)
        if not splits:
            print("No splits detected, falling back to equal division")
            # Fallback to equal division if no splits detected
            w = img.width
            h = img.height
            panel_width = w // target_panel_count
            splits = []
            for i in range(target_panel_count):
                x = i * panel_width
                width = panel_width if i < target_panel_count - 1 else w - x
                panel_img = img.crop((x, 0, x + width, h))
                panel_cv = cv2.cvtColor(np.array(panel_img), cv2.COLOR_RGB2BGR)
                splits.append({
                    'image': panel_cv,
                    'bounds': (x, 0, width, h),
                    'confidence': 0.7,
                    'method': 'advanced_fallback_equal_division'
                })
        return splits
    def _match_split_basic(self, split_path: str, master_images: List[str]) -> List[Dict]:
        """Basic matching using OpenCV features (fallback)"""
        matches = []
        try:
            # Load the split image
            split_img = cv2.imread(split_path, cv2.IMREAD_GRAYSCALE)
            if split_img is None:
                return matches
            # Initialize feature detector
            akaze = cv2.AKAZE_create()
            bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
            # Detect keypoints and descriptors for split image
            kp_split, des_split = akaze.detectAndCompute(split_img, None)
            if des_split is None:
                return matches
            # Load master images from the master_images directory
            master_images_path = Path("master_images")
            for master_id in master_images:
                master_path = master_images_path / f"{master_id}.jpg"
                if not master_path.exists():
                    continue
                # Load master image
                master_img = cv2.imread(str(master_path), cv2.IMREAD_GRAYSCALE)
                if master_img is None:
                    continue
                # Detect keypoints and descriptors for master image
                kp_master, des_master = akaze.detectAndCompute(master_img, None)
                if des_master is None:
                    continue
                # Match features
                matches_raw = bf.knnMatch(des_split, des_master, k=2)
                # Apply Lowe's ratio test
                good_matches = []
                for match_pair in matches_raw:
                    if len(match_pair) == 2:
                        m, n = match_pair
                        if m.distance < 0.7 * n.distance:
                            good_matches.append(m)
                # If we have enough good matches, try to find homography
                if len(good_matches) >= 10:
                    src_pts = np.float32([kp_split[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
                    dst_pts = np.float32([kp_master[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
                    try:
                        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
                        if M is not None:
                            inliers = int(np.sum(mask))
                            inlier_ratio = inliers / len(good_matches)
                            # Basic confidence scoring
                            if inliers >= 15 and inlier_ratio >= 0.6:
                                confidence = 'high'
                            elif inliers >= 8 and inlier_ratio >= 0.4:
                                confidence = 'medium'
                            else:
                                confidence = 'low'
                            # Only include medium and high confidence matches
                            if confidence in ['medium', 'high']:
                                matches.append({
                                    'master_id': master_id,
                                    'confidence': confidence,
                                    'inliers': inliers,
                                    'match_details': {
                                        'inliers': inliers,
                                        'good_matches': len(good_matches),
                                        'inlier_ratio': round(inlier_ratio, 3)
                                    }
                                })
                    except Exception:
                        # Skip this master if homography estimation fails;
                        # unlike a bare except, this lets KeyboardInterrupt through
                        continue
        except Exception as e:
            print(f"Error in basic matching: {e}")
        return matches
    def _deduplicate_matches(self, result: Dict) -> Dict:
        """Remove duplicate matches, keeping highest confidence ones"""
        if not result['all_matches']:
            return result
        # Group matches by master_id
        master_groups = {}
        for match in result['all_matches']:
            master_id = match['master_id']
            if master_id not in master_groups:
                master_groups[master_id] = []
            master_groups[master_id].append(match)
        # Keep only the highest confidence match for each master
        deduplicated_matches = []
        for master_id, matches in master_groups.items():
            # Sort by confidence (high > medium > low) and inliers
            confidence_order = {'high': 3, 'medium': 2, 'low': 1}
            best_match = max(matches, key=lambda x: (
                confidence_order.get(x.get('confidence', 'low'), 0),
                x.get('inliers', 0)
            ))
            deduplicated_matches.append(best_match)
        result['all_matches'] = deduplicated_matches
        result['detected_masters'] = [match['master_id'] for match in deduplicated_matches]
        return result

133
check_system_resources.py Normal file

@ -0,0 +1,133 @@
#!/usr/bin/env python3
"""
Check and display system resource limits and current usage
"""
import os
import subprocess
import resource
import psutil
def check_file_descriptors():
    """Check file descriptor limits and usage"""
    print("🔍 FILE DESCRIPTOR LIMITS")
    print("=" * 40)
    # Get current limits
    soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"Current soft limit: {soft_limit:,}")
    print(f"Current hard limit: {hard_limit:,}")
    # Get current usage
    pid = os.getpid()
    try:
        result = subprocess.run(['lsof', '-p', str(pid)], capture_output=True, text=True)
        # First line of lsof output is the column header
        open_files = len(result.stdout.strip().split('\n')) - 1
        print(f"Current open files: {open_files}")
        print(f"Usage: {open_files/soft_limit*100:.1f}% of soft limit")
    except Exception:
        # e.g. lsof is not installed
        print("Could not determine current usage")
    print()
    # Recommendations
    if soft_limit < 10000:
        print("⚠️ RECOMMENDATION: Your soft limit is quite low")
        print(" Consider increasing it with: ulimit -n 65536")
    elif soft_limit < 65536:
        print(" RECOMMENDATION: Consider increasing soft limit for heavy processing")
        print(" Command: ulimit -n 65536")
    else:
        print("✅ File descriptor limits look good")
    print()
def check_memory():
    """Check memory usage"""
    print("🧠 MEMORY USAGE")
    print("=" * 40)
    memory = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(f"RAM: {memory.percent:.1f}% used ({memory.used/1024**3:.1f}GB / {memory.total/1024**3:.1f}GB)")
    print(f"Swap: {swap.percent:.1f}% used ({swap.used/1024**3:.1f}GB / {swap.total/1024**3:.1f}GB)")
    if memory.percent > 80:
        print("⚠️ High RAM usage detected")
    if swap.percent > 50:
        print("⚠️ High swap usage detected")
    print()
def check_processes():
    """Check running processes"""
    print("🔄 PROCESS INFORMATION")
    print("=" * 40)
    # CPU info
    print(f"CPU cores: {os.cpu_count()}")
    print(f"CPU usage: {psutil.cpu_percent(interval=1):.1f}%")
    # Load average
    try:
        load1, load5, load15 = os.getloadavg()
        print(f"Load average: {load1:.2f}, {load5:.2f}, {load15:.2f}")
    except (OSError, AttributeError):
        # OSError: load average unobtainable; AttributeError: platform lacks getloadavg
        print("Load average: unavailable")
    print()
def recommend_settings():
    """Recommend optimal settings"""
    print("🎯 RECOMMENDED SETTINGS")
    print("=" * 40)
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    memory_gb = psutil.virtual_memory().total / (1024**3)
    print(f"For your system ({cpu_count} cores, {memory_gb:.1f}GB RAM):")
    print()
    # Layout workers: scale with RAM, but never recommend fewer than 1
    half_cores = max(1, cpu_count // 2)
    if memory_gb < 16:
        layout_workers = min(2, half_cores)
        print(f"--layout-workers {layout_workers} (conservative for {memory_gb:.1f}GB RAM)")
    elif memory_gb < 32:
        layout_workers = min(4, half_cores)
        print(f"--layout-workers {layout_workers} (balanced for {memory_gb:.1f}GB RAM)")
    else:
        layout_workers = min(6, half_cores)
        print(f"--layout-workers {layout_workers} (aggressive for {memory_gb:.1f}GB RAM)")
    # Local workers
    local_workers = max(1, cpu_count - 2)
    print(f"--local-workers {local_workers} (CPU cores - 2)")
    print()
    print("Full command suggestion:")
    print("python cli.py --all --hybrid --split-simple --refinement-mode \\")
    print(" --inlier-threshold 0.15 --inlier-ratio-threshold 0.2 \\")
    print(" --fallback-one-at-a-time --enable-cost-tracking --cost-report \\")
    print(f" --parallel-layouts --layout-workers {layout_workers} --local-workers {local_workers}")
    print()
def main():
    print("🔧 SYSTEM RESOURCE CHECK")
    print("=" * 50)
    print()
    check_file_descriptors()
    check_memory()
    check_processes()
    recommend_settings()
    print("💡 TROUBLESHOOTING TIPS:")
    print("- If you get 'Too many open files': restart terminal and run 'ulimit -n 65536'")
    print("- If memory usage is high: reduce --layout-workers")
    print("- If processing is slow: check if swap usage is very high")
    print("- Monitor with: Activity Monitor or 'top' command")


if __name__ == "__main__":
    main()

641
cli.py Normal file

@ -0,0 +1,641 @@
#!/usr/bin/env python3
"""
CLI Module - Command Line Interface for Master Image Detection
"""
import argparse
import multiprocessing
import sys
from logging_config import setup_dual_logging, log_system_info, log_exception, DualLogger
from gemini_detector import ImageDetector
from openai_detector import OpenAIImageDetector
from vector_detector import VectorImageDetector
from hybrid_detector import HybridImageDetector
from cost_calculator import cost_calculator
def parse_arguments():
    """Parse command line arguments"""
    parser = argparse.ArgumentParser(
        description="Detect master images in layout images using LLM APIs (Gemini, OpenAI), Vector Embeddings, or Hybrid mode",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
%(prog)s --test # Test with 1 layout (Gemini LLM)
%(prog)s --test --openai # Test with 1 layout (OpenAI o3)
%(prog)s --file 6786505.jpg --openai --split # Process specific file with OpenAI and splitting
%(prog)s --limit 10 # Process first 10 layouts (Gemini LLM)
%(prog)s --limit 10 --openai # Process first 10 layouts (OpenAI o3)
%(prog)s --all # Process all layouts (Gemini LLM)
%(prog)s --all --openai # Process all layouts (OpenAI o3)
%(prog)s --test --vector-mode # Test with 1 layout (Vector embeddings)
%(prog)s --limit 10 --vector-mode # Process first 10 layouts (Vector embeddings)
%(prog)s --all --vector-mode # Process all layouts (Vector embeddings)
%(prog)s --test --vector-mode --splitting-mode grid # Test with grid-based image splitting
%(prog)s --limit 5 --vector-mode --splitting-mode auto # Auto-detect splitting for 5 layouts
%(prog)s --all --vector-mode --similarity-threshold 0.8 # Process all with higher similarity threshold
%(prog)s --all --output my_results # Process all with custom output name
%(prog)s --limit 50 --output batch_50 # Process 50 layouts with custom name
%(prog)s --all --no-greyscale # Process all without greyscale conversion (Gemini only)
%(prog)s --all --no-contrast # Process all without contrast enhancement (Gemini only)
%(prog)s --all --contrast-factor 2.0 # Process all with higher contrast factor (Gemini only)
%(prog)s --test --refinement-mode # Test with CEN refinement (1 layout)
%(prog)s --limit 5 --refinement-mode # Process 5 layouts with CEN refinement
%(prog)s --all --refinement-mode --output refined_results # Full run with CEN refinement
%(prog)s --test --one-at-a-time # Test one-at-a-time mode (1 layout, 41 concurrent API calls)
%(prog)s --test --openai --one-at-a-time # Test OpenAI one-at-a-time mode
%(prog)s --all --one-at-a-time --refinement-mode # Full run with one-at-a-time and CEN refinement
%(prog)s --test --one-at-a-time --concurrent-workers 8 # Test with 8 concurrent workers (faster)
%(prog)s --all --openai --one-at-a-time --concurrent-workers 3 # OpenAI full run with 3 concurrent workers
%(prog)s --test --openai --one-at-a-time --panel-aware-refinement # Test with panel counting and OpenCV refinement
%(prog)s --all --openai --one-at-a-time --panel-aware-refinement # Full run with panel-aware match refinement
%(prog)s --test --openai --one-at-a-time --refinement-mode --panel-aware-refinement # Test with both CEN and panel-aware refinement
%(prog)s --test --hybrid # Test with hybrid mode (OpenAI panel counting + local analysis)
%(prog)s --limit 10 --hybrid # Process first 10 layouts with hybrid mode
%(prog)s --all --hybrid # Process all layouts with hybrid mode
%(prog)s --test --hybrid --panel-threshold 3 # Test hybrid mode with custom panel threshold
%(prog)s --all --hybrid --refinement-mode # Full run with hybrid mode and CEN refinement
%(prog)s --test --hybrid --enable-greyscale # Test hybrid mode with greyscale override
%(prog)s --limit 5 --hybrid --enable-contrast # Test hybrid mode with contrast enhancement override
%(prog)s --test --hybrid --concurrent-workers 8 # Test hybrid mode with 8 concurrent workers (both paths)
%(prog)s --all --hybrid --concurrent-workers 3 # Full hybrid run with 3 concurrent workers (both paths)
%(prog)s --test --hybrid --openai-workers 10 --local-workers 8 # Test with separate worker counts
%(prog)s --all --hybrid --local-workers 12 # Full run with 12 local workers (OpenAI auto-detects)
%(prog)s --test --hybrid --split-advanced # Test with advanced edge detection splitting
%(prog)s --limit 10 --hybrid --split-advanced --percentile 15 --min-gap 3 # Advanced splitting with custom parameters
%(prog)s --test --hybrid --vector-mode # Test hybrid mode with vector similarity instead of inlier analysis
%(prog)s --limit 10 --hybrid --vector-mode --similarity-threshold 0.8 # Hybrid with higher similarity threshold
%(prog)s --all --hybrid --vector-mode --split-simple # Full run with vector similarity and simple splitting
%(prog)s --all --hybrid --split-simple --fallback-one-at-a-time # Hybrid with fallback to OpenAI one-at-a-time when needed
%(prog)s --test --hybrid --parallel-layouts # Test hybrid mode with parallel layout processing
%(prog)s --limit 10 --hybrid --parallel-layouts --layout-workers 4 # Process 10 layouts with 4 parallel workers
%(prog)s --all --hybrid --parallel-layouts --layout-workers 6 --max-concurrent-layouts 4 # Full run with controlled parallelism
%(prog)s --test --hybrid --enable-cost-tracking # Test with detailed cost tracking enabled
%(prog)s --limit 10 --openai --enable-cost-tracking --cost-report # OpenAI mode with cost tracking and report generation
%(prog)s --all --hybrid --enable-cost-tracking --cost-estimate 300 # Full run with cost tracking and monthly estimate
"""
)
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument('--test', action='store_true',
help='Test mode: process only 1 layout image')
group.add_argument('--limit', type=int, metavar='N',
help='Process first N layout images')
group.add_argument('--all', action='store_true',
help='Process all layout images')
group.add_argument('--file', type=str, metavar='FILENAME',
help='Process a specific layout file (e.g., --file 6786505.jpg)')
parser.add_argument('--output', type=str, default=None, metavar='NAME',
help='Output filename (without .json extension). Default: auto-generated based on mode')
# Image processing options
parser.add_argument('--no-greyscale', action='store_true',
help='Disable greyscale conversion (enabled by default for Gemini/OpenAI, disabled for hybrid)')
parser.add_argument('--no-contrast', action='store_true',
help='Disable contrast enhancement (enabled by default for Gemini/OpenAI, disabled for hybrid)')
parser.add_argument('--contrast-factor', type=float, default=1.5, metavar='FACTOR',
help='Contrast enhancement factor (default: 1.5)')
# Hybrid mode image processing overrides
parser.add_argument('--enable-greyscale', action='store_true',
help='Enable greyscale conversion for hybrid mode (disabled by default)')
parser.add_argument('--enable-contrast', action='store_true',
help='Enable contrast enhancement for hybrid mode (disabled by default)')
# Operating mode options
provider_group = parser.add_mutually_exclusive_group()
provider_group.add_argument('--openai', action='store_true',
help='Use OpenAI o3 model instead of Gemini (requires OPENAI_API_KEY)')
provider_group.add_argument('--hybrid', action='store_true',
help='Use hybrid mode: OpenAI panel counting + local analysis (≤2 panels) or full OpenAI (≥3 panels)')
# Vector mode option (can be combined with hybrid mode)
parser.add_argument('--vector-mode', action='store_true',
help='Use Google Vertex AI vector embeddings for similarity matching. Can be combined with --hybrid to replace inlier analysis with vector similarity.')
parser.add_argument('--similarity-threshold', type=float, default=0.75, metavar='THRESHOLD',
help='Similarity threshold for vector mode (0.0-1.0, default: 0.75)')
parser.add_argument('--no-truncation', action='store_true',
help='Disable truncation of match results in hybrid mode (keeps all matches instead of limiting to panel count)')
parser.add_argument('--splitting-mode', type=str, default='none',
choices=['none', 'auto', 'grid'], metavar='MODE',
help='Image splitting mode for vector mode: none, auto, grid (default: none)')
parser.add_argument('--min-crop-size', type=int, default=200, metavar='PIXELS',
help='Minimum crop size in pixels for splitting (default: 200)')
parser.add_argument('--crop-padding', type=int, default=20, metavar='PIXELS',
help='Padding around detected crops in pixels (default: 20)')
parser.add_argument('--refinement-mode', action='store_true',
help='Enable CEN refinement mode (after initial detection, refines CEN vs non-CEN matches)')
parser.add_argument('--one-at-a-time', action='store_true',
help='Process masters one at a time using separate processes (makes 41 separate API calls per layout for exact matching)')
parser.add_argument('--concurrent-workers', type=int, default=None, metavar='N',
help='Number of concurrent processes (auto-detects optimal values: OpenAI=total_masters, Local=CPU_cores if not specified)')
parser.add_argument('--openai-workers', type=int, default=None, metavar='N',
help='Number of concurrent processes for OpenAI analysis (default: total number of master images)')
parser.add_argument('--local-workers', type=int, default=None, metavar='N',
help='Number of concurrent processes for local inlier analysis (default: number of CPU cores)')
parser.add_argument('--panel-aware-refinement', action='store_true',
help='Enable panel-aware refinement: count panels with OpenAI o3, then use OpenCV inlier analysis to select best matches (only works with --openai --one-at-a-time)')
parser.add_argument('--split', action='store_true',
help='Enable panel splitting mode: split multi-panel layouts into individual images using multiple CV methods, then match each split to masters')
parser.add_argument('--split-advanced', action='store_true',
help='Enable advanced panel splitting mode: use edge detection and gutter analysis for more accurate splitting')
parser.add_argument('--split-simple', action='store_true',
help='Enable simple panel splitting mode: evenly split layout into panels based on OpenAI analysis count (hybrid mode only)')
parser.add_argument('--percentile', type=float, default=10, metavar='THRESHOLD',
help='Percentile threshold (0-100) for detecting gutters in advanced splitting; lower = stricter (default: 10)')
parser.add_argument('--min-gap', type=int, default=5, metavar='PIXELS',
help='Minimum consecutive low-energy columns needed to mark a gutter in advanced splitting (default: 5)')
# Hybrid mode specific options
parser.add_argument('--panel-threshold', type=int, default=2, metavar='N',
help='Panel threshold for hybrid mode: ≤N panels use local analysis, >N panels use OpenAI (default: 2)')
parser.add_argument('--inlier-threshold', type=float, default=0.65, metavar='THRESHOLD',
help='Inlier confidence threshold for hybrid local analysis (default: 0.65)')
parser.add_argument('--inlier-ratio-threshold', type=float, default=0.4, metavar='THRESHOLD',
help='Minimum inlier ratio for confident matches (default: 0.4)')
parser.add_argument('--fallback-one-at-a-time', action='store_true',
help='Enable fallback to OpenAI one-at-a-time method when matched masters < detected panels. Uses multiprocessing with number of workers equal to number of masters.')
# Parallel processing options
parser.add_argument('--parallel-layouts', action='store_true',
help='Enable parallel layout processing with serial inlier analysis coordination')
parser.add_argument('--layout-workers', type=int, default=None, metavar='N',
help='Number of concurrent layout workers for parallel processing (default: auto-detect based on CPU cores)')
parser.add_argument('--max-concurrent-layouts', type=int, default=None, metavar='N',
help='Maximum layouts processing simultaneously (default: same as layout-workers)')
# Cost tracking options
parser.add_argument('--enable-cost-tracking', action='store_true',
help='Enable detailed cost tracking and reporting for OpenAI API usage')
parser.add_argument('--cost-report', action='store_true',
help='Generate detailed cost report after processing')
parser.add_argument('--cost-estimate', type=int, metavar='LAYOUTS',
help='Estimate monthly cost for the specified number of layouts per month (e.g. 300); no estimate is shown unless this flag is given and cost tracking is enabled')
return parser.parse_args()
def main():
"""Main execution function"""
# Initialize dual logging first
logger = setup_dual_logging()
dual_logger = DualLogger(logger)
# Log system information
log_system_info(logger)
try:
args = parse_arguments()
# Initialize cost tracking if enabled
if args.enable_cost_tracking:
cost_calculator.enable_tracking = True
dual_logger.print("Cost tracking enabled")
# Determine processing parameters
if args.test:
limit = 1
default_output = "test_results"
dual_logger.print("Running in TEST mode - processing 1 layout image...")
elif args.limit:
limit = args.limit
default_output = f"batch_{limit}_results"
dual_logger.print(f"Processing first {limit} layout images...")
elif args.file:
# Process specific file
specific_file = args.file
if not specific_file.endswith('.jpg'):
specific_file += '.jpg'
limit = 1
default_output = f"file_{specific_file[:-4]}_results"
dual_logger.print(f"Processing specific file: {specific_file}...")
else: # args.all
limit = None
specific_file = None
default_output = "full_results"
dual_logger.print("Processing ALL layout images...")
# Set specific_file to None for other modes
if not args.file:
specific_file = None
# Determine which detector to use
if args.hybrid:
analysis_method = "vector similarity" if args.vector_mode else "local analysis"
dual_logger.print(f"Using HYBRID mode with OpenAI panel counting + {analysis_method}")
# Validate hybrid mode arguments
if args.panel_aware_refinement:
dual_logger.warning("--panel-aware-refinement is not needed in hybrid mode (panel analysis is built-in), ignoring...")
if args.one_at_a_time:
dual_logger.warning("--one-at-a-time is handled automatically in hybrid mode, ignoring...")
if args.openai:
dual_logger.error("--hybrid cannot be used with --openai")
return 1
if args.split and args.split_advanced:
dual_logger.error("Cannot use both --split and --split-advanced at the same time")
return 1
if args.split_simple and not args.hybrid:
dual_logger.error("--split-simple can only be used with --hybrid mode")
return 1
if args.split_simple and (args.split or args.split_advanced):
dual_logger.error("--split-simple cannot be used with --split or --split-advanced")
return 1
# Handle image processing options for hybrid mode
if args.enable_greyscale:
enable_greyscale = True
dual_logger.print("Greyscale processing enabled (override)")
else:
enable_greyscale = False
dual_logger.print("Greyscale processing disabled (default for hybrid mode)")
if args.enable_contrast:
enable_contrast_enhancement = True
dual_logger.print("Contrast enhancement enabled (override)")
else:
enable_contrast_enhancement = False
dual_logger.print("Contrast enhancement disabled (default for hybrid mode)")
# Determine worker counts for hybrid mode
if args.concurrent_workers is not None:
openai_workers = args.concurrent_workers
local_workers = args.concurrent_workers
else:
openai_workers = args.openai_workers
local_workers = args.local_workers
# Initialize hybrid detector
detector = HybridImageDetector(
panel_threshold=args.panel_threshold,
inlier_threshold=args.inlier_threshold,
inlier_ratio_threshold=args.inlier_ratio_threshold,
enable_greyscale=enable_greyscale,
enable_contrast_enhancement=enable_contrast_enhancement,
contrast_factor=args.contrast_factor,
refinement_mode=args.refinement_mode,
openai_workers=openai_workers,
local_workers=local_workers,
split_mode=args.split,
split_advanced=args.split_advanced,
split_simple=args.split_simple,
percentile=args.percentile,
min_gap=args.min_gap,
vector_mode=args.vector_mode,
similarity_threshold=args.similarity_threshold,
fallback_one_at_a_time=args.fallback_one_at_a_time,
parallel_layouts=args.parallel_layouts,
layout_workers=args.layout_workers,
max_concurrent_layouts=args.max_concurrent_layouts,
no_truncation=args.no_truncation
)
# Add hybrid mode suffix to default output name
if not args.output:
default_output += "_hybrid"
if args.panel_threshold != 2:
default_output += f"_threshold{args.panel_threshold}"
if args.refinement_mode:
default_output += "_refined"
if args.split:
default_output += "_split"
if args.split_advanced:
default_output += "_split_advanced"
if args.split_simple:
default_output += "_split_simple"
if args.vector_mode:
default_output += "_vector"
if args.fallback_one_at_a_time:
default_output += "_fallback"
if args.parallel_layouts:
default_output += "_parallel"
output_name = args.output if args.output else default_output
dual_logger.print(f"Results will be saved as: {output_name}.json")
analysis_method = "vector similarity" if args.vector_mode else "local analysis"
dual_logger.print(f"Panel threshold: ≤{args.panel_threshold} panels → {analysis_method}, ≥{args.panel_threshold + 1} panels → split + {analysis_method}")
dual_logger.print(f"Inlier threshold: {args.inlier_threshold}")
dual_logger.print(f"CEN refinement: {'enabled' if args.refinement_mode else 'disabled'}")
dual_logger.print(f"Vector mode: {'enabled' if args.vector_mode else 'disabled'}")
if args.vector_mode:
dual_logger.print(f"Similarity threshold: {args.similarity_threshold}")
dual_logger.print(f"Fallback one-at-a-time: {'enabled' if args.fallback_one_at_a_time else 'disabled'}")
dual_logger.print(f"Parallel layouts: {'enabled' if args.parallel_layouts else 'disabled'}")
if args.parallel_layouts:
dual_logger.print(f"Layout workers: {detector.layout_workers}")
dual_logger.print(f"Max concurrent layouts: {detector.max_concurrent_layouts}")
dual_logger.print(f"OpenAI workers: {detector.openai_workers}")
dual_logger.print(f"Local workers: {detector.local_workers}")
dual_logger.print("-" * 60)
try:
# Use parallel processing if requested
if args.parallel_layouts:
results = detector.process_all_layouts_hybrid_parallel(limit=limit, specific_file=specific_file)
else:
results = detector.process_all_layouts_hybrid(limit=limit, specific_file=specific_file)
output_file = detector.save_results(results, output_name)
summary = detector.generate_summary(results)
dual_logger.print("\n" + "="*60)
dual_logger.print("HYBRID PROCESSING SUMMARY")
dual_logger.print("="*60)
dual_logger.print(f"Total layouts processed: {summary['total_layouts_processed']}")
dual_logger.print(f"Layouts with matches: {summary['layouts_with_matches']}")
dual_logger.print(f"Layouts without matches: {summary['layouts_without_matches']}")
dual_logger.print(f"Local analysis used: {summary['local_analysis_used']} ({summary['local_analysis_percentage']}%)")
dual_logger.print(f"Split + inlier analysis used: {summary['split_analysis_used']} ({summary['split_analysis_percentage']}%)")
dual_logger.print(f"Panel threshold: ≤{summary['panel_threshold']}")
dual_logger.print(f"Inlier threshold: {summary['inlier_threshold']}")
if summary['total_duplicates_removed'] > 0:
dual_logger.print("\nDEDUPLICATION RESULTS:")
dual_logger.print(f"Layouts with duplicates removed: {summary['layouts_with_deduplication']}")
dual_logger.print(f"Total duplicate masters removed: {summary['total_duplicates_removed']}")
dual_logger.print(f"Deduplication rate: {summary['deduplication_rate']}%")
if summary['most_used_masters']:
dual_logger.print(f"\nTop {min(10, len(summary['most_used_masters']))} most frequently detected masters:")
for master_id, count in summary['most_used_masters']:
dual_logger.print(f" {master_id}.jpg: {count} times")
dual_logger.print(f"\nFull results saved to: {output_file}")
dual_logger.print("="*60)
# Print cost summary if tracking is enabled
if args.enable_cost_tracking:
cost_calculator.print_cost_summary()
# Generate cost report if requested
if args.cost_report:
cost_report_file = cost_calculator.save_cost_report()
if cost_report_file:
dual_logger.print(f"Cost report saved to: {cost_report_file}")
# Show cost estimate if requested
if args.cost_estimate:
estimate = cost_calculator.estimate_monthly_cost(args.cost_estimate)
if 'error' not in estimate:
dual_logger.print("\nMONTHLY COST ESTIMATE")
dual_logger.print(f"Based on {estimate['based_on_layouts']} processed layouts:")
dual_logger.print(f" Average cost per layout: ${estimate['average_cost_per_layout']:.4f}")
dual_logger.print(f" Estimated monthly cost ({estimate['layouts_per_month']} layouts): ${estimate['estimated_monthly_cost']:.2f}")
dual_logger.print(f" Estimated annual cost: ${estimate['estimated_annual_cost']:.2f}")
except KeyboardInterrupt:
dual_logger.print("\n\nProcessing interrupted by user.")
dual_logger.print("Partial results may have been saved automatically.")
except Exception as e:
dual_logger.error(f"\nError during hybrid processing: {e}")
log_exception(logger)
return 1
finally:
detector.cleanup_temp_files()
elif args.vector_mode and not args.hybrid:
dual_logger.print("Using VECTOR EMBEDDING mode with Google Vertex AI")
# Validate vector mode arguments
if args.refinement_mode:
dual_logger.warning("--refinement-mode is not supported in vector mode, ignoring...")
if args.one_at_a_time:
dual_logger.warning("--one-at-a-time is not applicable in vector mode, ignoring...")
if args.panel_aware_refinement:
dual_logger.error("--panel-aware-refinement is only supported with --openai --one-at-a-time mode")
return 1
if args.split_advanced:
dual_logger.warning("--split-advanced is only supported in hybrid mode, ignoring...")
# Warn only when the user explicitly changed an image processing option
if args.no_greyscale or args.no_contrast or args.contrast_factor != 1.5:
dual_logger.warning("Image processing options (greyscale, contrast) are not used in vector mode, ignoring...")
# Initialize vector detector
detector = VectorImageDetector(
similarity_threshold=args.similarity_threshold,
splitting_mode=args.splitting_mode,
min_crop_size=args.min_crop_size,
crop_padding=args.crop_padding,
split_mode=args.split
)
# Add vector mode suffix to default output name
if not args.output:
default_output += "_vector"
if args.splitting_mode != "none":
default_output += f"_{args.splitting_mode}"
if args.similarity_threshold != 0.75:
default_output += f"_thresh{args.similarity_threshold}"
if args.split:
default_output += "_split"
output_name = args.output if args.output else default_output
dual_logger.print(f"Results will be saved as: {output_name}.json")
dual_logger.print(f"Similarity threshold: {args.similarity_threshold}")
dual_logger.print(f"Splitting mode: {args.splitting_mode}")
if args.splitting_mode != "none":
dual_logger.print(f"Min crop size: {args.min_crop_size}px, Crop padding: {args.crop_padding}px")
dual_logger.print("-" * 60)
try:
results = detector.process_all_layouts_vector(limit=limit, specific_file=specific_file)
output_file = detector.save_results(results, output_name)
summary = detector.generate_summary(results)
dual_logger.print("\n" + "="*60)
dual_logger.print("VECTOR PROCESSING SUMMARY")
dual_logger.print("="*60)
dual_logger.print(f"Total layouts processed: {summary['total_layouts_processed']}")
dual_logger.print(f"Layouts with matches: {summary['layouts_with_matches']}")
dual_logger.print(f"Layouts without matches: {summary['layouts_without_matches']}")
dual_logger.print(f"Similarity threshold: {summary['similarity_threshold']}")
dual_logger.print(f"Embedding dimensions: {summary['embedding_dimensions']}")
if summary['most_used_masters']:
dual_logger.print(f"\nTop {min(10, len(summary['most_used_masters']))} most frequently detected masters:")
for master_id, count in summary['most_used_masters']:
dual_logger.print(f" {master_id}.jpg: {count} times")
dual_logger.print(f"\nFull results saved to: {output_file}")
dual_logger.print("="*60)
except KeyboardInterrupt:
dual_logger.print("\n\nProcessing interrupted by user.")
dual_logger.print("Partial results may have been saved automatically.")
except Exception as e:
dual_logger.error(f"\nError during vector processing: {e}")
log_exception(logger)
return 1
elif args.openai:
dual_logger.print("Using OPENAI LLM mode with o3 model")
# Validate panel-aware refinement requirements
if args.panel_aware_refinement:
if not args.one_at_a_time:
dual_logger.error("--panel-aware-refinement requires --one-at-a-time mode")
return 1
dual_logger.print("Panel-aware refinement ENABLED - will count panels and refine matches using OpenCV inlier analysis")
if args.split_advanced:
dual_logger.warning("--split-advanced is only supported in hybrid mode, ignoring...")
# Initialize OpenAI detector with image processing settings
detector = OpenAIImageDetector(
enable_greyscale=not args.no_greyscale,
enable_contrast_enhancement=not args.no_contrast,
contrast_factor=args.contrast_factor,
refinement_mode=args.refinement_mode,
one_at_a_time_mode=args.one_at_a_time,
max_concurrent_workers=args.concurrent_workers,
panel_aware_refinement=args.panel_aware_refinement,
split_mode=args.split
)
# Add mode suffix to default output name
if not args.output:
default_output += "_openai"
if args.one_at_a_time:
default_output += "_one_at_a_time"
if args.panel_aware_refinement:
default_output += "_panel_aware"
if args.split:
default_output += "_split"
output_name = args.output if args.output else default_output
dual_logger.print(f"Results will be saved as: {output_name}.json")
if args.one_at_a_time:
dual_logger.print(f"Concurrent processes for one-at-a-time mode: {args.concurrent_workers}")
if args.concurrent_workers and args.concurrent_workers > 10:
dual_logger.print("WARNING: High concurrency (>10) may cause API rate limits!")
dual_logger.print("Recommended range: 3-8 processes for stable performance.")
dual_logger.print("-" * 60)
try:
results = detector.process_all_layouts(limit=limit, specific_file=specific_file)
output_file = detector.save_results(results, output_name)
summary = detector.generate_summary(results)
dual_logger.print("\n" + "="*60)
dual_logger.print("OPENAI PROCESSING SUMMARY")
dual_logger.print("="*60)
dual_logger.print(f"Total layouts processed: {summary['total_layouts_processed']}")
dual_logger.print(f"Layouts with matches: {summary['layouts_with_matches']}")
dual_logger.print(f"Layouts without matches: {summary['layouts_without_matches']}")
dual_logger.print(f"Provider: {summary['provider']}")
dual_logger.print(f"Model: {summary['model']}")
if 'total_duplicates_removed' in summary and summary['total_duplicates_removed'] > 0:
dual_logger.print("\nDEDUPLICATION RESULTS:")
dual_logger.print(f"Layouts with duplicates removed: {summary['layouts_with_deduplication']}")
dual_logger.print(f"Total duplicate masters removed: {summary['total_duplicates_removed']}")
dual_logger.print(f"Deduplication rate: {summary['deduplication_rate']}%")
if summary['most_used_masters']:
dual_logger.print(f"\nTop {min(10, len(summary['most_used_masters']))} most frequently detected masters:")
for master_id, count in summary['most_used_masters']:
dual_logger.print(f" {master_id}.jpg: {count} times")
dual_logger.print(f"\nFull results saved to: {output_file}")
dual_logger.print("="*60)
except KeyboardInterrupt:
dual_logger.print("\n\nProcessing interrupted by user.")
dual_logger.print("Partial results may have been saved automatically.")
except Exception as e:
dual_logger.error(f"\nError during OpenAI processing: {e}")
log_exception(logger)
return 1
finally:
detector.cleanup_temp_files()
else:
dual_logger.print("Using GEMINI LLM mode")
# Validate panel-aware refinement requirements
if args.panel_aware_refinement:
dual_logger.error("--panel-aware-refinement is only supported with --openai mode")
return 1
if args.split_advanced:
dual_logger.warning("--split-advanced is only supported in hybrid mode, ignoring...")
# Initialize Gemini detector with image processing settings
detector = ImageDetector(
enable_greyscale=not args.no_greyscale,
enable_contrast_enhancement=not args.no_contrast,
contrast_factor=args.contrast_factor,
refinement_mode=args.refinement_mode,
one_at_a_time_mode=args.one_at_a_time,
max_concurrent_workers=args.concurrent_workers,
split_mode=args.split
)
# Add mode suffix to default output name
if not args.output:
if args.one_at_a_time:
default_output += "_one_at_a_time"
if args.refinement_mode:
default_output += "_refined"
if args.split:
default_output += "_split"
output_name = args.output if args.output else default_output
dual_logger.print(f"Results will be saved as: {output_name}.json")
if args.one_at_a_time:
dual_logger.print(f"Concurrent processes for one-at-a-time mode: {args.concurrent_workers}")
if args.concurrent_workers and args.concurrent_workers > 10:
dual_logger.print("WARNING: High concurrency (>10) may cause API rate limits!")
dual_logger.print("Recommended range: 3-8 processes for stable performance.")
dual_logger.print("-" * 60)
try:
results = detector.process_all_layouts(limit=limit, specific_file=specific_file)
output_file = detector.save_results(results, output_name)
summary = detector.generate_summary(results)
dual_logger.print("\n" + "="*60)
dual_logger.print("GEMINI PROCESSING SUMMARY")
dual_logger.print("="*60)
dual_logger.print(f"Total layouts processed: {summary['total_layouts_processed']}")
dual_logger.print(f"Layouts with matches: {summary['layouts_with_matches']}")
dual_logger.print(f"Layouts without matches: {summary['layouts_without_matches']}")
if 'total_duplicates_removed' in summary and summary['total_duplicates_removed'] > 0:
dual_logger.print("\nDEDUPLICATION RESULTS:")
dual_logger.print(f"Layouts with duplicates removed: {summary['layouts_with_deduplication']}")
dual_logger.print(f"Total duplicate masters removed: {summary['total_duplicates_removed']}")
dual_logger.print(f"Deduplication rate: {summary['deduplication_rate']}%")
if summary['most_used_masters']:
dual_logger.print(f"\nTop {min(10, len(summary['most_used_masters']))} most frequently detected masters:")
for master_id, count in summary['most_used_masters']:
dual_logger.print(f" {master_id}.jpg: {count} times")
dual_logger.print(f"\nFull results saved to: {output_file}")
dual_logger.print("="*60)
except KeyboardInterrupt:
dual_logger.print("\n\nProcessing interrupted by user.")
dual_logger.print("Partial results may have been saved automatically.")
except Exception as e:
dual_logger.error(f"\nError during Gemini processing: {e}")
log_exception(logger)
return 1
finally:
detector.cleanup_temp_files()
return 0
except Exception as e:
dual_logger.error(f"Unexpected error: {e}")
log_exception(logger)
return 1
if __name__ == "__main__":
# Required for multiprocessing on macOS and Windows
multiprocessing.set_start_method('spawn', force=True)
exit(main())
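The hand-written checks in `main()` that reject combinations of `--split`, `--split-advanced`, and `--split-simple` could, as one alternative, be delegated to `argparse` itself. The following is a minimal sketch under that assumption, not the script's actual parser: the function names `build_split_parser` and `parse_split_args` are hypothetical, and only the flags relevant to splitting are reproduced.

```python
import argparse


def build_split_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the split-related flags (illustrative)."""
    parser = argparse.ArgumentParser(description="Sketch: exclusive split flags")
    parser.add_argument('--hybrid', action='store_true')
    # argparse rejects any command line that sets more than one flag in this group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--split', action='store_true')
    group.add_argument('--split-advanced', action='store_true')
    group.add_argument('--split-simple', action='store_true')
    return parser


def parse_split_args(argv):
    """Parse argv and apply the one cross-flag rule a group cannot express."""
    parser = build_split_parser()
    args = parser.parse_args(argv)
    if args.split_simple and not args.hybrid:
        # parser.error() prints usage and exits with status 2.
        parser.error("--split-simple can only be used with --hybrid mode")
    return args


if __name__ == "__main__":
    print(parse_split_args(['--hybrid', '--split-simple']))
```

Note that this would change behavior outside hybrid mode: the original script only warns and ignores `--split-advanced` in the vector, OpenAI, and Gemini branches, whereas a mutually exclusive group rejects the combination with `--split` in every mode.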

440
cost_calculator.py Normal file

@ -0,0 +1,440 @@
#!/usr/bin/env python3
"""
Cost Calculator Module
Tracks OpenAI API usage and calculates costs for the master image detection application
"""
import json
import time
from datetime import datetime
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, asdict
from pathlib import Path
@dataclass
class TokenUsage:
"""Data class to track token usage for a single API call"""
prompt_tokens: int
completion_tokens: int
total_tokens: int
cached_tokens: int = 0
def __post_init__(self):
"""Validate token counts"""
if self.prompt_tokens < 0 or self.completion_tokens < 0:
raise ValueError("Token counts cannot be negative")
if self.total_tokens != self.prompt_tokens + self.completion_tokens:
raise ValueError("Total tokens must equal prompt + completion tokens")
@dataclass
class ApiCallCost:
"""Data class to track cost information for a single API call"""
operation_type: str
timestamp: str
token_usage: TokenUsage
input_cost: float
output_cost: float
cached_cost: float
total_cost: float
layout_name: str = ""
master_id: str = ""
def to_dict(self) -> Dict:
"""Convert to dictionary for JSON serialization"""
return asdict(self)
@dataclass
class LayoutCostSummary:
"""Data class to track cost summary for a single layout"""
layout_name: str
total_cost: float
total_input_tokens: int
total_output_tokens: int
total_cached_tokens: int
api_calls_made: int
operation_types: List[str]
processing_time: float = 0.0
detected_masters: Optional[List[str]] = None
def __post_init__(self):
if self.detected_masters is None:
self.detected_masters = []
def to_dict(self) -> Dict:
"""Convert to dictionary for JSON serialization"""
return asdict(self)
class CostCalculator:
"""
Main cost calculator class for tracking OpenAI API usage and costs
"""
# OpenAI o3 pricing as of 2025 (USD per million tokens)
INPUT_COST_PER_MILLION = 2.00
CACHED_INPUT_COST_PER_MILLION = 0.50
OUTPUT_COST_PER_MILLION = 8.00
def __init__(self, enable_tracking: bool = True):
"""
Initialize the cost calculator
Args:
enable_tracking: Whether to enable cost tracking (default: True)
"""
self.enable_tracking = enable_tracking
self.api_calls: List[ApiCallCost] = []
self.layout_costs: Dict[str, LayoutCostSummary] = {}
self.session_start_time = time.time()
# Session totals
self.total_input_tokens = 0
self.total_output_tokens = 0
self.total_cached_tokens = 0
self.total_cost = 0.0
self.total_api_calls = 0
# Only print initialization message once and only in main process
import multiprocessing
if multiprocessing.current_process().name == 'MainProcess':
if not hasattr(CostCalculator, '_main_process_initialized'):
CostCalculator._main_process_initialized = True
print(f"Cost Calculator initialized (tracking: {'enabled' if enable_tracking else 'disabled'})")
if enable_tracking:
print("Current OpenAI o3 pricing:")
print(f" Input tokens: ${self.INPUT_COST_PER_MILLION:.2f} per million")
print(f" Cached input: ${self.CACHED_INPUT_COST_PER_MILLION:.2f} per million")
print(f" Output tokens: ${self.OUTPUT_COST_PER_MILLION:.2f} per million")
def calculate_cost(self, prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> Tuple[float, float, float, float]:
"""
Calculate cost for a single API call
Args:
prompt_tokens: Number of input tokens
completion_tokens: Number of output tokens
cached_tokens: Number of cached input tokens
Returns:
Tuple of (input_cost, output_cost, cached_cost, total_cost)
"""
if not self.enable_tracking:
return 0.0, 0.0, 0.0, 0.0
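# Calculate costs. OpenAI reports cached tokens as a subset of prompt_tokens,
# so only the uncached remainder is billed at the full input rate.
uncached_tokens = max(prompt_tokens - cached_tokens, 0)
input_cost = (uncached_tokens * self.INPUT_COST_PER_MILLION) / 1_000_000
output_cost = (completion_tokens * self.OUTPUT_COST_PER_MILLION) / 1_000_000
cached_cost = (cached_tokens * self.CACHED_INPUT_COST_PER_MILLION) / 1_000_000
total_cost = input_cost + output_cost + cached_cost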
return input_cost, output_cost, cached_cost, total_cost
def track_api_call(self, operation_type: str, prompt_tokens: int, completion_tokens: int,
cached_tokens: int = 0, layout_name: str = "", master_id: str = "") -> ApiCallCost:
"""
Track a single API call and calculate its cost
Args:
operation_type: Type of operation (e.g., 'panel_counting', 'detection', 'fallback')
prompt_tokens: Number of input tokens
completion_tokens: Number of output tokens
cached_tokens: Number of cached input tokens
layout_name: Name of the layout being processed
master_id: ID of the master image (if applicable)
Returns:
ApiCallCost object with tracking information
"""
if not self.enable_tracking:
# Return dummy cost object when tracking is disabled
return ApiCallCost(
operation_type=operation_type,
timestamp=datetime.now().isoformat(),
token_usage=TokenUsage(0, 0, 0, 0),
input_cost=0.0,
output_cost=0.0,
cached_cost=0.0,
total_cost=0.0,
layout_name=layout_name,
master_id=master_id
)
# Create token usage object
token_usage = TokenUsage(
prompt_tokens=prompt_tokens,
completion_tokens=completion_tokens,
total_tokens=prompt_tokens + completion_tokens,
cached_tokens=cached_tokens
)
# Calculate costs
input_cost, output_cost, cached_cost, total_cost = self.calculate_cost(
prompt_tokens, completion_tokens, cached_tokens
)
# Create cost tracking object
api_call_cost = ApiCallCost(
operation_type=operation_type,
timestamp=datetime.now().isoformat(),
token_usage=token_usage,
input_cost=input_cost,
output_cost=output_cost,
cached_cost=cached_cost,
total_cost=total_cost,
layout_name=layout_name,
master_id=master_id
)
# Add to tracking
self.api_calls.append(api_call_cost)
# Update session totals
self.total_input_tokens += prompt_tokens
self.total_output_tokens += completion_tokens
self.total_cached_tokens += cached_tokens
self.total_cost += total_cost
self.total_api_calls += 1
# Update layout-specific tracking
if layout_name:
self._update_layout_cost(layout_name, api_call_cost)
return api_call_cost
def _update_layout_cost(self, layout_name: str, api_call_cost: ApiCallCost):
"""Update cost tracking for a specific layout"""
if layout_name not in self.layout_costs:
self.layout_costs[layout_name] = LayoutCostSummary(
layout_name=layout_name,
total_cost=0.0,
total_input_tokens=0,
total_output_tokens=0,
total_cached_tokens=0,
api_calls_made=0,
operation_types=[]
)
layout_summary = self.layout_costs[layout_name]
layout_summary.total_cost += api_call_cost.total_cost
layout_summary.total_input_tokens += api_call_cost.token_usage.prompt_tokens
layout_summary.total_output_tokens += api_call_cost.token_usage.completion_tokens
layout_summary.total_cached_tokens += api_call_cost.token_usage.cached_tokens
layout_summary.api_calls_made += 1
if api_call_cost.operation_type not in layout_summary.operation_types:
layout_summary.operation_types.append(api_call_cost.operation_type)
def get_layout_cost_breakdown(self, layout_name: str) -> Optional[Dict]:
"""
Get detailed cost breakdown for a specific layout
Args:
layout_name: Name of the layout
Returns:
Dictionary with cost breakdown or None if layout not found
"""
if not self.enable_tracking or layout_name not in self.layout_costs:
return None
layout_summary = self.layout_costs[layout_name]
return {
'layout_name': layout_name,
'total_cost': round(layout_summary.total_cost, 4),
'cost_breakdown': {
'input_tokens': layout_summary.total_input_tokens,
'output_tokens': layout_summary.total_output_tokens,
'cached_tokens': layout_summary.total_cached_tokens,
'api_calls_made': layout_summary.api_calls_made,
'operation_types': layout_summary.operation_types
},
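# Note: despite the key name, each value is the layout's total cost divided by
# that token count and scaled to 1,000 tokens, not a true per-token price.
'cost_per_token': {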
'input': round(layout_summary.total_cost / max(layout_summary.total_input_tokens, 1) * 1000, 4),
'output': round(layout_summary.total_cost / max(layout_summary.total_output_tokens, 1) * 1000, 4)
}
}
def get_session_summary(self) -> Dict:
"""
Get summary of costs for the entire session
Returns:
Dictionary with session cost summary
"""
if not self.enable_tracking:
return {
'tracking_enabled': False,
'message': 'Cost tracking is disabled'
}
session_duration = time.time() - self.session_start_time
layouts_processed = len(self.layout_costs)
# Calculate averages
avg_cost_per_layout = self.total_cost / max(layouts_processed, 1)
avg_tokens_per_layout = (self.total_input_tokens + self.total_output_tokens) / max(layouts_processed, 1)
avg_api_calls_per_layout = self.total_api_calls / max(layouts_processed, 1)
# Calculate cost efficiency
total_tokens = self.total_input_tokens + self.total_output_tokens
cost_per_thousand_tokens = (self.total_cost / max(total_tokens, 1)) * 1000
# Operation type breakdown
operation_counts = {}
for api_call in self.api_calls:
op_type = api_call.operation_type
operation_counts[op_type] = operation_counts.get(op_type, 0) + 1
return {
'tracking_enabled': True,
'session_totals': {
'total_cost': round(self.total_cost, 4),
'total_input_tokens': self.total_input_tokens,
'total_output_tokens': self.total_output_tokens,
'total_cached_tokens': self.total_cached_tokens,
'total_api_calls': self.total_api_calls,
'layouts_processed': layouts_processed,
'session_duration_minutes': round(session_duration / 60, 2)
},
'averages': {
'cost_per_layout': round(avg_cost_per_layout, 4),
'tokens_per_layout': round(avg_tokens_per_layout, 1),
'api_calls_per_layout': round(avg_api_calls_per_layout, 1),
'cost_per_thousand_tokens': round(cost_per_thousand_tokens, 4)
},
'operation_breakdown': operation_counts,
'pricing_info': {
'input_cost_per_million': self.INPUT_COST_PER_MILLION,
'output_cost_per_million': self.OUTPUT_COST_PER_MILLION,
'cached_input_cost_per_million': self.CACHED_INPUT_COST_PER_MILLION
}
}
def save_cost_report(self, filename: Optional[str] = None) -> str:
"""
Save detailed cost report to JSON file
Args:
filename: Output filename (optional)
Returns:
Path to saved file
"""
if not self.enable_tracking:
print("Cost tracking is disabled, no report to save")
return ""
if filename is None:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"cost_report_{timestamp}.json"
# Ensure .json extension
if not filename.endswith('.json'):
filename += '.json'
output_path = Path("results") / filename
output_path.parent.mkdir(exist_ok=True)
report_data = {
'session_summary': self.get_session_summary(),
'layout_costs': {name: summary.to_dict() for name, summary in self.layout_costs.items()},
'detailed_api_calls': [call.to_dict() for call in self.api_calls],
'generated_at': datetime.now().isoformat(),
'pricing_model': 'OpenAI o3'
}
with open(output_path, 'w') as f:
json.dump(report_data, f, indent=2)
print(f"Cost report saved to: {output_path}")
return str(output_path)
def print_cost_summary(self):
"""Print a formatted cost summary to console"""
if not self.enable_tracking:
print("Cost tracking is disabled")
return
summary = self.get_session_summary()
print("\n" + "="*60)
print("COST TRACKING SUMMARY")
print("="*60)
session = summary['session_totals']
averages = summary['averages']
print(f"Total cost: ${session['total_cost']:.4f}")
print(f"Total tokens: {session['total_input_tokens'] + session['total_output_tokens']:,}")
print(f" - Input tokens: {session['total_input_tokens']:,}")
print(f" - Output tokens: {session['total_output_tokens']:,}")
print(f" - Cached tokens: {session['total_cached_tokens']:,}")
print(f"Total API calls: {session['total_api_calls']}")
print(f"Layouts processed: {session['layouts_processed']}")
print("\nAverages:")
print(f" - Cost per layout: ${averages['cost_per_layout']:.4f}")
print(f" - Tokens per layout: {averages['tokens_per_layout']:.1f}")
print(f" - API calls per layout: {averages['api_calls_per_layout']:.1f}")
print(f" - Cost per 1K tokens: ${averages['cost_per_thousand_tokens']:.4f}")
if summary['operation_breakdown']:
print("\nOperation breakdown:")
for op_type, count in summary['operation_breakdown'].items():
print(f" - {op_type}: {count} calls")
print("="*60)
def estimate_monthly_cost(self, layouts_per_month: int = 300) -> Dict:
"""
Estimate monthly cost based on current usage patterns
Args:
layouts_per_month: Estimated number of layouts to process per month
Returns:
Dictionary with cost estimates
"""
if not self.enable_tracking or len(self.layout_costs) == 0:
return {'error': 'No cost data available for estimation'}
avg_cost_per_layout = self.total_cost / len(self.layout_costs)
estimated_monthly_cost = avg_cost_per_layout * layouts_per_month
return {
'average_cost_per_layout': round(avg_cost_per_layout, 4),
'layouts_per_month': layouts_per_month,
'estimated_monthly_cost': round(estimated_monthly_cost, 2),
'estimated_annual_cost': round(estimated_monthly_cost * 12, 2),
'based_on_layouts': len(self.layout_costs)
}
def extract_token_usage_from_response(response) -> TokenUsage:
"""
Extract token usage from OpenAI API response
Args:
response: OpenAI API response object
Returns:
TokenUsage object with extracted token counts
"""
if not hasattr(response, 'usage') or response.usage is None:
# Fallback if usage information is not available
return TokenUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0, cached_tokens=0)
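usage = response.usage
# Chat Completions responses report cache hits under usage.prompt_tokens_details.cached_tokens;
# fall back to a top-level attribute, then to 0, if that field is absent.
details = getattr(usage, 'prompt_tokens_details', None)
cached_tokens = getattr(details, 'cached_tokens', None) or getattr(usage, 'cached_tokens', 0) or 0
return TokenUsage(
prompt_tokens=usage.prompt_tokens,
completion_tokens=usage.completion_tokens,
total_tokens=usage.total_tokens,
cached_tokens=cached_tokens
)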
# Global cost calculator instance (can be configured from CLI)
cost_calculator = CostCalculator(enable_tracking=False) # Disabled by default
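As a worked illustration of the arithmetic this module performs, the sketch below recomputes a per-call cost and a monthly projection outside the class. The rates are copied from the `CostCalculator` constants; the function names `estimate_call_cost` and `estimate_monthly` are hypothetical, and the sketch assumes that cached tokens are counted within `prompt_tokens` (as the OpenAI Chat Completions usage object reports them), so only the uncached remainder is billed at the full input rate.

```python
from typing import Sequence

# Rates in USD per million tokens, copied from the CostCalculator constants.
INPUT_COST_PER_MILLION = 2.00
CACHED_INPUT_COST_PER_MILLION = 0.50
OUTPUT_COST_PER_MILLION = 8.00


def estimate_call_cost(prompt_tokens: int, completion_tokens: int,
                       cached_tokens: int = 0) -> float:
    """Return the USD cost of one API call (illustrative helper)."""
    # Assumption: cached_tokens is a subset of prompt_tokens.
    uncached = max(prompt_tokens - cached_tokens, 0)
    total_micro_dollars = (
        uncached * INPUT_COST_PER_MILLION
        + cached_tokens * CACHED_INPUT_COST_PER_MILLION
        + completion_tokens * OUTPUT_COST_PER_MILLION
    )
    return total_micro_dollars / 1_000_000


def estimate_monthly(per_layout_costs: Sequence[float],
                     layouts_per_month: int = 300) -> float:
    """Project a monthly cost from the average observed per-layout cost."""
    if not per_layout_costs:
        raise ValueError("No cost data available for estimation")
    average = sum(per_layout_costs) / len(per_layout_costs)
    return average * layouts_per_month


if __name__ == "__main__":
    # 6,000 uncached + 4,000 cached input tokens, 1,000 output tokens.
    print(f"${estimate_call_cost(10_000, 1_000, 4_000):.4f}")
```

For the call shown, the three terms are 6,000 × $2.00, 4,000 × $0.50, and 1,000 × $8.00 per million tokens, which sums to $0.022.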


@ -0,0 +1,311 @@
import sys
import subprocess
import tkinter as tk
from tkinter import messagebox, filedialog, scrolledtext, ttk
import json
import csv
import threading
import time
import os
try:
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont
except ImportError:
root = tk.Tk()
root.withdraw()
if messagebox.askyesno("Dependency Error", "Required libraries are not installed. Would you like to try and install them now?"):
try:
subprocess.check_call([sys.executable, "-m", "pip", "install", "opencv-python", "numpy", "Pillow"])
messagebox.showinfo("Success", "Dependencies installed successfully. Please restart the application.")
except Exception as e:
messagebox.showerror("Installation Failed", f"Could not install dependencies. Please run 'pip install -r requirements.txt' manually.\n\nError: {e}")
root.destroy()
sys.exit()
class ToolTip:
    """Show a small pop-up hint while the pointer hovers over a widget."""

    def __init__(self, widget, text):
        self.widget = widget
        self.text = text
        self.tooltip = None
        self.widget.bind("<Enter>", self.enter)
        self.widget.bind("<Leave>", self.leave)

    def enter(self, event=None):
        if self.tooltip:
            return  # Already showing
        # Position relative to the widget's screen coordinates. bbox("insert")
        # is only meaningful for text widgets, so it is not used here.
        x = self.widget.winfo_rootx() + 25
        y = self.widget.winfo_rooty() + 25
        self.tooltip = tk.Toplevel(self.widget)
        self.tooltip.wm_overrideredirect(True)
        self.tooltip.wm_geometry(f"+{x}+{y}")
        label = tk.Label(self.tooltip, text=self.text, background="#ffffe0", relief="solid", borderwidth=1, wraplength=200)
        label.pack()

    def leave(self, event=None):
        if self.tooltip:
            self.tooltip.destroy()
            self.tooltip = None
class MasterImageFinderApp:
def __init__(self, root):
self.root = root
self.root.title("Master Image Finder")
self.root.geometry("800x700")
self.layouts_path = tk.StringVar()
self.masters_path = tk.StringVar()
self.upscale = tk.BooleanVar()
self.denoise = tk.BooleanVar()
self.sharpen = tk.BooleanVar()
self.contrast = tk.BooleanVar()
tk.Label(root, text="Layouts Folder:").pack(pady=5)
tk.Entry(root, textvariable=self.layouts_path, width=100).pack(pady=5)
tk.Button(root, text="Select Layouts Folder", command=self.select_layouts_folder).pack(pady=5)
tk.Label(root, text="Master Images Folder:").pack(pady=5)
tk.Entry(root, textvariable=self.masters_path, width=100).pack(pady=5)
tk.Button(root, text="Select Master Images Folder", command=self.select_masters_folder).pack(pady=5)
enhancement_frame = tk.LabelFrame(root, text="Advanced Enhancement Options", padx=10, pady=10)
enhancement_frame.pack(pady=10, padx=10, fill="x")
upscale_check = tk.Checkbutton(enhancement_frame, text="Smart Upscaling", variable=self.upscale)
upscale_check.grid(row=0, column=0, sticky="w")
ToolTip(upscale_check, "Enlarges small images to improve feature detection. Best for images under 400x400px.")
denoise_check = tk.Checkbutton(enhancement_frame, text="Denoising", variable=self.denoise)
denoise_check.grid(row=0, column=1, sticky="w", padx=10)
ToolTip(denoise_check, "Removes digital noise and compression artifacts. Can be slow on large images.")
sharpen_check = tk.Checkbutton(enhancement_frame, text="Sharpening", variable=self.sharpen)
sharpen_check.grid(row=1, column=0, sticky="w")
ToolTip(sharpen_check, "Enhances edges and fine details. Very fast.")
contrast_check = tk.Checkbutton(enhancement_frame, text="Contrast Enhancement", variable=self.contrast)
contrast_check.grid(row=1, column=1, sticky="w", padx=10)
ToolTip(contrast_check, "Improves local contrast, making features in dark or washed-out areas more distinct.")
self.run_button = tk.Button(root, text="Find Matches", command=self.run_finder_thread)
self.run_button.pack(pady=20)
self.progress = ttk.Progressbar(root, orient="horizontal", length=780, mode="determinate")
self.progress.pack(pady=10)
self.log_area = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=100, height=15)
self.log_area.pack(pady=10, padx=10)
def select_layouts_folder(self):
self.layouts_path.set(filedialog.askdirectory())
def select_masters_folder(self):
self.masters_path.set(filedialog.askdirectory())
def log(self, message):
self.root.after(0, self._log, message)
def _log(self, message):
self.log_area.insert(tk.END, message + "\n")
self.log_area.see(tk.END)
def update_progress(self, value):
self.root.after(0, self.progress.config, {'value': value})
    def run_finder_thread(self):
        self.run_button.config(state=tk.DISABLED)
        self.log_area.delete(1.0, tk.END)
        self.progress['value'] = 0
        # Run the matching in a background thread so the GUI stays responsive
        thread = threading.Thread(target=self.run_finder)
        thread.start()

    def run_finder(self):
        start_time = time.time()
        layouts_dir = self.layouts_path.get()
        masters_dir = self.masters_path.get()
        try:
            if not layouts_dir or not masters_dir:
                self.log("Error: Please select both folders.")
                return
            output_dir = os.path.join(layouts_dir, "reports")
            if not os.path.isdir(output_dir):
                os.makedirs(output_dir)
                self.log(f"Created output directory at {output_dir}")
            results = self.find_matches(layouts_dir, masters_dir)
            self.create_html_report(results, output_dir, layouts_dir, masters_dir)
            end_time = time.time()
            total_matches = sum(1 for item in results if item['found'])
            self.log("\n--- Process Complete! ---")
            self.log(f"Found matches for {total_matches} out of {len(results)} layout images.")
            self.log(f"Total time: {end_time - start_time:.2f} seconds")
            self.log(f"Reports saved in: {output_dir}")
        except Exception as e:
            self.log(f"An error occurred: {e}")
        finally:
            # Re-enable the button through the Tk event loop, as log() and
            # update_progress() do, instead of touching the widget from this thread
            self.root.after(0, lambda: self.run_button.config(state=tk.NORMAL))
    def find_matches(self, layouts_path, masters_path, min_good_matches=10, inlier_threshold_ratio=0.5):
        akaze = cv2.AKAZE_create()
        bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
        sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
        # Match extensions case-insensitively so files such as "IMG.JPG" are not skipped
        image_exts = ('.png', '.jpg', '.jpeg')
        layout_images = [f for f in os.listdir(layouts_path) if f.lower().endswith(image_exts)]
        master_images = [f for f in os.listdir(masters_path) if f.lower().endswith(image_exts)]
        results = []
        master_descriptors = {}
        self.log("Preprocessing master images...")
        for i, master_image_name in enumerate(master_images):
            self.log(f" - Preprocessing {i+1}/{len(master_images)}: {master_image_name}")
            master_image_path = os.path.join(masters_path, master_image_name)
            master_img = cv2.imread(master_image_path, cv2.IMREAD_GRAYSCALE)
            if master_img is None:
                continue
            kp, des = akaze.detectAndCompute(master_img, None)
            if des is not None:
                master_descriptors[master_image_name] = (kp, des)
        total_layouts = len(layout_images)
        self.log("\nProcessing layout images...")
        for i, layout_image_name in enumerate(layout_images):
            self.update_progress((i / total_layouts) * 100)
            self.log(f" - Processing {i+1}/{total_layouts}: {layout_image_name}")
            layout_image_path = os.path.join(layouts_path, layout_image_name)
            layout_img_gray = cv2.imread(layout_image_path, cv2.IMREAD_GRAYSCALE)
            if layout_img_gray is None:
                self.log(f" - Could not read layout image.")
                continue
            enhancements_applied = []
            if self.upscale.get() and (layout_img_gray.shape[0] < 400 or layout_img_gray.shape[1] < 400):
                layout_img_gray = cv2.resize(layout_img_gray, (0,0), fx=2.0, fy=2.0, interpolation=cv2.INTER_LANCZOS4)
                enhancements_applied.append("Upscaled")
            if self.denoise.get():
                layout_img_gray = cv2.fastNlMeansDenoising(layout_img_gray, None, 10, 7, 21)
                enhancements_applied.append("Denoised")
            if self.sharpen.get():
                layout_img_gray = cv2.filter2D(layout_img_gray, -1, sharpen_kernel)
                enhancements_applied.append("Sharpened")
            if self.contrast.get():
                layout_img_gray = clahe.apply(layout_img_gray)
                enhancements_applied.append("Contrast Enhanced")
            if enhancements_applied:
                self.log(f" - Applied: {', '.join(enhancements_applied)}")
            kp1, des1 = akaze.detectAndCompute(layout_img_gray, None)
            if des1 is None:
                self.log(f" - No features found.")
                results.append({"layout": layout_image_name, "masters": [], "found": False, "enhancements": enhancements_applied})
                continue
            all_possible_matches = []
            for master_image_name, (kp2, des2) in master_descriptors.items():
                matches = bf.knnMatch(des1, des2, k=2)
                # Lowe's ratio test. knnMatch can return fewer than two neighbours
                # for a descriptor, so check each pair before reading both entries.
                good_matches = [pair[0] for pair in matches
                                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
                if len(good_matches) > min_good_matches:
                    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
                    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
                    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
                    if mask is not None:
                        inliers = np.sum(mask)
                        if inliers > min_good_matches:
                            all_possible_matches.append({"master": master_image_name, "inliers": int(inliers)})
            if not all_possible_matches:
                results.append({"layout": layout_image_name, "masters": [], "found": False, "enhancements": enhancements_applied})
                continue
            # Keep the best master plus any master whose inlier count exceeds
            # inlier_threshold_ratio times the best count
            best_match = max(all_possible_matches, key=lambda x: x['inliers'])
            max_inliers = best_match['inliers']
            confident_matches = [best_match]
            for match in all_possible_matches:
                if match != best_match and match['inliers'] > max_inliers * inlier_threshold_ratio:
                    confident_matches.append(match)
            # confident_matches always contains best_match here, so the layout counts as found
            self.log(f" - Found {len(confident_matches)} confident master image(s).")
            results.append({"layout": layout_image_name, "masters": confident_matches, "found": True, "enhancements": enhancements_applied})
        self.update_progress(100)
        return results
def create_html_report(self, data, output_path, layouts_abs_path, masters_abs_path):
report_path = os.path.join(output_path, 'report.html')
total_matches = sum(1 for item in data if item['found'])
total_layouts = len(data)
html = f"""
<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"><title>Image Match Report</title>
<style>
body {{ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; margin: 20px; background-color: #f9f9f9; }}
.container {{ max-width: 1600px; margin: auto; }}
h1, h2 {{ text-align: center; color: #333; border-bottom: 2px solid #eee; padding-bottom: 10px; }}
.summary {{ text-align: center; margin-bottom: 20px; font-size: 1.2em; }}
.card {{ background: #fff; border-radius: 8px; box-shadow: 0 4px 8px rgba(0,0,0,0.1); margin-bottom: 20px; padding: 20px; }}
.layout-container {{ display: flex; gap: 20px; align-items: flex-start; }}
.layout-box {{ flex: 1; text-align: center; }}
.layout-box img {{ max-width: 400px; max-height: 400px; height: auto; border-radius: 4px; cursor: pointer; }}
.masters-grid {{ flex: 2; display: grid; grid-template-columns: repeat(auto-fill, minmax(150px, 1fr)); gap: 15px; }}
.master-box {{ text-align: center; border: 1px solid #ddd; border-radius: 8px; padding: 10px; }}
.master-box img {{ width: 100%; height: 150px; object-fit: cover; border-radius: 4px; cursor: pointer; }}
.filename {{ white-space: nowrap; overflow: hidden; text-overflow: ellipsis; font-weight: bold; }}
.alternatives {{ font-size: 0.8em; color: #666; margin-top: 5px; }}
.unmatched-grid {{ display: grid; grid-template-columns: repeat(auto-fill, minmax(150px, 1fr)); gap: 15px; }}
.unmatched-card img {{ width: 100%; height: 150px; object-fit: cover; }}
.modal {{ display: none; position: fixed; z-index: 1000; padding-top: 60px; left: 0; top: 0; width: 100%; height: 100%; overflow: auto; background-color: rgba(0,0,0,0.9); }}
.modal-content {{ margin: auto; display: block; max-width: 90%; max-height: 90vh; }}
.close {{ position: absolute; top: 15px; right: 35px; color: #f1f1f1; font-size: 40px; font-weight: bold; cursor: pointer; }}
</style>
</head><body><div class="container"><h1>Image Match Report</h1><div class="summary">Found matches for {total_matches} out of {total_layouts} layout images.</div>"""
html += "<h2>Matched Layouts</h2>"
for item in data:
if item['found']:
layout_img_path = os.path.join(layouts_abs_path, item['layout']).replace('\\', '/')
enhancements_str = f"<p class='enhancements'>Enhancements: {', '.join(item['enhancements'])}</p>" if item['enhancements'] else ""
html += f"<div class='card'><div class='layout-container'>"
html += f"<div class='layout-box'><h3>Layout Image</h3><img src='file:///{layout_img_path}' onclick='openModal(this.src)'><p class='filename' title='{item['layout']}'>{item['layout']}</p>{enhancements_str}</div>"
html += "<div class='masters-grid'>"
for master_item in item['masters']:
master_img_path = os.path.join(masters_abs_path, master_item['master']).replace('\\', '/')
html += f"<div class='master-box'><img src='file:///{master_img_path}' onclick='openModal(this.src)'><p class='filename' title='{master_item['master']}'>{master_item['master']}</p><p>({master_item['inliers']} inliers)</p></div>"
html += "</div></div></div>"
html += "<h2>Unmatched Layouts</h2><p style='text-align:center;'>Please review these manually.</p><div class='unmatched-grid'>"
for item in data:
if not item['found']:
layout_img_path = os.path.join(layouts_abs_path, item['layout']).replace('\\', '/')
html += f"<div class='card unmatched-card'><img src='file:///{layout_img_path}' onclick='openModal(this.src)'><p class='filename' title='{item['layout']}'>{item['layout']}</p></div>"
html += """</div></div>
<div id="myModal" class="modal"><span class="close" onclick="closeModal()">&times;</span><img class="modal-content" id="img01"></div>
<script>
var modal = document.getElementById("myModal");
var modalImg = document.getElementById("img01");
function openModal(src) { modal.style.display = "block"; modalImg.src = src; }
function closeModal() { modal.style.display = "none"; }
</script>
</body></html>"""
with open(report_path, 'w', encoding='utf-8') as f:
f.write(html)
self.log(f"HTML report saved to {report_path}")
if __name__ == "__main__":
root = tk.Tk()
app = MasterImageFinderApp(root)
root.mainloop()
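
The last step of `find_matches` keeps the strongest master and any runner-up whose RANSAC inlier count is above a fraction of the best. Below is a minimal sketch of that selection rule in isolation, assuming each candidate is a dict with `master` and `inliers` keys as in the code above. The helper name `select_confident` is hypothetical, and it compares candidates by identity rather than by dict equality.

```python
from typing import Dict, List


def select_confident(candidates: List[Dict], ratio: float = 0.5) -> List[Dict]:
    """Return the best candidate followed by every other candidate whose
    inlier count is strictly greater than ratio * best inlier count."""
    if not candidates:
        return []
    best = max(candidates, key=lambda c: c['inliers'])
    cutoff = best['inliers'] * ratio
    others = [c for c in candidates if c is not best and c['inliers'] > cutoff]
    return [best] + others
```

Because the comparison is strict, a candidate sitting exactly on the cutoff is dropped. A lower `ratio` admits more secondary masters per layout, at the price of more false positives to review in the report.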

61
fix_stalled_processing.py Normal file

@ -0,0 +1,61 @@
#!/usr/bin/env python3
"""
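Guidance for stalled parallel processing: prints an analysis of the bottleneck
and recommended cli.py invocations with reduced worker counts. The script does
not change any settings itself.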
"""
import argparse
import sys
import os
def main():
print("🔧 Parallel Processing Stall Fix")
print("=" * 50)
print("\n📊 ANALYSIS OF THE STALL:")
print("1. Inlier analysis queue has consistent 3 items (bottleneck)")
print("2. Each inlier analysis takes 60-167 seconds (very slow)")
print("3. 4 layout workers are waiting for 1 serial inlier analysis")
print("4. High swap usage (72.7%) triggering unnecessary memory pressure")
print("5. Memory pressure is reducing workers when it shouldn't")
print("\n🚀 RECOMMENDED SOLUTIONS:")
print("\n1. IMMEDIATE FIX (restart with reduced workers):")
print(" python cli.py --all --hybrid --split-simple --refinement-mode \\")
print(" --inlier-threshold 0.15 --inlier-ratio-threshold 0.2 \\")
print(" --fallback-one-at-a-time --enable-cost-tracking --cost-report \\")
print(" --parallel-layouts --layout-workers 2")
print(" (Reduces from 4 to 2 layout workers to reduce queue pressure)")
print("\n2. CONSERVATIVE FIX (single layout worker):")
print(" python cli.py --all --hybrid --split-simple --refinement-mode \\")
print(" --inlier-threshold 0.15 --inlier-ratio-threshold 0.2 \\")
print(" --fallback-one-at-a-time --enable-cost-tracking --cost-report \\")
print(" --parallel-layouts --layout-workers 1")
print(" (Essentially sequential with queue coordination)")
print("\n3. OPTIMAL FIX (disable parallel layouts for now):")
print(" python cli.py --all --hybrid --split-simple --refinement-mode \\")
print(" --inlier-threshold 0.15 --inlier-ratio-threshold 0.2 \\")
print(" --fallback-one-at-a-time --enable-cost-tracking --cost-report")
print(" (Use original sequential processing - more reliable)")
print("\n💡 TECHNICAL EXPLANATIONS:")
print("- The 72.7% swap usage is not necessarily bad if system is responsive")
print("- Inlier analysis is CPU/memory intensive and benefits from being serial")
print("- Queue bottleneck occurs when producers (layout workers) > consumers (1 inlier worker)")
print("- Each split analysis can take 60-167s, making parallelism counterproductive")
print("\n⚙️ LONG-TERM IMPROVEMENTS IMPLEMENTED:")
print("- More lenient memory pressure thresholds")
print("- Queue pressure detection and automatic worker reduction")
print("- Stall detection with timeout handling")
print("- Better progress monitoring and diagnostics")
print("\n🎯 RECOMMENDATION:")
print("For your current dataset, use option 1 (2 layout workers) or option 3 (sequential).")
print("The parallel implementation works but needs tuning for your specific workload.")
print("\n" + "=" * 50)
if __name__ == "__main__":
main()
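
The bottleneck described above is a rate mismatch: several layout workers feed one serial inlier worker. As a rough sketch of that reasoning, the function below compares arrival and service rates. The name `backlog_growth_per_hour` and the `seconds_per_layout` parameter are hypothetical; the script itself only states the worker counts and the 60-167 s inlier analysis time.

```python
def backlog_growth_per_hour(layout_workers: int,
                            seconds_per_layout: float,
                            seconds_per_inlier_job: float,
                            inlier_workers: int = 1) -> float:
    """Jobs per hour by which the inlier queue grows.

    A result of zero or less means the inlier worker keeps up with the producers.
    """
    arrival_rate = layout_workers / seconds_per_layout      # jobs/s entering the queue
    service_rate = inlier_workers / seconds_per_inlier_job  # jobs/s leaving the queue
    return (arrival_rate - service_rate) * 3600
```

Under this simple model, reducing `layout_workers` lowers the arrival rate without touching the service rate, which is the effect options 1 and 2 aim for. Real workloads are burstier than a constant-rate model, so treat the number as an order-of-magnitude guide only.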

1501
gemini_detector.py Normal file

File diff suppressed because it is too large

2939
hybrid_detector.py Normal file

File diff suppressed because it is too large

34
image_detector.py Normal file

@ -0,0 +1,34 @@
#!/usr/bin/env python3
"""
Master Image Detection in Layout Images - Modular Version
Uses Google Gemini 2.5 Pro API to detect which master images appear in layout images
This is the main entry point that imports from the refactored modules:
- process_detection.py: Standalone process function for detection
- gemini_detector.py: ImageDetector class using Gemini LLM
- vector_detector.py: VectorImageDetector class using embeddings
- cli.py: Command line interface
For backward compatibility, this module re-exports the main classes and functions.
"""
# Import all components from the refactored modules
from process_detection import process_single_master_detection
from gemini_detector import ImageDetector
from vector_detector import VectorImageDetector
from cli import main, parse_arguments
# Re-export for backward compatibility
__all__ = [
'process_single_master_detection',
'ImageDetector',
'VectorImageDetector',
'main',
'parse_arguments'
]
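if __name__ == "__main__":
    # Use the 'spawn' start method so each worker process starts from a fresh
    # interpreter, then hand control to the CLI entry point
    import multiprocessing
    import sys
    multiprocessing.set_start_method('spawn', force=True)
    sys.exit(main())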

264
improved_splitting.py Normal file

@ -0,0 +1,264 @@
#!/usr/bin/env python3
"""
Improved horizontal splitting algorithm for fashion layout panels
"""
import cv2
import numpy as np
from pathlib import Path
import os
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks
def improved_horizontal_splitting(image_path: str, debug=False):
"""
Improved algorithm for horizontal panel detection
Focuses on major structural separators, not text/content details
"""
print(f"\nTesting improved algorithm on: {Path(image_path).name}")
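    # Load image
    img = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file is missing or unreadable
        raise FileNotFoundError(f"Could not read image: {image_path}")
    height, width = img.shape[:2]
    print(f"Image dimensions: {width}x{height}")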
# Only process wide images
if width <= height * 1.2:
print("Not a wide layout, treating as single panel")
return [{
'bbox': (0, 0, width, height),
'width': width,
'height': height,
'crop_id': "single"
}]
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Method 1: Structural edge detection
# Focus on strong vertical edges that span most of the height
edges = cv2.Canny(gray, 30, 100)
# Create a tall vertical kernel to detect full-height separators
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, height // 3))
vertical_edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, vertical_kernel)
# Get vertical projection of strong edges
edge_projection = np.sum(vertical_edges, axis=0)
# Method 2: Intensity histogram analysis
# Look for consistent dark/light vertical bands
horizontal_hist = np.sum(gray, axis=0)
# Smooth both signals
smoothed_edges = gaussian_filter1d(edge_projection, sigma=15)
smoothed_hist = gaussian_filter1d(horizontal_hist, sigma=15)
# Invert histogram to find valleys (potential separators)
inverted_hist = np.max(smoothed_hist) - smoothed_hist
# Adaptive parameters based on image size
if width < 2000:
# Small images: likely 1-2 panels
min_panel_width = width // 4 # At least 25% of image width per panel
max_panels = 3
elif width < 5000:
# Medium images: likely 2-4 panels
min_panel_width = width // 6 # At least 16% of image width per panel
max_panels = 6
else:
# Large images: multi-panel layouts
min_panel_width = width // 12 # At least 8% of image width per panel
max_panels = 15
print(f"Min panel width: {min_panel_width}px, Max panels: {max_panels}")
# Find separator candidates using both methods
edge_threshold = np.max(smoothed_edges) * 0.4 # Strong edges only
hist_threshold = np.max(inverted_hist) * 0.3 # Significant valleys only
# Edge-based separators
edge_peaks, _ = find_peaks(smoothed_edges,
distance=min_panel_width,
height=edge_threshold,
prominence=np.max(smoothed_edges) * 0.2)
# Histogram-based separators
hist_peaks, _ = find_peaks(inverted_hist,
distance=min_panel_width,
height=hist_threshold,
prominence=np.max(inverted_hist) * 0.15)
print(f"Edge peaks: {len(edge_peaks)}, Histogram peaks: {len(hist_peaks)}")
# Combine and validate separators
all_separators = set(edge_peaks) | set(hist_peaks)
# Filter separators that are too close to image boundaries
boundary_margin = width * 0.05 # 5% margin from edges
valid_separators = [s for s in all_separators
if boundary_margin < s < width - boundary_margin]
# Sort separators
valid_separators = sorted(valid_separators)
# Remove separators that are too close to each other
final_separators = []
for sep in valid_separators:
if not final_separators or sep - final_separators[-1] >= min_panel_width:
final_separators.append(sep)
# Limit to reasonable number of panels
if len(final_separators) >= max_panels:
# Keep only the strongest separators
separator_scores = []
for sep in final_separators:
edge_score = smoothed_edges[sep] if sep < len(smoothed_edges) else 0
hist_score = inverted_hist[sep] if sep < len(inverted_hist) else 0
combined_score = edge_score + hist_score
separator_scores.append((sep, combined_score))
# Sort by score and take top ones
separator_scores.sort(key=lambda x: x[1], reverse=True)
final_separators = [s[0] for s in separator_scores[:max_panels-1]]
final_separators.sort()
print(f"Final separators: {final_separators}")
# Create crops
x_boundaries = [0] + final_separators + [width]
crops = []
for i in range(len(x_boundaries) - 1):
x1, x2 = x_boundaries[i], x_boundaries[i + 1]
# Ensure minimum panel width
if x2 - x1 >= min_panel_width:
crops.append({
'bbox': (x1, 0, x2, height),
'width': x2 - x1,
'height': height,
'crop_id': f"panel_{i}"
})
print(f"Generated {len(crops)} panels")
# Debug visualization
if debug:
debug_dir = Path("debug_improved")
debug_dir.mkdir(exist_ok=True)
import matplotlib.pyplot as plt
fig, axes = plt.subplots(4, 1, figsize=(15, 12))
# Original image
axes[0].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
axes[0].set_title("Original Image")
for sep in final_separators:
axes[0].axvline(x=sep, color='red', linewidth=2)
# Edge projection
axes[1].plot(smoothed_edges)
axes[1].set_title("Edge Projection (Smoothed)")
axes[1].axhline(y=edge_threshold, color='red', linestyle='--', alpha=0.7)
for sep in edge_peaks:
axes[1].axvline(x=sep, color='red', alpha=0.7)
# Histogram analysis
axes[2].plot(inverted_hist)
axes[2].set_title("Inverted Histogram (Smoothed)")
axes[2].axhline(y=hist_threshold, color='red', linestyle='--', alpha=0.7)
for sep in hist_peaks:
axes[2].axvline(x=sep, color='blue', alpha=0.7)
# Final result
axes[3].plot(smoothed_edges, label='Edges', alpha=0.7)
axes[3].plot(inverted_hist, label='Histogram', alpha=0.7)
axes[3].set_title("Combined Analysis with Final Separators")
for sep in final_separators:
axes[3].axvline(x=sep, color='red', linewidth=2, label='Final Separator')
axes[3].legend()
plt.tight_layout()
debug_file = debug_dir / f"{Path(image_path).stem}_analysis.png"
plt.savefig(debug_file, dpi=150, bbox_inches='tight')
plt.close()
print(f"Debug visualization saved: {debug_file}")
return crops
def test_improved_algorithm():
"""Test the improved algorithm on various layouts"""
test_cases = [
# Single panels
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6785934.jpg", "expected": 1, "type": "Single"},
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6813573.jpg", "expected": 1, "type": "Single"},
# Double panels
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6785852.jpg", "expected": 2, "type": "Double"},
# 4-panel layouts
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6799150.jpg", "expected": 4, "type": "4-Panel"},
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6813643.jpg", "expected": 4, "type": "4-Panel"},
# Multi-panel layouts
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6791144.jpg", "expected": 8, "type": "Multi-Panel"},
{"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6786505.jpg", "expected": 10, "type": "Multi-Panel"},
]
print("TESTING IMPROVED HORIZONTAL SPLITTING ALGORITHM")
print("="*60)
results = []
crops_dir = Path("improved_crops")
crops_dir.mkdir(exist_ok=True)
for test_case in test_cases:
if not os.path.exists(test_case["path"]):
print(f"⚠️ File not found: {test_case['path']}")
continue
crops = improved_horizontal_splitting(test_case["path"], debug=True)
# Save crop previews
img = cv2.imread(test_case["path"])
base_name = Path(test_case["path"]).stem
for i, crop in enumerate(crops):
x1, y1, x2, y2 = crop['bbox']
cropped = img[y1:y2, x1:x2]
crop_filename = f"{base_name}_improved_crop{i+1:02d}.jpg"
cv2.imwrite(str(crops_dir / crop_filename), cropped)
# Analyze result
detected = len(crops)
expected = test_case["expected"]
accurate = abs(detected - expected) <= 1
        status = "✅" if accurate else "❌"
print(f"{status} {base_name}: {detected}/{expected} panels ({test_case['type']})")
results.append({
"file": base_name,
"type": test_case["type"],
"expected": expected,
"detected": detected,
"accurate": accurate
})
# Summary
print(f"\n{'='*60}")
print("IMPROVED ALGORITHM SUMMARY")
print(f"{'='*60}")
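    accurate_count = sum(1 for r in results if r["accurate"])
    total_count = len(results)
    if total_count:
        print(f"Accurate results: {accurate_count}/{total_count} ({accurate_count/total_count*100:.1f}%)")
    else:
        # Every test image was missing, so there is nothing to score
        print("Accurate results: 0/0 (no test images found)")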
print(f"Crop previews saved to: {crops_dir}/")
print(f"Debug visualizations saved to: debug_improved/")
if __name__ == "__main__":
test_improved_algorithm()
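
The separator clean-up in `improved_horizontal_splitting` does not depend on OpenCV and can be exercised on its own. Here is a minimal sketch of that step, assuming the candidates are integer x positions. The helper name `filter_separators` is hypothetical, and the 5% border margin mirrors the value used above.

```python
from typing import Iterable, List


def filter_separators(candidates: Iterable[int], width: int,
                      min_panel_width: int, margin_ratio: float = 0.05) -> List[int]:
    """Drop separators near the image borders, then enforce a minimum gap
    between consecutive separators while scanning left to right."""
    margin = width * margin_ratio
    inside = sorted(s for s in set(candidates) if margin < s < width - margin)
    kept: List[int] = []
    for sep in inside:
        if not kept or sep - kept[-1] >= min_panel_width:
            kept.append(sep)
    return kept
```

The left-to-right scan is greedy: when two candidates are closer than `min_panel_width`, the leftmost one wins regardless of its edge or histogram strength. Ranking by score happens only later, when the panel limit is exceeded.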

128
logging_config.py Normal file

@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
Logging Configuration - Dual output to terminal and file for crash tracking
"""
import logging
import sys
import os
from datetime import datetime
def setup_dual_logging(log_level=logging.INFO):
"""
Configure logging to output to both terminal and file
Args:
log_level: Logging level (default: logging.INFO)
Returns:
logger: Configured logger instance
"""
# Create logger
logger = logging.getLogger('master_adapt_detect')
logger.setLevel(log_level)
# Prevent duplicate handlers if called multiple times
if logger.handlers:
return logger
# Create formatter
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
# Console handler (terminal output)
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(log_level)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
# File handler (file output)
log_filename = f"master_adapt_detect_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
file_handler = logging.FileHandler(log_filename, mode='w')
file_handler.setLevel(log_level)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
# Log the setup
logger.info(f"Dual logging initialized - Console: {log_level}, File: {log_filename}")
return logger
def log_system_info(logger):
"""Log system information for debugging"""
import platform
import psutil
logger.info("="*60)
logger.info("SYSTEM INFORMATION")
logger.info("="*60)
logger.info(f"Platform: {platform.platform()}")
logger.info(f"Python version: {platform.python_version()}")
logger.info(f"CPU count: {psutil.cpu_count()}")
logger.info(f"Memory: {psutil.virtual_memory().total / (1024**3):.2f} GB total")
logger.info(f"Available memory: {psutil.virtual_memory().available / (1024**3):.2f} GB")
logger.info(f"Process ID: {os.getpid()}")
logger.info("="*60)
def log_exception(logger, exc_info=None):
"""Log exception with full traceback"""
if exc_info is None:
exc_info = sys.exc_info()
logger.error("EXCEPTION OCCURRED", exc_info=exc_info)
# Also log memory usage at time of crash
try:
import psutil
process = psutil.Process()
memory_info = process.memory_info()
system_memory = psutil.virtual_memory()
swap_memory = psutil.swap_memory()
logger.error(f"System memory at crash: {system_memory.percent:.1f}% used, "
f"{system_memory.available / (1024**3):.2f} GB available")
logger.error(f"Swap memory at crash: {swap_memory.percent:.1f}% used, "
f"{swap_memory.used / (1024**3):.2f} GB used")
logger.error(f"Process memory at crash: RSS={memory_info.rss / (1024**3):.2f} GB, "
f"VMS={memory_info.vms / (1024**3):.2f} GB")
        # cpu_percent() without an interval returns 0.0 on its first call, so sample briefly
        logger.error(f"CPU usage at crash: {process.cpu_percent(interval=0.1)}%")
except Exception as e:
logger.error(f"Could not get resource usage: {e}")
def log_memory_warning(logger, usage):
"""Log memory warning with details"""
logger.warning(f"MEMORY WARNING: {usage['memory_percent']:.1f}% memory used, "
f"{usage['memory_available_gb']:.1f} GB available")
if usage['swap_percent'] > 0:
logger.warning(f"SWAP WARNING: {usage['swap_percent']:.1f}% swap used, "
f"{usage['swap_used_gb']:.1f} GB used")
class DualLogger:
"""Wrapper class to provide print-like interface with dual logging"""
def __init__(self, logger):
self.logger = logger
def print(self, *args, **kwargs):
"""Print to both terminal and file"""
# Convert args to string like print() would
message = ' '.join(str(arg) for arg in args)
self.logger.info(message)
def error(self, *args, **kwargs):
"""Log error message"""
message = ' '.join(str(arg) for arg in args)
self.logger.error(message)
def warning(self, *args, **kwargs):
"""Log warning message"""
message = ' '.join(str(arg) for arg in args)
self.logger.warning(message)
def debug(self, *args, **kwargs):
"""Log debug message"""
message = ' '.join(str(arg) for arg in args)
self.logger.debug(message)
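
`setup_dual_logging` writes its log file to the current working directory under a timestamped name. For tests or batch runs it can be convenient to choose the path explicitly. The following is a sketch of the same two-handler pattern using only the standard library, with the log path passed in. The function name `make_dual_logger` is hypothetical and not part of this module.

```python
import logging
import sys


def make_dual_logger(name: str, log_path: str, level: int = logging.INFO) -> logging.Logger:
    """Attach one console handler and one file handler to the named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:
        # Already configured: return as-is so repeated calls do not duplicate output
        return logger
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path, mode='w')):
        handler.setLevel(level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

As in the original, the handler check makes the function idempotent per logger name, so a second call returns the existing logger and ignores the new `log_path`.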


@ -0,0 +1,444 @@
# Master Adapt Detector Architecture Diagram
This diagram illustrates the architecture and flow of the Master Adapt Detector application, which uses multiple AI models and computer vision techniques to detect master images within layout images.
## High-Level Architecture
```mermaid
graph TB
subgraph "Entry Point"
CLI[cli.py - Command Line Interface]
end
subgraph "Core Detection Engines"
GD[Gemini Detector<br/>gemini_detector.py]
OD[OpenAI Detector<br/>openai_detector.py]
VD[Vector Detector<br/>vector_detector.py]
HD[Hybrid Detector<br/>hybrid_detector.py]
end
subgraph "Panel Splitting System"
PS[Panel Splitter<br/>panel_splitter.py]
AS[Advanced Splitter<br/>advanced_splitter.py]
SS[Simple Splitter<br/>simple_splitter.py]
end
subgraph "Support Systems"
MM[Memory Manager<br/>memory_manager.py]
LC[Logging Config<br/>logging_config.py]
PD[Process Detection<br/>process_detection.py]
end
subgraph "AI APIs"
GEMINI[Google Gemini 2.5 Pro]
OPENAI[OpenAI o3]
VERTEX[Google Vertex AI<br/>Vector Embeddings]
end
subgraph "Computer Vision"
OPENCV[OpenCV<br/>Feature Detection]
AKAZE[AKAZE Features]
RANSAC[RANSAC Homography]
end
subgraph "Data Storage"
MI[Master Images<br/>master_images/]
LI[Layout Images<br/>layouts/]
RES[Results<br/>results/]
EMB[Embeddings Cache<br/>embeddings_cache/]
end
CLI --> GD
CLI --> OD
CLI --> VD
CLI --> HD
HD --> OD
HD --> VD
HD --> PS
HD --> AS
HD --> SS
GD --> GEMINI
OD --> OPENAI
VD --> VERTEX
HD --> OPENCV
HD --> AKAZE
HD --> RANSAC
PS --> OPENCV
AS --> OPENCV
SS --> OPENCV
GD --> MM
OD --> MM
VD --> MM
HD --> MM
CLI --> LC
PD --> LC
GD --> MI
OD --> MI
VD --> MI
HD --> MI
GD --> LI
OD --> LI
VD --> LI
HD --> LI
GD --> RES
OD --> RES
VD --> RES
HD --> RES
VD --> EMB
HD --> EMB
```
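
The diagram above has `cli.py` dispatching to one of four detectors. The sketch below shows how such a dispatch could look with `argparse`, using the flag names `--hybrid`, `--openai`, and `--vector-mode` that appear elsewhere in this document. The function name `select_mode`, the precedence order, and the returned labels are assumptions for illustration; the real `cli.py` is not shown here and may differ.

```python
import argparse
from typing import List


def select_mode(argv: List[str]) -> str:
    """Map command-line flags to a detector label (hypothetical precedence)."""
    parser = argparse.ArgumentParser(description="Sketch of detector mode selection")
    parser.add_argument('--hybrid', action='store_true')
    parser.add_argument('--openai', action='store_true')
    parser.add_argument('--vector-mode', action='store_true')
    args = parser.parse_args(argv)
    # Hybrid is checked first because it can be combined with vector mode
    if args.hybrid:
        return 'hybrid'
    if args.openai:
        return 'openai'
    if args.vector_mode:
        return 'vector'
    return 'gemini'  # Default when no mode flag is given
```

For example, `select_mode(['--hybrid', '--vector-mode'])` resolves to the hybrid detector, which would then decide internally whether to use vector similarity.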
## Detailed Application Flow
```mermaid
flowchart TD
START([Application Start]) --> PARSE[Parse CLI Arguments]
PARSE --> MODE{Select Mode}
MODE -->|--hybrid| HYBRID[Hybrid Mode]
MODE -->|--openai| OPENAI_MODE[OpenAI Mode]
MODE -->|--vector-mode| VECTOR_MODE[Vector Mode]
MODE -->|default| GEMINI_MODE[Gemini Mode]
subgraph "Hybrid Mode Processing"
HYBRID --> LOAD_MASTERS[Load Master Images]
LOAD_MASTERS --> INIT_EMBED{Vector Mode?}
INIT_EMBED -->|Yes| GEN_EMBED[Generate Master Embeddings]
INIT_EMBED -->|No| INIT_CV[Initialize OpenCV Components]
GEN_EMBED --> PROCESS_LAYOUT[Process Layout]
INIT_CV --> PROCESS_LAYOUT
PROCESS_LAYOUT --> COUNT_PANELS[Count Panels with OpenAI o3]
COUNT_PANELS --> DETECT_CENSOR[Detect Censorship with OpenAI o3]
DETECT_CENSOR --> PANEL_CHECK{Panel Count ≤ Threshold?}
PANEL_CHECK -->|Yes| LOCAL_ANALYSIS[Local Analysis]
PANEL_CHECK -->|No| SPLIT_ANALYSIS[Split + Analysis]
LOCAL_ANALYSIS --> VECTOR_CHECK{Vector Mode?}
VECTOR_CHECK -->|Yes| VECTOR_SIM[Vector Similarity]
VECTOR_CHECK -->|No| INLIER_ANALYSIS[Inlier Analysis]
SPLIT_ANALYSIS --> SPLIT_PANELS[Split Panels]
SPLIT_PANELS --> SPLIT_VECTOR_CHECK{Vector Mode?}
SPLIT_VECTOR_CHECK -->|Yes| SPLIT_VECTOR[Split + Vector Similarity]
SPLIT_VECTOR_CHECK -->|No| SPLIT_INLIER[Split + Inlier Analysis]
VECTOR_SIM --> APPLY_REFINEMENT
INLIER_ANALYSIS --> APPLY_REFINEMENT
SPLIT_VECTOR --> APPLY_REFINEMENT
SPLIT_INLIER --> APPLY_REFINEMENT
APPLY_REFINEMENT[Apply CEN Refinement] --> DEDUP[Deduplication]
DEDUP --> TRUNCATE[Truncate to Panel Count]
TRUNCATE --> FALLBACK_CHECK{Fallback Enabled?}
FALLBACK_CHECK -->|Yes & Needed| FALLBACK[OpenAI One-at-a-Time Fallback]
FALLBACK_CHECK -->|No| SAVE_RESULTS
FALLBACK --> SAVE_RESULTS[Save Results]
end
subgraph "OpenAI Mode Processing"
OPENAI_MODE --> LOAD_MASTERS_O[Load Master Images]
LOAD_MASTERS_O --> ONE_AT_TIME{One-at-a-Time?}
ONE_AT_TIME -->|Yes| PARALLEL_MASTERS[Parallel Master Processing]
ONE_AT_TIME -->|No| BATCH_PROCESS[Batch Processing]
PARALLEL_MASTERS --> PANEL_AWARE{Panel-Aware Refinement?}
PANEL_AWARE -->|Yes| COUNT_PANELS_O[Count Panels] --> INLIER_REFINE[Inlier Refinement]
PANEL_AWARE -->|No| APPLY_CEN_O[Apply CEN Refinement]
INLIER_REFINE --> APPLY_CEN_O
BATCH_PROCESS --> APPLY_CEN_O
APPLY_CEN_O --> SAVE_RESULTS_O[Save Results]
end
subgraph "Vector Mode Processing"
VECTOR_MODE --> LOAD_MASTERS_V[Load Master Images]
LOAD_MASTERS_V --> GEN_EMBED_V[Generate Master Embeddings]
GEN_EMBED_V --> SPLITTING_CHECK{Splitting Enabled?}
SPLITTING_CHECK -->|Yes| SPLIT_LAYOUT[Split Layout]
SPLITTING_CHECK -->|No| COMPARE_EMBED[Compare Embeddings]
SPLIT_LAYOUT --> COMPARE_SPLITS[Compare Split Embeddings]
COMPARE_SPLITS --> SAVE_RESULTS_V[Save Results]
COMPARE_EMBED --> SAVE_RESULTS_V
end
subgraph "Gemini Mode Processing"
GEMINI_MODE --> LOAD_MASTERS_G[Load Master Images]
LOAD_MASTERS_G --> GEMINI_ONE_AT_TIME{One-at-a-Time?}
GEMINI_ONE_AT_TIME -->|Yes| PARALLEL_MASTERS_G[Parallel Master Processing]
GEMINI_ONE_AT_TIME -->|No| BATCH_PROCESS_G[Batch Processing]
PARALLEL_MASTERS_G --> APPLY_CEN_G[Apply CEN Refinement]
BATCH_PROCESS_G --> APPLY_CEN_G
APPLY_CEN_G --> SAVE_RESULTS_G[Save Results]
end
SAVE_RESULTS --> END([End])
SAVE_RESULTS_O --> END
SAVE_RESULTS_V --> END
SAVE_RESULTS_G --> END
```
## Panel Splitting Architecture
```mermaid
graph TB
subgraph "Panel Splitting System"
INPUT[Layout Image] --> DETECTOR{Splitter Type}
DETECTOR -->|Basic| PANEL_SPLITTER[PanelSplitter]
DETECTOR -->|Advanced| ADVANCED_SPLITTER[AdvancedPanelSplitter]
DETECTOR -->|Simple| SIMPLE_SPLITTER[SimplePanelSplitter]
subgraph "PanelSplitter Methods"
PANEL_SPLITTER --> EDGE_DETECT[Edge Detection]
PANEL_SPLITTER --> CONTOUR_FIND[Contour Finding]
PANEL_SPLITTER --> HIST_ANALYSIS[Histogram Analysis]
PANEL_SPLITTER --> KMEANS[K-Means Clustering]
end
subgraph "AdvancedPanelSplitter Methods"
ADVANCED_SPLITTER --> SOBEL[Sobel Edge Detection]
ADVANCED_SPLITTER --> GUTTER_DETECT[Gutter Detection]
ADVANCED_SPLITTER --> ENERGY_ANALYSIS[Energy Analysis]
ADVANCED_SPLITTER --> PERCENTILE_THRESH[Percentile Thresholding]
end
subgraph "SimplePanelSplitter Methods"
SIMPLE_SPLITTER --> EVEN_SPLIT[Even Division]
SIMPLE_SPLITTER --> PANEL_COUNT[Use Panel Count]
end
EDGE_DETECT --> SPLIT_RESULTS[Split Results]
CONTOUR_FIND --> SPLIT_RESULTS
HIST_ANALYSIS --> SPLIT_RESULTS
KMEANS --> SPLIT_RESULTS
SOBEL --> SPLIT_RESULTS
GUTTER_DETECT --> SPLIT_RESULTS
ENERGY_ANALYSIS --> SPLIT_RESULTS
PERCENTILE_THRESH --> SPLIT_RESULTS
EVEN_SPLIT --> SPLIT_RESULTS
PANEL_COUNT --> SPLIT_RESULTS
end
SPLIT_RESULTS --> INDIVIDUAL_PANELS[Individual Panel Images]
INDIVIDUAL_PANELS --> MATCH_PROCESS[Match Each Panel to Masters]
```
## Memory Management and Multiprocessing
```mermaid
graph TB
subgraph "Memory Management System"
MEMORY_MANAGER[Memory Manager] --> MONITOR[Monitor Usage]
MONITOR --> THRESH_CHECK{Usage > Threshold?}
THRESH_CHECK -->|Yes| THROTTLE[Throttle Processes]
THRESH_CHECK -->|No| CONTINUE[Continue Processing]
THROTTLE --> WAIT[Wait for Memory]
WAIT --> REDUCE_WORKERS[Reduce Worker Count]
REDUCE_WORKERS --> CONTINUE
CONTINUE --> PROCESS_POOL[Process Pool Executor]
PROCESS_POOL --> WORKER1[Worker Process 1]
PROCESS_POOL --> WORKER2[Worker Process 2]
PROCESS_POOL --> WORKERN[Worker Process N]
subgraph "Worker Process"
WORKER1 --> ISOLATED_ENV[Isolated Environment]
ISOLATED_ENV --> LOAD_MODELS[Load Models]
LOAD_MODELS --> PROCESS_TASK[Process Task]
PROCESS_TASK --> CLEANUP[Cleanup]
end
WORKER2 --> ISOLATED_ENV
WORKERN --> ISOLATED_ENV
end
subgraph "Feature Limiting"
PROCESS_TASK --> FEATURE_COUNT[Count Features]
FEATURE_COUNT --> FEATURE_CHECK{Features > Limit?}
FEATURE_CHECK -->|Yes| LIMIT_FEATURES[Limit Features]
FEATURE_CHECK -->|No| PROCEED[Proceed]
LIMIT_FEATURES --> PROCEED
end
```
## Data Flow and Storage
```mermaid
graph LR
subgraph "Input Data"
MI[Master Images<br/>41 images]
LI[Layout Images<br/>299+ images]
end
subgraph "Processing Cache"
TEMP[Temp Processed Images]
EMB_CACHE[Embeddings Cache]
SPLITS[Split Panel Images]
end
subgraph "Output Data"
JSON[JSON Results]
LOGS[Log Files]
DEBUG[Debug Images]
CROPS[Crop Images]
end
MI --> TEMP
LI --> TEMP
TEMP --> EMB_CACHE
TEMP --> SPLITS
EMB_CACHE --> JSON
SPLITS --> JSON
JSON --> LOGS
JSON --> DEBUG
JSON --> CROPS
subgraph "Result Structure"
JSON --> METADATA[Metadata]
JSON --> LAYOUT_RESULTS[Layout Results]
METADATA --> TOTAL_LAYOUTS[Total Layouts]
METADATA --> MASTER_COUNT[Master Count]
METADATA --> PROVIDER[Provider Info]
METADATA --> PROCESSING_MODE[Processing Mode]
LAYOUT_RESULTS --> DETECTED_MASTERS[Detected Masters]
LAYOUT_RESULTS --> ANALYSIS[Analysis Text]
LAYOUT_RESULTS --> CONFIDENCE[Confidence Score]
LAYOUT_RESULTS --> PANEL_INFO[Panel Information]
end
```
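To make the "Result Structure" part of the diagram concrete, here is a minimal sketch of how such a results document could be assembled and serialized. Every key name is an assumption derived from the node labels above, not taken from the real output files, and the sample values are placeholders.

```python
import json


def build_results(layout_results, master_count, provider, processing_mode):
    """Assemble a results document: one metadata block plus per-layout entries.

    All key names are hypothetical and mirror the diagram's node labels.
    """
    return {
        "metadata": {
            "total_layouts": len(layout_results),
            "master_count": master_count,
            "provider": provider,
            "processing_mode": processing_mode,
        },
        "layout_results": layout_results,
    }


example = build_results(
    layout_results=[
        {
            "layout": "6791346.jpg",
            "detected_masters": ["master_a", "master_b"],  # placeholder ids
            "analysis": "free-text explanation from the detector",
            "confidence": 0.9,  # placeholder score
            "panel_info": {"panel_count": 2},
        }
    ],
    master_count=41,
    provider="openai",
    processing_mode="hybrid",
)

print(json.dumps(example, indent=2))
```

Keeping run-level metadata separate from per-layout entries means a consumer can read the provider and mode once instead of repeating them on every layout.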
## Key Components and Their Roles
### 1. **CLI Interface (`cli.py`)**
- **Purpose**: Command-line interface for the application
- **Features**: Argument parsing, mode selection, batch processing options
- **Modes**: Gemini, OpenAI, Vector, Hybrid
- **Options**: Test mode, batch processing, custom outputs, splitting options
### 2. **Detection Engines**
#### **Hybrid Detector (`hybrid_detector.py`)**
- **Purpose**: Cost-efficient detection combining OpenAI panel counting with local analysis
- **Features**:
  - Panel threshold-based routing
  - Vector similarity or inlier analysis
  - Automatic fallback to OpenAI one-at-a-time
  - CEN refinement and deduplication
- **Workflow**: Panel count → Route to local/split analysis → Apply refinements
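The deduplication and "truncate to panel count" steps can be sketched as follows. This is an illustration under stated assumptions, not the detector's actual code: it assumes each match is a dict with a `master_id` and a numeric `confidence`, and that a layout should yield at most one master per panel.

```python
def deduplicate_and_truncate(matches, panel_count):
    """Keep the best-scoring match per master, then at most `panel_count` matches.

    `matches` is a list of dicts with hypothetical keys `master_id` and `confidence`.
    """
    best = {}
    for match in matches:
        master_id = match["master_id"]
        # Keep only the highest-confidence match for each master
        if master_id not in best or match["confidence"] > best[master_id]["confidence"]:
            best[master_id] = match

    # Rank the surviving matches and keep no more than one per detected panel
    ranked = sorted(best.values(), key=lambda m: m["confidence"], reverse=True)
    return ranked[:max(panel_count, 0)]


matches = [
    {"master_id": "A", "confidence": 0.62},
    {"master_id": "B", "confidence": 0.91},
    {"master_id": "A", "confidence": 0.88},
    {"master_id": "C", "confidence": 0.40},
]
kept = deduplicate_and_truncate(matches, panel_count=2)
print([m["master_id"] for m in kept])  # ['B', 'A']
```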
#### **OpenAI Detector (`openai_detector.py`)**
- **Purpose**: Uses OpenAI o3 model for image matching
- **Features**:
  - One-at-a-time processing with multiprocessing
  - Panel-aware refinement
  - Image preprocessing (greyscale, contrast)
- **API**: OpenAI o3 vision model
#### **Vector Detector (`vector_detector.py`)**
- **Purpose**: Uses Google Vertex AI embeddings for similarity matching
- **Features**:
  - 1408-dimensional embeddings
  - Cosine similarity matching
  - Embedding caching
- **API**: Google Vertex AI Multimodal Embeddings
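Embedding caching could work roughly as sketched below. The cache layout (one JSON file per SHA-256 of the image bytes) and the function name `cached_embedding` are assumptions for illustration. `embed_fn` stands in for the real Vertex AI call, which is not shown here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_embedding(image_path, cache_dir, embed_fn):
    """Return the embedding for image_path, calling embed_fn only on a cache miss."""
    image_bytes = Path(image_path).read_bytes()
    # Key the cache on file content so renamed or copied images still hit
    key = hashlib.sha256(image_bytes).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"

    if cache_file.exists():
        return json.loads(cache_file.read_text())

    embedding = list(embed_fn(image_bytes))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(embedding))
    return embedding


# Demo with a stand-in embedder that records how often it is called
calls = []


def fake_embed(image_bytes):
    calls.append(len(image_bytes))
    return [0.1, 0.2, 0.3]


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "master.jpg"
    image.write_bytes(b"not a real image")
    first = cached_embedding(image, Path(tmp) / "cache", fake_embed)
    second = cached_embedding(image, Path(tmp) / "cache", fake_embed)

print(first == second, len(calls))  # True 1
```

Hashing the content rather than the path is one possible design choice. It avoids stale entries when a file is overwritten in place, at the cost of reading every image once per lookup.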
#### **Gemini Detector (`gemini_detector.py`)**
- **Purpose**: Uses Google Gemini 2.5 Pro for image analysis
- **Features**:
  - Batch processing
  - Safety settings handling
  - Image preprocessing
- **API**: Google Gemini 2.5 Pro
### 3. **Panel Splitting System**
#### **Panel Splitter (`panel_splitter.py`)**
- **Purpose**: Basic multi-method panel splitting
- **Methods**: Edge detection, contour finding, histogram analysis, K-means clustering
#### **Advanced Splitter (`advanced_splitter.py`)**
- **Purpose**: Advanced edge detection and gutter analysis
- **Methods**: Sobel edge detection, gutter detection, energy analysis, percentile thresholding
#### **Simple Splitter (`simple_splitter.py`)**
- **Purpose**: Simple even division based on panel count
- **Methods**: Even division, panel count-based splitting
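Even division can be sketched as below. Since `simple_splitter.py` is not shown here, this is an assumption-laden illustration: it presumes panels are stacked vertically and uses the same `(x, y, w, h)` bounds convention as `panel_splitter.py`. The function name `even_split_bounds` is hypothetical.

```python
def even_split_bounds(width, height, panel_count):
    """Divide an image into `panel_count` vertically stacked (x, y, w, h) boxes."""
    if panel_count < 1:
        raise ValueError("panel_count must be at least 1")

    bounds = []
    for i in range(panel_count):
        # Integer arithmetic on the cumulative edge keeps the boxes gap-free
        top = i * height // panel_count
        bottom = (i + 1) * height // panel_count
        bounds.append((0, top, width, bottom - top))
    return bounds


print(even_split_bounds(800, 1000, 3))
# [(0, 0, 800, 333), (0, 333, 800, 333), (0, 666, 800, 334)]
```

Computing each edge from the cumulative position, rather than adding a fixed panel height, guarantees the boxes cover the full image even when the height is not divisible by the panel count.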
### 4. **Support Systems**
#### **Memory Manager (`memory_manager.py`)**
- **Purpose**: Prevents memory exhaustion during processing
- **Features**: Memory monitoring, worker throttling, safe execution decorators
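Worker throttling comes down to capping the pool size by memory as well as CPU count. The sketch below is a pure function for illustration. The name `safe_worker_count` is hypothetical, and the ~2 GB per process figure is the estimate used in `memory_manager.py`.

```python
def safe_worker_count(available_gb, cpu_count, requested=None, per_process_gb=2.0):
    """Return a worker count bounded by free memory, CPU count, and the request."""
    # Each worker is assumed to need roughly `per_process_gb` of RAM
    memory_limit = max(1, int(available_gb / per_process_gb))
    limits = [memory_limit, max(1, cpu_count)]
    if requested is not None:
        limits.append(max(1, requested))
    return min(limits)


print(safe_worker_count(available_gb=7.5, cpu_count=8))                # 3
print(safe_worker_count(available_gb=64.0, cpu_count=8, requested=4))  # 4
```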
#### **Logging Config (`logging_config.py`)**
- **Purpose**: Dual logging to terminal and file
- **Features**: System info logging, exception tracking, memory usage logging
#### **Process Detection (`process_detection.py`)**
- **Purpose**: Standalone functions for multiprocessing
- **Features**: Process isolation, error handling, resource cleanup
### 5. **Key Algorithms**
#### **Inlier Analysis (OpenCV)**
- **Purpose**: Local feature matching using computer vision
- **Algorithm**: AKAZE features → RANSAC homography → Inlier counting
- **Advantage**: No API costs, fast processing
#### **Vector Similarity (Vertex AI)**
- **Purpose**: Semantic similarity using embeddings
- **Algorithm**: Image embeddings → Cosine similarity → Threshold matching
- **Advantage**: Semantic understanding, good for transformed images
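The embedding pipeline above (embeddings, then cosine similarity, then threshold matching) can be sketched with the standard library alone. The threshold values and the names `cosine_similarity` and `match_masters` are illustrative assumptions. Real embeddings would be 1408-dimensional vectors from Vertex AI rather than the toy 3-dimensional ones used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_masters(layout_embedding, master_embeddings, threshold=0.8):
    """Return (master_id, score) pairs at or above the threshold, best first."""
    scored = [
        (master_id, cosine_similarity(layout_embedding, embedding))
        for master_id, embedding in master_embeddings.items()
    ]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


masters = {"A": [1.0, 0.0, 0.0], "B": [0.6, 0.8, 0.0], "C": [0.0, 0.0, 1.0]}
hits = match_masters([1.0, 0.1, 0.0], masters, threshold=0.5)
print([master_id for master_id, _ in hits])  # ['A', 'B']
```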
#### **Panel Detection (OpenAI o3)**
- **Purpose**: Intelligent panel counting and censorship detection
- **Algorithm**: Vision model analysis → Panel count + censorship status
- **Advantage**: Accurate panel analysis, handles complex layouts
### 6. **Processing Modes**
#### **Hybrid Mode** (Recommended)
- **Strategy**: OpenAI panel counting + local analysis for efficiency
- **Routing**: ≤2 panels → local analysis, ≥3 panels → split + analysis
- **Fallback**: OpenAI one-at-a-time if insufficient matches
- **Cost**: ~1 API call per layout vs ~41 (one per master image) for OpenAI one-at-a-time processing
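The routing rule can be expressed in a few lines. The sketch below assumes a configurable threshold defaulting to 2, matching the routing bullet above. It also assumes "insufficient matches" means fewer matches than detected panels, which is a guess about the fallback condition. Function names and return labels are hypothetical.

```python
def route_layout(panel_count, panel_threshold=2):
    """Pick the analysis path for a layout based on its detected panel count."""
    if panel_count <= panel_threshold:
        return "local_analysis"
    return "split_analysis"


def needs_fallback(match_count, panel_count, fallback_enabled=True):
    """Assumed rule: fall back when fewer masters matched than panels detected."""
    return fallback_enabled and match_count < panel_count


print(route_layout(2), route_layout(3))  # local_analysis split_analysis
print(needs_fallback(match_count=1, panel_count=3))  # True
```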
#### **OpenAI Mode**
- **Strategy**: Pure OpenAI o3 processing
- **Options**: Batch or one-at-a-time with panel-aware refinement
- **Cost**: High API usage but highest accuracy
#### **Vector Mode**
- **Strategy**: Pure vector embedding similarity
- **Options**: Splitting modes for multi-panel layouts
- **Cost**: API costs limited to embedding generation; cached embeddings are reused
#### **Gemini Mode**
- **Strategy**: Google Gemini 2.5 Pro processing
- **Options**: Batch or one-at-a-time processing
- **Cost**: Lower than OpenAI but higher than vector
This architecture exposes four detection strategies (Hybrid, OpenAI, Vector, Gemini) behind a single CLI, so each run can trade accuracy against API cost.

119
memory_manager.py Normal file
View file

@ -0,0 +1,119 @@
#!/usr/bin/env python3
"""
Memory Manager - Prevents memory exhaustion during concurrent processing
"""

import gc
import logging
import time
from functools import wraps

import psutil


class MemoryManager:
    """Memory management utility to prevent system crashes"""

    def __init__(self, max_memory_percent=80, max_swap_percent=80):
        """
        Initialize memory manager

        Args:
            max_memory_percent: Maximum memory usage percentage before throttling
            max_swap_percent: Maximum swap usage percentage before warning (does not throttle)
        """
        self.max_memory_percent = max_memory_percent
        self.max_swap_percent = max_swap_percent
        self.logger = logging.getLogger('master_adapt_detect')

    def get_memory_usage(self):
        """Get current memory and swap usage"""
        memory = psutil.virtual_memory()
        swap = psutil.swap_memory()
        return {
            'memory_percent': memory.percent,
            'memory_available_gb': memory.available / (1024**3),
            'swap_percent': swap.percent,
            'swap_used_gb': swap.used / (1024**3)
        }

    def is_memory_safe(self):
        """Check if memory usage is within safe limits (only RAM, not swap)"""
        usage = self.get_memory_usage()

        # Warn about swap usage but don't block processing
        if usage['swap_percent'] > self.max_swap_percent:
            self.logger.warning(f"High swap usage: {usage['swap_percent']:.1f}% - Performance may be degraded")

        # Only block processing for high RAM usage
        if usage['memory_percent'] > self.max_memory_percent:
            self.logger.warning(f"High memory usage: {usage['memory_percent']:.1f}%")
            return False

        return True

    def wait_for_memory_safe(self, timeout=30):
        """Wait for memory to return to safe levels"""
        start_time = time.time()
        while not self.is_memory_safe():
            if time.time() - start_time > timeout:
                self.logger.error("Memory did not return to safe levels within timeout")
                return False
            self.logger.info("Waiting for memory to return to safe levels...")
            time.sleep(1)
        return True

    def limit_concurrent_processes(self, max_processes=None):
        """Calculate safe number of concurrent processes based on memory"""
        available_gb = psutil.virtual_memory().available / (1024**3)

        # Conservative estimate based on available memory
        # Assume each process needs ~2GB for feature processing
        memory_limit = max(1, int(available_gb / 2))

        # psutil.cpu_count() returns None when the count cannot be determined
        cpu_count = psutil.cpu_count() or 1

        # Don't exceed the requested count, the CPU count, or the memory-based limit
        limits = [memory_limit, cpu_count]
        if max_processes is not None:
            limits.append(max(1, max_processes))
        safe_processes = min(limits)

        self.logger.info(f"Limiting concurrent processes to {safe_processes} (Memory: {available_gb:.1f}GB available)")
        return safe_processes


def memory_safe_execution(memory_manager):
    """Decorator to ensure memory-safe execution of functions"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Check memory before execution
            if not memory_manager.is_memory_safe():
                memory_manager.logger.warning("Memory usage high, waiting before execution...")
                if not memory_manager.wait_for_memory_safe():
                    raise MemoryError("Memory usage too high to safely execute function")

            try:
                return func(*args, **kwargs)
            except MemoryError as e:
                memory_manager.logger.error(f"Memory error in {func.__name__}: {e}")
                raise
            finally:
                # Force garbage collection
                gc.collect()
        return wrapper
    return decorator


def reduce_feature_count(features, max_features=10000):
    """Reduce feature count to prevent memory explosion"""
    if len(features) > max_features:
        # Keep best features based on response strength
        import numpy as np
        responses = [f.response for f in features]
        indices = np.argsort(responses)[-max_features:]
        reduced_features = [features[i] for i in indices]
        return reduced_features
    return features

1605
openai_detector.py Normal file

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,240 @@
#!/usr/bin/env python3
"""
Parameter Optimization Script for Advanced Panel Splitting

This script iteratively tests different percentile and min_gap parameters
to find the optimal settings that result in exactly 8 panels for layout 6791346.jpg.
"""

import os
import sys

import cv2
import numpy as np
from PIL import Image

from advanced_splitter import AdvancedPanelSplitter

def test_parameters(image_path: str, percentile: float, min_gap: int, target_panels: int = 8, verbose: bool = False) -> dict:
    """
    Test a specific combination of parameters and return results

    Args:
        image_path: Path to the layout image
        percentile: Percentile threshold for gutter detection
        min_gap: Minimum gap size for gutter detection
        target_panels: Target number of panels (default: 8)
        verbose: Print detailed information

    Returns:
        Dict with test results
    """
    try:
        # Create splitter with test parameters
        splitter = AdvancedPanelSplitter(percentile=percentile, min_gap=min_gap, debug=False)

        # Load image
        img = Image.open(image_path).convert("RGB")
        img_gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)

        # Find boundaries
        boundaries = splitter.find_boundaries_auto(img_gray)
        panel_count = len(boundaries) - 1  # Number of panels is boundaries - 1

        if verbose:
            print(f"  Percentile: {percentile:5.1f}, Min Gap: {min_gap:2d} -> {panel_count:2d} panels (boundaries: {boundaries})")

        return {
            'percentile': percentile,
            'min_gap': min_gap,
            'panel_count': panel_count,
            'boundaries': boundaries,
            'exact_match': panel_count == target_panels,
            'error': abs(panel_count - target_panels)
        }
    except Exception as e:
        if verbose:
            print(f"  ERROR with percentile={percentile}, min_gap={min_gap}: {e}")
        return {
            'percentile': percentile,
            'min_gap': min_gap,
            'panel_count': 0,
            'boundaries': [],
            'exact_match': False,
            'error': float('inf'),
            'exception': str(e)
        }

def optimize_parameters(image_path: str, target_panels: int = 8) -> dict:
    """
    Optimize parameters to achieve the target number of panels

    Args:
        image_path: Path to the layout image
        target_panels: Target number of panels

    Returns:
        Dict with optimization results
    """
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")

    print(f"Optimizing parameters for {os.path.basename(image_path)}")
    print(f"Target panel count: {target_panels}")
    print("-" * 60)

    # Define parameter ranges to test
    percentile_range = list(range(1, 51))  # 1% to 50%
    min_gap_range = list(range(1, 21))  # 1 to 20 pixels

    # Store all results
    all_results = []
    exact_matches = []

    print("Testing parameter combinations...")
    total_combinations = len(percentile_range) * len(min_gap_range)
    tested = 0

    # Test all combinations
    for percentile in percentile_range:
        for min_gap in min_gap_range:
            tested += 1
            if tested % 50 == 0:
                print(f"  Progress: {tested}/{total_combinations} ({tested/total_combinations*100:.1f}%)")

            result = test_parameters(image_path, percentile, min_gap, target_panels)
            all_results.append(result)
            if result['exact_match']:
                exact_matches.append(result)

    print(f"\nCompleted testing {total_combinations} parameter combinations")
    print(f"Found {len(exact_matches)} exact matches for {target_panels} panels")

    # Analysis
    if exact_matches:
        print("\n" + "="*60)
        print("EXACT MATCHES FOUND!")
        print("="*60)
        print(f"\nAll parameter combinations that produce exactly {target_panels} panels:")
        print("Percentile | Min Gap | Boundaries")
        print("-" * 40)
        for match in exact_matches:
            boundaries_str = str(match['boundaries'])
            if len(boundaries_str) > 40:
                boundaries_str = boundaries_str[:37] + "..."
            print(f"{match['percentile']:9.1f} | {match['min_gap']:7d} | {boundaries_str}")

        # Find the "best" match (middle percentile value for stability)
        exact_matches.sort(key=lambda x: x['percentile'])
        middle_idx = len(exact_matches) // 2
        best_match = exact_matches[middle_idx]

        print("\nRECOMMENDED PARAMETERS:")
        print(f"  Percentile: {best_match['percentile']}")
        print(f"  Min Gap: {best_match['min_gap']}")
        print(f"  Result: {best_match['panel_count']} panels")
        print(f"  Boundaries: {best_match['boundaries']}")
        return best_match
    else:
        print("\n" + "="*60)
        print("NO EXACT MATCHES FOUND")
        print("="*60)

        # Find closest matches
        all_results.sort(key=lambda x: x['error'])
        closest_matches = [r for r in all_results[:10] if r['error'] == all_results[0]['error']]

        print(f"\nClosest matches (error = {all_results[0]['error']}):")
        print("Percentile | Min Gap | Panels | Error | Boundaries")
        print("-" * 60)
        for match in closest_matches:
            boundaries_str = str(match['boundaries'])
            if len(boundaries_str) > 30:
                boundaries_str = boundaries_str[:27] + "..."
            print(f"{match['percentile']:9.1f} | {match['min_gap']:7d} | {match['panel_count']:6d} | {match['error']:5.1f} | {boundaries_str}")

        best_match = closest_matches[0]
        print("\nBEST AVAILABLE PARAMETERS:")
        print(f"  Percentile: {best_match['percentile']}")
        print(f"  Min Gap: {best_match['min_gap']}")
        print(f"  Result: {best_match['panel_count']} panels (target: {target_panels})")
        print(f"  Boundaries: {best_match['boundaries']}")
        return best_match

def test_optimized_parameters(image_path: str, percentile: float, min_gap: int):
    """
    Test the optimized parameters and show detailed results
    """
    print("\n" + "="*60)
    print("TESTING OPTIMIZED PARAMETERS")
    print("="*60)

    # Create splitter with optimized parameters
    splitter = AdvancedPanelSplitter(percentile=percentile, min_gap=min_gap, debug=True)

    # Test the parameters
    result = test_parameters(image_path, percentile, min_gap, target_panels=8, verbose=True)

    # Also create the actual splits to verify
    print("\nCreating actual splits with optimized parameters...")
    splits = splitter.split_panels(image_path, target_panel_count=8)
    print(f"Successfully created {len(splits)} splits:")
    for i, split in enumerate(splits):
        bounds = split['bounds']
        print(f"  Panel {i+1}: bounds=({bounds[0]}, {bounds[1]}, {bounds[2]}, {bounds[3]}), "
              f"confidence={split['confidence']:.3f}, method={split['method']}")

    return result

def main():
    """Main execution function"""
    # Check if layout image exists
    layout_path = "layouts/6791346.jpg"
    if not os.path.exists(layout_path):
        print(f"Error: Layout image not found at {layout_path}")
        print("Please ensure the image exists in the layouts directory")
        return 1

    try:
        # Run optimization
        best_result = optimize_parameters(layout_path, target_panels=8)

        # Test the optimized parameters
        test_optimized_parameters(
            layout_path,
            best_result['percentile'],
            best_result['min_gap']
        )

        print("\n" + "="*60)
        print("FINAL RECOMMENDATION")
        print("="*60)
        print("For layout 6791346.jpg to achieve 8 panels:")
        print(f"  --percentile {best_result['percentile']}")
        print(f"  --min-gap {best_result['min_gap']}")
        print("\nCommand line usage:")
        print(f"  python cli.py --test --hybrid --split-advanced --percentile {best_result['percentile']} --min-gap {best_result['min_gap']}")
        return 0
    except Exception as e:
        print(f"Error during optimization: {e}")
        return 1


if __name__ == "__main__":
    # sys.exit is the reliable way to set the exit code; the bare exit() helper
    # is provided by the site module and is intended for interactive use
    sys.exit(main())

857
panel_splitter.py Normal file
View file

@ -0,0 +1,857 @@
#!/usr/bin/env python3
"""
Panel Splitter Module - Multi-method panel splitting for comic/manga layouts
"""
import os
import cv2
import numpy as np
from typing import List, Dict, Tuple, Optional
import json
from pathlib import Path
from scipy import ndimage
from scipy.signal import find_peaks
from sklearn.cluster import KMeans
from skimage.feature import local_binary_pattern
import matplotlib.pyplot as plt
class PanelSplitter:
    """
    Multi-method panel splitting class that uses various computer vision techniques
    to split multi-panel layouts into individual images, then matches each split
    to master images using inlier analysis.
    """

    def __init__(self, debug=False):
        """
        Initialize the PanelSplitter

        Args:
            debug (bool): Enable debug mode for visualization
        """
        self.debug = debug
        self.debug_dir = "debug_splitting"
        if self.debug:
            os.makedirs(self.debug_dir, exist_ok=True)

    def split_layout_and_match(self, layout_path: str, master_images: List[str],
                               detector_instance=None) -> Dict:
        """
        Main method to split a layout and match splits to master images

        Args:
            layout_path (str): Path to the layout image
            master_images (List[str]): List of master image paths
            detector_instance: The detector instance to use for matching

        Returns:
            Dict: Detection results with matches from all splits
        """
        # Local import keeps this method self-contained
        import tempfile

        # Step 1: Get panel count from OpenAI (if detector supports it)
        target_panel_count = 1
        panel_confidence = "unknown"
        if hasattr(detector_instance, 'count_panels_in_layout'):
            print(f"Getting panel count for {os.path.basename(layout_path)}...")
            panel_result = detector_instance.count_panels_in_layout(layout_path)
            target_panel_count = panel_result.get('panel_count', 1)
            panel_confidence = panel_result.get('confidence', 'unknown')
            print(f"OpenAI detected {target_panel_count} panels (confidence: {panel_confidence})")

        # Step 2: Split the layout into individual panels
        print(f"Splitting layout with target count: {target_panel_count}")
        splits = self.split_panels(layout_path, target_panel_count)
        if not splits:
            print("No splits detected, returning empty results")
            return {
                'layout_path': layout_path,
                'detected_masters': [],
                'panel_count': target_panel_count,
                'panel_confidence': panel_confidence,
                'split_mode': 'enabled',
                'splits_generated': 0
            }
        print(f"Generated {len(splits)} splits")

        # Step 3: Match each split to master images
        all_matches = []
        split_results = []
        for i, split_info in enumerate(splits):
            print(f"Processing split {i+1}/{len(splits)}")

            # Save split image temporarily for matching. A unique temp file avoids
            # collisions when several worker processes split layouts concurrently.
            split_image = split_info['image']
            fd, temp_split_path = tempfile.mkstemp(prefix=f"split_{i}_", suffix=".jpg")
            os.close(fd)
            try:
                cv2.imwrite(temp_split_path, split_image)

                # Match this split to master images using existing inlier analysis
                if hasattr(detector_instance, 'match_split_to_masters'):
                    split_matches = detector_instance.match_split_to_masters(
                        temp_split_path, master_images
                    )
                else:
                    # Use basic inlier analysis if method doesn't exist
                    split_matches = self._match_split_basic(temp_split_path, master_images)
            finally:
                # Clean up temporary file, even if matching raised
                if os.path.exists(temp_split_path):
                    os.remove(temp_split_path)

            # Add split metadata to matches
            for match in split_matches:
                match['split_index'] = i
                match['split_bounds'] = split_info['bounds']
                match['split_confidence'] = split_info['confidence']
                all_matches.append(match)

            split_results.append({
                'split_index': i,
                'bounds': split_info['bounds'],
                'confidence': split_info['confidence'],
                'matches': split_matches
            })

        # Step 4: Aggregate results
        result = {
            'layout_path': layout_path,
            'detected_masters': [match['master_id'] for match in all_matches],
            'panel_count': target_panel_count,
            'panel_confidence': panel_confidence,
            'split_mode': 'enabled',
            'splits_generated': len(splits),
            'split_results': split_results,
            'all_matches': all_matches
        }

        # Remove duplicates while preserving highest confidence matches
        result = self._deduplicate_matches(result)
        return result

    def split_panels(self, image_path: str, target_panel_count: int) -> List[Dict]:
        """
        Split a layout image into individual panels using multiple methods

        Args:
            image_path (str): Path to the layout image
            target_panel_count (int): Target number of panels to split into

        Returns:
            List[Dict]: List of split information with image data and metadata
        """
        # Load image
        image = cv2.imread(image_path)
        if image is None:
            print(f"Error: Could not load image {image_path}")
            return []

        height, width = image.shape[:2]
        print(f"Image dimensions: {width}x{height}")

        # Use only optimized Canny detection method
        print("Using optimized Canny detection for panel splitting")
        try:
            method_results = self._optimized_canny_detection(image, target_panel_count)
            if not method_results:
                print("Optimized Canny detection failed, falling back to simple division")
                return self._fallback_simple_division(image, target_panel_count)
        except Exception as e:
            print(f"Optimized Canny detection failed: {e}")
            return self._fallback_simple_division(image, target_panel_count)

        # Use results directly (no consensus needed for single method)
        consensus_splits = method_results

        # Create split images
        splits = []
        for split_bounds in consensus_splits:
            x, y, w, h = split_bounds['bounds']

            # Skip extremely small splits (reduced threshold for 14-panel detection)
            if w < 20 or h < 20:
                continue

            split_image = image[y:y+h, x:x+w]
            splits.append({
                'image': split_image,
                'bounds': (x, y, w, h),
                'confidence': split_bounds['confidence'],
                'method_votes': split_bounds.get('method_votes', [])
            })

        if self.debug:
            self._save_debug_visualization(image_path, image, splits)

        return splits

    def _enhanced_gradient_analysis(self, image: np.ndarray, target_count: int) -> List[Dict]:
        """Enhanced gradient peak analysis for panel detection"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        height, width = gray.shape

        # Multi-scale gradient analysis
        separators = []
        scales = [5, 10, 20]
        for sigma in scales:
            # Smooth the image
            smoothed = cv2.GaussianBlur(gray, (0, 0), sigma)

            # Calculate vertical gradient (for horizontal separators)
            grad_y = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)

            # Project to get horizontal profile
            profile = np.mean(np.abs(grad_y), axis=1)

            # Find peaks; find_peaks requires distance >= 1, so clamp for short images
            prominence = np.std(profile) * 0.5
            min_distance = max(1, height // target_count // 2)
            peaks, properties = find_peaks(profile, prominence=prominence, distance=min_distance)

            # Add to separators with confidence based on relative prominence
            prominences = properties['prominences']
            for peak, peak_prominence in zip(peaks, prominences):
                confidence = peak_prominence / np.max(prominences)
                separators.append({
                    'position': peak,
                    'confidence': confidence,
                    'method': 'gradient_analysis',
                    'scale': sigma
                })

        # Convert separator positions to bounding boxes
        separators.sort(key=lambda x: x['position'])

        # Create bounds from separators
        bounds = []
        prev_y = 0
        for sep in separators:
            if sep['position'] > prev_y + height // (target_count * 2):  # Minimum panel height
                bounds.append({
                    'bounds': (0, prev_y, width, sep['position'] - prev_y),
                    'confidence': sep['confidence'],
                    'method': 'gradient_analysis'
                })
                prev_y = sep['position']

        # Add final panel
        if prev_y < height - height // (target_count * 2):
            bounds.append({
                'bounds': (0, prev_y, width, height - prev_y),
                'confidence': 0.8,
                'method': 'gradient_analysis'
            })

        return bounds

    def _optimized_canny_detection(self, image: np.ndarray, target_count: int) -> List[Dict]:
        """Optimized Canny edge detection for panel separators with tuned parameters"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        height, width = gray.shape

        # Exact configuration from tuning results that produced 14 panels.
        # The Hough values are absolute pixel counts, not fractions of the image
        # size: an image whose diagonal is shorter than min_line_length cannot
        # yield any line, so the whole image is returned as a single panel.
        threshold_set = [(50, 150), (100, 200), (150, 250)]
        morphology_kernel = (3, 1)
        hough_threshold = 1324
        min_line_length = 3530
        max_line_gap = 1059

        # Multi-threshold Canny detection
        all_edges = []
        for low, high in threshold_set:
            edges = cv2.Canny(gray, low, high)

            # Morphological operations
            kernel = np.ones(morphology_kernel, np.uint8)
            edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
            all_edges.append(edges)

        # Combine edge maps
        combined_edges = np.maximum.reduce(all_edges)

        # Find horizontal lines using Hough transform
        lines = cv2.HoughLinesP(
            combined_edges,
            1,
            np.pi/180,
            threshold=hough_threshold,
            minLineLength=min_line_length,
            maxLineGap=max_line_gap
        )

        # Filter for horizontal lines
        horizontal_lines = []
        if lines is not None:
            for line in lines:
                x1, y1, x2, y2 = line[0]
                if abs(y2 - y1) < height // 20:  # Nearly horizontal
                    horizontal_lines.append({
                        'y_position': (y1 + y2) // 2,
                        'length': abs(x2 - x1),
                        'confidence': min(1.0, abs(x2 - x1) / width)
                    })

        # Sort by y position and create bounds
        horizontal_lines.sort(key=lambda x: x['y_position'])
        bounds = []
        prev_y = 0
        for line in horizontal_lines:
            y_pos = line['y_position']
            # Use the same threshold that worked in debug script
            if y_pos > prev_y + height // (target_count * 2):
                bounds.append({
                    'bounds': (0, prev_y, width, y_pos - prev_y),
                    'confidence': line['confidence'],
                    'method': 'optimized_canny_detection'
                })
                prev_y = y_pos

        # Add final panel
        if prev_y < height - height // (target_count * 2):
            bounds.append({
                'bounds': (0, prev_y, width, height - prev_y),
                'confidence': 0.8,
                'method': 'optimized_canny_detection'
            })

        return bounds

    def _template_matching_method(self, image: np.ndarray, target_count: int) -> List[Dict]:
        """Template matching for common panel separators"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        height, width = gray.shape

        # Create separator templates
        templates = []

        # White horizontal line template
        white_template = np.ones((5, width//4), dtype=np.uint8) * 255
        templates.append(('white_line', white_template))

        # Black horizontal line template
        black_template = np.zeros((5, width//4), dtype=np.uint8)
        templates.append(('black_line', black_template))

        # Gutter template (white with black edges)
        gutter_template = np.ones((10, width//4), dtype=np.uint8) * 255
        gutter_template[0, :] = 0
        gutter_template[-1, :] = 0
        templates.append(('gutter', gutter_template))

        # Find matches for each template.
        # Caution: TM_CCOEFF_NORMED subtracts the template mean, so the uniform
        # white and black templates have zero variance and their score is
        # mathematically ill-defined; only the gutter template is well-posed.
        all_matches = []
        for template_name, template in templates:
            result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)

            # Find good matches
            locations = np.where(result >= 0.5)
            for y, x in zip(locations[0], locations[1]):
                confidence = result[y, x]
                all_matches.append({
                    'y_position': y + template.shape[0] // 2,
                    'confidence': confidence,
                    'template': template_name
                })

        # Sort by y position and remove duplicates
        all_matches.sort(key=lambda x: x['y_position'])

        # Merge nearby matches
        merged_matches = []
        for match in all_matches:
            if not merged_matches or match['y_position'] - merged_matches[-1]['y_position'] > height // (target_count * 2):
                merged_matches.append(match)
            else:
                # Keep the one with higher confidence
                if match['confidence'] > merged_matches[-1]['confidence']:
                    merged_matches[-1] = match

        # Create bounds
        bounds = []
        prev_y = 0
        for match in merged_matches:
            y_pos = match['y_position']
            if y_pos > prev_y + height // (target_count * 2):
                bounds.append({
                    'bounds': (0, prev_y, width, y_pos - prev_y),
                    'confidence': match['confidence'],
                    'method': 'template_matching'
                })
                prev_y = y_pos

        # Add final panel
        if prev_y < height - height // (target_count * 2):
            bounds.append({
                'bounds': (0, prev_y, width, height - prev_y),
                'confidence': 0.8,
                'method': 'template_matching'
            })

        return bounds

def _contour_analysis_method(self, image: np.ndarray, target_count: int) -> List[Dict]:
"""Contour-based panel detection"""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
height, width = gray.shape
# Find contours
edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Filter for rectangular contours
panel_candidates = []
for contour in contours:
# Approximate contour to polygon
epsilon = 0.02 * cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, epsilon, True)
# Check if it's roughly rectangular (4 corners)
if len(approx) >= 4:
x, y, w, h = cv2.boundingRect(contour)
# Filter by size and aspect ratio
if w > width // 4 and h > height // (target_count * 2):
area = cv2.contourArea(contour)
rect_area = w * h
# Check if it's mostly rectangular
if area / rect_area > 0.7:
panel_candidates.append({
'bounds': (x, y, w, h),
'confidence': min(1.0, area / rect_area),
'method': 'contour_analysis'
})
# Sort by y position
panel_candidates.sort(key=lambda x: x['bounds'][1])
# Remove overlapping candidates
filtered_candidates = []
for candidate in panel_candidates:
overlap = False
for existing in filtered_candidates:
if self._rectangles_overlap(candidate['bounds'], existing['bounds']):
overlap = True
break
if not overlap:
filtered_candidates.append(candidate)
return filtered_candidates
def _texture_analysis_method(self, image: np.ndarray, target_count: int) -> List[Dict]:
"""Texture-based panel separation"""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
height, width = gray.shape
# Local Binary Pattern analysis
radius = 3
n_points = 8 * radius
lbp = local_binary_pattern(gray, n_points, radius, method='uniform')
# Create texture profile by analyzing horizontal strips
strip_height = height // (target_count * 4)
texture_profile = []
for y in range(0, height - strip_height, strip_height // 2):
strip = lbp[y:y + strip_height, :]
texture_variance = np.var(strip)
texture_profile.append(texture_variance)
# Find low-texture regions (potential separators)
texture_profile = np.array(texture_profile)
threshold = np.percentile(texture_profile, 25) # Bottom 25%
separators = []
for i, variance in enumerate(texture_profile):
if variance < threshold:
# Strips advance by strip_height // 2, so scale the index by that same step
y_position = i * (strip_height // 2)
confidence = 1.0 - (variance / np.max(texture_profile))
separators.append({
'y_position': y_position,
'confidence': confidence
})
# Merge nearby separators
merged_separators = []
for sep in separators:
if not merged_separators or sep['y_position'] - merged_separators[-1]['y_position'] > height // (target_count * 2):
merged_separators.append(sep)
else:
# Keep the one with higher confidence
if sep['confidence'] > merged_separators[-1]['confidence']:
merged_separators[-1] = sep
# Create bounds
bounds = []
prev_y = 0
for sep in merged_separators:
y_pos = sep['y_position']
if y_pos > prev_y + height // (target_count * 2):
bounds.append({
'bounds': (0, prev_y, width, y_pos - prev_y),
'confidence': sep['confidence'],
'method': 'texture_analysis'
})
prev_y = y_pos
# Add final panel
if prev_y < height - height // (target_count * 2):
bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8,
'method': 'texture_analysis'
})
return bounds
def _clustering_method(self, image: np.ndarray, target_count: int) -> List[Dict]:
"""Clustering-based panel detection"""
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
height, width = gray.shape
# Downsample for faster processing
scale_factor = 0.25
small_height = int(height * scale_factor)
small_width = int(width * scale_factor)
small_gray = cv2.resize(gray, (small_width, small_height))
# Create feature vectors for each pixel (position + intensity)
y_coords, x_coords = np.meshgrid(range(small_height), range(small_width), indexing='ij')
features = np.column_stack([
y_coords.flatten(),
x_coords.flatten(),
small_gray.flatten()
])
# Normalize features
features = features.astype(np.float32)
features[:, 0] /= small_height
features[:, 1] /= small_width
features[:, 2] /= 255.0
# Apply K-means clustering
n_clusters = target_count + 1 # +1 for potential separators
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
labels = kmeans.fit_predict(features)
# Reshape labels back to image shape
label_image = labels.reshape(small_height, small_width)
# Find horizontal boundaries between clusters
boundaries = []
for y in range(1, small_height - 1):
# Check if this row represents a cluster boundary
current_clusters = set(label_image[y, :])
above_clusters = set(label_image[y-1, :])
below_clusters = set(label_image[y+1, :])
# If clusters change significantly, it might be a boundary
if len(current_clusters.intersection(above_clusters)) < len(current_clusters) * 0.7 or \
len(current_clusters.intersection(below_clusters)) < len(current_clusters) * 0.7:
boundaries.append({
'y_position': int(y / scale_factor),
'confidence': 0.7
})
# Create bounds from boundaries
bounds = []
prev_y = 0
for boundary in boundaries:
y_pos = boundary['y_position']
if y_pos > prev_y + height // (target_count * 2):
bounds.append({
'bounds': (0, prev_y, width, y_pos - prev_y),
'confidence': boundary['confidence'],
'method': 'clustering'
})
prev_y = y_pos
# Add final panel
if prev_y < height - height // (target_count * 2):
bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8,
'method': 'clustering'
})
return bounds
def _apply_consensus(self, method_results: List[List[Dict]], image: np.ndarray,
target_count: int) -> List[Dict]:
"""Apply consensus voting to combine results from multiple methods"""
height, width = image.shape[:2]
# Collect all proposed boundaries
all_boundaries = []
for method_result in method_results:
for panel in method_result:
bounds = panel['bounds']
# Add both top and bottom boundaries
all_boundaries.append({
'y_position': bounds[1], # Top boundary
'confidence': panel['confidence'],
'method': panel['method'],
'type': 'top'
})
all_boundaries.append({
'y_position': bounds[1] + bounds[3], # Bottom boundary
'confidence': panel['confidence'],
'method': panel['method'],
'type': 'bottom'
})
# Sort boundaries by position
all_boundaries.sort(key=lambda x: x['y_position'])
# Cluster nearby boundaries
clustered_boundaries = []
cluster_threshold = height // (target_count * 3)
for boundary in all_boundaries:
# Skip image edges
if boundary['y_position'] < cluster_threshold or boundary['y_position'] > height - cluster_threshold:
continue
# Find existing cluster or create new one
added_to_cluster = False
for cluster in clustered_boundaries:
if abs(boundary['y_position'] - cluster['y_position']) < cluster_threshold:
# Add to existing cluster
cluster['boundaries'].append(boundary)
# Update cluster position (weighted average)
total_weight = sum(b['confidence'] for b in cluster['boundaries'])
cluster['y_position'] = sum(b['y_position'] * b['confidence']
for b in cluster['boundaries']) / total_weight
cluster['confidence'] = total_weight / len(cluster['boundaries'])
added_to_cluster = True
break
if not added_to_cluster:
clustered_boundaries.append({
'y_position': boundary['y_position'],
'confidence': boundary['confidence'],
'boundaries': [boundary]
})
# Sort clustered boundaries and select best ones
clustered_boundaries.sort(key=lambda x: x['y_position'])
# Filter boundaries based on confidence and target count
min_confidence = 0.3
good_boundaries = [b for b in clustered_boundaries if b['confidence'] >= min_confidence]
# Limit to reasonable number of boundaries
if len(good_boundaries) > target_count - 1:
good_boundaries.sort(key=lambda x: x['confidence'], reverse=True)
good_boundaries = good_boundaries[:target_count - 1]
good_boundaries.sort(key=lambda x: x['y_position'])
# Create final panel bounds
final_bounds = []
prev_y = 0
for boundary in good_boundaries:
y_pos = int(boundary['y_position'])
if y_pos > prev_y + height // (target_count * 2):
method_votes = [b['method'] for b in boundary['boundaries']]
final_bounds.append({
'bounds': (0, prev_y, width, y_pos - prev_y),
'confidence': boundary['confidence'],
'method_votes': method_votes
})
prev_y = y_pos
# Add final panel
if prev_y < height - height // (target_count * 2):
final_bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8,
'method_votes': ['consensus']
})
return final_bounds
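The clustering step inside `_apply_consensus` can be read in isolation. The following is a minimal standalone sketch, not the class's own code: the function name `cluster_boundaries` and the `members` key are hypothetical, and it leaves out the image-edge filtering and the `min_confidence` cut that the method applies around this step.

```python
from typing import Dict, List


def cluster_boundaries(boundaries: List[Dict], threshold: int) -> List[Dict]:
    """Group boundaries whose y positions lie within `threshold` of a cluster centre."""
    clusters: List[Dict] = []
    for boundary in sorted(boundaries, key=lambda b: b["y_position"]):
        for cluster in clusters:
            if abs(boundary["y_position"] - cluster["y_position"]) < threshold:
                cluster["members"].append(boundary)
                total = sum(m["confidence"] for m in cluster["members"])
                # Confidence-weighted mean position, plain mean confidence.
                cluster["y_position"] = sum(
                    m["y_position"] * m["confidence"] for m in cluster["members"]
                ) / total
                cluster["confidence"] = total / len(cluster["members"])
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({
                "y_position": boundary["y_position"],
                "confidence": boundary["confidence"],
                "members": [boundary],
            })
    return clusters


proposals = [
    {"y_position": 100, "confidence": 1.0},
    {"y_position": 110, "confidence": 1.0},
    {"y_position": 400, "confidence": 0.5},
]
clusters = cluster_boundaries(proposals, threshold=50)
print([(c["y_position"], len(c["members"])) for c in clusters])
```

The weighted mean divides by the summed confidence, so a cluster made only of zero-confidence boundaries would need a guard before this is used on real detector output.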
def _fallback_simple_division(self, image: np.ndarray, target_count: int) -> List[Dict]:
"""Fallback method: simple equal division"""
height, width = image.shape[:2]
panel_height = height // target_count
splits = []
for i in range(target_count):
y = i * panel_height
h = panel_height if i < target_count - 1 else height - y
splits.append({
'image': image[y:y+h, :],
'bounds': (0, y, width, h),
'confidence': 0.5,
'method_votes': ['simple_division']
})
return splits
def _match_split_basic(self, split_path: str, master_images: List[str]) -> List[Dict]:
"""Basic matching using OpenCV features (fallback)"""
matches = []
try:
# Load the split image
split_img = cv2.imread(split_path, cv2.IMREAD_GRAYSCALE)
if split_img is None:
return matches
# Initialize feature detector
akaze = cv2.AKAZE_create()
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
# Detect keypoints and descriptors for split image
kp_split, des_split = akaze.detectAndCompute(split_img, None)
if des_split is None:
return matches
# Load master images from the master_images directory
master_images_path = Path("master_images")
for master_id in master_images:
master_path = master_images_path / f"{master_id}.jpg"
if not master_path.exists():
continue
# Load master image
master_img = cv2.imread(str(master_path), cv2.IMREAD_GRAYSCALE)
if master_img is None:
continue
# Detect keypoints and descriptors for master image
kp_master, des_master = akaze.detectAndCompute(master_img, None)
if des_master is None:
continue
# Match features
matches_raw = bf.knnMatch(des_split, des_master, k=2)
# Apply Lowe's ratio test
good_matches = []
for match_pair in matches_raw:
if len(match_pair) == 2:
m, n = match_pair
if m.distance < 0.7 * n.distance:
good_matches.append(m)
# If we have enough good matches, try to find homography
if len(good_matches) >= 10:
src_pts = np.float32([kp_split[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
dst_pts = np.float32([kp_master[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
try:
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
if M is not None:
inliers = int(np.sum(mask))
inlier_ratio = inliers / len(good_matches)
# Basic confidence scoring
if inliers >= 15 and inlier_ratio >= 0.6:
confidence = 'high'
elif inliers >= 8 and inlier_ratio >= 0.4:
confidence = 'medium'
else:
confidence = 'low'
# Only include medium and high confidence matches
if confidence in ['medium', 'high']:
matches.append({
'master_id': master_id,
'confidence': confidence,
'inliers': inliers,
'match_details': {
'inliers': inliers,
'good_matches': len(good_matches),
'inlier_ratio': round(inlier_ratio, 3)
}
})
except Exception:
continue
except Exception as e:
print(f"Error in basic matching: {e}")
return matches
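The confidence tiers used in `_match_split_basic` depend only on the RANSAC inlier count and the number of ratio-test matches, so they can be checked without OpenCV. The sketch below restates those thresholds as a pure function. The name `classify_match` is hypothetical, and the zero-match guard is an addition for safety rather than something the method above needs (it only scores when at least 10 good matches exist).

```python
def classify_match(inliers: int, good_matches: int) -> str:
    """Map homography inliers to the 'high' / 'medium' / 'low' labels used above."""
    if good_matches <= 0:
        return "low"  # guard added for this sketch
    inlier_ratio = inliers / good_matches
    if inliers >= 15 and inlier_ratio >= 0.6:
        return "high"
    if inliers >= 8 and inlier_ratio >= 0.4:
        return "medium"
    return "low"


print(classify_match(20, 25))  # ratio 0.8
print(classify_match(16, 32))  # ratio 0.5
print(classify_match(9, 30))   # ratio 0.3
```

Keeping the thresholds in one function makes it easier to tune them against labeled layouts without touching the feature-matching code.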
def _deduplicate_matches(self, result: Dict) -> Dict:
"""Remove duplicate matches, keeping highest confidence ones"""
if not result['all_matches']:
return result
# Group matches by master_id
master_groups = {}
for match in result['all_matches']:
master_id = match['master_id']
if master_id not in master_groups:
master_groups[master_id] = []
master_groups[master_id].append(match)
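# Keep only the highest confidence match for each master.
# Confidence is a label here, so rank it explicitly: comparing the raw
# strings would order 'high' below 'low' and 'medium'.
confidence_rank = {'low': 1, 'medium': 2, 'high': 3}
deduplicated_matches = []
for master_id, matches in master_groups.items():
best_match = max(matches, key=lambda x: confidence_rank.get(x.get('confidence'), 0))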
deduplicated_matches.append(best_match)
result['all_matches'] = deduplicated_matches
result['detected_masters'] = [match['master_id'] for match in deduplicated_matches]
return result
def _rectangles_overlap(self, rect1: Tuple[int, int, int, int],
rect2: Tuple[int, int, int, int]) -> bool:
"""Check if two rectangles overlap"""
x1, y1, w1, h1 = rect1
x2, y2, w2, h2 = rect2
# Use <= so rectangles that only share an edge are not counted as overlapping
return not (x1 + w1 <= x2 or x2 + w2 <= x1 or y1 + h1 <= y2 or y2 + h2 <= y1)
def _save_debug_visualization(self, image_path: str, image: np.ndarray,
splits: List[Dict]) -> None:
"""Save debug visualization of the splitting results"""
if not self.debug:
return
base_name = os.path.splitext(os.path.basename(image_path))[0]
# Create visualization with boundaries
vis_image = image.copy()
for i, split in enumerate(splits):
x, y, w, h = split['bounds']
# Draw rectangle
cv2.rectangle(vis_image, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Add label
label = f"Panel {i+1} ({split['confidence']:.2f})"
cv2.putText(vis_image, label, (x + 5, y + 20),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
# Save visualization
vis_path = os.path.join(self.debug_dir, f"{base_name}_splits.jpg")
cv2.imwrite(vis_path, vis_image)
# Save individual splits
for i, split in enumerate(splits):
split_path = os.path.join(self.debug_dir, f"{base_name}_split_{i+1}.jpg")
cv2.imwrite(split_path, split['image'])
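Several of the detection methods in this file end with the same loop that turns separator positions into full-width panel bounds. A minimal sketch of that shared step as one helper follows. The name `bounds_from_separators` and its signature are hypothetical, and it assumes `prev_y` only advances when a separator is accepted, which the flattened listing above does not make explicit.

```python
from typing import Dict, List, Tuple


def bounds_from_separators(
    separators: List[Tuple[int, float]],
    height: int,
    width: int,
    target_count: int,
    method: str,
    final_confidence: float = 0.8,
) -> List[Dict]:
    """Turn (y_position, confidence) separators into stacked full-width panels."""
    min_gap = height // (target_count * 2)  # minimum panel height used above
    bounds: List[Dict] = []
    prev_y = 0
    for y_pos, confidence in sorted(separators):
        if y_pos > prev_y + min_gap:
            bounds.append({
                "bounds": (0, prev_y, width, y_pos - prev_y),
                "confidence": confidence,
                "method": method,
            })
            prev_y = y_pos
    # Close the last panel only if enough image remains below the last cut.
    if prev_y < height - min_gap:
        bounds.append({
            "bounds": (0, prev_y, width, height - prev_y),
            "confidence": final_confidence,
            "method": method,
        })
    return bounds


panels = bounds_from_separators(
    [(50, 0.9), (300, 0.7), (350, 0.6), (700, 0.5)],
    height=1000, width=400, target_count=4, method="demo",
)
print([p["bounds"] for p in panels])
```

With a 1000-pixel image and four target panels the minimum gap is 125 pixels, so the separators at 50 and 350 are skipped as too close to the previous cut.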

194
process_detection.py Normal file

@ -0,0 +1,194 @@
#!/usr/bin/env python3
"""
Process Detection Module - Standalone function for processing master image detection
"""
import os
import json
import time
from pathlib import Path
from PIL import Image, ImageEnhance
import google.generativeai as genai
from dotenv import load_dotenv
import uuid
import threading
import tempfile
def process_single_master_detection(layout_path, master_id, master_path, enable_greyscale, enable_contrast_enhancement, contrast_factor, safety_settings):
"""
Standalone function for processing a single master detection in a separate process.
This ensures complete isolation from other workers.
"""
try:
# Load environment in this process
load_dotenv()
api_key = os.getenv('GEMINI_API_KEY')
if not api_key:
raise ValueError("GEMINI_API_KEY not found in environment variables")
# Configure API client in this process
genai.configure(api_key=api_key)
model = genai.GenerativeModel('gemini-2.5-pro')
def preprocess_image_process(image_path, enable_greyscale, enable_contrast_enhancement, contrast_factor):
"""Process-local image preprocessing"""
if not enable_greyscale and not enable_contrast_enhancement:
return image_path
try:
with Image.open(image_path) as img:
processed_img = img.copy()
if enable_greyscale:
processed_img = processed_img.convert('L')
processed_img = processed_img.convert('RGB')
if enable_contrast_enhancement:
contrast_enhancer = ImageEnhance.Contrast(processed_img)
processed_img = contrast_enhancer.enhance(contrast_factor)
sharpness_enhancer = ImageEnhance.Sharpness(processed_img)
processed_img = sharpness_enhancer.enhance(1.3)
# Create unique temp file for this process
process_id = os.getpid()
unique_id = str(uuid.uuid4())[:8]
original_name = Path(image_path).stem
with tempfile.NamedTemporaryFile(suffix=f"_{process_id}_{unique_id}.jpg", delete=False) as tmp_file:
processed_img.save(tmp_file.name, 'JPEG', quality=95)
return tmp_file.name
except Exception:
# Preprocessing is best-effort: fall back to the original image on any failure
return image_path
def upload_with_retry_process(image_path, max_retries=3):
"""Process-local upload with retry"""
for attempt in range(max_retries):
try:
processed_path = preprocess_image_process(image_path, enable_greyscale, enable_contrast_enhancement, contrast_factor)
uploaded_file = genai.upload_file(processed_path)
# Clean up temp file if it was created
if processed_path != image_path:
try:
os.unlink(processed_path)
except OSError:
pass
return uploaded_file
except Exception as e:
if attempt == max_retries - 1:
return None
import random
jitter = random.uniform(0.1, 0.5)
sleep_time = (0.5 * (attempt + 1)) + jitter
time.sleep(sleep_time)
return None
# Upload images
master_file = upload_with_retry_process(master_path)
layout_file = upload_with_retry_process(layout_path)
if not master_file or not layout_file:
raise Exception("Failed to upload images")
# Create prompt
prompt = f"""Analyze the layout image (the second image) and determine if the master image (the first image) appears in it.
INSTRUCTIONS:
1. Compare the master image (first image) with the layout image (second image)
2. Look for EXACT matches where the model, clothing, and pose are IDENTICAL
3. The layout image may contain the master image in various forms:
- Complete/exact match
- Cropped version
- Scaled or resized version
- Rotated version
- Partially obscured
4. Focus on visual similarity in terms of:
- Person/model appearance and pose (must be EXACTLY the same)
- Clothing details (colors, patterns, styles - must be EXACTLY the same)
- Background and composition
- Overall visual elements
5. CRITICAL: Only return a positive result if the models, pose, and clothing are EXACTLY the same.
If there is ANY difference in clothing, model, or pose then return a negative result.
Master Image ID: {master_id}
Return your response as a JSON object with this exact format:
{{
"match_found": true/false,
"master_id": "{master_id}",
"confidence": "high/medium/low",
"analysis": "Detailed explanation of your findings and reasoning"
}}
IMPORTANT CONTEXT: This is a legitimate business application for marketing and e-commerce image matching. The images are product/marketing photos showing models in various clothing styles for retail purposes. This analysis is for content categorization in a business context and is completely benign.
"""
# Make API call with retry
max_retries = 3
for attempt in range(max_retries):
try:
response = model.generate_content([prompt, master_file, layout_file], safety_settings=safety_settings)
if not response.candidates:
if attempt < max_retries - 1:
time.sleep((2 ** attempt) * 0.5)
continue
else:
raise Exception("No candidates returned from API")
candidate = response.candidates[0]
# finish_reason 1 is STOP (normal completion); any other reason is retried
if candidate.finish_reason and candidate.finish_reason != 1:
if attempt < max_retries - 1:
time.sleep((2 ** attempt) * 0.5)
continue
else:
raise Exception(f"Request finished with reason: {candidate.finish_reason}")
# Parse response
response_text = response.text.strip()
start_idx = response_text.find('{')
end_idx = response_text.rfind('}') + 1
if start_idx == -1 or end_idx == 0:
raise ValueError("No JSON found in response")
json_str = response_text[start_idx:end_idx]
result = json.loads(json_str)
# Validate result format
if 'match_found' not in result:
result['match_found'] = False
if 'master_id' not in result:
result['master_id'] = master_id
if 'confidence' not in result:
result['confidence'] = 'unknown'
if 'analysis' not in result:
result['analysis'] = response_text
return result
except Exception as e:
if attempt == max_retries - 1:
return {
'match_found': False,
'master_id': master_id,
'confidence': 'unknown',
'analysis': '',
'error': str(e)
}
time.sleep((2 ** attempt) * 0.5)
except Exception as e:
return {
'match_found': False,
'master_id': master_id,
'confidence': 'unknown',
'analysis': '',
'error': str(e)
}
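The response handling in `process_single_master_detection` takes the outermost `{...}` span of the model's reply, parses it, and fills in any missing keys. The sketch below isolates that logic so it can be tested without an API call. The names `parse_detection_response` and `backoff_delay` are hypothetical, and the `end <= start` check is slightly stricter than the `end_idx == 0` test in the file.

```python
import json
from typing import Any, Dict


def parse_detection_response(response_text: str, master_id: str) -> Dict[str, Any]:
    """Extract the outermost {...} span from a model reply and fill missing keys."""
    text = response_text.strip()
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end <= start:
        raise ValueError("No JSON found in response")
    result = json.loads(text[start:end])
    result.setdefault("match_found", False)
    result.setdefault("master_id", master_id)
    result.setdefault("confidence", "unknown")
    result.setdefault("analysis", text)
    return result


def backoff_delay(attempt: int, base: float = 0.5) -> float:
    """Exponential backoff matching (2 ** attempt) * 0.5 in the retry loop."""
    return (2 ** attempt) * base


reply = 'Result: {"match_found": true, "confidence": "high"} done'
parsed = parse_detection_response(reply, "M1")
print(parsed["match_found"], parsed["master_id"], backoff_delay(2))
```

Slicing from the first `{` to the last `}` tolerates prose or code fences around the JSON, but it would still fail on a reply that contains two separate objects, which `json.loads` reports as a `ValueError` subclass.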

15
requirements.txt Normal file

@ -0,0 +1,15 @@
google-generativeai==0.8.5
pillow==11.2.1
python-dotenv==1.1.1
aiofiles==24.1.0
google-cloud-aiplatform>=1.100.0
numpy>=2.2.6
opencv-python>=4.8.0
openai>=1.58.0
# New dependencies for panel splitting functionality
scipy>=1.16.0
scikit-image>=0.25.2
scikit-learn>=1.7.0
matplotlib>=3.10.3
# System monitoring for crash debugging
psutil>=5.9.0

132
simple_splitter.py Normal file

@ -0,0 +1,132 @@
"""
Simple Panel Splitter for Master Adapt Detect
This module provides a simple panel splitting approach that evenly divides layouts
into panels based on the number of panels detected by OpenAI analysis. It's designed
for use with hybrid mode as an alternative to split-advanced.
"""
import cv2
import numpy as np
from typing import List, Tuple, Optional, Dict
import os
from pathlib import Path
class SimplePanelSplitter:
"""
Simple panel splitter that evenly divides layouts into panels based on OpenAI analysis count.
This splitter assumes panels sit side by side in a single horizontal row and cuts the
image into equal-width vertical strips, one per detected panel. It's optimized for speed
and simplicity rather than precision.
"""
def __init__(self, debug: bool = False):
"""
Initialize the simple panel splitter.
Args:
debug: Enable debug output and visualization
"""
self.debug = debug
def _determine_grid_layout(self, panel_count: int) -> Tuple[int, int]:
"""
Determine the grid layout for vertical splitting of horizontal layouts.
For wide horizontal layouts, this creates even vertical sections.
Args:
panel_count: Number of panels to arrange
Returns:
Tuple of (rows, cols) - always (1, panel_count) for horizontal splitting
"""
if panel_count <= 0:
return (1, 1)
else:
# Always split horizontally into vertical sections
# This creates even vertical strips across the width
return (1, panel_count)
def split_panels(self, layout_path: str, panel_count: int) -> List[Dict]:
"""
Split a layout image into individual panels using simple even division.
Args:
layout_path: Path to the layout image
panel_count: Number of panels to split into (from OpenAI analysis)
Returns:
List of dictionaries containing image data and metadata
"""
if self.debug:
print(f"SimplePanelSplitter: Splitting {layout_path} into {panel_count} panels")
# Load the image
image = cv2.imread(layout_path)
if image is None:
print(f"Error: Could not load image {layout_path}")
return []
height, width = image.shape[:2]
# Determine grid layout
rows, cols = self._determine_grid_layout(panel_count)
if self.debug:
print(f"SimplePanelSplitter: Using {rows}x{cols} grid layout")
# Calculate panel dimensions
panel_height = height // rows
panel_width = width // cols
splits = []
panel_index = 0
# Extract panels in row-major order
for row in range(rows):
for col in range(cols):
if panel_index >= panel_count:
break
# Calculate panel boundaries
y_start = row * panel_height
y_end = (row + 1) * panel_height if row < rows - 1 else height
x_start = col * panel_width
x_end = (col + 1) * panel_width if col < cols - 1 else width
# Extract panel
panel_image = image[y_start:y_end, x_start:x_end]
# Create panel dictionary in expected format
panel_dict = {
'image': panel_image,
'bounds': (x_start, y_start, x_end - x_start, y_end - y_start),
'confidence': 1.0, # High confidence for simple even division
'method': 'simple_even_division'
}
splits.append(panel_dict)
if self.debug:
print(f"SimplePanelSplitter: Extracted panel {panel_index + 1} "
f"at ({x_start},{y_start}) to ({x_end},{y_end})")
panel_index += 1
if panel_index >= panel_count:
break
if self.debug:
print(f"SimplePanelSplitter: Successfully split into {len(splits)} panels")
return splits
def cleanup_split_files(self, splits: List[Dict]) -> None:
"""
Clean up method for compatibility - simple splitter doesn't create files.
Args:
splits: List of split panel dictionaries (no cleanup needed)
"""
if self.debug:
print(f"SimplePanelSplitter: No cleanup needed - {len(splits)} panels processed in memory")
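The arithmetic behind `split_panels` is small enough to check without loading an image. The sketch below computes the same `(x, y, w, h)` bounds for a single-row split, with the last strip absorbing the remainder pixels. The function name `even_column_bounds` is hypothetical and not part of `SimplePanelSplitter`.

```python
from typing import List, Tuple


def even_column_bounds(width: int, height: int, panel_count: int) -> List[Tuple[int, int, int, int]]:
    """Equal-width vertical strips; the last strip takes any leftover pixels."""
    if panel_count <= 0:
        return []
    panel_width = width // panel_count
    bounds = []
    for col in range(panel_count):
        x_start = col * panel_width
        x_end = (col + 1) * panel_width if col < panel_count - 1 else width
        bounds.append((x_start, 0, x_end - x_start, height))
    return bounds


strips = even_column_bounds(1003, 200, 4)
print(strips)
```

For a 1003-pixel-wide layout split four ways, the first three strips are 250 pixels wide and the last is 253, so the strips always cover the full width.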

214
test_14_panel_split.py Normal file

@ -0,0 +1,214 @@
#!/usr/bin/env python3
"""
Test script to verify multi-panel layout splitting functionality
Tests the panel splitting with 6786505.jpg (horizontal strip layout)
"""
import sys
import os
import cv2
import numpy as np
from pathlib import Path
from panel_splitter import PanelSplitter
def test_multi_panel_splitting():
"""Test splitting the multi-panel layout image"""
print("=" * 60)
print("TESTING MULTI-PANEL LAYOUT SPLITTING")
print("=" * 60)
# Initialize splitter with debug mode
splitter = PanelSplitter(debug=True)
# Target layout file - this is a horizontal strip with many panels
layout_path = Path("layouts") / "6786505.jpg"
if not layout_path.exists():
print(f"❌ ERROR: Layout file {layout_path} not found!")
return False
print(f"📁 Testing with: {layout_path.name}")
# Load and examine the image
image = cv2.imread(str(layout_path))
if image is None:
print(f"❌ ERROR: Could not load image {layout_path}")
return False
height, width = image.shape[:2]
print(f"📐 Image dimensions: {width}x{height}")
# Target panel count - let's first see how many OpenAI detects, then use that as target
print("🤖 Getting OpenAI panel count first...")
try:
from openai_detector import OpenAIImageDetector
temp_detector = OpenAIImageDetector()
temp_detector.load_master_images()
panel_result = temp_detector.count_panels_in_layout(str(layout_path))
target_count = panel_result.get('panel_count', 10)
print(f"🎯 OpenAI detected {target_count} panels - using this as target")
except Exception:
target_count = 10 # fallback
print(f"🎯 Using fallback target count: {target_count}")
print("\n" + "-" * 40)
print("RUNNING PANEL SPLITTING...")
print("-" * 40)
# Split the layout
splits = splitter.split_panels(str(layout_path), target_count)
print(f"\n📊 SPLITTING RESULTS:")
print(f"Generated {len(splits)} splits (target: {target_count})")
# Verify results
success = len(splits) == target_count
if success:
print(f"✅ SUCCESS: Generated exactly {target_count} splits!")
else:
print(f"⚠️ WARNING: Generated {len(splits)} splits instead of {target_count}")
print("\n📋 DETAILED SPLIT ANALYSIS:")
print("-" * 40)
total_area = 0
original_area = width * height
for i, split in enumerate(splits):
x, y, w, h = split['bounds']
area = w * h
total_area += area
print(f"Split {i+1:2d}: [{x:4d}, {y:4d}, {w:4d}, {h:4d}] "
f"area={area:6d} conf={split['confidence']:.3f}")
# Verify split bounds are reasonable
if w < 20 or h < 20:
print(f" ⚠️ Split {i+1} is very small!")
if x < 0 or y < 0 or x+w > width or y+h > height:
print(f" ❌ Split {i+1} bounds are out of image!")
# Calculate coverage
coverage = (total_area / original_area) * 100
print(f"\n📈 Coverage: {coverage:.1f}% of original image")
if coverage < 80:
print("⚠️ Low coverage - some areas might be missed")
elif coverage > 120:
print("⚠️ High coverage - splits might be overlapping")
# Check for overlaps
print("\n🔍 CHECKING FOR OVERLAPS:")
overlaps = 0
for i, split1 in enumerate(splits):
for j, split2 in enumerate(splits[i+1:], i+1):
if rectangles_overlap(split1['bounds'], split2['bounds']):
overlaps += 1
print(f" ⚠️ Splits {i+1} and {j+1} overlap!")
if overlaps == 0:
print(" ✅ No overlaps detected")
else:
print(f" ❌ Found {overlaps} overlapping pairs")
# Save individual split images for inspection
print("\n💾 SAVING SPLIT IMAGES:")
splits_dir = Path("test_splits")
splits_dir.mkdir(exist_ok=True)
for i, split in enumerate(splits):
split_filename = splits_dir / f"6786505_split_{i+1:02d}.jpg"
cv2.imwrite(str(split_filename), split['image'])
print(f" Saved: {split_filename}")
print(f"\n📁 All split images saved to: {splits_dir}/")
# Method analysis
print("\n🔬 METHOD ANALYSIS:")
method_votes = {}
for split in splits:
for method in split.get('method_votes', []):
method_votes[method] = method_votes.get(method, 0) + 1
for method, count in sorted(method_votes.items(), key=lambda x: x[1], reverse=True):
print(f" {method}: {count} votes")
print("\n" + "=" * 60)
if success:
print(f"🎉 TEST PASSED: Successfully split {target_count}-panel layout!")
else:
print(f"❌ TEST FAILED: Did not produce exactly {target_count} splits")
print("=" * 60)
return success
def rectangles_overlap(rect1, rect2):
"""Check if two rectangles overlap"""
x1, y1, w1, h1 = rect1
x2, y2, w2, h2 = rect2
return not (x1 + w1 <= x2 or x2 + w2 <= x1 or y1 + h1 <= y2 or y2 + h2 <= y1)
def test_with_openai_guidance():
"""Test splitting with OpenAI panel count guidance"""
print("\n" + "=" * 60)
print("TESTING WITH OPENAI PANEL COUNT GUIDANCE")
print("=" * 60)
try:
from openai_detector import OpenAIImageDetector
# Initialize OpenAI detector
detector = OpenAIImageDetector(split_mode=True)
detector.load_master_images()
layout_path = str(Path("layouts") / "6786505.jpg")
print("🤖 Getting OpenAI panel count...")
panel_result = detector.count_panels_in_layout(layout_path)
openai_count = panel_result.get('panel_count', 1)
confidence = panel_result.get('confidence', 'unknown')
print(f"OpenAI detected: {openai_count} panels (confidence: {confidence})")
# Test full split_layout_and_match functionality
print("\n🔄 Testing full split_layout_and_match...")
master_ids = list(detector.master_images.keys())[:10] # Test with first 10 masters
result = detector.splitter.split_layout_and_match(layout_path, master_ids, detector)
print(f"Split and match result:")
print(f" Splits generated: {result.get('splits_generated', 0)}")
print(f" Panel count: {result.get('panel_count', 'unknown')}")
print(f" Detected masters: {len(result.get('detected_masters', []))}")
return openai_count >= 5 # Success if we detect at least 5 panels
except Exception as e:
print(f"❌ Error testing with OpenAI guidance: {e}")
return False
def main():
"""Main test function"""
print("🧪 STARTING MULTI-PANEL LAYOUT SPLITTING TEST")
# Test basic splitting
basic_success = test_multi_panel_splitting()
# Test with OpenAI guidance
openai_success = test_with_openai_guidance()
print(f"\n📊 FINAL RESULTS:")
print(f"Basic splitting: {'✅ PASSED' if basic_success else '❌ FAILED'}")
print(f"OpenAI guidance: {'✅ PASSED' if openai_success else '❌ FAILED'}")
if basic_success and openai_success:
print("\n🎉 ALL TESTS PASSED! The multi-panel splitting is working correctly.")
return 0
else:
print("\n❌ Some tests failed. Please check the output above.")
return 1
if __name__ == "__main__":
sys.exit(main())
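The overlap and coverage checks in this test can be exercised on their own. The sketch below uses the same `(x, y, w, h)` convention and the same inclusive `<=` comparison as `rectangles_overlap` above, so strips that only share an edge do not count as overlapping. The helper `coverage_percent` is a hypothetical name for the area sum the test computes inline.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def rectangles_overlap(a: Rect, b: Rect) -> bool:
    """True only when the two rectangles share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax + aw <= bx or bx + bw <= ax or ay + ah <= by or by + bh <= ay)


def coverage_percent(rects: List[Rect], width: int, height: int) -> float:
    """Summed rectangle area as a percentage of the image (overlaps count twice)."""
    return 100.0 * sum(w * h for _, _, w, h in rects) / (width * height)


stacked = [(0, 0, 100, 50), (0, 50, 100, 50)]
print(rectangles_overlap(stacked[0], stacked[1]))
print(coverage_percent(stacked, 100, 100))
```

Because coverage is a plain sum of areas, a value above 100% is a sign of overlapping splits, which is why the test warns on high coverage as well as low.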

85
test_6786505_cli.py Normal file

@ -0,0 +1,85 @@
#!/usr/bin/env python3
"""
Test script to specifically process 6786505.jpg with the CLI
"""
import subprocess
import json
import sys
from pathlib import Path
def test_6786505_processing():
"""Test processing 6786505.jpg with the CLI"""
print("=" * 60)
print("TESTING 6786505.jpg PROCESSING WITH CLI")
print("=" * 60)
# Collect every layout so the others can be moved aside temporarily
layouts_dir = Path("layouts")
all_layouts = list(layouts_dir.glob("*.jpg"))
# Temporarily remove other layouts so --test processes only 6786505.jpg
temp_dir = Path("temp_layouts")
temp_dir.mkdir(exist_ok=True)
# Move all layouts except 6786505.jpg to temp directory
for layout in all_layouts:
if layout.name != "6786505.jpg":
layout.rename(temp_dir / layout.name)
try:
# Run the CLI with split mode
print("Running CLI with split mode...")
# Use the current interpreter and inherit the environment (API keys, PATH)
# instead of a hard-coded, machine-specific virtualenv path
result = subprocess.run([
sys.executable, "cli.py",
"--test", "--split", "--openai"
],
capture_output=True, text=True, cwd="."
)
print("CLI Output:")
print(result.stdout)
if result.stderr:
print("CLI Errors:")
print(result.stderr)
# Check if results file was created
results_file = Path("results/test_results_openai_split.json")
if results_file.exists():
with open(results_file, 'r') as f:
results = json.load(f)
print(f"\n📊 Results Analysis:")
print(f"Total layouts processed: {results.get('total_layouts', 0)}")
# Check the layout results
layout_results = results.get('layout_results', {})
for layout_name, layout_data in layout_results.items():
print(f"\n📁 Layout: {layout_name}")
print(f" Panel count: {layout_data.get('panel_count', 'unknown')}")
print(f" Splits generated: {layout_data.get('splits_generated', 'unknown')}")
print(f" Detected masters: {len(layout_data.get('detected_masters', []))}")
# Check if we got 14 panels
if layout_data.get('panel_count') == 14:
print(f" ✅ SUCCESS: Correctly detected 14 panels!")
return True
else:
print(f" ❌ Expected 14 panels, got {layout_data.get('panel_count')}")
return False
else:
print("❌ Results file not found")
return False
finally:
# Restore all layouts
for layout in temp_dir.glob("*.jpg"):
layout.rename(layouts_dir / layout.name)
temp_dir.rmdir()
return False
if __name__ == "__main__":
success = test_6786505_processing()
sys.exit(0 if success else 1)

166
test_cost_calculator.py Normal file

@ -0,0 +1,166 @@
#!/usr/bin/env python3
"""
Test script for the cost calculator functionality
"""
import sys
import os
from pathlib import Path
# Add current directory to path so we can import our modules
sys.path.insert(0, str(Path(__file__).parent))
from cost_calculator import CostCalculator, TokenUsage, ApiCallCost, extract_token_usage_from_response
def test_cost_calculator():
"""Test the cost calculator functionality"""
print("Testing Cost Calculator...")
# Test 1: Basic cost calculation
print("\n1. Testing basic cost calculation:")
calc = CostCalculator(enable_tracking=True)
# Test cost calculation with sample token usage
input_cost, output_cost, cached_cost, total_cost = calc.calculate_cost(
prompt_tokens=1500,
completion_tokens=800,
cached_tokens=200
)
print(f" Input tokens (1500): ${input_cost:.4f}")
print(f" Output tokens (800): ${output_cost:.4f}")
print(f" Cached tokens (200): ${cached_cost:.4f}")
print(f" Total cost: ${total_cost:.4f}")
# Test 2: API call tracking
print("\n2. Testing API call tracking:")
# Simulate multiple API calls
calc.track_api_call(
operation_type="panel_counting_censorship",
prompt_tokens=1500,
completion_tokens=800,
cached_tokens=200,
layout_name="test_layout_1.jpg"
)
calc.track_api_call(
operation_type="detection",
prompt_tokens=2000,
completion_tokens=1200,
cached_tokens=0,
layout_name="test_layout_2.jpg"
)
calc.track_api_call(
operation_type="one_at_a_time_detection",
prompt_tokens=800,
completion_tokens=400,
cached_tokens=100,
layout_name="test_layout_3.jpg",
master_id="1011A_1011_05"
)
print(f" Tracked {len(calc.api_calls)} API calls")
print(f" Total cost so far: ${calc.total_cost:.4f}")
# Test 3: Layout cost breakdown
print("\n3. Testing layout cost breakdown:")
breakdown = calc.get_layout_cost_breakdown("test_layout_1.jpg")
if breakdown:
print(f" Layout: {breakdown['layout_name']}")
print(f" Total cost: ${breakdown['total_cost']:.4f}")
print(f" Input tokens: {breakdown['cost_breakdown']['input_tokens']}")
print(f" Output tokens: {breakdown['cost_breakdown']['output_tokens']}")
print(f" API calls: {breakdown['cost_breakdown']['api_calls_made']}")
# Test 4: Session summary
print("\n4. Testing session summary:")
summary = calc.get_session_summary()
if summary['tracking_enabled']:
print(f" Total cost: ${summary['session_totals']['total_cost']:.4f}")
print(f" Total tokens: {summary['session_totals']['total_input_tokens'] + summary['session_totals']['total_output_tokens']:,}")
print(f" Layouts processed: {summary['session_totals']['layouts_processed']}")
print(f" Avg cost per layout: ${summary['averages']['cost_per_layout']:.4f}")
# Test 5: Monthly cost estimation
print("\n5. Testing monthly cost estimation:")
estimate = calc.estimate_monthly_cost(300)
if 'error' not in estimate:
print(f" Based on {estimate['based_on_layouts']} layouts:")
print(f" Average cost per layout: ${estimate['average_cost_per_layout']:.4f}")
print(f" Monthly estimate (300 layouts): ${estimate['estimated_monthly_cost']:.2f}")
print(f" Annual estimate: ${estimate['estimated_annual_cost']:.2f}")
# Test 6: Cost report generation
print("\n6. Testing cost report generation:")
report_file = calc.save_cost_report("test_cost_report")
if report_file:
print(f" Cost report saved to: {report_file}")
# Test 7: Print cost summary
print("\n7. Testing cost summary output:")
calc.print_cost_summary()
print("\nCost calculator test completed successfully!")
def test_token_usage():
    """Test the TokenUsage data class"""
    print("\nTesting TokenUsage data class...")
    # Test valid token usage
    usage = TokenUsage(
        prompt_tokens=1500,
        completion_tokens=800,
        total_tokens=2300,
        cached_tokens=200
    )
    print(f" Prompt tokens: {usage.prompt_tokens}")
    print(f" Completion tokens: {usage.completion_tokens}")
    print(f" Total tokens: {usage.total_tokens}")
    print(f" Cached tokens: {usage.cached_tokens}")
    # Test token usage validation: an inconsistent total must be rejected
    try:
        TokenUsage(
            prompt_tokens=1500,
            completion_tokens=800,
            total_tokens=2000,  # Should be 2300
            cached_tokens=200
        )
    except ValueError as e:
        print(f" ✓ Correctly caught validation error: {e}")
    else:
        # Fail the run instead of only printing, so "All tests passed" stays honest
        raise AssertionError("TokenUsage accepted an inconsistent total_tokens")
def test_disabled_tracking():
"""Test cost calculator with tracking disabled"""
print("\nTesting disabled cost tracking...")
calc = CostCalculator(enable_tracking=False)
# All operations should return zeros or empty results
input_cost, output_cost, cached_cost, total_cost = calc.calculate_cost(1500, 800, 200)
print(f" Cost calculation (disabled): ${total_cost:.4f}")
api_call = calc.track_api_call("test", 1500, 800, 200, "test.jpg")
print(f" API call tracking (disabled): ${api_call.total_cost:.4f}")
summary = calc.get_session_summary()
print(f" Session summary (disabled): {summary['tracking_enabled']}")
if __name__ == "__main__":
try:
test_cost_calculator()
test_token_usage()
test_disabled_tracking()
print("\n✅ All tests passed!")
except Exception as e:
print(f"\n❌ Test failed: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
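The validation case in `test_token_usage` expects `TokenUsage` to raise `ValueError` when `total_tokens` does not equal prompt plus completion tokens. The real class lives in `cost_calculator.py` and is not shown here. The sketch below is an assumption about what such a check could look like, using a stand-in class name `TokenUsageSketch`.

```python
from dataclasses import dataclass


@dataclass
class TokenUsageSketch:
    """Illustrative stand-in; not the repository's TokenUsage."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cached_tokens: int = 0

    def __post_init__(self):
        # Assumed rule, inferred from the "Should be 2300" comment in the test:
        # the total must equal prompt + completion tokens.
        expected = self.prompt_tokens + self.completion_tokens
        if self.total_tokens != expected:
            raise ValueError(
                f"total_tokens={self.total_tokens} does not match "
                f"prompt + completion = {expected}"
            )
```

Putting the check in `__post_init__` means an inconsistent record cannot be constructed at all, which is the behaviour the test relies on.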


@ -0,0 +1,91 @@
#!/usr/bin/env python3
"""
Test script to demonstrate cost tracking functionality
"""
import subprocess
import sys
from pathlib import Path
def run_cost_tracking_test():
"""Run a test with cost tracking enabled"""
print("Testing cost tracking with hybrid mode...")
print("=" * 60)
# Test command with cost tracking enabled
cmd = [
sys.executable, "cli.py",
"--test",
"--hybrid",
"--enable-cost-tracking",
"--cost-report"
]
print(f"Running command: {' '.join(cmd)}")
print("-" * 60)
try:
result = subprocess.run(cmd, capture_output=True, text=True, cwd=Path(__file__).parent)
print("STDOUT:")
print(result.stdout)
if result.stderr:
print("\nSTDERR:")
print(result.stderr)
print(f"\nReturn code: {result.returncode}")
# Check if cost report was generated
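# Look in the directory the CLI ran in and order by modification time so [-1] is the newest
cost_reports = sorted((Path(__file__).parent / "results").glob("cost_report_*.json"), key=lambda p: p.stat().st_mtime)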
if cost_reports:
print(f"\n✅ Cost report generated: {cost_reports[-1]}")
else:
print("\n❌ No cost report found")
except Exception as e:
print(f"❌ Error running test: {e}")
def run_without_cost_tracking():
"""Run a test without cost tracking for comparison"""
print("\nTesting without cost tracking...")
print("=" * 60)
# Test command without cost tracking
cmd = [
sys.executable, "cli.py",
"--test",
"--hybrid"
]
print(f"Running command: {' '.join(cmd)}")
print("-" * 60)
try:
result = subprocess.run(cmd, capture_output=True, text=True, cwd=Path(__file__).parent)
print("STDOUT:")
print(result.stdout)
if result.stderr:
print("\nSTDERR:")
print(result.stderr)
print(f"\nReturn code: {result.returncode}")
except Exception as e:
print(f"❌ Error running test: {e}")
if __name__ == "__main__":
print("Cost Tracking Integration Test")
print("=" * 60)
# Test 1: With cost tracking
run_cost_tracking_test()
# Test 2: Without cost tracking
run_without_cost_tracking()
print("\n" + "=" * 60)
print("Integration test completed!")
print("=" * 60)


@ -0,0 +1,188 @@
#!/usr/bin/env python3
import cv2
import numpy as np
from pathlib import Path
import os
def analyze_horizontal_panels(image_path: str, debug=True):
"""Analyze horizontal panel structure and test splitting parameters"""
print(f"Analyzing {Path(image_path).name}")
# Load image
img = cv2.imread(image_path)
if img is None:
raise FileNotFoundError(f"Could not read image: {image_path}")
height, width = img.shape[:2]
print(f"Image dimensions: {width}x{height}")
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Method 1: Simple equal division
print("\n=== Method 1: Equal Division ===")
equal_panels = []
for num_panels in [8, 10, 12]:
panel_width = width // num_panels
panels = []
for i in range(num_panels):
x1 = i * panel_width
x2 = (i + 1) * panel_width if i < num_panels - 1 else width
panels.append((x1, 0, x2, height))
equal_panels.append((num_panels, panels))
print(f" {num_panels} panels: widths = {[x2-x1 for x1, y1, x2, y2 in panels]}")
# Method 2: Edge-based detection
print("\n=== Method 2: Edge Detection ===")
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
# Create vertical line detection kernel
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, height // 4))
vertical_lines = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, vertical_kernel)
# Get vertical projection
vertical_projection = np.sum(vertical_lines, axis=0)
# Find peaks for different thresholds
for threshold_factor in [0.3, 0.4, 0.5, 0.6, 0.7]:
threshold = np.max(vertical_projection) * threshold_factor
separators = []
for x in range(width):
if vertical_projection[x] > threshold:
# Ensure minimum distance between separators
if not separators or x - separators[-1] > 30:
separators.append(x)
# Create panels from separators
x_boundaries = [0] + separators + [width]
x_boundaries = sorted(list(set(x_boundaries)))
panels = []
for i in range(len(x_boundaries) - 1):
x1, x2 = x_boundaries[i], x_boundaries[i + 1]
if x2 - x1 > 50: # Minimum panel width
panels.append((x1, 0, x2, height))
print(f" Threshold {threshold_factor}: {len(panels)} panels, widths = {[x2-x1 for x1, y1, x2, y2 in panels]}")
# Method 3: Adaptive histogram analysis
print("\n=== Method 3: Histogram Analysis ===")
# Calculate horizontal histogram (sum of pixel intensities)
horizontal_hist = np.sum(gray, axis=0)
# Find local minima (potential separators)
from scipy.signal import find_peaks
# Invert histogram to find valleys (separators)
inverted_hist = np.max(horizontal_hist) - horizontal_hist
# Find peaks in inverted histogram (valleys in original)
for min_distance in [50, 80, 100, 120]:
peaks, _ = find_peaks(inverted_hist, distance=min_distance, height=np.max(inverted_hist) * 0.1)
# Create panels
x_boundaries = [0] + list(peaks) + [width]
x_boundaries = sorted(list(set(x_boundaries)))
panels = []
for i in range(len(x_boundaries) - 1):
x1, x2 = x_boundaries[i], x_boundaries[i + 1]
if x2 - x1 > 50:
panels.append((x1, 0, x2, height))
print(f" Min distance {min_distance}: {len(panels)} panels, widths = {[x2-x1 for x1, y1, x2, y2 in panels]}")
if debug:
# Save debug images
debug_dir = Path("debug_splitting")
debug_dir.mkdir(exist_ok=True)
# Save edge detection result
cv2.imwrite(str(debug_dir / f"{Path(image_path).stem}_edges.jpg"), edges)
cv2.imwrite(str(debug_dir / f"{Path(image_path).stem}_vertical_lines.jpg"), vertical_lines)
# Save histogram visualization
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 5))
plt.plot(horizontal_hist)
plt.title("Horizontal Histogram")
plt.savefig(debug_dir / f"{Path(image_path).stem}_histogram.png")
plt.close()
# Save inverted histogram with peaks
plt.figure(figsize=(15, 5))
plt.plot(inverted_hist)
peaks, _ = find_peaks(inverted_hist, distance=100, height=np.max(inverted_hist) * 0.1)
plt.scatter(peaks, inverted_hist[peaks], color='red', zorder=5)
plt.title("Inverted Histogram with Detected Peaks")
plt.savefig(debug_dir / f"{Path(image_path).stem}_peaks.png")
plt.close()
print(f"\nDebug images saved to {debug_dir}/")
return equal_panels
def test_best_method(image_path: str, expected_panels: int):
"""Test the best splitting method for accurate panel detection"""
print(f"\n=== Testing Best Method for {expected_panels} expected panels ===")
img = cv2.imread(image_path)
if img is None:
raise FileNotFoundError(f"Could not read image: {image_path}")
height, width = img.shape[:2]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Combination approach: Use histogram analysis with refinement
horizontal_hist = np.sum(gray, axis=0)
inverted_hist = np.max(horizontal_hist) - horizontal_hist
# Smooth the inverted histogram to reduce noise
from scipy.ndimage import gaussian_filter1d
smoothed_hist = gaussian_filter1d(inverted_hist, sigma=10)
# Find peaks with optimized parameters
from scipy.signal import find_peaks
min_distance = max(1, int(width / (expected_panels * 1.5))) # Adaptive integer distance (find_peaks requires >= 1)
peaks, properties = find_peaks(smoothed_hist,
distance=min_distance,
height=np.max(smoothed_hist) * 0.15,
prominence=np.max(smoothed_hist) * 0.1)
print(f"Found {len(peaks)} separator peaks with min_distance={min_distance}")
# Create panels
x_boundaries = [0] + list(peaks) + [width]
x_boundaries = sorted(list(set(x_boundaries)))
panels = []
for i in range(len(x_boundaries) - 1):
x1, x2 = x_boundaries[i], x_boundaries[i + 1]
if x2 - x1 > 50: # Minimum reasonable panel width
panels.append((x1, 0, x2, height))
print(f"Generated {len(panels)} panels:")
for i, (x1, y1, x2, y2) in enumerate(panels):
print(f" Panel {i+1}: x={x1}-{x2} (width={x2-x1})")
# Save crops for visual verification
crops_dir = Path("test_crops")
crops_dir.mkdir(exist_ok=True)
for i, (x1, y1, x2, y2) in enumerate(panels):
crop = img[y1:y2, x1:x2]
crop_path = crops_dir / f"{Path(image_path).stem}_panel_{i+1:02d}.jpg"
cv2.imwrite(str(crop_path), crop)
print(f"\nCrops saved to {crops_dir}/ - Check these to verify panel accuracy!")
return panels
if __name__ == "__main__":
# Test on the multi-panel layout
test_image = "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6786500.jpg"
if os.path.exists(test_image):
print("Analyzing horizontal panel structure...")
analyze_horizontal_panels(test_image, debug=True)
print("\n" + "="*60)
test_best_method(test_image, expected_panels=10)
else:
print(f"Test image not found: {test_image}")
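Method 2 in this script turns a per-column edge projection into panel boundaries with three rules: a column must exceed a fraction of the maximum, separators must be more than 30 px apart, and panels narrower than 50 px are dropped. A minimal sketch of just that logic on a plain Python list, with no OpenCV needed, follows. The function name `panels_from_projection` and the synthetic data are assumptions for illustration.

```python
def panels_from_projection(projection, threshold_factor=0.5,
                           min_gap=30, min_width=50):
    """Turn a per-column edge projection into (x1, x2) panel spans."""
    width = len(projection)
    threshold = max(projection) * threshold_factor
    separators = []
    for x, value in enumerate(projection):
        # Keep a column only if it is strong and far enough from the last one.
        if value > threshold and (not separators or x - separators[-1] > min_gap):
            separators.append(x)
    boundaries = sorted(set([0] + separators + [width]))
    return [(x1, x2) for x1, x2 in zip(boundaries, boundaries[1:])
            if x2 - x1 > min_width]


# Synthetic 400-column projection with strong columns at 100, 101 and 250.
projection = [0] * 400
projection[100] = projection[101] = projection[250] = 10
print(panels_from_projection(projection))
# prints [(0, 100), (100, 250), (250, 400)]
```

Column 101 is discarded because it sits within `min_gap` of column 100, which is how the script keeps a thick separator line from producing two boundaries.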

132
test_hybrid.py Normal file

@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""
Test script for hybrid detection implementation
"""
import sys
import os
from pathlib import Path
# Add current directory to Python path
sys.path.insert(0, str(Path(__file__).parent))
def test_hybrid_import():
"""Test that hybrid detector can be imported"""
try:
from hybrid_detector import HybridImageDetector
print("✓ Successfully imported HybridImageDetector")
return True
except ImportError as e:
print(f"✗ Failed to import HybridImageDetector: {e}")
return False
def test_hybrid_initialization():
"""Test hybrid detector initialization"""
try:
from hybrid_detector import HybridImageDetector
# Test with default settings
detector = HybridImageDetector()
print("✓ Successfully initialized HybridImageDetector with defaults")
# Test with custom settings
detector2 = HybridImageDetector(
panel_threshold=3,
inlier_threshold=0.7,
enable_greyscale=True,
enable_contrast_enhancement=True
)
print("✓ Successfully initialized HybridImageDetector with custom settings")
# Check attributes
assert detector.panel_threshold == 2
assert detector.inlier_threshold == 0.65
assert detector.enable_greyscale == False
assert detector.enable_contrast_enhancement == False
assert detector2.panel_threshold == 3
assert detector2.inlier_threshold == 0.7
assert detector2.enable_greyscale == True
assert detector2.enable_contrast_enhancement == True
print("✓ All attributes set correctly")
return True
except Exception as e:
print(f"✗ Failed to initialize HybridImageDetector: {e}")
return False
def test_required_files():
"""Test that required files exist"""
required_files = [
"layouts/",
"master_images/",
"openai_detector.py",
"hybrid_detector.py"
]
missing_files = []
for file_path in required_files:
if not os.path.exists(file_path):
missing_files.append(file_path)
if missing_files:
print(f"✗ Missing required files: {missing_files}")
return False
else:
print("✓ All required files exist")
return True
def test_cli_help():
"""Test CLI help includes hybrid mode"""
try:
import subprocess
result = subprocess.run([sys.executable, "cli.py", "--help"],
capture_output=True, text=True)
if "--hybrid" in result.stdout:
print("✓ CLI help includes --hybrid flag")
return True
else:
print("✗ CLI help does not include --hybrid flag")
return False
except Exception as e:
print(f"✗ Failed to test CLI help: {e}")
return False
def main():
"""Run all tests"""
print("Testing Hybrid Detection Implementation")
print("=" * 50)
tests = [
("Import Test", test_hybrid_import),
("Initialization Test", test_hybrid_initialization),
("Required Files Test", test_required_files),
("CLI Help Test", test_cli_help)
]
passed = 0
total = len(tests)
for test_name, test_func in tests:
print(f"\n{test_name}:")
try:
if test_func():
passed += 1
except Exception as e:
print(f"{test_name} failed with exception: {e}")
print(f"\n{'=' * 50}")
print(f"Test Results: {passed}/{total} tests passed")
if passed == total:
print("🎉 All tests passed! Hybrid implementation is ready.")
return 0
else:
print("❌ Some tests failed. Please check the implementation.")
return 1
if __name__ == "__main__":
sys.exit(main())

115
test_memory_fix.py Normal file

@ -0,0 +1,115 @@
#!/usr/bin/env python3
"""
Test script to verify memory management fixes
"""
import sys
from memory_manager import MemoryManager, reduce_feature_count
from hybrid_detector import HybridImageDetector
def test_memory_manager():
"""Test memory manager functionality"""
print("Testing Memory Manager...")
mm = MemoryManager(max_memory_percent=75, max_swap_percent=30)
# Test memory usage reporting
usage = mm.get_memory_usage()
print(f"Current memory usage: {usage['memory_percent']:.1f}%")
print(f"Current swap usage: {usage['swap_percent']:.1f}%")
print(f"Available memory: {usage['memory_available_gb']:.1f} GB")
# Test that swap usage doesn't block processing
print(f"\nTesting is_memory_safe with swap usage {usage['swap_percent']:.1f}%:")
is_safe = mm.is_memory_safe()
print(f"Memory safe: {is_safe} (should be True if RAM < 75%, regardless of swap)")
# Test concurrent process limiting
safe_processes = mm.limit_concurrent_processes()
print(f"Safe concurrent processes: {safe_processes}")
print("Memory Manager test completed ✓")
def test_hybrid_detector_memory_settings():
"""Test hybrid detector memory settings"""
print("\nTesting Hybrid Detector Memory Settings...")
try:
detector = HybridImageDetector(
panel_threshold=2,
inlier_threshold=0.65,
local_workers=4 # Reduced for testing
)
print(f"Memory manager initialized: {detector.memory_manager is not None}")
print(f"Max memory percent: {detector.memory_manager.max_memory_percent}%")
print(f"Max swap percent: {detector.memory_manager.max_swap_percent}%")
print("Hybrid Detector memory settings test completed ✓")
except Exception as e:
print(f"Error testing hybrid detector: {e}")
return False
return True
def test_feature_reduction():
    """Test feature reduction functionality"""
    print("\nTesting Feature Reduction...")

    # Mock features (normally cv2.KeyPoint objects)
    class MockFeature:
        def __init__(self, response):
            self.response = response

    # Create mock features with responses 0..14999
    features = [MockFeature(i) for i in range(15000)]
    print(f"Original feature count: {len(features)}")
    # Test reduction
    reduced = reduce_feature_count(features, max_features=10000)
    print(f"Reduced feature count: {len(reduced)}")
    # Should keep the best features (highest response values),
    # i.e. every kept response is at least 5000
    kept_best = all(f.response >= 5000 for f in reduced)
    if len(reduced) == 10000 and kept_best:
        print("Feature reduction test completed ✓")
        return True
    else:
        print("Feature reduction test failed ✗")
        return False
if __name__ == "__main__":
print("="*60)
print("MEMORY MANAGEMENT TEST SUITE")
print("="*60)
success = True
# Test 1: Memory Manager
try:
test_memory_manager()
except Exception as e:
print(f"Memory Manager test failed: {e}")
success = False
# Test 2: Hybrid Detector
try:
success &= test_hybrid_detector_memory_settings()
except Exception as e:
print(f"Hybrid Detector test failed: {e}")
success = False
# Test 3: Feature Reduction
try:
success &= test_feature_reduction()
except Exception as e:
print(f"Feature Reduction test failed: {e}")
success = False
print("\n" + "="*60)
if success:
print("✓ ALL TESTS PASSED - Memory management is working")
print("The system should now be protected against memory crashes.")
else:
print("✗ SOME TESTS FAILED - Check the errors above")
print("="*60)
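`test_feature_reduction` states that `reduce_feature_count` should keep the features with the highest `response` values. The real function is in `memory_manager.py` and is not shown in this diff. The sketch below is one plausible way to get that behaviour, under the assumption that features only need a numeric `response` attribute.

```python
import heapq


def reduce_feature_count_sketch(features, max_features=10000):
    """Keep the max_features items with the highest .response value."""
    # Assumption: mirrors the behaviour described in the test comment;
    # the repository's reduce_feature_count may differ.
    if len(features) <= max_features:
        return list(features)
    return heapq.nlargest(max_features, features, key=lambda f: f.response)
```

`heapq.nlargest` avoids sorting the full list when only the top slice is needed, which matters little at 15,000 items but keeps the intent explicit.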


@ -0,0 +1,138 @@
#!/usr/bin/env python3
"""
Test script to demonstrate one-at-a-time cost tracking functionality
"""
import subprocess
import sys
import json
from pathlib import Path
def run_one_at_a_time_test():
"""Run a test with one-at-a-time mode and cost tracking"""
print("Testing one-at-a-time mode with cost tracking...")
print("=" * 60)
# Test command with one-at-a-time mode and cost tracking
cmd = [
sys.executable, "cli.py",
"--test",
"--openai",
"--one-at-a-time",
"--concurrent-workers", "3", # Lower concurrency for testing
"--enable-cost-tracking",
"--cost-report"
]
print(f"Running command: {' '.join(cmd)}")
print("This will make 41 separate API calls (one per master image)")
print("-" * 60)
try:
result = subprocess.run(cmd, capture_output=True, text=True, cwd=Path(__file__).parent)
print("STDOUT:")
print(result.stdout)
if result.stderr:
print("\nSTDERR:")
print(result.stderr)
print(f"\nReturn code: {result.returncode}")
# Check if cost report was generated and analyze it
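# Look in the directory the CLI ran in and order by modification time so [-1] is the newest
cost_reports = sorted((Path(__file__).parent / "results").glob("cost_report_*.json"), key=lambda p: p.stat().st_mtime)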
if cost_reports:
latest_report = cost_reports[-1]
print(f"\n✅ Cost report generated: {latest_report}")
# Analyze the cost report
try:
with open(latest_report, 'r') as f:
cost_data = json.load(f)
session_summary = cost_data.get('session_summary', {})
if session_summary.get('tracking_enabled'):
totals = session_summary.get('session_totals', {})
operation_breakdown = session_summary.get('operation_breakdown', {})
print(f"\n📊 Cost Analysis:")
print(f" Total cost: ${totals.get('total_cost', 0):.4f}")
print(f" Total API calls: {totals.get('total_api_calls', 0)}")
print(f" Total tokens: {totals.get('total_input_tokens', 0) + totals.get('total_output_tokens', 0):,}")
if operation_breakdown:
print(f"\n🔍 Operation Breakdown:")
for op_type, count in operation_breakdown.items():
print(f" {op_type}: {count} calls")
# Check for one-at-a-time detection calls
one_at_a_time_calls = operation_breakdown.get('one_at_a_time_detection', 0)
if one_at_a_time_calls > 0:
print(f"\n✅ One-at-a-time cost tracking working: {one_at_a_time_calls} individual API calls tracked")
else:
print(f"\n❌ One-at-a-time cost tracking not working: No individual API calls found")
except Exception as e:
print(f"❌ Error analyzing cost report: {e}")
else:
print("\n❌ No cost report found")
except Exception as e:
print(f"❌ Error running test: {e}")
def run_hybrid_comparison():
"""Run hybrid mode for comparison"""
print("\n\nTesting hybrid mode for cost comparison...")
print("=" * 60)
# Test hybrid mode with cost tracking
cmd = [
sys.executable, "cli.py",
"--test",
"--hybrid",
"--enable-cost-tracking"
]
print(f"Running command: {' '.join(cmd)}")
print("This will make 1 API call (panel counting + censorship)")
print("-" * 60)
try:
result = subprocess.run(cmd, capture_output=True, text=True, cwd=Path(__file__).parent)
print("STDOUT:")
print(result.stdout)
if result.stderr:
print("\nSTDERR:")
print(result.stderr)
print(f"\nReturn code: {result.returncode}")
except Exception as e:
print(f"❌ Error running test: {e}")
if __name__ == "__main__":
print("One-at-a-Time Cost Tracking Test")
print("=" * 60)
print("This test will demonstrate the cost difference between:")
print("1. One-at-a-time mode: 41 API calls (one per master)")
print("2. Hybrid mode: 1 API call (panel counting only)")
print()
# Test 1: One-at-a-time mode with cost tracking
run_one_at_a_time_test()
# Test 2: Hybrid mode for comparison
run_hybrid_comparison()
print("\n" + "=" * 60)
print("Cost comparison test completed!")
print("=" * 60)
print("\n💡 Key takeaways:")
print("- One-at-a-time mode: High accuracy, high cost (41 API calls)")
print("- Hybrid mode: Good accuracy, low cost (1 API call)")
print("- Cost tracking shows the exact difference in API usage")
print("=" * 60)

99
test_optimized_canny.py Normal file

@ -0,0 +1,99 @@
#!/usr/bin/env python3
"""
Test script for optimized Canny detection method
"""
import sys
import os
import cv2
import numpy as np
from pathlib import Path
from panel_splitter import PanelSplitter
def test_14_panel_splitting():
"""Test 14-panel splitting with optimized Canny detection"""
print("=" * 60)
print("TESTING 14-PANEL SPLITTING WITH OPTIMIZED CANNY")
print("=" * 60)
# Initialize splitter with debug mode
splitter = PanelSplitter(debug=True)
# Target layout file
layout_path = Path("layouts") / "6786505.jpg"
if not layout_path.exists():
print(f"❌ ERROR: Layout file {layout_path} not found!")
return False
print(f"📁 Testing with: {layout_path.name}")
# Load and examine the image
image = cv2.imread(str(layout_path))
if image is None:
print(f"❌ ERROR: Could not load image {layout_path}")
return False
height, width = image.shape[:2]
print(f"📐 Image dimensions: {width}x{height}")
# Test with target count 14
target_count = 14
print(f"\n🎯 Testing with target count: {target_count}")
print("-" * 40)
# Split the layout
splits = splitter.split_panels(str(layout_path), target_count)
print(f"\n📊 RESULTS:")
print(f"Generated {len(splits)} splits (target: {target_count})")
# Check if we got exactly 14 panels
success = len(splits) == target_count
if success:
print(f"✅ SUCCESS: Generated exactly {target_count} splits!")
else:
print(f"❌ FAILURE: Generated {len(splits)} splits instead of {target_count}")
# Save split images
if len(splits) > 0:
splits_dir = Path("test_splits")
splits_dir.mkdir(exist_ok=True)
for i, split in enumerate(splits):
split_filename = splits_dir / f"6786505_14panel_split_{i+1:02d}.jpg"
cv2.imwrite(str(split_filename), split['image'])
print(f"\n💾 Saved {len(splits)} split images to test_splits/")
# Show split details
print("\n📋 Split details:")
for i, split in enumerate(splits):
x, y, w, h = split['bounds']
print(f" Split {i+1:2d}: [{x:4d}, {y:4d}, {w:4d}, {h:4d}] conf={split['confidence']:.3f}")
print("\n" + "=" * 60)
if success:
print("🎉 TEST PASSED: 14-panel splitting is working!")
else:
print("❌ TEST FAILED: 14-panel splitting needs adjustment")
print("=" * 60)
return success
def main():
"""Main test function"""
print("🧪 STARTING OPTIMIZED CANNY 14-PANEL TEST")
success = test_14_panel_splitting()
if success:
print("\n🎉 SUCCESS: Optimized Canny detection produces exactly 14 panels!")
return 0
else:
print("\n❌ FAILURE: Optimized Canny detection needs further tuning")
return 1
if __name__ == "__main__":
sys.exit(main())

239
test_panel_accuracy.py Normal file

@ -0,0 +1,239 @@
#!/usr/bin/env python3
"""
Test horizontal splitting accuracy on different panel types
"""
import cv2
import numpy as np
from pathlib import Path
import os
def test_splitting_accuracy(image_path: str, expected_panels: int, layout_type: str):
"""Test horizontal splitting accuracy for a specific layout"""
print(f"\n{'='*60}")
print(f"Testing {layout_type}: {Path(image_path).name}")
print(f"Expected panels: {expected_panels}")
print(f"{'='*60}")
# Load image
img = cv2.imread(image_path)
if img is None:
raise FileNotFoundError(f"Could not read image: {image_path}")
height, width = img.shape[:2]
print(f"Image dimensions: {width}x{height}")
# Test current algorithm
crops = test_current_algorithm(img, width, height, expected_panels)
# Save crop previews
save_crop_previews(img, crops, image_path, expected_panels)
# Analyze accuracy
analyze_accuracy(crops, expected_panels, layout_type)
return crops
def test_current_algorithm(img, width: int, height: int, expected_panels: int):
"""Test the current horizontal splitting algorithm"""
crops = []
# Current algorithm logic
if width > height * 1.2: # Wide image, horizontal panels
print(f"Using horizontal splitting for {width}x{height} image")
# Convert to grayscale for analysis
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Calculate horizontal histogram
horizontal_hist = np.sum(gray, axis=0)
inverted_hist = np.max(horizontal_hist) - horizontal_hist
# Smooth the inverted histogram
from scipy.ndimage import gaussian_filter1d
smoothed_hist = gaussian_filter1d(inverted_hist, sigma=10)
# Current parameters
expected_panels_est = min(15, max(6, width // 800))
min_distance = max(1, int(width / (expected_panels_est * 1.5)))
print(f"Algorithm estimates {expected_panels_est} panels, min_distance={min_distance}")
# Find peaks
from scipy.signal import find_peaks
peaks, properties = find_peaks(smoothed_hist,
distance=min_distance,
height=np.max(smoothed_hist) * 0.15,
prominence=np.max(smoothed_hist) * 0.1)
print(f"Found {len(peaks)} separator peaks")
# Create panels
x_boundaries = [0] + list(peaks) + [width]
x_boundaries = sorted(list(set(x_boundaries)))
for i in range(len(x_boundaries) - 1):
x1, x2 = x_boundaries[i], x_boundaries[i + 1]
if x2 - x1 >= 200: # min_crop_size
crops.append({
'bbox': (x1, 0, x2, height),
'width': x2 - x1,
'height': height,
'crop_id': f"horizontal_{i}"
})
print(f"Generated {len(crops)} crops")
else:
print("Image not wide enough for horizontal splitting")
crops.append({
'bbox': (0, 0, width, height),
'width': width,
'height': height,
'crop_id': "single"
})
return crops
def save_crop_previews(img, crops, image_path: str, expected_panels: int):
"""Save individual crop images for visual verification"""
base_name = Path(image_path).stem
crops_dir = Path("panel_test_crops")
crops_dir.mkdir(exist_ok=True)
print(f"\nSaving {len(crops)} crop previews to {crops_dir}/")
for i, crop in enumerate(crops):
x1, y1, x2, y2 = crop['bbox']
cropped = img[y1:y2, x1:x2]
crop_filename = f"{base_name}_expected{expected_panels}_crop{i+1:02d}.jpg"
crop_path = crops_dir / crop_filename
cv2.imwrite(str(crop_path), cropped)
print(f" Crop {i+1}: {crop['width']}px wide -> {crop_filename}")
def analyze_accuracy(crops, expected_panels: int, layout_type: str):
"""Analyze how well the splitting matches expectations"""
detected_panels = len(crops)
print(f"\n--- ACCURACY ANALYSIS ---")
print(f"Layout type: {layout_type}")
print(f"Expected panels: {expected_panels}")
print(f"Detected panels: {detected_panels}")
if detected_panels == expected_panels:
print("✅ PERFECT MATCH!")
elif abs(detected_panels - expected_panels) <= 1:
print("✅ CLOSE MATCH (within 1)")
elif detected_panels < expected_panels:
print("❌ UNDER-SEGMENTATION (missing splits)")
else:
print("❌ OVER-SEGMENTATION (too many splits)")
# Analyze crop sizes
widths = [crop['width'] for crop in crops]
avg_width = np.mean(widths)
std_width = np.std(widths)
print(f"Crop widths: {widths}")
print(f"Average width: {avg_width:.0f}px (±{std_width:.0f}px)")
# Check for suspiciously small or large crops
min_reasonable = 300 # Minimum reasonable panel width
max_reasonable = 2000 # Maximum reasonable panel width
small_crops = [w for w in widths if w < min_reasonable]
large_crops = [w for w in widths if w > max_reasonable]
if small_crops:
print(f"⚠️ Warning: {len(small_crops)} suspiciously small crops: {small_crops}")
if large_crops:
print(f"⚠️ Warning: {len(large_crops)} suspiciously large crops: {large_crops}")
def main():
"""Test horizontal splitting on various layout types"""
test_cases = [
# Single panels (should not be split)
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6785934.jpg",
"expected": 1,
"type": "Single Panel"
},
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6813573.jpg",
"expected": 1,
"type": "Single Panel"
},
# Double panels
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6785852.jpg",
"expected": 2,
"type": "Double Panel"
},
# 4-panel layouts
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6799150.jpg",
"expected": 4,
"type": "4-Panel Layout"
},
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6813643.jpg",
"expected": 4,
"type": "4-Panel Layout"
},
# Multi-panel layouts
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6791144.jpg",
"expected": 8,
"type": "Multi-Panel Layout"
},
{
"path": "/Users/michael.clervi/Documents/projects/master_adapt_detect/layouts/6786505.jpg",
"expected": 14,
"type": "Multi-Panel Layout"
}
]
print("HORIZONTAL SPLITTING ACCURACY TEST")
print("="*60)
results = []
for test_case in test_cases:
if os.path.exists(test_case["path"]):
crops = test_splitting_accuracy(
test_case["path"],
test_case["expected"],
test_case["type"]
)
results.append({
"file": Path(test_case["path"]).name,
"type": test_case["type"],
"expected": test_case["expected"],
"detected": len(crops),
"accurate": abs(len(crops) - test_case["expected"]) <= 1
})
else:
print(f"⚠️ File not found: {test_case['path']}")
# Summary
print(f"\n{'='*60}")
print("SUMMARY")
print(f"{'='*60}")
accurate_count = sum(1 for r in results if r["accurate"])
total_count = len(results)
pct = accurate_count / total_count * 100 if total_count else 0.0  # Avoid ZeroDivisionError when no test files exist
print(f"Accurate results: {accurate_count}/{total_count} ({pct:.1f}%)")
print()
for result in results:
status = "✅" if result["accurate"] else "❌"
print(f"{status} {result['file']}: {result['detected']}/{result['expected']} panels ({result['type']})")
print(f"\nCrop previews saved to: panel_test_crops/")
print("Review the crop images to verify splitting accuracy!")
if __name__ == "__main__":
main()
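
A summary like the one printed by `main()` has to cope with the case where no test image was found and `results` is empty. A minimal sketch of the same ±1-panel accuracy summary as a pure function follows; `summarize_accuracy` and its `tolerance` parameter are hypothetical names, not part of the script.

```python
def summarize_accuracy(results, tolerance=1):
    """Return (accurate, total, percent) for a list of result dicts.

    Hypothetical helper: each dict needs 'expected' and 'detected' counts.
    A result counts as accurate when the counts differ by at most `tolerance`.
    """
    total = len(results)
    accurate = sum(
        1 for r in results if abs(r["detected"] - r["expected"]) <= tolerance
    )
    # Guard the percentage so an empty result list does not divide by zero.
    percent = accurate / total * 100 if total else 0.0
    return accurate, total, percent


if __name__ == "__main__":
    sample = [
        {"expected": 4, "detected": 5},
        {"expected": 10, "detected": 8},
    ]
    print(summarize_accuracy(sample))
    print(summarize_accuracy([]))
```

Note that a tolerance of one panel lets a single-panel layout that was split in two still count as accurate; passing `tolerance=0` would make the check exact.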


@ -0,0 +1,149 @@
#!/usr/bin/env python3
"""
Test script for parallel layout processing implementation
"""
import sys
import os
from pathlib import Path
# Add current directory to path
sys.path.insert(0, os.getcwd())
def test_parallel_processing():
"""Test the parallel processing implementation"""
print("Testing parallel layout processing implementation...")
# Test 1: Import all classes
try:
from hybrid_detector import HybridImageDetector, InlierAnalysisCoordinator, ProgressTracker
print("✓ Successfully imported all classes")
except ImportError as e:
print(f"✗ Import error: {e}")
return False
# Test 2: Create InlierAnalysisCoordinator
try:
from memory_manager import MemoryManager
memory_manager = MemoryManager()
coordinator = InlierAnalysisCoordinator(
local_workers=2,
memory_manager=memory_manager,
min_good_matches=10
)
print("✓ Successfully created InlierAnalysisCoordinator")
except Exception as e:
print(f"✗ Error creating coordinator: {e}")
return False
# Test 3: Create ProgressTracker
try:
tracker = ProgressTracker(total_layouts=100)
info = tracker.get_progress_info()
print(f"✓ Successfully created ProgressTracker (total: {info['total']})")
except Exception as e:
print(f"✗ Error creating progress tracker: {e}")
return False
# Test 4: Create HybridImageDetector with parallel processing
try:
detector = HybridImageDetector(
panel_threshold=2,
inlier_threshold=0.65,
parallel_layouts=True,
layout_workers=2,
max_concurrent_layouts=2
)
print("✓ Successfully created HybridImageDetector with parallel processing")
print(f" - Parallel layouts: {detector.parallel_layouts}")
print(f" - Layout workers: {detector.layout_workers}")
print(f" - Max concurrent layouts: {detector.max_concurrent_layouts}")
except Exception as e:
print(f"✗ Error creating detector: {e}")
return False
# Test 5: Test coordinator start/stop
try:
coordinator.start()
print("✓ Successfully started coordinator")
# Test queue size
queue_size = coordinator.get_queue_size()
print(f" - Queue size: {queue_size}")
coordinator.stop()
print("✓ Successfully stopped coordinator")
except Exception as e:
print(f"✗ Error with coordinator lifecycle: {e}")
return False
# Test 6: Test memory monitoring
try:
memory_adjusted = detector._monitor_memory_and_adjust_workers()
print(f"✓ Memory monitoring executed (adjustments made: {memory_adjusted})")
except Exception as e:
print(f"✗ Error with memory monitoring: {e}")
return False
# Test 7: Test error handling
try:
error_result = detector._handle_worker_failure("test_layout.jpg", Exception("test error"))
print(f"✓ Error handling executed (result has error: {'error' in error_result})")
except Exception as e:
print(f"✗ Error with error handling: {e}")
return False
print("\n🎉 All tests passed! Parallel processing implementation is working correctly.")
return True
def test_cli_integration():
"""Test CLI integration"""
print("\nTesting CLI integration...")
# Test parsing with parallel arguments
try:
from cli import parse_arguments
# Mock sys.argv for testing
original_argv = sys.argv
# Test with parallel processing arguments
sys.argv = ['cli.py', '--test', '--hybrid', '--parallel-layouts', '--layout-workers', '4']
try:
args = parse_arguments()
print("✓ Successfully parsed parallel processing arguments")
print(f" - Parallel layouts: {args.parallel_layouts}")
print(f" - Layout workers: {args.layout_workers}")
print(f" - Max concurrent layouts: {args.max_concurrent_layouts}")
except SystemExit:
# parse_arguments calls sys.exit() if help is requested
pass
finally:
sys.argv = original_argv
except Exception as e:
print(f"✗ Error testing CLI integration: {e}")
return False
print("✓ CLI integration tests passed!")
return True
if __name__ == "__main__":
success = test_parallel_processing()
if success:
success = test_cli_integration()
if success:
print("\n🚀 All smoke tests passed.")
print("\nUsage examples:")
print(" python cli.py --test --hybrid --parallel-layouts")
print(" python cli.py --limit 10 --hybrid --parallel-layouts --layout-workers 4")
print(" python cli.py --all --hybrid --parallel-layouts --layout-workers 6")
else:
print("\n❌ Implementation needs fixes before production use.")
sys.exit(1)
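
`test_cli_integration` swaps `sys.argv` by hand and restores it in a `finally` block. The same effect can be had with `unittest.mock.patch.object`, which puts the original value back automatically. The sketch below is self-contained: `build_parser` is a hypothetical stand-in for `cli.parse_arguments`, and the default worker count is an assumption made for the example.

```python
import argparse
import sys
from unittest import mock


def build_parser():
    """Hypothetical stand-in for the real CLI parser in cli.py."""
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--hybrid", action="store_true")
    parser.add_argument("--parallel-layouts", action="store_true")
    # Default of 2 is an assumption made for this example only.
    parser.add_argument("--layout-workers", type=int, default=2)
    return parser


def parse_with_argv(argv):
    """Parse `argv` as if it were the process command line."""
    # patch.object restores the original sys.argv when the block exits,
    # even if parse_args() raises SystemExit.
    with mock.patch.object(sys, "argv", argv):
        return build_parser().parse_args()


if __name__ == "__main__":
    args = parse_with_argv(
        ["cli.py", "--test", "--hybrid", "--parallel-layouts", "--layout-workers", "4"]
    )
    print(args.parallel_layouts, args.layout_workers)
```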

142
test_simple_split.py Normal file

@ -0,0 +1,142 @@
#!/usr/bin/env python3
"""
Simple test script to verify panel splitting functionality
Tests the panel splitting with 6786505.jpg (horizontal strip layout)
"""
import sys
import os
import cv2
import numpy as np
from pathlib import Path
from panel_splitter import PanelSplitter
def test_simple_splitting():
"""Test splitting without OpenAI guidance"""
print("=" * 60)
print("TESTING SIMPLE PANEL SPLITTING")
print("=" * 60)
# Initialize splitter with debug mode
splitter = PanelSplitter(debug=True)
# Target layout file - this is a horizontal strip with many panels
layout_path = Path("layouts") / "6786505.jpg"
if not layout_path.exists():
print(f"❌ ERROR: Layout file {layout_path} not found!")
return False
print(f"📁 Testing with: {layout_path.name}")
# Load and examine the image
image = cv2.imread(str(layout_path))
if image is None:
print(f"❌ ERROR: Could not load image {layout_path}")
return False
height, width = image.shape[:2]
print(f"📐 Image dimensions: {width}x{height}")
# Test with different target counts
test_counts = [5, 8, 10, 12]
for target_count in test_counts:
print(f"\n🎯 Testing with target count: {target_count}")
print("-" * 40)
# Split the layout
splits = splitter.split_panels(str(layout_path), target_count)
print(f"Generated {len(splits)} splits")
if len(splits) > 0:
print("✅ Successfully generated splits!")
# Save split images
splits_dir = Path("test_splits")
splits_dir.mkdir(exist_ok=True)
for i, split in enumerate(splits):
split_filename = splits_dir / f"6786505_target{target_count}_split_{i+1:02d}.jpg"
cv2.imwrite(str(split_filename), split['image'])
print(f" Saved {len(splits)} split images to test_splits/")
# Show split details
for i, split in enumerate(splits):
x, y, w, h = split['bounds']
print(f" Split {i+1:2d}: [{x:4d}, {y:4d}, {w:4d}, {h:4d}] conf={split['confidence']:.3f}")
else:
print("❌ No splits generated")
print("\n" + "=" * 60)
print("🎉 SIMPLE SPLITTING TEST COMPLETED!")
print("Check the test_splits/ directory for generated images.")
print("=" * 60)
return True
def test_individual_methods():
"""Test individual splitting methods"""
print("\n" + "=" * 60)
print("TESTING INDIVIDUAL SPLITTING METHODS")
print("=" * 60)
splitter = PanelSplitter(debug=True)
layout_path = Path("layouts") / "6786505.jpg"
image = cv2.imread(str(layout_path))
if image is None:
print("❌ Could not load image")
return False
target_count = 8
methods = [
splitter._enhanced_gradient_analysis,
splitter._advanced_canny_detection,
splitter._template_matching_method,
splitter._contour_analysis_method,
splitter._texture_analysis_method,
splitter._clustering_method
]
for method in methods:
print(f"\n🔬 Testing {method.__name__}...")
try:
result = method(image, target_count)
if result:
print(f" ✅ Generated {len(result)} boundaries")
for i, boundary in enumerate(result):
bounds = boundary['bounds']
print(f" {i+1}: [{bounds[0]:4d}, {bounds[1]:4d}, {bounds[2]:4d}, {bounds[3]:4d}] conf={boundary['confidence']:.3f}")
else:
print(" ❌ No boundaries generated")
except Exception as e:
print(f" ❌ Error: {e}")
return True
def main():
"""Main test function"""
print("🧪 STARTING SIMPLE PANEL SPLITTING TEST")
# Test basic splitting
simple_success = test_simple_splitting()
# Test individual methods
methods_success = test_individual_methods()
print(f"\n📊 FINAL RESULTS:")
print(f"Simple splitting: {'✅ PASSED' if simple_success else '❌ FAILED'}")
print(f"Individual methods: {'✅ PASSED' if methods_success else '❌ FAILED'}")
if simple_success and methods_success:
print("\n🎉 ALL TESTS PASSED! The panel splitting is working correctly.")
return 0
else:
print("\n❌ Some tests failed. Please check the output above.")
return 1
if __name__ == "__main__":
sys.exit(main())

51
test_split_mode.py Normal file

@ -0,0 +1,51 @@
#!/usr/bin/env python3
"""
Test script for the new --split mode functionality
"""
import sys
import os
from pathlib import Path
from panel_splitter import PanelSplitter
def test_basic_splitting():
"""Test basic panel splitting functionality"""
print("Testing basic panel splitting...")
# Initialize splitter
splitter = PanelSplitter(debug=True)
# Test with a sample layout image
layouts_path = Path("layouts")
layout_files = list(layouts_path.glob("*.jpg"))
if not layout_files:
print("No layout images found in layouts/ directory")
return
# Test with first layout
test_layout = layout_files[0]
print(f"Testing with: {test_layout.name}")
# Split panels with target count of 2
target_count = 2
splits = splitter.split_panels(str(test_layout), target_count)
print(f"Generated {len(splits)} splits")
for i, split in enumerate(splits):
print(f" Split {i+1}: bounds={split['bounds']}, confidence={split['confidence']:.3f}")
print("Basic splitting test completed!")
def test_cli_integration():
"""Test CLI integration with --split flag"""
print("\nTesting CLI integration...")
print("You can now test the --split flag with:")
print(" python cli.py --test --split")
print(" python cli.py --test --openai --split")
print(" python cli.py --test --vector-mode --split")
print(" python cli.py --test --hybrid --split")
if __name__ == "__main__":
test_basic_splitting()
test_cli_integration()

646
tune_14_panel_split.py Normal file

@ -0,0 +1,646 @@
#!/usr/bin/env python3
"""
Parameter tuning script for 14-panel splitting
Iteratively adjusts CV method parameters to achieve exactly 14 panels for 6786505.jpg
"""
import sys
import os
import cv2
import numpy as np
from pathlib import Path
from panel_splitter import PanelSplitter
from typing import Dict, List, Tuple, Any
import json
from itertools import product
import time
import multiprocessing as mp
from functools import partial
def convert_numpy_types(obj):
"""Convert NumPy types to native Python types for JSON serialization"""
if isinstance(obj, np.integer):
return int(obj)
elif isinstance(obj, np.floating):
return float(obj)
elif isinstance(obj, np.ndarray):
return obj.tolist()
elif isinstance(obj, dict):
return {key: convert_numpy_types(value) for key, value in obj.items()}
elif isinstance(obj, list):
return [convert_numpy_types(item) for item in obj]
elif isinstance(obj, tuple):
return tuple(convert_numpy_types(item) for item in obj)
else:
return obj
def test_gradient_config_worker(args):
"""Worker function for gradient analysis parameter testing"""
config, image_path, target_panel_count, width, height = args
try:
# Load image
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
separators = []
for sigma in config['sigma_scales']:
# Smooth the image
smoothed = cv2.GaussianBlur(gray, (0, 0), sigma)
# Calculate vertical gradient (for horizontal separators)
grad_y = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)
# Project to get horizontal profile
profile = np.mean(np.abs(grad_y), axis=1)
# Find peaks with tuned parameters
prominence = np.std(profile) * config['prominence_factor']
distance = int(height / target_panel_count * config['distance_factor'])
from scipy.signal import find_peaks
peaks, properties = find_peaks(profile, prominence=prominence, distance=distance)
# Add to separators
# peaks and prominences are parallel arrays, so iterate over them together
prominences = properties['prominences']
for peak, prominence_value in zip(peaks, prominences):
confidence = prominence_value / np.max(prominences)
separators.append({
'position': peak,
'confidence': confidence,
'sigma': sigma
})
# Convert to bounds
separators.sort(key=lambda x: x['position'])
bounds = []
prev_y = 0
for sep in separators:
if sep['position'] > prev_y + height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, sep['position'] - prev_y),
'confidence': sep['confidence']
})
prev_y = sep['position']
# Add final panel
if prev_y < height - height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8
})
return {
'method': 'gradient_analysis',
'config': config,
'panel_count': len(bounds),
'bounds': bounds,
'success': len(bounds) == target_panel_count
}
except Exception as e:
return {
'method': 'gradient_analysis',
'config': config,
'panel_count': 0,
'bounds': [],
'success': False,
'error': str(e)
}
def test_canny_config_worker(args):
"""Worker function for Canny edge detection parameter testing"""
config, image_path, target_panel_count, width, height = args
try:
# Load image
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Multi-threshold Canny detection
all_edges = []
for low, high in config['threshold_set']:
edges = cv2.Canny(gray, low, high)
# Morphological operations
kernel = np.ones(config['morphology_kernel'], np.uint8)
edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
all_edges.append(edges)
# Combine edge maps
combined_edges = np.maximum.reduce(all_edges)
# Find horizontal lines using Hough transform
lines = cv2.HoughLinesP(
combined_edges,
1,
np.pi/180,
threshold=config['hough_threshold'],
minLineLength=config['min_line_length'],
maxLineGap=config['max_line_gap']
)
# Filter for horizontal lines
horizontal_lines = []
if lines is not None:
for line in lines:
x1, y1, x2, y2 = line[0]
if abs(y2 - y1) < height // 20: # Nearly horizontal
horizontal_lines.append({
'y_position': (y1 + y2) // 2,
'length': abs(x2 - x1),
'confidence': min(1.0, abs(x2 - x1) / width)
})
# Sort by y position and create bounds
horizontal_lines.sort(key=lambda x: x['y_position'])
bounds = []
prev_y = 0
for line in horizontal_lines:
y_pos = line['y_position']
if y_pos > prev_y + height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, y_pos - prev_y),
'confidence': line['confidence']
})
prev_y = y_pos
# Add final panel
if prev_y < height - height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8
})
return {
'method': 'canny_detection',
'config': config,
'panel_count': len(bounds),
'bounds': bounds,
'success': len(bounds) == target_panel_count
}
except Exception as e:
return {
'method': 'canny_detection',
'config': config,
'panel_count': 0,
'bounds': [],
'success': False,
'error': str(e)
}
def test_template_config_worker(args):
"""Worker function for template matching parameter testing"""
config, image_path, target_panel_count, width, height = args
try:
# Load image
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
tc = config['template_config']
# Create templates
templates = []
template_width = width // tc['template_width_fraction']
# White horizontal line template
white_template = np.ones((tc['white_line_height'], template_width), dtype=np.uint8) * 255
templates.append(('white_line', white_template))
# Black horizontal line template
black_template = np.zeros((tc['black_line_height'], template_width), dtype=np.uint8)
templates.append(('black_line', black_template))
# Gutter template (white with black edges)
gutter_template = np.ones((tc['gutter_height'], template_width), dtype=np.uint8) * 255
gutter_template[0, :] = 0
gutter_template[-1, :] = 0
templates.append(('gutter', gutter_template))
# Find matches
all_matches = []
for template_name, template in templates:
result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
# Find good matches
locations = np.where(result >= tc['match_threshold'])
for y, x in zip(locations[0], locations[1]):
confidence = result[y, x]
all_matches.append({
'y_position': y + template.shape[0] // 2,
'confidence': confidence,
'template': template_name
})
# Sort and merge nearby matches
all_matches.sort(key=lambda x: x['y_position'])
merged_matches = []
for match in all_matches:
if not merged_matches or match['y_position'] - merged_matches[-1]['y_position'] > config['merge_distance']:
merged_matches.append(match)
else:
# Keep the one with higher confidence
if match['confidence'] > merged_matches[-1]['confidence']:
merged_matches[-1] = match
# Create bounds
bounds = []
prev_y = 0
for match in merged_matches:
y_pos = match['y_position']
if y_pos > prev_y + height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, y_pos - prev_y),
'confidence': match['confidence']
})
prev_y = y_pos
# Add final panel
if prev_y < height - height // (target_panel_count * 2):
bounds.append({
'bounds': (0, prev_y, width, height - prev_y),
'confidence': 0.8
})
return {
'method': 'template_matching',
'config': config,
'panel_count': len(bounds),
'bounds': bounds,
'success': len(bounds) == target_panel_count
}
except Exception as e:
return {
'method': 'template_matching',
'config': config,
'panel_count': 0,
'bounds': [],
'success': False,
'error': str(e)
}
class ParameterTuner:
def __init__(self, target_image_path: str, target_panel_count: int = 14):
self.target_image_path = target_image_path
self.target_panel_count = target_panel_count
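self.image = cv2.imread(target_image_path)
# cv2.imread returns None instead of raising when the file cannot be decoded
if self.image is None:
raise ValueError(f"Could not read image: {target_image_path}")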
self.height, self.width = self.image.shape[:2]
# Results storage
self.results = []
self.best_configs = []
# Multiprocessing setup
self.num_workers = mp.cpu_count()
print(f"🎯 Target: {target_panel_count} panels for {Path(target_image_path).name}")
print(f"📐 Image dimensions: {self.width}x{self.height}")
print(f"🚀 Using {self.num_workers} parallel workers")
def test_gradient_analysis_params(self):
"""Test Enhanced Gradient Analysis with different parameters using multiprocessing"""
print("\n🔬 TUNING GRADIENT ANALYSIS PARAMETERS")
print("=" * 50)
# Parameter ranges to test
sigma_ranges = [
[3, 7, 15], # Fine scale
[5, 10, 20], # Current default
[7, 15, 25], # Coarse scale
[5, 12, 18], # Medium scale
[4, 8, 16], # Balanced scale
]
prominence_factors = [0.3, 0.4, 0.5, 0.6, 0.7]
distance_factors = [0.8, 1.0, 1.2, 1.5, 2.0]
# Create all configuration combinations
configs = []
for sigma_set in sigma_ranges:
for prom_factor in prominence_factors:
for dist_factor in distance_factors:
config = {
'method': 'gradient_analysis',
'sigma_scales': sigma_set,
'prominence_factor': prom_factor,
'distance_factor': dist_factor
}
configs.append(config)
print(f"Testing {len(configs)} gradient analysis configurations...")
# Prepare arguments for multiprocessing
args_list = [
(config, self.target_image_path, self.target_panel_count, self.width, self.height)
for config in configs
]
# Use multiprocessing to test configurations
with mp.Pool(processes=self.num_workers) as pool:
method_results = pool.map(test_gradient_config_worker, args_list)
# Check for exact matches
exact_matches = [r for r in method_results if r['success']]
for result in exact_matches:
print(f"✅ EXACT MATCH: Panels: {result['panel_count']}")
self.best_configs.append(result)
# Find best results
best_results = sorted(method_results, key=lambda x: abs(x['panel_count'] - self.target_panel_count))[:5]
print(f"\n🏆 Top 5 Gradient Analysis Results:")
for i, result in enumerate(best_results, 1):
print(f" {i}. Panels: {result['panel_count']}")
return method_results
def test_canny_params(self):
"""Test Advanced Canny Edge Detection with different parameters using multiprocessing"""
print("\n🔬 TUNING CANNY EDGE DETECTION PARAMETERS")
print("=" * 50)
# Parameter ranges
threshold_sets = [
[(30, 100), (80, 160), (120, 200)], # Low thresholds (most sensitive, more edges)
[(50, 150), (100, 200), (150, 250)], # Current default
[(70, 180), (120, 220), (170, 280)], # High thresholds (least sensitive, fewer edges)
[(40, 120), (90, 180), (140, 240)], # Between low and default
]
morphology_kernels = [
(1, 1), (3, 1), (5, 1), (7, 1), (9, 1) # Vertical kernels for horizontal lines
]
hough_thresholds = [
self.width // 8, # Low threshold
self.width // 6, # Medium-low
self.width // 4, # Current default
self.width // 3, # High threshold
]
min_line_lengths = [
self.width // 5, # Short lines
self.width // 4, # Medium-short
self.width // 3, # Current default
self.width // 2, # Long lines
]
max_line_gaps = [
self.width // 20, # Small gaps
self.width // 15, # Medium gaps
self.width // 10, # Current default
self.width // 8, # Large gaps
]
# Create all configuration combinations
configs = []
for thresh_set in threshold_sets:
for kernel in morphology_kernels:
for hough_thresh in hough_thresholds:
for min_len in min_line_lengths:
for max_gap in max_line_gaps:
config = {
'method': 'canny_detection',
'threshold_set': thresh_set,
'morphology_kernel': kernel,
'hough_threshold': hough_thresh,
'min_line_length': min_len,
'max_line_gap': max_gap
}
configs.append(config)
print(f"Testing {len(configs)} canny detection configurations...")
# Prepare arguments for multiprocessing
args_list = [
(config, self.target_image_path, self.target_panel_count, self.width, self.height)
for config in configs
]
# Use multiprocessing to test configurations
with mp.Pool(processes=self.num_workers) as pool:
method_results = pool.map(test_canny_config_worker, args_list)
# Check for exact matches
exact_matches = [r for r in method_results if r['success']]
for result in exact_matches:
print(f"✅ EXACT MATCH: Panels: {result['panel_count']}")
self.best_configs.append(result)
# Find best results
best_results = sorted(method_results, key=lambda x: abs(x['panel_count'] - self.target_panel_count))[:5]
print(f"\n🏆 Top 5 Canny Detection Results:")
for i, result in enumerate(best_results, 1):
print(f" {i}. Panels: {result['panel_count']}")
return method_results
def test_template_matching_params(self):
"""Test Template Matching with different parameters using multiprocessing"""
print("\n🔬 TUNING TEMPLATE MATCHING PARAMETERS")
print("=" * 50)
# Template configurations
template_configs = [
{
'white_line_height': 3,
'black_line_height': 3,
'gutter_height': 6,
'template_width_fraction': 4,
'match_threshold': 0.4
},
{
'white_line_height': 5,
'black_line_height': 5,
'gutter_height': 10,
'template_width_fraction': 4,
'match_threshold': 0.5
},
{
'white_line_height': 7,
'black_line_height': 7,
'gutter_height': 14,
'template_width_fraction': 4,
'match_threshold': 0.6
},
{
'white_line_height': 4,
'black_line_height': 4,
'gutter_height': 8,
'template_width_fraction': 3,
'match_threshold': 0.45
},
{
'white_line_height': 6,
'black_line_height': 6,
'gutter_height': 12,
'template_width_fraction': 5,
'match_threshold': 0.55
}
]
merge_distances = [
self.height // (self.target_panel_count * 3),
self.height // (self.target_panel_count * 2),
int(self.height / (self.target_panel_count * 1.5)), # int() keeps this a pixel count like the entries above
]
# Create all configuration combinations
configs = []
for template_config in template_configs:
for merge_dist in merge_distances:
config = {
'method': 'template_matching',
'template_config': template_config,
'merge_distance': merge_dist
}
configs.append(config)
print(f"Testing {len(configs)} template matching configurations...")
# Prepare arguments for multiprocessing
args_list = [
(config, self.target_image_path, self.target_panel_count, self.width, self.height)
for config in configs
]
# Use multiprocessing to test configurations
with mp.Pool(processes=self.num_workers) as pool:
method_results = pool.map(test_template_config_worker, args_list)
# Check for exact matches
exact_matches = [r for r in method_results if r['success']]
for result in exact_matches:
print(f"✅ EXACT MATCH: Panels: {result['panel_count']}")
self.best_configs.append(result)
# Find best results
best_results = sorted(method_results, key=lambda x: abs(x['panel_count'] - self.target_panel_count))[:5]
print(f"\n🏆 Top 5 Template Matching Results:")
for i, result in enumerate(best_results, 1):
print(f" {i}. Panels: {result['panel_count']}")
return method_results
def test_consensus_params(self):
"""Test consensus system with different parameters (simplified for multiprocessing)"""
print("\n🔬 TUNING CONSENSUS SYSTEM PARAMETERS")
print("=" * 50)
# For now, return empty results to speed up testing
# Focus on the main methods that are already working
print("Skipping consensus tuning - focusing on main methods")
return []
def run_full_tuning(self):
"""Run the complete parameter tuning process"""
print("🚀 STARTING COMPREHENSIVE PARAMETER TUNING")
print("=" * 60)
start_time = time.time()
# Test all methods
gradient_results = self.test_gradient_analysis_params()
canny_results = self.test_canny_params()
template_results = self.test_template_matching_params()
consensus_results = self.test_consensus_params()
# Combine all results
all_results = gradient_results + canny_results + template_results + consensus_results
# Find the absolute best configurations
exact_matches = [r for r in all_results if r['success']]
close_matches = sorted([r for r in all_results if not r['success']],
key=lambda x: abs(x['panel_count'] - self.target_panel_count))[:10]
# Generate summary
elapsed_time = time.time() - start_time
print(f"\n" + "=" * 60)
print(f"🏁 TUNING COMPLETE - Time: {elapsed_time:.1f}s")
print(f"=" * 60)
print(f"\n🎯 EXACT MATCHES ({len(exact_matches)} found):")
for i, match in enumerate(exact_matches, 1):
print(f" {i}. Method: {match['method']}")
print(f" Config: {match['config']}")
print(f" Panel Count: {match['panel_count']}")
print()
print(f"\n📊 CLOSE MATCHES (Top 10):")
for i, match in enumerate(close_matches, 1):
print(f" {i}. Method: {match['method']}, Panels: {match['panel_count']}")
print(f" Config: {match['config']}")
print()
# Save results to file
results_file = "tuning_results_14_panel.json"
output_data = {
'target_panel_count': self.target_panel_count,
'image_path': self.target_image_path,
'image_dimensions': {'width': self.width, 'height': self.height},
'exact_matches': exact_matches,
'close_matches': close_matches,
'all_results': all_results,
'tuning_time': elapsed_time,
'total_configurations_tested': len(all_results)
}
# Convert NumPy types to native Python types for JSON serialization
output_data = convert_numpy_types(output_data)
with open(results_file, 'w') as f:
json.dump(output_data, f, indent=2)
print(f"💾 Results saved to: {results_file}")
return exact_matches, close_matches
def main():
"""Main tuning function"""
print("🎯 14-PANEL SPLITTING PARAMETER TUNING")
print("=" * 60)
target_image = "layouts/6786505.jpg"
target_panels = 14
if not Path(target_image).exists():
print(f"❌ ERROR: Target image {target_image} not found!")
return 1
# Initialize tuner
tuner = ParameterTuner(target_image, target_panels)
# Run tuning
exact_matches, close_matches = tuner.run_full_tuning()
if exact_matches:
print(f"\n🎉 SUCCESS: Found {len(exact_matches)} configurations that produce exactly {target_panels} panels!")
print("\n🔧 RECOMMENDED SETTINGS:")
print("Update your PanelSplitter class with these optimal parameters:")
for i, match in enumerate(exact_matches[:3], 1): # Show top 3
print(f"\n Option {i} - {match['method']}:")
print(f" {match['config']}")
else:
print(f"\n⚠️ No exact matches found. Best alternatives:")
for i, match in enumerate(close_matches[:3], 1):
print(f" {i}. {match['method']}: {match['panel_count']} panels")
return 0
if __name__ == "__main__":
# Force the 'spawn' start method so workers behave the same on Linux as on macOS/Windows, where it is already the default
mp.set_start_method('spawn', force=True)
sys.exit(main())
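
All three worker functions turn a sorted list of separator rows into panel bounds with the same loop. The sketch below isolates that step as a pure function so the skipping rule is easier to see. `separators_to_bounds` is a hypothetical name, and it returns plain `(x, y, w, h)` tuples rather than the dicts with a confidence value that the script builds.

```python
def separators_to_bounds(positions, width, height, target_panel_count):
    """Convert separator rows into (x, y, w, h) panel bounds.

    Hypothetical helper mirroring the loop shared by the worker functions.
    Separators closer than half an average panel height to the previous
    cut are skipped as duplicates of the same gutter.
    """
    min_gap = height // (target_panel_count * 2)
    bounds = []
    prev_y = 0
    for y in sorted(positions):
        if y > prev_y + min_gap:
            bounds.append((0, prev_y, width, y - prev_y))
            prev_y = y
    # The trailing strip becomes a panel only if it is taller than min_gap;
    # otherwise those rows are left out of every panel.
    if prev_y < height - min_gap:
        bounds.append((0, prev_y, width, height - prev_y))
    return bounds


if __name__ == "__main__":
    # 1400 px tall strip, 14 expected panels -> min_gap of 50 px.
    print(separators_to_bounds([100, 120, 200, 1380], 800, 1400, 14))
```

With the sample input, the separator at row 120 is dropped because it falls within 50 px of the cut at row 100, and the final 20 px strip below row 1380 is not emitted as a panel.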

776
vector_detector.py Normal file

@ -0,0 +1,776 @@
#!/usr/bin/env python3
"""
Vector-based Image Detection Module
Extracted from image_detector.py - Contains VectorImageDetector class
Uses Google Vertex AI Multimodal Embeddings for image similarity detection
"""
import os
import json
import time
from pathlib import Path
from typing import List, Dict, Optional
import numpy as np
import pickle
from google.cloud import aiplatform
from vertexai.vision_models import MultiModalEmbeddingModel
import cv2
from panel_splitter import PanelSplitter
class VectorImageDetector:
def __init__(self, similarity_threshold=0.75, splitting_mode="none", min_crop_size=200, crop_padding=20, split_mode=False):
"""Initialize the vector-based image detector using Google Vertex AI Multimodal Embeddings"""
print("Initializing Vector Image Detector with Google Vertex AI...")
# Initialize Vertex AI (keep any credentials path already set in the environment)
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", "service-account.json")
aiplatform.init(project="optical-414516", location="us-central1")
# Initialize multimodal embedding model
self.model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
# Configuration
self.similarity_threshold = similarity_threshold
self.splitting_mode = splitting_mode
self.min_crop_size = min_crop_size
self.crop_padding = crop_padding
self.split_mode = split_mode
# Split mode configuration
if self.split_mode:
self.splitter = PanelSplitter(debug=True)
print("Split mode enabled: Will split multi-panel layouts before matching")
# Paths
self.master_images_path = Path("master_images")
self.layouts_path = Path("layouts")
self.results_path = Path("results")
self.embeddings_cache_path = Path("embeddings_cache")
self.crops_debug_path = Path("crops_debug")
# Create directories
self.results_path.mkdir(exist_ok=True)
self.embeddings_cache_path.mkdir(exist_ok=True)
self.crops_debug_path.mkdir(exist_ok=True)
# Master images data
self.master_images = {}
self.master_files = {}
self.master_embeddings = {}
print(f"✓ Vector detector initialized with similarity threshold: {similarity_threshold}")
print(f"✓ Splitting mode: {splitting_mode}, Min crop size: {min_crop_size}px")
def load_master_images(self) -> Dict[str, str]:
"""Load all master images and create ID mapping using filenames"""
print("Loading master images...")
master_files = list(self.master_images_path.glob("*.jpg"))
print(f"Found {len(master_files)} master images")
for file_path in master_files:
master_id = file_path.stem
self.master_images[master_id] = str(file_path)
self.master_files[master_id] = file_path.name
return self.master_images
def generate_image_embedding(self, image_path: str) -> Optional[np.ndarray]:
"""Generate a 1408-dimensional embedding for an image using Vertex AI; returns None on failure"""
try:
from vertexai.vision_models import Image as VertexImage
# Create Vertex AI Image object directly from file path
vertex_image = VertexImage.load_from_file(image_path)
# Get embedding from Vertex AI
response = self.model.get_embeddings(image=vertex_image)
# Extract the embedding vector (1408 dimensions)
embedding = np.array(response.image_embedding)
return embedding
except Exception as e:
print(f"Error generating embedding for {Path(image_path).name}: {e}")
return None
def save_embedding_cache(self, embeddings: Dict, filename: str):
"""Save embeddings to cache file"""
cache_file = self.embeddings_cache_path / f"{filename}.pkl"
with open(cache_file, 'wb') as f:
pickle.dump(embeddings, f)
print(f"Embeddings cached to: {cache_file}")
def load_embedding_cache(self, filename: str) -> Optional[Dict]:
"""Load embeddings from cache file"""
cache_file = self.embeddings_cache_path / f"{filename}.pkl"
if cache_file.exists():
try:
with open(cache_file, 'rb') as f:
embeddings = pickle.load(f)
print(f"Loaded cached embeddings from: {cache_file}")
return embeddings
except Exception as e:
print(f"Error loading cached embeddings: {e}")
return None
    def generate_master_embeddings(self, force_regenerate=False) -> Dict[str, np.ndarray]:
        """Generate embeddings for all master images (with caching)"""
        cache_filename = "master_embeddings"
        # Try to load from cache first
        if not force_regenerate:
            cached_embeddings = self.load_embedding_cache(cache_filename)
            if cached_embeddings is not None:
                # Use the cache only if it covers exactly the current set of master images
                if set(cached_embeddings.keys()) == set(self.master_images.keys()):
                    self.master_embeddings = cached_embeddings
                    print(f"✓ Using cached embeddings for {len(cached_embeddings)} master images")
                    return self.master_embeddings
                else:
                    print("Cache does not match current master images, regenerating embeddings...")
        print(f"Generating embeddings for {len(self.master_images)} master images...")
        self.master_embeddings = {}
        for i, (master_id, image_path) in enumerate(self.master_images.items(), 1):
            print(f" {i}/{len(self.master_images)}: Generating embedding for {master_id}")
            embedding = self.generate_image_embedding(image_path)
            if embedding is not None:
                self.master_embeddings[master_id] = embedding
            # Small delay to avoid rate limiting
            if i < len(self.master_images):
                time.sleep(0.1)
        # Cache the embeddings
        if self.master_embeddings:
            self.save_embedding_cache(self.master_embeddings, cache_filename)
        print(f"✓ Generated embeddings for {len(self.master_embeddings)} master images")
        return self.master_embeddings
    def compute_cosine_similarity(self, embedding1: np.ndarray, embedding2: np.ndarray) -> float:
        """Compute cosine similarity between two embeddings"""
        # Compute the L2 norm of each embedding
        norm1 = np.linalg.norm(embedding1)
        norm2 = np.linalg.norm(embedding2)
        # A zero vector has no direction; treat it as dissimilar to everything
        if norm1 == 0 or norm2 == 0:
            return 0.0
        # Compute cosine similarity
        similarity = np.dot(embedding1, embedding2) / (norm1 * norm2)
        return float(similarity)
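# Sketch only, not part of this class: when one embedding is compared against
# many masters, the same cosine formula can be evaluated for all of them in a
# single matrix product. The function name cosine_similarities and the stacked
# masters matrix (one master embedding per row) are assumptions made for
# illustration.

```python
import numpy as np


def cosine_similarities(query: np.ndarray, masters: np.ndarray) -> np.ndarray:
    """Cosine similarity of one query vector against each row of `masters`."""
    query_norm = np.linalg.norm(query)
    master_norms = np.linalg.norm(masters, axis=1)
    denom = query_norm * master_norms
    sims = np.zeros(masters.shape[0], dtype=np.float64)
    # Rows with a zero norm keep similarity 0.0, matching the scalar version
    nonzero = denom > 0
    sims[nonzero] = (masters[nonzero] @ query) / denom[nonzero]
    return sims


query = np.array([1.0, 0.0, 0.0])
masters = np.array([
    [2.0, 0.0, 0.0],  # same direction as the query
    [0.0, 3.0, 0.0],  # orthogonal to the query
    [0.0, 0.0, 0.0],  # zero vector
    [1.0, 1.0, 0.0],  # 45 degrees from the query
])
sims = cosine_similarities(query, masters)
```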
    def detect_layout_type(self, image_path: str) -> str:
        """Analyze a layout image to determine whether it is a single image or a composite"""
        try:
            img = cv2.imread(image_path)
            # cv2.imread returns None instead of raising when the file cannot be read
            if img is None:
                raise ValueError("image could not be read")
            height, width = img.shape[:2]
            # Convert to grayscale for analysis
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            # Detect edges to find potential separators
            edges = cv2.Canny(gray, 50, 150, apertureSize=3)
            # Look for strong vertical lines (panel separators)
            vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(1, height // 10)))
            vertical_lines = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, vertical_kernel)
            # Look for strong horizontal lines (row separators)
            horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(1, width // 10), 1))
            horizontal_lines = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, horizontal_kernel)
            # Count significant vertical and horizontal structures.
            # Canny marks edge pixels as 255, so these densities are 255x the
            # fraction of edge pixels; the 0.01 cutoff below is on that scale.
            vertical_density = np.sum(vertical_lines) / (height * width)
            horizontal_density = np.sum(horizontal_lines) / (height * width)
            # Determine layout type based on structure
            if vertical_density > 0.01 or horizontal_density > 0.01:
                return "composite"
            else:
                return "single"
        except Exception as e:
            print(f"Error analyzing layout type for {Path(image_path).name}: {e}")
            return "single"  # Default to single if analysis fails
    def split_image_by_grid(self, image_path: str) -> List[Dict]:
        """Split composite image into individual components using grid detection"""
        try:
            layout_name = Path(image_path).name
            print(f" Analyzing grid structure for {layout_name}")
            # Load image
            img = cv2.imread(image_path)
            # cv2.imread returns None instead of raising when the file cannot be read
            if img is None:
                raise ValueError("image could not be read")
            height, width = img.shape[:2]
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            # Edge detection for finding separators
            edges = cv2.Canny(gray, 30, 100, apertureSize=3)
            # Detect vertical separators (for horizontal panels)
            vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(1, height // 8)))
            vertical_lines = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, vertical_kernel)
            # Find vertical separator positions
            vertical_projection = np.sum(vertical_lines, axis=0)
            vertical_threshold = np.max(vertical_projection) * 0.6  # More strict threshold
            vertical_separators = []
            for x in range(width):
                if vertical_projection[x] > vertical_threshold:
                    # Check if this is a new separator (more than 50 px from the previous one)
                    if not vertical_separators or x - vertical_separators[-1] > 50:
                        vertical_separators.append(x)
            # Detect horizontal separators (for stacked layouts)
            horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(1, width // 8), 1))
            horizontal_lines = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, horizontal_kernel)
            # Find horizontal separator positions
            horizontal_projection = np.sum(horizontal_lines, axis=1)
            horizontal_threshold = np.max(horizontal_projection) * 0.6  # More strict threshold
            horizontal_separators = []
            for y in range(height):
                if horizontal_projection[y] > horizontal_threshold:
                    # Same 50 px minimum gap as for vertical separators
                    if not horizontal_separators or y - horizontal_separators[-1] > 50:
                        horizontal_separators.append(y)
            # Generate crop regions based on detected separators
            crops = []
            # Add image boundaries to separator lists
            x_boundaries = [0] + vertical_separators + [width]
            y_boundaries = [0] + horizontal_separators + [height]
            # Remove duplicates and sort
            x_boundaries = sorted(set(x_boundaries))
            y_boundaries = sorted(set(y_boundaries))
            print(f" Found {len(x_boundaries)-1} x {len(y_boundaries)-1} grid sections")
            # For horizontal layouts, prefer fallback splitting if grid creates too many small sections
            total_sections = (len(x_boundaries)-1) * (len(y_boundaries)-1)
            is_wide_horizontal = width > height * 1.5
            if is_wide_horizontal and total_sections > 20:
                print(f" Grid too complex ({total_sections} sections), using horizontal splitting instead")
                crops = self.fallback_split_image(img, width, height)
            else:
                # Generate one rectangular crop per grid cell
                for i in range(len(y_boundaries) - 1):
                    for j in range(len(x_boundaries) - 1):
                        y1, y2 = y_boundaries[i], y_boundaries[i + 1]
                        x1, x2 = x_boundaries[j], x_boundaries[j + 1]
                        # Add padding and clamp to the image boundaries
                        x1 = max(0, x1 - self.crop_padding)
                        y1 = max(0, y1 - self.crop_padding)
                        x2 = min(width, x2 + self.crop_padding)
                        y2 = min(height, y2 + self.crop_padding)
                        crop_width = x2 - x1
                        crop_height = y2 - y1
                        # Filter out crops that are too small
                        if crop_width >= self.min_crop_size and crop_height >= self.min_crop_size:
                            crop_area = crop_width * crop_height
                            total_area = width * height
                            area_ratio = crop_area / total_area
                            crops.append({
                                'bbox': (x1, y1, x2, y2),
                                'width': crop_width,
                                'height': crop_height,
                                'area_ratio': area_ratio,
                                'crop_id': f"grid_{i}_{j}"
                            })
            # If no good crops found, try fallback splitting
            if not crops:
                print(" No grid detected, trying fallback splitting")
                crops = self.fallback_split_image(img, width, height)
            print(f" Generated {len(crops)} crops for analysis")
            return crops
        except Exception as e:
            print(f"Error splitting image {Path(image_path).name}: {e}")
            return []
    def fallback_split_image(self, img, width: int, height: int) -> List[Dict]:
        """Improved horizontal splitting focusing on major structural separators"""
        # SciPy is only needed on this fallback path
        from scipy.ndimage import gaussian_filter1d
        from scipy.signal import find_peaks

        crops = []
        # Only process wide images for horizontal splitting
        if width <= height * 1.2:
            print(" Image not wide enough for horizontal splitting, treating as single panel")
            crops.append({
                'bbox': (0, 0, width, height),
                'width': width,
                'height': height,
                'area_ratio': 1.0,
                'crop_id': "single"
            })
            return crops
        print(f" Using improved horizontal splitting for {width}x{height} image")
        # Convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Method 1: Structural edge detection for full-height separators
        edges = cv2.Canny(gray, 30, 100)
        vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(1, height // 3)))
        vertical_edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, vertical_kernel)
        # Sum as float: gaussian_filter1d keeps the input dtype, so integer
        # input would be smoothed into truncated integer output
        edge_projection = np.sum(vertical_edges, axis=0, dtype=np.float64)
        # Method 2: Per-column intensity profile, inverted so dark columns become peaks
        horizontal_hist = np.sum(gray, axis=0, dtype=np.float64)
        inverted_hist = np.max(horizontal_hist) - horizontal_hist
        # Smooth both signals
        smoothed_edges = gaussian_filter1d(edge_projection, sigma=15)
        smoothed_hist = gaussian_filter1d(inverted_hist, sigma=15)
        # Adaptive parameters based on image size
        if width < 2000:
            min_panel_width = width // 4  # At least 25% per panel
            max_panels = 3
        elif width < 5000:
            min_panel_width = width // 6  # At least ~17% per panel
            max_panels = 6
        else:
            min_panel_width = width // 12  # At least ~8% per panel
            max_panels = 15
        print(f" Min panel width: {min_panel_width}px, Max panels: {max_panels}")
        # Find separator candidates with conservative thresholds
        edge_threshold = np.max(smoothed_edges) * 0.5  # Higher threshold for stronger edges
        hist_threshold = np.max(smoothed_hist) * 0.4  # Higher threshold for deeper valleys
        # Edge-based separators
        edge_peaks, _ = find_peaks(smoothed_edges,
                                   distance=min_panel_width,
                                   height=edge_threshold,
                                   prominence=np.max(smoothed_edges) * 0.3)
        # Histogram-based separators
        hist_peaks, _ = find_peaks(smoothed_hist,
                                   distance=min_panel_width,
                                   height=hist_threshold,
                                   prominence=np.max(smoothed_hist) * 0.2)
        print(f" Edge peaks: {len(edge_peaks)}, Histogram peaks: {len(hist_peaks)}")
        # Combine separators (as plain ints rather than NumPy scalars) and filter boundary areas
        all_separators = {int(p) for p in edge_peaks} | {int(p) for p in hist_peaks}
        boundary_margin = width * 0.1  # 10% margin from edges
        valid_separators = [s for s in all_separators
                            if boundary_margin < s < width - boundary_margin]
        # Remove separators too close to each other
        valid_separators = sorted(valid_separators)
        final_separators = []
        for sep in valid_separators:
            if not final_separators or sep - final_separators[-1] >= min_panel_width:
                final_separators.append(sep)
        # Limit to reasonable number of panels and keep strongest separators
        if len(final_separators) >= max_panels:
            separator_scores = []
            for sep in final_separators:
                edge_score = smoothed_edges[sep] if sep < len(smoothed_edges) else 0
                hist_score = smoothed_hist[sep] if sep < len(smoothed_hist) else 0
                combined_score = edge_score + hist_score
                separator_scores.append((sep, combined_score))
            separator_scores.sort(key=lambda x: x[1], reverse=True)
            final_separators = [s[0] for s in separator_scores[:max_panels-1]]
            final_separators.sort()
        print(f" Final separators: {final_separators}")
        # Create crops
        x_boundaries = [0] + final_separators + [width]
        for i in range(len(x_boundaries) - 1):
            x1, x2 = x_boundaries[i], x_boundaries[i + 1]
            if x2 - x1 >= self.min_crop_size:
                crops.append({
                    'bbox': (x1, 0, x2, height),
                    'width': x2 - x1,
                    'height': height,
                    'area_ratio': (x2 - x1) / width,
                    'crop_id': f"panel_{i}"
                })
        print(f" Generated {len(crops)} improved horizontal crops")
        return crops
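# Sketch only, not part of this class: the separator search above can be tried
# on a synthetic column projection to see how the distance, height and
# prominence arguments of find_peaks interact. The spike positions and
# strengths are made-up illustration data; the sigma and the threshold
# fractions mirror the values used for the edge signal in
# fallback_split_image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

width = 3000
projection = np.zeros(width, dtype=np.float64)
projection[1000] = 100.0  # strong separator
projection[1040] = 30.0   # weak edge right next to it
projection[2000] = 80.0   # second strong separator

# Smooth the raw projection so each spike becomes a broad bump
smoothed = gaussian_filter1d(projection, sigma=15)

# Same rule as the 2000 <= width < 5000 branch: panels at least width // 6 wide
min_panel_width = width // 6

peaks, _ = find_peaks(
    smoothed,
    distance=min_panel_width,            # minimum spacing between separators
    height=smoothed.max() * 0.5,         # drop bumps below half the maximum
    prominence=smoothed.max() * 0.3,     # drop bumps that barely stand out
)

# Convert NumPy integers to plain ints before using them as crop coordinates
separators = [int(p) for p in peaks]
print(separators)
```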
    def save_crop_debug_images(self, image_path: str, crops: List[Dict]):
        """Save cropped images for debugging purposes"""
        try:
            layout_name = Path(image_path).stem
            img = cv2.imread(image_path)
            for i, crop in enumerate(crops):
                x1, y1, x2, y2 = crop['bbox']
                cropped = img[y1:y2, x1:x2]
                debug_filename = f"{layout_name}_crop_{i}_{crop['crop_id']}.jpg"
                debug_path = self.crops_debug_path / debug_filename
                cv2.imwrite(str(debug_path), cropped)
        except Exception as e:
            print(f"Warning: Failed to save debug crops: {e}")
    def generate_crop_embedding(self, image_path: str, crop_info: Dict) -> Optional[np.ndarray]:
        """Generate embedding for a specific crop of an image"""
        try:
            # Load full image
            img = cv2.imread(image_path)
            # Extract crop region
            x1, y1, x2, y2 = crop_info['bbox']
            cropped_img = img[y1:y2, x1:x2]
            # Save crop to temporary file for embedding generation
            # (fixed file name, so this is not safe for concurrent runs)
            temp_crop_path = self.crops_debug_path / "temp_crop.jpg"
            cv2.imwrite(str(temp_crop_path), cropped_img)
            # Generate embedding for crop
            embedding = self.generate_image_embedding(str(temp_crop_path))
            # Clean up temp file
            if temp_crop_path.exists():
                temp_crop_path.unlink()
            return embedding
        except Exception as e:
            print(f"Error generating crop embedding: {e}")
            return None
    def detect_masters_in_layout_vector(self, layout_path: str, layout_index: int, total_layouts: int) -> Dict:
        """Detect which master images appear in a layout using vector similarity with optional splitting"""
        layout_name = Path(layout_path).name
        print(f"Processing {layout_index}/{total_layouts}: {layout_name} (Vector mode: {self.splitting_mode})")
        try:
            # Step 1: Determine if we should use splitting
            if self.splitting_mode == "none":
                return self.detect_whole_image(layout_path, layout_name)
            elif self.splitting_mode == "auto":
                layout_type = self.detect_layout_type(layout_path)
                if layout_type == "single":
                    return self.detect_whole_image(layout_path, layout_name)
                else:
                    return self.detect_with_splitting(layout_path, layout_name)
            elif self.splitting_mode == "grid":
                return self.detect_with_splitting(layout_path, layout_name)
            else:
                # Default to whole image
                return self.detect_whole_image(layout_path, layout_name)
        except Exception as e:
            error_msg = f"Error analyzing {layout_name} with vector embeddings: {e}"
            print(error_msg)
            return {
                'detected_masters': [],
                'detected_master_ids': [],
                'detected_master_filenames': [],
                'analysis': 'Vector embedding analysis failed',
                'error': str(e),
                'processing_mode': f'vector_embedding_{self.splitting_mode}'
            }
    def detect_whole_image(self, layout_path: str, layout_name: str) -> Dict:
        """Detect masters using whole image comparison"""
        print(f" Processing whole image: {layout_name}")
        # Generate embedding for layout image
        layout_embedding = self.generate_image_embedding(layout_path)
        if layout_embedding is None:
            raise RuntimeError("Failed to generate layout embedding")
        # Compare with all master embeddings
        similarities = {}
        detected_masters = []
        print(f" Comparing against {len(self.master_embeddings)} master images...")
        for master_id, master_embedding in self.master_embeddings.items():
            similarity = self.compute_cosine_similarity(layout_embedding, master_embedding)
            similarities[master_id] = similarity
            if similarity >= self.similarity_threshold:
                detected_masters.append(master_id)
        # Sort detected masters by similarity (highest first)
        detected_masters.sort(key=lambda x: similarities[x], reverse=True)
        # Create analysis text
        top_similarities = sorted(similarities.items(), key=lambda x: x[1], reverse=True)[:5]
        analysis_parts = [
            "Whole image vector analysis using Google Vertex AI embeddings (1408 dimensions).",
            f"Similarity threshold: {self.similarity_threshold}",
            f"Found {len(detected_masters)} matches above threshold.",
            "Top 5 similarities: " + ", ".join([f"{mid}({sim:.3f})" for mid, sim in top_similarities])
        ]
        analysis = " ".join(analysis_parts)
        print(f"✓ Completed {layout_name} - Found {len(detected_masters)} matches")
        if detected_masters:
            print(f" Matches: {', '.join(detected_masters)}")
        return {
            'detected_masters': detected_masters,
            'detected_master_ids': detected_masters,
            'detected_master_filenames': [f"{mid}.jpg" for mid in detected_masters],
            'analysis': analysis,
            'similarities': dict(top_similarities),
            'processing_mode': 'vector_embedding_whole',
            'similarity_threshold': self.similarity_threshold,
            'embedding_dimensions': 1408
        }
    def detect_with_splitting(self, layout_path: str, layout_name: str) -> Dict:
        """Detect masters using image splitting and crop comparison"""
        print(f" Processing with grid splitting: {layout_name}")
        # Step 1: Split the image into crops
        crops = self.split_image_by_grid(layout_path)
        if not crops:
            print(" No valid crops found, falling back to whole image")
            return self.detect_whole_image(layout_path, layout_name)
        # Step 2: Save crops to the debug directory
        self.save_crop_debug_images(layout_path, crops)
        # Step 3: Process each crop
        all_crop_results = []
        crop_similarities = {}
        for i, crop in enumerate(crops):
            print(f" Processing crop {i+1}/{len(crops)} ({crop['crop_id']})")
            # Generate embedding for this crop
            crop_embedding = self.generate_crop_embedding(layout_path, crop)
            if crop_embedding is None:
                continue
            # Compare crop against all masters
            crop_result = {
                'crop_id': crop['crop_id'],
                'crop_info': crop,
                'similarities': {},
                'matches': []
            }
            for master_id, master_embedding in self.master_embeddings.items():
                similarity = self.compute_cosine_similarity(crop_embedding, master_embedding)
                crop_result['similarities'][master_id] = similarity
                if similarity >= self.similarity_threshold:
                    crop_result['matches'].append(master_id)
            # Sort matches by similarity
            crop_result['matches'].sort(key=lambda x: crop_result['similarities'][x], reverse=True)
            all_crop_results.append(crop_result)
            # Track the best similarity per master across all crops for global analysis
            for master_id, sim in crop_result['similarities'].items():
                if master_id not in crop_similarities or sim > crop_similarities[master_id]:
                    crop_similarities[master_id] = sim
        # Step 4: Aggregate results across all crops
        detected_masters = []
        final_similarities = {}
        # Collect all unique matches with their best similarity scores
        for crop_result in all_crop_results:
            for match in crop_result['matches']:
                if match not in detected_masters:
                    detected_masters.append(match)
                    final_similarities[match] = crop_result['similarities'][match]
                else:
                    # Update with higher similarity if found
                    if crop_result['similarities'][match] > final_similarities[match]:
                        final_similarities[match] = crop_result['similarities'][match]
        # Sort by best similarity
        detected_masters.sort(key=lambda x: final_similarities.get(x, 0), reverse=True)
        # Get top overall similarities for analysis
        top_similarities = sorted(crop_similarities.items(), key=lambda x: x[1], reverse=True)[:5]
        # Create analysis
        analysis_parts = [
            "Grid-based splitting analysis using Google Vertex AI embeddings (1408 dimensions).",
            f"Split into {len(crops)} crops, processed {len(all_crop_results)} successfully.",
            f"Similarity threshold: {self.similarity_threshold}",
            f"Found {len(detected_masters)} unique matches across all crops.",
            "Top 5 similarities: " + ", ".join([f"{mid}({sim:.3f})" for mid, sim in top_similarities])
        ]
        analysis = " ".join(analysis_parts)
        print(f"✓ Completed {layout_name} - Found {len(detected_masters)} matches across {len(crops)} crops")
        if detected_masters:
            print(f" Matches: {', '.join(detected_masters)}")
        return {
            'detected_masters': detected_masters,
            'detected_master_ids': detected_masters,
            'detected_master_filenames': [f"{mid}.jpg" for mid in detected_masters],
            'analysis': analysis,
            'similarities': dict(top_similarities),
            'processing_mode': 'vector_embedding_grid',
            'similarity_threshold': self.similarity_threshold,
            'embedding_dimensions': 1408,
            'crops_processed': len(all_crop_results),
            'total_crops': len(crops),
            'crop_results': all_crop_results  # Detailed crop-by-crop results
        }
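# Sketch only, not part of this class: the Step 4 aggregation above keeps, for
# every master that matched in at least one crop, the best similarity seen in
# any crop. The helper name best_similarity_per_master and the sample scores
# are hypothetical and only illustrate that rule in isolation.

```python
from typing import Dict, List


def best_similarity_per_master(
    crop_similarities: List[Dict[str, float]], threshold: float
) -> Dict[str, float]:
    """Return matched master IDs mapped to their best score, highest first."""
    best: Dict[str, float] = {}
    for sims in crop_similarities:
        for master_id, sim in sims.items():
            # Only scores at or above the threshold count as matches
            if sim >= threshold and sim > best.get(master_id, float("-inf")):
                best[master_id] = sim
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))


# Two crops: m1 matches in both, m3 only in the second, m2 never
crops = [
    {"m1": 0.91, "m2": 0.40},
    {"m1": 0.95, "m3": 0.86},
]
best = best_similarity_per_master(crops, threshold=0.85)
```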
    def process_all_layouts_vector(self, limit: Optional[int] = None, specific_file: Optional[str] = None) -> Dict:
        """Process all layout images using vector embeddings"""
        print("Starting vector-based batch processing...")
        # Load master images
        self.load_master_images()
        # Generate master embeddings (with caching)
        self.generate_master_embeddings()
        if not self.master_embeddings:
            raise RuntimeError("No master embeddings available")
        # Get layout files
        if specific_file:
            # Process only the specific file
            layout_files = [self.layouts_path / specific_file]
            if not layout_files[0].exists():
                raise FileNotFoundError(f"Layout file {specific_file} not found in {self.layouts_path}")
            print(f"Processing specific file: {specific_file}")
        else:
            layout_files = list(self.layouts_path.glob("*.jpg"))
            layout_files.sort()  # Ensure consistent alphabetical ordering
            print(f"Found {len(layout_files)} layout files")
            if layout_files:
                print(f"First file will be: {layout_files[0].name}")
            if limit:
                layout_files = layout_files[:limit]
                print(f"Processing first {limit} layouts only")
        total_layouts = len(layout_files)
        print(f"Processing {total_layouts} layout images using vector embeddings")
        print("=" * 60)
        results = {}
        start_time = time.time()
        for i, layout_path in enumerate(layout_files, 1):
            layout_id = layout_path.stem
            # Detect images using vector similarity
            result = self.detect_masters_in_layout_vector(str(layout_path), i, total_layouts)
            layout_result = {
                'layout_filename': layout_path.name,
                'detected_master_ids': result['detected_master_ids'],
                'detected_master_filenames': result['detected_master_filenames'],
                'analysis': result['analysis'],
                'processing_mode': 'vector_embedding',
                'similarity_threshold': self.similarity_threshold,
                'embedding_dimensions': 1408
            }
            if 'similarities' in result:
                layout_result['similarities'] = result['similarities']
            if 'error' in result:
                layout_result['error'] = result['error']
            results[layout_id] = layout_result
            # Progress update
            elapsed = time.time() - start_time
            avg_time = elapsed / i
            remaining = (total_layouts - i) * avg_time
            print(f"Progress: {i}/{total_layouts} ({i/total_layouts*100:.1f}%) - Est. remaining: {remaining/60:.1f} min")
            # Save progress periodically
            if i % 20 == 0:
                self.save_results(results, f"vector_progress_{i}")
        total_time = time.time() - start_time
        print(f"\n✓ Completed vector processing of {total_layouts} layouts in {total_time/60:.1f} minutes")
        # Avoid dividing by zero when no layout files were found
        if total_layouts:
            print(f"Average time per layout: {total_time/total_layouts:.1f} seconds")
        return results
    def save_results(self, results: Dict, filename: str = "vector_detection_results") -> str:
        """Save results to JSON file"""
        output_path = self.results_path / f"{filename}.json"
        # Add metadata
        output_data = {
            'metadata': {
                'total_layouts_processed': len(results),
                'total_master_images': len(self.master_images),
                # NOTE: relies on self.master_files being set elsewhere in the class;
                # the rest of this section only uses self.master_images
                'master_images_available': list(self.master_files.keys()),
                'processing_mode': 'vector_embedding',
                'similarity_threshold': self.similarity_threshold,
                'embedding_dimensions': 1408,
                'embedding_model': 'Google Vertex AI multimodalembedding@001'
            },
            'results': results
        }
        with open(output_path, 'w') as f:
            json.dump(output_data, f, indent=2)
        print(f"Results saved to: {output_path}")
        return str(output_path)
    def generate_summary(self, results: Dict) -> Dict:
        """Generate summary statistics for vector detection"""
        total_layouts = len(results)
        layouts_with_matches = sum(1 for r in results.values() if r['detected_master_ids'])
        # Count master image occurrences
        master_counts = {}
        for result in results.values():
            for master_id in result['detected_master_ids']:
                master_counts[master_id] = master_counts.get(master_id, 0) + 1
        summary = {
            'total_layouts_processed': total_layouts,
            'layouts_with_matches': layouts_with_matches,
            'layouts_without_matches': total_layouts - layouts_with_matches,
            'master_image_usage': master_counts,
            'most_used_masters': sorted(master_counts.items(), key=lambda x: x[1], reverse=True)[:10],
            'processing_mode': 'vector_embedding',
            'similarity_threshold': self.similarity_threshold,
            'embedding_dimensions': 1408
        }
        return summary