master_adapt_detect/master_adapt_detector_diagram.md
2025-10-01 14:32:55 -05:00

Master Adapt Detector Architecture Diagram

This diagram illustrates the architecture and flow of the Master Adapt Detector application, which uses multiple AI models and computer vision techniques to detect master images within layout images.

High-Level Architecture

```mermaid
graph TB
    subgraph "Entry Point"
        CLI[cli.py - Command Line Interface]
    end

    subgraph "Core Detection Engines"
        GD[Gemini Detector<br/>gemini_detector.py]
        OD[OpenAI Detector<br/>openai_detector.py]
        VD[Vector Detector<br/>vector_detector.py]
        HD[Hybrid Detector<br/>hybrid_detector.py]
    end

    subgraph "Panel Splitting System"
        PS[Panel Splitter<br/>panel_splitter.py]
        AS[Advanced Splitter<br/>advanced_splitter.py]
        SS[Simple Splitter<br/>simple_splitter.py]
    end

    subgraph "Support Systems"
        MM[Memory Manager<br/>memory_manager.py]
        LC[Logging Config<br/>logging_config.py]
        PD[Process Detection<br/>process_detection.py]
    end

    subgraph "AI APIs"
        GEMINI[Google Gemini 2.5 Pro]
        OPENAI[OpenAI o3]
        VERTEX[Google Vertex AI<br/>Vector Embeddings]
    end

    subgraph "Computer Vision"
        OPENCV[OpenCV<br/>Feature Detection]
        AKAZE[AKAZE Features]
        RANSAC[RANSAC Homography]
    end

    subgraph "Data Storage"
        MI["Master Images<br/>master_images/"]
        LI["Layout Images<br/>layouts/"]
        RES["Results<br/>results/"]
        EMB["Embeddings Cache<br/>embeddings_cache/"]
    end

    CLI --> GD
    CLI --> OD
    CLI --> VD
    CLI --> HD

    HD --> OD
    HD --> VD
    HD --> PS
    HD --> AS
    HD --> SS

    GD --> GEMINI
    OD --> OPENAI
    VD --> VERTEX

    HD --> OPENCV
    HD --> AKAZE
    HD --> RANSAC

    PS --> OPENCV
    AS --> OPENCV
    SS --> OPENCV

    GD --> MM
    OD --> MM
    VD --> MM
    HD --> MM

    CLI --> LC
    PD --> LC

    GD --> MI
    OD --> MI
    VD --> MI
    HD --> MI

    GD --> LI
    OD --> LI
    VD --> LI
    HD --> LI

    GD --> RES
    OD --> RES
    VD --> RES
    HD --> RES

    VD --> EMB
    HD --> EMB
```

Detailed Application Flow

```mermaid
flowchart TD
    START([Application Start]) --> PARSE[Parse CLI Arguments]
    PARSE --> MODE{Select Mode}

    MODE -->|"--hybrid"| HYBRID[Hybrid Mode]
    MODE -->|"--openai"| OPENAI_MODE[OpenAI Mode]
    MODE -->|"--vector-mode"| VECTOR_MODE[Vector Mode]
    MODE -->|"default"| GEMINI_MODE[Gemini Mode]

    subgraph "Hybrid Mode Processing"
        HYBRID --> LOAD_MASTERS[Load Master Images]
        LOAD_MASTERS --> INIT_EMBED{Vector Mode?}
        INIT_EMBED -->|Yes| GEN_EMBED[Generate Master Embeddings]
        INIT_EMBED -->|No| INIT_CV[Initialize OpenCV Components]
        GEN_EMBED --> PROCESS_LAYOUT[Process Layout]
        INIT_CV --> PROCESS_LAYOUT

        PROCESS_LAYOUT --> COUNT_PANELS[Count Panels with OpenAI o3]
        COUNT_PANELS --> DETECT_CENSOR[Detect Censorship with OpenAI o3]
        DETECT_CENSOR --> PANEL_CHECK{Panel Count ≤ Threshold?}

        PANEL_CHECK -->|Yes| LOCAL_ANALYSIS[Local Analysis]
        PANEL_CHECK -->|No| SPLIT_ANALYSIS[Split + Analysis]

        LOCAL_ANALYSIS --> VECTOR_CHECK{Vector Mode?}
        VECTOR_CHECK -->|Yes| VECTOR_SIM[Vector Similarity]
        VECTOR_CHECK -->|No| INLIER_ANALYSIS[Inlier Analysis]

        SPLIT_ANALYSIS --> SPLIT_PANELS[Split Panels]
        SPLIT_PANELS --> SPLIT_VECTOR_CHECK{Vector Mode?}
        SPLIT_VECTOR_CHECK -->|Yes| SPLIT_VECTOR[Split + Vector Similarity]
        SPLIT_VECTOR_CHECK -->|No| SPLIT_INLIER[Split + Inlier Analysis]

        VECTOR_SIM --> APPLY_REFINEMENT
        INLIER_ANALYSIS --> APPLY_REFINEMENT
        SPLIT_VECTOR --> APPLY_REFINEMENT
        SPLIT_INLIER --> APPLY_REFINEMENT

        APPLY_REFINEMENT[Apply CEN Refinement] --> DEDUP[Deduplication]
        DEDUP --> TRUNCATE[Truncate to Panel Count]
        TRUNCATE --> FALLBACK_CHECK{Fallback Enabled?}

        FALLBACK_CHECK -->|"Yes, and needed"| FALLBACK[OpenAI One-at-a-Time Fallback]
        FALLBACK_CHECK -->|"No, or not needed"| SAVE_RESULTS
        FALLBACK --> SAVE_RESULTS[Save Results]
    end

    subgraph "OpenAI Mode Processing"
        OPENAI_MODE --> LOAD_MASTERS_O[Load Master Images]
        LOAD_MASTERS_O --> ONE_AT_TIME{One-at-a-Time?}
        ONE_AT_TIME -->|Yes| PARALLEL_MASTERS[Parallel Master Processing]
        ONE_AT_TIME -->|No| BATCH_PROCESS[Batch Processing]

        PARALLEL_MASTERS --> PANEL_AWARE{Panel-Aware Refinement?}
        PANEL_AWARE -->|Yes| COUNT_PANELS_O[Count Panels] --> INLIER_REFINE[Inlier Refinement]
        PANEL_AWARE -->|No| APPLY_CEN_O[Apply CEN Refinement]

        INLIER_REFINE --> APPLY_CEN_O
        BATCH_PROCESS --> APPLY_CEN_O
        APPLY_CEN_O --> SAVE_RESULTS_O[Save Results]
    end

    subgraph "Vector Mode Processing"
        VECTOR_MODE --> LOAD_MASTERS_V[Load Master Images]
        LOAD_MASTERS_V --> GEN_EMBED_V[Generate Master Embeddings]
        GEN_EMBED_V --> SPLITTING_CHECK{Splitting Enabled?}
        SPLITTING_CHECK -->|Yes| SPLIT_LAYOUT[Split Layout]
        SPLITTING_CHECK -->|No| COMPARE_EMBED[Compare Embeddings]

        SPLIT_LAYOUT --> COMPARE_SPLITS[Compare Split Embeddings]
        COMPARE_SPLITS --> SAVE_RESULTS_V[Save Results]
        COMPARE_EMBED --> SAVE_RESULTS_V
    end

    subgraph "Gemini Mode Processing"
        GEMINI_MODE --> LOAD_MASTERS_G[Load Master Images]
        LOAD_MASTERS_G --> GEMINI_ONE_AT_TIME{One-at-a-Time?}
        GEMINI_ONE_AT_TIME -->|Yes| PARALLEL_MASTERS_G[Parallel Master Processing]
        GEMINI_ONE_AT_TIME -->|No| BATCH_PROCESS_G[Batch Processing]

        PARALLEL_MASTERS_G --> APPLY_CEN_G[Apply CEN Refinement]
        BATCH_PROCESS_G --> APPLY_CEN_G
        APPLY_CEN_G --> SAVE_RESULTS_G[Save Results]
    end

    SAVE_RESULTS --> END([End])
    SAVE_RESULTS_O --> END
    SAVE_RESULTS_V --> END
    SAVE_RESULTS_G --> END
```
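The mode selection at the top of the flowchart can be sketched with `argparse`. Only the three flags shown in the diagram are modelled. The helper names `build_parser` and `select_mode`, and the precedence order (hybrid, then OpenAI, then vector, else Gemini), are assumptions read off the flowchart rather than taken from cli.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the mode flags shown in the flowchart."""
    parser = argparse.ArgumentParser(description="Master Adapt Detector (sketch)")
    parser.add_argument("--hybrid", action="store_true", help="OpenAI panel counting + local analysis")
    parser.add_argument("--openai", action="store_true", help="pure OpenAI o3 matching")
    parser.add_argument("--vector-mode", action="store_true", help="Vertex AI embedding similarity")
    return parser


def select_mode(argv: list[str]) -> str:
    """Return the processing mode; Gemini is the default when no mode flag is given."""
    args = build_parser().parse_args(argv)
    # Assumed precedence: the first matching flag wins.
    if args.hybrid:
        return "hybrid"
    if args.openai:
        return "openai"
    if args.vector_mode:  # argparse maps --vector-mode to the attribute vector_mode
        return "vector"
    return "gemini"


if __name__ == "__main__":
    print(select_mode(["--vector-mode"]))  # prints "vector"
```

Checking `--hybrid` first means `--hybrid --vector-mode` still selects hybrid mode, which fits the flowchart, where the hybrid branch asks "Vector Mode?" internally.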

Panel Splitting Architecture

```mermaid
graph TB
    subgraph "Panel Splitting System"
        INPUT[Layout Image] --> DETECTOR{Splitter Type}

        DETECTOR -->|Basic| PANEL_SPLITTER[PanelSplitter]
        DETECTOR -->|Advanced| ADVANCED_SPLITTER[AdvancedPanelSplitter]
        DETECTOR -->|Simple| SIMPLE_SPLITTER[SimplePanelSplitter]

        subgraph "PanelSplitter Methods"
            PANEL_SPLITTER --> EDGE_DETECT[Edge Detection]
            PANEL_SPLITTER --> CONTOUR_FIND[Contour Finding]
            PANEL_SPLITTER --> HIST_ANALYSIS[Histogram Analysis]
            PANEL_SPLITTER --> KMEANS[K-Means Clustering]
        end

        subgraph "AdvancedPanelSplitter Methods"
            ADVANCED_SPLITTER --> SOBEL[Sobel Edge Detection]
            ADVANCED_SPLITTER --> GUTTER_DETECT[Gutter Detection]
            ADVANCED_SPLITTER --> ENERGY_ANALYSIS[Energy Analysis]
            ADVANCED_SPLITTER --> PERCENTILE_THRESH[Percentile Thresholding]
        end

        subgraph "SimplePanelSplitter Methods"
            SIMPLE_SPLITTER --> EVEN_SPLIT[Even Division]
            SIMPLE_SPLITTER --> PANEL_COUNT[Use Panel Count]
        end

        EDGE_DETECT --> SPLIT_RESULTS[Split Results]
        CONTOUR_FIND --> SPLIT_RESULTS
        HIST_ANALYSIS --> SPLIT_RESULTS
        KMEANS --> SPLIT_RESULTS

        SOBEL --> SPLIT_RESULTS
        GUTTER_DETECT --> SPLIT_RESULTS
        ENERGY_ANALYSIS --> SPLIT_RESULTS
        PERCENTILE_THRESH --> SPLIT_RESULTS

        EVEN_SPLIT --> SPLIT_RESULTS
        PANEL_COUNT --> SPLIT_RESULTS
    end

    SPLIT_RESULTS --> INDIVIDUAL_PANELS[Individual Panel Images]
    INDIVIDUAL_PANELS --> MATCH_PROCESS[Match Each Panel to Masters]
```

Memory Management and Multiprocessing

```mermaid
graph TB
    subgraph "Memory Management System"
        MEMORY_MANAGER[Memory Manager] --> MONITOR[Monitor Usage]
        MONITOR --> THRESH_CHECK{"Usage > Threshold?"}
        THRESH_CHECK -->|Yes| THROTTLE[Throttle Processes]
        THRESH_CHECK -->|No| CONTINUE[Continue Processing]

        THROTTLE --> WAIT[Wait for Memory]
        WAIT --> REDUCE_WORKERS[Reduce Worker Count]
        REDUCE_WORKERS --> CONTINUE

        CONTINUE --> PROCESS_POOL[Process Pool Executor]
        PROCESS_POOL --> WORKER1[Worker Process 1]
        PROCESS_POOL --> WORKER2[Worker Process 2]
        PROCESS_POOL --> WORKERN[Worker Process N]

        subgraph "Worker Process"
            WORKER1 --> ISOLATED_ENV[Isolated Environment]
            ISOLATED_ENV --> LOAD_MODELS[Load Models]
            LOAD_MODELS --> PROCESS_TASK[Process Task]
            PROCESS_TASK --> CLEANUP[Cleanup]
        end

        WORKER2 --> ISOLATED_ENV
        WORKERN --> ISOLATED_ENV
    end

    subgraph "Feature Limiting"
        PROCESS_TASK --> FEATURE_COUNT[Count Features]
        FEATURE_COUNT --> FEATURE_CHECK{"Features > Limit?"}
        FEATURE_CHECK -->|Yes| LIMIT_FEATURES[Limit Features]
        FEATURE_CHECK -->|No| PROCEED[Proceed]
        LIMIT_FEATURES --> PROCEED
    end
```
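The feature-limiting step can be sketched as keeping only the strongest keypoints, ranked by detector response, before matching. This caps both descriptor memory and matching time. The function name, the `FakeKeypoint` stand-in, and ranking by `response` are assumptions; the project's actual limit and selection rule are not documented here.

```python
from dataclasses import dataclass


@dataclass
class FakeKeypoint:
    """Stand-in for cv2.KeyPoint in this sketch; only .response is read."""
    response: float


def limit_features(keypoints, descriptors, max_features: int):
    """Keep the max_features keypoints with the strongest detector response.

    Works with cv2.KeyPoint sequences (KeyPoint exposes .response) and either a
    NumPy descriptor matrix or a plain list. descriptors[i] must describe
    keypoints[i]; that pairing is preserved.
    """
    if len(keypoints) <= max_features:
        return keypoints, descriptors  # under the limit: nothing to drop
    order = sorted(range(len(keypoints)), key=lambda i: keypoints[i].response, reverse=True)
    keep = order[:max_features]
    kept_kps = [keypoints[i] for i in keep]
    if hasattr(descriptors, "shape"):  # NumPy array: fancy indexing keeps a 2-D matrix
        kept_desc = descriptors[keep]
    else:
        kept_desc = [descriptors[i] for i in keep]
    return kept_kps, kept_desc
```

Filtering after detection keeps the rule independent of detector-specific options, so the same helper applies to any OpenCV feature detector.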

Data Flow and Storage

```mermaid
graph LR
    subgraph "Input Data"
        MI[Master Images<br/>41 images]
        LI[Layout Images<br/>299+ images]
    end

    subgraph "Processing Cache"
        TEMP[Temp Processed Images]
        EMB_CACHE[Embeddings Cache]
        SPLITS[Split Panel Images]
    end

    subgraph "Output Data"
        JSON[JSON Results]
        LOGS[Log Files]
        DEBUG[Debug Images]
        CROPS[Crop Images]
    end

    MI --> TEMP
    LI --> TEMP

    TEMP --> EMB_CACHE
    TEMP --> SPLITS

    EMB_CACHE --> JSON
    SPLITS --> JSON

    JSON --> LOGS
    JSON --> DEBUG
    JSON --> CROPS

    subgraph "Result Structure"
        JSON --> METADATA[Metadata]
        JSON --> LAYOUT_RESULTS[Layout Results]

        METADATA --> TOTAL_LAYOUTS[Total Layouts]
        METADATA --> MASTER_COUNT[Master Count]
        METADATA --> PROVIDER[Provider Info]
        METADATA --> PROCESSING_MODE[Processing Mode]

        LAYOUT_RESULTS --> DETECTED_MASTERS[Detected Masters]
        LAYOUT_RESULTS --> ANALYSIS[Analysis Text]
        LAYOUT_RESULTS --> CONFIDENCE[Confidence Score]
        LAYOUT_RESULTS --> PANEL_INFO[Panel Information]
    end
```
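A minimal sketch of a results document with the shape shown under "Result Structure". All key names and example values are hypothetical placeholders, not the actual schema written to results/.

```python
import json


def build_results(provider: str, mode: str, master_count: int, layout_results: list[dict]) -> dict:
    """Assemble a results document shaped like the "Result Structure" diagram.

    Every key name here is hypothetical; the real files in results/ may differ.
    """
    return {
        "metadata": {
            "total_layouts": len(layout_results),
            "master_count": master_count,
            "provider": provider,
            "processing_mode": mode,
        },
        "layout_results": layout_results,
    }


# Hypothetical entry for one layout; file names and numbers are placeholders.
example_layout = {
    "layout": "layout_001.jpg",
    "detected_masters": ["master_07.jpg", "master_12.jpg"],
    "analysis": "Two panels, each matching a known master.",
    "confidence": 0.92,
    "panel_info": {"panel_count": 2, "censored": False},
}

print(json.dumps(build_results("openai", "hybrid", 41, [example_layout]), indent=2))
```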

Key Components and Their Roles

1. Command-Line Interface (cli.py)

  • Purpose: Command-line interface for the application
  • Features: Argument parsing, mode selection, batch processing options
  • Modes: Gemini, OpenAI, Vector, Hybrid
  • Options: Test mode, batch processing, custom outputs, splitting options

2. Detection Engines

Hybrid Detector (hybrid_detector.py)

  • Purpose: Cost-efficient detection combining OpenAI panel counting with local analysis
  • Features:
    • Panel threshold-based routing
    • Vector similarity or inlier analysis
    • Automatic fallback to OpenAI one-at-a-time
    • CEN refinement and deduplication
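  • Workflow: Panel count and censorship check (OpenAI o3) → route to local or split analysis → CEN refinement, deduplication, and truncation to the panel count → optional fallback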
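The routing, deduplication, and truncation steps can be sketched as follows, assuming each analysis back end returns `(master_name, score)` pairs where a higher score is a better match. `hybrid_detect`, the two callables, and the "keep the top N by score" truncation rule are illustrative assumptions. CEN refinement is left out because its details are not described here.

```python
from typing import Callable

Candidate = tuple[str, float]  # (master name, score); higher score = better match


def hybrid_detect(panel_count: int,
                  local_analysis: Callable[[], list[Candidate]],
                  split_analysis: Callable[[int], list[Candidate]],
                  threshold: int = 2) -> list[str]:
    """Route by panel count, deduplicate, then truncate to the panel count.

    local_analysis and split_analysis are hypothetical callables standing in for
    the inlier or vector back ends.
    """
    if panel_count <= threshold:
        candidates = local_analysis()
    else:
        candidates = split_analysis(panel_count)

    # Deduplication: keep each master once, at its best score.
    best: dict[str, float] = {}
    for name, score in candidates:
        if score > best.get(name, float("-inf")):
            best[name] = score

    # Truncation: a layout with N panels is assumed to contain at most N masters.
    ranked = sorted(best, key=best.__getitem__, reverse=True)
    return ranked[:panel_count]
```

In the flowchart, a result with too few matches is what would trigger the optional OpenAI one-at-a-time fallback.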

OpenAI Detector (openai_detector.py)

  • Purpose: Uses OpenAI o3 model for image matching
  • Features:
    • One-at-a-time processing with multiprocessing
    • Panel-aware refinement
    • Image preprocessing (greyscale, contrast)
  • API: OpenAI o3 vision model
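The source lists greyscale and contrast preprocessing without further detail. One plausible version, assumed here, is a BT.601 luma conversion followed by a min-max contrast stretch in NumPy. The function name `preprocess` is hypothetical, and the detector may use a different contrast method.

```python
import numpy as np


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Greyscale conversion followed by a linear min-max contrast stretch.

    rgb is an H x W x 3 (or x 4) uint8 array; an alpha channel, if present, is
    ignored. Luma weights follow ITU-R BT.601.
    """
    grey = rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    lo, hi = grey.min(), grey.max()
    if hi == lo:  # flat image: nothing to stretch
        return np.round(grey).astype(np.uint8)
    stretched = (grey - lo) * (255.0 / (hi - lo))  # map [lo, hi] onto [0, 255]
    return np.clip(np.round(stretched), 0, 255).astype(np.uint8)
```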

Vector Detector (vector_detector.py)

  • Purpose: Uses Google Vertex AI embeddings for similarity matching
  • Features:
    • 1408-dimensional embeddings
    • Cosine similarity matching
    • Embedding caching
  • API: Google Vertex AI Multimodal Embeddings
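Embedding caching can be sketched as a small disk cache keyed by a hash of the image bytes, so each image is embedded at most once. `EmbeddingCache` and the injected `embed_fn` (a placeholder for the actual Vertex AI call, which is not shown) are hypothetical, and the real embeddings_cache/ layout may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class EmbeddingCache:
    """Disk cache keyed by the SHA-256 of the image bytes (a sketch).

    embed_fn is a placeholder for the real embedding call: it takes image bytes
    and returns a list of floats (1408 values for the multimodal model).
    """

    def __init__(self, cache_dir: Path, embed_fn: Callable[[bytes], list[float]]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.embed_fn = embed_fn

    def get(self, image_path: Path) -> list[float]:
        data = Path(image_path).read_bytes()
        key = hashlib.sha256(data).hexdigest()
        entry = self.cache_dir / f"{key}.json"
        if entry.exists():  # cache hit: no API call
            return json.loads(entry.read_text())
        vector = self.embed_fn(data)  # cache miss: one API call
        entry.write_text(json.dumps(vector))
        return vector
```

Keying on content rather than file name means a renamed file is still a cache hit, while an edited image gets a fresh embedding.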

Gemini Detector (gemini_detector.py)

  • Purpose: Uses Google Gemini 2.5 Pro for image analysis
  • Features:
    • Batch processing
    • Safety settings handling
    • Image preprocessing
  • API: Google Gemini 2.5 Pro

3. Panel Splitting System

Panel Splitter (panel_splitter.py)

  • Purpose: Basic multi-method panel splitting
  • Methods: Edge detection, contour finding, histogram analysis, K-means clustering

Advanced Splitter (advanced_splitter.py)

  • Purpose: Advanced edge detection and gutter analysis
  • Methods: Sobel edge detection, gutter detection, energy analysis, percentile thresholding
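Gutter detection with energy analysis and percentile thresholding can be sketched on a greyscale NumPy array: columns whose summed gradient energy falls at or below a low percentile are candidate gutters. This is an assumed reconstruction, not the AdvancedPanelSplitter code. The sketch uses `np.gradient` central differences as a stand-in for the Sobel operator, and its defaults are guesses.

```python
import numpy as np


def find_vertical_gutters(grey: np.ndarray, percentile: float = 10.0, min_width: int = 3) -> list[int]:
    """Return x-coordinates at the centre of low-energy vertical bands.

    Column energy is the summed absolute gradient. Columns at or below the given
    percentile count as "low", and runs of at least min_width low columns that
    do not touch the image border are treated as gutters.
    """
    img = grey.astype(np.float64)
    gy, gx = np.gradient(img)  # central differences standing in for Sobel kernels
    energy = (np.abs(gx) + np.abs(gy)).sum(axis=0)  # one value per column
    low = energy <= np.percentile(energy, percentile)

    gutters, start = [], None
    for x, is_low in enumerate(low):
        if is_low and start is None:
            start = x
        elif not is_low and start is not None:
            if x - start >= min_width and start > 0:  # skip the left margin
                gutters.append((start + x - 1) // 2)
            start = None
    # A run still open at the right edge is a margin, not a gutter.
    return gutters
```

Passing `grey.T` finds horizontal gutters the same way.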

Simple Splitter (simple_splitter.py)

  • Purpose: Simple even division based on panel count
  • Methods: Even division, panel count-based splitting
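Even division based on panel count can be sketched as equal strips along the layout's longer side. The choice of axis and the `(left, top, right, bottom)` box format are assumptions of this sketch.

```python
def even_split(width: int, height: int, panel_count: int) -> list[tuple[int, int, int, int]]:
    """Split a layout into panel_count equal strips along its longer side.

    Returns (left, top, right, bottom) boxes. Splitting along the longer side is
    an assumption; integer arithmetic makes the last strip end exactly at the edge.
    """
    if panel_count < 1:
        raise ValueError("panel_count must be >= 1")
    horizontal = width >= height  # wide layouts -> side-by-side panels
    length = width if horizontal else height
    boxes = []
    for i in range(panel_count):
        start = i * length // panel_count
        end = (i + 1) * length // panel_count
        boxes.append((start, 0, end, height) if horizontal else (0, start, width, end))
    return boxes
```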

4. Support Systems

Memory Manager (memory_manager.py)

  • Purpose: Prevents memory exhaustion during processing
  • Features: Memory monitoring, worker throttling, safe execution decorators
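Worker throttling and a safe-execution decorator can be sketched as follows. The function names, the 20% headroom, and catching only `MemoryError` are assumptions. In practice, the available-memory figure would come from a library such as psutil (`psutil.virtual_memory().available`).

```python
import functools
import logging

logger = logging.getLogger(__name__)


def safe_worker_count(available_bytes: int, per_worker_bytes: int, max_workers: int,
                      reserve_fraction: float = 0.2) -> int:
    """Reduce the worker count so the pool fits in available memory.

    reserve_fraction keeps headroom for the parent process; the numbers are
    illustrative, not the project's real limits. At least one worker is kept.
    """
    usable = available_bytes * (1.0 - reserve_fraction)
    fits = int(usable // per_worker_bytes)
    return max(1, min(max_workers, fits))


def memory_safe(default=None):
    """Decorator: log a MemoryError and return `default` instead of crashing the pool."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except MemoryError:
                logger.exception("Out of memory in %s", func.__name__)
                return default
        return wrapper
    return decorator
```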

Logging Config (logging_config.py)

  • Purpose: Dual logging to terminal and file
  • Features: System info logging, exception tracking, memory usage logging
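Dual logging can be sketched with the standard `logging` module: one logger with a stdout handler and a file handler sharing a formatter, a start-up system-info line, and `logger.exception` for tracebacks. The logger name and format string are assumptions.

```python
import logging
import platform
import sys


def setup_logging(log_file: str, level: int = logging.INFO) -> logging.Logger:
    """Send the same log records to the terminal and to a file (a sketch)."""
    logger = logging.getLogger("master_adapt_detect")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s [%(processName)s] %(message)s")
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    # System info at start-up, as the component description mentions.
    logger.info("Python %s on %s", platform.python_version(), platform.platform())
    return logger
```

Calling `logger.exception(...)` inside an `except` block writes the message plus the full traceback to both destinations.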

Process Detection (process_detection.py)

  • Purpose: Standalone functions for multiprocessing
  • Features: Process isolation, error handling, resource cleanup
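Standalone functions matter because `ProcessPoolExecutor` pickles the submitted callable by reference, so workers must be module-level functions. The following sketch shows that pattern for one-at-a-time processing. `compare_layout_to_master` is a hypothetical placeholder with stub matching logic, and `executor_cls` exists only so the same code can run on threads for testing.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from pathlib import PurePath


def compare_layout_to_master(layout_path: str, master_path: str) -> dict:
    """Module-level worker, so ProcessPoolExecutor can pickle it by name.

    Placeholder logic: reports a match when the master's file stem occurs in the
    layout's file name. A real worker would call the detector here instead.
    """
    try:
        matched = PurePath(master_path).stem in PurePath(layout_path).name
        return {"master": master_path, "match": matched, "error": None}
    except Exception as exc:  # one failing master should not abort the whole layout
        return {"master": master_path, "match": False, "error": repr(exc)}


def one_at_a_time(layout_path: str, masters: list[str], max_workers: int = 4,
                  executor_cls: type[Executor] = ProcessPoolExecutor) -> list[str]:
    """Submit one comparison per master and collect the masters that matched."""
    matches = []
    # Leaving the with-block waits for the workers and shuts the pool down.
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(compare_layout_to_master, layout_path, m) for m in masters]
        for future in as_completed(futures):
            result = future.result()
            if result["match"]:
                matches.append(result["master"])
    return sorted(matches)  # as_completed yields in completion order
```

With `ProcessPoolExecutor`, call `one_at_a_time` from under an `if __name__ == "__main__":` guard on platforms that start workers by spawning (Windows, macOS), because each worker re-imports the main module.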

5. Key Algorithms

Inlier Analysis (OpenCV)

  • Purpose: Local feature matching using computer vision
  • Algorithm: AKAZE features → RANSAC homography → Inlier counting
  • Advantage: No API costs, fast processing

Vector Similarity (Vertex AI)

  • Purpose: Semantic similarity using embeddings
  • Algorithm: Image embeddings → Cosine similarity → Threshold matching
  • Advantage: Semantic understanding, good for transformed images
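Cosine-similarity threshold matching can be sketched in plain Python. The 0.85 default threshold and the function names are illustrative assumptions; the project's actual threshold is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_by_embedding(layout_vec: list[float], master_vecs: dict[str, list[float]],
                       threshold: float = 0.85) -> list[tuple[str, float]]:
    """Return (master, similarity) pairs at or above threshold, best first."""
    scored = [(name, cosine_similarity(layout_vec, vec)) for name, vec in master_vecs.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
```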

Panel Detection (OpenAI o3)

  • Purpose: Intelligent panel counting and censorship detection
  • Algorithm: Vision model analysis → Panel count + censorship status
  • Advantage: Accurate panel analysis, handles complex layouts
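For the hybrid router to use the panel count, the model's reply must be turned into structured values. The following sketch assumes the prompt asks o3 to answer with a JSON object such as `{"panel_count": 3, "censored": false}`. That schema, `PanelInfo`, and `parse_panel_reply` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class PanelInfo:
    panel_count: int
    censored: bool


def parse_panel_reply(text: str) -> PanelInfo:
    """Extract the first {...} span from a model reply and validate it.

    Assumes the hypothetical schema {"panel_count": int, "censored": bool}.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object in reply: {text!r}")
    data = json.loads(text[start:end + 1])
    count = int(data["panel_count"])
    if count < 1:
        raise ValueError(f"implausible panel count: {count}")
    return PanelInfo(panel_count=count, censored=bool(data.get("censored", False)))
```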

6. Processing Modes

Hybrid Mode

  • Strategy: OpenAI panel counting + local analysis for efficiency
  • Routing: ≤2 panels → local analysis, ≥3 panels → split + analysis
  • Fallback: OpenAI one-at-a-time if insufficient matches
  • Cost: ~1 API call per layout vs ~41 (one per master image) for pure OpenAI one-at-a-time processing
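The cost comparison above works out as follows for the dataset sizes in the data-flow diagram (41 masters, 299 layouts), ignoring fallback calls. The helper name is illustrative.

```python
def api_calls_per_run(layouts: int, masters: int) -> dict[str, int]:
    """Approximate call counts implied by the per-layout figures, ignoring fallbacks."""
    return {
        "hybrid": layouts,  # ~1 panel-count/censorship call per layout
        "openai_one_at_a_time": layouts * masters,  # one call per master per layout
    }


print(api_calls_per_run(299, 41))  # {'hybrid': 299, 'openai_one_at_a_time': 12259}
```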

OpenAI Mode

  • Strategy: Pure OpenAI o3 processing
  • Options: Batch or one-at-a-time with panel-aware refinement
  • Cost: High API usage but highest accuracy

Vector Mode

  • Strategy: Pure vector embedding similarity
  • Options: Splitting modes for multi-panel layouts
  • Cost: API calls are needed only to generate embeddings; cached embeddings are reused at no further cost

Gemini Mode

  • Strategy: Google Gemini 2.5 Pro processing
  • Options: Batch or one-at-a-time processing
  • Cost: Lower than OpenAI but higher than vector

This architecture provides a flexible system for master image detection, with multiple processing strategies that trade API cost against accuracy for different use cases.