master_adapt_detect/COST_TRACKING_README.md
2025-10-01 14:32:55 -05:00

8.6 KiB

Cost Tracking for Master Image Detection

This document describes the cost tracking features added to the master image detection application to monitor and report OpenAI API usage costs.

Overview

The cost tracking system provides comprehensive monitoring of OpenAI o3 API usage, including:

  • Real-time cost calculation for all API calls
  • Per-layout cost breakdown with detailed token usage
  • Session summaries with totals and averages
  • Monthly cost estimation based on usage patterns
  • Detailed cost reports in JSON format
  • Integration with all detection modes (Gemini, OpenAI, Vector, Hybrid)

Current OpenAI o3 Pricing

  • Input tokens: $2.00 per million tokens
  • Cached input tokens: $0.50 per million tokens
  • Output tokens: $8.00 per million tokens

CLI Usage

Enable Cost Tracking

Important: Cost tracking is disabled by default. You must use the --enable-cost-tracking flag to enable it.

# Enable cost tracking for any detection mode
python cli.py --test --hybrid --enable-cost-tracking

# Enable cost tracking with detailed report generation
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report

# Enable cost tracking with monthly cost estimation
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

What "tracking: disabled" means

If you see "Cost Calculator initialized (tracking: disabled)" in the logs, it means:

  1. Cost tracking is turned off - no costs are being calculated or stored
  2. You need to add the --enable-cost-tracking flag to enable cost monitoring
  3. API calls are still being made but their costs aren't being tracked

Why the repetitive initialization messages?

The cost calculator may be initialized multiple times due to:

  1. Multiprocessing workers - Each worker process loads the module
  2. Normal behavior - This doesn't affect functionality
  3. Only main process shows full details - Worker processes show minimal output

Cost Tracking Options

  • --enable-cost-tracking: Enable detailed cost tracking and real-time reporting
  • --cost-report: Generate detailed JSON cost report after processing
  • --cost-estimate N: Show monthly cost estimate based on N layouts per month

Example Usage

# Test hybrid mode with cost tracking enabled
python cli.py --test --hybrid --enable-cost-tracking

# Process 10 layouts with OpenAI and generate cost report
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report

# Full hybrid run with cost tracking and monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

Cost Tracking Features

1. Real-time Cost Monitoring

  • Tracks every OpenAI API call with token usage and cost
  • Displays running totals during processing
  • Shows cost per layout in progress updates

2. Detailed Cost Breakdown

Each processed layout includes cost information:

{
  "layout_filename": "example.jpg",
  "detected_master_ids": ["1011A_1011_05"],
  "cost_breakdown": {
    "total_cost": 0.0234,
    "cost_breakdown": {
      "input_tokens": 1500,
      "output_tokens": 800,
      "cached_tokens": 200,
      "api_calls_made": 1,
      "operation_types": ["panel_counting_censorship"]
    }
  }
}

3. Session Summary

Displays comprehensive cost statistics:

COST TRACKING SUMMARY
============================================================
Total cost: $2.4567
Total tokens: 145,678
  - Input tokens: 98,456
  - Output tokens: 47,222
  - Cached tokens: 12,345
Total API calls: 156
Layouts processed: 150

Averages:
  - Cost per layout: $0.0164
  - Tokens per layout: 971.2
  - API calls per layout: 1.0
  - Cost per 1K tokens: $0.0169

Operation breakdown:
  - panel_counting_censorship: 150 calls
  - detection: 0 calls
  - one_at_a_time_detection: 0 calls
============================================================

4. Monthly Cost Estimation

Estimates monthly costs based on current usage patterns:

MONTHLY COST ESTIMATE
Based on 150 processed layouts:
  Average cost per layout: $0.0164
  Estimated monthly cost (300 layouts): $4.92
  Estimated annual cost: $59.04

5. Cost Reports

Generates detailed JSON reports saved to results/cost_report_[timestamp].json:

{
  "session_summary": {
    "total_cost": 2.4567,
    "total_input_tokens": 98456,
    "total_output_tokens": 47222,
    "layouts_processed": 150
  },
  "layout_costs": {
    "example.jpg": {
      "total_cost": 0.0234,
      "total_input_tokens": 1500,
      "total_output_tokens": 800,
      "api_calls_made": 1
    }
  },
  "detailed_api_calls": [
    {
      "operation_type": "panel_counting_censorship",
      "timestamp": "2025-01-15T10:30:45.123456",
      "token_usage": {
        "prompt_tokens": 1500,
        "completion_tokens": 800,
        "total_tokens": 2300,
        "cached_tokens": 200
      },
      "total_cost": 0.0234,
      "layout_name": "example.jpg"
    }
  ]
}

Integration with Detection Modes

Hybrid Mode (Primary Focus)

Cost tracking is fully integrated with hybrid mode:

  • Panel counting + censorship detection: 1 API call per layout
  • Local inlier analysis: No API calls (zero cost)
  • Vector similarity: No API calls (zero cost)
  • Fallback to OpenAI: Additional API calls when needed

OpenAI Mode

Tracks all OpenAI API usage patterns:

  • Regular detection: 1 API call per layout (all masters compared)
  • One-at-a-time mode: 41 API calls per layout (one per master)
  • Censorship detection: Additional API calls for CEN refinement

Vector Mode

No API costs (uses Google Vertex AI, not OpenAI)

Gemini Mode

No API costs (uses Google Gemini, not OpenAI)

Operation Types Tracked

The system tracks different types of API operations:

  1. panel_counting_censorship: Combined panel counting and censorship detection
  2. detection: Main master image detection
  3. censorship_detection: Standalone censorship analysis
  4. one_at_a_time_detection: Individual master comparisons

Cost Optimization Benefits

The cost tracking system helps identify optimization opportunities:

Hybrid Mode Savings

Hybrid mode significantly reduces costs compared to one-at-a-time processing:

  • One-at-a-time mode: 41 API calls per layout
  • Hybrid mode: 1 API call per layout (97.6% reduction)
  • Estimated savings: Shows percentage savings in session summary

Usage Pattern Analysis

  • Identify expensive operations
  • Track token efficiency by operation type
  • Monitor cost per detected master
  • Analyze cost trends over time

Testing

Run the cost calculator tests to verify functionality:

python test_cost_calculator.py

This will test all cost tracking features including:

  • Basic cost calculation
  • API call tracking
  • Layout cost breakdown
  • Session summaries
  • Monthly cost estimation
  • Cost report generation

Technical Implementation

Core Components

  1. cost_calculator.py: Main cost tracking module
  2. Token extraction: Automatic token usage extraction from API responses
  3. Integration points: All OpenAI API calls instrumented
  4. Data structures: Efficient tracking of costs and token usage

Integration Points

Cost tracking is integrated at these key locations:

  • openai_detector.py: All OpenAI API calls
  • hybrid_detector.py: Hybrid mode processing
  • cli.py: Command-line interface and reporting
  • Results JSON: Cost breakdowns included in output

Error Handling

  • Graceful degradation when API responses lack usage data
  • Optional feature (disabled by default)
  • No impact on existing functionality when disabled

Future Enhancements

Potential future improvements include:

  • Cost budgeting: Set spending limits and alerts
  • Historical tracking: Long-term cost trend analysis
  • Token optimization: Automatic prompt and image optimization
  • Multi-provider support: Track costs across different AI providers
  • Real-time alerts: Notifications when costs exceed thresholds

Troubleshooting

Common Issues

  1. "No cost data available": Cost tracking is disabled (use --enable-cost-tracking)
  2. "API usage data missing": OpenAI response lacks usage information
  3. "Cost report empty": No API calls were made during processing

Debugging

Enable cost tracking and run a test:

python cli.py --test --hybrid --enable-cost-tracking

This will show real-time cost information and help identify any issues.

Support

For questions or issues with cost tracking:

  1. Check the session summary output for diagnostic information
  2. Review the cost report JSON for detailed API call information
  3. Run the test suite to verify functionality
  4. Ensure OpenAI API responses include usage data