2025-10-01 14:32:55 -05:00


Cost Tracking for Master Image Detection

This document describes the cost tracking features added to the master image detection application to monitor and report OpenAI API usage costs.

Overview

The cost tracking system provides comprehensive monitoring of OpenAI o3 API usage, including:

Real-time cost calculation for all API calls
Per-layout cost breakdown with detailed token usage
Session summaries with totals and averages
Monthly cost estimation based on usage patterns
Detailed cost reports in JSON format
Integration with all detection modes (Gemini, OpenAI, Vector, Hybrid)

Current OpenAI o3 Pricing

Input tokens: $2.00 per million tokens
Cached input tokens: $0.50 per million tokens
Output tokens: $8.00 per million tokens
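
The per-call cost follows directly from the three rates above. The sketch below is illustrative only: the function name is hypothetical, and it assumes cached tokens are counted as a subset of the prompt tokens (as in the OpenAI usage object), so only the uncached remainder is billed at the full input rate. The actual logic in cost_calculator.py may differ.

```python
# o3 prices in USD per million tokens, copied from the list above.
INPUT_PRICE_PER_M = 2.00
CACHED_INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 8.00


def estimate_call_cost(prompt_tokens: int, completion_tokens: int,
                       cached_tokens: int = 0) -> float:
    """Return the estimated USD cost of one API call (hypothetical helper).

    Assumption: cached_tokens is included in prompt_tokens, so the
    uncached part is prompt_tokens - cached_tokens.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    return (
        uncached * INPUT_PRICE_PER_M
        + cached_tokens * CACHED_INPUT_PRICE_PER_M
        + completion_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# One million uncached input tokens and no output cost exactly the input rate.
print(f"${estimate_call_cost(1_000_000, 0):.2f}")
```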

CLI Usage

Enable Cost Tracking

Important: Cost tracking is disabled by default. You must use the --enable-cost-tracking flag to enable it.

# Enable cost tracking for any detection mode
python cli.py --test --hybrid --enable-cost-tracking

# Enable cost tracking with detailed report generation
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report

# Enable cost tracking with monthly cost estimation
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

What "tracking: disabled" means

If you see "Cost Calculator initialized (tracking: disabled)" in the logs, it means:

Cost tracking is turned off - no costs are being calculated or stored
You need to add the --enable-cost-tracking flag to enable cost monitoring
API calls are still being made but their costs aren't being tracked

Why the repetitive initialization messages?

The "Cost Calculator initialized" message may appear several times in a single run. This is expected:

Multiprocessing workers - Each worker process loads the module and initializes its own calculator
Normal behavior - The repeated messages do not affect functionality
Only the main process shows full details - Worker processes show minimal output
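
One common way to keep worker output minimal is to check the process name before logging. This is a sketch of that general pattern, not the application's code: the logger name and the function are assumptions, and only the message text comes from this document.

```python
import logging
import multiprocessing

# Hypothetical logger name, used here for illustration only.
logger = logging.getLogger("cost_calculator")


def log_initialization(tracking_enabled: bool) -> str:
    """Log the init message at INFO in the main process, DEBUG in workers."""
    state = "enabled" if tracking_enabled else "disabled"
    message = f"Cost Calculator initialized (tracking: {state})"
    if multiprocessing.current_process().name == "MainProcess":
        logger.info(message)
    else:
        # Worker processes still record the event, but at a quieter level.
        logger.debug(message)
    return message
```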

Cost Tracking Options

--enable-cost-tracking: Enable detailed cost tracking and real-time reporting
--cost-report: Generate detailed JSON cost report after processing
--cost-estimate N: Show monthly cost estimate based on N layouts per month
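
For readers wiring similar options into their own tooling, the three flags could be declared with argparse roughly as follows. Only the flag names and descriptions come from this document. The helper name, types and defaults are assumptions and may not match cli.py.

```python
import argparse


def build_cost_parser() -> argparse.ArgumentParser:
    """Build a parser with the three cost options (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    # Off unless the flag is given, matching "disabled by default".
    parser.add_argument("--enable-cost-tracking", action="store_true",
                        help="Enable detailed cost tracking and real-time reporting")
    parser.add_argument("--cost-report", action="store_true",
                        help="Generate detailed JSON cost report after processing")
    # Assumed to be an integer number of layouts per month.
    parser.add_argument("--cost-estimate", type=int, metavar="N", default=None,
                        help="Show monthly cost estimate based on N layouts per month")
    return parser


args = build_cost_parser().parse_args(["--enable-cost-tracking", "--cost-estimate", "300"])
print(args.enable_cost_tracking, args.cost_report, args.cost_estimate)
```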

Example Usage

# Test hybrid mode with cost tracking enabled
python cli.py --test --hybrid --enable-cost-tracking

# Process 10 layouts with OpenAI and generate cost report
python cli.py --limit 10 --openai --enable-cost-tracking --cost-report

# Full hybrid run with cost tracking and monthly estimate
python cli.py --all --hybrid --enable-cost-tracking --cost-estimate 300

Cost Tracking Features

1. Real-time Cost Monitoring

Tracks every OpenAI API call with token usage and cost
Displays running totals during processing
Shows cost per layout in progress updates

2. Detailed Cost Breakdown

Each processed layout includes cost information:

{
  "layout_filename": "example.jpg",
  "detected_master_ids": ["1011A_1011_05"],
  "cost_breakdown": {
    "total_cost": 0.0234,
    "cost_breakdown": {
      "input_tokens": 1500,
      "output_tokens": 800,
      "cached_tokens": 200,
      "api_calls_made": 1,
      "operation_types": ["panel_counting_censorship"]
    }
  }
}

3. Session Summary

Displays comprehensive cost statistics:

COST TRACKING SUMMARY
============================================================
Total cost: $2.4567
Total tokens: 145,678
  - Input tokens: 98,456
  - Output tokens: 47,222
  - Cached tokens: 12,345
Total API calls: 156
Layouts processed: 150

Averages:
  - Cost per layout: $0.0164
  - Tokens per layout: 971.2
  - API calls per layout: 1.0
  - Cost per 1K tokens: $0.0169

Operation breakdown:
  - panel_counting_censorship: 150 calls
  - detection: 0 calls
  - one_at_a_time_detection: 0 calls
============================================================
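
The averages in the summary are plain ratios of the session totals. The sketch below reproduces them from the sample figures above. The function name and dictionary keys are hypothetical.

```python
def session_averages(total_cost: float, total_tokens: int,
                     api_calls: int, layouts: int) -> dict:
    """Derive the per-layout averages shown in the summary (hypothetical helper)."""
    if layouts == 0 or total_tokens == 0:
        # Nothing processed yet: avoid dividing by zero.
        return {"cost_per_layout": 0.0, "tokens_per_layout": 0.0,
                "api_calls_per_layout": 0.0, "cost_per_1k_tokens": 0.0}
    return {
        "cost_per_layout": total_cost / layouts,
        "tokens_per_layout": total_tokens / layouts,
        "api_calls_per_layout": api_calls / layouts,
        "cost_per_1k_tokens": total_cost / (total_tokens / 1000),
    }


# Totals taken from the sample summary above.
averages = session_averages(2.4567, 145_678, 156, 150)
for name, value in averages.items():
    print(f"{name}: {value:.4f}")
```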

4. Monthly Cost Estimation

Estimates monthly costs based on current usage patterns:

MONTHLY COST ESTIMATE
Based on 150 processed layouts:
  Average cost per layout: $0.0164
  Estimated monthly cost (300 layouts): $4.92
  Estimated annual cost: $59.04
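
The estimate is a simple projection: average cost per layout times layouts per month, then times 12 for the annual figure. The sample numbers above are consistent with using the rounded average ($0.0164). The function below is a hypothetical sketch of that arithmetic, not the application's implementation.

```python
def estimate_monthly_cost(avg_cost_per_layout: float, layouts_per_month: int) -> dict:
    """Project monthly and annual cost from a per-layout average (hypothetical helper)."""
    monthly = avg_cost_per_layout * layouts_per_month
    return {"monthly": monthly, "annual": monthly * 12}


# Values from the sample estimate above: $0.0164 per layout, 300 layouts per month.
estimate = estimate_monthly_cost(0.0164, 300)
print(f"monthly: ${estimate['monthly']:.2f}, annual: ${estimate['annual']:.2f}")
```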

5. Cost Reports

Generates detailed JSON reports saved to results/cost_report_[timestamp].json:

{
  "session_summary": {
    "total_cost": 2.4567,
    "total_input_tokens": 98456,
    "total_output_tokens": 47222,
    "layouts_processed": 150
  },
  "layout_costs": {
    "example.jpg": {
      "total_cost": 0.0234,
      "total_input_tokens": 1500,
      "total_output_tokens": 800,
      "api_calls_made": 1
    }
  },
  "detailed_api_calls": [
    {
      "operation_type": "panel_counting_censorship",
      "timestamp": "2025-01-15T10:30:45.123456",
      "token_usage": {
        "prompt_tokens": 1500,
        "completion_tokens": 800,
        "total_tokens": 2300,
        "cached_tokens": 200
      },
      "total_cost": 0.0234,
      "layout_name": "example.jpg"
    }
  ]
}
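
Because the report is plain JSON, it can be post-processed with the standard library. The sketch below sums cost per operation type from the detailed_api_calls list. The function name is hypothetical, and the inline sample stands in for a report file loaded with json.load.

```python
import json
from collections import defaultdict


def cost_by_operation(report: dict) -> dict:
    """Sum total_cost per operation_type over detailed_api_calls (hypothetical helper)."""
    totals = defaultdict(float)
    for call in report.get("detailed_api_calls", []):
        totals[call.get("operation_type", "unknown")] += call.get("total_cost", 0.0)
    return dict(totals)


# Trimmed-down sample in the same shape as the report above.
sample = json.loads("""
{
  "detailed_api_calls": [
    {"operation_type": "panel_counting_censorship",
     "total_cost": 0.0234,
     "layout_name": "example.jpg"}
  ]
}
""")
print(cost_by_operation(sample))
```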

Integration with Detection Modes

Hybrid Mode (Primary Focus)

Cost tracking is fully integrated with hybrid mode:

Panel counting + censorship detection: 1 API call per layout
Local inlier analysis: No API calls (zero cost)
Vector similarity: No API calls (zero cost)
Fallback to OpenAI: Additional API calls when needed

OpenAI Mode

Tracks all OpenAI API usage patterns:

Regular detection: 1 API call per layout (all masters compared in a single call)
One-at-a-time mode: 41 API calls per layout (one per master)
Censorship detection: Additional API calls for CEN refinement

Vector Mode

No OpenAI API costs (uses Google Vertex AI, not OpenAI). Vertex AI charges are outside the scope of this tracker.

Gemini Mode

No OpenAI API costs (uses Google Gemini, not OpenAI). Gemini charges are outside the scope of this tracker.

Operation Types Tracked

The system tracks different types of API operations:

panel_counting_censorship: Combined panel counting and censorship detection
detection: Main master image detection
censorship_detection: Standalone censorship analysis
one_at_a_time_detection: Individual master comparisons

Cost Optimization Benefits

The cost tracking system helps identify optimization opportunities:

Hybrid Mode Savings

Hybrid mode significantly reduces costs compared to one-at-a-time processing:

One-at-a-time mode: 41 API calls per layout
Hybrid mode: 1 API call per layout (97.6% reduction)
Estimated savings: reported as a percentage in the session summary
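
The 97.6% figure is the reduction in API call count when going from 41 calls to 1. A short check of that arithmetic (the function name is hypothetical):

```python
def call_reduction(baseline_calls: int, optimized_calls: int) -> float:
    """Return the fractional reduction in API calls (hypothetical helper)."""
    return 1 - optimized_calls / baseline_calls


# 41 calls per layout (one-at-a-time) versus 1 call per layout (hybrid).
print(f"{call_reduction(41, 1):.1%}")
```

This measures calls, not dollars. The actual cost saving also depends on how many tokens each call consumes, so it is best read from the tracked costs of real runs.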

Usage Pattern Analysis

Identify expensive operations
Track token efficiency by operation type
Monitor cost per detected master
Analyze cost trends over time

Testing

Run the cost calculator tests to verify functionality:

python test_cost_calculator.py

This will test all cost tracking features including:

Basic cost calculation
API call tracking
Layout cost breakdown
Session summaries
Monthly cost estimation
Cost report generation

Technical Implementation

Core Components

cost_calculator.py: Main cost tracking module
Token extraction: Automatic token usage extraction from API responses
Integration points: All OpenAI API calls instrumented
Data structures: Efficient tracking of costs and token usage

Integration Points

Cost tracking is integrated at these key locations:

openai_detector.py: All OpenAI API calls
hybrid_detector.py: Hybrid mode processing
cli.py: Command-line interface and reporting
Results JSON: Cost breakdowns included in output

Error Handling

Graceful degradation when API responses lack usage data
Optional feature (disabled by default)
No impact on existing functionality when disabled
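
A minimal sketch of what graceful degradation could look like when extracting token usage. It assumes a response object shaped like the OpenAI Chat Completions result (usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, usage.prompt_tokens_details.cached_tokens). The function name and the choice to return None are assumptions, not the behavior of cost_calculator.py.

```python
from types import SimpleNamespace
from typing import Optional


def extract_usage(response) -> Optional[dict]:
    """Return token counts from a response, or None if usage is missing (hypothetical helper)."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # No usage data: the caller can skip cost recording instead of failing.
        return None
    details = getattr(usage, "prompt_tokens_details", None)
    prompt = getattr(usage, "prompt_tokens", 0) or 0
    completion = getattr(usage, "completion_tokens", 0) or 0
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": getattr(usage, "total_tokens", None) or prompt + completion,
        "cached_tokens": getattr(details, "cached_tokens", 0) or 0,
    }


# Stand-in objects for illustration; real responses come from the OpenAI client.
with_usage = SimpleNamespace(usage=SimpleNamespace(
    prompt_tokens=1500, completion_tokens=800, total_tokens=2300,
    prompt_tokens_details=SimpleNamespace(cached_tokens=200)))
without_usage = SimpleNamespace(usage=None)

print(extract_usage(with_usage))
print(extract_usage(without_usage))
```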

Future Enhancements

Potential future improvements include:

Cost budgeting: Set spending limits and alerts
Historical tracking: Long-term cost trend analysis
Token optimization: Automatic prompt and image optimization
Multi-provider support: Track costs across different AI providers
Real-time alerts: Notifications when costs exceed thresholds

Troubleshooting

Common Issues

"No cost data available": Cost tracking is disabled (use --enable-cost-tracking)
"API usage data missing": OpenAI response lacks usage information
"Cost report empty": No API calls were made during processing

Debugging

Enable cost tracking and run a test:

python cli.py --test --hybrid --enable-cost-tracking

This will show real-time cost information and help identify any issues.

Support

For questions or issues with cost tracking:

Check the session summary output for diagnostic information
Review the cost report JSON for detailed API call information
Run the test suite to verify functionality
Ensure OpenAI API responses include usage data
