ferrero-opentext/Python-Version/MARKDOWN_DOCS/A1_RETRY_LOGIC.md
nickviljoen e1f15ea632 Add A1 retry logic and orchestrator off-hours cadence
Feature 1: A1→A2 Empty Folder Retry Logic
- Track retry attempts (max 3) for campaigns with no master assets
- Mark campaigns as permanently failed after 3 attempts
- Stop processing and sending emails for permanently failed campaigns
- Two new email templates: retry notification and permanent failure
- Database migration adds 4 new columns to campaign_status table
- Comprehensive documentation in A1_RETRY_LOGIC.md

Feature 2: Orchestrator Off-Hours Cadence
- Add 30 minutes to all task intervals during off-hours
- Off-hours: 10 PM - 5 AM weekdays + all day Saturday/Sunday
- Tasks only run at minutes 0 and 30 during off-hours
- Configurable and easy to enable/disable
- Daily Report (7 PM) remains unchanged

Files changed:
- NEW: database/migrations/003_add_a1_retry_tracking.sql
- NEW: MARKDOWN_DOCS/A1_RETRY_LOGIC.md
- MODIFIED: scripts/shared/database.py (added 3 methods)
- MODIFIED: scripts/a1_to_a2_box_uploader.py (added retry logic)
- MODIFIED: scripts/shared/notifier.py (added 2 templates)
- MODIFIED: scripts/orchestrator-prod.py (added off-hours config)
- MODIFIED: RUN_ORCHESTRATOR.md (added off-hours docs)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-31 17:38:57 +02:00

# A1→A2 Empty Folder Retry Logic
## Preventing Infinite Error Emails for Empty Campaigns
**Purpose:** Avoid sending "no assets found" error emails every 3 minutes indefinitely when a campaign is set to A1 but has no master assets.
**Author:** Claude Code
**Date:** January 31, 2026
**Related Files:**
- `scripts/a1_to_a2_box_uploader.py` (main script)
- `scripts/shared/database.py` (retry tracking methods)
- `database/migrations/003_add_a1_retry_tracking.sql` (schema)
---
## How It Works
### The Problem
Previously, when a campaign had status A1 but the Master Assets folder was empty:
- System sent error email every 3 minutes
- Campaign remained in A1 status forever
- No distinction between temporary and permanent failures
- Notification fatigue for support team
### The Solution
Three-attempt retry mechanism with permanent failure tracking:
**Flow:**
1. Campaign in A1 status with no assets → Attempt 1 (email sent, retry_count=1)
2. Still no assets after 3 minutes → Attempt 2 (email sent, retry_count=2)
3. Still no assets after 6 minutes → Attempt 3 (email sent, retry_count=3, permanently_failed=TRUE)
4. Campaign now skipped on all future runs → Manual reset required
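The flow above can be sketched as a small pure function. This is an illustration only, not the code in `database.py`: `RetryState` and `record_empty_folder` are hypothetical names, and only `MAX_RETRIES` and the `a1_*` field names come from this document.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional, Tuple

MAX_RETRIES = 3  # limit described in this document


@dataclass(frozen=True)
class RetryState:
    """Hypothetical in-memory mirror of the a1_* columns."""
    a1_retry_count: int = 0
    a1_last_retry_at: Optional[datetime] = None
    a1_permanently_failed: bool = False
    a1_failure_reason: Optional[str] = None


def record_empty_folder(state: RetryState, reason: str, now: datetime) -> Tuple[RetryState, str]:
    """Return the new state and an action: 'skip', 'retry_email' or 'final_email'."""
    if state.a1_permanently_failed:
        # Already failed permanently: no counter change, no email.
        return state, "skip"
    count = state.a1_retry_count + 1
    failed = count >= MAX_RETRIES
    new_state = replace(
        state,
        a1_retry_count=count,
        a1_last_retry_at=now,
        a1_permanently_failed=failed,
        a1_failure_reason=reason,
    )
    return new_state, ("final_email" if failed else "retry_email")


# Four consecutive runs against a folder that stays empty.
state = RetryState()
actions = []
for _ in range(4):
    state, action = record_empty_folder(state, "No master assets found", datetime.now(timezone.utc))
    actions.append(action)
print(actions)  # ['retry_email', 'retry_email', 'final_email', 'skip']
```

The fourth run returns `skip`, which corresponds to step 4: once the flag is set, nothing changes until a manual reset.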
### Database Tracking
Four new fields in `campaign_status` table:
- `a1_retry_count` (INTEGER): Number of failed attempts (0-3)
- `a1_last_retry_at` (TIMESTAMP): When last attempt occurred
- `a1_permanently_failed` (BOOLEAN): TRUE after 3 failures
- `a1_failure_reason` (TEXT): Why it failed (e.g., "No master assets found")
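To see how these four columns behave for an existing row, the sketch below adds them to a throwaway in-memory SQLite table. It is not the content of `003_add_a1_retry_tracking.sql`: the column names and types follow the list above, while the defaults and the minimal `campaign_status` table are assumptions made for the demo.

```python
import sqlite3

# In-memory SQLite stands in for the real database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaign_status (campaign_id TEXT PRIMARY KEY, status TEXT)")

# Column names and types come from this document; the defaults are assumptions.
for ddl in (
    "ALTER TABLE campaign_status ADD COLUMN a1_retry_count INTEGER DEFAULT 0",
    "ALTER TABLE campaign_status ADD COLUMN a1_last_retry_at TIMESTAMP",
    "ALTER TABLE campaign_status ADD COLUMN a1_permanently_failed BOOLEAN DEFAULT 0",
    "ALTER TABLE campaign_status ADD COLUMN a1_failure_reason TEXT",
):
    conn.execute(ddl)

# A campaign that has never failed starts with a zero counter and empty metadata.
conn.execute("INSERT INTO campaign_status (campaign_id, status) VALUES ('demo', 'A1')")
row = conn.execute(
    "SELECT a1_retry_count, a1_last_retry_at, a1_permanently_failed, a1_failure_reason "
    "FROM campaign_status WHERE campaign_id = 'demo'"
).fetchone()
print(row)  # (0, None, 0, None)
```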
---
## Configuration
### Maximum Retry Attempts
**Current Setting:** 3 attempts
**To Change:** Edit `MAX_RETRIES` inside `increment_a1_retry()` in `scripts/shared/database.py` (path relative to the `Python-Version` directory):
```python
def increment_a1_retry(self, campaign_id, campaign_number, campaign_name, reason):
    """..."""
    # Maximum retry attempts before marking as permanently failed
    MAX_RETRIES = 3  # CHANGE THIS NUMBER
```
**Recommendation:** Keep at 3. This allows:
- Immediate notification (attempt 1)
- Short-term retry (attempt 2 after 3 min)
- Medium-term retry (attempt 3 after 6 min)
- Permanent failure (set on attempt 3, about 6 minutes after the first attempt; the campaign is skipped from the 9-minute run onward)
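The attempt timing is simple arithmetic on the run interval. The sketch below assumes the 3-minute cadence mentioned in this document and that every scheduled run actually executes; `attempt_offsets` and `RUN_INTERVAL_MIN` are hypothetical names used only here.

```python
RUN_INTERVAL_MIN = 3  # scheduler cadence stated in this document
MAX_RETRIES = 3


def attempt_offsets(max_retries=MAX_RETRIES, interval_min=RUN_INTERVAL_MIN):
    """Minutes after the first failed run at which each attempt happens."""
    return [(attempt - 1) * interval_min for attempt in range(1, max_retries + 1)]


offsets = attempt_offsets()
print(offsets)             # [0, 3, 6]
print(attempt_offsets(5))  # [0, 3, 6, 9, 12]
```

Raising `MAX_RETRIES` to 5 would therefore extend the window before the final attempt from 6 to 12 minutes at the same cadence.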
---
## Email Notifications
### Retry Email (Attempts 1-2)
**Subject:** ⚠️ No Assets Found (Attempt X/3) - Campaign {name}
**Recipients:** Error notification list
**Content:**
- Current retry count
- Remaining attempts
- What happens next
### Final Failure Email (Attempt 3)
**Subject:** ❌ PERMANENTLY FAILED - Campaign {name} (No Assets After 3 Attempts)
**Recipients:** Error notification list
**Content:**
- Campaign marked as permanently failed
- Required actions to fix
- SQL command to manually reset
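Choosing between the two subjects comes down to comparing the attempt number with the maximum. The function below is a hedged sketch, not the template code in `notifier.py`; `build_subject` is a hypothetical helper, and the subject strings are copied from this section.

```python
MAX_RETRIES = 3


def build_subject(attempt, campaign_name, max_retries=MAX_RETRIES):
    """Pick the retry or final-failure subject line for a given attempt."""
    if attempt >= max_retries:
        return (
            f"❌ PERMANENTLY FAILED - Campaign {campaign_name} "
            f"(No Assets After {max_retries} Attempts)"
        )
    return f"⚠️ No Assets Found (Attempt {attempt}/{max_retries}) - Campaign {campaign_name}"


for attempt in (1, 2, 3):
    print(build_subject(attempt, "Demo Campaign"))
```

Deriving the "X/3" text from the constant, rather than hard-coding it, would keep the subjects correct if the maximum is ever changed.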
---
## Manual Operations
### Check Campaign Retry Status
```sql
SELECT campaign_number, campaign_name, status,
a1_retry_count, a1_last_retry_at,
a1_permanently_failed, a1_failure_reason
FROM campaign_status
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
```
### Reset Single Campaign
```sql
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
```
**Or using the `psql` command:**
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking <<EOF
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
EOF
```
### Reset All Failed Campaigns
```sql
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE a1_permanently_failed = TRUE;
```
### View All Failed Campaigns
```sql
SELECT campaign_number, campaign_name,
a1_retry_count, a1_last_retry_at, a1_failure_reason
FROM campaign_status
WHERE a1_permanently_failed = TRUE
ORDER BY a1_last_retry_at DESC;
```
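If these resets are ever scripted, passing the campaign ID as a query parameter avoids hand-editing the SQL. The helper below is a sketch under assumptions: `build_reset_query` is hypothetical, and the `%s` placeholder assumes a psycopg-style driver, which this document does not confirm.

```python
from typing import Optional, Tuple

# Same four assignments as the manual reset statements above.
RESET_SET_CLAUSE = (
    "SET a1_retry_count = 0, a1_last_retry_at = NULL, "
    "a1_permanently_failed = FALSE, a1_failure_reason = NULL"
)


def build_reset_query(campaign_id: Optional[str] = None, placeholder: str = "%s") -> Tuple[str, tuple]:
    """Return (sql, params) for a single-campaign reset, or for all failed campaigns."""
    if campaign_id is None:
        sql = f"UPDATE campaign_status {RESET_SET_CLAUSE} WHERE a1_permanently_failed = TRUE"
        return sql, ()
    sql = f"UPDATE campaign_status {RESET_SET_CLAUSE} WHERE campaign_id = {placeholder}"
    return sql, (campaign_id,)


single_sql, single_params = build_reset_query("YOUR_CAMPAIGN_ID")
bulk_sql, bulk_params = build_reset_query()
print(single_sql, single_params)
print(bulk_sql, bulk_params)
```

The returned pair is meant to be passed to a DB-API `cursor.execute(sql, params)` call.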
---
## Failure Scenarios
### Scenario 1: Temporary Empty Folder
**What Happens:**
- Attempt 1: Email sent, retry counter = 1
- Assets added to folder before attempt 2
- Next run finds assets, processes successfully
- Retry counter automatically reset to 0
**Result:** Problem self-resolves, minimal notifications
### Scenario 2: Persistent Empty Folder
**What Happens:**
- Attempt 1 (0 min): Email sent, retry counter = 1
- Attempt 2 (3 min): Email sent, retry counter = 2
- Attempt 3 (6 min): Email sent, retry counter = 3
- Campaign marked permanently failed
- Processing stops, no more emails
**Result:** Support team alerted, infinite emails prevented
### Scenario 3: Wrong Status Assignment
**What Happens:**
- Campaign set to A1 by mistake (no assets intended)
- Fails 3 times, marked permanently failed
- Admin realizes the mistake and changes the status to a different value
- Campaign no longer appears in A1 search results
**Result:** No reset needed, campaign excluded from processing
---
## Testing
### Test Retry Logic
1. Create test campaign in DAM with A1 status
2. Ensure Master Assets folder is empty
3. Run A1→A2 script manually 3 times
4. Verify emails received and database state
```bash
# Run 1
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Check database
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed FROM campaign_status WHERE status = 'A1';"
# Run 2 (wait 3 minutes or run immediately for testing)
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Check again
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed FROM campaign_status WHERE status = 'A1';"
# Run 3
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Verify permanently failed
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed, a1_failure_reason FROM campaign_status WHERE a1_permanently_failed = TRUE;"
```
### Test Reset Logic
```bash
# Reset the test campaign
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "UPDATE campaign_status SET a1_retry_count = 0, a1_last_retry_at = NULL, a1_permanently_failed = FALSE, a1_failure_reason = NULL WHERE campaign_number = 'TEST_CAMPAIGN';"
# Run again
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Verify it retries
```
---
## Monitoring
### Dashboard Query: Current Retry Status
```sql
SELECT
COUNT(*) FILTER (WHERE a1_retry_count = 0) as "No Issues",
COUNT(*) FILTER (WHERE a1_retry_count = 1) as "Attempt 1",
COUNT(*) FILTER (WHERE a1_retry_count = 2) as "Attempt 2",
COUNT(*) FILTER (WHERE a1_retry_count >= 3) as "Permanently Failed"
FROM campaign_status
WHERE status = 'A1';
```
### Alert Query: Campaigns Near Failure
```sql
SELECT campaign_number, campaign_name, a1_retry_count, a1_last_retry_at
FROM campaign_status
WHERE status = 'A1'
AND a1_retry_count >= 2
AND a1_permanently_failed = FALSE
ORDER BY a1_retry_count DESC, a1_last_retry_at DESC;
```
---
## Troubleshooting
### Q: Campaign keeps failing even after adding assets
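**A:** Check whether the campaign was marked permanently failed (`a1_permanently_failed = TRUE`). If so, it is skipped on every run until you reset it with the SQL under "Reset Single Campaign".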
### Q: Want to change from 3 to 5 retry attempts
**A:** Edit `MAX_RETRIES = 3` in `database.py` line ~567. Also update email templates to reflect new maximum.
### Q: How to disable retry logic completely?
**A:** Not recommended, but you can:
1. Set `MAX_RETRIES = 999` (effectively infinite)
2. Or revert to old `a1_to_a2_no_assets` template without retry tracking
### Q: Can I set different retry counts for different campaigns?
**A:** No, it's a global setting. All campaigns use the same `MAX_RETRIES` value.
### Q: What if I want to delete permanently failed campaigns from database?
**A:** Don't delete. Instead, change their status to something other than A1. They'll be excluded from processing automatically.
---
## Future Enhancements
Potential improvements for future versions:
1. **Configurable retry timing:**
- Instead of relying on cron frequency (3 min)
- Check `a1_last_retry_at` and skip if too recent
- Allow exponential backoff (3 min, 10 min, 30 min)
2. **Campaign-specific retry limits:**
- Add optional `a1_max_retries` column
- Allow different campaigns to have different thresholds
- Default to global MAX_RETRIES if not set
3. **Automatic cleanup:**
- After 30 days, auto-reset permanently failed campaigns
- Or send weekly digest of stuck campaigns
4. **Webhook notifications:**
- Send to external system when campaign permanently fails
- Integrate with ticketing system
5. **Admin UI:**
- Web interface to view/reset retry status
- Bulk reset operations
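As a rough illustration of the first enhancement (configurable retry timing), the check below decides whether a retry is due based on `a1_last_retry_at` and a backoff schedule. None of this exists in the current implementation: `is_retry_due` and `BACKOFF_MINUTES` are hypothetical, and the 3/10/30-minute schedule is the example given in the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical schedule: wait before attempt 2, attempt 3, and any later attempt.
BACKOFF_MINUTES = (3, 10, 30)


def is_retry_due(retry_count: int, last_retry_at: Optional[datetime], now: datetime) -> bool:
    """Return True if enough time has passed since the last failed attempt."""
    if retry_count == 0 or last_retry_at is None:
        return True  # the first attempt is never delayed
    # Clamp to the last entry so higher counts reuse the longest wait.
    index = min(retry_count - 1, len(BACKOFF_MINUTES) - 1)
    return now - last_retry_at >= timedelta(minutes=BACKOFF_MINUTES[index])


t0 = datetime(2026, 1, 31, 12, 0, tzinfo=timezone.utc)
print(is_retry_due(1, t0, t0 + timedelta(minutes=3)))   # True
print(is_retry_due(2, t0, t0 + timedelta(minutes=3)))   # False
print(is_retry_due(2, t0, t0 + timedelta(minutes=10)))  # True
```

With a check like this, the scheduler could keep its 3-minute cadence while individual campaigns are retried less often.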
---
## Code Locations
**Quick reference for developers:**
| Component | File | Line Range |
|-----------|------|------------|
| Retry check logic | `a1_to_a2_box_uploader.py` | ~176-186 |
| Empty folder detection | `a1_to_a2_box_uploader.py` | ~193-231 |
| Success reset | `a1_to_a2_box_uploader.py` | ~354-356 |
| `get_a1_retry_status()` | `database.py` | ~522-558 |
| `increment_a1_retry()` | `database.py` | ~560-620 |
| `reset_a1_retry()` | `database.py` | ~622-655 |
| Email templates | `notifier.py` | ~593-687 |
| Database migration | `migrations/003_add_a1_retry_tracking.sql` | All |
---
## Change Log
**January 31, 2026:**
- Initial implementation
- 3-attempt retry mechanism
- Permanent failure tracking
- Two new email templates
- This documentation created
**Future updates will be logged here.**