ferrero-opentext/Python-Version/MARKDOWN_DOCS/A1_RETRY_LOGIC.md
nickviljoen 28586308d7 Docs: Refresh A1 empty-folder doc and LTD asset type notes
A1_RETRY_LOGIC.md updated to reflect the 2026-04-28 rework: empty
folders are now treated as expected workflow (silent skip + one-time
warning at poll 20, no auto permanent-fail), while the original
3-strikes-then-permanently-fail behavior is preserved for genuine
folder errors via the mark_failed_at_max flag.

README.md adds LTD (Licensing Translation Document) to the asset type
override section alongside EOL, and notes that empty overrides remove
fields while non-empty overrides on non-MVP fields are appended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:19:06 +02:00

# A1→A2 Empty Folder Handling
**Purpose:** Avoid spam emails and false-positive permanent failures for the common workflow where campaign managers create an A1 campaign before uploading the master assets.
**Initial implementation:** January 31, 2026
**Reworked:** April 28, 2026 — empty folders are now treated as expected client workflow rather than failures.
**Related files:**
- `scripts/a1_to_a2_box_uploader.py` (main script)
- `scripts/shared/database.py` (retry tracking methods)
- `database/migrations/003_add_a1_retry_tracking.sql` (schema)
---
## How It Works (current behavior)
### The empty-folder case (most common)
When a campaign is at A1 in DAM but the Master Assets folder is empty, the script treats this as a normal pre-asset state, not a failure.
**Flow:**
1. Every poll: `a1_retry_count` is incremented for visibility, the script logs `No master assets yet (poll N) - skipping until assets appear`, and exits silently.
2. At poll 20 (~1 hour at the 3-minute orchestrator cadence) the script sends a single `a1_to_a2_no_assets_warning` email so genuinely-stuck campaigns still surface.
3. After poll 20, the script keeps skipping silently. **`a1_permanently_failed` is never auto-set for empty folders.**
4. When assets eventually appear and A1→A2 succeeds, `db.reset_a1_retry()` clears the counter automatically.
The threshold lives in `scripts/a1_to_a2_box_uploader.py` as `EMPTY_FOLDER_WARNING_THRESHOLD = 20`.
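The flow above can be sketched roughly as follows. This is an illustration, not the script's actual code: `handle_empty_folder`, `FakeDb`, `FakeNotifier`, and the assumption that `increment_a1_retry()` returns the new poll count are all hypothetical stand-ins.

```python
import logging

logger = logging.getLogger("a1_to_a2")

EMPTY_FOLDER_WARNING_THRESHOLD = 20  # value documented above


class FakeDb:
    """In-memory stand-in for the real database wrapper (hypothetical)."""

    def __init__(self):
        self.counts = {}

    def increment_a1_retry(self, campaign_id, mark_failed_at_max=True):
        # Assumption: the real method returns (or exposes) the new count.
        self.counts[campaign_id] = self.counts.get(campaign_id, 0) + 1
        return self.counts[campaign_id]


class FakeNotifier:
    """In-memory stand-in for the email notifier (hypothetical)."""

    def __init__(self):
        self.sent = []

    def send(self, template, campaign_id):
        self.sent.append((template, campaign_id))


def handle_empty_folder(campaign_id, db, notifier):
    """Count the poll, log it, and warn exactly once at the threshold."""
    # mark_failed_at_max=False: never auto-fail a campaign for an empty folder.
    poll = db.increment_a1_retry(campaign_id, mark_failed_at_max=False)
    logger.info("No master assets yet (poll %d) - skipping until assets appear", poll)
    if poll == EMPTY_FOLDER_WARNING_THRESHOLD:
        notifier.send("a1_to_a2_no_assets_warning", campaign_id)
    return poll
```

Because the check is an equality (`==`) rather than `>=`, polls after the threshold stay silent until a success resets the counter.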
### The genuine-error case
The 3-retries-then-permanently-fail behavior **still exists** for actual folder-level errors (e.g. `Assets folder not found (tried Master Assets)`), which are caught by the script's exception handler. These DO mark `a1_permanently_failed=TRUE` after 3 failures and DO send the retry / permanently-failed emails.
`db.increment_a1_retry()` accepts `mark_failed_at_max=True|False` to switch between the two behaviors. The empty-folder branch passes `False`; the exception handler passes `True` (default).
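A minimal in-memory sketch of how such a flag could switch between the two behaviors. The real method works against `campaign_status` and its exact signature is not shown in this document, so the `RetryState` class and the function shape below are assumptions.

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # value documented for the genuine-error path


@dataclass
class RetryState:
    """Hypothetical mirror of the retry columns on campaign_status."""
    a1_retry_count: int = 0
    a1_permanently_failed: bool = False


def increment_a1_retry(state, mark_failed_at_max=True):
    """Increment the counter; only flag permanent failure when asked to."""
    state.a1_retry_count += 1
    if mark_failed_at_max and state.a1_retry_count >= MAX_RETRIES:
        state.a1_permanently_failed = True
    return state


# Empty-folder branch: counter grows, campaign is never failed.
empty = RetryState()
for _ in range(25):
    increment_a1_retry(empty, mark_failed_at_max=False)

# Exception-handler branch: default flag, fails on the third strike.
errored = RetryState()
for _ in range(3):
    increment_a1_retry(errored)
```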
### Queue-slot filter
The A1→A2 script processes up to 2 campaigns per run (`campaigns[:2]`). Permanently-failed campaigns are filtered out **before** the slot cap so they no longer block the queue (`scripts/a1_to_a2_box_uploader.py:652`).
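The ordering matters: if the cap were applied first, two permanently-failed campaigns at the head of the list would consume both slots. A small sketch of filter-then-cap, using hypothetical dict-shaped campaigns and a hypothetical `select_campaigns` helper:

```python
def select_campaigns(campaigns, slots=2):
    """Drop permanently-failed campaigns first, then apply the slot cap."""
    eligible = [c for c in campaigns if not c.get("a1_permanently_failed")]
    return eligible[:slots]


queue = [
    {"campaign_number": "C1", "a1_permanently_failed": True},
    {"campaign_number": "C2", "a1_permanently_failed": True},
    {"campaign_number": "C3", "a1_permanently_failed": False},
    {"campaign_number": "C4", "a1_permanently_failed": False},
]

picked = [c["campaign_number"] for c in select_campaigns(queue)]
print(picked)  # ['C3', 'C4']

# Cap-first ordering, for contrast: both slots go to failed campaigns.
cap_first = [c for c in queue[:2] if not c["a1_permanently_failed"]]
```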
### Database tracking
Four fields on the `campaign_status` table:
- `a1_retry_count` (INTEGER): Number of polls where the folder was empty / errored. For empty-folder cases this can grow unbounded; reset on success.
- `a1_last_retry_at` (TIMESTAMP): When the last attempt occurred
- `a1_permanently_failed` (BOOLEAN): TRUE only via the genuine-error path (after 3 failures), never via the empty-folder path
- `a1_failure_reason` (TEXT): Why it failed (e.g., "Assets folder not found (tried Master Assets)")
---
## Configuration
### Empty-folder warning threshold
`scripts/a1_to_a2_box_uploader.py`:
```python
EMPTY_FOLDER_WARNING_THRESHOLD = 20 # ~1 hour at 3-min poll cadence
```
Send the one-time warning sooner/later by adjusting this constant.
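To estimate how long a campaign waits before the warning fires for a given threshold, multiply by the poll cadence. The helper below is a hypothetical convenience, and it assumes the first empty poll happens at time zero, so poll N lands (N - 1) intervals later.

```python
from datetime import timedelta

EMPTY_FOLDER_WARNING_THRESHOLD = 20
POLL_INTERVAL = timedelta(minutes=3)  # assumption: the 3-minute orchestrator cadence


def time_until_warning(threshold=EMPTY_FOLDER_WARNING_THRESHOLD,
                       interval=POLL_INTERVAL):
    """Elapsed time between the first empty poll and the warning poll."""
    return (threshold - 1) * interval


default_wait = time_until_warning()             # 57 minutes, i.e. roughly 1 hour
doubled_wait = time_until_warning(threshold=40)  # 117 minutes, i.e. roughly 2 hours
```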
### Genuine-error retry attempts
`scripts/shared/database.py`, in `increment_a1_retry()`:
```python
MAX_RETRIES = 3
```
Applies only when the caller passes `mark_failed_at_max=True` (default), i.e. the exception handler in `process_campaign()`. The empty-folder branch passes `False` and is unaffected.
---
## Email Notifications
### Empty-folder warning (one-time, at poll 20)
**Template:** `a1_to_a2_no_assets_warning`
**Subject:** ⚠️ Campaign in A1 with no assets yet - {campaign_name}
**Recipients:** Error notification list
**Sent:** exactly once per stuck campaign, when `a1_retry_count == 20`. Counter resets on success, so a future re-stuck event would warn again.
### Genuine-error retry email (attempts 1-2)
**Template:** `a1_to_a2_no_assets_retry`
**Subject:** ⚠️ No Assets Found (Attempt X/3) - Campaign {name}
**Recipients:** Error notification list
**Trigger:** non-empty-folder errors caught by `process_campaign()`'s exception handler.
### Genuine-error final failure (attempt 3)
**Template:** `a1_to_a2_permanently_failed`
**Subject:** ❌ PERMANENTLY FAILED - Campaign {name} (No Assets After 3 Attempts)
**Recipients:** Error notification list
**Content:**
- Campaign marked as permanently failed (campaign filtered from future queue runs)
- Required actions to fix
- SQL command to manually reset
---
## Manual Operations
### Check Campaign Retry Status
```sql
SELECT campaign_number, campaign_name, status,
a1_retry_count, a1_last_retry_at,
a1_permanently_failed, a1_failure_reason
FROM campaign_status
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
```
### Reset Single Campaign
```sql
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
```
**Or using psql command:**
```bash
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking <<EOF
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';
EOF
```
### Reset All Failed Campaigns
```sql
UPDATE campaign_status
SET a1_retry_count = 0,
a1_last_retry_at = NULL,
a1_permanently_failed = FALSE,
a1_failure_reason = NULL
WHERE a1_permanently_failed = TRUE;
```
### View All Failed Campaigns
```sql
SELECT campaign_number, campaign_name,
a1_retry_count, a1_last_retry_at, a1_failure_reason
FROM campaign_status
WHERE a1_permanently_failed = TRUE
ORDER BY a1_last_retry_at DESC;
```
---
## Failure Scenarios
### Scenario 1: Temporary Empty Folder
**What Happens:**
- Poll 1: folder is empty, retry counter = 1, no email sent
- Assets added to folder before the next poll
- Next run finds assets, processes successfully
- Retry counter automatically reset to 0
**Result:** Problem self-resolves with no notifications
### Scenario 2: Persistent Empty Folder
**What Happens:**
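- Polls 1-19 (0-54 min): silent skip each time, retry counter increments
- Poll 20 (~1 hour): one `a1_to_a2_no_assets_warning` email sent
- Poll 21 onward: silent skip continues, no further emails
- Campaign is never marked permanently failed and is processed as soon as assets appear
**Result:** Support team alerted once, infinite emails prevented
For a genuine folder error (e.g. `Assets folder not found`), attempts 1-3 (0, 3 and 6 min) each send an email, then the campaign is marked permanently failed and processing stops.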
### Scenario 3: Wrong Status Assignment
**What Happens:**
- Campaign set to A1 by mistake (no assets intended)
- Empty folder is skipped silently; a single warning email arrives at poll 20
- Admin realizes mistake, changes status to a different value
- Campaign no longer appears in A1 search results
**Result:** No reset needed, campaign excluded from processing
---
## Testing
### Test Retry Logic
1. Create test campaign in DAM with A1 status
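2. Ensure the Master Assets folder is missing (not merely empty) so the script hits the genuine-error path; an empty folder is skipped silently and never becomes permanently failed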
3. Run A1→A2 script manually 3 times
4. Verify emails received and database state
```bash
# Run 1
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Check database
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed FROM campaign_status WHERE status = 'A1';"
# Run 2 (wait 3 minutes or run immediately for testing)
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Check again
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed FROM campaign_status WHERE status = 'A1';"
# Run 3
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Verify permanently failed
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "SELECT campaign_number, a1_retry_count, a1_permanently_failed, a1_failure_reason FROM campaign_status WHERE a1_permanently_failed = TRUE;"
```
### Test Reset Logic
```bash
# Reset the test campaign
PGPASSWORD=ferrero_pass_2025 psql -h localhost -p 5437 -U ferrero_user -d ferrero_tracking -c "UPDATE campaign_status SET a1_retry_count = 0, a1_permanently_failed = FALSE WHERE campaign_number = 'TEST_CAMPAIGN';"
# Run again
python scripts/a1_to_a2_box_uploader.py --auth-pfx-v2
# Verify it retries
```
---
## Monitoring
### Dashboard Query: Current Retry Status
```sql
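-- Buckets use a1_permanently_failed rather than the raw count,
-- because empty-folder polls can push a1_retry_count past 3 without failing.
SELECT
COUNT(*) FILTER (WHERE a1_retry_count = 0) AS "No Issues",
COUNT(*) FILTER (WHERE a1_retry_count BETWEEN 1 AND 19 AND a1_permanently_failed IS NOT TRUE) AS "Waiting / Retrying",
COUNT(*) FILTER (WHERE a1_retry_count >= 20 AND a1_permanently_failed IS NOT TRUE) AS "Warned (20+ polls)",
COUNT(*) FILTER (WHERE a1_permanently_failed = TRUE) AS "Permanently Failed"
FROM campaign_status
WHERE status = 'A1';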
```
### Alert Query: Campaigns Near Failure
```sql
SELECT campaign_number, campaign_name, a1_retry_count, a1_last_retry_at
FROM campaign_status
WHERE status = 'A1'
AND a1_retry_count >= 2
AND a1_permanently_failed = FALSE
ORDER BY a1_retry_count DESC, a1_last_retry_at DESC;
```
---
## Troubleshooting
### Q: Campaign keeps failing even after adding assets
**A:** Check whether the campaign was marked permanently failed (`a1_permanently_failed = TRUE`). If so, reset it using the SQL under "Reset Single Campaign" above.
### Q: Want to change from 3 to 5 retry attempts
**A:** Edit `MAX_RETRIES = 3` in `database.py` line ~567. Also update email templates to reflect new maximum.
### Q: How to disable retry logic completely?
**A:** Not recommended, but you can:
1. Set `MAX_RETRIES = 999` (effectively infinite)
2. Or revert to old `a1_to_a2_no_assets` template without retry tracking
### Q: Can I set different retry counts for different campaigns?
**A:** No, it's a global setting. All campaigns use the same `MAX_RETRIES` value.
### Q: What if I want to delete permanently failed campaigns from database?
**A:** Don't delete. Instead, change their status to something other than A1. They'll be excluded from processing automatically.
---
## Future Enhancements
Potential improvements for future versions:
1. **Configurable retry timing:**
- Instead of relying on cron frequency (3 min)
- Check `a1_last_retry_at` and skip if too recent
- Allow exponential backoff (3 min, 10 min, 30 min)
2. **Campaign-specific retry limits:**
- Add optional `a1_max_retries` column
- Allow different campaigns to have different thresholds
- Default to global MAX_RETRIES if not set
3. **Automatic cleanup:**
- After 30 days, auto-reset permanently failed campaigns
- Or send weekly digest of stuck campaigns
4. **Webhook notifications:**
- Send to external system when campaign permanently fails
- Integrate with ticketing system
5. **Admin UI:**
- Web interface to view/reset retry status
- Bulk reset operations
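
As an illustration of the first enhancement (configurable retry timing), the check against `a1_last_retry_at` could look something like the sketch below. None of this exists in the current scripts; `should_retry_now` and `BACKOFF_STEPS` are hypothetical names, and the step values simply reuse the 3 / 10 / 30 minute example from the list.

```python
from datetime import datetime, timedelta

# Hypothetical backoff schedule: wait after the 1st, 2nd, and later failures.
BACKOFF_STEPS = [
    timedelta(minutes=3),
    timedelta(minutes=10),
    timedelta(minutes=30),
]


def should_retry_now(retry_count, last_retry_at, now):
    """Return True if enough time has passed since the last attempt."""
    if retry_count == 0 or last_retry_at is None:
        return True  # never attempted, nothing to back off from
    # Clamp so counts beyond the schedule reuse the longest wait.
    step = BACKOFF_STEPS[min(retry_count, len(BACKOFF_STEPS)) - 1]
    return now - last_retry_at >= step


now = datetime(2026, 1, 31, 12, 0)
too_soon = should_retry_now(2, now - timedelta(minutes=9), now)    # False
due_again = should_retry_now(2, now - timedelta(minutes=10), now)  # True
```

With a check like this the script would skip a campaign whose last attempt is too recent, instead of relying solely on the cron frequency.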
---
## Code Locations
**Quick reference for developers:**
| Component | File | Line Range |
|-----------|------|------------|
| Retry check logic | `a1_to_a2_box_uploader.py` | ~176-186 |
| Empty folder detection | `a1_to_a2_box_uploader.py` | ~193-231 |
| Success reset | `a1_to_a2_box_uploader.py` | ~354-356 |
| `get_a1_retry_status()` | `database.py` | ~522-558 |
| `increment_a1_retry()` | `database.py` | ~560-620 |
| `reset_a1_retry()` | `database.py` | ~622-655 |
| Email templates | `notifier.py` | ~593-687 |
| Database migration | `migrations/003_add_a1_retry_tracking.sql` | All |
---
## Change Log
**January 31, 2026:**
- Initial implementation
- 3-attempt retry mechanism
- Permanent failure tracking
- Two new email templates
- This documentation created
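**April 28, 2026:**
- Empty folders treated as expected workflow: silent skip, one-time warning email at poll 20, no automatic permanent failure
- `mark_failed_at_max` flag on `increment_a1_retry()` preserves the 3-attempt behavior for genuine folder errors
**Future updates will be logged here.**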