| title | description | tags | source | created | updated |
|---|---|---|---|---|---|
| Caption Aligner Cursor Stall Cascade — bisect Fallback for VTT Alignment | When fuzzy text-match fails for one VTT cue, cursor stays stuck; all subsequent cues also miss. Fix: bisect.bisect_left time-based fallback | | daily/2026-05-08.md | 2026-05-08 | 2026-05-08 |
Caption Aligner Cursor Stall Cascade — bisect Fallback for VTT Alignment
The Failure Mode
A caption aligner matches VTT cue text to transcript words by walking through a word list with a cursor (from_idx). When a fuzzy-match fails for a single cue, the cursor stays at the miss position. Every subsequent cue starts its search window from the same stuck position — so they all miss too. One failure cascades into all remaining cues being unmatched.
Cue 1 → match at word 45 ✓ (cursor → 45)
Cue 2 → fuzzy miss ✗ (cursor stays at 45)
Cue 3 → search starts at 45, misses ✗ (cursor still 45)
Cue 4 → search starts at 45, misses ✗ (cursor still 45)
... all remaining cues miss
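The stall can be reproduced with a small simulation. This is only a sketch: an exact-phrase search stands in for the real fuzzy matcher, and `find_in_window`, `align_without_fallback`, and the toy word list are hypothetical names and data, not part of the aligner itself.

```python
def find_in_window(cue_words, words, from_idx, window):
    """Return the index where cue_words starts inside the window, or None.

    Exact-phrase search used as a stand-in for a fuzzy matcher (assumption).
    """
    end = min(len(words), from_idx + window)
    for i in range(from_idx, end - len(cue_words) + 1):
        if words[i:i + len(cue_words)] == cue_words:
            return i
    return None


def align_without_fallback(cues, words, window):
    """Walk the cues with a cursor that only moves on a successful match."""
    cursor = 0
    results = []
    for cue in cues:
        hit = find_in_window(cue, words, cursor, window)
        if hit is not None:
            cursor = hit  # cursor advances only when a match is found
        results.append(hit)
    return results


words = "a b c d e f g h i j k l".split()

# The second cue is garbled ("X" instead of "e"), so it cannot match.
garbled_cues = [["a", "b", "c"], ["d", "X", "f"], ["g", "h", "i"], ["j", "k", "l"]]
clean_cues = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]

stalled = align_without_fallback(garbled_cues, words, window=6)
healthy = align_without_fallback(clean_cues, words, window=6)
```

With the garbled cue the cursor never leaves index 0, so the later cues fall outside the six-word window even though their text is intact. With clean cues every cue is found.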
The Fix
Add a time-based fallback using bisect.bisect_left on the transcript word start times. When the fuzzy match fails, advance the cursor to the first word at or after the current cursor position whose start time is not earlier than the cue's start time, instead of leaving it stuck.
import bisect

# Pre-build a list of word start times (sorted, same order as word_list)
starts = [w.start for w in word_list]

def align_cue(cue, word_list, starts, from_idx):
    # 1. Try a fuzzy text match within the search window
    window = word_list[from_idx : from_idx + _SEARCH_WINDOW]
    match_idx = fuzzy_match(cue.text, window, min_ratio=_MIN_MATCH_RATIO)
    if match_idx is not None:
        return from_idx + match_idx  # advance cursor to the match

    # 2. Time-based fallback: first word at or after from_idx whose start
    #    time is >= the cue start time (lo=from_idx keeps the cursor from
    #    moving backward)
    fallback_idx = bisect.bisect_left(starts, cue.start_time, from_idx)
    # Clamp to the last word so the next cue still gets a valid cursor
    return min(fallback_idx, len(word_list) - 1)
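The index arithmetic of the fallback can be checked in isolation. The sketch below assumes a made-up list of start times and a hypothetical helper, `time_fallback`, that mirrors only step 2 above. It shows three properties of `bisect.bisect_left` with a `lo` argument: it returns the first index whose value is not less than the target (not necessarily the nearest value), it never returns an index below `lo`, and it can return `len(starts)`, which is why the clamp is needed.

```python
import bisect

# Hypothetical word start times in seconds, sorted ascending.
starts = [0.0, 0.4, 0.9, 1.5, 2.2, 3.0]


def time_fallback(starts, cue_start, from_idx):
    """Index of the first word at or after from_idx with start >= cue_start,
    clamped to the last valid index."""
    idx = bisect.bisect_left(starts, cue_start, from_idx)
    return min(idx, len(starts) - 1)


between_words = time_fallback(starts, 1.0, 0)   # 1.0 lies between 0.9 and 1.5
exact_hit = time_fallback(starts, 0.9, 0)       # 0.9 is present in the list
behind_cursor = time_fallback(starts, 0.5, 4)   # cue time is before the cursor
past_the_end = time_fallback(starts, 9.9, 0)    # cue starts after the last word
```

Note that a cue starting at 1.0 lands on the word at 1.5 (index 3) even though 0.9 is closer in time. If true nearest-word behaviour is wanted, the caller would have to compare the neighbours at `idx - 1` and `idx` itself.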
Tuning Parameters
After adding the bisect fallback, also widen the fuzzy search window and lower the minimum match ratio to reduce initial misses:
| Parameter | Before | After | Effect |
|---|---|---|---|
| _SEARCH_WINDOW | 60 words | 150 words | Reduces misses from minor timing drift |
| _MIN_MATCH_RATIO | 0.50 | 0.35 | Tolerates partial cue text, filler words |
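To get a feel for what the two thresholds mean, here is a rough illustration. It assumes a token-level `difflib.SequenceMatcher` ratio as the similarity score; the aligner's actual `fuzzy_match` is not shown in this note and may score differently. `token_ratio` and `reachable` are hypothetical helpers.

```python
from difflib import SequenceMatcher


def token_ratio(cue_text, transcript_tokens):
    """Similarity in [0, 1] between the cue's tokens and a transcript slice."""
    cue_tokens = cue_text.lower().split()
    return SequenceMatcher(None, cue_tokens, transcript_tokens).ratio()


def reachable(match_offset, window):
    """True if a match starting match_offset words past the cursor is inside the window."""
    return match_offset < window


# A cue padded with filler words, compared against clean transcript tokens.
transcript_slice = ["the", "quick", "brown", "fox", "jumps", "over"]
ratio = token_ratio("um the quick uh", transcript_slice)

passes_old = ratio >= 0.50  # previous _MIN_MATCH_RATIO
passes_new = ratio >= 0.35  # relaxed _MIN_MATCH_RATIO

# A cue whose words sit 100 positions past the cursor.
in_old_window = reachable(100, 60)
in_new_window = reachable(100, 150)
```

In this toy case two of the cue's four tokens match, giving a ratio of 0.4, so the cue is rejected at 0.50 and accepted at 0.35. The trade-off is that a lower ratio also admits more false positives, which is worth watching for when tuning.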
Metrics to Add
Log per-run stats so you can track alignment quality over time:
logger.info(
    "Caption alignment complete: %d text-matched, %d time-fallback, %d unmatched",
    text_match_count, time_fallback_count, unmatched_count,
)
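One way the three counters could be collected is sketched below. Several things here are assumptions rather than the aligner's real behaviour: `fuzzy_find` is a `difflib`-based stand-in matcher, cues are plain `(start_time, text)` tuples, the logger name is made up, and a cue is counted as unmatched only when no transcript word starts at or after its start time.

```python
import bisect
import logging
from collections import Counter
from difflib import SequenceMatcher

logger = logging.getLogger("caption_aligner")  # hypothetical logger name


def fuzzy_find(cue_tokens, tokens, from_idx, window, min_ratio):
    """Stand-in matcher (assumption): best-scoring start index in the window, or None."""
    best_idx, best_ratio = None, 0.0
    end = min(len(tokens), from_idx + window)
    for i in range(from_idx, end):
        candidate = tokens[i:i + len(cue_tokens)]
        ratio = SequenceMatcher(None, cue_tokens, candidate).ratio()
        if ratio >= min_ratio and ratio > best_ratio:
            best_idx, best_ratio = i, ratio
    return best_idx


def align_all(cues, tokens, starts, window=150, min_ratio=0.35):
    """Align (start_time, text) cues and tally how each one was resolved."""
    stats = Counter()
    cursor = 0
    indices = []
    for cue_start, cue_text in cues:
        hit = fuzzy_find(cue_text.lower().split(), tokens, cursor, window, min_ratio)
        if hit is not None:
            stats["text_matched"] += 1
        else:
            idx = bisect.bisect_left(starts, cue_start, cursor)
            if idx < len(starts):
                hit = idx
                stats["time_fallback"] += 1
            else:
                # Assumption: no word at or after the cue start means unmatched.
                stats["unmatched"] += 1
        if hit is not None:
            cursor = hit  # the cursor moves on either kind of success
        indices.append(hit)
    logger.info(
        "Caption alignment complete: %d text-matched, %d time-fallback, %d unmatched",
        stats["text_matched"], stats["time_fallback"], stats["unmatched"],
    )
    return indices, stats


tokens = "the quick brown fox jumps over the lazy dog again".split()
starts = [i * 0.5 for i in range(len(tokens))]  # one word every 0.5 s (toy data)
cues = [(0.0, "the quick brown"), (1.5, "zzz yyy xxx"),
        (3.0, "the lazy dog"), (9.0, "qqq www")]
indices, stats = align_all(cues, tokens, starts)
```

A rising time-fallback share across runs would suggest the text matcher is degrading (for example after a transcript format change), even while the overall output still looks aligned.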
Related
- wiki/tech-patterns/redis-celery-worker-queue — pipeline this aligner runs inside
- wiki/tech-patterns/vtt-descriptive-transcript-regeneration — downstream VTT consumer