obsidian/wiki/concepts/caption-aligner-cursor-stall-cascade.md
2026-05-10 21:21:13 +01:00

title: Caption Aligner Cursor Stall Cascade: bisect Fallback for VTT Alignment
description: When fuzzy text-match fails for one VTT cue, cursor stays stuck; all subsequent cues also miss. Fix: bisect.bisect_left time-based fallback
tags: python, vtt, captions, alignment, bisect, video-accessibility
source: daily/2026-05-08.md
created: 2026-05-08
updated: 2026-05-08

Caption Aligner Cursor Stall Cascade: bisect Fallback for VTT Alignment

The Failure Mode

A caption aligner matches VTT cue text to transcript words by walking a word list with a cursor (from_idx). Each cue is searched for only within a bounded window that starts at the cursor. When the fuzzy match fails for a single cue, the cursor stays where it was. Every subsequent cue then starts its search from the same stuck position, while its actual text lies further and further along the transcript and eventually falls outside the window, so those cues miss too. One failure cascades into all remaining cues being unmatched.

Cue 1 → match at word 45 ✓  (cursor → 45)
Cue 2 → fuzzy miss ✗         (cursor stays at 45)
Cue 3 → search starts at 45, misses ✗  (cursor still 45)
Cue 4 → search starts at 45, misses ✗  (cursor still 45)
... all remaining cues miss
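
The cascade can be reproduced with a toy aligner. The sketch below is a hypothetical stand-in, not the real aligner: it uses exact word lookup in place of fuzzy matching, a deliberately tiny window, and invented names (`naive_align`, `WINDOW`).

```python
WINDOW = 4  # deliberately tiny so the stall shows up quickly (illustrative value)


def naive_align(cue_words, words, window=WINDOW):
    """Toy aligner without a fallback: the cursor only moves on a match."""
    cursor = 0
    results = []
    for cue in cue_words:
        segment = words[cursor:cursor + window]
        if cue in segment:
            cursor += segment.index(cue)
            results.append(cursor)
        else:
            results.append(None)  # miss: the cursor stays where it was
    return results


words = "a b c d e f g h i j k l".split()
# "X" is not in the transcript, so the second cue misses and the cursor stalls at "b".
cues = ["b", "X", "h", "k"]
print(naive_align(cues, words))
# → [1, None, None, None]
```

The cues "h" and "k" are both present in the transcript, but they sit beyond the four-word window that still starts at the stalled cursor, so they are reported as misses.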

The Fix

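Add a time-based fallback using bisect.bisect_left on the transcript word start times. When the fuzzy match fails, advance the cursor to the first word, at or after the current cursor, whose start time is not earlier than the cue's start time, instead of leaving it stuck. This relies on the word start times being sorted in ascending order.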

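import bisect

# Tuning constants (the "After" values from the Tuning Parameters table)
_SEARCH_WINDOW = 150
_MIN_MATCH_RATIO = 0.35

# Pre-build a list of word start times (sorted, same order as word_list)
starts = [w.start for w in word_list]

def align_cue(cue, word_list, starts, from_idx):
    """Return the new cursor position after processing one cue."""
    # 1. Try fuzzy text match within the search window
    #    (fuzzy_match is the aligner's existing helper; it returns an offset into the window or None)
    window = word_list[from_idx : from_idx + _SEARCH_WINDOW]
    match_idx = fuzzy_match(cue.text, window, min_ratio=_MIN_MATCH_RATIO)

    if match_idx is not None:
        return from_idx + match_idx  # advance cursor to the match

    # 2. Time-based fallback: first word at or after from_idx that starts at or after the cue.
    #    Passing from_idx as the lower bound means the cursor never moves backwards.
    fallback_idx = bisect.bisect_left(starts, cue.start_time, from_idx)
    # Clamp to the last valid index so the next cue has a usable cursor
    return min(fallback_idx, len(word_list) - 1)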
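
To see the fallback unblock later cues, here is a minimal, self-contained sketch. Everything except the `bisect.bisect_left` call is an assumption made for illustration: the `Word` and `Cue` dataclasses, the `align_all` driver, the small demo window and ratio, and in particular `fuzzy_match`, which is a hypothetical stand-in built on `difflib.SequenceMatcher` and not the aligner's real matcher.

```python
import bisect
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    text: str
    start: float  # seconds


@dataclass
class Cue:
    text: str
    start_time: float  # seconds


def fuzzy_match(cue_text, window, min_ratio):
    """Hypothetical stand-in: offset in `window` of the best-matching word run, or None."""
    n = max(1, len(cue_text.split()))
    best_offset, best_ratio = None, 0.0
    for offset in range(max(1, len(window) - n + 1)):
        candidate = " ".join(w.text for w in window[offset:offset + n])
        ratio = SequenceMatcher(None, cue_text.lower(), candidate.lower()).ratio()
        if ratio >= min_ratio and ratio > best_ratio:
            best_offset, best_ratio = offset, ratio
    return best_offset


def align_all(cues, word_list, window=4, min_ratio=0.6):
    """Walk the cues in order; returns (cursor, how) per cue, how is 'text' or 'time'."""
    starts = [w.start for w in word_list]  # must be sorted ascending for bisect
    cursor, out = 0, []
    for cue in cues:
        offset = fuzzy_match(cue.text, word_list[cursor:cursor + window], min_ratio)
        if offset is not None:
            cursor, how = cursor + offset, "text"
        else:
            # Time-based fallback: never search left of the current cursor.
            idx = bisect.bisect_left(starts, cue.start_time, cursor)
            cursor, how = min(idx, len(word_list) - 1), "time"
        out.append((cursor, how))
    return out


names = ("alpha bravo charlie delta echo foxtrot golf hotel "
         "india juliet kilo lima mike november oscar papa").split()
words = [Word(text, float(i)) for i, text in enumerate(names)]  # one word per second
cues = [
    Cue("alpha bravo", 0.0),   # matches by text
    Cue("zzzz qqqq", 8.0),     # garbled cue: text match fails
    Cue("juliet kilo", 9.0),   # reachable only because the cursor jumped to word 8
]
print(align_all(cues, words))
# → [(0, 'text'), (8, 'time'), (9, 'text')]
```

Without the fallback branch, the third cue would be searched for in the four words starting at index 0 and would miss, even though its text appears verbatim at index 9.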

Tuning Parameters

After adding the bisect fallback, also widen the fuzzy search window and lower the minimum match ratio to reduce initial misses:

| Parameter | Before | After | Effect |
| --- | --- | --- | --- |
| _SEARCH_WINDOW | 60 words | 150 words | Reduces misses from minor timing drift |
| _MIN_MATCH_RATIO | 0.50 | 0.35 | Tolerates partial cue text, filler words |

Metrics to Add

Log per-run stats so you can track alignment quality over time:

logger.info(
    "Caption alignment complete: %d text-matched, %d time-fallback, %d unmatched",
    text_match_count, time_fallback_count, unmatched_count
)
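
One possible way to produce those three counts is sketched below. It assumes the alignment loop records a label per cue; the labels "text", "time", and "unmatched" and the helper name `summarize` are hypothetical, and what counts as "unmatched" once the time fallback exists is left to the aligner (for example, a cue with no usable timestamp).

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def summarize(outcomes):
    """Tally per-cue outcome labels and log one summary line for the run.

    `outcomes` is an iterable of the strings "text", "time", or "unmatched"
    (hypothetical labels recorded by the alignment loop).
    """
    counts = Counter(outcomes)  # missing labels count as 0
    logger.info(
        "Caption alignment complete: %d text-matched, %d time-fallback, %d unmatched",
        counts["text"], counts["time"], counts["unmatched"],
    )
    return counts


counts = summarize(["text", "time", "text", "unmatched"])
print(counts["text"], counts["time"], counts["unmatched"])
# → 2 1 1
```

A rising share of time-fallback cues from one run to the next is a hint that the text matcher is struggling, even when nothing is reported as unmatched.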