obsidian/wiki/concepts/caption-aligner-cursor-drift.md
2026-05-10 21:21:13 +01:00

aliases: caption-aligner, vtt-alignment, cursor-drift, fuzzy-match-drift
tags: python, captions, vtt, fuzzy-matching, video-accessibility, bisect
sources: daily/2026-05-08.md
created: 2026-05-08
updated: 2026-05-08

Caption Aligner Cursor Drift — Fuzzy Match Failure and bisect Fallback

When a fuzzy text-matching caption aligner fails to find a match for a cue, the search cursor stays at the last successful position. All subsequent cues then fall outside the search window, causing the entire tail of the VTT file to be misaligned or dropped.

The Problem

The aligner slides a from_idx cursor forward through a list of word timestamps as it matches each caption cue. When difflib.SequenceMatcher (or a similar matcher) finds no match above _MIN_MATCH_RATIO, nothing advances the cursor:

# Typical aligner inner loop (simplified)
from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx  # advance cursor
    # BUG: if match is None, from_idx stays — all future cues search the same stale window

When Gemini paraphrases or rewrites caption text (common in AI-generated transcripts), the ratio drops below the threshold and the cursor freezes.
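
The stall can be reproduced on toy data. The sketch below is a minimal, self-contained illustration, not the pipeline's actual code: the Word, Cue, and Match classes, the body of find_fuzzy_match (a word-level SequenceMatcher scan over same-length spans), the sample sentence, and the tiny window of 6 are all assumptions chosen so the effect shows up in four cues.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

_SEARCH_WINDOW = 6       # tiny on purpose so the stall shows up on toy data
_MIN_MATCH_RATIO = 0.5


@dataclass
class Word:
    text: str
    start: float
    end: float


@dataclass
class Cue:
    index: int
    text: str
    start_time: float
    end_time: float


@dataclass
class Match:
    start: float
    end: float
    end_idx: int


def find_fuzzy_match(text: str, words: List[Word], from_idx: int,
                     window: int) -> Optional[Match]:
    """Hypothetical matcher: best same-length word span inside the window."""
    target = text.lower().split()
    n = len(target)
    stop = min(from_idx + window, len(words))
    best_ratio, best_i = 0.0, -1
    for i in range(from_idx, stop - n + 1):
        span = [w.text.lower() for w in words[i:i + n]]
        ratio = SequenceMatcher(None, target, span).ratio()
        if ratio > best_ratio:
            best_ratio, best_i = ratio, i
    if best_i < 0 or best_ratio < _MIN_MATCH_RATIO:
        return None
    return Match(words[best_i].start, words[best_i + n - 1].end, best_i + n)


def align_without_fallback(cues: List[Cue], words: List[Word]) -> Tuple[int, List[int]]:
    """Same shape as the buggy loop: the cursor only moves on a successful match."""
    from_idx = 0
    failed = []
    for cue in cues:
        match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
        if match:
            cue.start_time, cue.end_time = match.start, match.end
            from_idx = match.end_idx
        else:
            failed.append(cue.index)  # cursor is left where it was
    return from_idx, failed


spoken = ("welcome to the show today we are going to talk about "
          "caption alignment and why it sometimes fails badly").split()
words = [Word(t, i * 0.5, i * 0.5 + 0.4) for i, t in enumerate(spoken)]

cues = [
    Cue(1, "welcome to the show", 0.0, 2.0),
    Cue(2, "this episode covers subtitle syncing", 2.0, 6.5),  # paraphrased
    Cue(3, "and why it sometimes", 6.5, 8.5),                   # verbatim
    Cue(4, "fails badly", 8.5, 9.5),                            # verbatim
]

final_idx, failed = align_without_fallback(cues, words)
print(final_idx, failed)  # 4 [2, 3, 4]
```

Cues 3 and 4 are word-for-word copies of the transcript, yet they fail because the cursor is still parked at index 4 and their words sit beyond the six-word window.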

Fix: bisect Time-Based Fallback

When fuzzy match fails, advance the cursor using the cue's start_time instead of staying frozen:

import bisect

# Build a list of word start times (parallel to words list)
word_starts = [w.start for w in words]

from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx
    else:
        # Time-based fallback: advance cursor to the first word starting at or after this cue's timestamp
        from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
        # cue keeps its original timestamp but cursor moves forward

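This keeps the cursor moving even when several consecutive cues fail to match, provided the cues' original timestamps roughly track the word timestamps. Because the lookup starts at from_idx, the cursor never moves backwards.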
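
To make the cursor arithmetic concrete, here is a small sketch of what bisect.bisect_left returns in the cases the fallback can hit. The advance_cursor wrapper and the sample start times are hypothetical; only the bisect_left call itself mirrors the loop above.

```python
import bisect

# Hypothetical word start times in seconds. bisect requires the list to be sorted.
word_starts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]


def advance_cursor(word_starts, cue_start, from_idx):
    """Index of the first word starting at or after cue_start, never below from_idx."""
    return bisect.bisect_left(word_starts, cue_start, from_idx)


print(advance_cursor(word_starts, 1.2, 0))  # 3: first word at or after 1.2 s starts at 1.5 s
print(advance_cursor(word_starts, 1.0, 0))  # 2: an exact hit lands on that word
print(advance_cursor(word_starts, 0.3, 4))  # 4: lo=from_idx stops the cursor moving backwards
print(advance_cursor(word_starts, 9.0, 0))  # 6: past the last word
```

The last case returns len(word_starts), so a cue whose timestamp lies beyond the final word leaves every later search with an empty window. Whether that needs a guard depends on how trustworthy the cue timestamps are.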

Parameter Tuning (Applied in This Session)

| Parameter | Before | After | Reason |
|---|---|---|---|
| _SEARCH_WINDOW | 60 | 150 | Gemini paraphrasing can shift word positions significantly |
| _MIN_MATCH_RATIO | 0.5 | 0.35 | AI-rewritten captions have lower similarity to transcript words |

Lowering _MIN_MATCH_RATIO increases the risk of false-positive matches. The bisect fallback acts as the safety valve: when a cue's ratio is too low to be reliable, the cursor advances by timestamp instead of relying on a weak match.
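
A quick way to see what the threshold change does is to score a few cue texts against the same transcript span. This sketch assumes a word-level comparison with difflib.SequenceMatcher; the real aligner may compare characters or normalise text differently, and the helper name word_ratio and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def word_ratio(cue_text: str, transcript_text: str) -> float:
    """Similarity of two texts compared as lower-cased word lists."""
    a = cue_text.lower().split()
    b = transcript_text.lower().split()
    return SequenceMatcher(None, a, b).ratio()


transcript = "now we open the config"
samples = {
    "verbatim": "now we open the config",
    "paraphrase": "let us load the config",
    "unrelated": "thanks for watching",
}

for label, cue_text in samples.items():
    r = word_ratio(cue_text, transcript)
    print(f"{label:10s} ratio={r:.2f} passes@0.5={r >= 0.5} passes@0.35={r >= 0.35}")
# verbatim   ratio=1.00 passes@0.5=True passes@0.35=True
# paraphrase ratio=0.40 passes@0.5=False passes@0.35=True
# unrelated  ratio=0.00 passes@0.5=False passes@0.35=False
```

The paraphrase shares only two of five words with the transcript, so it scores 2 * 2 / 10 = 0.4: rejected at 0.5, accepted at 0.35. The same arithmetic means any two short spans that happen to share a couple of common words can also clear 0.35, which is the false-positive risk.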

Diagnostic Pattern

# Log every failed match to identify drift early
if match is None:
    logger.warning(
        f"No match for cue {cue.index} '{cue.text[:40]}' "
        f"from_idx={from_idx}, window={_SEARCH_WINDOW}"
    )
    from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
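
Individual warnings are easy to miss in a long log, so it can help to track failure streaks as well. The following is one possible sketch, not something taken from the pipeline: the DriftMonitor class, its max_streak setting, and the logger name are all hypothetical.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("caption_aligner")  # hypothetical logger name


@dataclass
class DriftMonitor:
    """Counts match outcomes and flags long runs of consecutive failures."""
    max_streak: int = 5
    total: int = 0
    failed: int = 0
    streak: int = 0
    worst_streak: int = 0

    def record(self, matched: bool) -> bool:
        """Record one cue; return True when the failure streak reaches max_streak."""
        self.total += 1
        if matched:
            self.streak = 0
            return False
        self.failed += 1
        self.streak += 1
        self.worst_streak = max(self.worst_streak, self.streak)
        if self.streak == self.max_streak:
            logger.error("%d consecutive cues unmatched; alignment may be drifting",
                         self.streak)
            return True
        return False

    @property
    def failure_rate(self) -> float:
        return self.failed / self.total if self.total else 0.0


monitor = DriftMonitor(max_streak=3)
outcomes = [True, False, False, False, True]  # True means the cue matched
alerts = [monitor.record(ok) for ok in outcomes]
print(alerts)                # [False, False, False, True, False]
print(monitor.failure_rate)  # 0.6
print(monitor.worst_streak)  # 3
```

In the aligner loop this would be called as monitor.record(match is not None) once per cue. A high failure_rate with a short worst_streak suggests scattered paraphrasing, while a long streak points at drift.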

Sources

  • daily/2026-05-08.md: Discovered in the video-accessibility caption alignment pipeline. Gemini-generated captions paraphrased the spoken content, which pushed match ratios below the threshold; the cursor stuck and every cue after the first failure was misaligned.