| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Caption Aligner Cursor Drift — Fuzzy Match Failure and bisect Fallback | | | | 2026-05-08 | 2026-05-08 |
Caption Aligner Cursor Drift — Fuzzy Match Failure and bisect Fallback
When a fuzzy text-matching caption aligner fails to find a match for a cue, the search cursor stays at the last successful position. All subsequent cues then fall outside the search window, causing the entire tail of the VTT file to be misaligned or dropped.
The Problem
The aligner works by sliding a from_idx cursor forward through a list of word timestamps as it matches each caption cue. When difflib.SequenceMatcher (or similar) fails to find a match above _MIN_MATCH_RATIO, nothing advances the cursor:
```python
# Typical aligner inner loop (simplified)
from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx  # advance cursor
    # BUG: if match is None, from_idx stays put, so all future cues search the same stale window
```
When Gemini paraphrases or rewrites caption text (common in AI-generated transcripts), the ratio drops below the threshold and the cursor freezes.
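The loop above calls find_fuzzy_match without showing it. The following is a minimal sketch of one way such a helper could work, assuming a sliding comparison with difflib.SequenceMatcher over spans that have the same word count as the cue. The Word and Match types, their field names, and the scoring strategy are assumptions made for illustration, not the pipeline's actual implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

_SEARCH_WINDOW = 60     # words to scan ahead of the cursor
_MIN_MATCH_RATIO = 0.5  # minimum similarity needed to accept a match


@dataclass
class Word:             # hypothetical word-timestamp record
    text: str
    start: float
    end: float


@dataclass
class Match:            # hypothetical result type
    start: float
    end: float
    end_idx: int        # index of the first word after the matched span


def find_fuzzy_match(cue_text: str, words: List[Word], from_idx: int,
                     window: int = _SEARCH_WINDOW) -> Optional[Match]:
    """Return the best-scoring span inside words[from_idx:from_idx + window], or None."""
    cue_tokens = cue_text.lower().split()
    n = len(cue_tokens)
    if n == 0:
        return None
    target = " ".join(cue_tokens)
    best_ratio, best_i = 0.0, -1
    # Last index at which an n-word span still fits inside the window
    last_start = min(from_idx + window, len(words)) - n
    for i in range(from_idx, last_start + 1):
        candidate = " ".join(w.text.lower() for w in words[i:i + n])
        ratio = SequenceMatcher(None, target, candidate).ratio()
        if ratio > best_ratio:
            best_ratio, best_i = ratio, i
    if best_i < 0 or best_ratio < _MIN_MATCH_RATIO:
        return None
    span = words[best_i:best_i + n]
    return Match(span[0].start, span[-1].end, best_i + n)


# Toy data: nine words, half a second apart
words = [Word(t, i * 0.5, i * 0.5 + 0.4)
         for i, t in enumerate("the quick brown fox jumps over the lazy dog".split())]
hit = find_fuzzy_match("quick brown fox", words, from_idx=0)
miss = find_fuzzy_match("lazy dog", words, from_idx=0, window=3)  # cue lies outside the window
```

The second call shows the failure mode in miniature: the words are present in the transcript, but a cursor that is too far behind leaves them outside the window, so the helper returns None.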
Fix: bisect Time-Based Fallback
When fuzzy match fails, advance the cursor using the cue's start_time instead of staying frozen:
```python
import bisect

# Build a list of word start times (parallel to words list)
word_starts = [w.start for w in words]

from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx
    else:
        # Time-based fallback: move the cursor to the first word that starts at or
        # after this cue's start_time; lo=from_idx keeps it from moving backwards
        from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
        # cue keeps its original timestamp but cursor moves forward
```
This prevents cursor stall even when every match in a sequence fails, provided the cues' original start times are on the same timeline as the word timestamps.
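To make the cursor arithmetic concrete, here is a small sketch of the fallback step in isolation. The word start times and the fallback_cursor helper name are invented for illustration; only bisect.bisect_left and its lo argument come from the standard library.

```python
import bisect

# Hypothetical word start times in seconds (sorted, one per transcript word)
word_starts = [0.0, 0.4, 0.9, 1.5, 2.2, 3.0, 3.8, 4.5, 5.1, 6.0]


def fallback_cursor(word_starts, cue_start, from_idx):
    """Index of the first word starting at or after cue_start, never below from_idx."""
    return bisect.bisect_left(word_starts, cue_start, from_idx)


cursor = 2
after_jump = fallback_cursor(word_starts, 3.1, cursor)          # first start >= 3.1 is 3.8, index 6
after_earlier = fallback_cursor(word_starts, 1.0, after_jump)   # earlier timestamp: cursor stays at 6
after_end = fallback_cursor(word_starts, 99.0, after_earlier)   # past the last word: len(word_starts)
```

Note that the cursor can end up equal to len(word_starts) when a cue starts after the last word, so whatever matcher runs next has to tolerate an empty search window.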
Parameter Tuning (Applied in This Session)
| Parameter | Before | After | Reason |
|---|---|---|---|
| _SEARCH_WINDOW | 60 | 150 | Gemini paraphrasing can shift word positions significantly |
| _MIN_MATCH_RATIO | 0.5 | 0.35 | AI-rewritten captions have lower similarity to transcript words |
Lowering _MIN_MATCH_RATIO increases the risk of false-positive matches; the bisect fallback acts as a safety valve for cues whose ratio is still too low to be reliable.
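One way to get a feel for the threshold trade-off is to score a few cue variants against the same transcript span at both thresholds. The sketch below uses difflib.SequenceMatcher directly; the example sentences are made up, and the actual ratios in the pipeline will depend on how spans are chosen and normalized.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Invented examples: one transcript span and three candidate cue texts
transcript_span = "so what we are going to do today is look at the aligner"
cues = {
    "verbatim": "so what we are going to do today is look at the aligner",
    "paraphrased": "today we will examine the aligner",
    "unrelated": "thanks for watching and see you next time",
}

scores = {label: similarity(text, transcript_span) for label, text in cues.items()}
for label, ratio in scores.items():
    print(f"{label:12s} ratio={ratio:.2f} "
          f"passes@0.5={ratio >= 0.5} passes@0.35={ratio >= 0.35}")
```

Running this kind of comparison on real cue and transcript pairs, including pairs that should not match, gives a rough picture of how many false positives a lower threshold would admit.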
Diagnostic Pattern
```python
# Log every failed match to identify drift early
if match is None:
    logger.warning(
        f"No match for cue {cue.index} '{cue.text[:40]}' "
        f"from_idx={from_idx}, window={_SEARCH_WINDOW}"
    )
    from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
```
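Per-cue warnings can be complemented with a running count, since a long run of consecutive misses is a stronger drift signal than isolated ones. The following is a sketch of that idea only; the MissTracker class, the logger name, and the run_limit value are hypothetical and not part of the original pipeline.

```python
import logging

logger = logging.getLogger("caption_aligner")  # hypothetical logger name


class MissTracker:
    """Counts fuzzy-match misses and the longest run of consecutive misses."""

    def __init__(self, run_limit: int = 5):  # run_limit is an assumed value
        self.run_limit = run_limit
        self.total = 0
        self.current_run = 0
        self.longest_run = 0

    def record(self, matched: bool) -> None:
        if matched:
            self.current_run = 0  # a successful match ends the current run
            return
        self.total += 1
        self.current_run += 1
        self.longest_run = max(self.longest_run, self.current_run)
        if self.current_run == self.run_limit:
            # Warn once per run, at the moment the limit is reached
            logger.warning("%d consecutive cues without a match; possible drift",
                           self.current_run)


# Simulated match results for eight cues (True = fuzzy match found)
tracker = MissTracker(run_limit=3)
for matched in [True, False, False, True, False, False, False, False]:
    tracker.record(matched)
```

In the aligner loop, tracker.record(match is not None) would be called once per cue, and the totals could be logged after the last cue.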
Related Concepts
- wiki/concepts/native-track-blob-url — VTT blob URL for browser <track> elements
Sources
- daily/2026-05-08.md — Discovered in the video-accessibility caption alignment pipeline; Gemini-generated captions paraphrased content causing ratio drops; cursor stuck, all cues after first failure misaligned