---
title: "Caption Aligner Cursor Drift: Fuzzy Match Failure and bisect Fallback"
aliases: [caption-aligner, vtt-alignment, cursor-drift, fuzzy-match-drift]
tags: [python, captions, vtt, fuzzy-matching, video-accessibility, bisect]
sources:
  - "daily/2026-05-08.md"
created: 2026-05-08
updated: 2026-05-08
---

# Caption Aligner Cursor Drift: Fuzzy Match Failure and bisect Fallback

When a fuzzy text-matching caption aligner fails to match a cue, its search cursor stays at the position of the last successful match. As later cues move beyond the reach of the search window, they fail too, so the rest of the VTT file ends up misaligned or dropped.

## The Problem

The aligner slides a `from_idx` cursor forward through a list of word timestamps as it matches each caption cue. The cursor only advances on a successful match, so it stalls whenever `difflib.SequenceMatcher` (or a similar matcher) finds nothing above `_MIN_MATCH_RATIO`:

```python
# Typical aligner inner loop (simplified)
from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx  # advance cursor
    # BUG: if match is None, from_idx stays put, so every later cue
    # searches the same stale window
```

When Gemini paraphrases or rewrites the caption text (common in AI-generated transcripts), the similarity ratio drops below the threshold and the cursor freezes.
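
To make the ratio drop concrete, here is a minimal sketch that compares a verbatim cue and a paraphrased cue against the same transcript slice. It assumes the matcher compares lowercase word lists; the `word_ratio` helper and the sample sentences are invented for illustration and are not taken from the pipeline.

```python
from difflib import SequenceMatcher


def word_ratio(cue_text: str, transcript_words: list[str]) -> float:
    """Similarity between a cue and a transcript slice, compared word by word."""
    cue_words = cue_text.lower().split()
    # autojunk=False turns off difflib's popularity heuristic for long inputs
    return SequenceMatcher(None, cue_words, transcript_words, autojunk=False).ratio()


# Hypothetical transcript slice (what the speech-to-text words say)
transcript = "so today we are going to look at how the aligner works".split()

verbatim = "so today we are going to look at how the aligner works"
paraphrased = "in this video we examine the way the aligner functions"

verbatim_ratio = word_ratio(verbatim, transcript)        # identical word lists
paraphrased_ratio = word_ratio(paraphrased, transcript)  # only a few shared words

print(f"verbatim={verbatim_ratio:.2f} paraphrased={paraphrased_ratio:.2f}")
```

The paraphrased cue shares only three words with the transcript, so its ratio lands below both the old and the new `_MIN_MATCH_RATIO`, and a cue like this would leave the cursor where it was.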

## Fix: bisect Time-Based Fallback

When the fuzzy match fails, advance the cursor using the cue's own `start_time` instead of leaving it frozen:

```python
import bisect

# Word start times, parallel to the words list (must be sorted ascending)
word_starts = [w.start for w in words]

from_idx = 0
for cue in vtt_cues:
    match = find_fuzzy_match(cue.text, words, from_idx, window=_SEARCH_WINDOW)
    if match:
        cue.start_time = match.start
        cue.end_time = match.end
        from_idx = match.end_idx
    else:
        # Time-based fallback: move the cursor to the first word that starts
        # at or after this cue's timestamp. Passing from_idx as `lo` keeps the
        # cursor from moving backward.
        from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
        # The cue keeps its original timestamps; only the cursor moves.
```
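
This prevents the cursor from stalling even when several consecutive matches fail: each unmatched cue pulls the cursor up to its own start time. It assumes the cue's original `start_time` is on roughly the same timeline as the word timestamps.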
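
The loop above depends on names defined elsewhere in the pipeline. The following self-contained sketch reproduces the stall and the recovery on a toy transcript. Everything beyond the loop itself is an assumption made for illustration: the `Word`, `Cue`, and `Match` dataclasses, the stand-in `find_fuzzy_match`, the `use_fallback` switch, and a window of 8 words (far smaller than the real setting, so the stall shows up with only sixteen words).

```python
import bisect
from dataclasses import dataclass
from difflib import SequenceMatcher

_MIN_MATCH_RATIO = 0.35


@dataclass
class Word:   # hypothetical transcript word with timestamps
    text: str
    start: float
    end: float


@dataclass
class Cue:    # hypothetical caption cue
    text: str
    start_time: float
    end_time: float


@dataclass
class Match:  # hypothetical result type for find_fuzzy_match
    start: float
    end: float
    end_idx: int


def find_fuzzy_match(text, words, from_idx, window):
    """Stand-in matcher: best same-length word slice inside the window."""
    target = text.lower().split()
    n = len(target)
    if n == 0:
        return None
    best_ratio, best_pos = 0.0, None
    last_pos = min(len(words), from_idx + window) - n
    for pos in range(from_idx, last_pos + 1):
        candidate = [w.text.lower() for w in words[pos:pos + n]]
        ratio = SequenceMatcher(None, target, candidate, autojunk=False).ratio()
        if ratio > best_ratio:
            best_ratio, best_pos = ratio, pos
    if best_pos is None or best_ratio < _MIN_MATCH_RATIO:
        return None
    return Match(words[best_pos].start, words[best_pos + n - 1].end, best_pos + n)


def align(cues, words, window, use_fallback):
    word_starts = [w.start for w in words]  # sorted, parallel to words
    from_idx = 0
    for cue in cues:
        match = find_fuzzy_match(cue.text, words, from_idx, window)
        if match:
            cue.start_time, cue.end_time = match.start, match.end
            from_idx = match.end_idx
        elif use_fallback:
            # First word starting at or after the cue; never before from_idx
            from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
    return cues


# Sixteen words, one every 0.5 s
names = ("alpha bravo charlie delta echo foxtrot golf hotel "
         "india juliet kilo lima mike november oscar papa").split()
words = [Word(t, i * 0.5, (i + 1) * 0.5) for i, t in enumerate(names)]


def make_cues():
    return [
        Cue("alpha bravo charlie delta", 0.0, 2.0),          # verbatim
        Cue("something entirely different here", 2.0, 4.0),  # paraphrased
        Cue("another rewritten caption line", 4.0, 6.0),     # paraphrased
        Cue("mike november oscar papa", 6.3, 8.3),           # verbatim, 0.3 s late
    ]


stalled = align(make_cues(), words, window=8, use_fallback=False)
recovered = align(make_cues(), words, window=8, use_fallback=True)

# Without the fallback the last cue keeps 6.3; with it, the cue is re-timed to 6.0
print(stalled[-1].start_time, recovered[-1].start_time)
```

In this toy run the two paraphrased cues leave the cursor at word 4 when the fallback is off, so the last cue's words (12 to 15) sit outside the 8-word window. With the fallback on, the second failed cue moves the cursor to word 8 and the last cue is found again.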

## Parameter Tuning (Applied in This Session)

| Parameter | Before | After | Reason |
|---|---|---|---|
| `_SEARCH_WINDOW` | 60 | 150 | Gemini paraphrasing can shift word positions significantly |
| `_MIN_MATCH_RATIO` | 0.5 | 0.35 | AI-rewritten captions have lower similarity to transcript words |

Lowering `_MIN_MATCH_RATIO` increases the risk of false-positive matches; the bisect fallback acts as the safety valve for cues whose best ratio is still too low to be trusted.
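
The trade-off behind the lower threshold can be sketched with two invented sample pairs: a paraphrase of the correct passage and an unrelated passage that merely shares filler words. This assumes word-level comparison with `difflib.SequenceMatcher`; the `word_ratio` helper and the sentences are hypothetical and only illustrate the effect.

```python
from difflib import SequenceMatcher


def word_ratio(a: str, b: str) -> float:
    """Word-level similarity between two strings."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split(),
                           autojunk=False).ratio()


# Invented sample pairs: (cue text, transcript slice it is compared against)
samples = {
    "paraphrase of the right passage": (
        "this part explains the search window",
        "now let us talk about the search window",
    ),
    "unrelated passage sharing filler words": (
        "and that is the end for today folks",
        "and this is the start of our setup",
    ),
}

ratios = {label: word_ratio(cue, ref) for label, (cue, ref) in samples.items()}

# Compare what each threshold from the table would accept
for threshold in (0.5, 0.35):
    accepted = [label for label, r in ratios.items() if r >= threshold]
    print(f"threshold={threshold}: accepted={accepted}")
```

Both sample pairs score between the two thresholds, so 0.5 rejects both and 0.35 accepts both, including the unrelated one. That is the false-positive risk the lower setting trades for better recall on paraphrased cues.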

## Diagnostic Pattern

```python
# Log every failed match to identify drift early
if match is None:
    logger.warning(
        f"No match for cue {cue.index} '{cue.text[:40]}' "
        f"from_idx={from_idx}, window={_SEARCH_WINDOW}"
    )
    from_idx = bisect.bisect_left(word_starts, cue.start_time, from_idx)
```

## Related Concepts

- [[wiki/concepts/native-track-blob-url]]: VTT blob URL for browser `<track>` elements

## Sources

- [[daily/2026-05-08.md]]: Discovered in the video-accessibility caption alignment pipeline. Gemini-generated captions paraphrased the content, which dropped the match ratios; the cursor got stuck and every cue after the first failure was misaligned.