vault backup: 2026-04-29 20:39:50

2026-04-29 20:39:50 +01:00 · 2026-04-29 20:39:50 +01:00 · 2f10aa4afa
commit 2f10aa4afa
parent 5302dd187d
2 changed files with 56 additions and 0 deletions
--- a/Daily/2026-04-29.md
+++ b/Daily/2026-04-29.md
@ -506,3 +506,6 @@ tags: [daily]
 - 20:28 (3min) | `video-accessibility`
  - **Asked:** Check the 400 error on optical server for the PATCH /vtt endpoint without modifying code
  - **Done:** Identified security validation failure in the validate_json_payload function on the optical server
+- 20:38 (7min) | `video-accessibility`
+  - **Asked:** Check why the server returns 400 error when patching VTT files on the optical server.
+  - **Done:** Found that the middleware's XSS validation regex matches "script" substring in the "audio_description_vtt" field name, causing false-positive rejection.
--- a/wiki/tech-patterns/xss-validation-false-positive-word-boundary.md
+++ b/wiki/tech-patterns/xss-validation-false-positive-word-boundary.md
@ -0,0 +1,53 @@
+---
+tags: [tech-patterns, auto-generated]
+source: video-accessibility
+created: 2026-04-29
+---
+
+# Fixing XSS Validation False Positives in JSON Field Names
+
+## When to use
+When a content security middleware rejects legitimate JSON payloads with "400 Bad Request" errors, and the actual field names or values contain substrings matching XSS patterns (like "script", "onclick") but are not actual security threats. This pattern explains how to diagnose and fix overly aggressive regex validation.
+
+## Prerequisites
+- A web application with JSON request/response validation middleware
+- XSS protection patterns implemented via regex matching
+- Access to server logs and middleware source code
+- Understanding of regex word boundaries and Python string validation
+
+## Steps
+1. Reproduce the failing request and capture the exact payload that returns 400
+2. Examine the middleware validation code, specifically the regex patterns used for XSS detection
+3. Check if the regex pattern uses word boundaries (`\b...\b`) or is unanchored
+4. Identify which field name or value contains a substring matching the pattern
+5. Update the regex to use word boundaries: change `r"(script|javascript|vbscript|onload|onerror|onclick)"` to `r"\b(script|javascript|vbscript|onerror|onclick)\b"`
+6. Alternatively, separate field name validation from value validation to apply stricter rules only to content, not identifiers
+7. Redeploy and retest with the problematic payload
+
+## Key Configuration
+```python
+# Before (too aggressive):
+DANGEROUS_PATTERN = r"(script|javascript|vbscript|onload|onerror|onclick)"
+
+# After (with word boundaries):
+DANGEROUS_PATTERN = r"\b(script|javascript|vbscript|onload|onerror|onclick)\b"
+
+# Better: distinguish between field names and values
+def validate_json_values(obj):
+    for key, value in obj.items():
+        # Apply relaxed validation to keys (field names)
+        if isinstance(key, str):
+            validate_string_content(key, strict=False)
+        # Apply strict XSS validation to values only
+        if isinstance(value, str):
+            validate_string_content(value, strict=True)
+```
+
+## Gotchas
+- Regex patterns without word boundaries will match substrings within legitimate identifiers (e.g., "description", "prescription", "subscription")
+- Validating both JSON keys and values with identical strict patterns is too broad—field names should allow more flexibility than user-facing content
+- Container restart may be required if old code version with aggressive regex was deployed; verify container image timestamp matches current code
+- The pattern `script` inside `description` will always match without word boundaries, causing false positives on VTT (video text track) and similar subtitle fields
+
+## Source
+Project: video-accessibility