banner_studio/ANIMATION_V1.md
Simeon Schecter e51686a3d4 ANIMATION_V1: design spec for the V1 animation system
Specifies the V1 animation system end-to-end. Authored after two
Deep Research passes (preserved as ANIMATION_V1_RESEARCH.md and
ANIMATION_V1_DESIGN_DECISIONS.md for provenance).

ANIMATION_V1.md covers:
- Hard constraints: Chrome Heavy Ad Intervention (4MB / 15s burst /
  60s total CPU), composite-only animation, 150KB initial-load cap,
  GSAP via s0.2mdn.net CDN, free-tier only.
- Custom JSON schema (not Lottie) — block-based timeline, absolute
  start times, preset references only, no inline keyframes. Designed
  for AI authoring and human-readable diffs.
- 25-preset library across entrance / exit / emphasis / typography /
  mask / list categories. Each preset specifies start state, end
  state, default ease, default duration, and split/mask requirements.
- 9-category easing matrix using GSAP stock eases; bounce, slow,
  rough, and circ excluded from the V1 surface.
- Mask system: mask is a property on the masked layer (not a
  standalone layer). clip-path mandatory over interactive elements
  to prevent ghost-click failures. Konva ↔ HTML parity table.
- Per-character animation: SplitType at render time, Dropflow at
  spec time, automated aria-label / aria-hidden contract, 150-node
  ceiling enforced by QA gate.
- Animated bounding-box math: discrete sampling at 30 fps,
  unionBoundingBox() called from asset selection, render worker,
  and QA gate. Adds required_source_size to ResolvedLayer.
- 12 QA gates (G1-G12) covering schema, performance, asset,
  accessibility, and parity.

ARCHITECTURE.md updates:
- Forward-notes section at the top pointing to ANIMATION_V1.md and
  RESOLVED_FEED.md, matching the existing Part 7 forward-note style.
- Inline forward note in the Part 3 animation stack block.
- Old content preserved as historical record.

Decisions baked in (resolved during draft):
- Loops are global (max 3), not per-block. Per-block loops invite
  nested-infinite-loop bugs in AI-generated specs.
- Block triggers are time-anchored only. Event/interaction triggers
  wait for V2 rich media.
- blur_in and shake_horizontal dropped from the 27-preset research
  list. Blur is a video pattern; shake reads as a rendering error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-18 20:12:58 -04:00


ANIMATION_V1.md

Status: V1 design specification. Not implemented in the vertical slice. The slice ships three preset animations (fade_in, hold, fade_out) as a forward-pointer to this system. See SLICE_DEVIATIONS.md for the deltas.

This document supersedes the animation discussion in ARCHITECTURE.md.

The animation system is the product. Without it this is a static-image templating tool, and there is no shortage of those. Every architectural decision in this document is downstream of one premise: a motion-design-led creative team must be able to adopt this platform without feeling they have accepted reduced expressive range.

This document is prescriptive. It specifies the JSON schema, the preset library, the easing matrix, the mask system, the per-character animation contract, the bounding-box math, and the QA gates that enforce all of the above. It is meant to be implementable as written.


1. Goals and Constraints

Goals

  1. Designer-authored, AI-scaled. Designers author templates with named animation presets per layer. The AI orchestrator composes presets into a per-banner timeline using a brand-voice-aware rationale, and never invents new motion.
  2. Two runtimes, one contract. A single JSON schema drives both the Konva browser preview and the Playwright-headless final render. What the reviewer sees is what ships.
  3. Production-grade motion vocabulary. Translation, scale, rotation, opacity. Masks (geometric and image). Per-character text animation. All four transforms with industry-idiomatic easing.
  4. Composable, not procedural. A timeline is an ordered list of blocks. Each block applies one preset to one layer. The AI's job is to choose presets and order blocks — never to emit raw keyframes.

Hard constraints (these define the rails)

These come from the Round 1 research pass and are non-negotiable. Every primitive in this document is designed to fit inside them.

  • Chrome Heavy Ad Intervention. An ad is unloaded if it exceeds:
    • 4 MB cumulative network bandwidth
    • 15 s of main-thread CPU within any rolling 30 s window
    • 60 s of total main-thread CPU over the page lifetime
  • Composite properties only. Animations use transform (translate, scale, rotate) and opacity exclusively. Animating width, height, top, left, margin, or letter-spacing forces layout recalculation and burns the CPU budget. The schema cannot express these.
  • Duration. 15 s default, 30 s hard ceiling (CM360). Loops global, 3× maximum, must fit inside the duration ceiling.
  • Initial load weight. 150 KB zip cap (CM360 / IAB LEAN). Polite load may add up to 2.2 MB.
  • CDN-hosted runtime. GSAP loads from s0.2mdn.net/ads/studio/cached_libs/ and does not count against the 150 KB cap.
  • GSAP free tier only. No Club GreenSock plugins. SplitText is replaced by the MIT-licensed SplitType library.

Out of scope for V1

  • Keyframe authoring UI. Designers compose presets in template code; no visual timeline editor.
  • Custom easing curves. The GSAP stock catalog is the V1 surface.
  • Lottie import or export. We do not ingest After Effects work.
  • Interaction triggers (hover, click-to-play state changes). Timelines are time-anchored only.
  • Audio, video, expandable, or rich-media formats.

2. The Schema

2.1 Why a custom schema (and not Lottie)

Lottie is the industry standard for vector animation interchange. It is also the wrong fit for V1. The Round 2 research pass landed on a custom schema for three reasons, all load-bearing:

  1. AI authoring. Lottie's bezier-tangent representation (i.x, o.y floating-point arrays) forces the orchestrator to emit cubic-bezier math instead of semantic strings like "power2.out". This is the exact failure mode that hallucinations love.
  2. Diff legibility. A version diff that reads "ease": "power2.out", "preset": "scale_pop" is auditable. A diff that reads "i": {"x": [0.25], "y": [0.1]} is not. The version service depends on humans being able to read diffs.
  3. Composite-only enforcement. Lottie intrinsically mixes composite (p, s, r, o) with non-composite (fc, sw, vector paths, mattes, expressions). Stripping the non-composite half at export time is a translation layer we would have to write, maintain, and defend against drift.

The schema described below is inspired by Lottie's declarative timeline shape, but every field maps 1:1 to a GSAP call. The orchestrator produces JSON; the runtime hands it to GSAP without transformation.

2.2 Top-level shape

{
  "version": "1.0.0",
  "meta": {
    "duration": 15.0,          // seconds, ≤ 30
    "loops": 1,                // 1 = play once, 2 = play+repeat, 3 = play+repeat+repeat. Max 3.
    "fps_target": 60
  },
  "blocks": [
    // one entry per animation event, ordered by `start`
  ]
}

blocks is ordered, time-anchored, and never overlaps in semantics: each block describes one preset applied to one layer over one interval. Multiple blocks may run simultaneously (different layers animating in parallel is the common case), but the timeline reads top-to-bottom in start order.

2.3 Block shape

{
  "id": "block_hero_in",                // stable id for diffs and overrides
  "layer_id": "hero_image",             // references BannerSpec layer
  "preset": "scale_up_fade",            // see Section 3
  "start": 0.0,                         // seconds, relative to timeline 0
  "duration": 0.8,                      // seconds, overrides preset default
  "ease": "power2.out",                 // overrides preset default. Optional.
  "stagger": null,                      // see Section 6 for per-character blocks
  "mask_ref": null                      // see Section 5 for animated masks
}

Three rules govern the block:

  • preset is required. No anonymous keyframes. If a designer needs a motion that isn't in the preset library, the answer is to add it to the library, not to inline keyframes.
  • duration and ease are optional overrides. The preset's defaults are the right answer 95% of the time; the override is for the 5% where a designer has a specific reason.
  • start is absolute, not relative. Block N+1 does not implicitly begin when block N ends. This sounds verbose, but it makes the diff trivial: editing block 3's duration doesn't ripple into block 4's start time.
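The override rules above can be sketched as a small resolution step. This is a minimal sketch, not the specified implementation: the type names, the `PRESET_DEFAULTS` excerpt, and `resolveBlock` are illustrative assumptions, with the default values copied from Section 3.1.

```typescript
// Hypothetical types mirroring the block shape in 2.3.
interface AnimationBlock {
  id: string;
  layer_id: string;
  preset: string;
  start: number;            // seconds, absolute
  duration?: number;        // optional override
  ease?: string;            // optional override
  stagger?: number | null;
  mask_ref?: string | null;
}

interface PresetDefaults {
  duration: number;
  ease: string;
}

// Two-entry excerpt of the Section 3 library, enough to run the example.
const PRESET_DEFAULTS: Record<string, PresetDefaults> = {
  scale_up_fade: { duration: 0.8, ease: "power2.out" },
  scale_pop: { duration: 0.6, ease: "back.out(1.7)" },
};

interface ResolvedBlock extends AnimationBlock {
  duration: number;
  ease: string;
  end: number;              // start + duration, derived for validation
}

function resolveBlock(block: AnimationBlock): ResolvedBlock {
  const preset = PRESET_DEFAULTS[block.preset];
  if (preset === undefined) {
    // Rule 1: preset is required and must exist in the library.
    throw new Error(`Unknown preset "${block.preset}" in block "${block.id}"`);
  }
  // Rule 2: duration and ease fall back to the preset defaults.
  const duration = block.duration ?? preset.duration;
  const ease = block.ease ?? preset.ease;
  // Rule 3: start is absolute, so the end time depends on this block alone.
  return { ...block, duration, ease, end: block.start + duration };
}

const resolvedCta = resolveBlock({
  id: "cta_in",
  layer_id: "cta",
  preset: "scale_pop",
  start: 1.4,
});
```

Because `end` is derived from the block's own fields, changing one block's duration never alters another block's resolved values.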

2.4 Worked example

A 300×250 banner with hero image, headline, and CTA. Hero scales in, headline reveals per-character, CTA pops:

{
  "version": "1.0.0",
  "meta": { "duration": 4.0, "loops": 1, "fps_target": 60 },
  "blocks": [
    {
      "id": "hero_in",
      "layer_id": "hero",
      "preset": "scale_up_fade",
      "start": 0.0,
      "duration": 0.8
    },
    {
      "id": "headline_in",
      "layer_id": "headline",
      "preset": "fade_up_chars",
      "start": 0.5,
      "duration": 0.6,
      "stagger": 0.02
    },
    {
      "id": "cta_in",
      "layer_id": "cta",
      "preset": "scale_pop",
      "start": 1.4,
      "duration": 0.6
    },
    {
      "id": "cta_pulse",
      "layer_id": "cta",
      "preset": "pulse_gentle",
      "start": 2.2,
      "duration": 1.5
    }
  ]
}

A reviewer reading this diff knows exactly what changed. The orchestrator emitting this JSON has four lookups and three integer-ish numbers to produce per block. The runtime translates each block to one gsap.fromTo() call against the preset's defined start and end states.
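The translation from one block to one fromTo call might look like the sketch below. It builds a plain descriptor rather than calling GSAP, so it runs without the library. `PRESET_STATES`, `compileBlock`, and the assumption that each layer's DOM id equals its `layer_id` are all hypothetical; the start and end states are taken from Section 3.1.

```typescript
// Assumption: tween variables use GSAP's property names (scale, opacity).
type TweenVars = Record<string, number | string>;

interface PresetStates {
  from: TweenVars;
  to: TweenVars;
  duration: number;
  ease: string;
}

// Excerpt of Section 3.1; only the presets needed for this example.
const PRESET_STATES: Record<string, PresetStates> = {
  scale_up_fade: {
    from: { scale: 0.8, opacity: 0 },
    to: { scale: 1, opacity: 1 },
    duration: 0.8,
    ease: "power2.out",
  },
  scale_pop: {
    from: { scale: 0.5, opacity: 0 },
    to: { scale: 1, opacity: 1 },
    duration: 0.6,
    ease: "back.out(1.7)",
  },
};

interface TimelineBlock {
  id: string;
  layer_id: string;
  preset: string;
  start: number;
  duration?: number;
  ease?: string;
}

interface FromToCall {
  target: string;    // CSS selector for the layer element
  from: TweenVars;   // preset start state
  to: TweenVars;     // preset end state plus duration and ease
  position: number;  // absolute timeline position in seconds
}

function compileBlock(block: TimelineBlock): FromToCall {
  const preset = PRESET_STATES[block.preset];
  if (preset === undefined) {
    throw new Error(`Unknown preset "${block.preset}"`);
  }
  return {
    target: `#${block.layer_id}`, // assumption: DOM id equals layer_id
    from: { ...preset.from },
    to: {
      ...preset.to,
      duration: block.duration ?? preset.duration,
      ease: block.ease ?? preset.ease,
    },
    position: block.start,
  };
}

const heroCall = compileBlock({
  id: "hero_in",
  layer_id: "hero",
  preset: "scale_up_fade",
  start: 0.0,
  duration: 0.8,
});
// With GSAP loaded, the descriptor would be applied to a timeline as:
//   timeline.fromTo(heroCall.target, heroCall.from, heroCall.to, heroCall.position);
```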

2.5 Loops

meta.loops is a global counter. loops: 3 plays the entire timeline three times; the cumulative duration must fit inside the IAB / CM360 ceiling (meta.loops × meta.duration ≤ 30 s, so a three-loop timeline can run at most 10 s per pass). Per-block loops are not supported in V1; they invite nested-infinite-loop bugs that an AI generator will ship with surprising regularity.

For "always-on" effects that read as loops (the gentle pulse on a CTA, the float on a product image), use the preset's built-in yoyo (Section 3). The preset, not the meta loop counter, owns that behavior.

2.6 Where the schema lives in the BannerSpec

The animation timeline attaches to each Artboard:

interface Artboard {
  artboard_id: string;
  width: number;
  height: number;
  layers: ResolvedLayer[];
  animation: AnimationTimeline;   // ← this document
}

Each artboard has its own timeline because animation choices vary per size (a 728×90 leaderboard cannot afford the same staggered entrance as a 300×600 half-page). The orchestrator emits one timeline per size as part of its per-size copy decision. Rationale lands in the existing ai_reasoning.animation_rationale field shared across sizes.


3. The Preset Library

25 named presets, organized by category. Every preset compiles to one or more GSAP fromTo calls against composite properties only. Every preset defines a start state, an end state, a default ease, and a default duration. Designers reference presets by name; the AI orchestrator selects by name.

The selection criteria for inclusion:

  • Maps to composite transforms or opacity only.
  • Renders identically in Konva and in headless Chromium.
  • Is a pattern that appears in production-quality display advertising, not just motion-design tutorial reels.
  • Does not require a Club GreenSock plugin.

3.1 Entrance presets

| Preset | Start | End | Ease | Duration | Use |
| --- | --- | --- | --- | --- | --- |
| fade_in | opacity 0 | opacity 1 | power1.inOut | 0.8 s | Backgrounds, disclaimers, logos. |
| slide_in_left | x -100%, op 0 | x 0, op 1 | power2.out | 0.8 s | Hero imagery, body copy. |
| slide_in_right | x 100%, op 0 | x 0, op 1 | power2.out | 0.8 s | Side-panel reveals. |
| slide_in_up | y 100%, op 0 | y 0, op 1 | power2.out | 0.8 s | CTAs, bottom-anchored copy. |
| slide_in_down | y -100%, op 0 | y 0, op 1 | power2.out | 0.8 s | Top-anchored headers, badges. |
| scale_up_fade | scale 0.8, op 0 | scale 1, op 1 | power2.out | 0.8 s | Hero products, centered logos. |
| scale_down_fade | scale 1.2, op 0 | scale 1, op 1 | power2.out | 1.0 s | Lifestyle backgrounds settling in. |
| scale_pop | scale 0.5, op 0 | scale 1, op 1 | back.out(1.7) | 0.6 s | Badges, CTA buttons, price circles. |

3.2 Exit presets

| Preset | Start | End | Ease | Duration | Use |
| --- | --- | --- | --- | --- | --- |
| fade_out | opacity 1 | opacity 0 | power1.inOut | 0.6 s | Scene transitions, legacy copy. |
| slide_out_left | x 0, op 1 | x -100%, op 0 | power2.in | 0.6 s | Sweep clearing the frame. |
| slide_out_right | x 0, op 1 | x 100%, op 0 | power2.in | 0.6 s | Sweep clearing the frame. |

3.3 Emphasis presets

These are yoyo presets — they animate from base state to peak state and return. The block's duration field is the full there-and-back time.

| Preset | State change | Ease | Duration | Use |
| --- | --- | --- | --- | --- |
| pulse_gentle | scale 1 ↔ 1.05 | sine.inOut | 1.5 s | Sustained CTA attention. |
| pulse_strong | scale 1 ↔ 1.15 | power2.inOut | 0.8 s | Urgent promotional badges. |
| float_vertical | y 0 ↔ 10 px | sine.inOut | 2.0 s | Floating product imagery. |
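Since the block's duration covers the full there-and-back motion, each direction of the yoyo gets half of it. A minimal sketch of that mapping follows, assuming the runtime uses GSAP's `yoyo` and `repeat` tween options; `compileEmphasis` and the `EmphasisTween` shape are hypothetical names.

```typescript
interface EmphasisTween {
  to: Record<string, number>; // peak state, e.g. { scale: 1.05 }
  duration: number;           // one direction only, in seconds
  ease: string;
  yoyo: boolean;
  repeat: number;             // 1 = play forward, then once back
}

function compileEmphasis(
  peakState: Record<string, number>,
  blockDuration: number,
  ease: string
): EmphasisTween {
  return {
    to: { ...peakState },
    duration: blockDuration / 2, // half out, half back
    ease,
    yoyo: true,
    repeat: 1,
  };
}

// Total play time of a repeating tween: one pass per repeat, plus the first.
function totalPlayTime(tween: EmphasisTween): number {
  return tween.duration * (tween.repeat + 1);
}

const pulseGentle = compileEmphasis({ scale: 1.05 }, 1.5, "sine.inOut");
```

With these values the tween runs 0.75 s to the peak and 0.75 s back, which matches the 1.5 s block duration for pulse_gentle.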

3.4 Typography presets

All typography presets require text splitting. The stagger field on the block controls the interval between consecutive characters/words/lines. Splitting happens at runtime via SplitType; see Section 6.

| Preset | Start | End | Ease | Stagger | Split | Use |
| --- | --- | --- | --- | --- | --- | --- |
| fade_up_chars | y 20, op 0 | y 0, op 1 | power2.out | 0.02 s | chars | Premium headlines. |
| fade_up_words | y 20, op 0 | y 0, op 1 | power2.out | 0.04 s | words | Subheads, longer copy. |
| typewriter | op 0 | op 1 | steps(1) | 0.04 s | chars | Tech, narrative, informative. |
| scramble_chars | text random | text final | linear | 0.03 s | chars | Cyber, high-tech promos. |
| scale_pop_chars | scale 0, op 0 | scale 1, op 1 | back.out(1.5) | 0.04 s | chars | Bold, energetic typography. |

3.5 Mask presets

Masks animate the visibility of a layer through a moving clip shape. See Section 5 for the full mask system; these presets reference it.

| Preset | Mask shape | Animates | Ease | Duration | Use |
| --- | --- | --- | --- | --- | --- |
| mask_wipe_right | rectangle | clip 0% → 100% width | power2.inOut | 1.0 s | Revealing new background. |
| mask_wipe_up | rectangle | clip 0% → 100% height | power2.inOut | 1.0 s | Rising imagery reveal. |
| mask_circle_out | circle | r 0 → max | power3.inOut | 1.2 s | Cinematic scene transitions. |
| mask_text_reveal | layer bbox | y 100% → 0% | power3.out | 0.8 s | Text rising from invisible floor. |

3.6 List/stagger presets

For multi-element layers (carousels, icon rows, bulleted lists). Stagger applies to the layer's children, not to characters.

| Preset | Start | End | Ease | Stagger | Use |
| --- | --- | --- | --- | --- | --- |
| stagger_slide_in | x 50 px, op 0 | x 0, op 1 | power2.out | 0.08 s | Bullets, multi-product rows. |
| stagger_pop_up | scale 0.8, op 0 | scale 1, op 1 | back.out(1.5) | 0.08 s | Social icons, logo lockups. |

3.7 Explicit exclusions

Presets in this list are common in tutorial-grade libraries but are not shipped in V1. The rationale is recorded so the next person who asks "why not jello?" has an answer.

  • jello, rubberBand, wobble, tada, headShake, swing — multi-axis skew/scale combos that read as amateur in premium display. Animate.css ships them; production-quality ads don't.
  • shake_horizontal and other shake patterns — would require GSAP's rough ease, which is excluded in Section 4. Shake patterns read as rendering errors more often than as deliberate emphasis.
  • blur_in — animates filter: blur(Npx). Filter is a composite-eligible property in Chromium, but it forces repaint, not just compositor work, and burns the CPU budget faster than transforms. The use case (premium reveals) is better served by scale_down_fade and mask_circle_out. Blur belongs in video, not display.
  • Drop-shadow and box-shadow animation — same repaint cost as blur. If a layer needs a dimensional shadow that "appears," bake the shadow into a transparent asset and animate that asset's opacity.
  • All Club GreenSock plugins: MorphSVG, SplitText, Physics2D, DrawSVG, MotionPath, CustomEase. V1 stays on the free GSAP tier with SplitType as the SplitText replacement.

4. Easing

Easing is the difference between mechanical motion and motion that reads as designed. The V1 surface is the GSAP stock ease catalog, restricted to the curves that actually appear in production display advertising.

4.1 Principles

  • Entrance eases out. Elements arriving on stage decelerate so the eye can catch them.
  • Exit eases in. Elements leaving accelerate away to clear the visual field.
  • Emphasis uses sine.inOut or power*.inOut. Symmetric eases for symmetric (yoyo) motion.
  • Linear is reserved for typewriters, progress bars, and continuous panning. Almost never the right answer for entrances or exits.

4.2 Easing matrix

| Category | Default | Alt 1 | Alt 2 |
| --- | --- | --- | --- |
| Entrance (translation) | power2.out | power3.out | expo.out |
| Entrance (scale) | power2.out | back.out(1.5) | elastic.out(1, 0.5) |
| Exit (translation) | power2.in | power3.in | expo.in |
| Exit (scale) | power2.in | back.in(1.2) | power1.in |
| Emphasis (pulse, float) | sine.inOut | power1.inOut | power2.inOut |
| Slow reveal (fade) | power1.inOut | linear | sine.out |
| Snappy state change | expo.inOut | power4.inOut | steps(1) |
| Per-character stagger | power2.out | back.out(1.2) | linear |
| Mask reveal | power3.inOut | power2.inOut | expo.inOut |

4.3 Excluded eases

The GSAP catalog includes curves that are rarely or never used in production display. They are hidden from the V1 surface to keep the orchestrator's decision space small.

  • bounce.in / bounce.out — gravity-simulation cartoon physics. Reads as children's brand or casual gaming. back covers the use case with more taste.
  • slow — cinematic speed-ramping. Lingers in the middle of the transition. Wrong for 15-second display.
  • rough — randomized jitter. Reads as a rendering error.
  • circ — mathematically rigid circular arc. The power* family feels more physical.

The accessible palette is: power1–power4, expo, sine, back, elastic, linear, steps. Each with .in, .out, .inOut variants where applicable. That is about 24 distinct eases: enough range for production motion design, small enough that the AI doesn't get decision fatigue.
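One way to keep excluded eases out of a timeline is to validate the ease string against the palette before export. The sketch below is an assumption about how that check could look, not a specified gate; `isAllowedEase` is a hypothetical name, and the pattern checks this document's palette rather than everything GSAP itself would parse.

```typescript
// Building blocks for the pattern; NUM matches values like 1, 1.7, 0.5.
const DIRECTION = "(?:in|out|inOut)";
const NUM = "\\d+(?:\\.\\d+)?";

const EASE_ALTERNATIVES: string[] = [
  `(?:power[1-4]|expo|sine)\\.${DIRECTION}`,          // power2.out, sine.inOut
  `back\\.${DIRECTION}(?:\\(${NUM}\\))?`,             // back.out(1.7)
  `elastic\\.${DIRECTION}(?:\\(${NUM},\\s*${NUM}\\))?`, // elastic.out(1, 0.5)
  "linear",
  "steps\\(\\d+\\)",                                  // steps(1)
];

const ALLOWED_EASE = new RegExp(`^(?:${EASE_ALTERNATIVES.join("|")})$`);

function isAllowedEase(ease: string): boolean {
  return ALLOWED_EASE.test(ease);
}
```

Under this pattern, strings such as "bounce.out", "circ.in", "rough", and "slow" are rejected because no alternative matches them.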


5. Masks

Masks are first-class in V1. They unlock the "reveal" patterns that read as premium motion (logo-shaped reveals, circular wipes, text rising from an invisible floor) and are the difference between "ad" and "ad you remember."

5.1 The mask as a layer property

A mask is not a standalone timeline layer. It is a property on the masked layer. Co-locating the mask with the layer it masks keeps z-index relationships implicit and keeps the diff readable.

{
  "layer_id": "hero_image",
  "mask": {
    "type": "clip-path",            // "clip-path" | "image"
    "geometry": "circle",            // for clip-path: "rect" | "circle" | "polygon"
    "asset_id": null,                // for type: "image", references the mask asset
    "animation": {
      "preset": "mask_circle_out",
      "duration": 1.2,
      "ease": "power3.inOut"
    }
  }
}

Blocks reference an animated mask via mask_ref pointing to the masked layer's id. The block describes the layer's animation; the mask's animation is internal to the mask object.

5.2 clip-path vs. mask-image: hit-testing decides

CSS gives us two mask primitives. They are not interchangeable.

  • clip-path — actually clips the element's geometry. Pointer events outside the clip do not fire.
  • mask-image — modifies the alpha channel only. The element's bounding box still receives pointer events, including in the "invisible" region.

For animated reveals over interactive elements — which is the most common case, because CTAs are interactive and CTAs are the most-revealed thing — clip-path is mandatory. mask-image lets users click an "invisible" button that's mid-reveal, fires the click handler, and ships the wrong analytics. This is a real bug we will not have because the schema doesn't let us write it: mask.type: "clip-path" is the default and any geometric mask animation uses it.

mask-image is reserved for non-interactive layers (background imagery, decorative elements) where alpha-channel masking is the only way to achieve the effect (e.g., a brushstroke wipe, a textured reveal).
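The rule in this section reduces to a single check per layer. A minimal sketch follows, assuming the layer carries an `interactive` flag; that field, the `MaskedLayerInfo` shape, and `maskTypeViolation` are hypothetical and not part of the schema shown above.

```typescript
type MaskKind = "clip-path" | "image";

interface MaskedLayerInfo {
  layer_id: string;
  interactive: boolean;  // hypothetical flag: the layer receives pointer events
  mask_type?: MaskKind;  // omitted means the default applies
}

// "clip-path" is the default, as stated in this section.
function effectiveMaskType(layer: MaskedLayerInfo): MaskKind {
  return layer.mask_type ?? "clip-path";
}

// Returns a message when an image mask sits on an interactive layer, else null.
function maskTypeViolation(layer: MaskedLayerInfo): string | null {
  if (layer.interactive && effectiveMaskType(layer) === "image") {
    return `Layer "${layer.layer_id}" is interactive and must use clip-path, not mask-image`;
  }
  return null;
}
```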

5.3 Konva ↔ HTML parity

Each mask type renders in both runtimes:

| Mask type | Konva | HTML |
| --- | --- | --- |
| clip-path rect | Group.clipFunc with a rect path | CSS clip-path: inset(...) |
| clip-path circle | Group.clipFunc with an arc | CSS clip-path: circle(...) |
| clip-path polygon | Group.clipFunc with a polygon | CSS clip-path: polygon(...) |
| image | globalCompositeOperation: 'source-in' on cached group | CSS mask-image: url(...) |

The known parity risk is mask-image SVG scaling. Konva masks are pixel-exact; CSS mask-size and mask-position can drift if the underlying layer's box model changes between preview and render. The parity test suite (Section 9) renders each mask preset in both runtimes and diffs the output PNG. Drift > 2px on any sampled frame fails the test.

5.4 Animated mask geometry

The four mask presets in Section 3.5 cover the patterns that show up in production display:

  • Linear wipe (mask_wipe_right, mask_wipe_up): rectangular clip-path animating one edge from 0% to 100%.
  • Circular reveal (mask_circle_out): clip-path circle with radius animating from 0 to a value large enough to cover the layer's bounding box.
  • Text rising from a floor (mask_text_reveal): clip-path rect fixed at the layer's bbox; text inside animates y: 100% → 0%. The mask itself doesn't animate — the masked content does.

Mask geometry that requires designer authorship (logo-shaped masks, custom polygon shapes) is supported by mask.geometry: "polygon" with a path string supplied by the template, but is not part of any preset. The orchestrator does not invent mask shapes.
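For the HTML runtime, the preset geometry above can be expressed as CSS clip-path strings computed from an eased progress value in [0, 1]. This is a sketch under assumptions: the function names are hypothetical, the circle is centered on the layer, and mask_wipe_up is assumed to reveal from the bottom edge upward.

```typescript
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Round to two decimals so floating-point noise stays out of the CSS string.
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Smallest centered circle that covers a width x height box: half the diagonal.
function coverRadius(width: number, height: number): number {
  return Math.sqrt(width * width + height * height) / 2;
}

// inset() takes top, right, bottom, left. The right inset shrinks as the wipe progresses.
function wipeRightClipPath(progress: number): string {
  const hidden = round2((1 - clamp01(progress)) * 100);
  return `inset(0 ${hidden}% 0 0)`;
}

// Assumed direction: visible area grows upward, so the top inset shrinks.
function wipeUpClipPath(progress: number): string {
  const hidden = round2((1 - clamp01(progress)) * 100);
  return `inset(${hidden}% 0 0 0)`;
}

function circleOutClipPath(progress: number, width: number, height: number): string {
  const radius = round2(coverRadius(width, height) * clamp01(progress));
  return `circle(${radius}px at 50% 50%)`;
}
```

For a 300×400 layer the cover radius is 250 px, so the circle is fully open once the radius reaches that value; any smaller maximum would leave the corners clipped.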


6. Per-Character Text Animation

Per-character animation is the difference between "templated copy" and "designed message." It is also the single most performance-sensitive primitive in the V1 system. This section is precise because it has to be.

6.1 The split happens at runtime, in the DOM, via SplitType

For the headless Chromium render (the one that ships), text splitting happens in the DOM at runtime using SplitType. The reasons are non-negotiable:

  • Accessibility. Screen readers must read the headline as one coherent string, not as 47 phonetic letters. SplitType supports the ARIA pattern (see 6.2). Canvas-rendered text destroys accessibility outright.
  • DOM-based QA. Ad-server review bots parse the DOM to verify text content. Canvas text is invisible to them and triggers rejection.
  • Text selection and SEO. Native browser text selection works on DOM text. Canvas text does not select.

For the Konva preview (the one designers and reviewers see), per-character positioning is computed at spec-resolution time from Dropflow's glyph positions. Konva renders each character as a KText node positioned to match Dropflow's output. This preserves the preview-render parity contract — the preview is positionally identical to what SplitType produces at runtime, sub-pixel kerning and ligatures aside.

The split-of-labor:

  • Dropflow (spec time, both runtimes): computes glyph positions used to drive Konva preview rendering and to validate the assumption that SplitType will produce the same layout.
  • SplitType (render time, Chromium only): does the actual DOM split for the GSAP-driven animation.

6.2 The accessibility contract

Splitting text into per-character spans destroys its semantic continuity. The compilation engine must apply the standard ARIA mitigation, and the schema does not let the designer or the orchestrator forget:

For every layer where the block's preset is in Section 3.4 (typography presets requiring char-split), the export pipeline must:

  1. Set aria-label="<original text content>" on the parent layer element.
  2. Set aria-hidden="true" on every SplitType-generated child span.

This is automated, not designer-authored. A QA gate (Section 9) verifies that every char-split layer in the exported HTML has the parent label and the hidden children before the export passes.
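The two-step contract and its verification can be illustrated with plain strings. The sketch below does not use SplitType and does not reproduce its actual markup; `renderSplitChars` and `passesAriaContract` are hypothetical helpers that only show the shape the export must end up with.

```typescript
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Step 1: aria-label on the parent. Step 2: aria-hidden on every child span.
// Note: split("") works per UTF-16 code unit, which is fine for an illustration.
function renderSplitChars(text: string): string {
  const spans = text
    .split("")
    .map((ch) => `<span aria-hidden="true">${escapeHtml(ch)}</span>`)
    .join("");
  return `<div aria-label="${escapeHtml(text)}">${spans}</div>`;
}

// Regex-based check in the spirit of gate G11; a real gate would parse the DOM.
function passesAriaContract(html: string): boolean {
  const hasLabel = /^<div aria-label="[^"]+">/.test(html);
  const allSpans: string[] = html.match(/<span\b[^>]*>/g) || [];
  const hiddenSpans = allSpans.filter(
    (tag) => tag.indexOf('aria-hidden="true"') !== -1
  );
  return hasLabel && allSpans.length > 0 && hiddenSpans.length === allSpans.length;
}
```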

6.3 The performance ceiling

Per-character animation creates one DOM node per character. Every node becomes its own GPU composite layer when animated via transform. Mobile browsers have a hard ceiling on simultaneous composite layers, and breaching it causes jank visible in the final render.

The V1 ceiling: 150 simultaneously animated character nodes.

This is not a recommendation; it's a QA gate. If a banner's exported timeline has more than 150 char-split nodes animating in a single 30-second window, the export fails.

Implications for the orchestrator:

  • A 60-char headline with fade_up_chars consumes 60 of the budget.
  • A second char-split element (a 40-char subhead with fade_up_chars) brings the total to 100.
  • A third char-split element is risky. The orchestrator should prefer fade_up_words (5–10 nodes) for subheads and reserve char-splits for headlines.

Body copy and legal disclaimers must never use char-split. The orchestrator selects fade_up_words or fade_up_lines for any layer whose resolved text exceeds 80 characters, regardless of designer preset choice.
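The budgeting rules above can be sketched as follows. This is an illustration under assumptions: all names are hypothetical, every character (including spaces) counts as one node to match the arithmetic in the bullets, and all listed layers are treated as animating in the same window.

```typescript
type SplitMode = "chars" | "words";

const CHAR_SPLIT_TEXT_LIMIT = 80; // longer text falls back to word split
const CHAR_NODE_CEILING = 150;    // V1 ceiling on animated character nodes

// Downgrade a char split to a word split when the resolved text is too long.
function chooseSplitMode(text: string, requested: SplitMode): SplitMode {
  if (requested === "chars" && text.length > CHAR_SPLIT_TEXT_LIMIT) {
    return "words";
  }
  return requested;
}

function splitNodeCount(text: string, mode: SplitMode): number {
  if (mode === "chars") {
    return text.length;
  }
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

interface SplitLayer {
  text: string;
  mode: SplitMode;
}

// Only char-split layers count against the ceiling.
function withinCharBudget(layers: SplitLayer[]): boolean {
  let total = 0;
  for (const layer of layers) {
    if (layer.mode === "chars") {
      total += splitNodeCount(layer.text, "chars");
    }
  }
  return total <= CHAR_NODE_CEILING;
}
```

A 60-character headline plus a 40-character subhead totals 100 nodes and passes; adding a third 60-character char-split layer brings the total to 160 and fails.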

6.4 Concrete per-character parameters

The defaults below are baked into the preset library and should not be overridden lightly. They reflect production-grade motion design, not arbitrary timings:

| Pattern | Stagger | Ease | Distance / scale | Max chars |
| --- | --- | --- | --- | --- |
| Typewriter | 0.03–0.05 s | steps(1) | n/a | 80 |
| Fade-up | 0.02 s | power2.out | 15–20 px | 80 |
| Scale-pop | 0.04 s | back.out(1.5) | scale 0 → 1 | 50 |
| Scramble | 0.03 s | linear | n/a (text mutation) | 40 |

7. The Animated Bounding Box

The most consequential and least obvious part of the animation system is the asset-sizing problem.

7.1 The problem

A hero image animating with scale_down_fade starts at scale: 1.2 and ends at scale: 1.0. The source image must be sized for the largest frame — 1.2 × layer_width — not the final frame. Ship the source at the final-frame size and the first frame is upscaled, blurry, and visibly broken.

The same logic applies to:

  • Translation (slide_in_*): source must cover the layer's position at every keyframe, including the offscreen start.
  • Rotation: a rotated rectangle's axis-aligned bounding box grows with the angle. A 600×600 image rotated 15° needs a ~735×735 source (600 × (cos 15° + sin 15°) ≈ 735).
  • Compound transforms (scale + rotate + translate, e.g. an entrance that combines scale_up_fade with a small rotation): the bounding box is the union of every transformed corner at every sampled time.

7.2 The math

For a layer with rectangular bounding box, given a transform with translation (tx, ty), scale (sx, sy), and rotation θ at time t:

For each corner (cx, cy) of the base rectangle:

x'(t) = tx(t) + sx(t) · (cx · cos(θ(t)) - cy · sin(θ(t)))
y'(t) = ty(t) + sy(t) · (cx · sin(θ(t)) + cy · cos(θ(t)))

The layer's axis-aligned bounding box at time t is the min/max of x'/y' over the four corners. The animation's bounding box is the union of every per-t bounding box.

Analytical solution of the extrema is computationally expensive and brittle across compound transforms. V1 uses discrete sampling: 30 samples per second of animated duration, taking the union of all sampled bounding boxes. At 30 fps over a 1.0-second animation, that's 30 four-corner evaluations, a negligible cost. Because an extremum can fall between two samples, the sampled union can under-approximate the true bbox, but only by at most one sample's worth of motion, which is well within tolerance for source-asset sizing.
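The sampling approach can be sketched directly from the corner equations. This is a simplified illustration, not the `unionBoundingBox()` signature from packages/layout-engine: `sampledUnionBox` and its types are hypothetical, corners are measured from the layer's center, the caller supplies the transform as a function of time, and both endpoints are sampled (so a 1.0-second animation gets 31 evaluations rather than 30).

```typescript
interface LayerTransform {
  tx: number;
  ty: number;
  sx: number;
  sy: number;
  theta: number; // rotation in radians
}

interface BBox {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

function sampledUnionBox(
  width: number,
  height: number,
  duration: number,
  transformAt: (t: number) => LayerTransform,
  fps: number = 30
): BBox {
  const hw = width / 2;
  const hh = height / 2;
  // Corners of the base rectangle, relative to its center.
  const corners: Array<[number, number]> = [[-hw, -hh], [hw, -hh], [hw, hh], [-hw, hh]];
  const box: BBox = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };
  const steps = Math.max(1, Math.ceil(duration * fps));
  for (let i = 0; i <= steps; i++) {
    const t = (i / steps) * duration;
    const { tx, ty, sx, sy, theta } = transformAt(t);
    const cos = Math.cos(theta);
    const sin = Math.sin(theta);
    for (const [cx, cy] of corners) {
      const x = tx + sx * (cx * cos - cy * sin);
      const y = ty + sy * (cx * sin + cy * cos);
      box.minX = Math.min(box.minX, x);
      box.maxX = Math.max(box.maxX, x);
      box.minY = Math.min(box.minY, y);
      box.maxY = Math.max(box.maxY, y);
    }
  }
  return box;
}

// A 600x600 layer rotating linearly from 0 to 15 degrees over one second.
const DEG = Math.PI / 180;
const rotatedBox = sampledUnionBox(600, 600, 1.0, (t) => ({
  tx: 0, ty: 0, sx: 1, sy: 1, theta: 15 * DEG * t,
}));
const rotatedWidth = rotatedBox.maxX - rotatedBox.minX; // about 734.85
```

The example interpolates linearly and ignores the ease. For tweens that move monotonically between two states the extremes sit at the endpoints, so the ease does not change the result; overshooting eases such as back.out would need the eased progress passed through `transformAt`.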

7.3 unionBoundingBox() — one function, many call sites

A single function in packages/layout-engine:

function unionBoundingBox(
  layer: ResolvedLayer,
  blocks: AnimationBlock[],
  fps: number = 30
): BoundingBox;

It is called from:

  • Smart asset selection (packages/api-lib/asset-selection): picks source crop sized for the union bbox, not the layer rect.
  • Render worker (packages/render-worker): fails fast if the source asset's dimensions are smaller than the union bbox.
  • QA gate (packages/qa-gates): flags upscaled frames before export, with the field path of the offending layer.

7.4 What changes in the BannerSpec

A resolved layer gains a required_source_size field:

interface ResolvedLayer {
  // ... existing fields
  required_source_size?: { width: number; height: number };
}

The orchestrator does not populate this field. The layout engine computes it after resolving the layer's animation blocks. The render worker reads it and fails the render if the actual source is smaller. The asset selector reads it when requesting a crop from the asset service.
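The render worker's check against required_source_size could be as small as the sketch below. The helper names and the `PixelSize` type are hypothetical, and rounding the union bbox up to whole pixels is an assumption rather than something this document specifies.

```typescript
interface PixelSize {
  width: number;
  height: number;
}

// Derive the required source size from a union bounding box, rounded up.
function requiredSourceSize(box: {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}): PixelSize {
  return {
    width: Math.ceil(box.maxX - box.minX),
    height: Math.ceil(box.maxY - box.minY),
  };
}

// Returns the missing pixels per axis, or null when the source is large enough.
function sourceDeficit(actual: PixelSize, required: PixelSize): PixelSize | null {
  const missingWidth = Math.max(0, required.width - actual.width);
  const missingHeight = Math.max(0, required.height - actual.height);
  if (missingWidth === 0 && missingHeight === 0) {
    return null;
  }
  return { width: missingWidth, height: missingHeight };
}
```

Returning the deficit rather than a boolean lets the caller report how far short the asset falls, which is what gate G9 asks for.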


8. Library and Runtime Mechanics

8.1 GSAP via the Google CDN

GSAP 3 loads from https://s0.2mdn.net/ads/studio/cached_libs/gsap_3.9.1_min.js. This URL is whitelisted by CM360, which means GSAP's bundle does not count against the 150 KB initial-load cap. The HTML export injects exactly this URL into the document head. Local-relative paths or other CDNs (including unpkg, jsDelivr) cause CM360 to flag the library as a 4th-party call and count its weight, which reliably fails the 150 KB gate.
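A check in the spirit of gate G8 might scan the exported HTML for script sources. The sketch below is an assumption about how such a check could be written: the helper names are hypothetical, GSAP scripts are identified simply by "gsap" appearing in the URL, and a regex stands in for real HTML parsing.

```typescript
const GSAP_CDN_PREFIX = "https://s0.2mdn.net/ads/studio/cached_libs/";

// Collect the src attribute of every <script> tag in the document.
function scriptSources(html: string): string[] {
  const sources: string[] = [];
  const pattern = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// True only if GSAP is referenced and every reference points at the CDN path.
function loadsGsapFromCdn(html: string): boolean {
  const gsapSources = scriptSources(html).filter((src) => /gsap/i.test(src));
  return (
    gsapSources.length > 0 &&
    gsapSources.every((src) => src.indexOf(GSAP_CDN_PREFIX) === 0)
  );
}
```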

8.2 SplitType in place of SplitText

The MIT-licensed SplitType library (~2 KB minified) replaces GSAP's Club-only SplitText. SplitType is bundled into the exported HTML, not loaded from a CDN, because no major ad CDN whitelists it. It is small enough to fit within the 150 KB budget alongside the rest of the banner. The export pipeline applies the aria-label/aria-hidden contract from 6.2 in the same pass that calls SplitType.

8.3 No localStorage, no sessionStorage

CM360 rejects creatives that reference browser storage APIs. The export pipeline's static-analysis pass scans the compiled JS bundle for any reference to localStorage or sessionStorage and fails the export if found. None of our code uses these APIs; this gate exists to catch a third-party dependency drift.
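The static-analysis pass could start as a plain text scan of the bundle. This is a deliberately conservative sketch with a hypothetical function name: it also flags mentions inside strings and comments, which errs toward failing the export rather than missing a reference.

```typescript
const STORAGE_PATTERN = /\b(?:localStorage|sessionStorage)\b/g;

// Returns every storage API name found in the bundle text, in order.
function findStorageReferences(bundle: string): string[] {
  const matches: string[] = bundle.match(STORAGE_PATTERN) || [];
  return matches;
}

function passesStorageGate(bundle: string): boolean {
  return findStorageReferences(bundle).length === 0;
}
```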

8.4 will-change: transform on animated layers

The export pipeline annotates every layer that has at least one animation block with will-change: transform (and opacity if applicable). This promotes the layer to its own GPU compositor layer ahead of time, preventing first-frame jank from late-binding the layer when GSAP starts animating it.

8.5 prefers-reduced-motion

The exported HTML includes a @media (prefers-reduced-motion: reduce) block that snaps every animation to its end state instantly. This serves two purposes:

  • Accessibility: users who have requested reduced motion see the final composition immediately.
  • Headless PNG capture: Playwright launches with --force-prefers-reduced-motion, which deterministically forces the banner to its final-frame state. Capturing the static backup PNG becomes a single screenshot with no waitForTimeout, which removes the largest source of flakiness in the render pipeline.

9. QA Gates

Every gate below runs at export time. An export that fails any gate is held — the spec is not corrupt, but it's not shippable, and the review UI flags it for human resolution.

9.1 Schema gates

  • G1 — Composite-only. Every block's preset is in the V1 library. Inline keyframes are not expressible in the schema, so this gate is enforced by the schema's TypeScript types, not by a runtime check.
  • G2 — Duration ceiling. meta.duration × meta.loops ≤ 30.
  • G3 — Block ordering. Blocks are sorted by start. No block's start + duration exceeds meta.duration.
  • G4 — Loop count. meta.loops is in {1, 2, 3}.
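Gates G2 through G4 are simple enough to sketch in one function. The names and message formats below are hypothetical, and the blocks are assumed to have their durations already resolved against the preset defaults.

```typescript
interface TimelineMeta {
  duration: number; // seconds per pass
  loops: number;
}

interface TimedBlock {
  id: string;
  start: number;
  duration: number;
}

const MAX_TOTAL_SECONDS = 30;
const EPSILON = 1e-9; // tolerance for floating-point sums

function schemaGateFailures(meta: TimelineMeta, blocks: TimedBlock[]): string[] {
  const failures: string[] = [];
  // G2: total play time across all loops must fit the ceiling.
  if (meta.duration * meta.loops > MAX_TOTAL_SECONDS + EPSILON) {
    failures.push("G2: meta.duration x meta.loops exceeds 30 s");
  }
  // G3: blocks sorted by start, and none runs past meta.duration.
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    if (i > 0 && block.start < blocks[i - 1].start) {
      failures.push(`G3: block "${block.id}" is out of start order`);
    }
    if (block.start + block.duration > meta.duration + EPSILON) {
      failures.push(`G3: block "${block.id}" ends after meta.duration`);
    }
  }
  // G4: loop count must be 1, 2, or 3.
  if ([1, 2, 3].indexOf(meta.loops) === -1) {
    failures.push("G4: meta.loops must be 1, 2, or 3");
  }
  return failures;
}
```

Run against the worked example in Section 2.4 (duration 4.0, loops 1, last block ending at 3.7 s), this returns an empty list.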

9.2 Performance gates

  • G5 — Char-split ceiling. Sum of animated char-split nodes across all simultaneously-running blocks in any 30-second window ≤ 150.
  • G6 — Weight budget. Final zipped HTML ≤ 150 KB initial load.
  • G7 — No storage APIs. Static analysis finds no reference to localStorage or sessionStorage.
  • G8 — GSAP via CDN. The exported HTML loads GSAP from s0.2mdn.net, not from a local path.

9.3 Asset gates

  • G9 — Source size. Every layer's actual source asset dimensions are ≥ the required_source_size computed by unionBoundingBox. Fail with the layer id and the deficit.
  • G10 — Crossorigin. Every image element has crossorigin="anonymous" (required for Konva canvas reads in the preview).

9.4 Accessibility gates

  • G11 — Aria contract. Every layer with a char-split preset has aria-label set to the original text on the parent and aria-hidden="true" on every child span.

9.5 Parity gates

  • G12 — Konva ↔ Playwright pixel diff. For each preset, the parity test suite renders a reference banner in both runtimes and compares PNG output. Per-frame pixel diff > 2 px on any sampled keyframe fails the build (not the export — this is a development-time gate, not a per-banner gate).

10. Open Questions

These were debated during the research passes and design and are deferred — either to a later V1 iteration or to V2. Tracking them here so they don't get lost.

  • Interaction triggers. V1 timelines are time-anchored. The Bannerflow pattern of decoupling animation from text content supports localization but not user interaction. When V2 introduces hover, click-to-expand, or in-banner video, the schema gains an event-trigger model that block references can hook into.
  • Custom eases. GSAP's CustomEase is Club-only. If a designer ever needs a custom curve, V2 either licenses Club GreenSock or implements cubic-bezier eases via the free GSAP power.in/out family with bespoke control points. The schema field is already string; this is purely an authoring decision, not a schema change.
  • Mask shape authoring. V1 supports rect, circle, and designer-supplied polygons. Logo-shaped masks (SVG paths) work as mask.geometry: "polygon", but there is no authoring UI — designers paste path strings into the template. V2 ships a small editor for this.
  • Per-block loops. Not in V1. If a use case for per-block loops emerges (a single icon that pulses 5× while the rest of the timeline plays through once), it's expressible as multiple back-to-back blocks in V1. Native per-block looping waits.
  • The 150-character ceiling on char-split. This is the conservative number from research. A production benchmarking pass on real hardware (low-end Android, mid-tier iPhone) may push it higher or lower. Treat the current ceiling as a placeholder, and calibrate gate G5 against those empirical results.

11. What This Replaces and What It Defers

This document replaces:

  • The animation section of ARCHITECTURE.md. The original sketch predates the preset library and the bounding-box concept.
  • The implicit fade_in / hold / fade_out preset set in the vertical slice. Those three names are kept (mapped to fade_in, no-op, fade_out) but every other preset in the slice is replaced by an explicit entry from Section 3.

This document does not specify:

  • The timeline authoring UI (V2).
  • The Figma sync path for animation specs (V2+).
  • The trafficking-sheet representation of animation choices (V1, but documented separately in TRAFFICKING_V1.md when that exists).

The next implementation step after this document is approved is to flesh out packages/types with the schema in Section 2.2/2.3 and to seed packages/layout-engine with unionBoundingBox(). Everything else (preset library implementation, QA gates, runtime mechanics) hangs off those two artifacts.