Specifies the V1 animation system end-to-end. Authored after two Deep Research passes (preserved as ANIMATION_V1_RESEARCH.md and ANIMATION_V1_DESIGN_DECISIONS.md for provenance).
ANIMATION_V1.md covers:
- Hard constraints: Chrome Heavy Ad Intervention (4MB / 15s burst / 60s total CPU), composite-only animation, 150KB initial-load cap, GSAP via s0.2mdn.net CDN, free-tier only.
- Custom JSON schema (not Lottie) — block-based timeline, absolute start times, preset references only, no inline keyframes. Designed for AI authoring and human-readable diffs.
- 25-preset library across entrance / exit / emphasis / typography / mask / list categories. Each preset specifies start state, end state, default ease, default duration, and split/mask requirements.
- 9-category easing matrix using GSAP stock eases; bounce, slow, rough, and circ excluded from the V1 surface.
- Mask system: mask is a property on the masked layer (not a standalone layer). clip-path mandatory over interactive elements to prevent ghost-click failures. Konva ↔ HTML parity table.
- Per-character animation: SplitType at render time, Dropflow at spec time, automated aria-label / aria-hidden contract, 150-node ceiling enforced by QA gate.
- Animated bounding-box math: discrete sampling at 30 fps, unionBoundingBox() called from asset selection, render worker, and QA gate. Adds required_source_size to ResolvedLayer.
- 12 QA gates (G1-G12) covering schema, performance, asset, accessibility, and parity.
ARCHITECTURE.md updates:
- Forward-notes section at the top pointing to ANIMATION_V1.md and RESOLVED_FEED.md, matching the existing Part 7 forward-note style.
- Inline forward note in the Part 3 animation stack block.
- Old content preserved as historical record.
Decisions baked in (resolved during draft):
- Loops are global (max 3), not per-block. Per-block loops invite nested-infinite-loop bugs in AI-generated specs.
- Block triggers are time-anchored only. Event/interaction triggers wait for V2 rich media.
- blur_in and shake_horizontal dropped from the 27-preset research list. Blur is a video pattern; shake reads as a rendering error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ANIMATION_V1.md
Status: V1 design specification. Not implemented in the vertical slice. The slice ships three preset animations (`fade_in`, `hold`, `fade_out`) as a forward-pointer to this system. See SLICE_DEVIATIONS.md for the deltas.
This document supersedes the animation discussion in ARCHITECTURE.md.
The animation system is the product. Without it this is a static-image templating tool, and there is no shortage of those. Every architectural decision in this document is downstream of one premise: a motion-design-led creative team must be able to adopt this platform without feeling they have accepted reduced expressive range.
This document is prescriptive. It specifies the JSON schema, the preset library, the easing matrix, the mask system, the per-character animation contract, the bounding-box math, and the QA gates that enforce all of the above. It is meant to be implementable as written.
1. Goals and Constraints
Goals
- Designer-authored, AI-scaled. Designers author templates with named animation presets per layer. The AI orchestrator composes presets into a per-banner timeline using a brand-voice-aware rationale, and never invents new motion.
- Two runtimes, one contract. A single JSON schema drives both the Konva browser preview and the Playwright-headless final render. What the reviewer sees is what ships.
- Production-grade motion vocabulary. Translation, scale, rotation, opacity. Masks (geometric and image). Per-character text animation. All four properties with industry-idiomatic easing.
- Composable, not procedural. A timeline is an ordered list of blocks. Each block applies one preset to one layer. The AI's job is to choose presets and order blocks — never to emit raw keyframes.
Hard constraints (these define the rails)
These come from the Round 1 research pass and are non-negotiable. Every primitive in this document is designed to fit inside them.
- Chrome Heavy Ad Intervention. An ad is unloaded if it exceeds:
- 4 MB cumulative network bandwidth
- 15 s of main-thread CPU within any rolling 30 s window
- 60 s of total main-thread CPU over the page lifetime
- Composite properties only. Animations use `transform` (translate, scale, rotate) and `opacity` exclusively. Animating `width`, `height`, `top`, `left`, `margin`, or `letter-spacing` forces layout recalculation and burns the CPU budget. The schema cannot express these.
- Duration. 15 s default, 30 s hard ceiling (CM360). Loops global, 3× maximum, must fit inside the duration ceiling.
- Initial load weight. 150 KB zip cap (CM360 / IAB LEAN). Polite load may add up to 2.2 MB.
- CDN-hosted runtime. GSAP loads from `s0.2mdn.net/ads/studio/cached_libs/` and does not count against the 150 KB cap.
- GSAP free tier only. No Club GreenSock plugins. SplitText is replaced by the MIT-licensed SplitType library.
Out of scope for V1
- Keyframe authoring UI. Designers compose presets in template code; no visual timeline editor.
- Custom easing curves. The GSAP stock catalog is the V1 surface.
- Lottie import or export. We do not ingest After Effects work.
- Interaction triggers (hover, click-to-play state changes). Timelines are time-anchored only.
- Audio, video, expandable, or rich-media formats.
2. The Schema
2.1 Why a custom schema (and not Lottie)
Lottie is the industry standard for vector animation interchange. It is also the wrong fit for V1. The Round 2 research pass landed on a custom schema for three reasons, all load-bearing:
- AI authoring. Lottie's bezier-tangent representation (`i.x`, `o.y` floating-point arrays) forces the orchestrator to emit cubic-bezier math instead of semantic strings like `"power2.out"`. This is the exact failure mode that hallucinations love.
- Diff legibility. A version diff that reads `"ease": "power2.out", "preset": "scale_pop"` is auditable. A diff that reads `"i": {"x": [0.25], "y": [0.1]}` is not. The version service depends on humans being able to read diffs.
- Composite-only enforcement. Lottie intrinsically mixes composite (`p`, `s`, `r`, `o`) with non-composite (`fc`, `sw`, vector paths, mattes, expressions). Stripping the non-composite half at export time is a translation layer we would have to write, maintain, and defend against drift.
The schema described below is inspired by Lottie's declarative timeline shape, but every field maps 1:1 to a GSAP call. The orchestrator produces JSON; the runtime hands it to GSAP without transformation.
2.2 Top-level shape
{
"version": "1.0.0",
"meta": {
"duration": 15.0, // seconds, ≤ 30
"loops": 1, // 1 = play once, 2 = play+repeat, 3 = play+repeat+repeat. Max 3.
"fps_target": 60
},
"blocks": [
// one entry per animation event, ordered by `start`
]
}
blocks is ordered, time-anchored, and never overlaps in semantics: each
block describes one preset applied to one layer over one interval. Multiple
blocks may run simultaneously (different layers animating in parallel is
the common case), but the timeline reads top-to-bottom in start order.
2.3 Block shape
{
"id": "block_hero_in", // stable id for diffs and overrides
"layer_id": "hero_image", // references BannerSpec layer
"preset": "scale_up_fade", // see Section 3
"start": 0.0, // seconds, relative to timeline 0
"duration": 0.8, // seconds, overrides preset default
"ease": "power2.out", // overrides preset default. Optional.
"stagger": null, // see Section 6 for per-character blocks
"mask_ref": null // see Section 5 for animated masks
}
Three rules govern the block:
- `preset` is required. No anonymous keyframes. If a designer needs a motion that isn't in the preset library, the answer is to add it to the library, not to inline keyframes.
- `duration` and `ease` are optional overrides. The preset's defaults are the right answer 95% of the time; the override is for the 5% where a designer has a specific reason.
- `start` is absolute, not relative. Block N+1 does not implicitly begin when block N ends. This sounds verbose, but it makes the diff trivial: editing block 3's duration doesn't ripple into block 4's start time.
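The override rules above can be sketched in TypeScript. The names `AnimationBlock`, `PresetDefaults`, and `resolveTiming` are illustrative assumptions rather than the contents of packages/types; the field names follow the block shape in Section 2.3, and the two default entries are copied from the Section 3.1 table.

```typescript
// Hypothetical types; field names follow the block shape in Section 2.3.
interface AnimationBlock {
  id: string;
  layer_id: string;
  preset: string;            // required: no anonymous keyframes
  start: number;             // absolute seconds from timeline 0
  duration?: number;         // optional override of the preset default
  ease?: string;             // optional override of the preset default
  stagger?: number | null;
  mask_ref?: string | null;
}

interface PresetDefaults {
  duration: number;
  ease: string;
}

// Two defaults copied from the Section 3.1 table, for illustration only.
const PRESET_DEFAULTS: Record<string, PresetDefaults | undefined> = {
  scale_up_fade: { duration: 0.8, ease: "power2.out" },
  scale_pop: { duration: 0.6, ease: "back.out(1.7)" },
};

// Resolve the timing a block runs with: block override first, preset default second.
function resolveTiming(block: AnimationBlock): { start: number; end: number; ease: string } {
  const preset = PRESET_DEFAULTS[block.preset];
  if (!preset) {
    throw new Error(`Block "${block.id}" references unknown preset "${block.preset}"`);
  }
  const duration = block.duration ?? preset.duration;
  return { start: block.start, end: block.start + duration, ease: block.ease ?? preset.ease };
}
```

Because `start` is absolute, changing one block's `duration` moves only that block's `end`; no other block needs to be re-resolved.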
2.4 Worked example
A 300×250 banner with hero image, headline, and CTA. Hero scales in, headline reveals per-character, CTA pops:
{
"version": "1.0.0",
"meta": { "duration": 4.0, "loops": 1, "fps_target": 60 },
"blocks": [
{
"id": "hero_in",
"layer_id": "hero",
"preset": "scale_up_fade",
"start": 0.0,
"duration": 0.8
},
{
"id": "headline_in",
"layer_id": "headline",
"preset": "fade_up_chars",
"start": 0.5,
"duration": 0.6,
"stagger": 0.02
},
{
"id": "cta_in",
"layer_id": "cta",
"preset": "scale_pop",
"start": 1.4,
"duration": 0.6
},
{
"id": "cta_pulse",
"layer_id": "cta",
"preset": "pulse_gentle",
"start": 2.2,
"duration": 1.5
}
]
}
A reviewer reading this diff knows exactly what changed. The orchestrator
emitting this JSON has four lookups and three integer-ish numbers to
produce per block. The runtime translates each block to one gsap.fromTo()
call against the preset's defined start and end states.
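A minimal sketch of that block-to-`fromTo` translation follows. It only builds the argument list for GSAP's `timeline.fromTo(targets, fromVars, toVars, position)` and does not import GSAP, so it can be read in isolation. The `PRESETS` table shape, the `data-layer-id` selector convention, and the `compileBlock` name are assumptions made for illustration.

```typescript
type Vars = Record<string, number | string>;

interface PresetStates {
  from: Vars;
  to: Vars;
  ease: string;
  duration: number;
}

// Start and end states copied from the Section 3.1 table.
const PRESETS: Record<string, PresetStates | undefined> = {
  scale_up_fade: { from: { scale: 0.8, opacity: 0 }, to: { scale: 1, opacity: 1 }, ease: "power2.out", duration: 0.8 },
  scale_pop: { from: { scale: 0.5, opacity: 0 }, to: { scale: 1, opacity: 1 }, ease: "back.out(1.7)", duration: 0.6 },
};

interface Block {
  id: string;
  layer_id: string;
  preset: string;
  start: number;
  duration?: number;
  ease?: string;
}

// The four arguments of one timeline.fromTo() call.
interface FromToCall {
  target: string;
  from: Vars;
  to: Vars;
  position: number;
}

function compileBlock(block: Block): FromToCall {
  const preset = PRESETS[block.preset];
  if (!preset) {
    throw new Error(`Unknown preset "${block.preset}" in block "${block.id}"`);
  }
  return {
    target: `[data-layer-id="${block.layer_id}"]`, // hypothetical selector convention
    from: { ...preset.from },
    // In GSAP 3, duration and ease live in the "to" vars of a fromTo tween.
    to: { ...preset.to, duration: block.duration ?? preset.duration, ease: block.ease ?? preset.ease },
    position: block.start, // absolute time on the timeline, in seconds
  };
}
```

With GSAP loaded, each descriptor would be applied as `timeline.fromTo(call.target, call.from, call.to, call.position)`.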
2.5 Loops
meta.loops is a global counter. loops: 3 plays the entire timeline
three times in total (one play plus two repeats); the cumulative duration must fit inside the IAB / CM360
ceiling (3 × meta.duration ≤ 30 s). Per-block loops are not supported
in V1 — they invite nested-infinite-loop bugs that an AI generator will
ship with surprising regularity.
For "always-on" effects that read as loops (the gentle pulse on a CTA, the float on a product image), use the preset's built-in yoyo (Section 3). The preset, not the meta loop counter, owns that behavior.
2.6 Where the schema lives in the BannerSpec
The animation timeline attaches to each Artboard:
interface Artboard {
artboard_id: string;
width: number;
height: number;
layers: ResolvedLayer[];
animation: AnimationTimeline; // ← this document
}
Each artboard has its own timeline because animation choices vary per size
(a 728×90 leaderboard cannot afford the same staggered entrance as a
300×600 half-page). The orchestrator emits one timeline per size as part
of its per-size copy decision. Rationale lands in the existing
ai_reasoning.animation_rationale field shared across sizes.
3. The Preset Library
25 named presets, organized by category. Every preset compiles to one or
more GSAP fromTo calls against composite properties only. Every preset
defines a start state, an end state, a default ease, and a default duration.
Designers reference presets by name; the AI orchestrator selects by name.
The selection criteria for inclusion:
- Maps to composite transforms or opacity only.
- Renders identically in Konva and in headless Chromium.
- Is a pattern that appears in production-quality display advertising, not just motion-design tutorial reels.
- Does not require a Club GreenSock plugin.
3.1 Entrance presets
| Preset | Start | End | Ease | Duration | Use |
|---|---|---|---|---|---|
| `fade_in` | opacity 0 | opacity 1 | `power1.inOut` | 0.8 s | Backgrounds, disclaimers, logos. |
| `slide_in_left` | x −100%, op 0 | x 0, op 1 | `power2.out` | 0.8 s | Hero imagery, body copy. |
| `slide_in_right` | x 100%, op 0 | x 0, op 1 | `power2.out` | 0.8 s | Side-panel reveals. |
| `slide_in_up` | y 100%, op 0 | y 0, op 1 | `power2.out` | 0.8 s | CTAs, bottom-anchored copy. |
| `slide_in_down` | y −100%, op 0 | y 0, op 1 | `power2.out` | 0.8 s | Top-anchored headers, badges. |
| `scale_up_fade` | scale 0.8, op 0 | scale 1, op 1 | `power2.out` | 0.8 s | Hero products, centered logos. |
| `scale_down_fade` | scale 1.2, op 0 | scale 1, op 1 | `power2.out` | 1.0 s | Lifestyle backgrounds settling in. |
| `scale_pop` | scale 0.5, op 0 | scale 1, op 1 | `back.out(1.7)` | 0.6 s | Badges, CTA buttons, price circles. |
3.2 Exit presets
| Preset | Start | End | Ease | Duration | Use |
|---|---|---|---|---|---|
| `fade_out` | opacity 1 | opacity 0 | `power1.inOut` | 0.6 s | Scene transitions, legacy copy. |
| `slide_out_left` | x 0, op 1 | x −100%, op 0 | `power2.in` | 0.6 s | Sweep clearing the frame. |
| `slide_out_right` | x 0, op 1 | x 100%, op 0 | `power2.in` | 0.6 s | Sweep clearing the frame. |
3.3 Emphasis presets
These are yoyo presets — they animate from base state to peak state and
return. The block's duration field is the full there-and-back time.
| Preset | State change | Ease | Duration | Use |
|---|---|---|---|---|
| `pulse_gentle` | scale 1 ↔ 1.05 | `sine.inOut` | 1.5 s | Sustained CTA attention. |
| `pulse_strong` | scale 1 ↔ 1.15 | `power2.inOut` | 0.8 s | Urgent promotional badges. |
| `float_vertical` | y 0 ↔ −10 px | `sine.inOut` | 2.0 s | Floating product imagery. |
3.4 Typography presets
All typography presets require text splitting. The stagger field on the
block controls the interval between consecutive characters/words/lines.
Splitting happens at runtime via SplitType — see Section 6.
| Preset | Start | End | Ease | Stagger | Split | Use |
|---|---|---|---|---|---|---|
| `fade_up_chars` | y 20, op 0 | y 0, op 1 | `power2.out` | 0.02 s | chars | Premium headlines. |
| `fade_up_words` | y 20, op 0 | y 0, op 1 | `power2.out` | 0.04 s | words | Subheads, longer copy. |
| `typewriter` | op 0 | op 1 | `steps(1)` | 0.04 s | chars | Tech, narrative, informative. |
| `scramble_chars` | text random | text final | `linear` | 0.03 s | chars | Cyber, high-tech promos. |
| `scale_pop_chars` | scale 0, op 0 | scale 1, op 1 | `back.out(1.5)` | 0.04 s | chars | Bold, energetic typography. |
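The stagger interval determines how long a split block occupies the timeline. The sketch below works through that arithmetic under the assumption that the runtime follows GSAP's numeric stagger behavior: each element's tween lasts `duration` seconds and starts `stagger` seconds after the previous one. The function name and the 24-character headline are hypothetical.

```typescript
// Total wall-clock time of a staggered block, assuming element i starts at
// i * stagger and each element animates for `duration` seconds.
function staggeredSpan(elementCount: number, duration: number, stagger: number): number {
  if (elementCount <= 0) return 0;
  return (elementCount - 1) * stagger + duration;
}

// A hypothetical 24-character headline with fade_up_chars at the Section 2.4 values.
const span = staggeredSpan(24, 0.6, 0.02); // (24 - 1) * 0.02 + 0.6, about 1.06 s
const endsAt = 0.5 + span;                 // the block starts at 0.5 s, so it finishes near 1.56 s
```

Under that assumption, the end of a split block depends on the resolved text length, so any check that a block fits inside `meta.duration` needs the character count as well as `start` and `duration`.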
3.5 Mask presets
Masks animate the visibility of a layer through a moving clip shape. See Section 5 for the full mask system; these presets reference it.
| Preset | Mask shape | Animates | Ease | Duration | Use |
|---|---|---|---|---|---|
| `mask_wipe_right` | rectangle | clip 0% → 100% width | `power2.inOut` | 1.0 s | Revealing new background. |
| `mask_wipe_up` | rectangle | clip 0% → 100% height | `power2.inOut` | 1.0 s | Rising imagery reveal. |
| `mask_circle_out` | circle | r 0 → max | `power3.inOut` | 1.2 s | Cinematic scene transitions. |
| `mask_text_reveal` | layer bbox | y 100% → 0% | `power3.out` | 0.8 s | Text rising from invisible floor. |
3.6 List/stagger presets
For multi-element layers (carousels, icon rows, bulleted lists). Stagger applies to the layer's children, not to characters.
| Preset | Start | End | Ease | Stagger | Use |
|---|---|---|---|---|---|
| `stagger_slide_in` | x −50 px, op 0 | x 0, op 1 | `power2.out` | 0.08 s | Bullets, multi-product rows. |
| `stagger_pop_up` | scale 0.8, op 0 | scale 1, op 1 | `back.out(1.5)` | 0.08 s | Social icons, logo lockups. |
3.7 Explicit exclusions
Presets in this list are common in tutorial-grade libraries but are not
shipped in V1. The rationale is recorded so the next person who asks
"why not jello?" has an answer.
- `jello`, `rubberBand`, `wobble`, `tada`, `headShake`, `swing` — multi-axis skew/scale combos that read as amateur in premium display. Animate.css ships them; production-quality ads don't.
- `shake_horizontal` and other shake patterns — would require GSAP's `rough` ease, which is excluded in Section 4. Shake patterns read as rendering errors more often than as deliberate emphasis.
- `blur_in` — animates `filter: blur(Npx)`. Filter is a composite-eligible property in Chromium, but it forces repaint, not just compositor work, and burns the CPU budget faster than transforms. The use case (premium reveals) is better served by `scale_down_fade` and `mask_circle_out`. Blur belongs in video, not display.
- Drop-shadow and box-shadow animation — same repaint cost as blur. If a layer needs a dimensional shadow that "appears," bake the shadow into a transparent asset and animate that asset's opacity.
- All Club GreenSock plugins — `MorphSVG`, `SplitText`, `Physics2D`, `DrawSVG`, `MotionPath`, `CustomEase`. V1 stays on the free GSAP tier with SplitType as the SplitText replacement.
4. Easing
Easing is the difference between mechanical motion and motion that reads as designed. The V1 surface is the GSAP stock ease catalog, restricted to the curves that actually appear in production display advertising.
4.1 Principles
- Entrance eases out. Elements arriving on stage decelerate so the eye can catch them.
- Exit eases in. Elements leaving accelerate away to clear the visual field.
- Emphasis uses `sine.inOut` or `power*.inOut`. Symmetric eases for symmetric (yoyo) motion.
- Linear is reserved for typewriters, progress bars, and continuous panning. Almost never the right answer for entrances or exits.
4.2 Easing matrix
| Category | Default | Alt 1 | Alt 2 |
|---|---|---|---|
| Entrance (translation) | `power2.out` | `power3.out` | `expo.out` |
| Entrance (scale) | `power2.out` | `back.out(1.5)` | `elastic.out(1, 0.5)` |
| Exit (translation) | `power2.in` | `power3.in` | `expo.in` |
| Exit (scale) | `power2.in` | `back.in(1.2)` | `power1.in` |
| Emphasis (pulse, float) | `sine.inOut` | `power1.inOut` | `power2.inOut` |
| Slow reveal (fade) | `power1.inOut` | `linear` | `sine.out` |
| Snappy state change | `expo.inOut` | `power4.inOut` | `steps(1)` |
| Per-character stagger | `power2.out` | `back.out(1.2)` | `linear` |
| Mask reveal | `power3.inOut` | `power2.inOut` | `expo.inOut` |
4.3 Excluded eases
The GSAP catalog includes curves that are rarely or never used in production display. They are hidden from the V1 surface to keep the orchestrator's decision space small.
- `bounce.in` / `bounce.out` — gravity-simulation cartoon physics. Reads as children's brand or casual gaming. `back` covers the use case with more taste.
- `slow` — cinematic speed-ramping. Lingers in the middle of the transition. Wrong for 15-second display.
- `rough` — randomized jitter. Reads as a rendering error.
- `circ` — mathematically rigid circular arc. The `power*` family feels more physical.
The accessible palette is: power1–power4, expo, sine, back,
elastic, linear, steps. Each with .in, .out, .inOut variants
where applicable. ~24 distinct eases — enough range for production motion
design, small enough that the AI doesn't get decision fatigue.
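One way to keep orchestrator output inside this palette is an allow-list check on the `ease` string. The sketch below is an assumption about how such a check could look, not the shipped validator; `isV1Ease` and the regular expressions are illustrative, and the accepted forms are the ones that appear in Sections 3 and 4.

```typescript
// Families with fixed curves: power1..power4, expo, sine.
const FIXED = /^(power[1-4]|expo|sine)\.(in|out|inOut)$/;
// back and elastic accept optional numeric config, e.g. back.out(1.7) or elastic.out(1, 0.5).
const CONFIGURABLE = /^(back|elastic)\.(in|out|inOut)(\(\s*\d+(\.\d+)?(\s*,\s*\d+(\.\d+)?)?\s*\))?$/;
// steps(n) takes a positive integer step count.
const STEPS = /^steps\([1-9]\d*\)$/;

// True when the ease string belongs to the V1 palette; bounce, slow, rough, and circ never match.
function isV1Ease(ease: string): boolean {
  return ease === "linear" || FIXED.test(ease) || CONFIGURABLE.test(ease) || STEPS.test(ease);
}
```

Rejecting unknown strings outright, rather than falling back to a default, keeps a mistyped or hallucinated ease visible in review instead of silently changing the motion.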
5. Masks
Masks are first-class in V1. They unlock the "reveal" patterns that read as premium motion (logo-shaped reveals, circular wipes, text rising from an invisible floor) and are the difference between "ad" and "ad you remember."
5.1 The mask as a layer property
A mask is not a standalone timeline layer. It is a property on the masked layer. Co-locating the mask with the layer it masks keeps z-index relationships implicit and keeps the diff readable.
{
"layer_id": "hero_image",
"mask": {
"type": "clip-path", // "clip-path" | "image"
"geometry": "circle", // for clip-path: "rect" | "circle" | "polygon"
"asset_id": null, // for type: "image", references the mask asset
"animation": {
"preset": "mask_circle_out",
"duration": 1.2,
"ease": "power3.inOut"
}
}
}
Blocks reference an animated mask via mask_ref pointing to the
masked layer's id. The block describes the layer's animation; the mask's
animation is internal to the mask object.
5.2 clip-path vs. mask-image: hit-testing decides
CSS gives us two mask primitives. They are not interchangeable.
- `clip-path` — actually clips the element's geometry. Pointer events outside the clip do not fire.
- `mask-image` — modifies the alpha channel only. The element's bounding box still receives pointer events, including in the "invisible" region.
For animated reveals over interactive elements — which is the most common
case, because CTAs are interactive and CTAs are the most-revealed thing —
clip-path is mandatory. mask-image lets users click an "invisible"
button that's mid-reveal, fires the click handler, and ships the wrong
analytics. This is a real bug we will not have because the schema doesn't
let us write it: mask.type: "clip-path" is the default and any geometric
mask animation uses it.
mask-image is reserved for non-interactive layers (background imagery,
decorative elements) where alpha-channel masking is the only way to
achieve the effect (e.g., a brushstroke wipe, a textured reveal).
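The rule above can be expressed as a small validation pass. This is a sketch under assumptions: the `interactive` flag on the layer is hypothetical (the spec does not say how interactivity is recorded), and `maskViolations` is an illustrative name. The `mask` fields follow the shape shown in Section 5.1.

```typescript
type MaskType = "clip-path" | "image";
type MaskGeometry = "rect" | "circle" | "polygon";

interface MaskSpec {
  type: MaskType;
  geometry: MaskGeometry | null; // used when type is "clip-path"
  asset_id: string | null;       // used when type is "image"
}

interface MaskedLayer {
  layer_id: string;
  interactive: boolean; // hypothetical flag: true for CTAs and other click targets
  mask: MaskSpec;
}

// Collect human-readable problems; an empty array means the mask is acceptable.
function maskViolations(layer: MaskedLayer): string[] {
  const problems: string[] = [];
  const { mask } = layer;
  if (layer.interactive && mask.type !== "clip-path") {
    problems.push(`${layer.layer_id}: interactive layers must use clip-path, not ${mask.type}`);
  }
  if (mask.type === "clip-path" && mask.geometry === null) {
    problems.push(`${layer.layer_id}: clip-path mask needs a geometry`);
  }
  if (mask.type === "image" && mask.asset_id === null) {
    problems.push(`${layer.layer_id}: image mask needs an asset_id`);
  }
  return problems;
}
```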
5.3 Konva ↔ HTML parity
Each mask type renders in both runtimes:
| Mask type | Konva | HTML |
|---|---|---|
| `clip-path` rect | `Group.clipFunc` with a rect path | CSS `clip-path: inset(...)` |
| `clip-path` circle | `Group.clipFunc` with an arc | CSS `clip-path: circle(...)` |
| `clip-path` polygon | `Group.clipFunc` with a polygon | CSS `clip-path: polygon(...)` |
| `image` | `globalCompositeOperation: 'source-in'` on cached group | CSS `mask-image: url(...)` |
The known parity risk is mask-image SVG scaling. Konva masks are
pixel-exact; CSS mask-size and mask-position can drift if the underlying
layer's box model changes between preview and render. The parity test
suite (Section 9) renders each mask preset in both runtimes and diffs the
output PNG. Drift > 2px on any sampled frame fails the test.
5.4 Animated mask geometry
The four mask presets in Section 3.5 cover the patterns that show up in production display:
- Linear wipe (`mask_wipe_right`, `mask_wipe_up`): rectangular clip-path animating one edge from 0% to 100%.
- Circular reveal (`mask_circle_out`): clip-path circle with radius animating from 0 to a value large enough to cover the layer's bounding box.
- Text rising from a floor (`mask_text_reveal`): clip-path rect fixed at the layer's bbox; text inside animates `y: 100% → 0%`. The mask itself doesn't animate — the masked content does.
Mask geometry that requires designer authorship (logo-shaped masks,
custom polygon shapes) is supported by mask.geometry: "polygon" with
a path string supplied by the template, but is not part of any preset.
The orchestrator does not invent mask shapes.
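For the two geometric patterns, the CSS values can be computed directly. The sketch below assumes the circle is centered on the layer and that the wipe reveals from the left edge; the helper names are illustrative. The "large enough" radius for `mask_circle_out` is half the layer's diagonal, rounded up.

```typescript
// Smallest integer radius (px) of a circle centered on the layer that covers its whole bounding box.
function coveringRadius(width: number, height: number): number {
  return Math.ceil(Math.hypot(width, height) / 2);
}

// CSS clip-path value for mask_circle_out at a given radius, centered on the layer.
function circleClip(radiusPx: number): string {
  return `circle(${radiusPx}px at 50% 50%)`;
}

// CSS clip-path value for mask_wipe_right at `progress` in [0, 1]:
// the right inset shrinks from 100% (fully hidden) to 0% (fully revealed).
function wipeRightClip(progress: number): string {
  const clamped = Math.min(1, Math.max(0, progress));
  const rightInset = Math.round((1 - clamped) * 100);
  return `inset(0 ${rightInset}% 0 0)`;
}
```

For a 300×250 layer, `coveringRadius(300, 250)` is 196, so the reveal would animate from `circle(0px at 50% 50%)` to `circle(196px at 50% 50%)`.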
6. Per-Character Text Animation
Per-character animation is the difference between "templated copy" and "designed message." It is also the single most performance-sensitive primitive in the V1 system. This section is precise because it has to be.
6.1 The split happens at runtime, in the DOM, via SplitType
For the headless Chromium render (the one that ships), text splitting happens in the DOM at runtime using SplitType. The reasons are non-negotiable:
- Accessibility. Screen readers must read the headline as one coherent string, not as 47 phonetic letters. SplitType supports the ARIA pattern (see 6.2). Canvas-rendered text destroys accessibility outright.
- DOM-based QA. Ad-server review bots parse the DOM to verify text content. Canvas text is invisible to them and triggers rejection.
- Text selection and SEO. Native browser text selection works on DOM text. Canvas text does not select.
For the Konva preview (the one designers and reviewers see), per-character
positioning is computed at spec-resolution time from Dropflow's glyph
positions. Konva renders each character as a Konva.Text node positioned to
match Dropflow's output. This preserves the preview-render parity
contract — the preview is positionally identical to what SplitType
produces at runtime, sub-pixel kerning and ligatures aside.
The division of labor:
- Dropflow (spec time, both runtimes): computes glyph positions used to drive Konva preview rendering and to validate the assumption that SplitType will produce the same layout.
- SplitType (render time, Chromium only): does the actual DOM split for the GSAP-driven animation.
6.2 The accessibility contract
Splitting text into per-character spans destroys its semantic continuity. The compilation engine must apply the standard ARIA mitigation, and the schema does not let the designer or the orchestrator forget:
For every layer where the block's preset is in Section 3.4 (typography
presets requiring char-split), the export pipeline must:
- Set `aria-label="<original text content>"` on the parent layer element.
- Set `aria-hidden="true"` on every SplitType-generated child span.
This is automated, not designer-authored. A QA gate (Section 9) verifies that every char-split layer in the exported HTML has the parent label and the hidden children before the export passes.
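The target markup of that contract can be illustrated with a string-level sketch. It is an assumption about the output shape, not a reproduction of SplitType's actual wrappers or class names, and `renderCharSplit` is a hypothetical helper: the parent carries the original text as its label and every per-character span is hidden from assistive technology.

```typescript
// Escape text for safe use in HTML content and double-quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the contract's markup: aria-label on the parent, aria-hidden on every character span.
function renderCharSplit(text: string): string {
  const spans = Array.from(text) // iterate by code point so surrogate pairs stay intact
    .map((ch) => `<span aria-hidden="true">${escapeHtml(ch)}</span>`)
    .join("");
  return `<div aria-label="${escapeHtml(text)}">${spans}</div>`;
}
```

A gate that checks the exported HTML can compare each char-split layer against this shape: one label equal to the original text, and no child span without `aria-hidden="true"`.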
6.3 The performance ceiling
Per-character animation creates one DOM node per character. Every node becomes its own GPU composite layer when animated via transform. Mobile browsers have a hard ceiling on simultaneous composite layers, and breaching it causes jank visible in the final render.
The V1 ceiling: 150 simultaneously animated character nodes.
This is not a recommendation; it's a QA gate. If a banner's exported timeline has more than 150 char-split nodes animating in a single 30-second window, the export fails.
Implications for the orchestrator:
- A 60-char headline with `fade_up_chars` consumes 60 of the budget.
- A second char-split element (a 40-char subhead with `fade_up_chars`) brings the total to 100.
- A third char-split element is risky. The orchestrator should prefer `fade_up_words` (5–10 nodes) for subheads and reserve char-splits for headlines.
Body copy and legal disclaimers must never use char-split. The orchestrator
selects fade_up_words (the only word-split preset in Section 3.4) for any
layer whose resolved text exceeds 80 characters, regardless of designer preset choice.
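The budget arithmetic and the 80-character downgrade can be sketched as follows. This is a simplified reading, stated as an assumption: it counts one node per character of resolved text (matching the "60-char headline consumes 60" example) and sums over the whole timeline instead of tracking time windows. The type and function names are hypothetical.

```typescript
// The four typography presets from Section 3.4 that split by character.
const CHAR_SPLIT_PRESETS = new Set(["fade_up_chars", "typewriter", "scramble_chars", "scale_pop_chars"]);
const MAX_CHARS_FOR_CHAR_SPLIT = 80;
const CHAR_NODE_CEILING = 150;

interface TextBlock {
  layer_id: string;
  preset: string;
  text: string; // resolved copy for this layer
}

// Downgrade char-split presets on long copy to fade_up_words, regardless of the designer's choice.
function effectivePreset(block: TextBlock): string {
  if (CHAR_SPLIT_PRESETS.has(block.preset) && block.text.length > MAX_CHARS_FOR_CHAR_SPLIT) {
    return "fade_up_words";
  }
  return block.preset;
}

// Count one node per character for every block that still char-splits after the downgrade.
function charNodeCount(blocks: TextBlock[]): number {
  return blocks
    .filter((b) => CHAR_SPLIT_PRESETS.has(effectivePreset(b)))
    .reduce((sum, b) => sum + b.text.length, 0);
}

function withinCharBudget(blocks: TextBlock[]): boolean {
  return charNodeCount(blocks) <= CHAR_NODE_CEILING;
}
```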
6.4 Concrete per-character parameters
The defaults below are baked into the preset library and should not be overridden lightly. They reflect production-grade motion design, not arbitrary timings:
| Pattern | Stagger | Ease | Distance / scale | Max chars |
|---|---|---|---|---|
| Typewriter | 0.03–0.05 s | `steps(1)` | n/a | 80 |
| Fade-up | 0.02 s | `power2.out` | 15–20 px | 80 |
| Scale-pop | 0.04 s | `back.out(1.5)` | scale 0 → 1 | 50 |
| Scramble | 0.03 s | `linear` | n/a (text mutation) | 40 |
7. The Animated Bounding Box
The most consequential and least obvious part of the animation system is the asset-sizing problem.
7.1 The problem
A hero image animating with scale_down_fade starts at scale: 1.2 and
ends at scale: 1.0. The source image must be sized for the largest
frame — 1.2 × layer_width — not the final frame. Ship the source at the
final-frame size and the first frame is upscaled, blurry, and visibly
broken.
The same logic applies to:
- Translation (`slide_in_*`): source must cover the layer's position at every keyframe, including the offscreen start.
- Rotation: a rotated rectangle's axis-aligned bounding box grows with the angle. A 600×600 image rotated 15° needs a ~735×735 source, since 600 × (cos 15° + sin 15°) ≈ 734.8.
- Compound transforms (scale + rotate + translate, e.g. an entrance that combines `scale_up_fade` with a small rotation): the bounding box is the union of every transformed corner at every sampled time.
7.2 The math
For a layer with rectangular bounding box, given a transform with
translation (tx, ty), scale (sx, sy), and rotation θ at time t:
For each corner (cx, cy) of the base rectangle:
x'(t) = tx(t) + sx(t) · (cx · cos(θ(t)) − cy · sin(θ(t)))
y'(t) = ty(t) + sy(t) · (cx · sin(θ(t)) + cy · cos(θ(t)))
The layer's axis-aligned bounding box at time t is the min/max of
x'/y' over the four corners. The animation's bounding box is the
union of every per-t bounding box.
Analytical solution of the extrema is computationally expensive and brittle across compound transforms. V1 uses discrete sampling: 30 samples per second of animated duration, taking the union of all sampled bounding boxes. At 30 fps over a 1.0-second animation, that's 30 four-corner evaluations, a negligible cost. Because an extremum can fall between two samples, the sampled union can under-approximate the true bbox, but by at most one sample interval's worth of motion, which is well within tolerance for source-asset sizing.
7.3 unionBoundingBox() — one function, many call sites
A single function in packages/layout-engine:
function unionBoundingBox(
layer: ResolvedLayer,
blocks: AnimationBlock[],
fps: number = 30
): BoundingBox;
It is called from:
- Smart asset selection (`packages/api-lib/asset-selection`): picks source crop sized for the union bbox, not the layer rect.
- Render worker (`packages/render-worker`): fails fast if the source asset's dimensions are smaller than the union bbox.
- QA gate (`packages/qa-gates`): flags upscaled frames before export, with the field path of the offending layer.
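The sampling approach from Section 7.2 can be sketched with a simplified signature. Instead of `ResolvedLayer` and `AnimationBlock[]`, this version takes the layer size and a hypothetical `transformAt(t)` callback, and it assumes corner coordinates are measured from the layer's center (the assumed transform origin). It applies the corner formula as written in Section 7.2; `sampleUnionBox` is an illustrative name, not the function in packages/layout-engine.

```typescript
interface Transform {
  tx: number;
  ty: number;
  sx: number;
  sy: number;
  rotation: number; // radians
}

interface Box {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

// Union of the axis-aligned boxes of a width x height layer sampled over [0, duration].
function sampleUnionBox(
  width: number,
  height: number,
  transformAt: (t: number) => Transform,
  duration: number,
  fps: number = 30
): Box {
  const hw = width / 2;
  const hh = height / 2;
  const corners: Array<[number, number]> = [[-hw, -hh], [hw, -hh], [hw, hh], [-hw, hh]];
  const steps = Math.max(1, Math.ceil(duration * fps));
  const box: Box = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };
  for (let i = 0; i <= steps; i++) {
    const t = (i / steps) * duration; // includes both the first and the last frame
    const { tx, ty, sx, sy, rotation } = transformAt(t);
    const cos = Math.cos(rotation);
    const sin = Math.sin(rotation);
    for (const [cx, cy] of corners) {
      const x = tx + sx * (cx * cos - cy * sin);
      const y = ty + sy * (cx * sin + cy * cos);
      box.minX = Math.min(box.minX, x);
      box.maxX = Math.max(box.maxX, x);
      box.minY = Math.min(box.minY, y);
      box.maxY = Math.max(box.maxY, y);
    }
  }
  return box;
}
```

As a usage example, a 300×250 layer whose scale runs linearly from 1.2 to 1.0 yields a union box 360 px wide and 300 px tall, which is the size the source crop would need.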
7.4 What changes in the BannerSpec
A resolved layer gains a required_source_size field:
interface ResolvedLayer {
// ... existing fields
required_source_size?: { width: number; height: number };
}
The orchestrator does not populate this field. The layout engine computes it after resolving the layer's animation blocks. The render worker reads it and fails the render if the actual source is smaller. The asset selector reads it when requesting a crop from the asset service.
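The render worker's comparison can be sketched as a small helper that reports the shortfall per axis. The `Size` type and `sourceDeficit` name are assumptions for illustration; the required size is rounded up so a fractional requirement never passes on a source that is one pixel short.

```typescript
interface Size {
  width: number;
  height: number;
}

// Returns null when the source is large enough, otherwise the missing pixels per axis.
function sourceDeficit(required: Size, actual: Size): Size | null {
  const width = Math.max(0, Math.ceil(required.width) - actual.width);
  const height = Math.max(0, Math.ceil(required.height) - actual.height);
  return width === 0 && height === 0 ? null : { width, height };
}
```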
8. Library and Runtime Mechanics
8.1 GSAP via the Google CDN
GSAP 3 loads from https://s0.2mdn.net/ads/studio/cached_libs/gsap_3.9.1_min.js.
This URL is whitelisted by CM360, which means GSAP's bundle does not count
against the 150 KB initial-load cap. The HTML export injects exactly this
URL into the document head. Local-relative paths or other CDNs (including
unpkg, jsDelivr) cause CM360 to flag the library as a 4th-party call and
count its weight, which reliably fails the 150 KB gate.
8.2 SplitType in place of SplitText
The MIT-licensed SplitType library (~2 KB minified) replaces GSAP's Club-only SplitText. SplitType is bundled into the exported HTML, not loaded from a CDN, because no major ad CDN whitelists it. It is small enough to fit within the 150 KB budget alongside the rest of the banner. The export pipeline applies the aria-label/aria-hidden contract from 6.2 in the same pass that calls SplitType.
8.3 No localStorage, no sessionStorage
CM360 rejects creatives that reference browser storage APIs. The export
pipeline's static-analysis pass scans the compiled JS bundle for any
reference to localStorage or sessionStorage and fails the export
if found. None of our code uses these APIs; this gate exists to catch
a third-party dependency drift.
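A minimal sketch of such a scan, assuming a plain text search over the compiled bundle (the real pass may work on an AST instead). `findStorageReferences` is a hypothetical name.

```typescript
// Return the distinct storage API names referenced anywhere in a compiled bundle.
function findStorageReferences(bundleSource: string): string[] {
  const matches = bundleSource.match(/\b(?:localStorage|sessionStorage)\b/g) ?? [];
  return Array.from(new Set(matches)).sort();
}
```

A text search of this kind is conservative in one direction and blind in the other: it flags mentions inside comments and string literals, and it misses computed access such as `window["local" + "Storage"]`.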
8.4 will-change: transform on animated layers
The export pipeline annotates every layer that has at least one
animation block with will-change: transform (and opacity if
applicable). This promotes the layer to its own GPU compositor layer
ahead of time, preventing first-frame jank from late-binding the layer
when GSAP starts animating it.
8.5 prefers-reduced-motion
The exported HTML includes a @media (prefers-reduced-motion: reduce)
block that snaps every animation to its end state instantly. This serves
two purposes:
- Accessibility: users who have requested reduced motion see the final composition immediately.
- Headless PNG capture: Playwright launches with `--force-prefers-reduced-motion`, which deterministically forces the banner to its final-frame state. Capturing the static backup PNG becomes a single screenshot with no `waitForTimeout`, which removes the largest source of flakiness in the render pipeline.
9. QA Gates
Every gate below runs at export time. An export that fails any gate is held — the spec is not corrupt, but it's not shippable, and the review UI flags it for human resolution.
9.1 Schema gates
- G1 — Composite-only. Every block's preset is in the V1 library. Inline keyframes are not expressible in the schema, so this gate is enforced by the schema's TypeScript types, not by a runtime check.
- G2 — Duration ceiling. `meta.duration × meta.loops ≤ 30`.
- G3 — Block ordering. Blocks are sorted by `start`. No block's `start + duration` exceeds `meta.duration`.
- G4 — Loop count. `meta.loops` is in `{1, 2, 3}`.
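Gates G2 through G4 are simple enough to sketch as one runtime check. This is an illustration under assumptions: it expects each block's `duration` to be already resolved from its preset, it ignores the extra time a stagger adds, and the type and function names are hypothetical.

```typescript
interface GateBlock {
  id: string;
  start: number;
  duration: number; // assumed already resolved from the preset default or the override
}

interface GateTimeline {
  meta: { duration: number; loops: number };
  blocks: GateBlock[];
}

const EPSILON = 1e-9; // tolerance for floating-point sums such as 2.2 + 1.5

// Returns one message per failed gate check; an empty array means G2, G3, and G4 pass.
function schemaGateFailures(timeline: GateTimeline): string[] {
  const failures: string[] = [];
  const { duration, loops } = timeline.meta;
  if (duration * loops > 30 + EPSILON) {
    failures.push(`G2: ${duration} s x ${loops} loops exceeds the 30 s ceiling`);
  }
  timeline.blocks.forEach((block, i) => {
    const previous = i > 0 ? timeline.blocks[i - 1] : undefined;
    if (previous && block.start < previous.start) {
      failures.push(`G3: block "${block.id}" starts before "${previous.id}"`);
    }
    if (block.start + block.duration > duration + EPSILON) {
      failures.push(`G3: block "${block.id}" ends after meta.duration`);
    }
  });
  if (loops !== 1 && loops !== 2 && loops !== 3) {
    failures.push(`G4: meta.loops must be 1, 2, or 3 (got ${loops})`);
  }
  return failures;
}
```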
9.2 Performance gates
- G5 — Char-split ceiling. Sum of animated char-split nodes across all simultaneously-running blocks in any 30-second window ≤ 150.
- G6 — Weight budget. Final zipped HTML ≤ 150 KB initial load.
- G7 — No storage APIs. Static analysis finds no reference to `localStorage` or `sessionStorage`.
- G8 — GSAP via CDN. The exported HTML loads GSAP from `s0.2mdn.net`, not from a local path.
9.3 Asset gates
- G9 — Source size. Every layer's actual source asset dimensions are ≥ the `required_source_size` computed by `unionBoundingBox`. Fail with the layer id and the deficit.
- G10 — Crossorigin. Every image element has `crossorigin="anonymous"` (required for Konva canvas reads in the preview).
9.4 Accessibility gates
- G11 — Aria contract. Every layer with a char-split preset has `aria-label` set to the original text on the parent and `aria-hidden="true"` on every child span.
9.5 Parity gates
- G12 — Konva ↔ Playwright pixel diff. For each preset, the parity test suite renders a reference banner in both runtimes and compares PNG output. Per-frame pixel diff > 2 px on any sampled keyframe fails the build (not the export — this is a development-time gate, not a per-banner gate).
10. Open Questions
These were debated during the research and design passes and are deferred, either to a later V1 iteration or to V2. They are tracked here so they don't get lost.
- Interaction triggers. V1 timelines are time-anchored. The Bannerflow pattern of decoupling animation from text content supports localization but not user interaction. When V2 introduces hover, click-to-expand, or in-banner video, the schema gains an event-trigger model that block references can hook into.
- Custom eases. GSAP's `CustomEase` is Club-only. If a designer ever needs a custom curve, V2 either licenses Club GreenSock or implements cubic-bezier eases via the free GSAP `power.in/out` family with bespoke control points. The schema field is already `string`; this is purely an authoring decision, not a schema change.
- Mask shape authoring. V1 supports rect, circle, and designer-supplied polygons. Logo-shaped masks (SVG paths) work as `mask.geometry: "polygon"`, but there is no authoring UI — designers paste path strings into the template. V2 ships a small editor for this.
- Per-block loops. Not in V1. If a use case for per-block loops emerges (a single icon that pulses 5× while the rest of the timeline plays through once), it's expressible as multiple back-to-back blocks in V1. Native per-block looping waits.
- The 150-character ceiling on char-split. This is the conservative number from research. A production benchmarking pass on real hardware (low-end Android, mid-tier iPhone) may push it higher or lower. Treat the current ceiling as a placeholder; gate G5 should be recalibrated against those empirical results.
11. What This Replaces and What It Defers
This document replaces:
- The animation section of `ARCHITECTURE.md`. The original sketch predates the preset library and the bounding-box concept.
- The implicit `fade_in / hold / fade_out` preset set in the vertical slice. Those three names are kept (mapped to `fade_in`, no-op, `fade_out`) but every other preset in the slice is replaced by an explicit entry from Section 3.
This document does not specify:
- The timeline authoring UI (V2).
- The Figma sync path for animation specs (V2+).
- The trafficking-sheet representation of animation choices (V1, but documented separately in `TRAFFICKING_V1.md` when that exists).
The next implementation step after this document is approved is to
flesh out packages/types with the schema in Section 2.2/2.3 and to
seed packages/layout-engine with unionBoundingBox(). Everything
else (preset library implementation, QA gates, runtime mechanics)
hangs off those two artifacts.