banner_studio/EXECUTIVE_SUMMARY.md
Simeon Schecter 988a47c797 Initial commit: Day 1 + Day 2 of the vertical slice
Day 1 (monorepo + Node layout engine):
- Turborepo + pnpm workspaces with apps/web, apps/render-worker, and
  packages for types, layout-engine, prompts, api-lib.
- @banner-studio/types: BannerSpec contract, every layer kind, ResolvedLayer,
  zod schemas mirroring each interface.
- @banner-studio/layout-engine: Dropflow WASM wrapper, text measurement,
  shrink-to-fit, push_siblings, resolveLayout. Snapshot-tested.

Day 2 (browser parity + AI pipeline):
- Layout engine ./browser subpath: same resolveLayout in the browser via
  Dropflow WASM build. Quarantined wasm-locator import (dropflow 0.5.1
  exports gap).
- Cross-group push_siblings bug fix: deltas now thread through group
  recursion via a shared accumulator; regression test added.
- DEMO_TEMPLATE_300x250 promoted to packages/layout-engine/src/templates/.
- @banner-studio/prompts: versioned extract + generate prompts with
  zod-defined tool schemas (claude-sonnet-4-6, forced tool-use).
- @banner-studio/api-lib: CSV feed loader, extract/generate/route-node/
  assemble agents, orchestrator returning fully-resolved BannerSpec.
  Generate agent retries on character-limit overflow.
- apps/web (Next.js 14 App Router): /api/generate route, /parity diff page,
  promise-singleton browser engine init.
- feeds/demo.csv with five hand-authored rows of varied length.
- SLICE_DEVIATIONS.md documents the five intentional gaps from
  ARCHITECTURE.md with V1 reversal paths.

Verified end-to-end: POST /api/generate against the live Claude API
returns three resolved BannerSpecs and two honestly-skipped rows
(overflow after two attempts). 26 unit + integration tests passing.
2026-05-15 10:25:21 -04:00


EXECUTIVE_SUMMARY.md

A briefing document for non-engineering stakeholders. Source material for slides, leadership decks, and client conversations.


The product in one paragraph

A banner production platform built on a clean split of labor: humans design the templates, AI scales them. A designer builds a master template with character constraints, layout rules, and approved asset crops once. Producers then feed in either a structured data feed or a brief, and AI generates the full size matrix — every artboard, every copy variant, every market. Reviewers edit inline, approve, and export production-ready HTML5 banners that meet IAB and ad server specs. Every human edit is logged, every version is preserved, and re-generation never destroys human work.


The problem this solves

Three failure modes plague existing creative automation platforms:

Fragility at the data boundary. Production environments break on minor feed schema drift. A misformatted field corrupts the generation pipeline. Recovery requires manual intervention. We solve this with strict ingestion validation and a dead letter queue — malformed records are isolated, the primary pipeline never sees corrupted input.
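For technical readers, the dead letter queue idea can be sketched in a few lines. This is a minimal illustration, not the project's ingestion code: the `FeedRecord` fields, `toRecord`, and `ingest` are hypothetical names, and the real slice defines its schemas with zod rather than hand-written checks.

```typescript
// Hypothetical feed record shape; the field names are illustrative only.
interface FeedRecord {
  sku: string;
  headline: string;
  price: number;
}

interface DeadLetter {
  rowIndex: number;
  raw: unknown;
  reasons: string[];
}

// Returns a typed record, or the list of reasons the row was rejected.
function toRecord(raw: unknown): FeedRecord | string[] {
  if (typeof raw !== "object" || raw === null) return ["row is not an object"];
  const row = raw as Record<string, unknown>;
  const reasons: string[] = [];
  const { sku, headline, price } = row;
  if (typeof sku !== "string" || sku.trim() === "") reasons.push("sku must be a non-empty string");
  if (typeof headline !== "string" || headline.trim() === "") reasons.push("headline must be a non-empty string");
  if (typeof price !== "number" || !Number.isFinite(price)) reasons.push("price must be a finite number");
  if (reasons.length > 0) return reasons;
  return { sku: sku as string, headline: headline as string, price: price as number };
}

// Valid rows continue to generation; malformed rows are set aside with their reasons.
function ingest(rows: unknown[]): { accepted: FeedRecord[]; deadLetters: DeadLetter[] } {
  const accepted: FeedRecord[] = [];
  const deadLetters: DeadLetter[] = [];
  rows.forEach((raw, rowIndex) => {
    const result = toRecord(raw);
    if (Array.isArray(result)) deadLetters.push({ rowIndex, raw, reasons: result });
    else accepted.push(result);
  });
  return { accepted, deadLetters };
}
```

The point of the shape is that `accepted` is the only thing the generation pipeline ever receives, while `deadLetters` keeps enough detail for a producer to fix the source feed.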

Destructive regeneration. When AI regenerates banners against a new feed, existing platforms overwrite human refinements. Hours of creative director (CD) work disappear. We solve this with event-sourced version control and a separate override layer: human edits live independently of generated content and survive every regeneration, with explicit conflict resolution when AI and human have changed the same field.
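A rough sketch of the override idea, for readers who want to see the mechanics. The `Override` shape and `applyOverrides` are hypothetical names chosen for illustration; the assumption here is that each override remembers the generated value the human was looking at, so a later regeneration can tell whether the AI has also changed that field.

```typescript
type Fields = Record<string, string>;

// Hypothetical override record: the human's value plus the generated value it replaced.
interface Override {
  value: string;
  basedOn: string;
}

interface MergeResult {
  resolved: Fields;    // what the reviewer sees after regeneration
  conflicts: string[]; // fields where both AI and human changed the value
}

function applyOverrides(generated: Fields, overrides: Record<string, Override>): MergeResult {
  const resolved: Fields = { ...generated };
  const conflicts: string[] = [];
  for (const field of Object.keys(overrides)) {
    const override = overrides[field];
    // The human edit is never discarded; it stays in place pending review.
    resolved[field] = override.value;
    // If the newly generated value differs from what the human edited against,
    // the AI has moved too, so the field is flagged for explicit resolution.
    if (field in generated && generated[field] !== override.basedOn) {
      conflicts.push(field);
    }
  }
  return { resolved, conflicts };
}
```

Because overrides are stored apart from generated content, regeneration only replaces the `generated` input; the overrides are re-applied on top each time.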

Bloated, non-compliant output. Code generation tools inject proprietary libraries, fail to optimize assets, and ship files that violate ad server weight limits. We solve this with profile-driven export — each ad server's specific requirements (click tag pattern, weight limit, animation duration, backup PNG) are codified, and every export runs through automated QA gates before a zip is produced.
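As an illustration of what "codified" could mean, the sketch below models a profile as data and a QA gate as a function that returns the reasons an export is blocked. Every name and number in it is an assumption for the example: the limits in `exampleProfile` are placeholders, not the actual IAB or CM360 values, and the click tag check is a deliberately simple pattern match.

```typescript
interface ExportProfile {
  name: string;
  maxZipBytes: number;
  maxAnimationSeconds: number;
  requiresBackupPng: boolean;
  clickTagPattern: RegExp;
}

interface ExportCandidate {
  zipBytes: number;
  animationSeconds: number;
  hasBackupPng: boolean;
  indexHtml: string;
}

// Placeholder values for illustration only; real limits come from each ad server's spec.
const exampleProfile: ExportProfile = {
  name: "example-profile",
  maxZipBytes: 150 * 1024,
  maxAnimationSeconds: 15,
  requiresBackupPng: true,
  clickTagPattern: /\bclickTag\b/,
};

// Returns one message per failed gate; an empty list means the export may proceed.
function runQaGates(profile: ExportProfile, candidate: ExportCandidate): string[] {
  const failures: string[] = [];
  if (candidate.zipBytes > profile.maxZipBytes) {
    failures.push(`zip is ${candidate.zipBytes} bytes, limit is ${profile.maxZipBytes}`);
  }
  if (candidate.animationSeconds > profile.maxAnimationSeconds) {
    failures.push(`animation runs ${candidate.animationSeconds}s, limit is ${profile.maxAnimationSeconds}s`);
  }
  if (profile.requiresBackupPng && !candidate.hasBackupPng) {
    failures.push("backup PNG is missing");
  }
  if (!profile.clickTagPattern.test(candidate.indexHtml)) {
    failures.push("click tag pattern not found in index.html");
  }
  return failures;
}
```

Adding a new ad server under this shape would mean adding a new profile object rather than new export code.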


What the AI actually does

The system uses a four-agent pipeline, not a single monolithic prompt. Each stage has a typed interface; failures in one stage cannot contaminate others.

  1. Extract — parses a brief or feed record into a structured context object.
  2. Generate — produces copy variants against character constraints, brand voice, and locked-copy fields.
  3. Route — selects assets via deterministic metadata query against the variant group library. This step has no AI involvement at all; it is a database query.
  4. Assemble — constructs the final banner specification with timing, animation preset selection, and a reasoning log.

The reasoning log is visible in the review UI. When a CD asks "why did the system pick this logo lockup," there is a literal answer.
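The four stages and the reasoning log can be sketched as typed functions. This is a simplified illustration under stated assumptions: the type names, the `Stages` interface, and the stub `demoStages` are hypothetical, the stubs stand in for real model calls (which would be asynchronous), and the actual `BannerSpec` contract is not reproduced here.

```typescript
type StageName = "extract" | "generate" | "route" | "assemble";
interface ReasoningEntry { stage: StageName; note: string; }

interface Context { product: string; maxHeadlineChars: number; }
interface Copy { headline: string; }
interface Assets { logoId: string; }
interface Spec { headline: string; logoId: string; reasoning: ReasoningEntry[]; }

interface Stages {
  extract(input: string): Context;
  generate(ctx: Context): Copy;
  route(ctx: Context): Assets; // deterministic lookup, no model call
  assemble(copy: Copy, assets: Assets, log: ReasoningEntry[]): Spec;
}

function runPipeline(stages: Stages, input: string): Spec {
  const log: ReasoningEntry[] = [];
  const ctx = stages.extract(input);
  log.push({ stage: "extract", note: `product=${ctx.product}` });

  const copy = stages.generate(ctx);
  // Programmatic check after generation: length is verified in code, not trusted.
  if (copy.headline.length > ctx.maxHeadlineChars) {
    throw new Error(`headline exceeds ${ctx.maxHeadlineChars} characters`);
  }
  log.push({ stage: "generate", note: `headline has ${copy.headline.length} chars` });

  const assets = stages.route(ctx);
  log.push({ stage: "route", note: `matched ${assets.logoId} by metadata` });

  return stages.assemble(copy, assets, log);
}

// Stub stages so the sketch runs without any external service.
const demoStages: Stages = {
  extract: (input) => ({ product: input.trim(), maxHeadlineChars: 30 }),
  generate: (ctx) => ({ headline: `Meet the new ${ctx.product}` }),
  route: (ctx) => ({ logoId: `logo-for-${ctx.product.toLowerCase()}` }),
  assemble: (copy, assets, log) => ({
    headline: copy.headline,
    logoId: assets.logoId,
    reasoning: [...log, { stage: "assemble", note: "used default animation preset" }],
  }),
};
```

Because each stage only sees the typed output of the one before it, a stage can be swapped or retried without touching the others, and the log entry for `route` records a rule match rather than a model decision.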

The architecture is deliberate about what AI does not do: AI does not write to the database, AI does not creatively choose assets (asset selection is rule-based metadata matching), AI does not bypass character limits (programmatic validation runs after every generation), and AI never has the final say on output (every banner is human-reviewed before export).


What makes this defensible

The text group system. Variable-length copy in absolutely-positioned banner layouts is the central technical problem of dynamic creative optimization. Most existing tools either ignore it (Bannerify exports rigid Figma layouts that break on text changes), partially solve it for static formats (Abyssale), or solve it inside heavyweight enterprise platforms with months of onboarding (Celtra).

Our approach: a WebAssembly-compiled headless layout engine (Dropflow) runs identical text measurement in both the designer's canvas preview and the server-side render worker. When a headline grows from 30 to 65 characters, the engine reduces font size to fit, and if the font hits its minimum, expands the container and cascades the layout — moving subheadlines, CTAs, and disclaimers along defined push rules with hard ceilings. The reviewer sees the same pixels the final HTML5 banner will produce.
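The shrink, expand, and cascade behavior described above can be approximated in a short sketch. It is not the engine's code and does not use Dropflow's API: `estimateWidth` is a crude stand-in for real text measurement (it assumes every character is half the font size wide), and `TextBox`, `Sibling`, `fitText`, and `pushSiblings` are hypothetical names.

```typescript
interface TextBox {
  width: number;
  height: number;
  fontSize: number;    // designed size
  minFontSize: number; // floor for shrink-to-fit
  lineHeight: number;  // unitless multiplier
}

interface Sibling { id: string; y: number; maxY: number; }

// Stand-in for real measurement: assumes an average glyph width of 0.5em.
const estimateWidth = (text: string, fontSize: number): number => text.length * fontSize * 0.5;

function heightNeeded(text: string, fontSize: number, box: TextBox): number {
  const lines = Math.max(1, Math.ceil(estimateWidth(text, fontSize) / box.width));
  return lines * fontSize * box.lineHeight;
}

// Step 1: shrink the font until the text fits or the minimum is reached.
// Step 2: if it still does not fit, grow the box and report the extra height.
function fitText(text: string, box: TextBox): { fontSize: number; height: number; delta: number } {
  let fontSize = box.fontSize;
  while (fontSize > box.minFontSize && heightNeeded(text, fontSize, box) > box.height) {
    fontSize -= 1;
  }
  const height = Math.max(box.height, heightNeeded(text, fontSize, box));
  return { fontSize, height, delta: height - box.height };
}

// Step 3: push layers below the text down by the extra height, capped at each ceiling.
function pushSiblings(siblings: Sibling[], delta: number): { moved: Sibling[]; overflow: boolean } {
  const moved = siblings.map((s) => ({ ...s, y: Math.min(s.y + delta, s.maxY) }));
  const overflow = siblings.some((s) => s.y + delta > s.maxY);
  return { moved, overflow };
}
```

In this sketch a 30-character headline fits the designed box untouched, while a 65-character headline first drops to the minimum font size and then pushes the layers beneath it. An `overflow` result is the signal that a ceiling was hit and the copy cannot be laid out within the rules.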

This is the differentiator. Everything else is solvable with known patterns.


V1 scope

Template Builder. Canvas-based design surface with magnetic snapping, layer types (text, smart asset, shape, group), text groups with cascade behavior, character limit simulator, linked multi-artboard sets, variant groups for logo lockups.

Asset Library. Upload, structured tagging (not freeform), pre-approved crops per template size, rights tracking with expiry warnings, variant grouping.

Brief and Feed Intake. Structured brief form with field locking. CSV and JSON feed ingestion with strict schema validation and dead letter queue for malformed records.

AI Generation. Four-agent pipeline producing typed banner specifications with full reasoning logs.

Review. Grid view across all artboards and copy variants, synchronized animation playback, inline editing with immediate in-browser re-render, AI reasoning panel, conflict resolution UI.

Version Control. Full snapshots on every AI generation, deltas on every human edit, override preservation across regeneration, rollback to any version.
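For technical audiences, snapshots plus deltas can be pictured as an append-only event list that is replayed to reach any version. The sketch below is an assumption-laden simplification: `VersionEvent` and `replay` are hypothetical names, specs are flattened to string fields, and override preservation across regeneration is not modeled here.

```typescript
type FlatSpec = Record<string, string>;

// A full snapshot is stored on each AI generation; a small delta on each human edit.
type VersionEvent =
  | { kind: "snapshot"; spec: FlatSpec }
  | { kind: "edit"; field: string; value: string };

// Rebuilds the spec as it stood after the first `upTo` events.
// Rolling back is just replaying fewer events; nothing is ever deleted.
function replay(events: VersionEvent[], upTo: number): FlatSpec {
  let spec: FlatSpec = {};
  for (const event of events.slice(0, upTo)) {
    if (event.kind === "snapshot") {
      spec = { ...event.spec };
    } else {
      spec = { ...spec, [event.field]: event.value };
    }
  }
  return spec;
}
```

Calling `replay(events, events.length)` gives the current state, and any smaller count gives an earlier version.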

Export. IAB Standard and CM360 profiles, GSAP animation, polite load implementation, backup PNG generation, auto-generated trafficking sheet, programmatic QA gates blocking non-compliant exports.

Roles. Designer, Producer, Creative Director, Trafficker.


What V1 does not include

By design, to ship a tight V1: no Figma plugin, no vision-model focal point detection, no video assets, no social formats, no multi-client workspaces, no comment system, no keyframe animation editor, and only two ad server profiles (IAB + CM360).

The data model has the hooks for Figma integration. Tables exist empty. Layer types have optional Figma fields. This means V2 Figma integration can ship without a schema migration when it's time.


The Figma question

Figma is a future bridge, not a dependency. The product is fully functional without Figma. V2 adds a plugin and webhook-driven sync for clients who have already invested many hours of design work in Figma and would otherwise have to rebuild that work in our template builder.

The integration is architected around a clean ownership boundary. Figma owns typography, color, layout, and visual hierarchy. The banner tool owns character limits, push rules, AI field designations, animation presets, approved crops, and override history. Where both sides legitimately change the same property — like a font family — a human resolves the conflict in a dedicated UI. There is no merge syntax, no diff view, no technical language. Two versions, side by side, pick one.

This means clients can keep Figma as their source of truth for visual design while the banner tool owns the production layer Figma was never built to handle.


Technical stack at a glance

For audiences who want it:

  • Frontend: Next.js 14, react-konva for the design canvas, Zustand for state.
  • Layout: Dropflow WebAssembly — identical execution in browser preview and server render.
  • Backend: Node.js + TypeScript, tRPC for end-to-end type safety, PostgreSQL with JSONB for the banner specs.
  • AI: Claude API. Opus for orchestration, Sonnet for generation, Haiku for validation.
  • Rendering: Playwright in Docker with fonts baked in, BullMQ + Redis queue.
  • Animation: GSAP, which major ad servers commonly whitelist and exclude from weight calculations.
  • Versioning: Event sourcing with jsondiffpatch deltas. Human overrides as a separate layer that survives regeneration.

Build plan

Fourteen phases, scoped to be completed sequentially. Phase 1 sets up the foundation: data layer, types, the headless layout engine. The canvas comes in Phase 4 — deliberately not first, because the canvas depends on the layout engine. The AI pipeline comes in Phase 8, after templates, assets, and data ingestion are stable. The render worker is last among major components (Phase 11) because it depends on everything else being stable.

The build sequence is deliberate about risk concentration. The two highest-risk technical areas — the layout engine and the version service's override preservation algorithm — are built early, isolated, and thoroughly tested before any UI depends on them. The lowest-risk area — the canvas — is built after the engine that drives it is proven.


What success looks like

V1 in production with one live client. A designer builds a template in an afternoon. A producer uploads a feed and gets a full size matrix back in minutes. A CD reviews the grid, edits two headlines, approves with confidence that those edits will not disappear when the feed updates next week. A trafficker exports CM360 zips that upload on the first try, with a trafficking sheet that maps every file to its placement.

The internal metric: cycle time from feed update to trafficked banners, end to end, on a single producer's workflow. We are looking for an order-of-magnitude reduction against the current agency baseline.