obsidian/wiki/web-agency/kling-multi-elements-api.md at d4504b68d1bcaa88d00a3ecb63ab4be098f87f3d


Vadym Samoilenko 562e9aed2c vault backup: 2026-04-26 21:17:25



Overview

The Multi-Elements API lets you add, swap, or remove specific objects within an existing video using AI segmentation. You mark elements by clicking coordinates, then submit an editing task.

Base URL: https://api-singapore.klingai.com

Supported model: kling-v1-6 only.

Workflow (5 Steps)

Init: POST /v1/videos/multi-elements/init-selection (parse the video, get a session_id)
Add selection: POST /v1/videos/multi-elements/add-selection (click a point on a frame to mark an object)
Preview (optional): POST /v1/videos/multi-elements/preview-selection (see the masked overlay before committing)
Create task: POST /v1/videos/multi-elements (choose edit_mode, prompt, and optional reference images)
Query: GET /v1/videos/multi-elements/{task_id} (poll until the status is succeed or failed)

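Optional cleanup between marking and previewing: delete-selection (remove specific points) or clear-selection (wipe all points). The walkthrough below counts this cleanup as Step 3, so its headings run to Step 6.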
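The endpoints above can be collected in one place so that later calls do not repeat path strings. This is a minimal sketch: the base URL and paths come from this note, while the names `ENDPOINTS`, `CORE_ORDER`, and `endpointUrl` are illustrative and not part of the Kling API.

```typescript
// Base URL and paths are copied from this note; all identifiers are illustrative.
const BASE_URL = "https://api-singapore.klingai.com";

const ENDPOINTS = {
  initSelection: { method: "POST", path: "/v1/videos/multi-elements/init-selection" },
  addSelection: { method: "POST", path: "/v1/videos/multi-elements/add-selection" },
  deleteSelection: { method: "POST", path: "/v1/videos/multi-elements/delete-selection" },
  clearSelection: { method: "POST", path: "/v1/videos/multi-elements/clear-selection" },
  previewSelection: { method: "POST", path: "/v1/videos/multi-elements/preview-selection" },
  createTask: { method: "POST", path: "/v1/videos/multi-elements" },
  queryTask: { method: "GET", path: "/v1/videos/multi-elements/{task_id}" },
} as const;

type StepName = keyof typeof ENDPOINTS;

// The five core steps in the order they are called.
const CORE_ORDER: StepName[] = [
  "initSelection",
  "addSelection",
  "previewSelection",
  "createTask",
  "queryTask",
];

// Build the full URL for a step; only queryTask needs a task ID.
function endpointUrl(step: StepName, taskId?: string): string {
  const path: string = ENDPOINTS[step].path;
  if (path.indexOf("{task_id}") === -1) return BASE_URL + path;
  if (!taskId) throw new Error("queryTask needs a task_id");
  return BASE_URL + path.replace("{task_id}", encodeURIComponent(taskId));
}
```

Authentication headers are deliberately left out, since this note does not describe them.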

Step 1 — Init Selection

POST /v1/videos/multi-elements/init-selection

Field	Type	Notes
`video_id`	string (optional)	Kling-generated video, last 30 days only
`video_url`	string (optional)	Public `.mp4` / `.mov` URL

Both fields are marked optional, but the call needs one of them to identify the source video.

Video constraints:

Duration: 2–5 s or 7–10 s
Resolution: width and height each 720–2160 px
Frame rate: 24, 30, or 60 fps
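Checking these constraints locally before calling init-selection avoids a round trip for videos that would be rejected. The sketch below assumes the video metadata has already been probed by some other tool; the `VideoInfo` shape and the function name are hypothetical.

```typescript
// Hypothetical metadata shape; fill it from whatever probe tool is in use.
interface VideoInfo {
  durationSeconds: number;
  widthPx: number;
  heightPx: number;
  fps: number;
}

// Returns a list of human-readable problems; an empty list means the video
// satisfies the constraints listed in this note.
function checkVideoConstraints(v: VideoInfo): string[] {
  const problems: string[] = [];
  const d = v.durationSeconds;
  const durationOk = (d >= 2 && d <= 5) || (d >= 7 && d <= 10);
  if (!durationOk) problems.push(`duration ${d} s is outside 2-5 s and 7-10 s`);

  const inRange = (px: number) => px >= 720 && px <= 2160;
  if (!inRange(v.widthPx)) problems.push(`width ${v.widthPx} px is outside 720-2160 px`);
  if (!inRange(v.heightPx)) problems.push(`height ${v.heightPx} px is outside 720-2160 px`);

  if ([24, 30, 60].indexOf(v.fps) === -1) problems.push(`fps ${v.fps} is not 24, 30, or 60`);
  return problems;
}
```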

Response fields:

session_id — valid for 24 hours, used in all subsequent calls
fps, original_duration, total_frame — required when creating the task
normalized_video — URL of the processed video

Step 2 — Add Selection

POST /v1/videos/multi-elements/add-selection

Field	Type	Notes
`session_id`	string	Required
`frame_index`	int	Which frame to mark (max 10 frames total)
`points`	array	`{x, y}` in `[0,1]` range (top-left = 0,0); up to 10 points per frame

Response: rle_mask_list, which holds an RLE-encoded segmentation mask and a base64 PNG for each marked object.
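Clicks usually arrive in pixel coordinates, so they need converting to the `[0,1]` range before being sent. The following sketch shows one way to do that and to assemble the request body from the fields in the table above. The helper names are illustrative, and treating `frame_index` as a non-negative integer is an assumption, since this note does not say whether indexing starts at 0 or 1.

```typescript
interface Point { x: number; y: number }

interface AddSelectionBody {
  session_id: string;
  frame_index: number;
  points: Point[];
}

// Convert a pixel click to normalized coordinates (top-left = 0,0),
// clamping clicks that land slightly outside the frame.
function toNormalizedPoint(pxX: number, pxY: number, frameWidth: number, frameHeight: number): Point {
  const clamp = (v: number) => Math.min(1, Math.max(0, v));
  return { x: clamp(pxX / frameWidth), y: clamp(pxY / frameHeight) };
}

// Assemble the add-selection body, enforcing the 10-points-per-frame limit.
function buildAddSelectionBody(sessionId: string, frameIndex: number, points: Point[]): AddSelectionBody {
  if (points.length === 0 || points.length > 10) {
    throw new Error("a frame takes between 1 and 10 points");
  }
  // Assumption: frame_index is a non-negative integer.
  if (!Number.isInteger(frameIndex) || frameIndex < 0) {
    throw new Error("frame_index must be a non-negative integer");
  }
  return { session_id: sessionId, frame_index: frameIndex, points };
}
```

Because delete-selection expects the same coordinates that were added, it helps to store the normalized points as sent and not recompute them later.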

Decoding the RLE Mask (TypeScript)

export type RLEObject = { size: [h: number, w: number]; counts: string }

// Signature only: the body is elided here, see the full implementation in the source article.
// Returns a flat, row-major Uint8Array: 1 = masked pixel, 0 = background.
export declare function decode(rleObj: RLEObject): Uint8Array
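For orientation, here is a sketch of what such a decoder could look like. It assumes the `counts` string uses the COCO-style compressed RLE encoding (as in pycocotools), which this note does not confirm, and it is not the implementation from the source article. The function names are illustrative.

```typescript
type RLEObject = { size: [h: number, w: number]; counts: string };

// Assumption: COCO-style compressed RLE. Each character carries 5 data bits
// (char code minus 48), bit 0x20 marks a continuation, and run lengths from
// the fourth run onward are stored as deltas against the run two places back.
function rleCountsFromString(s: string): number[] {
  const counts: number[] = [];
  let p = 0;
  while (p < s.length) {
    let x = 0;
    let k = 0;
    let more = true;
    while (more) {
      const c = s.charCodeAt(p) - 48;
      x |= (c & 0x1f) << (5 * k);
      more = (c & 0x20) !== 0;
      p++;
      k++;
      if (!more && (c & 0x10) !== 0) x |= -1 << (5 * k); // sign-extend negative deltas
    }
    if (counts.length > 2) x += counts[counts.length - 2];
    counts.push(x);
  }
  return counts;
}

// Expand the runs into a flat row-major mask (1 = masked, 0 = background).
// COCO runs are laid out column by column, so each index is transposed.
function decodeRleToRowMajor(rle: RLEObject): Uint8Array {
  const [h, w] = rle.size;
  const out = new Uint8Array(h * w); // zero-filled, i.e. all background
  let i = 0; // running index in column-major order
  let value = 0; // runs alternate 0, 1, 0, 1, ... starting with background
  for (const run of rleCountsFromString(rle.counts)) {
    if (value === 1) {
      for (let j = i; j < i + run && j < h * w; j++) {
        const x = Math.floor(j / h);
        const y = j % h;
        out[y * w + x] = 1;
      }
    }
    i += run;
    value = 1 - value;
  }
  return out;
}
```

If the masks returned by the API come out transposed or scrambled with this sketch, the encoding assumption is wrong and the source article's decoder should be used instead.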

Rendering the Mask Overlay (Canvas)

function drawMask(rleMask: string, height: number, width: number) {
  const decodeData = decode({ counts: rleMask, size: [height, width] })
  // Paint RGBA (116, 255, 82, 163) at every pixel where decodeData[y * width + x] === 1
}
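The painting step can be written without touching the DOM by filling an RGBA buffer first. This sketch assumes the mask is already decoded into a flat row-major array as described above; `maskToRgba` is an illustrative name.

```typescript
// Overlay colour taken from this note: semi-transparent green.
const OVERLAY_RGBA = [116, 255, 82, 163] as const;

// Turn a row-major 0/1 mask into an RGBA pixel buffer.
// Background pixels stay fully transparent (all four channels 0).
function maskToRgba(mask: Uint8Array, width: number, height: number): Uint8ClampedArray {
  if (mask.length !== width * height) {
    throw new Error("mask length does not match width * height");
  }
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = y * width + x;
      if (mask[i] === 1) {
        const o = i * 4;
        rgba[o] = OVERLAY_RGBA[0];
        rgba[o + 1] = OVERLAY_RGBA[1];
        rgba[o + 2] = OVERLAY_RGBA[2];
        rgba[o + 3] = OVERLAY_RGBA[3];
      }
    }
  }
  return rgba;
}
```

In a browser the buffer can be wrapped in an `ImageData` of the same width and height and drawn with `putImageData`, typically on a separate canvas layered over the video frame, because `putImageData` replaces pixels and does not blend them.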

Step 3 — Delete / Clear Selection (optional)

POST /v1/videos/multi-elements/delete-selection

Same body as add-selection; points must exactly match coordinates used when adding.

POST /v1/videos/multi-elements/clear-selection

Only requires session_id — wipes all marked areas.

Step 4 — Preview Selection (optional)

POST /v1/videos/multi-elements/preview-selection

Returns: video (masked overlay), video_cover, tracking_output (per-frame mask).

Step 5 — Create Task

POST /v1/videos/multi-elements

Field	Type	Notes
`model_name`	enum	`kling-v1-6`
`session_id`	string	Required
`edit_mode`	enum	`addition` / `swap` / `removal`
`image_list`	array	Required for `addition` and `swap`; omit for `removal`
`prompt`	string	Use `<<<video_1>>>` / `<<<image_1>>>` references; max 2500 chars
`negative_prompt`	string	Optional, max 2500 chars
`mode`	enum	`std` (cost-effective) / `pro` (high-quality)
`duration`	enum	`5` or `10` seconds
`callback_url`	string	Optional webhook
`external_task_id`	string	Optional custom ID

edit_mode Details

Mode	image_list	Prompt template
`addition`	1–2 images (pre-cropped)	`Using the context of <<<video_1>>>, seamlessly add [x] from <<<image_1>>>`
`swap`	1 image only	`swap [x] from <<<image_1>>> for [x] from <<<video_1>>>`
`removal`	not required	`Delete [x] from <<<video_1>>>`
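The templates and image counts in the table can be encoded as a small helper so that a task is rejected locally before it is submitted. This is a sketch: the function names are illustrative, and letting the two `[x]` placeholders of the swap template differ is an assumption.

```typescript
type EditMode = "addition" | "swap" | "removal";

const MAX_PROMPT_CHARS = 2500;

// Prompt templates copied from the table; [x] becomes a parameter.
const TEMPLATES: Record<EditMode, (subject: string, replaced: string) => string> = {
  addition: (subject) =>
    `Using the context of <<<video_1>>>, seamlessly add ${subject} from <<<image_1>>>`,
  // Assumption: the two [x] slots may name different objects.
  swap: (subject, replaced) =>
    `swap ${subject} from <<<image_1>>> for ${replaced} from <<<video_1>>>`,
  removal: (subject) => `Delete ${subject} from <<<video_1>>>`,
};

function buildPrompt(mode: EditMode, subject: string, replaced: string = subject): string {
  const prompt = TEMPLATES[mode](subject, replaced);
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new Error(`prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  return prompt;
}

// addition takes 1-2 images, swap exactly 1, removal none.
function imageCountOk(mode: EditMode, count: number): boolean {
  if (mode === "addition") return count === 1 || count === 2;
  if (mode === "swap") return count === 1;
  return count === 0;
}
```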

Image requirements (for addition/swap):

Formats: .jpg / .jpeg / .png
Max file size 10 MB; width and height at least 300 px; aspect ratio between 1:2.5 and 2.5:1
Base64: raw string only — no data:image/png;base64, prefix
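A common failure is passing a full data URI where raw base64 is expected. The sketch below strips the prefix and checks the other image requirements. The `ImageInfo` shape and function names are hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes and that the 300 px minimum applies to both sides.

```typescript
// Hypothetical description of a reference image, gathered before upload.
interface ImageInfo {
  fileName: string;
  sizeBytes: number;
  widthPx: number;
  heightPx: number;
}

// Remove a leading "data:<mime>;base64," prefix if present.
function toRawBase64(input: string): string {
  return input.replace(/^data:[^;,]+;base64,/, "");
}

// Returns a list of problems; an empty list means the image passes.
function checkImage(img: ImageInfo): string[] {
  const problems: string[] = [];
  if (!/\.(jpe?g|png)$/i.test(img.fileName)) {
    problems.push("format must be .jpg, .jpeg, or .png");
  }
  // Assumption: 10 MB is read as 10 * 1024 * 1024 bytes.
  if (img.sizeBytes > 10 * 1024 * 1024) problems.push("file is larger than 10 MB");
  // Assumption: the 300 px minimum applies to width and height alike.
  if (img.widthPx < 300 || img.heightPx < 300) {
    problems.push("width and height must be at least 300 px");
  }
  const ratio = img.widthPx / img.heightPx;
  if (ratio < 1 / 2.5 || ratio > 2.5) {
    problems.push("aspect ratio must be between 1:2.5 and 2.5:1");
  }
  return problems;
}
```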

Step 6 — Query Task

GET /v1/videos/multi-elements/{task_id}
GET /v1/videos/multi-elements?pageNum=1&pageSize=30

Task statuses: submitted → processing → succeed / failed

Result video URL expires after 30 days — download and store promptly.
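Polling can be sketched as a loop that stops on either terminal status. To stay independent of the response format, which this note does not describe, the status lookup is injected as a function; `pollUntilDone`, its default interval, and its attempt limit are illustrative choices, not values from the Kling docs.

```typescript
type TaskStatus = "submitted" | "processing" | "succeed" | "failed";

// succeed and failed end the task; the other two mean "keep waiting".
function isTerminal(status: TaskStatus): boolean {
  return status === "succeed" || status === "failed";
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// getStatus is supplied by the caller, e.g. a wrapper around
// GET /v1/videos/multi-elements/{task_id} that extracts the status field.
async function pollUntilDone(
  getStatus: () => Promise<TaskStatus>,
  intervalMs: number = 5000,
  maxAttempts: number = 120,
): Promise<TaskStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (isTerminal(status)) return status;
    await sleep(intervalMs);
  }
  throw new Error(`task still running after ${maxAttempts} polls`);
}
```

When a `callback_url` is set on the task, this loop becomes a fallback, not the main path.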

Key Takeaways

Session-based workflow: init once → mark objects → edit → query. Session lives 24 h.
Three edit modes: addition (needs 1–2 reference images), swap (exactly 1 reference image), removal (no images needed).
Object selection uses normalized [0,1] click coordinates on specific frame indices — up to 10 frames, 10 points each.
Response masks are RLE-encoded; decode to Uint8Array for canvas rendering.
Only kling-v1-6 supports multi-elements; duration must be 5 or 10, matching the source video's length bracket (2–5 s or 7–10 s).
Generated videos auto-delete after 30 days; use callback_url for async workflows.
Base64 images must be raw (no data-URI prefix).

Related

wiki/web-agency/kling-text-to-video-api — generate new videos from prompts
wiki/web-agency/kling-image-to-video-api — animate still images
wiki/web-agency/kling-multi-image-to-video-api — composite 2–4 reference images
wiki/web-agency/kling-motion-control-api — pose/motion transfer
wiki/web-agency/claude-code-nanobanana-website-workflow — end-to-end agency workflow using Kling

Sources

Raw: raw/Kling AI Next-Gen AI Video & AI Image Generator 5.md
Origin: Kling AI API docs — /v1/videos/multi-elements
