---
title: "Kling AI Multi-Elements API Reference"
aliases: [kling-multi-elements, kling-video-element-editing, kling-add-swap-remove]
tags: [kling, api, video-editing, multi-elements, ai-video]
sources: ["raw/Kling AI Next-Gen AI Video & AI Image Generator 5.md"]
created: 2026-04-26
updated: 2026-04-26
---
## Overview
The Multi-Elements API lets you **add**, **swap**, or **remove** specific objects within an existing video using AI segmentation. You mark each element by clicking points on one or more frames, then submit an editing task.
Base URL: `https://api-singapore.klingai.com`
Supported model: `kling-v1-6` only.
---
## Workflow (5 Steps)
1. **Init**: `POST /v1/videos/multi-elements/init-selection` parses the video and returns a `session_id`
2. **Add selection**: `POST /v1/videos/multi-elements/add-selection` marks an object by clicking a point on a frame
3. **Preview** (optional): `POST /v1/videos/multi-elements/preview-selection` shows the masked overlay before you commit
4. **Create task**: `POST /v1/videos/multi-elements` takes an `edit_mode`, a prompt, and reference images (required for addition and swap)
5. **Query**: `GET /v1/videos/multi-elements/{task_id}` is polled until the status is `succeed` or `failed`
Optional cleanup steps: `delete-selection` (remove specific points) or `clear-selection` (wipe all).
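The core of the workflow (init, add selection, create task) can be strung together roughly as follows. This is a sketch, not an SDK: `post` is a hypothetical transport you supply (base URL, auth headers, retries), and it is assumed to resolve to the response payload with `session_id` and `task_id` at its top level. The optional preview and cleanup calls are left out, and polling is left to the caller.

```typescript
// Hypothetical transport: POSTs `body` to the base URL + `path` and resolves to the
// response payload. Authentication, retries and error handling are the caller's job.
type Post = (path: string, body: Record<string, unknown>) => Promise<Record<string, unknown>>

type Mark = { frame_index: number; points: { x: number; y: number }[] }

const PATHS = {
  init: "/v1/videos/multi-elements/init-selection",
  add: "/v1/videos/multi-elements/add-selection",
  create: "/v1/videos/multi-elements",
} as const

// Runs init -> add-selection (once per marked frame) -> create task, returning the task id.
// Assumes the payloads expose `session_id` and `task_id` at their top level.
async function startEdit(
  post: Post,
  videoUrl: string,
  marks: Mark[],
  task: Record<string, unknown>, // edit_mode, prompt, image_list, mode, duration, ...
): Promise<string> {
  const init = await post(PATHS.init, { video_url: videoUrl })
  const session_id = String(init.session_id) // valid for 24 hours
  for (const mark of marks) {
    await post(PATHS.add, { session_id, ...mark })
  }
  const created = await post(PATHS.create, { model_name: "kling-v1-6", session_id, ...task })
  return String(created.task_id) // poll GET /v1/videos/multi-elements/{task_id} with this
}
```

Injecting the transport keeps the call order testable with a fake, without touching the network.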
---
## Step 1 — Init Selection
`POST /v1/videos/multi-elements/init-selection`
| Field | Type | Notes |
|-------|------|-------|
| `video_id` | string | Kling-generated video from the last 30 days; provide this **or** `video_url` |
| `video_url` | string | Public `.mp4` / `.mov` URL; provide this **or** `video_id` |
**Video constraints:**
- Duration: 2–5 s or 7–10 s
- Resolution: 720–2160 px (both width and height)
- Frame rate: 24, 30, or 60 fps
**Response fields:**
- `session_id` — valid for **24 hours**, used in all subsequent calls
- `fps`, `original_duration`, `total_frame` — required when creating the task
- `normalized_video` — URL of the processed video
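Checking a clip locally against the constraints above avoids a wasted round trip. A minimal sketch with hypothetical helper names: the duration, size, and frame rate are assumed to come from your own media probe, and treating the range endpoints as inclusive is an assumption. `durationBracket` also picks the task `duration` that, per this note, must match the source video's length bracket.

```typescript
type VideoInfo = { durationSec: number; width: number; height: number; fps: number }

// Maps the source length to the task duration bracket, or null if unsupported.
// Endpoints are treated as inclusive (assumption).
function durationBracket(durationSec: number): "5" | "10" | null {
  if (durationSec >= 2 && durationSec <= 5) return "5"
  if (durationSec >= 7 && durationSec <= 10) return "10"
  return null
}

// Returns a list of human-readable problems; an empty list means the clip looks acceptable.
function checkVideo(v: VideoInfo): string[] {
  const problems: string[] = []
  if (durationBracket(v.durationSec) === null) problems.push("duration must be 2-5 s or 7-10 s")
  if (v.width < 720 || v.width > 2160) problems.push("width must be 720-2160 px")
  if (v.height < 720 || v.height > 2160) problems.push("height must be 720-2160 px")
  if (![24, 30, 60].includes(v.fps)) problems.push("fps must be 24, 30 or 60")
  return problems
}
```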
---
## Step 2 — Add Selection
`POST /v1/videos/multi-elements/add-selection`
| Field | Type | Notes |
|-------|------|-------|
| `session_id` | string | Required |
| `frame_index` | int | Which frame to mark (max 10 frames total) |
| `points` | array | `{x, y}` in `[0,1]` range (top-left = 0,0); up to 10 points per frame |
**Response:** returns `rle_mask_list` — RLE-encoded segmentation masks + PNG base64 per object.
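On the client, a click has to be converted from displayed pixels to the normalized coordinates above, and the frame and point limits can be checked before sending. A minimal sketch: `toNormalized` and `addSelectionBody` are hypothetical helpers, clamping edge clicks into range is a design choice rather than API behavior, and the point limit is only checked per call (points accumulated across several calls on one frame are not counted).

```typescript
type Point = { x: number; y: number }

// Converts a click on the displayed frame (pixels, origin top-left) to [0, 1] coordinates.
// Clamping keeps clicks on the very edge in range.
function toNormalized(clickX: number, clickY: number, displayW: number, displayH: number): Point {
  const clamp = (v: number) => Math.min(1, Math.max(0, v))
  return { x: clamp(clickX / displayW), y: clamp(clickY / displayH) }
}

// Builds an add-selection body after checking the documented limits:
// up to 10 points per frame and up to 10 marked frames in total.
// `markedFrames` holds the frame indexes already sent in this session.
function addSelectionBody(sessionId: string, frameIndex: number, points: Point[], markedFrames: ReadonlySet<number>) {
  if (points.length > 10) throw new Error("at most 10 points per frame")
  if (points.some((p) => p.x < 0 || p.x > 1 || p.y < 0 || p.y > 1)) throw new Error("points must lie in [0, 1]")
  if (!markedFrames.has(frameIndex) && markedFrames.size >= 10) throw new Error("at most 10 marked frames")
  return { session_id: sessionId, frame_index: frameIndex, points }
}
```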
### Decoding the RLE Mask (TypeScript)
```typescript
export type RLEObject = { size: [h: number, w: number]; counts: string }
export function decode(rleObj: RLEObject): Uint8Array {
// Returns flat Uint8Array (row-major): 1 = masked pixel, 0 = background
// ... see full implementation in source article
}
```
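The decoder body is elided above. The sketch below is one possible stand-in, assuming the masks use the COCO compressed-RLE string format from pycocotools (an assumption; this note does not name the format). In that format, counts are packed 5 bits per ASCII character, runs alternate background/foreground starting with background, from the fourth run on each value is a delta against the run two places earlier, and runs walk the image column by column. The sketch transposes into the row-major layout the stub promises; if Kling's runs turn out to be row-major, drop the transposition.

```typescript
export type RLEObject = { size: [h: number, w: number]; counts: string }

// Parses the compressed counts string into run lengths (port of pycocotools' rleFrString).
function parseCounts(s: string): number[] {
  const runs: number[] = []
  let p = 0
  while (p < s.length) {
    let x = 0
    let k = 0
    let more = true
    while (more) {
      const c = s.charCodeAt(p) - 48 // characters are offset by ASCII '0'
      x |= (c & 0x1f) << (5 * k) // 5 payload bits per character
      more = (c & 0x20) !== 0 // continuation flag
      p++
      k++
      if (!more && (c & 0x10) !== 0) x |= -1 << (5 * k) // sign-extend negative deltas
    }
    // From the 4th run on, values are deltas against the run two positions back
    if (runs.length > 2) x += runs[runs.length - 2]
    runs.push(x)
  }
  return runs
}

// Returns a flat row-major Uint8Array: mask[y * w + x] is 1 for masked pixels, 0 otherwise.
export function decode(rleObj: RLEObject): Uint8Array {
  const [h, w] = rleObj.size
  const runs = parseCounts(rleObj.counts)
  const mask = new Uint8Array(h * w)
  let i = 0 // position in column-major order
  runs.forEach((len, r) => {
    const value = r % 2 // even runs are background, odd runs are foreground
    for (let j = 0; j < len && i < h * w; j++, i++) {
      if (value === 1) mask[(i % h) * w + Math.floor(i / h)] = 1 // transpose to row-major
    }
  })
  return mask
}
```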
### Rendering the Mask Overlay (Canvas)
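```typescript
// Builds an RGBA overlay; draw it with ctx.putImageData(drawMask(...), 0, 0)
function drawMask(rleMask: string, height: number, width: number): ImageData {
  const decodeData = decode({ counts: rleMask, size: [height, width] })
  const overlay = new ImageData(width, height) // every pixel starts transparent
  for (let i = 0; i < decodeData.length; i++) {
    // decodeData is row-major (i = y * width + x); tint masked pixels green
    if (decodeData[i] === 1) overlay.data.set([116, 255, 82, 163], i * 4)
  }
  return overlay
}
```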
---
## Step 3 — Delete / Clear Selection (optional)
`POST /v1/videos/multi-elements/delete-selection`
- Same body as add-selection; `points` must **exactly match** coordinates used when adding.
`POST /v1/videos/multi-elements/clear-selection`
- Only requires `session_id` — wipes all marked areas.
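Because `delete-selection` only removes points whose coordinates exactly match the ones sent to `add-selection`, it is safer to keep the numbers you actually sent than to recompute them from a new click (a slightly different click or display size yields different floats). A hypothetical client-side log, sketched under that assumption:

```typescript
type Point = { x: number; y: number }

// Hypothetical client-side log of the exact points sent to add-selection, so that
// delete-selection can resend identical coordinates instead of recomputed ones.
class SelectionLog {
  private frames = new Map<number, Point[]>()

  // Call after add-selection succeeds
  recordAdd(frameIndex: number, points: Point[]): void {
    this.frames.set(frameIndex, [...(this.frames.get(frameIndex) ?? []), ...points])
  }

  // Body for delete-selection, built from stored points chosen by their position in the log
  deleteBody(sessionId: string, frameIndex: number, pointIndexes: number[]) {
    const stored = this.frames.get(frameIndex) ?? []
    const points = pointIndexes.map((i) => {
      if (i < 0 || i >= stored.length) throw new Error(`no stored point ${i} on frame ${frameIndex}`)
      return stored[i]
    })
    return { session_id: sessionId, frame_index: frameIndex, points }
  }

  // Call after delete-selection succeeds
  recordDelete(frameIndex: number, pointIndexes: number[]): void {
    const stored = this.frames.get(frameIndex) ?? []
    this.frames.set(frameIndex, stored.filter((_, i) => !pointIndexes.includes(i)))
  }

  // Mirrors clear-selection, which wipes every marked area in the session
  clear(): void {
    this.frames.clear()
  }
}
```

Building the body and updating the log are separate calls so the log only changes once the API has accepted the request.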
---
## Step 4 — Preview Selection (optional)
`POST /v1/videos/multi-elements/preview-selection`
Returns: `video` (masked overlay), `video_cover`, `tracking_output` (per-frame mask).
---
## Step 5 — Create Task
`POST /v1/videos/multi-elements`
| Field | Type | Notes |
|-------|------|-------|
| `model_name` | enum | `kling-v1-6` |
| `session_id` | string | Required |
| `edit_mode` | enum | `addition` / `swap` / `removal` |
| `image_list` | array | Required for add/swap; omit for removal |
| `prompt` | string | Use `<<<video_1>>>` / `<<<image_1>>>` references; max 2500 chars |
| `negative_prompt` | string | Optional, max 2500 chars |
| `mode` | enum | `std` (cost-effective) / `pro` (high-quality) |
| `duration` | enum | `5` or `10` seconds |
| `callback_url` | string | Optional webhook |
| `external_task_id` | string | Optional custom ID |
### edit_mode Details
| Mode | image_list | Prompt template |
|------|-----------|-----------------|
| `addition` | 1–2 images (pre-cropped) | `Using the context of <<<video_1>>>, seamlessly add [x] from <<<image_1>>>` |
| `swap` | 1 image only | `swap [x] from <<<image_1>>> for [x] from <<<video_1>>>` |
| `removal` | not required | `Delete [x] from <<<video_1>>>` |
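The per-mode rules in the two tables above can be enforced before the request is sent. A sketch under stated assumptions: `createTaskBody` is a hypothetical helper, each `image_list` entry is assumed to have the shape `{ image: ... }` (confirm against the API reference), and the 2500-character limits are checked with JavaScript string length (UTF-16 code units), which may not match how the API counts characters.

```typescript
type EditMode = "addition" | "swap" | "removal"

interface CreateTaskInput {
  sessionId: string
  editMode: EditMode
  prompt: string // may reference <<<video_1>>> and <<<image_1>>>
  images?: string[] // raw base64 strings or image URLs
  negativePrompt?: string
  mode: "std" | "pro"
  duration: "5" | "10"
}

// Allowed image_list sizes per edit_mode, from the table above
const IMAGE_COUNT: Record<EditMode, [min: number, max: number]> = {
  addition: [1, 2],
  swap: [1, 1],
  removal: [0, 0],
}

// Hypothetical builder: validates the documented limits, then assembles the request body.
function createTaskBody(input: CreateTaskInput): Record<string, unknown> {
  const images = input.images ?? []
  const [min, max] = IMAGE_COUNT[input.editMode]
  if (images.length < min || images.length > max) {
    throw new Error(`${input.editMode} takes ${min}-${max} image(s), got ${images.length}`)
  }
  if (input.prompt.length > 2500) throw new Error("prompt exceeds 2500 characters")
  if ((input.negativePrompt ?? "").length > 2500) throw new Error("negative_prompt exceeds 2500 characters")

  const body: Record<string, unknown> = {
    model_name: "kling-v1-6",
    session_id: input.sessionId,
    edit_mode: input.editMode,
    prompt: input.prompt,
    mode: input.mode,
    duration: input.duration,
  }
  // Entry shape { image } is an assumption; image_list is omitted entirely for removal
  if (images.length > 0) body.image_list = images.map((image) => ({ image }))
  if (input.negativePrompt) body.negative_prompt = input.negativePrompt
  return body
}
```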
**Image requirements (for add/swap):**
- Formats: `.jpg` / `.jpeg` / `.png`
- Max 10 MB; min 300 px; aspect ratio between 1:2.5 and 2.5:1
- Base64: raw string only — **no** `data:image/png;base64,` prefix
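A local pre-check for these image requirements, sketched with hypothetical helpers. Two readings are assumptions: "10 MB" is taken as 10 × 1024 × 1024 bytes, and "min 300 px" is applied to both width and height. The image metadata is assumed to come from your own decoder.

```typescript
// Strips a data-URI prefix such as "data:image/png;base64," so only the raw base64 remains.
function toRawBase64(input: string): string {
  const match = /^data:[^;,]+;base64,/i.exec(input)
  return match ? input.slice(match[0].length) : input
}

type ImageMeta = { format: string; bytes: number; width: number; height: number }

// Returns a list of problems; an empty list means the image meets the documented limits.
function checkImage(m: ImageMeta): string[] {
  const problems: string[] = []
  if (!["jpg", "jpeg", "png"].includes(m.format.toLowerCase())) problems.push("format must be jpg/jpeg/png")
  if (m.bytes > 10 * 1024 * 1024) problems.push("file exceeds 10 MB")
  if (Math.min(m.width, m.height) < 300) problems.push("each side must be at least 300 px")
  const ratio = m.width / m.height
  if (ratio < 1 / 2.5 || ratio > 2.5) problems.push("aspect ratio must be between 1:2.5 and 2.5:1")
  return problems
}
```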
---
## Step 6 — Query Task
`GET /v1/videos/multi-elements/{task_id}` (single task)
`GET /v1/videos/multi-elements?pageNum=1&pageSize=30` (paginated task list)
Task statuses: `submitted` → `processing` → `succeed` / `failed`
Result video URL expires after **30 days** — download and store promptly.
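If you poll rather than rely on `callback_url`, a loop like the one below stops on either terminal status. It is a sketch: `getStatus` is a caller-supplied wrapper around `GET /v1/videos/multi-elements/{task_id}` that extracts the status string (how it reads the response is left open), and the default interval and timeout are arbitrary choices, not documented values.

```typescript
type TaskStatus = "submitted" | "processing" | "succeed" | "failed"

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls until the task reaches a terminal status (succeed or failed) or the timeout passes.
async function waitForTask(
  getStatus: () => Promise<TaskStatus>,
  intervalMs = 10_000,
  timeoutMs = 15 * 60_000,
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    const status = await getStatus()
    if (status === "succeed" || status === "failed") return status
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`task still "${status}" after ${timeoutMs} ms`)
    }
    await sleep(intervalMs)
  }
}
```

On `succeed`, download the result right away, since its URL expires after 30 days.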
---
## Key Takeaways
- Session-based workflow: init once → mark objects → edit → query. Session lives 24 h.
- Three edit modes: **add** (needs 1–2 ref images), **swap** (1 ref image), **remove** (no images needed).
- Object selection uses normalized `[0,1]` click coordinates on specific frame indices — up to 10 frames, 10 points each.
- Response masks are RLE-encoded; decode to `Uint8Array` for canvas rendering.
- Only `kling-v1-6` supports multi-elements; the task `duration` must be `5` or `10`, matching the source video's length bracket (2–5 s or 7–10 s).
- Generated videos auto-delete after 30 days; use `callback_url` for async workflows.
- Base64 images must be raw (no data-URI prefix).
---
## Related
- [[wiki/web-agency/kling-text-to-video-api|Kling Text-to-Video API]] — generate new videos from prompts
- [[wiki/web-agency/kling-image-to-video-api|Kling Image-to-Video API]] — animate still images
- [[wiki/web-agency/kling-multi-image-to-video-api|Kling Multi-Image-to-Video API]] — composite 2–4 reference images
- [[wiki/web-agency/kling-motion-control-api|Kling Motion Control API]] — pose/motion transfer
- [[wiki/web-agency/claude-code-nanobanana-website-workflow|Claude Code + Nanobanana 2 Workflow]] — end-to-end agency workflow using Kling
---
## Sources
- Raw: `raw/Kling AI Next-Gen AI Video & AI Image Generator 5.md`
- Origin: Kling AI API docs — `/v1/videos/multi-elements`