| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Kling AI Multi-Elements API Reference | | | | 2026-04-26 | 2026-04-26 |
Overview
The Multi-Elements API lets you add, swap, or remove specific objects within an existing video using AI segmentation. You mark elements by clicking coordinates, then submit an editing task.
Base URL: https://api-singapore.klingai.com
Supported model: kling-v1-6 only.
Workflow (5 Steps)
- Init: POST /v1/videos/multi-elements/init-selection (parse the video, get session_id)
- Add selection: POST /v1/videos/multi-elements/add-selection (click a point on a frame to mark an object)
- Preview: POST /v1/videos/multi-elements/preview-selection (see the masked overlay before committing)
- Create task: POST /v1/videos/multi-elements (choose edit_mode, prompt, and optional reference images)
- Query: GET /v1/videos/multi-elements/{task_id} (poll until the status is succeed)
Optional cleanup steps: delete-selection (remove specific points) or clear-selection (wipe all).
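The routes in this workflow can be collected into a small lookup table. The following is a minimal sketch that uses only the base URL and paths listed on this page; the names BASE_URL, ENDPOINTS, endpointUrl, and queryTaskUrl are illustrative and not part of the Kling API. Authentication is not covered on this page and is left out.

```typescript
// Hypothetical route table for the multi-elements workflow.
// Base URL and paths come from this page; all identifiers are illustrative.
const BASE_URL = "https://api-singapore.klingai.com";

const ENDPOINTS = {
  initSelection: { method: "POST", path: "/v1/videos/multi-elements/init-selection" },
  addSelection: { method: "POST", path: "/v1/videos/multi-elements/add-selection" },
  deleteSelection: { method: "POST", path: "/v1/videos/multi-elements/delete-selection" },
  clearSelection: { method: "POST", path: "/v1/videos/multi-elements/clear-selection" },
  previewSelection: { method: "POST", path: "/v1/videos/multi-elements/preview-selection" },
  createTask: { method: "POST", path: "/v1/videos/multi-elements" },
} as const;

type EndpointName = keyof typeof ENDPOINTS;

// Full URL for one of the POST routes above.
function endpointUrl(name: EndpointName): string {
  return BASE_URL + ENDPOINTS[name].path;
}

// The query route is a GET with the task ID embedded in the path.
function queryTaskUrl(taskId: string): string {
  return `${BASE_URL}/v1/videos/multi-elements/${encodeURIComponent(taskId)}`;
}
```

Keeping the routes in one place makes it harder to mistype the near-identical selection paths.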
Step 1 — Init Selection
POST /v1/videos/multi-elements/init-selection
| Field | Type | Notes |
|---|---|---|
| video_id | string (optional) | Kling-generated video, last 30 days only |
| video_url | string (optional) | Public .mp4 / .mov URL |
Video constraints:
- Duration: 2–5 s or 7–10 s
- Resolution: 720–2160 px (both dimensions)
- Frame rate: 24, 30, or 60 fps
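Checking a video against these constraints before calling init-selection can save a failed request. The sketch below assumes the ranges are inclusive and that the frame rate must match one of the listed values exactly; neither detail is confirmed by this page. The VideoInfo shape and the checkVideoConstraints name are hypothetical.

```typescript
// Hypothetical metadata shape; how the values are obtained is up to the caller.
type VideoInfo = { durationSec: number; width: number; height: number; fps: number };

// Returns a list of problems; an empty list means the video fits the
// constraints listed above (assuming inclusive bounds and exact fps match).
function checkVideoConstraints(v: VideoInfo): string[] {
  const problems: string[] = [];

  const inShortBracket = v.durationSec >= 2 && v.durationSec <= 5;
  const inLongBracket = v.durationSec >= 7 && v.durationSec <= 10;
  if (!inShortBracket && !inLongBracket) {
    problems.push(`duration ${v.durationSec} s is outside 2-5 s and 7-10 s`);
  }

  const dims: Array<[string, number]> = [["width", v.width], ["height", v.height]];
  for (const [name, px] of dims) {
    if (px < 720 || px > 2160) problems.push(`${name} ${px} px is outside 720-2160 px`);
  }

  if ([24, 30, 60].indexOf(v.fps) === -1) {
    problems.push(`frame rate ${v.fps} fps is not 24, 30, or 60`);
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a caller report all issues with a video in a single pass.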
Response fields:
- session_id: valid for 24 hours, used in all subsequent calls
- fps, original_duration, total_frame: required when creating the task
- normalized_video: URL of the processed video
Step 2 — Add Selection
POST /v1/videos/multi-elements/add-selection
| Field | Type | Notes |
|---|---|---|
| session_id | string | Required |
| frame_index | int | Which frame to mark (max 10 frames total) |
| points | array | {x, y} in [0,1] range (top-left = 0,0); up to 10 points per frame |
Response: returns rle_mask_list — RLE-encoded segmentation masks + PNG base64 per object.
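Because points are normalized, a click captured in pixels has to be divided by the frame size before it is sent. The following sketch shows one way to do that and to assemble the request body from the three fields in the table. The function names are illustrative, and clamping out-of-range clicks to [0,1] is a choice made here, not documented API behavior.

```typescript
type Point = { x: number; y: number };
type AddSelectionBody = { session_id: string; frame_index: number; points: Point[] };

// Convert a pixel click into the [0,1] range, with (0,0) at the top-left.
// Clicks outside the frame are clamped to the nearest edge (an assumption).
function normalizeClick(px: number, py: number, frameWidth: number, frameHeight: number): Point {
  const clamp = (n: number): number => Math.min(1, Math.max(0, n));
  return { x: clamp(px / frameWidth), y: clamp(py / frameHeight) };
}

// Assemble the add-selection body, enforcing the 10-points-per-frame limit.
function buildAddSelectionBody(sessionId: string, frameIndex: number, points: Point[]): AddSelectionBody {
  if (points.length === 0 || points.length > 10) {
    throw new RangeError("expected 1 to 10 points per frame");
  }
  return { session_id: sessionId, frame_index: frameIndex, points };
}
```

For a 1280×720 frame, a click at pixel (640, 360) becomes the point {x: 0.5, y: 0.5}.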
Decoding the RLE Mask (TypeScript)
export type RLEObject = { size: [h: number, w: number]; counts: string }
export function decode(rleObj: RLEObject): Uint8Array {
// Returns flat Uint8Array (row-major): 1 = masked pixel, 0 = background
// ... see full implementation in source article
}
Rendering the Mask Overlay (Canvas)
function drawMask(rleMask: string, height: number, width: number) {
const decodeData = decode({ counts: rleMask, size: [height, width] })
// Paint pixels with RGBA (116, 255, 82, 163) where decodeData[y*w+x] === 1
}
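The painting step in the comment above can be separated from the canvas so that it is easy to test. The sketch below takes an already decoded mask in the row-major layout described earlier and expands it to RGBA bytes; maskToRgba and MASK_RGBA are illustrative names, and the RLE decoding itself is not reproduced here.

```typescript
// Overlay colour from the drawMask comment: RGBA (116, 255, 82, 163).
const MASK_RGBA = [116, 255, 82, 163];

// Expand a decoded mask (1 = masked pixel, 0 = background, row-major)
// into RGBA bytes, four per pixel.
function maskToRgba(mask: Uint8Array, width: number, height: number): Uint8ClampedArray {
  if (mask.length !== width * height) {
    throw new RangeError("mask length does not match width * height");
  }
  // Typed arrays start zero-filled, so background pixels stay fully transparent.
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < mask.length; i++) {
    if (mask[i] === 1) rgba.set(MASK_RGBA, i * 4);
  }
  return rgba;
}
```

In a browser, the result can be wrapped with new ImageData(rgba, width, height) and drawn with putImageData on a 2D canvas context.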
Step 3 — Delete / Clear Selection (optional)
POST /v1/videos/multi-elements/delete-selection
- Same body as add-selection; points must exactly match the coordinates used when adding.
POST /v1/videos/multi-elements/clear-selection
- Only requires session_id; wipes all marked areas.
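Since delete-selection needs the same coordinates that were sent to add-selection, it can help to record the points as sent instead of recomputing them from pixel positions later. The class below is a hypothetical client-side helper, not part of the API.

```typescript
type Point = { x: number; y: number };

// Hypothetical log of points sent to add-selection, keyed by frame index,
// so a later delete-selection can replay identical coordinates.
class SelectionLog {
  private byFrame = new Map<number, Point[]>();

  record(frameIndex: number, points: Point[]): void {
    const existing = this.byFrame.get(frameIndex) ?? [];
    // Copy each point so later mutation by the caller cannot alter the log.
    this.byFrame.set(frameIndex, existing.concat(points.map((p) => ({ ...p }))));
  }

  // Build a delete-selection body from exactly the values that were recorded.
  deleteBody(sessionId: string, frameIndex: number): { session_id: string; frame_index: number; points: Point[] } {
    return {
      session_id: sessionId,
      frame_index: frameIndex,
      points: this.byFrame.get(frameIndex) ?? [],
    };
  }
}
```

Storing the numbers themselves avoids small floating-point differences that could arise from recomputing a normalized coordinate a second time.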
Step 4 — Preview Selection (optional)
POST /v1/videos/multi-elements/preview-selection
Returns: video (masked overlay), video_cover, tracking_output (per-frame mask).
Step 5 — Create Task
POST /v1/videos/multi-elements
| Field | Type | Notes |
|---|---|---|
| model_name | enum | kling-v1-6 |
| session_id | string | Required |
| edit_mode | enum | addition / swap / removal |
| image_list | array | Required for add/swap; omit for removal |
| prompt | string | Use <<<video_1>>> / <<<image_1>>> references; max 2500 chars |
| negative_prompt | string | Optional, max 2500 chars |
| mode | enum | std (cost-effective) / pro (high-quality) |
| duration | enum | 5 or 10 seconds |
| callback_url | string | Optional webhook |
| external_task_id | string | Optional custom ID |
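The field rules in this table can be checked on the client before a task is submitted. The sketch below is a hedged example: the element shape of image_list is not given on this page, so it is typed as unknown[]; whether duration is sent as a string or a number is also unspecified, so both are accepted; and character limits are measured with JavaScript string length, which may differ from how the server counts. CreateTaskDraft and checkCreateTask are illustrative names.

```typescript
type EditMode = "addition" | "swap" | "removal";

// Subset of the create-task fields from the table above.
type CreateTaskDraft = {
  model_name: string;
  session_id: string;
  edit_mode: EditMode;
  image_list?: unknown[]; // element shape is not specified on this page
  prompt: string;
  negative_prompt?: string;
  mode?: string;
  duration?: string | number;
};

const MAX_PROMPT_CHARS = 2500;

// Returns a list of problems; an empty list means the draft passes these checks.
function checkCreateTask(d: CreateTaskDraft): string[] {
  const problems: string[] = [];
  if (d.model_name !== "kling-v1-6") problems.push("model_name must be kling-v1-6");
  if (!d.session_id) problems.push("session_id is required");
  if (d.prompt.length > MAX_PROMPT_CHARS) problems.push("prompt exceeds 2500 characters");
  if ((d.negative_prompt ?? "").length > MAX_PROMPT_CHARS) {
    problems.push("negative_prompt exceeds 2500 characters");
  }
  if (d.mode !== undefined && d.mode !== "std" && d.mode !== "pro") {
    problems.push("mode must be std or pro");
  }
  if (d.duration !== undefined && String(d.duration) !== "5" && String(d.duration) !== "10") {
    problems.push("duration must be 5 or 10");
  }
  const hasImages = (d.image_list ?? []).length > 0;
  if (d.edit_mode === "removal" && hasImages) problems.push("image_list should be omitted for removal");
  if (d.edit_mode !== "removal" && !hasImages) problems.push("image_list is required for addition and swap");
  return problems;
}
```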
edit_mode Details
| Mode | image_list | Prompt template |
|---|---|---|
| addition | 1–2 images (pre-cropped) | Using the context of <<<video_1>>>, seamlessly add [x] from <<<image_1>>> |
| swap | 1 image only | swap [x] from <<<image_1>>> for [x] from <<<video_1>>> |
| removal | not required | Delete [x] from <<<video_1>>> |
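The templates and image counts in this table lend themselves to a small helper. The sketch below fills the [x] placeholders and checks the image count per mode. For swap, it assumes the first [x] names the element taken from the image and the second names the element in the video; the table does not spell this out. IMAGE_COUNT, imageCountOk, and buildPrompt are illustrative names.

```typescript
type EditMode = "addition" | "swap" | "removal";

// Allowed image_list sizes per mode, from the table above.
const IMAGE_COUNT: Record<EditMode, { min: number; max: number }> = {
  addition: { min: 1, max: 2 },
  swap: { min: 1, max: 1 },
  removal: { min: 0, max: 0 },
};

function imageCountOk(mode: EditMode, count: number): boolean {
  const { min, max } = IMAGE_COUNT[mode];
  return count >= min && count <= max;
}

// Fill the [x] placeholders. For swap, `subject` is the element from the
// image and `target` is the element in the video (an assumption).
function buildPrompt(mode: EditMode, subject: string, target: string = subject): string {
  switch (mode) {
    case "addition":
      return `Using the context of <<<video_1>>>, seamlessly add ${subject} from <<<image_1>>>`;
    case "swap":
      return `swap ${subject} from <<<image_1>>> for ${target} from <<<video_1>>>`;
    case "removal":
      return `Delete ${subject} from <<<video_1>>>`;
  }
}
```

For example, buildPrompt("removal", "the red car") yields "Delete the red car from <<<video_1>>>".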
Image requirements (for add/swap):
- Formats: .jpg / .jpeg / .png
- Max 10 MB; min 300 px; aspect ratio 1:2.5–2.5:1
- Base64: raw string only, with no data:image/png;base64, prefix
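Browser APIs such as FileReader.readAsDataURL return a data URI, so the prefix usually has to be stripped before the image is sent. The sketch below does that and checks the size limits. It assumes that 10 MB means 10 × 1024 × 1024 bytes and that the 300 px minimum applies to both sides, neither of which this page confirms. The function names are illustrative.

```typescript
// Remove a leading data-URI header such as "data:image/png;base64," if present.
function toRawBase64(input: string): string {
  return input.trim().replace(/^data:[^;,]+;base64,/, "");
}

// Check reference-image limits. Assumes 10 MB = 10 * 1024 * 1024 bytes and
// that the 300 px minimum applies to both width and height.
function checkImage(width: number, height: number, bytes: number): string[] {
  const problems: string[] = [];
  if (bytes > 10 * 1024 * 1024) problems.push("file is larger than 10 MB");
  if (Math.min(width, height) < 300) problems.push("shorter side is under 300 px");
  const ratio = width / height;
  if (ratio < 1 / 2.5 || ratio > 2.5) problems.push("aspect ratio is outside 1:2.5 to 2.5:1");
  return problems;
}
```

A string that is already raw base64 passes through toRawBase64 unchanged, so the helper is safe to apply unconditionally.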
Step 6 — Query Task
GET /v1/videos/multi-elements/{task_id}
GET /v1/videos/multi-elements?pageNum=1&pageSize=30
Task statuses: submitted → processing → succeed / failed
Result video URL expires after 30 days — download and store promptly.
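Polling until a terminal status can be written as a small loop. In the sketch below, getStatus is a hypothetical callback that performs the GET request for a task and extracts its status; the response body shape is not documented on this page, so that part is left to the caller. The interval and attempt limit are arbitrary defaults, not values recommended by the API.

```typescript
type TaskStatus = "submitted" | "processing" | "succeed" | "failed";

// Poll until the task reaches a terminal status (succeed or failed).
// `getStatus` is supplied by the caller and wraps the actual HTTP request.
async function waitForTask(
  getStatus: () => Promise<TaskStatus>,
  intervalMs: number = 5000,
  maxAttempts: number = 120,
): Promise<TaskStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "succeed" || status === "failed") return status;
    // Wait before asking again so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("task did not finish within the polling budget");
}
```

Injecting the status lookup keeps the loop independent of any HTTP client; supplying callback_url when creating the task is the alternative to polling.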
Key Takeaways
- Session-based workflow: init once → mark objects → edit → query. Session lives 24 h.
- Three edit modes: add (needs 1–2 ref images), swap (1 ref image), remove (no images needed).
- Object selection uses normalized [0,1] click coordinates on specific frame indices: up to 10 frames, 10 points each.
- Response masks are RLE-encoded; decode to Uint8Array for canvas rendering.
- Only kling-v1-6 supports multi-elements; duration must be 5 s or 10 s, matching the source video's length bracket.
- Generated videos auto-delete after 30 days; use callback_url for async workflows.
- Base64 images must be raw (no data-URI prefix).
Related
- wiki/web-agency/kling-text-to-video-api — generate new videos from prompts
- wiki/web-agency/kling-image-to-video-api — animate still images
- wiki/web-agency/kling-multi-image-to-video-api — composite 2–4 reference images
- wiki/web-agency/kling-motion-control-api — pose/motion transfer
- wiki/web-agency/claude-code-nanobanana-website-workflow — end-to-end agency workflow using Kling
Sources
- Raw: raw/Kling AI Next-Gen AI Video & AI Image Generator 5.md
- Origin: Kling AI API docs, /v1/videos/multi-elements