🧪 Skills

Sogni Gen

Generate images **and videos** using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "...

v1.5.16
❤️ 2
⬇️ 578
👁 1
Share

Description


name: sogni-gen version: "1.5.16" description: Generate images and videos using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "draw", "generate", "create an image", or "make a video/animate" from a prompt or reference image. homepage: https://sogni.ai metadata: clawdbot: emoji: "🎨" primaryEnv: "SOGNI_API_KEY" os: ["darwin", "linux", "win32"] requires: bins: ["node"] anyBins: ["ffmpeg"] env: - "SOGNI_API_KEY" - "SOGNI_USERNAME" - "SOGNI_PASSWORD" - "SOGNI_CREDENTIALS_PATH" - "SOGNI_LAST_RENDER_PATH" - "SOGNI_MEDIA_INBOUND_DIR" - "OPENCLAW_CONFIG_PATH" - "OPENCLAW_PLUGIN_CONFIG" - "FFMPEG_PATH" - "SOGNI_DOWNLOADS_DIR" - "SOGNI_MCP_SAVE_DOWNLOADS" - "SOGNI_ALLOWED_DOWNLOAD_HOSTS" config: - "/.config/sogni/credentials" - "/.openclaw/openclaw.json" - "/.clawdbot/media/inbound" - "/.config/sogni/last-render.json" - "~/Downloads/sogni" install: - id: npm kind: exec command: "cd {{skillDir}} && cp skill-package.json package.json && npm i" label: "Prepare runtime dependencies"

Sogni Image & Video Generation

Generate images and videos using Sogni AI's decentralized GPU network.

Setup

  1. Get Sogni credentials at https://app.sogni.ai/
  2. Create credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
# or:
# SOGNI_USERNAME=your_username
# SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials

You can also export SOGNI_API_KEY, or SOGNI_USERNAME + SOGNI_PASSWORD, instead of writing the file.

  1. Install dependencies (if cloned):
cd /path/to/sogni-gen
npm i
  1. Or install from npm (no git clone):
mkdir -p ~/.clawdbot/skills
cd ~/.clawdbot/skills
npm i sogni-gen
ln -sfn node_modules/sogni-gen sogni-gen

When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from skill-package.json during install. That avoids relying on a root package.json being present in the published skill artifact.

Filesystem Paths and Overrides

Default file paths used by this skill:

  • Credentials file (read): ~/.config/sogni/credentials
  • Last render metadata (read/write): ~/.config/sogni/last-render.json
  • OpenClaw config (read): ~/.openclaw/openclaw.json
  • Media listing for --list-media (read): ~/.clawdbot/media/inbound
  • MCP local result copies (write): ~/Downloads/sogni

Path override environment variables:

  • SOGNI_CREDENTIALS_PATH
  • SOGNI_LAST_RENDER_PATH
  • SOGNI_MEDIA_INBOUND_DIR
  • OPENCLAW_CONFIG_PATH
  • SOGNI_DOWNLOADS_DIR (MCP)
  • SOGNI_MCP_SAVE_DOWNLOADS=0 to disable MCP local file writes
  • SOGNI_ALLOWED_DOWNLOAD_HOSTS to override which HTTPS hosts the MCP server may auto-download and save locally

Usage (Images & Video)

# Generate and get URL
node sogni-gen.mjs "a cat wearing a hat"

# Save to file
node sogni-gen.mjs -o /tmp/cat.png "a cat wearing a hat"

# JSON output (for scripting)
node sogni-gen.mjs --json "a cat wearing a hat"

# Check token balances (no prompt required)
node sogni-gen.mjs --balance

# Check token balances in JSON
node sogni-gen.mjs --json --balance

# Quiet mode (suppress progress)
node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"

Options

Flag Description Default
-o, --output <path> Save to file prints URL
-m, --model <id> Model ID z_image_turbo_bf16
-w, --width <px> Width 512
-h, --height <px> Height 512
-n, --count <num> Number of images 1
-t, --timeout <sec> Timeout seconds 30 (300 for video)
-s, --seed <num> Specific seed random
--last-seed Reuse seed from last render -
--seed-strategy <s> Seed strategy: random|prompt-hash prompt-hash
--multi-angle Multiple angles LoRA mode (Qwen Image Edit) -
--angles-360 Generate 8 azimuths (front -> front-left) -
--angles-360-video Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) -
--azimuth <key> front|front-right|right|back-right|back|back-left|left|front-left front
--elevation <key> low-angle|eye-level|elevated|high-angle eye-level
--distance <key> close-up|medium|wide medium
--angle-strength <n> LoRA strength for multiple_angles 0.9
--angle-description <text> Optional subject description -
--steps <num> Override steps (model-dependent) -
--guidance <num> Override guidance (model-dependent) -
--output-format <f> Image output format: png|jpg png
--sampler <name> Sampler (model-dependent) -
--scheduler <name> Scheduler (model-dependent) -
--lora <id> LoRA id (repeatable, edit only) -
--loras <ids> Comma-separated LoRA ids -
--lora-strength <n> LoRA strength (repeatable) -
--lora-strengths <n> Comma-separated LoRA strengths -
--token-type <type> Token type: spark|sogni spark
--balance, --balances Show SPARK/SOGNI balances and exit -
-c, --context <path> Context image for editing -
--last-image Use last generated image as context/ref -
--video, -v Generate video instead of image -
--workflow <type> Video workflow (t2v|i2v|s2v|ia2v|a2v|v2v|animate-move|animate-replace) inferred
--fps <num> Frames per second (video) 16
--duration <sec> Duration in seconds (video) 5
--frames <num> Override total frames (video) -
--auto-resize-assets Auto-resize video assets true
--no-auto-resize-assets Disable auto-resize -
--estimate-video-cost Estimate video cost and exit (requires --steps) -
--photobooth Face transfer mode (InstantID + SDXL Turbo) -
--cn-strength <n> ControlNet strength (photobooth) 0.8
--cn-guidance-end <n> ControlNet guidance end point (photobooth) 0.3
--ref <path> Reference image for video or photobooth face required for video/photobooth
--ref-end <path> End frame for i2v interpolation -
--ref-audio <path> Reference audio for s2v -
--ref-video <path> Reference video for animate/v2v workflows -
--controlnet-name <name> ControlNet type for v2v: canny|pose|depth|detailer -
--controlnet-strength <n> ControlNet strength for v2v (0.0-1.0) 0.8
--sam2-coordinates <coords> SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) -
--trim-end-frame Trim last frame for seamless video stitching -
--first-frame-strength <n> Keyframe strength for start frame (0.0-1.0) -
--last-frame-strength <n> Keyframe strength for end frame (0.0-1.0) -
--last Show last render info -
--json JSON output false
--strict-size Do not auto-adjust i2v video size for reference resizing constraints false
-q, --quiet No progress output false
--extract-last-frame <video> <image> Extract last frame from video (safe ffmpeg wrapper) -
--concat-videos <out> <clips...> Concatenate video clips (safe ffmpeg wrapper) -
--list-media [type] List recent inbound media (images|audio|all) images

OpenClaw Config Defaults

When installed as an OpenClaw plugin, sogni-gen will read defaults from:

~/.openclaw/openclaw.json

{
  "plugins": {
    "entries": {
      "sogni-gen": {
        "enabled": true,
        "config": {
          "defaultImageModel": "z_image_turbo_bf16",
          "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
          "videoModels": {
            "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
            "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
            "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
            "ia2v": "ltx2-19b-fp8_ia2v_distilled",
            "a2v": "ltx2-19b-fp8_a2v_distilled",
            "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
            "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
            "v2v": "ltx2-19b-fp8_v2v_distilled"
          },
          "defaultVideoWorkflow": "t2v",
          "defaultNetwork": "fast",
          "defaultTokenType": "spark",
          "seedStrategy": "prompt-hash",
          "modelDefaults": {
            "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
            "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
          },
          "defaultWidth": 768,
          "defaultHeight": 768,
          "defaultCount": 1,
          "defaultFps": 16,
          "defaultDurationSec": 5,
          "defaultImageTimeoutSec": 30,
          "defaultVideoTimeoutSec": 300,
          "credentialsPath": "~/.config/sogni/credentials",
          "lastRenderPath": "~/.config/sogni/last-render.json",
          "mediaInboundDir": "~/.clawdbot/media/inbound"
        }
      }
    }
  }
}

CLI flags always override these defaults. If your OpenClaw config lives elsewhere, set OPENCLAW_CONFIG_PATH. Seed strategies: prompt-hash (deterministic) or random.

Image Models

Model Speed Use Case
z_image_turbo_bf16 Fast (~5-10s) General purpose, default
flux1-schnell-fp8 Very fast Quick iterations
flux2_dev_fp8 Slow (~2min) High quality
chroma-v.46-flash_fp8 Medium Balanced
qwen_image_edit_2511_fp8 Medium Image editing with context (up to 3)
qwen_image_edit_2511_fp8_lightning Fast Quick image editing
coreml-sogniXLturbo_alpha1_ad Fast Photobooth face transfer (SDXL Turbo)

Video Models

WAN 2.2 Models

Model Speed Use Case
wan_v2.2-14b-fp8_i2v_lightx2v Fast Default video generation
wan_v2.2-14b-fp8_i2v Slow Higher quality video
wan_v2.2-14b-fp8_t2v_lightx2v Fast Text-to-video
wan_v2.2-14b-fp8_s2v_lightx2v Fast Sound-to-video
wan_v2.2-14b-fp8_animate-move_lightx2v Fast Animate-move
wan_v2.2-14b-fp8_animate-replace_lightx2v Fast Animate-replace

LTX-2 / LTX-2.3 Models

Model Speed Use Case
ltx2-19b-fp8_t2v_distilled Fast (~2-3min) Text-to-video, 8-step
ltx2-19b-fp8_t2v Medium (~5min) Text-to-video, 20-step quality
ltx2-19b-fp8_i2v_distilled Fast (~2-3min) Image-to-video, 8-step
ltx2-19b-fp8_i2v Medium (~5min) Image-to-video, 20-step quality
ltx2-19b-fp8_ia2v_distilled Fast (~2-3min) Image+audio-to-video
ltx2-19b-fp8_a2v_distilled Fast (~2-3min) Audio-to-video
ltx2-19b-fp8_v2v_distilled Fast (~3min) Video-to-video with ControlNet
ltx2-19b-fp8_v2v Medium (~5min) Video-to-video with ControlNet, quality
ltx23-22b-fp8_t2v_distilled Fast (~2-3min) Text-to-video, LTX-2.3

Image Editing with Context

Edit images using reference images (Qwen models support up to 3):

# Single context image
node sogni-gen.mjs -c photo.jpg "make the background a beach"

# Multiple context images (subject + style)
node sogni-gen.mjs -c subject.jpg -c style.jpg "apply the style to the subject"

# Use last generated image as context
node sogni-gen.mjs --last-image "make it more vibrant"

When context images are provided without -m, defaults to qwen_image_edit_2511_fp8_lightning.

Photobooth (Face Transfer)

Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use --photobooth with --ref pointing to their face image.

# Basic photobooth
node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"

# Multiple outputs
node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"

# Custom ControlNet tuning
node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"

Uses SDXL Turbo (coreml-sogniXLturbo_alpha1_ad) at 1024x1024 by default. The face image is passed via --ref and styled according to the prompt. Cannot be combined with --video or -c/--context.

Agent usage:

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Multiple photobooth outputs
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"

Multiple Angles (Turnaround)

Generate specific camera angles from a single reference image using the Multiple Angles LoRA:

# Single angle
node sogni-gen.mjs --multi-angle -c subject.jpg \
  --azimuth front-right --elevation eye-level --distance medium \
  --angle-strength 0.9 \
  "studio portrait, same person"

# 360 sweep (8 azimuths)
node sogni-gen.mjs --angles-360 -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

# 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
node sogni-gen.mjs --angles-360 --angles-360-video /tmp/turntable.mp4 \
  -c subject.jpg --distance medium --elevation eye-level \
  "studio portrait, same person"

The prompt is auto-built with the required <sks> token plus the selected camera angle keywords. --angles-360-video generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.

360 Video Best Practices

When a user requests a "360 video", follow this workflow:

  1. Default camera parameters (do not ask unless they specify):

    • Elevation: default to medium
    • Distance: default to medium
  2. Map user terms to flags:

    User says Flag value
    "high" angle --elevation high-angle
    "medium" angle --elevation eye-level
    "low" angle --elevation low-angle
    "close" --distance close-up
    "medium" distance --distance medium
    "far" --distance wide
  3. Always use first-frame/last-frame stitching - the --angles-360-video flag automatically handles this by generating i2v clips between consecutive angles including last→first for seamless looping.

  4. Example command:

    node sogni-gen.mjs --angles-360 --angles-360-video /tmp/output.mp4 \
      -c /path/to/image.png --elevation eye-level --distance medium \
      "description of subject"
    

Transition Video Rule

For any transition video work, always use the Sogni skill/plugin (not raw ffmpeg or other shell commands). Use the built-in --extract-last-frame, --concat-videos, and --looping flags for video manipulation.

Insufficient Funds Handling

When you see "Debit Error: Insufficient funds", reply:

"Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"

Video Generation

Generate videos from a reference image:

# Text-to-video (t2v)
node sogni-gen.mjs --video "ocean waves at sunset"

# Basic video from image
node sogni-gen.mjs --video --ref cat.jpg -o cat.mp4 "cat walks around"

# Use last generated image as reference
node sogni-gen.mjs --last-image --video "gentle camera pan"

# Custom duration and FPS
node sogni-gen.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"

# Sound-to-video (s2v)
node sogni-gen.mjs --video --ref face.jpg --ref-audio speech.m4a \
  -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"

# Image+audio-to-video (ia2v, LTX)
node sogni-gen.mjs --video --workflow ia2v --ref cover.jpg --ref-audio song.mp3 \
  "music video with synchronized motion"

# Audio-to-video (a2v, LTX)
node sogni-gen.mjs --video --workflow a2v --ref-audio song.mp3 \
  "abstract audio-reactive visualizer"

# LTX-2.3 text-to-video
node sogni-gen.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
  "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."

# Animate (motion transfer)
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
  --workflow animate-move "transfer motion"

Video-to-Video (V2V) with ControlNet

Transform an existing video using LTX-2 models with ControlNet guidance:

# Basic v2v with canny edge detection
node sogni-gen.mjs --video --workflow v2v --ref-video input.mp4 \
  --controlnet-name canny "stylized anime version"

# V2V with pose detection and custom strength
node sogni-gen.mjs --video --workflow v2v --ref-video dance.mp4 \
  --controlnet-name pose --controlnet-strength 0.7 "robot dancing"

# V2V with depth map
node sogni-gen.mjs --video --workflow v2v --ref-video scene.mp4 \
  --controlnet-name depth "watercolor painting style"

ControlNet types: canny (edge detection), pose (body pose), depth (depth map), detailer (detail enhancement).

Photo Restoration

Restore damaged vintage photos using Qwen image editing:

# Basic restoration
sogni-gen -c damaged_photo.jpg -o restored.png \
  "professionally restore this vintage photograph, remove damage and scratches"

# Detailed restoration with preservation hints
sogni-gen -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
  "restore this vintage photo, remove peeling, tears and wear marks, \
  preserve natural features and expression, maintain warm nostalgic color tones"

Tips for good restorations:

  • Describe the damage: "peeling", "scratches", "tears", "fading"
  • Specify what to preserve: "natural features", "eye color", "hair", "expression"
  • Mention the era for color tones: "1970s warm tones", "vintage sepia"

Finding received images (Telegram/etc):

node {{skillDir}}/sogni-gen.mjs --json --list-media images

Do NOT use ls, cp, or other shell commands to browse user files. Always use --list-media to find inbound media.

IMPORTANT KEYWORD RULE

  • If the user message includes the word "photobooth" (case-insensitive), always use --photobooth mode with --ref set to the user-provided face image.
  • Prioritize this rule over generic image-edit flows (-c) for that request.

LTX-2.3 Prompt Rule

Whenever the chosen video model is ltx23-22b-fp8_t2v_distilled, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling sogni-gen.

  • Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
  • Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
  • Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
  • Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
  • Give the scene one main action thread from start to finish. Use connectors like as, while, and then so motion reads as a continuous filmed moment.
  • If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
  • Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
  • Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like beautiful or nice, or structural markup such as [DIALOGUE].
  • Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
  • Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.

Duration-Aware Pacing

Match scene density to clip length so prompts stay filmable:

  • About 1-4s: describe exactly 1 action or moment.
  • About 5-8s: describe about 2 sequential actions.
  • About 9-12s: describe about 3 sequential actions.
  • Longer clips: add only a small number of additional sequential beats. Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.

Orientation Mapping

When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:

  • vertical, portrait, story, reel, tiktok -> -w 1088 -h 1920
  • landscape, horizontal, widescreen, youtube, 16:9 -> -w 1920 -h 1088
  • square, 1:1 -> -w 1088 -h 1088
  • 4:3 portrait -> -w 832 -h 1088
  • 4:3 landscape -> -w 1088 -h 832

Camera Language Normalization

When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:

  • zoom in -> slow push-in
  • zoom out -> slow pull-back
  • pan left / pan right -> smooth pan left / smooth pan right
  • orbit / circle around -> slow arc left or slow arc right
  • follow -> tracking follow

Short example:

User ask: "4k video of a woman in a neon alley"

Use this shape instead: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."

Agent Usage

When user asks to generate/draw/create an image:

# Generate and save locally
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "user's prompt"

# Edit an existing image
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"

# Generate video from image
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."

# Generate text-to-video
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."

# HD / "4K" text-to-video: prefer LTX-2.3
node {{skillDir}}/sogni-gen.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."

# HD / "4K" image-to-video: prefer LTX i2v
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -m ltx2-19b-fp8_i2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Check current SPARK/SOGNI balances (no prompt required)
node {{skillDir}}/sogni-gen.mjs --json --balance

# Find user-sent images/audio
node {{skillDir}}/sogni-gen.mjs --json --list-media images

# Then send via message tool with filePath

High-Res Video Routing

When the user asks for video in "hd", "1080p", "4k", "uhd", or "high-res", do not use the default WAN video models.

  • For text-to-video, use -m ltx23-22b-fp8_t2v_distilled.
  • For image-to-video, use -m ltx2-19b-fp8_i2v_distilled.
  • Prefer LTX-sized dimensions such as -w 1920 -h 1088.
  • If the user explicitly asks for vertical, portrait, story, reel, tiktok, square, or 4:3, apply the matching dimensions from the Orientation Mapping rules instead of defaulting to 16:9.
  • Rewrite the user's request using the LTX-2.3 Prompt Rule before invoking the command. Do not send short slogan-style prompts to LTX.
  • Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even if the exact output is not literal 3840x2160.

Security: Agents must use the CLI's built-in flags (--extract-last-frame, --concat-videos, --list-media) for all file operations and video manipulation. Never run raw shell commands (ffmpeg, ls, cp, etc.) directly.

Animate Between Two Images (First-Frame / Last-Frame)

When a user asks to animate between two images, use --ref (first frame) and --ref-end (last frame) to create a creative interpolation video:

# Animate from image A to image B
node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png -o /tmp/transition.mp4 "descriptive prompt of the transition"

Animate a Video to an Image (Scene Continuation)

When a user asks to animate from a video to an image (or "continue" a video into a new scene):

  1. Extract the last frame of the existing video using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
    
  2. Generate a new video using the last frame as --ref and the target image as --ref-end:
    node {{skillDir}}/sogni-gen.mjs -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png -o /tmp/continuation.mp4 "scene transition prompt"
    
  3. Concatenate the videos using the built-in safe wrapper:
    node {{skillDir}}/sogni-gen.mjs --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
    

This ensures visual continuity — the new clip picks up exactly where the previous one ended.

Do NOT run raw ffmpeg commands. Always use --extract-last-frame and --concat-videos for video manipulation.

Always apply this pattern when:

  • User says "animate image A to image B" → use --ref A --ref-end B
  • User says "animate this video to this image" → extract last frame, use as --ref, target image as --ref-end, then stitch
  • User says "continue this video" with a target image → same as above

JSON Output

{
  "success": true,
  "prompt": "a cat wearing a hat",
  "model": "z_image_turbo_bf16", 
  "width": 512,
  "height": 512,
  "urls": ["https://..."],
  "localPath": "/tmp/cat.png"
}

On error (with --json), the script returns a single JSON object like:

{
  "success": false,
  "error": "Video width and height must be divisible by 16 (got 500x512).",
  "errorCode": "INVALID_VIDEO_SIZE",
  "hint": "Choose --width/--height divisible by 16. For i2v, also match the reference aspect ratio."
}

Balance check example (--json --balance):

{
  "success": true,
  "type": "balance",
  "spark": 12.34,
  "sogni": 0.56
}

Cost

Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient.

Troubleshooting

  • Auth errors: Check SOGNI_API_KEY or the credentials in ~/.config/sogni/credentials
  • i2v sizing gotchas: Video sizes are constrained (min 480px, max 1536px, divisible by 16). For i2v, the client wrapper resizes the reference (fit: inside) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size (example: 1024x1536 requested but ref becomes 1024x1535).
  • Auto-adjustment: With a local --ref, the script will auto-adjust the requested size to avoid non-16 resized reference dimensions.
  • If the script adjusts your size but you want to fail instead: pass --strict-size and it will print a suggested --width/--height.
  • Timeouts: Try a faster model or increase -t timeout
  • No workers: Check https://sogni.ai for network status

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs