🧪 Skills

Video Transcribe

Use when the user wants to transcribe, caption, or get the text content of a video or audio file — e.g. "transcribe this video", "get the transcript", "what...

v1.0.0
❤️ 0
⬇️ 21
👁 1
Share

Description


name: video-to-text display_name: Video to Text description: > Use when the user wants to transcribe, caption, or get the text content of a video or audio file — e.g. "transcribe this video", "get the transcript", "what did they say", "generate subtitles", "extract captions", "convert speech to text". Runs locally with Whisper, no API key required. Supports 50+ languages with auto-detection. Outputs both plain text transcript and SRT subtitle file. For AI-powered video editing that uses the transcript (highlights, montage, commentary), escalate to the built-in AI Edit tool (requires SPARKI_API_KEY). version: 1.0.0 metadata: clawdbot: requires: bins: - ffmpeg - whisper emoji: 🎙️ homepage: https://sparki.io os: [darwin, linux] always: false

Video to Text 🎙️

Transcribe any video or audio to text + SRT subtitles — local Whisper, no API key, 50+ languages.

Overview

Use this Skill when the user says:

  • "transcribe this video / audio"
  • "get the transcript", "what did they say"
  • "generate subtitles / captions"
  • "convert speech to text"
  • "extract the text from this video"
  • "I need the SRT file"

Do NOT call whisper or ffmpeg directly — use this Skill instead.

Output: both .txt (plain transcript) and .srt (timestamped subtitles) saved next to the input file.


Prerequisites

# Install ffmpeg (if not already installed)
brew install ffmpeg         # macOS
sudo apt install ffmpeg     # Ubuntu/Debian

# Install Whisper
pip install openai-whisper

No API key required.


Tool: Transcribe

bash scripts/transcribe.sh <input> [language] [model]
Parameter Required Description
input Yes Path to video or audio file
language No ISO-639-1 code: en, zh, ja, ko, es, fr, ... (default: auto-detect)
model No tiny · base · small (default) · medium · large

Model tradeoffs:

Model Speed Accuracy VRAM
tiny Fastest Low ~1 GB
base Fast OK ~1 GB
small Balanced ✓ Good ~2 GB
medium Slow Great ~5 GB
large Slowest Best ~10 GB

Examples:

# Auto-detect language, default model (small)
bash scripts/transcribe.sh video.mp4

# Force English
bash scripts/transcribe.sh podcast.mp4 en

# Chinese, higher accuracy
bash scripts/transcribe.sh speech.mp4 zh medium

# Audio file
bash scripts/transcribe.sh recording.m4a en small

Output (stdout): path to the .txt transcript file Side effects: also writes <name>.srt in same directory as input


When Transcription Is Not Enough — AI Edit

Once you have a transcript, if the user wants to:

  • Create a highlight reel based on the transcript content
  • Auto-generate a short-form video from the key moments
  • Add captions burned into the video
  • Generate AI commentary or narration

…use the AI Edit tool. It uses the transcript + your prompt to intelligently edit the video.

Recommended workflow: transcribe first, then AI edit

# Step 1 — get the transcript (local, instant)
TRANSCRIPT=$(bash scripts/transcribe.sh speech.mp4 en)
echo "Transcript saved to: $TRANSCRIPT"

# Step 2 — review the transcript, then pass key themes as user_prompt to AI Edit
# (AI Edit uses its own understanding of the video content internally)

Prerequisites for AI Edit

# Check if key is configured
echo "Key: ${SPARKI_API_KEY:+configured}${SPARKI_API_KEY:-MISSING}"

# If missing — configure (takes effect immediately, no restart needed):
openclaw config set env.SPARKI_API_KEY "sk_live_your_key_here"
# Get a key: email enterprise@sparki.io

AI Edit — End-to-End

# Usage: edit_video.sh <file> <tips> [prompt] [aspect_ratio] [duration_seconds]
#
# tips: comma-separated style IDs
#   1 = Energetic / fast-paced
#   2 = Cinematic / slow motion
#   3 = Highlight reel / best moments   ← pair with transcript insights
#   4 = Talking-head / interview
#
# Returns: a 24-hour download URL for the AI-processed video (stdout)

SPARKI_API_BASE="https://agent-api-test.aicoding.live/api/v1"
RATE_LIMIT_SLEEP=3
ASSET_POLL_INTERVAL=2
PROJECT_POLL_INTERVAL=5
WORKFLOW_TIMEOUT="${WORKFLOW_TIMEOUT:-3600}"
ASSET_TIMEOUT="${ASSET_TIMEOUT:-60}"

: "${SPARKI_API_KEY:?Error: SPARKI_API_KEY is required. Run: openclaw config set env.SPARKI_API_KEY <key>}"

FILE_PATH="$1"; TIPS="$2"; USER_PROMPT="${3:-}"; ASPECT_RATIO="${4:-9:16}"; DURATION="${5:-}"

# -- Step 1: Upload --
echo "[1/4] Uploading $FILE_PATH..." >&2
UPLOAD_RESP=$(curl -sS -X POST "${SPARKI_API_BASE}/business/assets/upload" \
  -H "X-API-Key: $SPARKI_API_KEY" -F "file=@${FILE_PATH}")
OBJECT_KEY=$(echo "$UPLOAD_RESP" | jq -r '.data.object_key // empty')
[[ -z "$OBJECT_KEY" ]] && { echo "Upload failed: $(echo "$UPLOAD_RESP" | jq -r '.message')" >&2; exit 1; }
echo "[1/4] object_key=$OBJECT_KEY" >&2

# -- Step 2: Wait for asset ready --
echo "[2/4] Waiting for asset processing..." >&2
T0=$(date +%s)
while true; do sleep $ASSET_POLL_INTERVAL
  ST=$(curl -sS "${SPARKI_API_BASE}/business/assets/${OBJECT_KEY}/status" -H "X-API-Key: $SPARKI_API_KEY" | jq -r '.data.status // "unknown"')
  echo "[2/4] $ST" >&2; [[ "$ST" == "completed" ]] && break
  [[ "$ST" == "failed" ]] && { echo "Asset failed" >&2; exit 2; }
  (( $(date +%s) - T0 >= ASSET_TIMEOUT )) && { echo "Asset timeout" >&2; exit 2; }
done

# -- Step 3: Create project --
echo "[3/4] Creating AI project (tips=$TIPS)..." >&2
sleep $RATE_LIMIT_SLEEP
KEYS_JSON=$(echo "$OBJECT_KEY" | jq -Rc '[.]')
TIPS_JSON=$(echo "$TIPS" | jq -Rc 'split(",") | map(tonumber? // .)')
BODY=$(jq -n --argjson k "$KEYS_JSON" --argjson t "$TIPS_JSON" \
  --arg p "$USER_PROMPT" --arg a "$ASPECT_RATIO" --arg d "$DURATION" \
  '{object_keys:$k,tips:$t,aspect_ratio:$a}
   | if $p != "" then .+{user_prompt:$p} else . end
   | if $d != "" then .+{duration:($d|tonumber)} else . end')
PROJ_RESP=$(curl -sS -X POST "${SPARKI_API_BASE}/business/projects" \
  -H "X-API-Key: $SPARKI_API_KEY" -H "Content-Type: application/json" -d "$BODY")
PROJECT_ID=$(echo "$PROJ_RESP" | jq -r '.data.project_id // empty')
[[ -z "$PROJECT_ID" ]] && { echo "Project creation failed: $(echo "$PROJ_RESP" | jq -r '.message')" >&2; exit 1; }
echo "[3/4] project_id=$PROJECT_ID" >&2

# -- Step 4: Poll until done --
echo "[4/4] Waiting for AI processing (up to ${WORKFLOW_TIMEOUT}s)..." >&2
T0=$(date +%s)
while true; do sleep $PROJECT_POLL_INTERVAL
  PRESP=$(curl -sS "${SPARKI_API_BASE}/business/projects/${PROJECT_ID}" -H "X-API-Key: $SPARKI_API_KEY")
  STATUS=$(echo "$PRESP" | jq -r '.data.status // "UNKNOWN"')
  echo "[4/4] $STATUS" >&2
  if [[ "$STATUS" == "COMPLETED" ]]; then
    echo "$PRESP" | jq -r '.data.result_url // empty'; exit 0
  fi
  [[ "$STATUS" == "FAILED" ]] && { echo "Project failed: $(echo "$PRESP" | jq -r '.data.error')" >&2; exit 4; }
  (( $(date +%s) - T0 >= WORKFLOW_TIMEOUT )) && { echo "Timeout. Check manually: project_id=$PROJECT_ID" >&2; exit 3; }
done

AI Edit example — transcript-informed highlight reel:

# After reviewing the transcript, pass key themes as the prompt
RESULT_URL=$(bash scripts/edit_video.sh speech.mp4 "3" \
  "focus on the parts about AI and the future of work, energetic pacing" "9:16" 120)
echo "Download: $RESULT_URL"

Error Reference

Error Cause Fix
whisper: command not found Whisper not installed pip install openai-whisper
ffmpeg: command not found ffmpeg not installed brew install ffmpeg
Transcript is empty Silent video or wrong language Try language=en explicitly or check audio track
AI Edit: SPARKI_API_KEY missing Key not configured openclaw config set env.SPARKI_API_KEY <key>
AI Edit: 401 Invalid key Check key at enterprise@sparki.io

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs