Enables agents to reply in the same modality as the incoming message: voice messages get voice replies and text messages get text replies. Uses Edge TTS and config snippets.
Text-to-speech using Kokoro local TTS. Use when the user wants to convert text to audio, read aloud, or generate speech.
Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input,...
Work safely with files inside the OpenClaw workspace sandbox. Use for listing directories, reading text files, writing text files, and searching files by nam...
Access ElevenLabs APIs for text-to-speech, speech-to-speech, realtime speech-to-text, voice/model management, and dialogue workflows with direct HTTP calls.
Wavespeed NanoBanana2 Text-to-Image Skill. Enables text-to-image generation using the Wavespeed AI NanoBanana2 API. It allows you to generate high-quality images from textual...
Bailian Studio (bailian-studio). Call Aliyun Bailian via DashScope; OCR text extraction first, plus TTS speech. First feature: OCR text extraction via DashScope...
Query the RAG knowledge base (Qdrant kb_main) by semantic search. Returns top-k chunks with text, doc_id, source, text_type, topic_tags.
Generates images and videos using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, video editing (VACE, keyframe interpolation). Use when the user w
Free voice recognition using Groq's Whisper API. Transcribe audio messages to text in 50+ languages at no cost. Perfect for voice-to-text autom...
Convert recipe text (pasted text, video transcript, image description, or any raw content) into a .paprikarecipes file that can be imported directly into the...
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to
Browse 4chan boards and extract thread discussions into structured text files. Use when you need to fetch catalog information or specific thread content (including post text and file metadata) from 4c
Generate 3D models using each::sense AI. Create 3D assets from text or images for games, products, architecture, characters, vehicles, and more with PBR text...
Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.
AI video, image & music generation. 60+ models — Sora, Veo 3, Kling, Seedance, GPT Image, Suno v5, Hailuo, WAN. Text-to-video, image-to-video, text-to-image,...
Transcribe audio to text using Volcano Engine (Volcengine/ARK) speech-to-text APIs. Use when the user wants to replace Whisper/OpenAI STT with Volcengine, tr...
Generate images from text prompts using ZenMux API (Vertex AI protocol with Gemini models). Use when: (1) User wants to generate/create images from text desc...
Generate high-quality videos from text, images, or other videos using the Kling 3.0 Omni model. Covers text-to-video, image-to-video, video editing, video re...
Generate images and videos via Runware API. Access to FLUX, Stable Diffusion, Kling AI, and other top models. Supports text-to-image, image-to-image, upscaling, text-to-video, and image-to-video. Use
Skill for Tencent Cloud HunYuan Text-to-Image Generation (混元生图). Provides AI image generation from text prompts using the HunYuan large model. Supports refer...
Offline speech-to-text conversion using a local Vosk model. Takes an audio file path as input and outputs the transcript text.
View, extract, edit, and manipulate PDF files. Supports text extraction, text editing (overlay and replacement), merging, splitting, rotating pages, and getting PDF metadata. Use when working with PDF
Generate professional PDF invoices from simple text commands. Supports multiple currencies, tax calculation, CJK text, and customizable templates. No externa...