fal.ai API integration with managed API key authentication. Run AI models for image generation, video generation, audio processing, and more. Use this skill...
The cheapest AI media API on the market. Generate images (Flux), music (AceStep), speech with voice cloning, transcribe video/audio, OCR, video generation, b...
Create AI marketing videos for ads, promos, product launches, and brand content. Models: Veo, Seedance, Wan, FLUX for visuals, Kokoro for voiceover. Types: p...
ElevenLabs voice API integration — TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps...
End-to-end voice workflow with Deepgram STT and TTS. Use when transcribing voice messages, generating spoken replies, or building a shell-based audio pipelin...
Open-source first AI inference — GLM-5 as default, Claude as fallback only. Own your inference forever via the Morpheus decentralized network. Stake MOR toke...
Complete Open WebUI API integration for managing LLM models, chat completions, Ollama proxy operations, file uploads, knowledge bases (RAG), image generation, audio processing, and pipelines. Use this
Telegram bot that transcribes voice messages using Whisper and replies in Chinese with Microsoft Edge text-to-speech.
Generate AI videos with Google Veo, Seedance, Wan, Grok and 40+ models via inference.sh CLI. Models: Veo 3.1, Veo 3, Seedance 1.5 Pro, Wan 2.5, Grok Imagine...
Transcribe audio files via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0, recorded audio → text) API from ByteDance/Volcengine. Best-in-class Chinese speech recognition...
PCClaw provides 16 native Windows AI skills for system control, automation, files, notifications, OCR, speech, LLM inference, and task management with minima...
Complete Telnyx toolkit — ready-to-use tools (STT, TTS, RAG, Networking, 10DLC) plus SDK documentation for JavaScript, Python, Go, Java, and Ruby.
--- name: voice-agent display-name: AI Voice Agent Backend version: 1.1.0 description: Local Voice Input/Output for Agents using the AI Voice Agent API. author: trevisanricardo homepage: https://githu
Local voice I/O for OpenClaw agents. Transcribe inbound audio/voice messages using local Whisper (whisper.cpp) and generate voice replies using local Piper T...
--- name: voice-stt-tts description: Full voice message setup (STT + TTS) for OpenClaw using faster-whisper and Edge TTS homepage: https://docs.openclaw.ai/nodes/audio metadata: { "openclaw":
Control Home Assistant devices and automations via REST API. 25 entity domains including lights, climate, locks, presence, weather, calendars, notifications, scripts, and more. Use when the user asks
--- name: avatar description: Interactive AI avatar with Simli video rendering and ElevenLabs TTS emoji: "\U0001F9D1\u200D\U0001F4BB" homepage: https://github.com/Johannes-Berggren/openclaw-avatar met
Produce complete code-based animated videos by scripting, generating narration, creating visual assets, and rendering final MP4s using the code2animation fra...
Decentralized compute and data marketplace for AI agents with spot pricing.
Route Alibaba Cloud Model Studio requests to the right local skill (Qwen Image, Qwen Image Edit, Wan Video, Wan R2V, Qwen TTS, Qwen ASR and advanced TTS vari...
Provides authoritative, daily-updated data and APIs that track state-of-the-art AI models across categories, drawing on LMArena, Artificial Analysis, and HuggingFace.
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understand
Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...
Transcribe recorded audio files to text via UniCloud ASR API, supporting multiple formats and domains like finance and customer service; requires configured...