Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation f
Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script...
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...
Transcribe audio files via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0, recorded audio → text) API from ByteDance/Volcengine. Best-in-class Chinese speech recognition...
Anima Avatar - Interactive Video Generation Engine. Generates 16:9 videos with dynamic character sprites (Shutiao), synced audio (Fish Audio), and text overlay.
Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use wh...
Generate and publish a dual-host daily podcast. Fetches news, generates a conversational script between two hosts, synthesizes audio via Fish Audio or Edge T...
Simple text-to-speech skill using MiniMax Voice API. Converts text to audio with customizable voice selection. Use for generating speech audio from text.
Automates downloading YouTube audio, launching Doubao, playing audio, and capturing translations for full video subtitle extraction.
TTS (text-to-speech) via IMA Open API with seed-tts-2.0. Voice synthesis, speech from text, dubbing, audio content creation. Output: audio URL (mp3/wav). Flo...
--- name: byt-workflow description: YouTube video translation workflow, download audio, launch Doubao, play audio, capture translation tools: - youtube_translate --- # Byt Workflow ## Usage `
Use when the user needs local speech-to-text transcription for audio files, especially Chinese or mixed Chinese-English audio, without relying on cloud trans...
Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use thi...
Generate lip-sync video from image + user's own audio recording. ✅ USE WHEN: - User provides their OWN audio file (voice recording) - Want to sync image to specific audio/voice - User recorded the
Transcribe audio files to text via Step ASR streaming API (HTTP SSE). Supports Chinese and English, multiple audio formats (PCM, WAV, MP3, OGG/OPUS), real-ti...
Transcribe Telegram voice messages and audio notes into text using the OpenAI Whisper API. Use when (1) a user sends a voice message or audio note via Telegr...
Convert text to podcast audio using Tencent Cloud TTS. Supports both short and long text processing, generates up to 30-minute long audio with automatic chun...
Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcript...
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
Transcribe audio via the self-hosted Whisper ASR instance running on Kubernetes. Use this skill whenever the user wants to transcribe audio files, convert sp...
Transcribe audio files via Groq's OpenAI-compatible speech-to-text API. Use when the user sends voice messages or audio files and you need fast cloud speech-...
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.