Generate AI podcast episodes from PDFs, text, notes, and links using MagicPodcast in OpenClaw. Creates natural two-person dialogue audio, supports custom lan...
Analyzes audio to detect BPM, key, structure, genre, mood, transcribe lyrics, and generate visual and textual summaries of music tracks.
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
Generate AI-optimized Alt Text, file names, captions, and Schema markup for images, videos, and audio assets. Improves AI discoverability on Google Lens, Cha...
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium s
Video lip synchronization using LipSync 2.0 API. Automatically synchronizes audio with lip movements in videos. Powered by Dreamface - AI tools for everyone....
Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...
Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to
Voice input and microphone button for OpenClaw WebChat Control UI. Adds a mic button to chat, records audio via browser MediaRecorder, transcribes locally vi...
ElevenLabs API integration with managed authentication. AI-powered text-to-speech, voice cloning, sound effects, and audio processing. Use this skill when users want to generate speech from text, clon
Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search.
--- name: any-whisper-api description: Transcribe audio via API Whisper with any compatible local servers. homepage: https://platform.openai.com/docs/guides/speech-to-text metadata: {"clawdbot":{"emoj
Analyze Twitter Spaces and voice conversations to extract market intelligence, crypto alpha, sentiment analysis, and speaker-attributed insights. Transforms spoken audio into structured reports, full
Unified multi-modal content parser for images, PDF, DOCX, audio, auto OCR/transcription, output structured text for LLM processing
Control AirPlay speakers via Airfoil from the command line. Connect, disconnect, set volume, and manage multi-room audio with simple CLI commands.
Transcribe YouTube videos to text by extracting captions and subtitles directly from the video URL using yt-dlp without audio processing.
Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.
Use when the user mentions Otter, Otter.ai, or wants to find, search, download, export, or manage meeting notes, transcripts, recordings, or audio from calls...
Offline speech-to-text conversion using Vosk local model; input audio file path, output transcript text.
Download videos from YouTube, Bilibili, Twitter, and thousands of other sites using yt-dlp. Use when the user provides a video URL and wants to download it, extract audio (MP3), download subtitles, or
Install and use the speechall CLI tool for speech-to-text transcription. Use when the user wants to: (1) transcribe audio or video files to text, (2) install speechall on macOS or Linux, (3) list avai
Free, unlimited text-to-speech using Microsoft Edge neural voices via Python edge-tts. Use when generating long-form audio, podcasts, voice notes, spoken bri...
Consume the shared Whisper speech-to-text API over Tailnet at http://100.92.116.99:8765 using OpenAI-compatible audio transcription endpoint (/v1/audio/trans...