Transcribe audio files to text via Step ASR streaming API (HTTP SSE). Supports Chinese and English, multiple audio formats (PCM, WAV, MP3, OGG/OPUS), real-ti...
Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...
AI media generation via deAPI. Transcribe YouTube/audio/video, generate images from text, text-to-speech, OCR, remove backgrounds, upscale images, create vid...
Transcribe, index, and semantically search all voice recordings, extracting action items and meeting insights for comprehensive conversation intelligence.
Launch voice collection campaigns for feature phones, list active tasks, and monitor campaign stats. Validate and transcribe audio samples automatically to ensure high-quality datasets. Credit mobile
Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script...
Video to text converter. Downloads videos from Bilibili using bilibili-api, from other sites using yt-dlp, then transcribes audio using faster-whisper. Use w...
Analyze videos from TikTok, YouTube, Instagram, Twitter, and others by URL, transcribing audio locally and answering questions about the content.
Voice communication via Telegram. Automatically transcribes incoming voice messages using faster-whisper and replies with TTS voice. Use for all voice-relate...
Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's
Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search.
Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.
Create a verbatim transcript for a YouTube URL using Google Gemini (speaker labels, paragraph breaks; no time codes). Use when the user asks to transcribe a YouTube video or wants a clean transcript (
Smart Telegram reply workflow for OpenClaw: if the user sends text, reply with text; if the user sends a voice note/audio, transcribe locally using the insta...
Automatically track creator channels and transcribe new videos (YouTube, Bilibili, TikTok) with zero token cost during the pipeline. Use memory-based updates...
--- name: musa-torch-coding description: Transcribe audio via OpenAI Audio Transcriptions API (Whisper). homepage: https://platform.openai.com/docs/guides/speech-to-text metadata: { "openc
Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe...
macOS CLI for transcribing audio and video files using local Whisper models or Whisnap Cloud.
Set up mlx-whisper as the local audio transcription engine for OpenClaw on Apple Silicon Macs (M1/M2/M3/M4). Automatically transcribes voice notes sent via T...
Local speech-to-text using Vosk. Lightweight, fast, fully offline. Perfect for transcribing Telegram voice messages, audio files, or any speech-to-text task without cloud APIs.