Transcribe audio files using Sber Salute Speech async API. Russian-first STT with support for ru-RU, en-US, kk-KZ, ky-KG, uz-UZ.
OpenAI API integration — chat completions, embeddings, image generation, audio transcription, file management, fine-tuning, and assistants via the OpenAI RES...
将文本转为语音并通过飞书 audio 消息发送给指定用户。用于“给用户发语音”“把这段话转语音并发飞书”“语音播报结果”等场景,尤其当普通文件
Extract audio from Douyin (抖音/TikTok China) videos and transcribe to text using Whisper. Trigger when user sends a Douyin link (v.douyin.com or www.douyin.co...
Use the Phaya SaaS backend to generate images, videos, audio, music, and run LLM chat completions via simple REST API calls. Use when the user wants to gener...
Comprehensive consumer electronics industry sourcing guide for international buyers – provides detailed information about China's smartphone, wearable, audio...
Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.
MCP 2025-11-25 specification compliance audit pack. Validates elicitation, tasks, resources/prompts, audio content, JSON Schema 2020-12, SSE transport, and i...
Transcribe speech from YouTube videos or audio URLs into text using Gladia API with up to 10 free hours of monthly transcription. Use when: you need to summa...
Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-se...
Transforms supplier or CJ source videos into 1080×1920 TikTok/Instagram Reels ads with clean zone detection, Pillow text overlays, CTA card, and trending audio.
Zoom RTMS Meeting Assistant — start on-demand to capture meeting audio, video, transcript, screenshare, and chat via Zoom Real-Time Media Streams. Handles meeting.rtms_started and meeting.rtms_stopp
Multilingual Text-to-Speech (TTS) with intelligent Pinyin-to-Hanzi conversion. Use when the user asks to generate audio for text that contains a mix of Vietn...
Generate talking videos from images using Talking Image API. Create talking videos from audio and images, supporting non-human faces like pets or animated ch...
Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.
Generate AI podcast episodes from PDFs, text, notes, and links using MagicPodcast in OpenClaw. Creates natural two-person dialogue audio, supports custom lan...
Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency....
Auto-play TTS voice files with wake word detection. Only plays audio when user message contains wake words like "语音", "念出来", "voice", etc. Perfect for Webcha...
Process video/audio files using FFHub.io cloud FFmpeg API. Use when the user wants to convert, compress, trim, resize, extract audio, generate thumbnails, or...
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...
Local speech-to-text using Vosk. Lightweight, fast, fully offline. Perfect for transcribing Telegram voice messages, audio files, or any speech-to-text task without cloud APIs.
Playful virtual girlfriend voice companion. Use when the user wants short, flirty, friendly text replies returned as Bulbul v3 audio across chat channels (Discord/Telegram/WhatsApp). Generate a brief