Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice mes...
Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Text-to-speech conversion using GLM-TTS service via the `uvx zai-tts` command for generating audio from text. Use when (1) User requests audio/voice output w...
Command-line tool for fast, accurate speech-to-text transcription from local files, URLs, or live audio using Deepgram’s API with customizable options.
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...
当用户希望生成语音、配音或文字转音频时使用此技能。通过 Giggle.pro 文转音 API 将文本合成为 AI 语音。触发词:生成语音、文转音、文字转语音、配
将录音/语音转写为结构化演讲纪要。适用于:会议讲话、内部分享、演讲录音的转写整理。 触发条件:用户发送音频文件并要求整理/转写/纪要,或要
Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.
通用语音识别 Skill。支持多种音频格式(ogg/mp3/wav/m4a),使用硅基流动 SenseVoice API 进行语音转文字。当用户发送语音消息、音频文件,或需要转录音频
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
Transcreve arquivos de áudio em português brasileiro para texto, suportando vários formatos e incluindo timestamps.
Access ElevenLabs APIs for text-to-speech, speech-to-speech, realtime speech-to-text, voice/model management, and dialogue workflows with direct HTTP calls.
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI
Local speech-to-text with the Whisper CLI (no API key). And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, ch...
TTS (text-to-speech) via IMA Open API with seed-tts-2.0. Voice synthesis, speech from text, dubbing, audio content creation. Output: audio URL (mp3/wav). Flo...
Text-to-speech generation on Volcengine (ByteDance) speech services. Use when users need narration, multi-language speech output, voice selection, or TTS tro...
Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the
AI task hub for image analysis, background removal, speech-to-text, text-to-speech, markdown conversion, and async execute/poll/presentation orchestration. U...