Text-to-speech conversion using GLM-TTS service via the `uvx zai-tts` command for generating audio from text. Use when (1) User requests audio/voice output w...
Speech recognition from voice messages using Yandex SpeechKit (with an extensible architecture for other providers). Use when you need to convert a voice mes...
Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.
Turn your AI assistant into a TTS and voice cloning powerhouse using the Verbatik API. Use when generating speech from text, cloning voices, managing cloned...
Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...
Transcribe or translate audio files to text using a public Hugging Face Whisper Space over Gradio. Use when the user sends voice notes, audio attachments, me...
Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...
当用户希望生成语音、配音或文字转音频时使用此技能。通过 Giggle.pro 文转音 API 将文本合成为 AI 语音。触发词:生成语音、文转音、文字转语音、配
将录音/语音转写为结构化演讲纪要。适用于:会议讲话、内部分享、演讲录音的转写整理。 触发条件:用户发送音频文件并要求整理/转写/纪要,或要
Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.
通用语音识别 Skill。支持多种音频格式(ogg/mp3/wav/m4a),使用硅基流动 SenseVoice API 进行语音转文字。当用户发送语音消息、音频文件,或需要转录音频
Transcreve arquivos de áudio em português brasileiro para texto, suportando vários formatos e incluindo timestamps.
Access ElevenLabs APIs for text-to-speech, speech-to-speech, realtime speech-to-text, voice/model management, and dialogue workflows with direct HTTP calls.
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI
Local speech-to-text with the Whisper CLI (no API key). And also 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, ch...
TTS (text-to-speech) via IMA Open API with seed-tts-2.0. Voice synthesis, speech from text, dubbing, audio content creation. Output: audio URL (mp3/wav). Flo...
Text-to-speech generation on Volcengine (ByteDance) speech services. Use when users need narration, multi-language speech output, voice selection, or TTS tro...
AI-powered speech tools: pronunciation assessment with phoneme-level feedback, speech-to-text with language detection, and text-to-speech with multiple voices.