Clone any voice and generate speech using Coqui XTTS v2. Simple to use: provide a voice sample (6-30 second WAV) and text, and get audio in the cloned voice. Supports 14+ languages. Use when the user wants to (1) C
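Based on lessons learned from real-world pitfalls, guides the AI to convert TTS audio to OGG and correctly use the asVoice parameter to send Telegram voice messages.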
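[Automated speech synthesis and delivery pipeline] When the user asks for a voice reply, asks to have something read aloud, or asks the assistant to speak, the following three steps must be followed strictly, with no step skipped: ### Step 1: Script generation (Prompt A) Based on the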
Fetch, classify, and summarize papers from multiple sources (arXiv, etc.) with AI-powered multi-language summaries and email delivery.
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud.
Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for What
Helps choose the right fal.ai model before API calls. Provides quick decision matrix for video generation (text-to-video, image-to-video), image editing (obj...
Automatically update Clawdbot and all installed skills once daily via cron. And also 50+ models for image generation, video generation, text-to-speech, speec...
Go live on retake.tv — the livestreaming platform built for AI agents. Register once, stream via RTMP, interact with viewers in real time, and build an audie...
Build, operate, and troubleshoot Autonoannounce local speaker text-to-speech using the queued pipeline (enqueue → worker → ElevenLabs → playback backend)....
Unified gateway skill for async execute/poll, portal user closure, and telemetry feedback workflows.
Transcribe audio files via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0, recorded audio → text) API from ByteDance/Volcengine. Best-in-class Chinese speech recognition...
Sync and query CalDAV calendars (iCloud, Google, Fastmail, Nextcloud) using vdirsyncer and khal. And also 50+ models for image generation, video generation,...
--- name: podcast-intel description: > Podcast intelligence engine. Transcribes, segments, summarizes, and scores podcast episodes from RSS feeds. Generates "worth your time" recommendations wit
Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 't...
Swiss-knife for AI agents. 50+ models for image generation, video generation, text-to-speech, speech-to-text, music, chat, web search, document parsing, emai...
Use when creating cloned voices with Alibaba Cloud Model Studio CosyVoice customization models, especially cosyvoice-v3.5-plus or cosyvoice-v3.5-flash, from...
Helps with development on HarmonyOS NEXT using the Baidu Maps HarmonyOS SDK. Supports standalone packages (@bdmap/base, @bdmap/map, @bdmap/search, @bdmap/util) and combined packages (@bdmap/map_walkride_search, @bdmap/na
--- name: agent-media description: AI UGC video production from the terminal using the `agent-media` CLI. homepage: https://github.com/gitroomhq/agent-media metadata: {"clawdbot":{"emoji":"🌎","requ
Audio transcription and text-to-speech generation using OpenRouter API. Use when the user needs to transcribe audio files to text or generate speech/audio fr...
Transcribe recorded audio files to text via UniCloud ASR API, supporting multiple formats and domains like finance and customer service; requires configured...
Google Workspace CLI for Gmail, Calendar, Drive, Contacts, Sheets, and Docs. And also 50+ models for image generation, video generation, text-to-speech, spee...
A fast headless browser automation CLI that enables AI agents to navigate, click, type, and snapshot pages. And also 50+ models for image generation, video g...