Multilingual Text-to-Speech (TTS) with intelligent Pinyin-to-Hanzi conversion. Use when the user asks to generate audio for text that contains a mix of Vietn...
Simple local Speech-To-Text using Whisper. One-command install with auto model download. Supports 99+ languages.
视频自动字幕生成器,批量为视频生成字幕文件(SRT/VTT),结合视频帧提取和语音转文字,预览模式和撤销功能!
Query and manage Timeless meetings, rooms, transcripts, and AI documents. Capture podcast episodes and YouTube videos into Timeless for transcription. Use wh...
Opinionated creative production system for image/video generation, image editing, motion scenes, voiceovers, music, and Remotion assembly. Combines Freepik,...
Local Qwen3-TTS speech synthesis on Apple Silicon via MLX. Use for offline narration, audiobooks, video voiceovers, and multilingual TTS.
Text-to-Speech via Zvukogram API with SSML support. Use when you need to generate speech from text, create podcasts, voice notifications, or work with audio....
Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI...
Convert Bilibili (B站) videos into a searchable text knowledge base. Supports single videos and batch processing of entire UP主 channels. Uses local whisper.cp...
批量处理与长时任务编排模式。涵盖队列管理、并发调度、中断恢复、熔断器、远程任务轮询、进度报告和反风控策略。适用于批量文件处理、AI API 调
Convert text or subtitle files into speech audio with options for voice cloning, emotion control, speed adjustment, and timeline-accurate dubbing using Kokor...
Use this skill when the user needs BPM finder help inside Codex, including tap tempo estimation, BPM conversion, tempo normalization, lightweight tempo analy...
Create AI videos with Sora 2, Veo 3, Seedance, Runway, and modern APIs using reliable prompt and rendering workflows.
Run the video-skill pipeline to convert narrated videos into structured step data and enriched timeline-ready outputs. Use when a user asks to process a vide...
Generate and publish a dual-host daily podcast. Fetches news, generates a conversational script between two hosts, synthesizes audio via Fish Audio or Edge T...
Unified 3D workflow: create models (AI), search (Thingiverse), slice, print. Modular—enable only what you need.
Text-to-speech, sound effects, music generation, voice management, and quota checks via the ElevenLabs API. Use when generating audio with ElevenLabs or mana...
Распознавание речи через Yandex SpeechKit API для голосовых сообщений в Telegram. Используй когда пользователь отправляе
Turn any idea into a finished podcast in one command. AudioMind handles ElevenLabs voice narration (29+ voices), AI background music, and server-side audio m...
Create AI images with GPT Image, Gemini Nano Banana, FLUX, Imagen, and top providers using prompt engineering, style control, and smart editing.
Zustand 状态管理实战模式。涵盖 Store 设计规范、Slice 工厂复用、persist 持久化、可恢复任务持久化、Electron IPC 联动、Store 测试和常见陷阱。适用于 React +
语音笔记转文字工具 v2.1 | Voice Note Transcriber. 支持多语言识别、实时转写、说话人识别、智能摘要、音频降噪、离线识别。触发词:转写、识别、语音。
视频转写工作流,支持B站和YouTube视频。自动判断有字幕/无字幕,有字幕则获取字幕,无字幕则下载音频+whisper转写。触发场景:(1) 用户要求总结视频
Control a GeekMagic holocube display as an AI emote system. Generate holographic sprite kits with Gemini, upload to device, and swap expressions based on agent state (idle, working, error, etc.). Use