Extract text from PDFs using Google Gemini OCR. Use when extracting text from PDFs, performing OCR on scanned documents, or processing image-based PDFs.
Extracts text, tables, and images from PDFs (including scanned PDFs) using the Mistral OCR API. Use when user asks to OCR a PDF/image, extract text from a PD...
Feishu/Lark Custom Bot API wrapper for sending messages to Feishu channels via webhook. Use when users need to send text messages, images, rich text posts, i...
FREE voice recognition using Groq's complimentary Whisper API. Transcribe audio messages to text in 50+ languages at no cost. Perfect for voice-to-text autom...
xAI Grok Imagine API integration for image generation, text-to-video, image-to-video, and editing via natural language. Use when you need to generate images or videos from text prompts, edit existing
Generate QR codes from text, URLs, or images. Use when users ask to 'generate QR code', 'create QR', or 'make QR code for'. Supports text content, URLs, and...
Generate AI videos and images using Alibaba's Wan 2.6 and Wan 2.5 — featuring text-to-video, image-to-video, video-to-video, text-to-image, and image editing...
Act as an Emotion Analyst. You are an expert in analyzing human emotions from text input. Your task is to identify underlying emotional tones and provide insights. You will: - Analyze text for emotion
Full local AI inference stack on Apple Silicon Macs via MLX. Includes: LLM chat (Qwen3-14B, Gemma3-12B), speech-to-text ASR (Qwen3-ASR, Whisper), text embedd...
Jarvis TTS text-to-speech using Microsoft edge-tts with afplay playback. Use when users request voice output, audio responses, or text-to-speech. Provides na...
Share markdown files and text as clean, readable web links via `plsreadme_share_file` and `plsreadme_share_text`; zero setup (`npx -y plsreadme-mcp` or `https://plsreadme.com/mcp`).
Generate QR codes for text, URLs, WiFi connections, and business cards (vCard). Use when: (1) creating QR codes for websites or text, (2) generating WiFi con...
Use Sarvam AI for Indian language Text-to-Speech (TTS), Speech-to-Text (STT), Translation, and Chat.
Visualize chess positions instantly from FEN. Copy readable text-based boards into chats, docs, and code. Speed up analysis and explanations without leaving your text workflow.
Generate images from text prompts using ZenMux API (Vertex AI protocol with Gemini models). Use when: (1) User wants to generate/create images from text desc...
Convert Chinese Mandarin text to high-quality audio using UniSound's WebSocket TTS API with adjustable voice, speed, volume, pitch, and format options.
Cross-platform mouse/keyboard automation skill. Supports mouse control (move/click/drag/scroll), keyboard control (key press/hotkeys/type text), screen opera...
AI image generation with SeeDream 4.5, Midjourney, Nano Banana 2, Nano Banana Pro. Text-to-image, image-to-image with intelligent model selection and knowled...
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
Automate TikTok slideshow marketing for any app or product. Researches competitors, generates AI images, adds text overlays, posts via Postiz, tracks analyti...
AI-powered translation services using TranslateFlow API - translation, translate text, language conversion, multilingual translation, language translation, d...
Creates TikTok image carousels (slideshows with text overlays on photos) via the ViralBaby API. Use when the user wants to: create TikTok slideshows or carou...