Search

139 results for "multimodal"

GLM Multimodal Analyzer

--- name: Multimodal Analyzer slug: multimodal version: 1.0.0 description: Uses the GLM-4.6V model for multimodal content understanding (images, videos, documents) --- # Multimodal Understanding Skill Uses

❤️ 0 ⬇️ 177

🧪 Skill

Novita AI Multimodal

Free

Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...

❤️ 0 ⬇️ 33

🧪 Skill

Novita AI Multimodal

Free

Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...

❤️ 0 ⬇️ 4

🧪 Skill

Multimodal Asset Tagger

Free

Generate AI-optimized Alt Text, file names, captions, and Schema markup for images, videos, and audio assets. Improves AI discoverability on Google Lens, Cha...

❤️ 0 ⬇️ 159

🧪 Skill

Multimodal Asset Tagger

Free

Generate AI-optimized Alt Text, file names, captions, and Schema markup for images, videos, and audio assets. Improves AI discoverability on Google Lens, Cha...

❤️ 0 ⬇️ 174

🧪 Skill

Ezviz Open Multimodal Analysis

Free

Uses the Ezviz device snapshot and agent analysis APIs to provide multimodal AI understanding and scene recognition for camera footage.

❤️ 1 ⬇️ 19

🧪 Skill

multimodal-parser

Free

Unified multimodal content parser for images, PDF, DOCX, and audio, with automatic OCR/transcription; outputs structured text for LLM processing.

❤️ 0 ⬇️ 98

🧪 Skill

Jiekou Multimodal

Free

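Execute multimodal tasks using Jiekou AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.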

❤️ 0 ⬇️ 52

🧪 Skill

PPIO Multimodal Skill

Free

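Execute multimodal tasks using PPIO: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.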

❤️ 0 ⬇️ 37

🧪 Skill

GLM Multimodal Analyzer

Free

Uses the GLM-4.6V model for multimodal content understanding (images, videos, documents).

❤️ 0 ⬇️ 155

🧪 Skill

PPIO Multimodal Skill

Free

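Execute multimodal tasks using PPIO: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.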

❤️ 0 ⬇️ 7

🧪 Skill

Model Verifier

Free

Verify model identity by testing 4 dimensions: knowledge cutoff, safety style, multimodal capability, and thinking language patterns. Use when user says 'ver...

❤️ 1 ⬇️ 92

🧪 Skill

RxnIM

Free

Parse chemical reaction images into machine-readable data (reactants, products, conditions) using the RxnIM multimodal LLM. Supports web API (Hugging Face Sp...

❤️ 0 ⬇️ 64

🧪 Skill

SkedGo TripGo API

Free

Comprehensive interface for the SkedGo TripGo API, covering routing, public transport, trips, and location services. Use for multimodal journey planning, pub...

❤️ 0 ⬇️ 166

🧪 Skill

TripGo API

Free

Comprehensive interface for the TripGo API, covering routing, public transport, trips, and location services. Use for multimodal journey planning, public tra...

❤️ 0 ⬇️ 133

🧪 Skill

Seedance Prompt Designer

Free

Intelligently analyzes user-provided multimodal assets and creative intent to generate optimal, structured video generation prompts for the Seedance 2.0 model.

❤️ 1 ⬇️ 156

🧪 Skill

seedance2.0-guide

Free

The ultimate Seedance 2.0 storyboard director. Generate movie-grade 9:16 vlogs, cinematic prompts, and auto-audio scripts from multimodal inputs. Optimized f...

❤️ 3 ⬇️ 709

🧪 Skill

Academic Paper Summarizer

Free

Academic paper summarization with dynamic SOP selection based on paper topic classification. Supports method, dataset, multimodal, and other paper types with...

❤️ 0 ⬇️ 713

🧪 Skill

Google Gemini Media

Free

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understand

❤️ 5 ⬇️ 3.0k

🧪 Skill

muapi-seedance-2

Free

Expert Cinema Director skill for Seedance 2.0 (ByteDance) — high-fidelity video generation using technical camera grammar and multimodal references. Supports...

❤️ 0 ⬇️ 12

🧪 Skill

Alicloud Ai Image Zimage Turbo

Free

Generate images with Alibaba Cloud Model Studio Z-Image Turbo (z-image-turbo) via DashScope multimodal-generation API. Use when creating text-to-image output...

❤️ 0 ⬇️ 863

🧪 Skill

Nodetool

Free

Visual AI workflow builder - ComfyUI meets n8n for LLM agents, RAG pipelines, and multimodal data flows. Local-first, open source (AGPL-3.0).

❤️ 0 ⬇️ 2.4k

🧪 Skill

Mulerouter

Free

Generates images and videos using MuleRouter or MuleRun multimodal APIs. Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, video editing (VACE, keyframe interpolation). Use when the user w

❤️ 2 ⬇️ 887

🧪 Skill

universal-pdf-vision-parser

Free

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....

❤️ 0 ⬇️ 197