Multimodal Analyzer (slug: multimodal, version 1.0.0): Uses the GLM-4.6V model for multimodal content understanding (images, videos, documents). Multimodal Understanding Skill: Use...
Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...
Generate AI-optimized Alt Text, file names, captions, and Schema markup for images, videos, and audio assets. Improves AI discoverability on Google Lens, Cha...
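Captures snapshots from EZVIZ devices and calls the agent analysis API to perform multimodal AI understanding and scene recognition on camera footage.
Unified multimodal content parser for images, PDFs, DOCX files, and audio, with automatic OCR/transcription; outputs structured text for LLM processing.
Execute multimodal tasks using 接口AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.
Execute multimodal tasks using PPIO: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.
Uses the GLM-4.6V model for multimodal content understanding (images, videos, documents).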
Verify model identity by testing four dimensions: knowledge cutoff, safety style, multimodal capability, and thinking-language patterns. Use when the user says 'ver...
Parse chemical reaction images into machine-readable data (reactants, products, conditions) using the RxnIM multimodal LLM. Supports a web API (Hugging Face Sp...
Comprehensive interface for the SkedGo TripGo API, covering routing, public transport, trips, and location services. Use for multimodal journey planning, pub...
Comprehensive interface for the TripGo API, covering routing, public transport, trips, and location services. Use for multimodal journey planning, public tra...
Intelligently analyzes user-provided multimodal assets and creative intent to generate optimal, structured video generation prompts for the Seedance 2.0 model.
The ultimate Seedance 2.0 storyboard director. Generate movie-grade 9:16 vlogs, cinematic prompts, and auto-audio scripts from multimodal inputs. Optimized f...
Academic paper summarization with dynamic SOP selection based on paper topic classification. Supports method, dataset, multimodal, and other paper types with...
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech, and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understand...
Expert Cinema Director skill for Seedance 2.0 (ByteDance) — high-fidelity video generation using technical camera grammar and multimodal references. Supports...
Generate images with Alibaba Cloud Model Studio Z-Image Turbo (z-image-turbo) via the DashScope multimodal-generation API. Use when creating text-to-image output...
Visual AI workflow builder - ComfyUI meets n8n for LLM agents, RAG pipelines, and multimodal data flows. Local-first, open source (AGPL-3.0).
Generates images and videos using the MuleRouter or MuleRun multimodal APIs. Supports text-to-image, image-to-image, text-to-video, image-to-video, and video editing (VACE, keyframe interpolation). Use when the user w...
Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....