🧪 Skills

Novita AI Multimodal

Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating vide...

v0.2.0
❤️ 0
⬇️ 33
👁 1
Share

Description


name: novita-multimodal description: | Execute multimodal tasks using Novita AI: text-to-image, image-to-image, text-to-video, image-to-video, TTS, STT. Use for: generating images, generating videos, text-to-speech, speech recognition.

Novita AI Multimodal Execution

Configuration (choose one, by priority)

Method 1: Config File (Recommended)

Create file ~/.novita/config.json:

{
  "api_key": "YOUR_API_KEY"
}

One command setup:

mkdir -p ~/.novita && echo '{"api_key": "YOUR_API_KEY"}' > ~/.novita/config.json

Method 2: Environment Variable

export NOVITA_API_KEY="YOUR_API_KEY"

Method 3: Direct Parameter

Include in request: Please use API Key sk_xxx to generate an image...


API Key Reading Logic

1. Check if user message contains API Key (starts with sk_)
2. Check config file ~/.novita/config.json
3. Check environment variable NOVITA_API_KEY
4. None found → Return configuration guide

Configuration guide (only shown when not configured):

You have not configured your Novita AI API Key.

Quick setup (copy and run):
mkdir -p ~/.novita && echo '{"api_key": "YOUR_KEY"}' > ~/.novita/config.json

Get Key: https://novita.ai/settings/key-management

Execution Flow (Important!)

User request → Identify task → Get Key → ⚠️ Send prompt first → Execute task → Return result

⚠️ Must Send Progress Prompt First

Before calling the API, you must reply to the user with a message:

🎨 Got it! Generating your image...

Task type: Text-to-Image
Model: Seedream 5.0 Lite
Estimated time: 5-15 seconds
Estimated cost: ~$0.035

Please wait, will send as soon as it's ready ⏳

This message must be sent BEFORE executing the API call! This way users know the task is being processed and won't think the system is stuck.

Progress Templates for Different Tasks

Text-to-Image:

🎨 Got it! Generating your image...
Model: Seedream 5.0 Lite
Estimated time: 5-15 seconds

Text-to-Video:

🎬 Got it! Generating your video...
Model: Vidu Q3 Pro
Estimated time: 1-3 minutes (video generation is slower, please be patient)

TTS:

🔊 Got it! Generating your audio...
Model: MiniMax Speech 2.8 Turbo
Estimated time: 5-15 seconds

Completion Response

✅ Generation complete!

[Image/Video/Audio URL]

Actual cost: $0.035

Video Task Polling Updates

Video generation requires polling, update status every 15 seconds:

🎬 Video generating...
Current status: Processing
Elapsed: 30 seconds
Estimated remaining: 1-2 minutes

API Configuration

Setting Value
Base URL https://api.novita.ai
Auth Authorization: Bearer <API_KEY>
Get Key https://novita.ai/settings/key-management

Task Types and Endpoints

Task Endpoint Model
Text-to-Image /v3/seedream-5.0-lite Seedream 5.0 Lite
Image Editing /v3/seedream-5.0-lite Seedream 5.0 Lite
Text-to-Video /v3/async/vidu-q3-pro-t2v Vidu Q3 Pro
Image-to-Video /v3/async/vidu-q3-pro-i2v Vidu Q3 Pro
TTS /v3/async/minimax-speech-2.8-turbo MiniMax Speech 2.8
STT /v3/glm-asr GLM ASR
Task Query /v3/async/task-result?task_id=xxx -

Execution Templates

Text-to-Image

curl -X POST "https://api.novita.ai/v3/seedream-5.0-lite" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "description"}'

Image Editing

curl -X POST "https://api.novita.ai/v3/seedream-5.0-lite" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "edit instruction", "reference_images": ["image_url"]}'

Text-to-Video

curl -X POST "https://api.novita.ai/v3/async/vidu-q3-pro-t2v" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "description", "duration": 4}'

Image-to-Video

curl -X POST "https://api.novita.ai/v3/async/vidu-q3-pro-i2v" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "motion description", "images": ["image_url"]}'

TTS

curl -X POST "https://api.novita.ai/v3/async/minimax-speech-2.8-turbo" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "text to convert",
    "voice_setting": {"voice_id": "male-qn-qingse", "speed": 1.0},
    "audio_setting": {"format": "mp3"}
  }'

Available voices:

  • Male: male-qn-qingse, male-qn-jingying
  • Female: female-shaonv, female-yujie

STT

curl -X POST "https://api.novita.ai/v3/glm-asr" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file": "audio_url_or_base64"}'

Task Result Query

curl "https://api.novita.ai/v3/async/task-result?task_id=$TASK_ID" \
  -H "Authorization: Bearer $API_KEY"

Status: TASK_STATUS_QUEUEDTASK_STATUS_PROCESSINGTASK_STATUS_SUCCEED


Error Handling

Code Meaning Action
401 Invalid Key Check configuration
402 Insufficient balance Top up at https://novita.ai/billing
429 Rate limited Wait and retry

Pricing

https://novita.ai/pricing

Reviews (0)

Sign in to write a review.

No reviews yet. Be the first to review!

Comments (0)

Sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Compatible Platforms

Pricing

Free

Related Configs