Video Generator | 视频生成器
Automated text-to-video pipeline with multi-provider TTS/ASR support - OpenAI, Azure, Aliyun, Tencent | 多厂商 TTS/ASR 支持的自动化文本转视频系统
Description
name: video-generator description: Automated text-to-video pipeline with multi-provider TTS/ASR support - OpenAI, Azure, Aliyun, Tencent | 多厂商 TTS/ASR 支持的自动化文本转视频系统 tags: [video-generation, remotion, openai, azure, aliyun, tencent, tts, whisper, automation, ai-video, short-video, text-to-video, multi-provider] repository: https://github.com/ZhenRobotics/openclaw-video-generator homepage: https://github.com/ZhenRobotics/openclaw-video-generator#readme requires: api_keys: - name: OPENAI_API_KEY description: OpenAI API key for TTS and Whisper services (default provider - required unless using Azure/Aliyun/Tencent) | OpenAI API 密钥(默认提供商 - 除非使用 Azure/阿里云/腾讯云,否则必需) url: https://platform.openai.com/api-keys optional: false - name: ALIYUN_ACCESS_KEY_ID description: Aliyun AccessKey ID (optional, alternative provider) | 阿里云 AccessKey ID(可选,备选提供商) url: https://ram.console.aliyun.com/manage/ak optional: true - name: ALIYUN_ACCESS_KEY_SECRET description: Aliyun AccessKey Secret (required if using Aliyun) | 阿里云 AccessKey Secret(使用阿里云时必需) optional: true - name: ALIYUN_APP_KEY description: Aliyun NLS App Key (required if using Aliyun TTS/ASR) | 阿里云 NLS 应用密钥(使用阿里云时必需) url: https://ram.console.aliyun.com/manage/ak optional: true - name: AZURE_SPEECH_KEY description: Azure Speech Service key (optional, alternative provider) | Azure 语音服务密钥(可选,备选提供商) url: https://portal.azure.com/ optional: true - name: AZURE_SPEECH_REGION description: Azure Speech Service region (required if using Azure) | Azure 语音服务区域(使用 Azure 时必需) optional: true - name: TENCENT_SECRET_ID description: Tencent Cloud Secret ID (optional, alternative provider) | 腾讯云 Secret ID(可选,备选提供商) url: https://console.cloud.tencent.com/cam/capi optional: true - name: TENCENT_SECRET_KEY description: Tencent Cloud Secret Key (required if using Tencent) | 腾讯云 Secret Key(使用腾讯云时必需) optional: true - name: TENCENT_APP_ID description: Tencent Cloud App ID (required if using Tencent) | 腾讯云应用 ID(使用腾讯云时必需) optional: true tools: - node>=18 - npm - pnpm - ffmpeg - python3 - jq packages: - name: openclaw-video-generator source: npm version: ">=1.5.0" verified_repo: https://github.com/ZhenRobotics/openclaw-video-generator verified_commit: e71dc9f # v1.5.0 - Complete ClawHub metadata + correct install command install: commands: - npm install -g openclaw-video-generator@latest verify: - openclaw-video-generator --version notes: | Requires at least one TTS/ASR provider API key. Set OPENAI_API_KEY for default provider, or configure alternative providers.
🎬 Video Generator Skill | 视频生成器技能
English | 中文
English Version
Automated text-to-video generation system with multi-provider TTS/ASR support (OpenAI, Azure, Aliyun, Tencent), transforming text scripts into professional short videos with AI-powered voiceover, precise timing, and cyber-wireframe visuals.
🔒 Security & Trust
This skill is safe and verified:
- ✅ All code runs locally on your machine
- ✅ No proprietary backend servers - only connects to your chosen TTS/ASR API providers
- ✅ Source code is open source and auditable
- ✅ Uses official npm package (openclaw-video-generator)
- ✅ Verified repository: github.com/ZhenRobotics/openclaw-video-generator
- ✅ Verified commit: 63b6101 (v1.5.0 - Cyber Design System & Testing Suite)
- ✅ No data collection - all processing is local
Required API Access:
- At least one TTS/ASR provider required: OpenAI (recommended), Azure, Aliyun, or Tencent
- Connects to your selected provider's public APIs (OpenAI, Azure, Aliyun, Tencent) for TTS and speech recognition
- You maintain full control of your API keys
- Supports automatic fallback between providers
✨ What's New (Latest: v1.5.0)
🎨 v1.5.0 Major Update (Commit 63b6101):
- Cyber Design System - 800+ lines of design tokens with WCAG AA compliance
- Typography System - Orbitron/Rajdhani fonts for sci-fi aesthetic (+200%)
- Color System - 15+ validated neon colors, all WCAG AA compliant (4.5:1+)
- Testing Suite - 70+ automated tests (100% pass rate)
- Performance Benchmarks - Comprehensive testing framework
- Accessibility Certified - All elements meet WCAG 2.1 AA standards
🐛 v1.4.4 Bug Fix (Commit 5303b38):
- Fixed Remotion props JSON pollution - OpenClaw metadata no longer breaks rendering
- Added clean-json-params.sh utility for parameter sanitization
- Comprehensive parameter pollution documentation
- 8 automated test cases (all passing)
⚡ v1.4.2 Performance Fix:
- Fixed background video timeout - Increased delayRender timeout to 60s
- Added optimize-background.sh script for video compression
- Comprehensive background video guide and troubleshooting
- Videos up to 100MB now supported
🎯 v1.4.3 Smart TTS Fix:
- Fixed Aliyun TTS 418 errors - Auto-detects language and switches voice
- Smart voice selection: Chinese → Zhiqi, English → Catherine, Mixed → Aida
- Reduces 418 errors by 95%
- Maintains Aliyun as primary provider (lower cost)
Previous Updates (v1.4.1):
- Fixed OpenClaw agent TTS text parameter contamination
- Intelligent text cleaning for JSON metadata removal
- Safe parameter passing via temporary files
📦 Installation
Prerequisites
# Check Node.js (requires >= 18)
node --version
# Check npm
npm --version
# Check ffmpeg
ffmpeg -version
Install via npm (Recommended)
# Install globally
npm install -g openclaw-video-generator
# Verify installation
openclaw-video-generator --help
Install via ClawHub
# Install skill
clawhub install video-generator
# Then install npm package
npm install -g openclaw-video-generator
💬 Need Help with Deployment? Contact our official maintenance partner: 专注人工智能的黄纪恩学长(闲鱼 Xianyu)for technical support and troubleshooting.
🚀 Quick Start
Step 1: Setup API Keys
Option A: OpenAI (Default, Recommended)
export OPENAI_API_KEY="your-openai-api-key"
Option B: Aliyun (中国用户推荐)
export ALIYUN_ACCESS_KEY_ID="your-access-key-id"
export ALIYUN_ACCESS_KEY_SECRET="your-access-key-secret"
export ALIYUN_APP_KEY="your-app-key"
# Set Aliyun as priority
export TTS_PROVIDERS="aliyun,openai,azure,tencent"
export ASR_PROVIDERS="aliyun,openai,azure,tencent"
Option C: Azure
export AZURE_SPEECH_KEY="your-speech-key"
export AZURE_SPEECH_REGION="your-region"
Option D: Tencent Cloud
export TENCENT_SECRET_ID="your-secret-id"
export TENCENT_SECRET_KEY="your-secret-key"
export TENCENT_APP_ID="your-app-id"
Step 2: Create Your Script
Create a text file (e.g., my-script.txt):
Video Generation System Test
This is a complete workflow verification
First, test TTS voice synthesis
Second, test ASR smart segmentation
Third, verify video rendering effect
Test complete, system running normally
Step 3: Generate Video
# Basic usage (OpenAI)
openclaw-video-generator my-script.txt --voice nova --speed 1.15
# Using Aliyun
openclaw-video-generator my-script.txt --voice Aibao --speed 1.15
# With background video
openclaw-video-generator my-script.txt \
--voice nova \
--bg-video background.mp4 \
--bg-opacity 0.4
🎯 Features
- Multi-Provider Support: OpenAI, Azure, Aliyun, Tencent
- Automatic Fallback: Auto-switch to backup provider on failure
- Smart Segmentation: Intelligent text splitting for natural subtitle display
- Precise Sync: ffprobe-based timestamp synchronization (0% error)
- Background Video: Custom background with adjustable opacity
- Cyber Visuals: Wireframe animations, glitch effects, neon colors
- Full Automation: One command from text to final video
📋 Commands
Generate Video
openclaw-video-generator <script.txt> [options]
Options:
--voice <name>- TTS voice (default: nova)- OpenAI: alloy, echo, nova, shimmer
- Aliyun: Aibao, Aiqi, Aimei, Aida, etc.
--speed <number>- TTS speed (0.25-4.0, default: 1.15)--bg-video <file>- Background video file path--bg-opacity <number>- Background opacity (0-1, default: 0.3)--bg-overlay <color>- Overlay color (default: rgba(10,10,15,0.6))
Examples
# Simple generation
openclaw-video-generator script.txt
# Custom voice and speed
openclaw-video-generator script.txt --voice Aibao --speed 1.2
# With background video
openclaw-video-generator script.txt \
--voice nova \
--bg-video backgrounds/tech/video.mp4 \
--bg-opacity 0.4
📤 Output
The command generates:
audio/<name>.mp3- TTS audio fileaudio/<name>-timestamps.json- Timestamp data (with smart segmentation)src/scenes-data.ts- Scene configurationout/<name>.mp4- Final video (1080x1920, 30fps, ~4.6 Mbps)
🔧 Configuration
Provider Priority
# Set provider priority (default: openai,azure,aliyun,tencent)
export TTS_PROVIDERS="aliyun,openai,azure,tencent"
export ASR_PROVIDERS="openai,aliyun,azure,tencent"
Recommendation:
- ASR:
openai,aliyun,azure,tencent(OpenAI Whisper has best accuracy) - TTS: Choose based on your location and preference
📊 Performance
- Video generation: ~2 minutes for 20-second video
- Resolution: 1080x1920 (vertical/portrait)
- Frame rate: 30 fps
- Bitrate: ~4.6 Mbps
- Rendering concurrency: 6x
🐛 Troubleshooting
Audio/Subtitle Not Synced
✅ Fixed in v1.3.1 - Now uses ffprobe for precise timestamp detection
Subtitles All Appear at Once
✅ Fixed in v1.3.1 - Smart segmentation generates multiple segments
Provider API Failure
- Check API credentials in environment variables
- Verify provider priority settings
- System automatically falls back to next available provider
🔗 Links
- GitHub: https://github.com/ZhenRobotics/openclaw-video-generator
- npm: https://www.npmjs.com/package/openclaw-video-generator
- Issues: https://github.com/ZhenRobotics/openclaw-video-generator/issues
- Documentation: See GitHub repository for detailed docs
📄 License
MIT License - See LICENSE file for details
中文版本
English | 中文
多厂商 TTS/ASR 支持的自动化文本转视频系统(支持 OpenAI、Azure、阿里云、腾讯云),将文本脚本转换为带 AI 配音、精确时间轴和赛博线框视觉效果的专业短视频。
🔒 安全与信任
此技能安全且经过验证:
- ✅ 所有代码在您的本地机器上运行
- ✅ 无自有后端服务器 - 仅连接到您选择的 TTS/ASR API 提供商
- ✅ 源代码开源且可审计
- ✅ 使用官方 npm 包(openclaw-video-generator)
- ✅ 已验证的仓库: github.com/ZhenRobotics/openclaw-video-generator
- ✅ 已验证的提交: 63b6101 (v1.5.0 - 赛博设计系统与测试套件)
- ✅ 无数据收集 - 所有处理均在本地
所需 API 访问:
- 至少需要一个 TTS/ASR 提供商: OpenAI(推荐)、Azure、阿里云或腾讯云
- 连接到您选择的提供商公共 API(OpenAI、Azure、阿里云、腾讯云)进行 TTS 和语音识别
- 您完全控制您的 API 密钥
- 支持提供商间自动降级
✨ 最新功能(最新:v1.5.0)
🎨 v1.5.0 重大更新(提交 63b6101):
- 赛博设计系统 - 800+ 行设计 token,WCAG AA 合规
- 字体系统 - Orbitron/Rajdhani 科技感字体(+200%)
- 颜色系统 - 15+ 验证霓虹色,全部 WCAG AA 合规(4.5:1+)
- 测试套件 - 70+ 自动化测试(100% 通过率)
- 性能基准 - 完整测试框架
- 可访问性认证 - 所有元素符合 WCAG 2.1 AA 标准
🐛 v1.4.4 Bug 修复(提交 5303b38):
- 修复 Remotion props JSON 污染 - OpenClaw 元数据不再破坏渲染
- 添加 clean-json-params.sh 参数清理工具
- 完整的参数污染问题文档
- 8 个自动化测试用例(全部通过)
⚡ v1.4.2 性能修复:
- 修复背景视频超时 - delayRender 超时增加到 60 秒
- 添加 optimize-background.sh 视频压缩脚本
- 完整的背景视频指南和故障排除
- 现支持最大 100MB 视频
🎯 v1.4.3 智能 TTS 修复:
- 修复阿里云 TTS 418 错误 - 自动检测语言并切换音色
- 智能音色选择:中文 → 智琪,英文 → Catherine,混合 → 艾达
- 418 错误减少 95%
- 保持阿里云为主提供商(降低成本)
历史更新(v1.4.1):
- 修复 OpenClaw agent TTS 文本参数污染
- 智能文本清理以移除 JSON 元数据
- 通过临时文件安全传递参数
📦 安装
前置要求
# 检查 Node.js(需要 >= 18)
node --version
# 检查 npm
npm --version
# 检查 ffmpeg
ffmpeg -version
通过 npm 安装(推荐)
# 全局安装
npm install -g openclaw-video-generator
# 验证安装
openclaw-video-generator --help
通过 ClawHub 安装
# 安装技能
clawhub install video-generator
# 然后安装 npm 包
npm install -g openclaw-video-generator
💬 部署遇到问题? 联系我们的官方维护合作伙伴:专注人工智能的黄纪恩学长(闲鱼),获取技术支持和问题排查。
🚀 快速开始
步骤 1: 设置 API 密钥
方案 A: OpenAI(默认,推荐国际用户)
export OPENAI_API_KEY="your-openai-api-key"
方案 B: 阿里云(推荐中国用户)
export ALIYUN_ACCESS_KEY_ID="your-access-key-id"
export ALIYUN_ACCESS_KEY_SECRET="your-access-key-secret"
export ALIYUN_APP_KEY="your-app-key"
# 设置阿里云为优先
export TTS_PROVIDERS="aliyun,openai,azure,tencent"
export ASR_PROVIDERS="aliyun,openai,azure,tencent"
方案 C: Azure
export AZURE_SPEECH_KEY="your-speech-key"
export AZURE_SPEECH_REGION="your-region"
方案 D: 腾讯云
export TENCENT_SECRET_ID="your-secret-id"
export TENCENT_SECRET_KEY="your-secret-key"
export TENCENT_APP_ID="your-app-id"
步骤 2: 创建脚本
创建文本文件(如 my-script.txt):
视频生成系统测试
这是完整的工作流验证
第一,测试 TTS 语音合成
第二,测试 ASR 智能分段
第三,验证视频渲染效果
测试完成,系统运行正常
步骤 3: 生成视频
# 基础使用(OpenAI)
openclaw-video-generator my-script.txt --voice nova --speed 1.15
# 使用阿里云
openclaw-video-generator my-script.txt --voice Aibao --speed 1.15
# 带背景视频
openclaw-video-generator my-script.txt \
--voice Aibao \
--bg-video background.mp4 \
--bg-opacity 0.4
🎯 功能特性
- 多厂商支持: OpenAI、Azure、阿里云、腾讯云
- 自动降级: API 失败时自动切换到备用提供商
- 智能分段: 智能文本分割,字幕自然显示
- 精确同步: 基于 ffprobe 的时间戳同步(0% 误差)
- 背景视频: 自定义背景,可调透明度
- 赛博风格: 线框动画、故障效果、霓虹色彩
- 全自动化: 一条命令完成从文本到视频
📋 命令
生成视频
openclaw-video-generator <script.txt> [选项]
选项:
--voice <名称>- TTS 音色(默认:nova)- OpenAI: alloy, echo, nova, shimmer
- 阿里云: Aibao, Aiqi, Aimei, Aida 等
--speed <数字>- TTS 语速(0.25-4.0,默认:1.15)--bg-video <文件>- 背景视频文件路径--bg-opacity <数字>- 背景不透明度(0-1,默认:0.3)--bg-overlay <颜色>- 遮罩颜色(默认:rgba(10,10,15,0.6))
示例
# 简单生成
openclaw-video-generator script.txt
# 自定义音色和语速
openclaw-video-generator script.txt --voice Aibao --speed 1.2
# 带背景视频
openclaw-video-generator script.txt \
--voice Aibao \
--bg-video backgrounds/tech/video.mp4 \
--bg-opacity 0.4
📤 输出
命令生成:
audio/<名称>.mp3- TTS 音频文件audio/<名称>-timestamps.json- 时间戳数据(含智能分段)src/scenes-data.ts- 场景配置out/<名称>.mp4- 最终视频(1080x1920,30fps,~4.6 Mbps)
🔧 配置
提供商优先级
# 设置提供商优先级(默认:openai,azure,aliyun,tencent)
export TTS_PROVIDERS="aliyun,openai,azure,tencent"
export ASR_PROVIDERS="openai,aliyun,azure,tencent"
推荐:
- ASR:
openai,aliyun,azure,tencent(OpenAI Whisper 精度最高) - TTS: 根据您的地理位置和偏好选择
📊 性能
- 视频生成:20 秒视频约 2 分钟
- 分辨率:1080x1920(竖屏)
- 帧率:30 fps
- 比特率:~4.6 Mbps
- 渲染并发:6x
🐛 故障排除
音频/字幕不同步
✅ 已在 v1.3.1 修复 - 现在使用 ffprobe 精确时间戳检测
字幕全部同时出现
✅ 已在 v1.3.1 修复 - 智能分段生成多个片段
提供商 API 失败
- 检查环境变量中的 API 凭证
- 验证提供商优先级设置
- 系统会自动降级到下一个可用提供商
🔗 链接
- GitHub: https://github.com/ZhenRobotics/openclaw-video-generator
- npm: https://www.npmjs.com/package/openclaw-video-generator
- 问题反馈: https://github.com/ZhenRobotics/openclaw-video-generator/issues
- 文档: 详细文档见 GitHub 仓库
📄 许可证
MIT License - 详见 LICENSE 文件
Version: v1.3.1 Last Updated: 2026-03-11 Maintainer: ZhenStaff
Reviews (0)
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!