code2animation
Produce complete code-based animated videos by scripting, generating narration, creating visual assets, and rendering final MP4s using the code2animation fra...
Description
AgentSaaS Video Editor Skill
A comprehensive video editing and rendering skill that enables AI agents to create code-driven animations with text-to-speech narration and smooth transitions.
Purpose
This skill allows agents to:
- Create and preview interactive video projects with animations
- Generate TTS audio narration using Microsoft Edge TTS
- Render complete videos with synchronized audio and visual effects
- Support both portrait and landscape video formats
- Apply smooth transition effects between media elements
Core Capabilities
1. Interactive Video Preview
- Real-time preview of video projects in the browser
- Playback controls for testing and debugging
- Support for transitions, media clips, and timing adjustments
- Frame-by-frame seeking for precise editing
- Live transition preview with easing effects
2. Transition System
- transitionIn: Each media defines its own entrance animation
- Supported transitions: fade, zoom, slide2Left, slideUp, none
- Easing: Built-in easeOutCubic for smooth slide and zoom animations
- stayInClip: Media can persist throughout entire clip duration
- Cross-clip transitions: Automatic handling of clip boundaries
3. TTS Audio Generation
- Automated text-to-speech using Microsoft Edge TTS (msedge-tts)
- Support for multiple voices (English and Chinese)
- Word-level timing metadata for lip-sync and animations
- Audio file caching for faster previews
4. Video Rendering
- Automated frame-by-frame rendering using Puppeteer
- FFmpeg integration for video encoding and audio mixing
- 30 FPS output at 1920x1080 (landscape) or 1080x1920 (portrait)
- Deterministic rendering for consistent results
- Transition effects preserved in final output
Project Configuration Format
Media Item Properties
- src: HTML filename in the footage directory
- words: Trigger phrase from speech that activates this media
- transitionIn: Entrance animation type (optional)
- transitionDuration: Duration in seconds (optional, default: 0.6s)
- stayInClip: If true, media remains visible until clip ends (optional)
Transition Types
- fade: Opacity transition (0 → 1)
- zoom: Scale transition (2x → 1x) with opacity
- slide2Left: Horizontal slide from right (100% → 0%)
- slideUp: Vertical slide from bottom (100% → 0%)
- none: No transition effect
Transition Behavior
- transitionIn: Defines how this media enters the scene
- transitionDuration: Duration in seconds (default: 0.6s)
- stayInClip: If true, media remains visible until clip ends
- Easing: slide2Left and slideUp use easeOutCubic for smooth deceleration
Reference Implementation
See public/projects/agentSaasPromoVideo.json for a complete working example demonstrating all transition types and stayInClip behavior.
Technical Requirements
System Dependencies
- Node.js: 18 or higher
- FFmpeg: Required for video encoding and audio mixing
- Chromium/Chrome: Used by Puppeteer for headless rendering
Node.js Dependencies
- React & Vite: Frontend framework and build tool
- Puppeteer: Headless browser for frame capture
- msedge-tts: Microsoft Edge TTS for audio generation
- Express: Optional HTTP server (for TTS API endpoint)
- Motion (Framer Motion): Animation library
- Tailwind CSS: Styling framework
Shell Commands Used
This skill executes the following shell commands:
Audio Generation
npx tsx scripts/generate-audio.ts <projectId>
- Reads project JSON configuration
- Generates MP3 audio files using Edge TTS
- Saves word-level timing metadata
Video Rendering
node scripts/render.js <projectId> [--portrait]
- Starts a local Vite dev server
- Launches Puppeteer to capture frames
- Uses FFmpeg to encode video and mix audio
- Cleans up temporary files
FFmpeg Operations
- Frame encoding:
ffmpeg -framerate 30 -i frames/frame-%05d.jpg -c:v libx264 ... - Audio mixing:
ffmpeg -i video.mp4 -i audio1.mp3 -i audio2.mp3 -filter_complex ...
Browser Detection
- Uses
whichcommand to find Chrome/Chromium on Linux/macOS - Respects
PUPPETEER_EXECUTABLE_PATHenvironment variable
API Endpoints (Optional)
The skill may expose an HTTP endpoint for TTS generation:
POST /api/tts
Content-Type: application/json
{
"text": "Text to speak",
"voice": "en-US-GuyNeural",
"rate": "+0%",
"pitch": "+0Hz"
}
This endpoint is optional and only used when pre-generated audio files are not available.
Security Considerations
File System Access
- Reads from:
public/projects/<projectId>/ - Writes to:
public/projects/<projectId>/audio/,public/video/ - Creates temporary directories for frame storage
- Cleans up temporary files after rendering
Network Access
- Starts local HTTP server on port 5175+ (configurable)
- Connects to Microsoft Edge TTS service (external)
- No external API keys required for basic functionality
Process Execution
- Spawns child processes for: Vite dev server, FFmpeg encoding
- Uses
execSyncfor: browser detection, audio generation trigger - All commands are predefined and not user-controllable
Environment Variables
Optional configuration:
PUPPETEER_EXECUTABLE_PATH: Custom browser path for PuppeteerFASTMCP_LOG_LEVEL: Logging level (default: ERROR)
Project Structure
public/
projects/
<projectId>/
<projectId>.json # Project configuration
footage/ # HTML/CSS media components
audio/ # Generated TTS audio files
0.mp3, 1.mp3, ...
0.json, 1.json, ... # Word timing metadata
video/
render-<projectId>-landscape.mp4
render-<projectId>-portrait.mp4
Usage Example
# 1. Generate audio for a project
pnpm generate-audio agentSaasPromoVideo
# 2. Preview in browser
pnpm dev
# 3. Render final video
pnpm render agentSaasPromoVideo
# 4. Render portrait version
pnpm render agentSaasPromoVideo --portrait
HTML Animation Guidelines
When creating HTML animations for video rendering, use the CSS variable timeline model.
Core Model
- Renderer controls time: Puppeteer sets
--tevery frame. - Page only renders state:
DOM = f(t). - No lifecycle animation APIs (
play/start/reset) and no hidden runtime state. - Transition system handles entrance effects: Don't implement slide/fade transitions in HTML - use the project's
transitionInproperty instead.
✅ Required Patterns
- Define timeline root:
:root { --t: 0; } - Every animated property must derive from
--t. - Always clamp normalized progress values:
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1); - Express initial/ending states directly in CSS (seek-safe at any frame).
- Use small deterministic JS only for content mapping (e.g., subtitle/text index from
t). - Let transition system handle entrance: Focus on content animation, not entrance effects.
🚫 Forbidden Patterns
transitionanimation/@keyframeswindow.registerFrameAnimation(...)requestAnimationFrameloops for timeline progression- Implicit time from
Date.now()/performance.now()for visual state - Manual entrance transitions: Don't implement slide/fade in HTML - use
transitionInin project config - Fade-out effects: Elements should not disappear after animation completes. Use
opacity: var(--p)instead ofopacity: calc(var(--p) * (1 - var(--fade)))to keep elements visible at their final state.
Recommended Template
.element {
--start: 0.5;
--duration: 1;
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1);
opacity: var(--p);
transform: translateY(calc((1 - var(--p)) * 20px));
}
Easing (without transition)
Use math on progress directly:
--p: clamp(0, calc((var(--t) - var(--start)) / var(--duration)), 1);
--ease-out: calc(1 - (1 - var(--p)) * (1 - var(--p)));
opacity: var(--ease-out);
JS Hook Pattern (text/content only)
<script>
const labels = ['A', 'B', 'C'];
const el = document.getElementById('label');
window.onTimelineUpdate = (t) => {
const idx = Math.floor(Math.max(0, t) / 1.2) % labels.length;
el.textContent = labels[idx];
};
</script>
Time Semantics (t vs globalTime)
onTimelineUpdate(t, globalTime)supports two time domains:t: clip-local time (resets to0when clip changes). This is the default for most HTML animations.globalTime: continuous timeline across clips. Use only when an element must stay continuous through cross-clip transitions.
- Do not assume
tis media-local. If a media appears mid-clip,tmay already be large when it first becomes visible. - For media-local behavior (e.g., toggle starts animating when this media appears), anchor from first visible
globalTimeand derive:local = globalTime - mediaStartGlobalTime
- Keep fallback for compatibility:
window.onTimelineUpdate = (t, globalTime) => { const g = Number.isFinite(globalTime) ? globalTime : t; // use `t` for normal clip-local animation, `g` only when continuity is required };
Determinism Checklist
- Seeking to any
tyields exactly one deterministic frame. - Animation state must not depend on "previous frame".
- Cross-clip transition visuals should be continuous in both clips.
- Final frame (
t = totalDuration) must remain on the last clip (no wrap to first clip).
Limitations
- Requires FFmpeg to be installed on the system
- TTS generation requires internet connection (Microsoft Edge TTS)
- Rendering is CPU-intensive and may take several minutes
- Maximum TTS text length: ~1000 characters per clip
- Frame capture requires sufficient disk space
Transparency Statement
This skill executes shell commands and spawns child processes for video rendering. All operations are limited to:
- Starting a local development server (Vite)
- Running FFmpeg for video encoding
- Launching Puppeteer for frame capture
- Detecting browser executables on the system
No arbitrary code execution or user input is passed to shell commands. All file paths and commands are predefined and validated.
Reviews (0)
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!