Skill Video Caption Overlay
Render TikTok-style animated pill captions onto short-form videos using MoviePy + PIL. Takes a base MP4, a captions JSON, and optional background audio — out...
Description
name: skill-video-caption-overlay
description: Render TikTok-style animated pill captions onto short-form videos using MoviePy + PIL. Takes a base MP4, a captions JSON, and optional background audio; outputs a final video with fade-in/out pill overlays. Fixes the PIL textbbox y-offset bug that causes text to sit outside pill boundaries. Use for TikTok ads, Reels, YouTube Shorts.
metadata:
  openclaw:
    requires: { bins: ["uv"] }
Video Caption Overlay
Animated pill-style caption overlays for short-form video. No Premiere, no CapCut — pure Python.
Usage
uv run --with moviepy --with pillow scripts/overlay.py \
--video base.mp4 \
--output final.mp4 \
--captions scripts/example_captions.json \
--audio music.mp3 \
--audio-start 8 \
--audio-vol 0.5
Omit --audio to keep the original video audio.
Custom fonts
--font-black /path/to/Montserrat-Black.ttf \
--font-bold /path/to/Montserrat-Bold.ttf
If these flags are omitted, the script falls back to Montserrat from ~/.local/share/fonts/.
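The flags shown above map naturally onto an argparse parser. The following is a minimal sketch, not the code in scripts/overlay.py: `build_parser` is a hypothetical name, and every default value is an assumption, since the script's real defaults are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; defaults are assumptions."""
    p = argparse.ArgumentParser(description="Overlay pill captions on a video")
    p.add_argument("--video", required=True, help="base MP4")
    p.add_argument("--output", required=True, help="output MP4")
    p.add_argument("--captions", required=True, help="captions JSON")
    # Leaving --audio out gives args.audio == None: keep the original audio.
    p.add_argument("--audio", default=None, help="optional background audio")
    p.add_argument("--audio-start", type=float, default=0.0)  # assumed default
    p.add_argument("--audio-vol", type=float, default=1.0)    # assumed default
    p.add_argument("--font-black", default=None, help="heavy-weight TTF path")
    p.add_argument("--font-bold", default=None, help="bold-weight TTF path")
    return p


args = build_parser().parse_args([
    "--video", "base.mp4", "--output", "final.mp4",
    "--captions", "scripts/example_captions.json",
    "--audio", "music.mp3", "--audio-start", "8", "--audio-vol", "0.5",
])
print(args.audio_start, args.audio_vol, args.font_black)  # 8.0 0.5 None
```

A `None` font path is where a fallback lookup in ~/.local/share/fonts/ would plug in.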
captions.json format
Array of phases — each phase is a time window with one or more pill lines stacked vertically.
[
{
"start": 0,
"end": 3.2,
"y_frac": 0.06,
"lines": [
{
"text": "POV:",
"size": 28,
"bold": true,
"bg": [0, 195, 255],
"fg": [0, 0, 0],
"bg_opacity": 0.9,
"px": 20, "py": 9, "r": 12
},
{
"text": "drink more water",
"size": 50,
"bg": [255, 255, 255],
"fg": [0, 0, 0]
}
]
}
]
| Field | Type | Default | Description |
|---|---|---|---|
| start | float | required | Phase start time (seconds) |
| end | float | required | Phase end time (seconds) |
| y_frac | float | 0.06 | Vertical position as fraction of video height |
| lines[].text | string | required | Caption text |
| lines[].size | int | 50 | Font size (px) |
| lines[].bold | bool | false | Use bold font (vs black/heavy) |
| lines[].bg | [R,G,B] | [255,255,255] | Pill background color |
| lines[].fg | [R,G,B] | [0,0,0] | Text color |
| lines[].bg_opacity | float | 0.93 | Pill background opacity (0–1) |
| lines[].px | int | 26 | Horizontal padding |
| lines[].py | int | 13 | Vertical padding |
| lines[].r | int | 18 | Border radius |
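One way to apply the defaults from the field table is to merge each phase and line over a defaults dict before rendering. This is a sketch only: `normalize_phases` is a hypothetical helper, and the `end > start` check is an assumption rather than documented behavior of the script.

```python
import json
from typing import Any

# Defaults copied from the field table above.
LINE_DEFAULTS: dict[str, Any] = {
    "size": 50, "bold": False,
    "bg": [255, 255, 255], "fg": [0, 0, 0],
    "bg_opacity": 0.93, "px": 26, "py": 13, "r": 18,
}
PHASE_DEFAULTS: dict[str, Any] = {"y_frac": 0.06}


def normalize_phases(raw: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Check required fields and fill in the documented defaults."""
    phases = []
    for i, phase in enumerate(raw):
        for key in ("start", "end"):
            if key not in phase:
                raise ValueError(f"phase {i}: missing required field '{key}'")
        # Assumption: a phase must have positive duration.
        if phase["end"] <= phase["start"]:
            raise ValueError(f"phase {i}: end must be greater than start")
        lines = []
        for j, line in enumerate(phase.get("lines", [])):
            if "text" not in line:
                raise ValueError(f"phase {i}, line {j}: missing 'text'")
            lines.append({**LINE_DEFAULTS, **line})  # explicit values win
        phases.append({**PHASE_DEFAULTS, **phase, "lines": lines})
    return phases


raw = json.loads("""
[{"start": 0, "end": 3.2,
  "lines": [{"text": "POV:", "size": 28, "bold": true},
            {"text": "drink more water"}]}]
""")
phases = normalize_phases(raw)
print(phases[0]["y_frac"], phases[0]["lines"][1]["px"])  # 0.06 26
```

With a 1920 px tall video, the default `y_frac` of 0.06 would place the top of the pill stack at roughly `round(0.06 * 1920)` = 115 px.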
PIL textbbox fix
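PIL's textbbox((0,0), text, font) returns (x0, y0, x1, y1), where y0 is a non-zero offset (typically 7–15px depending on font size): the box describes the visible glyphs, which start some distance below the drawing origin. Drawing text at (x, y) without compensating for this offset shifts the glyphs down by y0, so the text sits below the pill's visual center and can spill past the pill's bottom edge.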
Fix implemented in pill():
bb = draw.textbbox((0, 0), text, font=font)
x_off, y_off = bb[0], bb[1]
vis_w = bb[2] - bb[0] # actual visual width
vis_h = bb[3] - bb[1] # actual visual height
# Compensate offsets when drawing text
tx = cx - vis_w // 2 - x_off
ty = y - y_off
draw.text((tx, ty), text, font=font, fill=fg)
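To show how the offsets fit into a complete pill, here is a self-contained sketch. It is not the script's pill(): `text_origin` and `draw_pill` are hypothetical helpers, `draw` is expected to be a Pillow `ImageDraw.ImageDraw`, and background opacity is left out for brevity.

```python
from typing import Any


def text_origin(bbox: tuple[int, int, int, int], cx: int, top: int) -> tuple[int, int]:
    """Return the (x, y) to pass to draw.text so the visible glyphs are
    horizontally centred on cx and their top edge lands exactly on `top`."""
    x_off, y_off = bbox[0], bbox[1]
    vis_w = bbox[2] - bbox[0]
    return cx - vis_w // 2 - x_off, top - y_off


def draw_pill(draw: Any, text: str, font: Any, cx: int, top: int,
              px: int = 26, py: int = 13, r: int = 18,
              bg=(255, 255, 255), fg=(0, 0, 0)) -> tuple[int, int, int, int]:
    """Draw one pill whose top edge is at `top`; return its rectangle."""
    bb = draw.textbbox((0, 0), text, font=font)
    vis_w, vis_h = bb[2] - bb[0], bb[3] - bb[1]
    # Size the pill from the visible glyph box plus padding on every side.
    x0 = cx - vis_w // 2 - px
    rect = (x0, top, x0 + vis_w + 2 * px, top + vis_h + 2 * py)
    draw.rounded_rectangle(rect, radius=r, fill=bg)
    # The visible glyphs should start py below the pill's top edge.
    draw.text(text_origin(bb, cx, top + py), text, font=font, fill=fg)
    return rect


# A bbox with offsets (2, 11): the text must be drawn 11 px higher than `top`.
print(text_origin((2, 11, 102, 47), cx=540, top=100))  # (488, 89)
```

With Pillow installed, `draw` would come from `ImageDraw.Draw(img)` and `font` from `ImageFont.truetype(path, size)`. Because the pill is sized from the visible box and the text is shifted by the same offsets, the padding above and below the glyphs is equal.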
Emoji note
NotoColorEmoji.ttf fails with PIL at arbitrary sizes (it is a bitmap font that supports only a limited set of sizes). For reliable rendering, use text alternatives ("Free delivery" instead of "Free delivery 🚚").
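If caption text comes from an external source, emoji could be stripped before rendering. The helper below is a hypothetical, heuristic sketch and not part of the script: it drops every code point in Unicode category "So" (which also removes symbols such as © and ™) plus the zero-width joiner and variation selector used in emoji sequences. It does not handle skin-tone modifiers or keycap sequences.

```python
import unicodedata

# Zero-width joiner and emoji variation selector, used inside emoji sequences.
_EMOJI_GLUE = {"\u200d", "\ufe0f"}


def strip_emoji(text: str) -> str:
    """Heuristically remove emoji, then collapse leftover whitespace."""
    kept = [
        ch for ch in text
        if unicodedata.category(ch) != "So" and ch not in _EMOJI_GLUE
    ]
    return " ".join("".join(kept).split())


print(strip_emoji("Free delivery \U0001F69A"))  # Free delivery
```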
Example output
See scripts/example_captions.json for the full 3-phase TikTok ad structure:
- Phase 1 (0–3.2s): Hook — top-screen pill stack
- Phase 2 (2.8–5.8s): Product claim — overlapping fade
- Phase 3 (5.3–8.0s): CTA — bottom-screen price + delivery + bio link
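The overlapping phase windows imply that one phase fades out while the next fades in. A per-phase opacity curve for that could look like the sketch below; `phase_alpha` is a hypothetical function, and both the linear ramp and the 0.3 s fade length are assumptions, since the script's actual fade timing is not documented here.

```python
def phase_alpha(t: float, start: float, end: float, fade: float = 0.3) -> float:
    """Opacity in [0, 1] at time t: linear fade-in after `start`,
    linear fade-out before `end`, fully opaque in between."""
    if t <= start or t >= end:
        return 0.0
    if fade <= 0:
        return 1.0  # no fade requested: hard cut
    # On very short phases, shrink the ramps so they meet in the middle.
    fade = min(fade, (end - start) / 2)
    return min(1.0, (t - start) / fade, (end - t) / fade)


# Phases 1 and 2 overlap between 2.8 s and 3.2 s, so at t = 3.0 s
# phase 1 is fading out while phase 2 is fading in.
t = 3.0
print(round(phase_alpha(t, 0.0, 3.2), 2), round(phase_alpha(t, 2.8, 5.8), 2))  # 0.67 0.67
```

The returned value would be multiplied into each pill's alpha channel (including `bg_opacity`) for the frame at time `t`.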