Pipeline


The TKK pipeline turns a topic into a finished 28-45 second vertical video. Every step is automated and driven by a single Claude Code session using MCP tools.

Step by Step

1. Write a Screenplay

A Python file defines 6 scenes following the mystery arc: hook, wrong answer, contradiction, proof, betrayal, punch.

Each scene is a Manim Scene class with a construct() method. The file includes VTT timing cues mapping narration segments to scenes, a TTS_SCRIPT with the full narration text, and DURATION constants controlling time allocation. Target: 150 words per minute narration pace.
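As a sketch of the conventions above, the screenplay's timing constants might look like the following. The narration text here is invented for illustration; the real file also defines six Manim Scene classes with construct() methods and VTT cues, omitted here.

```python
# Illustrative screenplay constants (narration text is made up).
TTS_SCRIPT = {
    "hook": "Everyone thinks this number is random. It isn't.",
    "wrong_answer": "The obvious guess is coincidence.",
    "contradiction": "But the pattern repeats too often for chance.",
    "proof": "Run the math and the odds collapse to near zero.",
    "betrayal": "The textbook explanation was wrong all along.",
    "punch": "The truth was hiding in the first line.",
}

WORDS_PER_MINUTE = 150  # target narration pace

def estimated_duration(text: str) -> float:
    """Seconds of narration this text takes at the target pace."""
    return len(text.split()) / WORDS_PER_MINUTE * 60

# Per-scene time allocation derived from the narration text.
DURATIONS = {scene: estimated_duration(text) for scene, text in TTS_SCRIPT.items()}
```

At 150 words per minute, a 28-45 second video works out to roughly 70-110 words of narration across the six scenes.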
2. Generate TTS

Narration text is sent to Fish Audio, which returns a natural voice recording using the ELITE voice model.

Fish Audio is the required TTS provider; edge-tts is available only as a fallback. The returned audio drives all animation timing — scene durations are calibrated to match the audio length. generate_tts.py parses the screenplay's TTS_SCRIPT and produces per-scene tts_*.mp3 files.
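A minimal sketch of the per-scene split that generate_tts.py performs: pairing each scene's narration with a tts_N.mp3 output path. The Fish Audio API call itself is deliberately left out, and the function name is illustrative.

```python
from pathlib import Path

def tts_jobs(tts_script: dict[str, str], out_dir: str = ".") -> list[tuple[Path, str]]:
    """Pair each scene's narration text with its tts_N.mp3 output path.
    The actual Fish Audio request for each job is stubbed out here."""
    return [
        (Path(out_dir) / f"tts_{i}.mp3", text)
        for i, text in enumerate(tts_script.values(), start=1)
    ]
```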
3. Render Preview Frames

Quick static PNG snapshots of each scene for visual review before committing to a full render.

Run with python3 {topic}_manim.py --preview. This generates six PNGs in previews/ in about 10 seconds, so you can check layout, text placement, zone coverage, and visual balance without waiting for a full video render.
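The --preview switch can be sketched with argparse; this is a minimal illustration, and the actual flag handling in {topic}_manim.py may differ.

```python
import argparse

def parse_args(argv=None):
    """Parse render-mode flags. --preview renders static PNGs only."""
    parser = argparse.ArgumentParser(description="Render a TKK screenplay")
    parser.add_argument(
        "--preview", action="store_true",
        help="write one PNG per scene to previews/ instead of rendering video",
    )
    return parser.parse_args(argv)
```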
4. Quality Assurance

Three automated QA checks verify the video will look and sound right before final render.

Layout QA (qa_layout.py) — checks that content fills the vertical frame, verifies zone coverage and balance.
Readability QA (qa_readability.py) — contrast ratios, margin clearance, text size minimums.
Sync QA (qa_sync.py) — AV drift (>2s = FAIL), dead time (animations finishing >3s early), overflow (animations exceeding allocated time), number sync (visual numbers in correct scene), scene budget (minimum 1.5s per scene).
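The sync thresholds above can be sketched as a single check function. This is illustrative logic only; qa_sync.py's real checks (and its per-scene accounting) may differ in detail.

```python
def check_sync(audio_len: float, video_len: float,
               allocated: list[float], actual: list[float]) -> list[str]:
    """Apply the sync-QA thresholds; returns a list of failure messages.
    allocated/actual are per-scene durations in seconds."""
    failures = []
    drift = abs(audio_len - video_len)
    if drift > 2.0:
        failures.append(f"AV drift {drift:.1f}s exceeds 2s")
    for i, (alloc, act) in enumerate(zip(allocated, actual), start=1):
        if alloc < 1.5:
            failures.append(f"scene {i}: under the 1.5s minimum budget")
        if act < alloc - 3.0:
            failures.append(f"scene {i}: dead time (finishes >3s early)")
        elif act > alloc:
            failures.append(f"scene {i}: overflow (exceeds allocated time)")
    return failures
```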
5. Full Render

Manim renders each scene as a video clip. Clips are concatenated and merged with voice audio to produce the final MP4.

Output: 1080x1920, 30fps, H.264 + AAC. FFmpeg handles concatenation and audio merge. Takes 1-2 minutes. Result is saved as {topic}_final.mp4 in the vidgen directory and appears in the clips library.
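The audio-merge step boils down to one FFmpeg invocation; the sketch below builds that command line. The flags are illustrative, not the pipeline's exact invocation.

```python
def mux_cmd(video: str, audio: str, out: str) -> list[str]:
    """FFmpeg command that merges the concatenated video with the
    narration track, encoding H.264 + AAC as described above."""
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", audio,
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",  # stop at the shorter of the two streams
        out,
    ]
```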

MCP Server

The entire pipeline is exposed as MCP (Model Context Protocol) tools via mcp_server.py. This lets Claude Code drive every step programmatically — reading/writing screenplays, generating TTS, rendering previews, running QA, and producing final videos — all from a single conversation.

list_screenplays List all screenplays with status
read_screenplay Read a screenplay file
write_screenplay Write/update a screenplay
generate_tts Generate Fish Audio TTS
render_preview Render 6 preview PNGs
render_full Full render to final MP4
run_qa Layout + readability + sync QA
read_production_guide Read the production guide
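Conceptually, the server maps tool names to handler functions. The toy dispatch table below illustrates that shape only; the actual mcp_server.py speaks the MCP protocol rather than calling Python functions directly.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as a named pipeline tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("list_screenplays")
def list_screenplays() -> str:
    return "no screenplays yet"  # placeholder body for illustration

def dispatch(name: str, **kwargs) -> str:
    """Invoke a registered tool by name, as an MCP client would."""
    return TOOLS[name](**kwargs)
```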

Session Model

TKK uses a single Claude Code session to handle the full pipeline. No multi-agent orchestration, no autonomous workers, no message passing between processes. One session writes the screenplay, generates TTS, renders, runs QA, and fixes issues — all in a deterministic, linear flow.

This replaced an earlier multi-agent system (OpenClaw) that used 3 specialized agents. The single-session approach is simpler, more reliable, and easier to debug.

Tech Stack

Manim CE v0.20
Animation engine. Renders each scene as video frames from Python code.
Fish Audio
TTS API. Natural voice narration using the custom ELITE voice model.
FFmpeg
Video processing. Concatenates clips, merges audio, encodes H.264.
Claude Code + MCP
AI writes screenplays and drives the pipeline via MCP tools.
FastAPI
This dashboard. Library, workbench, chat, and about pages.
Caddy
Reverse proxy for clips.applesauce.chat on port 8020.