Pipeline


The TKK pipeline turns a topic into a finished 28-45 second vertical video. Every step is automated and driven by a single Claude Code session using MCP tools.

Step by Step

1. Write a Screenplay

A Python file defines 6 scenes following the mystery arc: hook, wrong answer, contradiction, proof, betrayal, punch.

Each scene is a Manim Scene class with a construct() method. The file includes VTT timing cues mapping narration segments to scenes, a TTS_SCRIPT with the full narration text, and DURATION constants controlling time allocation. Target: 150 words per minute narration pace.
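As a sketch of the conventions above, the screenplay's timing constants might look like the following. The narration text here is invented for illustration; the real file also defines six Manim Scene classes with construct() methods and VTT cues, omitted here.

```python
# Illustrative screenplay constants (narration text is made up).
TTS_SCRIPT = {
    "hook": "Everyone thinks this number is random. It isn't.",
    "wrong_answer": "The obvious guess is coincidence.",
    "contradiction": "But the pattern repeats too often for chance.",
    "proof": "Run the math and the odds collapse to near zero.",
    "betrayal": "The textbook explanation was wrong all along.",
    "punch": "The truth was hiding in the first line.",
}

WORDS_PER_MINUTE = 150  # target narration pace

def estimated_duration(text: str) -> float:
    """Seconds of narration this text takes at the target pace."""
    return len(text.split()) / WORDS_PER_MINUTE * 60

# Per-scene time allocation derived from the narration text.
DURATIONS = {scene: estimated_duration(text) for scene, text in TTS_SCRIPT.items()}
```

At 150 words per minute, a 28-45 second video works out to roughly 70-110 words of narration across the six scenes.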
2. Generate TTS

Narration text is sent to Fish Audio, which returns a natural voice recording using the ELITE voice model.

Fish Audio is the required TTS provider; edge-tts is available only as a fallback. The returned audio drives all animation timing — scene durations are calibrated to match the audio length. generate_tts.py parses the screenplay's TTS_SCRIPT and produces per-scene tts_*.mp3 files.
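A minimal sketch of the per-scene split that generate_tts.py performs: pairing each scene's narration with a tts_N.mp3 output path. The Fish Audio API call itself is deliberately left out, and the function name is illustrative.

```python
from pathlib import Path

def tts_jobs(tts_script: dict[str, str], out_dir: str = ".") -> list[tuple[Path, str]]:
    """Pair each scene's narration text with its tts_N.mp3 output path.
    The actual Fish Audio request for each job is stubbed out here."""
    return [
        (Path(out_dir) / f"tts_{i}.mp3", text)
        for i, text in enumerate(tts_script.values(), start=1)
    ]
```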
3. Render Preview Frames

Quick static PNG snapshots of each scene for visual review before committing to a full render.

Run with python3 {topic}_manim.py --preview. This generates six PNGs in previews/ in about 10 seconds, so you can check layout, text placement, zone coverage, and visual balance without waiting for a full video render.
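The --preview switch can be sketched with argparse; this is a minimal illustration, and the actual flag handling in {topic}_manim.py may differ.

```python
import argparse

def parse_args(argv=None):
    """Parse render-mode flags. --preview renders static PNGs only."""
    parser = argparse.ArgumentParser(description="Render a TKK screenplay")
    parser.add_argument(
        "--preview", action="store_true",
        help="write one PNG per scene to previews/ instead of rendering video",
    )
    return parser.parse_args(argv)
```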
4. Quality Assurance

Three automated QA checks verify the video will look and sound right before final render.

Layout QA (qa_layout.py) — checks that content fills the vertical frame, verifies zone coverage and balance.
Readability QA (qa_readability.py) — contrast ratios, margin clearance, text size minimums.
Sync QA (qa_sync.py) — AV drift (>2s = FAIL), dead time (animations finishing >3s early), overflow (animations exceeding allocated time), number sync (visual numbers in correct scene), scene budget (minimum 1.5s per scene).
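The sync thresholds above can be sketched as a single check function. This is illustrative logic only; qa_sync.py's real checks (and its per-scene accounting) may differ in detail.

```python
def check_sync(audio_len: float, video_len: float,
               allocated: list[float], actual: list[float]) -> list[str]:
    """Apply the sync-QA thresholds; returns a list of failure messages.
    allocated/actual are per-scene durations in seconds."""
    failures = []
    drift = abs(audio_len - video_len)
    if drift > 2.0:
        failures.append(f"AV drift {drift:.1f}s exceeds 2s")
    for i, (alloc, act) in enumerate(zip(allocated, actual), start=1):
        if alloc < 1.5:
            failures.append(f"scene {i}: under the 1.5s minimum budget")
        if act < alloc - 3.0:
            failures.append(f"scene {i}: dead time (finishes >3s early)")
        elif act > alloc:
            failures.append(f"scene {i}: overflow (exceeds allocated time)")
    return failures
```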
5. Full Render

Manim renders each scene as a video clip. Clips are concatenated and merged with voice audio to produce the final MP4.

Output: 1080x1920, 30fps, H.264 + AAC. FFmpeg handles concatenation and audio merge. Takes 1-2 minutes. Result is saved as {topic}_final.mp4 in the vidgen directory and appears in the clips library.
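The audio-merge step boils down to one FFmpeg invocation; the sketch below builds that command line. The flags are illustrative, not the pipeline's exact invocation.

```python
def mux_cmd(video: str, audio: str, out: str) -> list[str]:
    """FFmpeg command that merges the concatenated video with the
    narration track, encoding H.264 + AAC as described above."""
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", audio,
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",  # stop at the shorter of the two streams
        out,
    ]
```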

MCP Server

The entire pipeline is exposed as MCP (Model Context Protocol) tools via mcp_server.py. This lets Claude Code drive every step programmatically — reading/writing screenplays, generating TTS, rendering previews, running QA, and producing final videos — all from a single conversation.

list_screenplays List all screenplays with status
read_screenplay Read a screenplay file
write_screenplay Write/update a screenplay
generate_tts Generate Fish Audio TTS
render_preview Render 6 preview PNGs
render_full Full render to final MP4
run_qa Layout + readability + sync QA
read_production_guide Read the production guide
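Conceptually, the server maps tool names to handler functions. The toy dispatch table below illustrates that shape only; the actual mcp_server.py speaks the MCP protocol rather than calling Python functions directly.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as a named pipeline tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("list_screenplays")
def list_screenplays() -> str:
    return "no screenplays yet"  # placeholder body for illustration

def dispatch(name: str, **kwargs) -> str:
    """Invoke a registered tool by name, as an MCP client would."""
    return TOOLS[name](**kwargs)
```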

Session Model

TKK uses a single Claude Code session to handle the full pipeline. No multi-agent orchestration, no autonomous workers, no message passing between processes. One session writes the screenplay, generates TTS, renders, runs QA, and fixes issues — all in a deterministic, linear flow.

This replaced an earlier multi-agent system (OpenClaw) that used 3 specialized agents. The single-session approach is simpler, more reliable, and easier to debug.

Tech Stack

Manim CE v0.20
Animation engine. Renders each scene as video frames from Python code.
Fish Audio
TTS API. Natural voice narration using the custom ELITE voice model.
FFmpeg
Video processing. Concatenates clips, merges audio, encodes H.264.
Claude Code + MCP
AI writes screenplays and drives the pipeline via MCP tools.
FastAPI
This dashboard. Library, workbench, chat, and about pages.
Caddy
Reverse proxy for clips.applesauce.chat on port 8020.