I've directed AI video productions for Cannes Festival content, BBC commissions, and commercial clients across a range of industries. The workflow I use today would have been unrecognisable two years ago. Here's the complete pipeline as it actually runs in April 2026.
The Stack
Before the workflow, the tools. Not every tool for every job — the right tool for each layer.
- Brief & concept: Claude (language) + Midjourney (visual ideation)
- Video generation: Runway Gen-4.5 (hero shots, human subjects), Kling 3.0 (motion physics, volume generation), Pika 2.5 (concept testing, physics effects)
- Voice & audio: ElevenLabs Eleven v3 (VO and avatar voice), Murf AI (team workflows)
- Avatar/presenter: HeyGen Avatar IV (corporate, multilingual)
- Post-processing: Topaz Video AI (upscaling, noise reduction), CapCut (social delivery, captions)
- Editing: Premiere Pro with Firefly panel (editorial, B-roll), Descript (dialogue-heavy content)
Phase 1: Pre-Production
The quality of AI video output is determined almost entirely in pre-production. This is where most productions fail — they skip it because the tools feel instant.
Shot list first. Write every shot you need before opening a generation tool. What's the focal length? What's the action? What's the lighting motivation? AI models respond to cinematographic specificity. "A woman walks through a busy market" produces generic results. "Low angle tracking shot, golden hour backlight, shallow depth of field, woman in red jacket walks through crowded Moroccan medina" produces something you can actually cut.
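To keep that specificity consistent across a whole shot list, I template it. A minimal sketch in Python; the field names are my own convention, not any tool's API:

```python
from dataclasses import dataclass

@dataclass
class Shot:
    """One line of the shot list, with the cinematographic fields the models respond to."""
    camera: str    # framing and movement, e.g. "low angle tracking shot"
    lighting: str  # lighting motivation, e.g. "golden hour backlight"
    lens: str      # focal length / depth of field, e.g. "shallow depth of field"
    subject: str   # who or what, with identifying detail
    action: str    # what happens in the shot
    setting: str   # where it happens

    def prompt(self) -> str:
        # Order matters less than completeness: every field present, every shot.
        return ", ".join([self.camera, self.lighting, self.lens,
                          f"{self.subject} {self.action}", self.setting])

shot = Shot(
    camera="low angle tracking shot",
    lighting="golden hour backlight",
    lens="shallow depth of field",
    subject="woman in red jacket",
    action="walks through crowded market stalls",
    setting="Moroccan medina",
)
print(shot.prompt())
```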
Visual references. Generate a few reference frames in Midjourney before anything else, not to produce final footage but to establish the visual language. Bring those frames into Runway as reference images.
Script before voice. Write the full VO script and lock it before generating a single frame. Duration drives shot count. Shot count drives generation budget.
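That arithmetic is worth making explicit. A back-of-envelope budgeting sketch; the per-generation cost, take count, and average shot length are illustrative placeholders, not vendor pricing:

```python
# Generation budget from a locked VO script.
# All numbers below are illustrative assumptions, not real pricing.
vo_duration_s = 90          # locked script read time
avg_shot_length_s = 5       # typical AI clip length you'll actually use
takes_per_shot = 3          # generations you expect per usable shot
cost_per_generation = 1.50  # placeholder per-clip cost

shot_count = round(vo_duration_s / avg_shot_length_s)
generations = shot_count * takes_per_shot
budget = generations * cost_per_generation
print(f"{shot_count} shots -> ~{generations} generations -> ~{budget:.2f}")
```

Lock the script first and this number stops moving. Change the script after generation starts and every downstream figure changes with it.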
Phase 2: Generation
Test in Pika, produce in Runway/Kling. Pika 2.5 is fast and cheap. Use it to pressure-test your concept — does this shot work? Does this camera move read correctly? When the concept is approved, move to Runway or Kling for final quality.
Runway for human subjects. If the shot has a person in it who matters, Runway Gen-4.5 is the only model I trust for broadcast-quality photorealism. Use Image-to-Video with a Midjourney reference frame for consistency.
Kling for volume and motion physics. When I need 30 shots of product in motion, liquid, explosion effects, or dynamic environments, Kling at Standard quality is faster and more cost-effective than Runway. The motion physics are genuinely excellent.
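In script form, the routing rule across those three paragraphs looks roughly like this; the flags and return values are my own shorthand, not anything the tools expose:

```python
def choose_model(approved: bool, has_person: bool) -> str:
    """Heuristic routing for a single shot; shorthand for the decision above."""
    if not approved:
        return "pika"    # fast, cheap: pressure-test the concept first
    if has_person:
        return "runway"  # broadcast-quality human subjects
    return "kling"       # volume shots, motion physics, products, effects

print(choose_model(approved=False, has_person=True))   # -> "pika"
print(choose_model(approved=True, has_person=False))   # -> "kling"
```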
AI Director for multi-shot continuity. Runway's AI Director mode has changed how I approach narrative sequences. Feed it a character description and scene context, let it maintain continuity across 6–8 shots. It's not perfect, but it removes 70% of the manual consistency management that used to make multi-shot AI sequences exhausting.
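The remaining 30% is still manual, and the manual version is disciplined prompt threading: the same locked character and scene block prepended verbatim to every shot prompt so the model sees identical anchors each time. A sketch of that manual approach, not Runway's API:

```python
# Manual continuity threading: one locked character/scene block, reused
# verbatim across the sequence. Descriptions below are illustrative.
CHARACTER = "woman in red jacket, dark curly hair, mid-30s, silver hoop earrings"
SCENE = "crowded Moroccan medina, golden hour, warm dust haze"

shot_actions = [
    "enters frame left and pauses at a spice stall",
    "turns toward camera, reacting to an off-screen call",
    "pushes through the crowd toward a narrow archway",
]

sequence_prompts = [f"{CHARACTER}, {SCENE}, {action}" for action in shot_actions]
for i, prompt in enumerate(sequence_prompts, 1):
    print(f"shot {i}: {prompt}")
```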
Phase 3: Voice and Audio
ElevenLabs first, always. The Eleven v3 model produces voice that's indistinguishable from a studio recording for most commercial applications. A one-minute source sample is enough for voice cloning. Write the script in natural spoken English — contractions, breath points, slight informality. It reads better than formal prose.
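The generation step itself is a few lines against the ElevenLabs Python SDK. A minimal sketch; the API key and voice ID are placeholders, and the exact v3 model_id string is an assumption worth checking against the current docs:

```python
from elevenlabs.client import ElevenLabs
from elevenlabs import save

client = ElevenLabs(api_key="YOUR_API_KEY")  # placeholder

# Script written as spoken English: contractions, breath points, informality.
script = "So here's the thing about pre-production... it's where the film gets made."

audio = client.text_to_speech.convert(
    voice_id="YOUR_CLONED_VOICE_ID",  # placeholder: your cloned voice
    model_id="eleven_v3",             # assumption: verify the current model_id
    text=script,
)
save(audio, "vo_take_01.mp3")
```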
For multilingual delivery, HeyGen's real-time lip sync is now production-viable for corporate content. Record once in English, deliver in 12 languages with matched lip sync. The claimed 95%+ accuracy for European languages holds up; I've tested it against native speakers.
Phase 4: Post-Processing
Never deliver raw AI output. The post-processing layer is what separates professional work from demo content.
Topaz Video AI is mandatory for 4K delivery. Upscale with Proteus for general footage, Iris for anything with a human face. The detail reconstruction is extraordinary: footage that wouldn't pass a broadcast QC check in its raw state passes cleanly after Topaz.
Frame rate conversion if you're delivering to social. Topaz's AI frame interpolation to 60fps for Instagram and TikTok is a standard step now.
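Topaz handles this through its own interface; when I need the conversion scripted inside a pipeline, ffmpeg's minterpolate filter is the stand-in. To be clear, that's motion-compensated interpolation, not Topaz's AI model, and the result is noticeably weaker — fine for previews, not for finals. A sketch, with placeholder filenames:

```python
import subprocess

# Motion-compensated interpolation to 60fps via ffmpeg's minterpolate filter.
# A scripted stand-in for Topaz's AI interpolation, not a replacement for it.
subprocess.run([
    "ffmpeg", "-i", "edit_master.mov",
    "-vf", "minterpolate=fps=60:mi_mode=mci",
    "-c:a", "copy",              # leave audio untouched
    "social_60fps.mov",
], check=True)
```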
CapCut for social finishing. Auto-captions that sync correctly on the first pass, background removal, platform-specific aspect ratio delivery. The fastest path from finished edit to multi-platform publish.
Phase 5: Delivery
Render masters at 4K ProRes 422 minimum. Deliver platform-specific versions separately — don't let clients compress masters. For broadcast deliverables, run a technical QC pass against whatever spec applies (AS-11 for UK broadcast, IMF for international).
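A first-pass technical check is scriptable before the formal QC tool sees the file. A sketch using ffprobe; the thresholds match the master spec above, not any broadcaster's full AS-11 requirements, and the filename is a placeholder:

```python
import json
import subprocess

def probe(path: str) -> dict:
    """Read the first video stream's technical metadata with ffprobe."""
    out = subprocess.run([
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height",
        "-of", "json", path,
    ], capture_output=True, text=True, check=True)
    return json.loads(out.stdout)["streams"][0]

stream = probe("master_4k.mov")
# Minimum master spec from above: 4K ProRes 422.
assert stream["codec_name"] == "prores", f"expected ProRes, got {stream['codec_name']}"
assert stream["width"] >= 3840 and stream["height"] >= 2160, "below 4K"
print("master passes first-pass QC")
```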
The Business Reality
This stack, running efficiently, produces broadcast-quality AI video at a fraction of traditional production cost. The director layer — knowing which tool to use, in what order, briefed how — is what clients are paying for. The tools are available to everyone. The judgment is not.
Start with this pipeline. Adapt it to your clients' requirements. The core principle stays constant: test fast, produce slow, finish with precision.