AI Video Prompt Engineering: The Director's Vocabulary

The single biggest variable in AI video quality is not the model — it's the prompt. Two people using the same tool, the same settings, the same subject will produce wildly different results based on how precisely they direct the model.

This is not a beginner's guide to "write better prompts." This is the director's vocabulary — the specific language that unlocks cinematic output from Runway, Kling, Sora, and every other serious model.

The Principle: Brief Like a DoP

AI video models are trained on real film and video. They understand cinematographic language because the data they learned from was made by cinematographers. So write prompts the way you'd brief a Director of Photography — with technical precision, not descriptive prose.

Generic: "A woman walks through a city at sunset"

Cinematic: "Slow tracking shot, golden hour backlight, 50mm equivalent, shallow depth of field, woman in dark coat walks through out-of-focus pedestrian crossing, bokeh streetlights, handheld slight sway"

The second prompt takes only a little longer to write. What it really takes is knowing which terms to use.

Camera Moves

Always name the camera move explicitly. Models respond to these:

  • Static — locked-off, no movement
  • Slow push in / slow pull back — subtle dolly movement toward or away from subject
  • Tracking shot — camera moves parallel to the subject
  • Pan left / pan right — camera pivots on vertical axis
  • Tilt up / tilt down — camera pivots on horizontal axis
  • Crane up / crane down — vertical camera elevation change
  • Handheld — slight organic camera movement, adds realism
  • Drone / aerial — high angle, descending or ascending movement
  • Whip pan — fast horizontal movement, used as a transition

Focal Length Language

Focal length language tells the model how much visual compression and depth of field you want:

  • Wide angle / 16mm / 24mm — broad field of view, environmental, slight distortion
  • 35mm — natural, close to human eye, photojournalism
  • 50mm — neutral, slightly intimate
  • 85mm / portrait — compressed background, subject isolation, fashion/beauty
  • Long lens / telephoto / 200mm — heavy background compression, surveillance feeling, sports
  • Macro — extreme close-up, surface texture, product detail
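To make "broad field of view" and "compressed background" concrete, the sketch below computes the horizontal angle of view for the focal lengths listed above. It assumes a 36 mm wide full-frame sensor and a simple rectilinear lens focused at infinity. The function name is illustrative, and video models do not run this arithmetic; it only shows what the numbers mean on a real camera.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # assumed sensor width (35mm full frame)


def horizontal_fov_degrees(focal_length_mm: float,
                           sensor_width_mm: float = FULL_FRAME_WIDTH_MM) -> float:
    """Horizontal angle of view for a rectilinear lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# Shorter focal lengths see a wider slice of the scene; longer ones a narrower slice.
for focal_length in (16, 24, 35, 50, 85, 200):
    print(f"{focal_length}mm: {horizontal_fov_degrees(focal_length):.1f} degrees")
```

The narrower the angle, the less background fits behind the subject, which is the "compression" that 85mm and telephoto terms ask for.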

Lighting

Lighting is where most AI video prompts fail. "Good lighting" means nothing. Specific lighting references produce specific results:

  • Golden hour — warm, directional, low-angle sun, long shadows
  • Magic hour — roughly the 20 minutes after sunset (the term is also used loosely for golden hour), diffuse warm light, no hard shadows
  • Motivated practical light — light that appears to come from sources visible in frame (lamps, windows, screens)
  • Neon / practical neons — coloured light from signage, urban night
  • Hard light / harsh shadows — single direct source, high contrast
  • Diffuse / soft light — overcast, bounced, no hard shadows
  • Backlit / silhouette — light source behind subject
  • Chiaroscuro — dramatic high-contrast light and shadow, Caravaggio-style
  • Available light / cinéma vérité — natural, uncontrolled light, documentary feel

Subject and Action Precision

AI models need to know exactly what is happening in the frame. Vague action = vague output.

Weak: "A man reacts to something"

Strong: "CU on man's face, slow blink, slight jaw tension, eyes shift left, hint of recognition, no dialogue"

Describe performance like a director note:

  • "Hesitant, glances down before speaking"
  • "Confident stride, doesn't break eye contact"
  • "Relief — exhale, shoulders drop, faint smile"

Style and Reference Language

Reference real aesthetic traditions and the models understand them:

  • Cinéma vérité — handheld, natural light, documentary authenticity
  • Neo-noir — high contrast, cool shadows, motivated neon, urban night
  • Terrence Malick aesthetic — golden hour, whispering voiceover, nature, slow movement
  • Commercial clean — bright, neutral, minimal, product-forward
  • Desaturated grade — muted colours, slightly cool, tension/drama
  • Warm grade / Kodak 5207 — filmic warmth, slight grain, organic
  • Anamorphic — horizontal lens flare, oval bokeh, widescreen crop

Negative Prompting

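Not every model accepts negative prompts: Kling exposes a dedicated field, while support in Runway and others depends on the model version, so check before relying on them. Where they are supported, use them: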

  • "no camera shake" for clean, stable motion
  • "no text overlays" to prevent generated captions
  • "no cuts" to ensure a continuous single shot
  • "no people" for environment-only shots
  • "no artificial lighting" for natural light only
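One way to keep these exclusions reusable is to store them once and attach them per shot. The sketch below is a minimal illustration: the payload keys `prompt` and `negative_prompt`, the goal names, and the `has_negative_field` flag are all assumptions, not the API of any particular tool. Check your tool's documentation for the real field names.

```python
# Exclusion phrases from the list above, keyed by the goal they serve.
NEGATIVES: dict[str, str] = {
    "stable": "no camera shake",
    "clean_frame": "no text overlays",
    "single_take": "no cuts",
    "empty_scene": "no people",
    "natural_light": "no artificial lighting",
}


def build_request(prompt: str, goals: list[str],
                  has_negative_field: bool = True) -> dict[str, str]:
    """Build a hypothetical request payload for a video model."""
    negative = ", ".join(NEGATIVES[goal] for goal in goals)
    if has_negative_field:
        # Tool has a separate negative prompt input: keep the two apart.
        return {"prompt": prompt, "negative_prompt": negative}
    # No separate input: fold the exclusions into the main prompt text.
    return {"prompt": f"{prompt}, {negative}" if negative else prompt}


request = build_request("Static, 24mm, empty street at dawn", ["stable", "empty_scene"])
print(request)
```

Keeping the exclusions in one table means every shot in a sequence gets identical wording, which makes it easier to tell whether a change in output came from the prompt or from the exclusions.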

The Full Prompt Structure

A production-grade prompt follows this structure:

[Camera move] [Focal length] [Subject + precise action] [Environment + lighting] [Aesthetic/grade reference] [Duration note if relevant]

Example:

Slow push in, 85mm, elderly man's hands carefully fold a letter on a worn oak desk, warm practical lamp from left casting long shadows across paper grain, desaturated warm grade, slight filmic grain, 8 seconds

This prompt is 35 words. It should produce a cinematically specific result in any top-tier model. A 10-word prompt describing the same scene will produce something generic.
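If you write many shots, it can help to treat the structure as a template with named fields. The following is a minimal sketch: the `ShotPrompt` class and its field names are illustrative choices that mirror the bracketed structure above, not part of any model's interface.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """One shot, split into the fields of the prompt structure."""
    camera_move: str
    focal_length: str
    subject_action: str
    environment_lighting: str
    grade: str
    duration: str = ""  # optional, e.g. "8 seconds"

    def render(self) -> str:
        # Fixed order: move, lens, action, environment and light, grade, duration.
        parts = [
            self.camera_move,
            self.focal_length,
            self.subject_action,
            self.environment_lighting,
            self.grade,
            self.duration,
        ]
        # Skip empty fields so an omitted duration leaves no trailing comma.
        return ", ".join(part.strip() for part in parts if part.strip())


shot = ShotPrompt(
    camera_move="Slow push in",
    focal_length="85mm",
    subject_action="elderly man's hands carefully fold a letter on a worn oak desk",
    environment_lighting="warm practical lamp from left casting long shadows across paper grain",
    grade="desaturated warm grade, slight filmic grain",
    duration="8 seconds",
)
prompt = shot.render()
print(prompt)
print(len(prompt.split()), "words")
```

Because each field is required except the duration, a shot that is missing its lens or lighting fails when you construct it rather than after you have paid for a generation.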

What Doesn't Work

  • Emotion words alone: "beautiful," "stunning," "amazing" — these are judgements, not direction
  • Story context: the model doesn't care why the person is walking, only what they're doing
  • Excessive length: prompts over roughly 100 words tend to produce muddled output; be specific, not exhaustive
  • Contradictory directions: "fast handheld tracking shot, perfectly stable" — pick one
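Three of these failure modes can be caught mechanically before you spend credits. The checker below is a rough sketch: the word list, the contradiction pairs, and the 100-word ceiling are taken from the bullets above or are assumed starting points to extend, and the matching is plain substring search, so expect occasional false hits. Story context is not checked, since that needs a human read.

```python
import re

JUDGEMENT_WORDS = {"beautiful", "stunning", "amazing"}
# Pairs of directions that fight each other; an assumed starter list.
CONTRADICTIONS = [("handheld", "perfectly stable"), ("static", "tracking shot")]
MAX_WORDS = 100


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    warnings = []

    vague = sorted(JUDGEMENT_WORDS & set(words))
    if vague:
        warnings.append("judgement words, not direction: " + ", ".join(vague))

    if len(words) > MAX_WORDS:
        warnings.append(f"{len(words)} words, over the {MAX_WORDS}-word ceiling")

    for first, second in CONTRADICTIONS:
        if first in text and second in text:
            warnings.append(f"contradiction: '{first}' vs '{second}'")

    return warnings


for warning in lint_prompt("fast handheld tracking shot, perfectly stable"):
    print(warning)
```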

Practice

Take your last three AI video prompts. Apply this framework:

  1. Name the camera move
  2. Add a focal length reference
  3. Describe the action with performance precision
  4. Name the lighting condition
  5. Add one style/grade reference
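The checklist above can be turned into a quick audit for old prompts. This sketch looks for vocabulary from the earlier sections to see which elements a prompt already names. The term lists are a partial, assumed selection, and step 3 (performance precision) is left out because keyword matching cannot judge it.

```python
# Terms drawn from the vocabulary sections; extend these lists as needed.
CHECKLIST: dict[str, list[str]] = {
    "camera move": ["static", "push in", "pull back", "tracking shot", "pan left",
                    "pan right", "tilt up", "tilt down", "crane", "handheld",
                    "drone", "aerial", "whip pan"],
    "focal length": ["wide angle", "16mm", "24mm", "35mm", "50mm", "85mm",
                     "200mm", "long lens", "telephoto", "macro"],
    "lighting": ["golden hour", "magic hour", "practical", "neon", "hard light",
                 "soft light", "diffuse", "backlit", "backlight", "silhouette",
                 "chiaroscuro", "available light"],
    "style/grade": ["vérité", "neo-noir", "malick", "commercial clean",
                    "desaturated", "warm grade", "anamorphic", "grain"],
}


def missing_elements(prompt: str) -> list[str]:
    """Return the checklist elements for which no known term appears in the prompt."""
    text = prompt.lower()
    return [name for name, terms in CHECKLIST.items()
            if not any(term in text for term in terms)]


print(missing_elements("A woman walks through a city at sunset"))
```

Run it over your last three prompts and fix whatever it reports as missing before rewriting anything else.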

Your output quality will improve immediately. The model was always capable — it needed the right direction.