You can tell you’ve hit the intermediate stage with Deevid AI when your prompts do 80% of the work and the enhancer has barely anything left to add. Getting there takes deliberate practice, specifically writing to a consistent structure.

This tutorial walks through that structure in full: the five-part format we use on every client brief.

The five parts

Every prompt that consistently produces usable output contains five elements, in roughly this order:

[subject] + [action]
in [setting]
with [lighting / mood]
[camera / lens] + [motion detail]
[duration]

You can write it as one long sentence or five clauses separated by commas. Either works. What matters is that all five are present.

A fully-formed example:

A potter shapes clay on a spinning wheel in a sunlit studio, warm afternoon light streaming through a window, close-up macro lens, shallow depth of field, slow dolly-in with subtle rotation, 8 seconds.
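If you assemble prompts in a script rather than by hand, the format above can be sketched as a small data structure. The `ShotPrompt` class and its field names are hypothetical and not part of any Deevid API. The sketch treats camera and motion as one part, as the Camera and motion section does. It only joins text and rejects empty parts; it makes no claim about how Deevid interprets the result.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the five parts (illustration only)."""
    subject_action: str
    setting: str
    lighting_mood: str
    camera_motion: str
    duration_seconds: int

    def render(self) -> str:
        # Refuse to render when any text part is empty, because the
        # format depends on all five parts being present.
        text_parts = {
            "subject_action": self.subject_action,
            "setting": self.setting,
            "lighting_mood": self.lighting_mood,
            "camera_motion": self.camera_motion,
        }
        missing = [name for name, text in text_parts.items() if not text.strip()]
        if missing:
            raise ValueError(f"missing parts: {', '.join(missing)}")
        if self.duration_seconds <= 0:
            raise ValueError("duration must be a positive number of seconds")

        # Join the parts as comma-separated clauses in the tutorial's order.
        clauses = [
            f"{self.subject_action.strip()} in {self.setting.strip()}",
            self.lighting_mood.strip(),
            self.camera_motion.strip(),
            f"{self.duration_seconds} seconds",
        ]
        return ", ".join(clauses) + "."


example = ShotPrompt(
    subject_action="A potter shapes clay on a spinning wheel",
    setting="a sunlit studio",
    lighting_mood="warm afternoon light streaming through a window",
    camera_motion="close-up macro lens, shallow depth of field, "
                  "slow dolly-in with subtle rotation",
    duration_seconds=8,
)
print(example.render())
```

Rendering `example` reproduces the fully-formed prompt above word for word. The point of the structure is simply that a missing part fails loudly instead of slipping through.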

Let’s walk through each part in depth.

Subject

The subject is the focal object or person in your shot. It’s the noun the camera cares about. Weak subject descriptions are the single most common prompt failure.

Weak: A person, a car, a bird.

Better: A potter in a canvas apron, a 1960s red sedan, a sparrow on a weathered fence.

Best: A 50-year-old potter in a canvas apron, hands covered in clay; a 1960s red sedan with faded paint and chrome bumpers; a small brown sparrow with a single feather out of place.

Specificity compounds. Every additional detail you give about the subject reduces the space Deevid has to drift into generic territory. If you care about the output, spend a full sentence on the subject alone.

Action

The verb. What the subject is doing. Even for still subjects, you need an action — “sits motionless” is an action.

Weak: a potter, a car.

Better: a potter shaping clay, a car idling at an intersection.

Best: a potter slowly shaping a curved vase on a spinning wheel; a car idling at a rain-slicked intersection, wipers clearing the windshield.

The action verb should contain the motion you want the clip to show. If you write “a potter at a wheel” without a motion verb, expect a static tableau. If you write “a potter shaping clay,” you’ll get motion.

Setting

Where the action happens. Describe the environment in enough detail that Deevid can imagine the frame around the subject.

Weak: in a studio, on a street.

Better: in a small sunlit pottery studio, on a rain-slicked city street at dusk.

Best: in a small sunlit pottery studio with wooden shelves of unfired pots, on a rain-slicked city street at dusk with neon signs reflected in puddles.

The setting shapes lighting, mood, and background. If your setting is vague, your background will be generic AI filler — the plant you didn’t ask for, the window in the wrong place.

Lighting and mood

This is the clause that separates the pros from everyone else. Most creators skip it entirely. Don’t.

Good lighting clauses describe:

Source: window, overhead, tungsten, golden hour, neon, practical.
Quality: hard, soft, diffused, directional, flat.
Color: warm, cool, neutral, saturated, desaturated.
Direction: camera-left key light, rim-lit from behind, backlit, top-down.

Weak: good lighting, natural light.

Better: warm afternoon light, soft window light.

Best: warm afternoon light from camera-left through a large studio window, soft diffused shadows; soft window light with a cool rim from a practical lamp behind the subject.

Mood belongs in this same clause. Melancholy, serene, tense, energetic — one word is usually enough.
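One way to make sure no lighting attribute gets skipped is to build the clause from named fields. This is a minimal sketch: the function `lighting_clause` and the word order it produces are assumptions for illustration, not a format Deevid requires.

```python
def lighting_clause(source: str, quality: str, color: str,
                    direction: str, mood: str = "") -> str:
    """Join the four lighting attributes, plus an optional one-word mood."""
    required = {"source": source, "quality": quality,
                "color": color, "direction": direction}
    # Fail early if any of the four attributes was left blank.
    empty = [name for name, value in required.items() if not value.strip()]
    if empty:
        raise ValueError(f"empty lighting attributes: {', '.join(empty)}")

    clause = f"{color} {quality} {source} light, {direction}"
    if mood.strip():
        clause += f", {mood.strip()} mood"
    return clause


example_clause = lighting_clause(
    source="window", quality="soft", color="warm",
    direction="rim-lit from behind", mood="serene",
)
print(example_clause)
# warm soft window light, rim-lit from behind, serene mood
```

You will usually want to polish the joined text by hand afterwards; the helper only guarantees that source, quality, color, and direction are all stated.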

Camera and motion

Two linked clauses that describe how the camera frames the subject and how it moves.

For camera, cover:

Shot size: wide, mid, close-up, extreme close-up.
Angle: eye-level, low-angle, high-angle, overhead, three-quarter.
Lens: 35mm, 85mm, macro, wide-angle. Shallow or deep depth of field.

For motion, cover:

Motion type: static, dolly-in/out, pan left/right, orbit, crane down, hand-held.
Speed: slow, medium, fast.
Magnitude: subtle, pronounced.

Weak: camera moves around the scene.

Better: slow dolly-in on a 35mm lens.

Best: slow 1-meter dolly-in on a 35mm lens with shallow depth of field, barely perceptible, ending in a mid-shot.

The more specific your camera clause, the less Deevid has to invent. Specific camera language is the single biggest lever in your prompt.

Duration

The easiest part. Say how long the clip should be.

Social: 4–6 seconds.
B-roll / product: 8–12 seconds.
Narrative: 12–20 seconds (Pro tier only).

Shorter clips are almost always safer. Longer clips drift. If you’re uncertain, generate at 6–8 seconds first, then re-prompt to extend if it works.
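The duration guidance can be captured as a small lookup. In this sketch the names `DURATION_RANGES` and `first_pass_duration` are hypothetical, and starting at the low end of each range is our reading of "shorter is safer", not a documented Deevid setting.

```python
# Duration ranges in seconds, taken from the list above.
DURATION_RANGES = {
    "social": (4, 6),
    "b-roll": (8, 12),
    "narrative": (12, 20),  # Pro tier only, per the list above
}


def first_pass_duration(use_case: str, uncertain: bool = False) -> int:
    """Suggest a starting duration in seconds for a first render."""
    if uncertain:
        # The tutorial suggests 6-8 seconds when unsure; take the low end.
        return 6
    if use_case not in DURATION_RANGES:
        raise ValueError(f"unknown use case: {use_case!r}")
    low, _high = DURATION_RANGES[use_case]
    # Assumption: start at the low end, because longer clips drift.
    return low


print(first_pass_duration("social"))                      # 4
print(first_pass_duration("narrative", uncertain=True))   # 6
```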

Putting it together: before and after

A weak prompt:

A potter in a studio makes a vase.

The output: a static, generic scene. No motion, unclear subject, no style.

The same scene, with all five parts present:

A 50-year-old potter in a canvas apron shapes a curved vase on a spinning wheel in a small sunlit pottery studio with wooden shelves behind them, warm afternoon light from camera-left through a large studio window with soft diffused shadows, close-up macro lens with shallow depth of field, slow 1-meter dolly-in ending in a mid-shot, 8 seconds.

The output: a specific, cinematic, on-brief clip.

The second prompt is about 7× longer by word count (57 words against 8). It is also roughly 5× more likely to produce a usable first render. That trade is always worth making.

What to avoid

Three patterns that consistently produce weak output:

1. Adjective stacking. “A beautiful, stunning, gorgeous, perfect potter...” Deevid does not reward superlatives. Specific nouns and verbs outperform enthusiastic adjectives every time.

2. Contradictory camera instructions. “Dolly-in and pan-right with zoom-out.” Pick one primary motion. Camera moves stack poorly.

3. Over-prescribed color grading. “Cool blue tone with warm orange highlights and desaturated shadows and film grain.” Deevid’s color response is baked into the lighting clause. Heavy grading instructions often get ignored and sometimes fight the lighting you specified.
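These three patterns are mechanical enough to check before you spend a render. The sketch below is a rough heuristic: the word lists, thresholds, and the `lint_prompt` name are all assumptions chosen to match the examples above, and plain substring matching will produce false positives (for instance, "tone" inside "stone").

```python
import re

# Illustrative word lists drawn from the examples above; extend as needed.
SUPERLATIVES = {"beautiful", "stunning", "gorgeous", "perfect"}
CAMERA_MOVES = ("dolly in", "dolly out", "pan left", "pan right",
                "zoom in", "zoom out", "orbit", "crane")
GRADING_TERMS = ("tone", "highlights", "desaturated", "film grain")


def lint_prompt(prompt: str) -> list[str]:
    """Return a warning code for each weak pattern found in the prompt."""
    # Lowercase and treat hyphens as spaces so "Dolly-in" matches "dolly in".
    text = prompt.lower().replace("-", " ")
    words = re.findall(r"[a-z]+", text)

    warnings = []
    # Two or more superlatives counts as adjective stacking (assumed threshold).
    if sum(word in SUPERLATIVES for word in words) >= 2:
        warnings.append("adjective-stacking")
    # More than one distinct camera move suggests competing instructions.
    if sum(move in text for move in CAMERA_MOVES) >= 2:
        warnings.append("multiple-camera-moves")
    # Three or more grading terms suggests over-prescribed grading (assumed).
    if sum(term in text for term in GRADING_TERMS) >= 3:
        warnings.append("heavy-grading")
    return warnings


print(lint_prompt("Dolly-in and pan-right with zoom-out"))
# ['multiple-camera-moves']
```

An empty list does not mean the prompt is good, only that none of these three patterns was detected.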

Practice exercise

Pick one of your own last ten renders. Open the prompt you used. Count how many of the five parts are present.

If it’s fewer than four, rewrite the prompt with all five. Run it three times. Compare the output to your original.
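If you run this audit across several prompts, a few lines of code keep the tally honest. The sketch assumes you judge by eye whether each part is present; the part names and helper functions are hypothetical.

```python
PARTS = ("subject_action", "setting", "lighting_mood",
         "camera_motion", "duration")


def count_parts(present: dict[str, bool]) -> int:
    """Count how many of the five parts were marked as present."""
    return sum(1 for part in PARTS if present.get(part, False))


def needs_rewrite(present: dict[str, bool]) -> bool:
    # The exercise's threshold: fewer than four of the five parts.
    return count_parts(present) < 4


# Manual audit of the weak prompt "A potter in a studio makes a vase."
weak_audit = {"subject_action": True, "setting": True}
print(count_parts(weak_audit), needs_rewrite(weak_audit))
# 2 True
```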

This single exercise, repeated on ten prompts, is worth more than reading any number of tutorials.

Next: once your individual prompts are strong, the skill to build is multi-shot coherence. Read the character consistency workflow to learn how to keep the same subject across multiple generations.

Anatomy of a great Deevid AI prompt

Written by Marcus Hale

Related tutorials

Character consistency in Deevid AI: the workflow that works

Your first 30 minutes with Deevid AI
