Voiceover: Directing Vocal Delivery
Trailer Voiceover
The voice acting industry earned approximately $6 billion globally before AI voice cloning became commercially viable. The technology to clone a voice from three minutes of audio exists, is commercially available, and has been deployed without consent in several documented cases. The legal and ethical framework has not caught up.
Scarlett Johansson vs OpenAI: "Sky" Voice Suspended After Actor Claims Uncanny Resemblance to Her Voice
OpenAI suspended its "Sky" voice assistant — described by users as closely resembling actress Scarlett Johansson's voice — after Johansson issued a public statement saying she had been approached to voice the assistant, declined, and was shocked to find a near-identical voice launched regardless. OpenAI denied the voice was modelled on Johansson's; Johansson retained legal counsel.
"She said no. They launched it anyway. They say they didn't. She has lawyers."
SAG-AFTRA Voice Actor Strike: Studios Demand AI Voice Right-to-Clone Clauses in New Contracts
Video game voice actors authorised a strike after major studios presented contract templates requiring actors to grant perpetual rights to clone their voices with AI for any use in the studio's future catalogue. SAG-AFTRA called the clause "unlimited and uncompensated transfer of a performer's voice" and walked out.
Audiobook Publisher AI Narration Disclosure: "AI Narrated" Label Reduces Sales by 22% in Direct Comparison
Amazon's Audible disclosed A/B test data showing that audiobooks labelled "AI Narrated" sold 22% fewer copies than identical titles labelled "Human Narrated" over a 90-day window, controlling for title, author, and sample quality. The sales gap persisted even when reviewers rated the AI narration as equal in quality to the human version.
"Disclosure is not neutral. Knowing changes what people will pay."
James Earl Jones Estate Licenses His Voice to Disney — AI to Continue Darth Vader Indefinitely
The estate of James Earl Jones, who died in 2024, confirmed a pre-negotiated agreement with Disney allowing AI cloning of his voice to continue the Darth Vader role in perpetuity. The deal, negotiated before his death, was described as the first major estate AI voice licensing agreement and is expected to become a template for franchise IP continuity.
A voice actor records a performance under a contract. The production company later uses AI to clone that voice for additional content not specified in the contract. Has infringement occurred, and if so, under which law?
Trailer VO Mechanics
How Trailer Voiceover Directs a Cut
A voiceover in a professional trailer is not an announcer reading words to pictures. It is a director's instruction to the audience: where to look, what to feel, how fast to think.
In a trailer, voice, picture, and music are three independent spines. Music carries rhythm and emotional arc. Picture carries proof: the hero's face, the action, the world. The voiceover spine carries the story promise: what this film is about, and why you should care. Each spine works on its own. When they collide (a dialogue fragment landing on a cut, a music hit and a close-up face at the same moment), the impact comes from sound and picture aligning. But the voiceover can survive completely alone. Strip the picture and music, and a well-directed trailer VO still tells you everything: protagonist, goal, stakes, the conflict that hooks you.
The voice does this through three load-bearing choices a director makes:
1. Tone. Is the voice warm, cold, defiant, fearful, curious? The timbre and inflection tell you how the protagonist feels. Trailers for intimate dramas use a voice that sounds like it's confiding in you — close, breathy, uncertain. Trailers for action films use a voice pitched high and tight with adrenaline, or low and controlled and implacable. The tone trains the audience to expect a specific emotional genre before the picture confirms it.
2. Pace and breath. How fast does the voice move through the lines? Are there pauses? Does the voice catch, stammer, hold back? A fast, breathless read suggests escalation and urgency. A slow, measured read suggests control, weight, stakes. Breath placement — where you let the voice stop and inhale — marks the emotional beats. A pause before a word makes that word hit harder. Trailers layer this: early lines are slow and open, middle lines speed up as the conflict escalates, and the final button — the title card moment — either snaps tight or explodes. The pace is not dictated by the word count. It is a performance choice that the director instructs.
3. Placement on the cut. When does the VO land relative to the picture changing? A line that hits exactly as the picture cuts creates a snap — the voice and image reinforce each other and the audience feels smart for tracking both. A line that lags behind the cut creates a drift — suspense, a sense the voiceover is chasing the action. A line that arrives before the cut creates anticipation — the voice primes you for what you're about to see. Professional editors use this like punctuation. The VO is the drum that sets the cut rhythm.
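The three placement options above can be turned into a start time for each VO line. The following is a minimal sketch, not a standard tool: the function names, the 24 fps frame rate, and the 6-frame lead or lag are all assumptions chosen for illustration, and it assumes "placement" describes where the start of the line falls relative to the cut.

```python
FPS = 24  # assumed frame rate; use your project's actual rate


def vo_start_frame(cut_frame: int, placement: str, offset_frames: int = 6) -> int:
    """Return the frame where a VO line should start.

    Assumption of this sketch: placement describes where the START of the
    line falls relative to the picture cut.
    """
    if placement == "before":   # anticipation: voice primes the image
        return max(0, cut_frame - offset_frames)
    if placement == "on":       # snap: voice and cut reinforce each other
        return cut_frame
    if placement == "after":    # drift: voice chases the action
        return cut_frame + offset_frames
    raise ValueError(f"unknown placement: {placement!r}")


def to_timecode(frame: int, fps: int = FPS) -> str:
    """Format a frame count as HH:MM:SS:FF (non-drop-frame)."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


if __name__ == "__main__":
    cut = 240  # a cut ten seconds into the trailer at 24 fps
    for placement in ("before", "on", "after"):
        print(placement, to_timecode(vo_start_frame(cut, placement)))
```

The size of the offset is a creative choice, not a rule. A few frames reads as tight punctuation; a longer lead or lag reads as deliberate anticipation or drift.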
When you script a voiceover for a trailer, you are writing not just words but directions to the performer. "Write the voiceover for this beat" means: decide what the line says, decide what tone and pace deliver that message most powerfully, decide when relative to the picture that line should hit. The performer — human or AI — executes your score. A bad script is one where the tone and pace don't match the word choice, or where the line lands in a place that confuses instead of clarifies.
AI voice synthesis tools (like ElevenLabs) let you script these three choices as parameters: tone cues in the script that signal how the voice should sound, punctuation and silence marks that dictate pauses, and timing instructions that place the line relative to your cut. You input the script plus the parameters, the AI synthesizes the voice, and you hear your direction executed. No human performer needed. The craft skill is identical: knowing what tone, pace, and timing your story needs at each moment.
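One way to keep tone, pace, and pauses attached to each line is to store them as data and render a marked-up script from it. This is a rough sketch under stated assumptions: `DirectedLine` and `render_line` are hypothetical names, the bracketed tone cue is a notation invented for this example, and the pause uses an SSML-style `<break>` tag. Whether a particular synthesis tool honors the tag, or reads the bracketed cue aloud, depends on that tool, so check its documentation before relying on either.

```python
from dataclasses import dataclass


@dataclass
class DirectedLine:
    """One VO line plus the direction attached to it (hypothetical structure)."""
    text: str
    tone: str                    # e.g. "intimate", "defiant"
    pace: str                    # e.g. "slow", "medium", "fast", "staccato"
    pause_before_s: float = 0.0  # silence before the line, in seconds


def render_line(line: DirectedLine) -> str:
    """Render a line as marked-up script text.

    The [tone, pace] cue is a convention of this sketch, not a feature of
    any specific tool. The pause is written as an SSML-style break tag.
    """
    parts = []
    if line.pause_before_s > 0:
        parts.append(f'<break time="{line.pause_before_s:.1f}s"/>')
    parts.append(f"[{line.tone}, {line.pace}] {line.text}")
    return " ".join(parts)


def render_script(lines: list[DirectedLine]) -> str:
    """Join the rendered lines, one per row."""
    return "\n".join(render_line(line) for line in lines)


if __name__ == "__main__":
    beat = [
        DirectedLine("I thought I was alone.", tone="intimate", pace="slow"),
        DirectedLine("I was wrong.", tone="hard", pace="fast", pause_before_s=0.5),
    ]
    print(render_script(beat))
```

Keeping the direction as data rather than loose notes means the same beat can be re-rendered with a different tone or pause and compared by ear.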
This is why the voiceover stage sits between the beats stage (you've named the cut structure) and the edit stage (you'll sync everything together). You know what each beat does. Now you script the voice that will steer the audience through it.
THE VOICE
The Same Words, Two Ways, Are Two Different Films
A voiceover is not decoration. It is a second narrator alongside the image. What is said matters. How it is said matters more.
Prosody—the music of speech—carries meaning. Pace tells the viewer whether to rush or dread. Pitch reveals character emotion. Emphasis directs attention to the idea that matters. Learn to direct a voiceover that matches every beat of your trailer, turning words into film.
**Strongest pick: B, the child's whisper.** Horror works on *wrongness*, not volume. A child stating dread flatly violates the expected (children = safety) and lands the line as a threat from inside the house.
- **A** is the safe, correct-for-action choice, but on horror it's a cliché the audience tunes out; it signals "blockbuster", not "dread."
- **C** is tonally dead: a newsreader frames the line as information, killing the menace.

A student who picks **A** can still win *if* they argue the trailer is positioned as elevated-action, not pure horror. Frame beats instinct.
**Correct Analysis:**

Line 1:
- **Tone = vulnerable, intimate, breathy** (the character is confiding).
- **Why?** The protagonist is alone, introspecting; the audience should feel close, like overhearing a thought.
- **Placement = slightly BEFORE the cut**, so the confession primes the visual that proves it.

Line 2:
- **Pace accelerates** from the slow, open Line 1.
- **Breath pause = marks the moment of realization**; the voice catches and then pushes through "everything," showing the character's shock and decision to keep speaking.
- **Staging:** medium tempo, not frantic yet.

Line 3:
- **Tone drops lower and hardens** (loss of vulnerability, shift to defiance or grim acceptance).
- **Pitch = lower than Line 2, volume = louder** (resolve, not doubt).
- **Placement ON the cut = mechanical snap**: the VO and the visual hit land at the exact same moment. This creates authority and a sense the character has committed to action.

**Grading:** Award full marks if the student identifies:
- Tone shifts matching emotional progression (vulnerable → shock → resolve)
- Pace and breath as performance tools, not just decoration
- Timecode placement as intentional direction, not accident
- Why each choice serves the story moment (e.g., the pause marks realization; the snap on the cut creates impact)

Partial credit if tone is named but not linked to the protagonist's state, or if pace changes are noted but breath/pause placement is missed.
Script VO for Your Beat
Script a 3-line voiceover for one beat from your trailer's beat sheet. Use the three beat slots: setup, escalation, and button. For the beat you choose, write the VO words (one line per slot), then annotate each line with: tone (warm / cold / defiant / fearful / curious / [your word]), pace (slow / medium / fast / staccato), and placement relative to the picture cut (before / on / after). Finally, describe in 2 sentences why that tone and pace sequence serves your story at that moment.