Google Vids Now Lets You Direct AI Avatars With Text Prompts — Full Guide
Google Vids just added prompt-directed AI avatar control — type what you want your on-screen presenter to do, and the AI handles the animation. No keyframes, no motion capture, no video crew.
TL;DR
- Google Vids now supports natural language avatar direction inside Google Workspace
- Type instructions like "point at the slide" or "walk forward and smile" — the AI animates the avatar
- Available on Workspace Business Standard and above
- Best use cases: internal training videos, product explainers, HR announcements
- Competes with HeyGen, Synthesia, and D-ID — but with native Google Drive/Slides integration
What Changed in Google Vids (April 2026)
The April 2026 Google Vids update ships one major new capability: text-prompt avatar control. Previously, Google Vids avatars were static presenters — you could choose an appearance, voice, and background, but the avatar just stood there and talked.
Now you can choreograph the avatar with plain English. The system uses Google's Veo-based video generation stack to interpret instructions and produce synchronized motion that matches your prompt.
How to Use It: Step-by-Step
Step 1 — Open Google Vids
Go to vids.google.com or open a Vids file from Google Drive. You need a Workspace plan that includes Vids (Business Standard or above).
Step 2 — Add or Select an Avatar Scene
In the scene editor, click Add scene → Avatar presenter. Choose an avatar from the gallery or select one already in your project. Set the background — office, neutral, or a custom image.
Step 3 — Open the Avatar Direction Panel
With the avatar scene selected, click the Direction tab in the right panel (this is the new April 2026 addition). You will see a text input labeled "Tell your presenter what to do."
Step 4 — Type Your Instruction
Enter a natural language direction. Examples that work well:
- "Point to the chart that will appear on the right side"
- "Walk toward the camera, pause, then smile and wave"
- "Gesture with both hands while explaining the three steps"
- "Turn to face left, then turn back to camera"
- "Nod slowly while speaking the first sentence"
Step 5 — Preview and Refine
Click Generate preview. The AI renders a 3-5 second clip of the motion. If it is not right, revise the prompt — more specific instructions produce better results. Add spatial cues ("the object in the upper left") and emotional tone ("confidently", "casually").
Step 6 — Sync With Voiceover
Google Vids automatically aligns avatar motion with the scene's AI-generated or recorded voiceover. If the motion feels out of sync, use the Timeline view to offset the motion start by tenths of a second.
What Works Well vs. What Does Not
| Works Well | Limitations |
|---|---|
| Simple gestures: pointing, waving, nodding | Complex multi-step choreography is often misinterpreted |
| Walking toward or away from camera | Avatar cannot interact with physical objects in scene |
| Emotional tone cues (smile, frown, look surprised) | Lip sync can slip on non-English voiceovers |
| Turn left/right, face away from camera | No custom avatar cloning (must use gallery avatars) |
| Combined motion + speech timing ("pause, then say...") | Workspace-only — not available on free Google accounts |
Best Use Cases
Internal Training Videos
HR teams and L&D departments can produce consistent, on-brand training content without scheduling a presenter. Prompt the avatar to demonstrate a behavior ("lean forward and make eye contact while explaining the compliance rule") to reinforce the learning moment.
Product Explainers
For SaaS companies, a prompted avatar can narrate a UI walkthrough while pointing at annotated screenshots on the same slide. This cuts video production time from days to hours.
Leadership Communications
Executives who hate being on camera can use an AI avatar to deliver all-hands updates, quarterly reviews, or policy announcements. The avatar maintains eye contact, uses natural gestures, and can be regenerated for corrections in seconds.
Localized Content
Swap the voiceover language and Google Vids re-syncs the avatar automatically. One video becomes 10 language versions with a consistent visual presentation.
Google Vids vs. HeyGen vs. Synthesia
| Feature | Google Vids | HeyGen | Synthesia |
|---|---|---|---|
| Prompt-directed motion | Yes (new) | Limited | Limited |
| Custom avatar cloning | No | Yes | Yes |
| Google Drive integration | Native | No | No |
| Languages supported | 40+ | 170+ | 140+ |
| Starting price | Included in Workspace | $29/mo | $22/mo |
Bottom line: If your team lives in Google Workspace, Vids is the frictionless choice — no new accounts, billing, or integrations. For marketing agencies that need custom digital twin avatars or 100+ languages, HeyGen and Synthesia still lead.
Prompt Tips for Better Results
- Be spatial. Reference directions (left, right, toward camera) instead of abstract concepts.
- Separate motion from speech. Describe body movement in one prompt, emotional tone in another.
- Use timing cues. "First ... then ..." structures produce more predictable sequencing.
- Iterate fast. Previews render in under 10 seconds. Try 3-4 variations before committing.
- Keep it under 5 seconds. Longer motion prompts degrade in fidelity. Chain short clips instead.
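The tips above amount to a repeatable prompt structure: tone first, then the motion, then a spatial cue, with "first ... then ..." for sequencing. As a minimal sketch, here is a hypothetical helper that assembles directions in that shape — it is plain string formatting, not part of any Google Vids API:

```python
def build_direction(motion: str, spatial: str = "", tone: str = "", then: str = "") -> str:
    """Assemble an avatar direction prompt: tone leads, the spatial cue
    follows the motion, and a 'then' clause adds a predictable second step."""
    prompt = f"{tone} {motion}".strip()
    if spatial:
        prompt += f" {spatial}"
    if then:
        prompt = f"First, {prompt}; then {then}"
    # Capitalize the first word and end with a period for a clean instruction.
    return prompt[0].upper() + prompt[1:] + "."

# Example directions built from the tips:
print(build_direction("point", spatial="at the chart on the right",
                      tone="confidently", then="turn back to camera"))
print(build_direction("wave", tone="casually"))
```

Paste the resulting strings into the "Tell your presenter what to do" field and iterate on the individual pieces (tone, spatial cue, follow-up step) rather than rewriting the whole prompt each time.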
Looking for more AI video tools?
Happycapy covers every major AI video, image, and content tool — with hands-on guides written for teams, not just developers.
Try Happycapy