Veo
Google DeepMind, with audio
Veo by Google DeepMind: top-quality video generation with native audio. The lineup covers the flagship 3.1 (with Fast and Lite tiers), the previous 3.0 (with a Fast tier), and the oldest 2.0. All versions support "first/last frame" pairs.
Model versions
Veo 3.1
Google's flagship Veo 3.1 with built-in audio (ambient sound, voices, lip-sync). Durations 6/8 sec, end-frame support via the flf endpoint, up to 4K resolution.
21/42 tok./sec
Veo 3.1 Fast
Fast, budget tier of Veo 3.1: same motion and audio quality, 2× cheaper. Durations 6/8 sec, flf support, up to 4K.
18/27 tok./sec
Veo 3.1 Lite
The most affordable Veo: $0.03/sec without audio. Scene quality is simpler, but motion and audio match the rest of the family. No 4K, durations 6/8 sec.
6/9 tok./sec
Veo 3.0
Google's previous flagship model. Audio supported, no 4K. Same settings as 3.1. Durations 6/8 sec, flf support.
21/42 tok./sec
Veo 3.0 Fast
Fast tier of Veo 3.0: $0.10/sec without audio. No 4K, durations 6/8 sec, flf support.
18/27 tok./sec
Veo 2.0
Veo 2: the first generation from Google. No audio, no 4K. Realistic motion, durations 5/6/7/8 sec, flf support.
53 tok./sec
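The per-second rates above can be turned into a rough per-clip estimate. The sketch below is only illustrative: it assumes that in a pair such as 21/42 the first figure is the rate without audio and the second the rate with audio (the listing does not say so explicitly), and the names RATES, DURATIONS and clip_cost are hypothetical helpers, not part of any Veo API.

```python
# Token rates copied from the listing above, in tok./sec.
# ASSUMPTION: in "a/b" the first figure is the rate without audio,
# the second the rate with audio. The listing does not state this.
RATES = {
    "Veo 3.1": (21, 42),
    "Veo 3.1 Fast": (18, 27),
    "Veo 3.1 Lite": (6, 9),
    "Veo 3.0": (21, 42),
    "Veo 3.0 Fast": (18, 27),
    "Veo 2.0": (53, None),  # single rate; listed as having no audio
}

# Allowed clip durations in seconds, as listed for each model.
DURATIONS = {name: (6, 8) for name in RATES}
DURATIONS["Veo 2.0"] = (5, 6, 7, 8)


def clip_cost(model: str, seconds: int, audio: bool = False) -> int:
    """Return the estimated token cost of one clip (hypothetical helper)."""
    if seconds not in DURATIONS[model]:
        raise ValueError(f"{model} does not list a {seconds} sec duration")
    without_audio, with_audio = RATES[model]
    if not audio:
        return without_audio * seconds
    if with_audio is None:
        raise ValueError(f"{model} is listed without audio")
    return with_audio * seconds


if __name__ == "__main__":
    for name in RATES:
        longest = max(DURATIONS[name])
        print(f"{name}: {clip_cost(name, longest)} tokens for {longest} sec, no audio")
```

Under that assumption, an 8-second Veo 3.1 clip with audio would come to 42 × 8 = 336 tokens, while a 6-second Veo 3.1 Lite clip without audio would come to 6 × 6 = 36 tokens.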