New generative media models and tools, built with and for creators (2024)

May 14, 2024


We’re introducing Veo, our most capable model for generating high-definition video, and Imagen 3, our highest quality text-to-image model. We’re also sharing new demo recordings created with our Music AI Sandbox.


Eli Collins

VP, Product Management


Douglas Eck

Senior Research Director



Over the past year, we’ve made incredible progress in enhancing the quality of our generative media technologies. We’ve been working closely with the creative community to explore how generative AI can best support the creative process, and to make sure our AI tools are as useful as possible at each stage.

Today, we’re introducing Veo, our latest and most advanced video generation model, and Imagen 3, our highest quality text-to-image model yet.

Veo: our most capable video generation model

Veo generates high-quality 1080p videos that can run beyond a minute, in a wide range of cinematic and visual styles. With an advanced understanding of natural language and visual semantics, it generates video that closely represents a user’s creative vision, accurately capturing a prompt’s tone and rendering details in longer prompts.

The model provides an unprecedented level of creative control, and understands cinematic terms like “timelapse” or “aerial shots of a landscape”. Veo creates footage that’s consistent and coherent, so people, animals and objects move realistically throughout shots.

To discover how Veo can best support the storyteller’s creative process, we’re inviting a range of filmmakers and creators to experiment with the model. These collaborations also help us improve the way we design, build and deploy our technologies to make sure creators have a voice in how they’re developed.

Here's a preview of our work with filmmaker Donald Glover and his creative studio, Gilga, who experimented with Veo for a film project.


Veo builds upon years of our generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere — combining architecture, scaling laws and other novel techniques to improve quality and output resolution.

With Veo, we’ve improved techniques for how the model learns to understand what's in a video, renders high-definition images, simulates the physics of our world and more. These learnings will fuel advances across our AI research and enable us to build even more useful products that help people interact and communicate in new ways.

Starting today, Veo is available to select creators in private preview in VideoFX by joining our waitlist. In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products.

Learn more about Veo’s capabilities.

Imagen 3: our highest quality text-to-image model

Over the last year, we’ve made incredible progress improving the quality and fidelity of our image generation models and tools.

Imagen 3 is our highest quality text-to-image model. It generates an incredible level of detail, producing photorealistic, lifelike images, with far fewer distracting visual artifacts than our prior models.

Imagen 3 better understands natural language and the intent behind your prompt, and it incorporates small details from longer prompts. The model’s advanced understanding helps it master a range of styles.

It’s also our best model yet for rendering text, which has been a challenge for image generation models. This capability opens up possibilities for generating personalized birthday messages, title slides in presentations and more.

Starting today, Imagen 3 is available to select creators in private preview in ImageFX, with access through our waitlist. Imagen 3 will be coming soon to Vertex AI.

Learn more about Imagen 3’s capabilities.

Our collaborations with the music community

As part of our continued exploration into the role AI can play in art and music creation, we’re working with some amazing musicians, songwriters and producers, in partnership with YouTube.

These collaborations are also informing the development of our generative music technologies, including Lyria, our most advanced model for AI music generation.

As part of this work, we’ve been developing a suite of music AI tools called Music AI Sandbox. These tools are designed to open a new playground for creativity, allowing people to create new instrumental sections from scratch, transform sound in new ways and much more.


We're partnering with musicians, songwriters, and producers to investigate the exciting role artificial intelligence can have in the music creation process.

Today, we’re continuing that experimentation in music with Grammy-winning musician Wyclef Jean, Grammy-nominated songwriter Justin Tranter and electronic musician Marc Rebillet — who are releasing new demo recordings on their YouTube channels, created with help from our music AI tools.


Wyclef Jean, Justin Tranter, and Marc Rebillet are the first to release new demos using the Music AI Sandbox, and each demo is now available for listening on their YouTube channels.

Responsible from design to deployment

We’re mindful about not only advancing the state of the art, but doing so responsibly. So we’re taking measures to address the challenges raised by generative technologies and helping enable people and organizations to responsibly work with AI-generated content.

For each of these technologies, we’ve been working with the creative community and other external stakeholders, gathering insights and listening to feedback to help us improve and deploy our technologies in safe and responsible ways.

We’ve been conducting safety tests, applying filters, setting guardrails, and putting our safety teams at the center of development. Our teams are also pioneering tools, such as SynthID, which can embed imperceptible digital watermarks into AI-generated images, audio, text and video. And starting today, all videos generated by Veo on VideoFX will be watermarked by SynthID.

The creative potential for generative AI is immense and we can’t wait to see how people around the world will bring their ideas to life with our new models and tools.
