Matthew Keegan
Oct 22, 2024

Text-to-video in advertising: Where are we now?

While text-to-video AI tools are fast becoming the industry's latest obsession, Campaign goes beyond the hype to explore the current capabilities on offer and limitations of AI video generators for the advertising landscape.

Screenshot from Adobe's Firefly Video Model launch
The speed at which social media and the internet move means that agencies need to create content faster and more frequently than ever before. Video content in particular has traditionally been time-consuming to produce, but new text-to-video AI tools have the potential to change that.
 
To that end, a number of new AI video generation tools are beginning to flood the market. OpenAI, the makers of ChatGPT, are developing Sora, an AI model that can create realistic and imaginative scenes from text instructions in a matter of minutes. Microsoft has Mora, its own text-to-video generator; Meta has Movie Gen; Adobe has Firefly Video Model... the list goes on.
 
But beyond the hype and excitement of the new shiny toy, are any of these text-to-video tools ready for prime time in the advertising world? Can they really transform the work of creatives at present?
 
JY Lay, group creative director APAC at VaynerMedia Asia Pacific, has been experimenting with these tools for internal brainstorming and client presentations. "Personally, I’ve explored text-to-video for fun, but not with client work just yet," says Lay. Still, Lay is confident that text-to-video AI has real potential for advertising, especially as a creative sandbox for testing proofs of concept.
 
"Internally, it can help bring visions and ideas to life faster, including for purposes of presentations to clients, which we’ve encouraged the team to get familiar with."
 
However, most agencies are currently using these tools only for experimentation and in non-commercial contexts.
 
This caution is echoed by Kurt Loy, Web3 project manager at Invnt.Atom. "Right now, text-to-video AI is perfect for churning out social media content or ads that don’t require too much emotional depth," says Loy.

"But when it comes to more complex storytelling or creating a really strong visual identity, the technology has its limits. It can’t quite capture the emotional subtleties or the level of polish that you’d get from a human creative team. If you rely on it too much, the work can start to feel generic. So, it’s a great tool for certain tasks, but for campaigns that need that human touch, it’s not there yet."
 
Back in June, WPP launched its AI-powered Production Studio, an end-to-end production application that automates the creation of text, images and video. While text-to-video technology is still emerging, WPP is confident it has the potential to transform content creation for advertisers and marketers, with initial tests showing promising results.
 
"Our approach is particularly effective for simpler, large-scale video production needs, and clients are responding positively," says Stephan Pretorius, chief technology officer, WPP. "Early indications suggest this technology is highly appealing to brands seeking to achieve unparalleled scale, speed, accuracy and creativity in their marketing campaigns."
 
Meanwhile, Kellyn Coetzee, national head of AI and insights, Kinesso Australia, says the technology has made it possible to produce studio-quality creative at unprecedented speed.
 
"This technology is reshaping production timelines and costs and is democratising high-quality content creation. This is being pushed heavily by the major platforms with this built-in capability."
 
Yet Coetzee adds that while text-to-video AI tools offer exciting possibilities, we must guard against visual homogenisation and protect the originality that is core to our humanness.
 
"Our approach integrates AI's efficiency with human creativity, using these tools for rapid prototyping and scaling, while relying on human insight for brand storytelling," says Coetzee. "The future belongs to those who can blend AI's capabilities with uniquely human strategic thinking and emotional intelligence."
 
Beyond text-to-video, some platforms have also introduced image-to-video, which allows creatives to use an image as a starting point, combined with prompts to generate a video.
 
"This approach is particularly useful because it provides greater control over the art direction," says Yong Hock Chye, chief innovation officer, Dentsu Creative Singapore. "Defining the start frame visually is far more practical than relying solely on prompts, which can be unpredictable and often require multiple attempts to get right."
 
Teething troubles and ethical quandaries

Despite the hype, most tools, including Sora, remain in beta testing with limited availability. In a March interview, OpenAI's now-former CTO Mira Murati indicated that Sora would eventually be backed by sound features, with a potential release “this year” that “could be a few months” away.
 
While text-to-video AI is progressing rapidly and can produce impressive results, many feel the technology is not fully there yet.
 
"The videos often still have an 'unnatural' look: visual artefacts that industry pros can easily spot," says Lay. "Sora, for example, remains accessible only to a select group of beta testers. Other tools like Mora and Movie Gen are also limited by output length and resolution, making them less suitable for agencies looking to create larger-format productions."
 
Certainly, at this early stage, the technology is not without its limitations. Generating videos that precisely match a brief often requires multiple attempts and prompts. 
 
"We are not yet seeing a model that is completely capable of following prompts about camera movements accurately," says Manolis Perrakis, innovation director at We Are Social Singapore. "Most models are not able to generate videos that exceed 10 seconds for now. That’s not to say that it’s impossible to do, but it is definitely harder to generate longer content currently."
 
And increasingly, there are copyright concerns over the material the models have been trained on.
 
"Brands and agencies need to be cautious of the tools they use as some have been trained on copyright materials," says Perrakis. "It is crucial to understand the data sets that models have been trained on to ensure you’re not breaching copyright laws or using materials you shouldn’t be."
 
There is already a copyright lawsuit against Stability AI, DeviantArt, Midjourney, and Runway AI by a group of visual artists who claim their work has been misused. And earlier this year, a document obtained by 404 Media appeared to show that Runway ML trained its latest video generation tool, Gen-3, on the YouTube channels of thousands of popular media companies, including Pixar, Netflix, Disney and Sony. Runway ML is currently being sued for alleged copyright infringement and trade dress violations.
 
This month, software giant Adobe announced its AI-powered text-to-video generation tool, Firefly Video Model, claiming the model is trained only on licensed video, potentially avoiding the ethical and copyright difficulties that have beset previous generative AI technologies.
 
According to Adobe, the Firefly Video Model is "the first publicly available video model designed to be commercially safe." However, the San Jose, California-based software company has not announced a general release date, and during the beta phase only those on a waiting list will be granted access.
 
"Copyright issues are just the tip of the iceberg," says Coetzee. "The rise of deepfakes and AI influencers raises serious ethical concerns. We need robust systems to label AI-generated content, similar to ad disclosures, ensuring transparency and maintaining consumer trust."
 
Beyond the growing copyright and ethical concerns, another hurdle for video generation tools is accuracy, with artefacts and hallucinations a persistent problem.
 
"Similar to early image generation where we have seen errors like humans with six fingers, artefacts and hallucinations are very common in video generation," says Perrakis. "These errors are going to take time to overcome and they will require a keen [human] eye for detail as well."
 
Additionally, while AI is great at generating images and video, it is not robust at editing them.
 
"In tools designed for creatives, the ability to modify parts of a video or its individual elements is essential," says Chye. "Currently, when adjusting any parameters or prompts, the AI generates a completely new video rather than refining the existing one. This can be very resource intensive and alter workflows if both clients and creative teams do not come to a collective understanding."
 
Another hurdle creatives cite, as Coetzee points out, is the tools' lack of nuanced understanding of brand-specific elements like colour palettes, fonts, and tone of voice, which increases the risk of misrepresenting brands.
 
"We need the ability to apply custom guardrails that ensure AI-generated content is client-specific, diverse, and culturally sensitive," says Coetzee. "Our biggest concern as an agency is the legal risks whereby AI might inadvertently misrepresent products. For instance, depicting an automotive colour that a client doesn't actually offer."
 
Will text-to-video replace filmmakers?
 
It's natural for filmmakers and videographers to feel concerned about the rise of text-to-video AI, or any new technology that threatens their livelihoods. But many in the industry believe these tools should be viewed as an opportunity rather than a threat.
 
"The human touch in storytelling and emotional resonance remains irreplaceable," says Coetzee. "AI won't replace true artists; it will empower them to reach new heights of creativity."
 
At this stage at least, it's likely text-to-video will just add another tool to the creator’s toolkit. 
 
"It allows us to visualise more exotic locations and complex scenes that typically require larger budgets or extended timeframes," says Chye. "In the future, the focus may shift more toward the editor and motion artist, but filmmakers and videographers will still be essential to capture key elements that bring the story together." 
 
And while it will become easier to create complex scenes, filmmakers will still need to shoot the human aspects, such as emotions and character interactions, where video AI currently falls short.
 
"More than being worried about being replaced, I think they [filmmakers] should focus on integrating AI production into their workflows as an offering," says Amaresh Godbole, CEO, digital technology business, Publicis Groupe India.

"We may get to a point where, for certain scales or desired personalisation levels, AI videos make more sense, whereas for the brand magnum opus, we still shoot. We may also see hybrid approaches where customised parts of a script are AI-generated and other parts are shot. All in all, I think it’s an exciting opportunity ahead for filmmakers, so long as they are open to embracing change."

This feature is part of Campaign's Game Changers 2024 series. Throughout this week, we will be navigating the hype and reality of emerging technologies and their implications for advertising.
Source: Campaign Asia
