Saturday, 17 February 2024

OpenAI launches video model that can instantly create short clips from text prompts.

Extract from ABC News 

Sora can instantly generate short videos, such as this one, from text prompts.(Supplied: OpenAI)

Mind-boggling AI is thick on the ground in 2024, but even the most hardened AI experts are impressed by OpenAI's new text-to-video tool, Sora.

"This appears to be a significant step," according to Professor Toby Walsh, Chief Scientist at the AI Institute, University of New South Wales.

Sora, which is Japanese for "empty sky", can create detailed and convincing videos up to a minute long from simple text prompts or a still image.

"The model has a deep understanding of language … and generates compelling characters that express vibrant emotions," OpenAI said in a blog post announcing the new model on Friday morning.

"This will transform content creation," Professor Walsh said.

Still, it's not perfect. Not yet, anyway.

One user posted a surreal video of half a dozen dogs emerging from a single dog:

The technology isn't perfect: OpenAI says this video illustrates how animals or people can spontaneously appear.(Supplied: OpenAI)

OpenAI said Sora might struggle to accurately simulate "the physics of a complex scene", and may not understand cause and effect in some cases.

"For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark," it stated in its blog post.

It's not the world's first text-to-video AI tool.

Google and smaller companies such as Runway have their own models, which have similar functions to OpenAI's.

However, early users of the new model have praised Sora for its detailed and mostly realistic-looking output.

This video was created with the prompt, "A Chinese Lunar New Year celebration video with Chinese Dragon."(Supplied: OpenAI)

So far, Sora has only been released to a small number of artists and "red teamers" — expert researchers employed to actively look for problems with the model, including bias, hateful content, and misinformation.

OpenAI hasn't confirmed that it will release the model to the public, nor given any time frame for a release if it does.

However, the company is strongly signalling that a public release is on the cards.

The lengthy prompt for this video included instructions on this man's appearance and dress, as well as making him "deep in thought pondering the history of the universe".(Supplied: OpenAI)

If and when that time comes, Professor Toby Walsh will be watching for the impact on misinformation.

"With text-to-image tools, we saw fake images such as Trump being arrested by the NYPD, soon after such tools were first released," he said.

The prompt for this eye video included instructions such as "cinematic" and "film shot in 70mm".(Supplied: OpenAI)

"I expect these new text-to-video tools will be used to generate fake video to influence the US and other elections."

OpenAI is also planning to include a watermarking system so members of the public can check whether a video was made by Sora.

However, existing watermarking systems have already proven relatively easy to circumvent for those with the right skills.

"We are not used to disbelieving the video we see. Now you have to consider any digital content as suspicious," Professor Walsh said.

What about the risks?

Text-to-video's potential for harm extends beyond just misinformation.

One user of the social media platform X remarked, "the future of porn just changed forever".

In August, Australia's eSafety Commissioner warned that AI was being used by teenagers to create sexually explicit images of their peers.

OpenAI is up front about the risks.

"We cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it," it said in its statement.

It said that if the model were released, it would have a built-in system to reject any text prompts that violate its policies, such as requests for "extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others".

Sora-generated video of Big Sur's garay point beach in California.(Supplied: OpenAI)

If OpenAI does choose to release the model, there may be risks for the company too, in the form of copyright lawsuits.

The company is currently facing several lawsuits over the training data for its language model, ChatGPT, and its image model, DALL-E.

The most high-profile of those has been brought by The New York Times, which is suing both OpenAI and its partner, Microsoft, over the alleged improper use of its news content.

In a statement to The New York Times responding to questions about the release of Sora, OpenAI claimed the model is trained only on publicly available and licensed videos — a statement Professor Walsh described as "telling".

"I expect they're trying to avoid all the court cases they're now defending for their text-to-image tool DALL-E where they weren't as careful," he said.

The text prompt for this video requested "a litter of golden retriever puppies playing in the snow".(Supplied: OpenAI)

Beyond the short-term impacts, good and bad, OpenAI is framing Sora as a forward leap on the road towards Artificial General Intelligence (AGI) — AI that exceeds human capabilities overall.

"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI," it said.

Professor Walsh is a little more cautious in his assessment, but concedes Sora represents progress in that direction.

"We are still a long way from AGI, even with these tools … but it is another step on the road."

Another AI-generated video set in Tokyo, this time featuring a "stylish woman" walking "confidently and casually".(Supplied: OpenAI)
