Extract from ABC News
Mind-boggling AI is thick on the ground in 2024, but even the most hardened AI experts are impressed by OpenAI's new text-to-video tool, Sora.
"This appears to be a significant step," according to Professor Toby Walsh, Chief Scientist at the AI Institute, University of New South Wales.
Sora, which is Japanese for "empty sky", can create detailed and convincing videos up to a minute long from simple text prompts or a still image.
"The model has a deep understanding of language … and generates compelling characters that express vibrant emotions," OpenAI said in a blog post announcing the new model on Friday morning.
"This will transform content creation," Professor Walsh said.
Still, it's not perfect. Not yet, anyway.
One user posted a surreal video of half a dozen dogs emerging from a single dog.
OpenAI said Sora might struggle to accurately simulate "the physics of a complex scene", and may not understand cause and effect in some cases.
"For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark," it stated in its blog post.
It's not the world's first text-to-video AI tool.
Google and smaller companies such as Runway have their own models, which have similar functions to OpenAI's.
However, early users have praised Sora for its detailed and mostly realistic-looking output.
So far, Sora has only been released to a small number of artists and "red teamers" — expert researchers employed to actively look for problems with the model, including bias, hateful content, and misinformation.
OpenAI hasn't confirmed that it will release the model to the public, nor given any time frame for doing so.
However, the company is strongly signalling that a public release is on the cards.
If and when that time comes, Professor Toby Walsh will be watching for the impact on misinformation.
"With text-to-image tools, we saw fake images such as Trump being arrested by the NYPD, soon after such tools were first released," he said.
"I expect these new text-to-video tools will be used to generate fake video to influence the US and other elections."
OpenAI is also planning to include a watermarking system, so members of the public can check whether a video was made by Sora.
Existing watermarking systems have already proven relatively easy to circumvent with the right skills.
"We are not used to disbelieving the video we see. Now you have to consider any digital content as suspicious," Professor Walsh said.
What about the risks?
Text-to-video's potential for harm extends beyond just misinformation.
One user of the social media platform X remarked, "the future of porn just changed forever".
In August, Australia's eSafety Commissioner warned that AI was being used by teenagers to create sexually explicit images of their peers.
OpenAI is up front about the risks.
"We cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it," it said in its statement.
It said if the model was released, it would have an in-built system to reject any text prompts that violate its policies, such as requesting "extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others".
If OpenAI does choose to release the model, there may be risks for the company too, in the form of copyright lawsuits.
The company is currently facing several suits over the training data for its language models, which power ChatGPT, and its image model, DALL-E.
The most high-profile of those has been brought by the New York Times, which is suing both OpenAI and its partner, Microsoft, over the alleged improper use of its news content.
In a statement to the New York Times, responding to questions about Sora's release, OpenAI claimed the model was trained only on publicly available and licensed videos — a statement Professor Walsh describes as "telling".
"I expect they're trying to avoid all the court cases they're now defending for their text-to-image tool DALL-E where they weren't as careful," he said.
Beyond the short-term impacts, good and bad, OpenAI is framing Sora as a leap forward on the road towards Artificial General Intelligence (AGI) — AI that exceeds human capabilities overall.
"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI," it said.
Professor Walsh is a little more cautious in his assessment, but concedes Sora represents progress in that direction.
"We are still a long way from AGI, even with these tools … but it is another step on the road."