AI startup OpenAI has unveiled a text-to-video model, called Sora, which could raise the bar for what’s possible in generative AI.

Like Google’s text-to-video tool Lumiere, Sora’s availability is limited. Unlike Lumiere, Sora can generate videos up to 1 minute long.

Text-to-video has become the latest arms race in generative AI as OpenAI, Google, Microsoft and more look beyond text and image generation and seek to cement their position in a sector projected to reach $1.3 trillion in revenue by 2032 — and to win over consumers who’ve been intrigued by generative AI since ChatGPT arrived a little more than a year ago.

OpenAI on Thursday announced Sora, a brand-new model that generates high-definition videos up to one minute in length from text prompts. Sora, which means “sky” in Japanese, won’t be available to the general public any time soon. Instead, OpenAI is making it available to a small group of academics and researchers who will assess its potential for harm and misuse.

“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” the company said on its website. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

One of the Sora-generated videos that OpenAI shared on its website shows a couple walking through snowy Tokyo as cherry blossom petals and snowflakes blow around them.

When Lex Fridman visited our MIT AI Venture Studio class to talk about the future of AI, we got into some pretty interesting ideas about the near future.

At the top of Lex’s comments, he talked about disruption – predicting that two new trillion-dollar companies will emerge out of the AI era, and suggesting that Google, Meta and Microsoft will likely not be able to pivot quickly enough to maintain their dominance.

As for where we might see this innovation, one of his focal points was language. Lex pointed out that in America, we take it for granted that everyone speaks English – but around the world, there is an enormous market for real, precise speech translation. People, he said, speak many languages in an “intimate” way – and that requires precision on the part of the technology.

The fourth group is Curium, an Iranian group that has used LLMs to generate phishing emails and code to evade antivirus detection. Chinese state-affiliated hackers have also used LLMs for research, scripting, translations, and refining their tools.

Fight AI with AI

Microsoft and OpenAI say they have not detected any significant attacks using LLMs yet, but they have been shutting down all accounts and assets associated with these groups. “At the same time, we feel this is important research to publish to expose early-stage, incremental moves that we observe well-known threat actors attempting, and share information on how we are blocking and countering them with the defender community,” says Microsoft.