OpenAI Releases Sora, Its First Text-to-Video Model: Opening a New Era of AI Video Generation
OpenAI's release of Sora, its first text-to-video model, on February 16, 2024 was a landmark event. It is arguably the most disruptive technological advance of the past six months: unlike earlier AI video tools such as Runway or SVD, which could only generate clips of a few seconds with limited motion, Sora creates video in the true sense of the word. Its release marks a major leap in AI's ability to understand and simulate real-world scenes, lowers the barrier to professional content creation, and opens a new era of AI video generation.
Sora is a general-purpose visual data model that can generate high-fidelity videos of up to one minute from a user's text description. It inherits the image quality and instruction-following ability of DALL-E 3, quickly producing high-fidelity videos from text prompts, and it can also take an existing still image and animate it into a video. The model can understand the physical properties of, and relationships between, the elements of a complex scene, allowing it to simulate the real physical world in depth and to generate scenes with multiple characters performing specific movements. Sora is a diffusion model: it starts from a video resembling static noise and removes that noise over many steps, transforming initially random pixels into a clear image. Because it can generate many predicted frames at once, the subject of a shot remains coherent even when it temporarily leaves the frame.
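The denoising loop described above can be illustrated with a toy sketch. Everything here is a simplifying assumption for illustration only: the "denoiser" merely nudges a noisy sample toward a known clean target, whereas a real model like Sora uses a learned network to predict the noise at each step. The point is just the shape of the reverse process: start from noise, refine over many steps, and treat all frames jointly.

```python
import numpy as np

# Toy sketch of reverse diffusion: begin with static-like noise and
# remove it step by step. The "denoiser" below is a hypothetical
# stand-in that blends the sample toward a known clean target; it is
# NOT how Sora actually works, which uses a trained neural network.

rng = np.random.default_rng(0)

# Pretend these are 4 clean video frames of 16 pixels each; denoising
# all frames jointly is what keeps the subject coherent across frames.
target = np.linspace(0.0, 1.0, 64).reshape(4, 16)

def toy_denoise_step(x, t, num_steps):
    """One reverse step: blend the noisy sample slightly toward the target."""
    alpha = 1.0 / (num_steps - t + 1)  # small corrections early, larger late
    return x + alpha * (target - x)

num_steps = 50
x = rng.standard_normal(target.shape)  # start from pure noise
for t in range(num_steps):
    x = toy_denoise_step(x, t, num_steps)

# After many steps, x is close to the clean frames.
max_error = float(np.abs(x - target).max())
```

Because every frame is updated in the same pass, the sketch mirrors the property noted above: generating multiple predicted frames at once keeps the content of the frames consistent with one another.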
OpenAI is an artificial intelligence company founded on December 11, 2015 by Sam Altman, Elon Musk, and others. Since its inception, OpenAI has explored the field of large models in depth. In November 2022, OpenAI launched the chatbot ChatGPT, which demonstrated a leap in AI's text comprehension and reasoning ability compared with earlier systems. Reaching hundreds of millions of active users in just two months, ChatGPT's launch was a milestone for AI-generated content (AIGC), leading a new revolution in the field.
The release of Sora is the latest achievement in OpenAI's exploration of large models. Sora builds on past research on the DALL-E and GPT models, inherits the image quality and instruction-following capabilities of DALL-E 3, and generates both real and imagined scenes from a user's text prompts. Individual videos are under one minute long and can depict complex scenes with numerous characters and backgrounds performing specific movements. Sora's release also drew wide industry attention: from 2023 to early 2024, Meta, Google, and other tech companies released similar text-to-video AI models. Breakthroughs in the generalizability, promptability, generation quality, and stability of visual algorithms have driven a technological inflection point and the emergence of explosive applications, with areas such as 3D asset generation and video generation benefiting from the maturation of diffusion algorithms.
However, Sora's release also exposes some challenges: the model may not accurately simulate the physics of complex scenes and may fail to understand cause and effect. For example, in a video generated from the prompt "five gray wolf pups playing and chasing each other on a remote gravel road," the number of wolves varies, with some appearing or disappearing out of nowhere. The model may also confuse spatial details in a prompt, such as mixing up left and right, and may struggle to depict events that unfold precisely over time, such as following a specific camera trajectory. Nonetheless, Sora's release is undoubtedly a major advance in AI. It marks a major leap in AI's ability to understand and interact with real-world scenarios and a significant step toward the goal of artificial general intelligence (AGI).
Sora, the new generative video model introduced by OpenAI, has received widespread attention and high praise. MIT Technology Review called Sora an amazing new generative video model and named it one of the tech trends to watch in 2024, noting that text-to-video generation is a hot research area and that Sora's emergence has reinvigorated the field. Tim Brooks, a scientist at OpenAI, was also very positive about Sora, arguing that building models that can understand video, and all the very complex interactions of our world, is an important step for all future AI systems. This suggests that Sora is not only an important breakthrough for OpenAI but also has a profound impact on the development of the entire field of artificial intelligence.
Zhou Hongyi, founder of 360, also expressed great interest in Sora. He believes its emergence could shorten the time needed to achieve artificial general intelligence (AGI) from ten years to one or two. He pointed out that OpenAI has leveraged its large language models to let Sora understand and simulate the real world, so that the generated videos are realistic and go beyond 2D to model real physics. Tang Lin Yao, an associate researcher at the Institute of Law of the Chinese Academy of Social Sciences, judged from the published videos that Sora has improved dramatically over other generative video AIs in picture clarity, content fluency, depth of ideas, and visual impact. These evaluations all point to Sora's important position and great potential in the field of artificial intelligence.