OpenAI’s recent (and first!) video-generating model, Sora, can pull off some genuinely impressive cinematographic feats. But the model’s even more capable than OpenAI initially made it out to be, at least judging by a technical paper published this evening.
The paper, titled “Video generation models as world simulators” and co-authored by a number of OpenAI researchers, peels back the curtain on key aspects of Sora’s architecture, revealing for instance that Sora can generate videos of an arbitrary resolution and aspect ratio (up to 1080p). Per the paper, Sora’s able to perform a range of image and video editing tasks, from creating looping videos to extending videos forwards or backwards in time to changing the background in an existing video.
But most intriguing to this author is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI set Sora loose on Minecraft and had it render the world (and its dynamics, including physics) while simultaneously controlling the player.
So how’s Sora able to do this? Well, as observed by senior Nvidia researcher Jim Fan (via Quartz), Sora’s more of a “data-driven physics engine” than a creative tool. It’s not just generating a single photo or video, but determining the physics of each object in an environment, and rendering a photo or video (or interactive 3D world, as the case may be) based on these calculations.
“These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the co-authors write.
Now, Sora’s usual limitations apply in the video game domain. The model can’t accurately approximate the physics of basic interactions like glass shattering. And even with interactions it can model, Sora’s often inconsistent, for example rendering a person eating a burger but failing to render the bite marks.
Still, if I’m reading the paper correctly, it seems Sora could pave the way for more realistic, perhaps even photorealistic, procedurally generated games. That’s in equal parts exciting and terrifying (consider the deepfake implications, for one), which is perhaps why OpenAI’s choosing to gate Sora behind a very limited access program for now.
Here’s hoping we learn more sooner rather than later.