AI’s Progress Now Depends on ‘World Models’ That Grasp Physical Reality

In brief

  • Fei-Fei Li, a Stanford computer science professor, said advances in AI are currently limited by systems that cannot understand physical space.
  • A world model is designed to simulate the environment and predict how the scene will change over time.
  • Early prototypes like Marble suggest how these models could reshape creative work, robotics, and science.

Robots and multimodal artificial intelligence still cannot grasp the physical world, and one prominent researcher says this shortcoming is currently the field’s biggest obstacle.

Fei-Fei Li, a computer scientist at Stanford University widely credited as a pioneer of modern computer vision, says the gap between AI and physical reality has become the technology’s most pressing problem, and argues that bridging this gap will require systems built around spatial reasoning, not just language.

AI is rapidly approaching the limits of text-based learning, and further progress will depend on “world models,” Li said in a report released Monday.

“At the core of unlocking spatial intelligence is the development of world models, a new type of generative AI that must address a fundamentally different set of challenges than LLMs. These models must generate spatially coherent worlds that follow the laws of physics, process multimodal inputs from images to actions, and predict how those worlds will evolve and interact over time,” Li wrote on X.

What exactly are these models?

The concept of the “world model” dates back to the early 1940s, when Scottish philosopher and psychologist Kenneth Craik conducted research in cognitive science.

This idea resurfaced in modern AI after a 2018 paper by David Ha and Jürgen Schmidhuber showed that neural networks can learn compact internal models of their environments and use them as simulators for planning and control.
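The general idea of learning a model of an environment and then planning inside it can be illustrated with a toy sketch. This is not the method from the 2018 paper, which used neural networks; it is a minimal stand-in under stated assumptions: a point moving along a line, a one-parameter model fitted by least squares, and exhaustive search over short action sequences. All names here (`true_step`, `fit_world_model`, `plan`) are hypothetical and chosen for illustration.

```python
import itertools
import random


def true_step(x, a):
    """The real environment, hidden from the agent: an action nudges the position."""
    return x + 0.5 * a


def fit_world_model(transitions):
    """Least-squares fit of k in the assumed model x_next = x + k * a."""
    num = sum(a * (x_next - x) for x, a, x_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den


def plan(x, goal, k, horizon=4, actions=(-1, 0, 1)):
    """Simulate every action sequence inside the learned model and
    return the one whose imagined end position is closest to the goal."""
    def imagined_end(seq):
        pos = x
        for a in seq:
            pos = pos + k * a  # uses the learned model, not the real environment
        return pos

    return min(itertools.product(actions, repeat=horizon),
               key=lambda seq: abs(imagined_end(seq) - goal))


# 1. Explore: collect (state, action, next state) transitions at random.
random.seed(0)
transitions = []
x = 0.0
for _ in range(50):
    a = random.choice((-1, 0, 1))
    x_next = true_step(x, a)
    transitions.append((x, a, x_next))
    x = x_next

# 2. Learn: fit the internal model from experience.
k = fit_world_model(transitions)

# 3. Plan in imagination, then act in the real environment.
best = plan(x=0.0, goal=1.5, k=k)
final = 0.0
for a in best:
    final = true_step(final, a)

print("learned k:", k)
print("planned actions:", best)
print("final position:", final)
```

The agent never reads `true_step` while planning; it only consults its fitted model. The same split applies at far larger scale, where the model is a neural network and the state is an image or a 3D scene rather than a single number.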

Li argued that world models are important because robots and multimodal systems still struggle with grounded spatial reasoning: they cannot reliably judge distances, track changes in a scene, or predict basic physical outcomes.

“Robots as human collaborators can expand parts of the workforce that desperately need more labor and productivity, whether it’s supporting scientists in the lab or assisting elderly people living alone,” Li wrote. Real environments follow rules that current machines cannot capture, Li argues.

Those rules range from how gravity governs motion to how materials affect light. Capturing them requires systems that store spatial memory and can model scenes in two or more dimensions.

In September, Li’s company World Labs released a beta version of Marble, an early world model that generates explorable three-dimensional environments from text and image prompts.

Users can roam through these worlds without time limits or scene shifting, and the environments remain consistent rather than deforming or falling apart, the company claims.

“Marble is just the first step in creating a truly spatially intelligent model of the world,” Li wrote. “As progress accelerates, researchers, engineers, users, and business leaders alike are beginning to realize its extraordinary potential. The next generation of world models will enable machines to achieve entirely new levels of spatial intelligence, an achievement that unlocks critical capabilities that are still largely missing from today’s AI systems.”

Li said world models could support a wide range of applications because they give AI an internal understanding of how an environment behaves.

Creators can use them to explore scenes in real time, robots can use them to move and manipulate objects more safely, and scientific and medical researchers can use them to run spatial simulations and improve imaging and laboratory automation.

Li connected spatial intelligence research to early biological research, noting that humans learned to perceive and act long before developing language.

“Long before written words, humans told stories, painted pictures on cave walls, passed them down from generation to generation, and built entire cultures around shared stories,” she writes. “Stories are how we make sense of the world, connect across distance and time, explore what it means to be human, and most importantly, find meaning in life and love within ourselves.”

Li said AI needs the same foundation to function in the physical world, and argued that AI’s role should be to support people, not replace them. That progress, she said, depends on models that not only explain the world but also understand how it works.

“The next frontier in AI is spatial intelligence, the technology that turns seeing into inference, perception into action, and imagination into creation,” Li said.
