Will World Models Allow Robots to Think?

Chris Paxton

Mar 26

A brief review of the technology powering a new generation of intelligent robots

7 Comments

Julian Estevez

Thank you for the clear summary of these models! It's such a useful article.

ScientistMom

Thank you for the great breakdown on world models! I just wrote about AI agents navigating the world of Minecraft, explaining their training methods, and how these actually mimic the behavior of children. I'd love to have your feedback if you are interested in reading about this: https://substack.com/home/post/p-191864960

Avik De

Great summary! In terms of the categorization, is it unfair to put the DualWorld thing roughly in the video model + inverse dynamics bucket? It seems similar to me, except that the V-JEPA in DualWorld is just more complex (working over a longer sequence and at a faster rate, potentially). Either way, decoupling the robot form and action head from the big video model is a helpful feature for generalizability in both of those categories, IMO.

Chris Paxton

Yeah, I think that is reasonable. I think the framing of it as hierarchical planning, just with video, is interesting, although I can't find an actual paper, so who knows.

Justin Bayer

You are saying that Dreamer etc. don't have good long-horizon capabilities, but Dreamer4 actually solves the diamond challenge. Do you have any other evidence?

jaycee

When they learn how air traffic controllers think.

Edward Grundy

How are world models used in practice with robot hardware? Are they coupled with hardware or can models be delivered separately from hardware and interface with the robot controller?

Decoupling suggests a huge market for building models, with benefits to data hoarders and so on.

It would be quite ironic if "AI hardware" turned out to be humanoid robots rather than lifestyle pins!

It Can Think!
