Discussion about this post

ScientistMom

Thank you for the great breakdown of world models! I just wrote about AI agents navigating the world of Minecraft, explaining their training methods and how these actually mimic the behavior of children. I'd love to have your feedback if you are interested in reading it: https://substack.com/home/post/p-191864960

Avik De

Great summary! In terms of the categorization, is it unfair to put DualWorld roughly in the video model + inverse dynamics bucket? It seems similar to me, except that the V-JEPA in DualWorld is just more complex (working over a longer sequence and at a faster rate, potentially). Either way, decoupling the robot form and action head from the big video model is a helpful feature for generalizability in both of those categories, IMO.
