At what level of granularity is data, say for humanoids, useful?
I'm trying to think whether a dumb way of scaling this is to just open a large warehouse in a country where labor is cheap, divide it into hundreds of rooms with green screens, have people do different tasks, etc.
If we can infer the required granularity with just cameras, then it wouldn't be that expensive either.
Really good question.
I think *human* video data is inherently limited for other reasons - there's always an environment mismatch, and I think a lot of "learn from human video data" research papers seem to cap out at 80-90% success rates and can't push past that without lots of robot data as well. And that's on easy tasks.
So imagine you instead have this huge warehouse filled with humanoids. Tesla Optimus could definitely do this (and may in fact be doing it). As you say, green screens can give lots of visual diversity for "free."
But you still have the problem of iteration: it takes a lot of time and money to set up that warehouse full of humanoids. Simulation lets you scale this the same way you might scale *software* - not quite as fast, because you still have to close the loop on hardware of course - but it takes a lot of the real-world iteration out, and that's incredibly valuable.
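To make the "scale like software" point concrete, here's a rough sketch of what data collection in sim can look like - the warehouse becomes a loop over randomized rooms. Everything here (`make_env`, `scripted_policy`, `collect_episode`) is a made-up stand-in, not a real simulator API:

```python
import random

# Hypothetical sketch: scaling data collection by looping over randomized
# simulated rooms instead of building physical ones.

def make_env(seed):
    """Pretend factory for one randomized manipulation environment."""
    rng = random.Random(seed)
    return {
        "lighting": rng.uniform(0.2, 1.0),     # visual diversity (what green screens buy you)
        "object_mass": rng.uniform(0.1, 2.0),  # physical diversity (hard to vary in a real warehouse)
        "table_height": rng.uniform(0.6, 0.9),
        "seed": seed,
    }

def scripted_policy(env, step):
    """Stand-in for whatever produces actions (scripted motions, teleop replay, RL...)."""
    return {"joint_targets": [0.0] * 7, "gripper_closed": step % 2 == 0}

def collect_episode(env, horizon=200):
    return [{"obs": env, "action": scripted_policy(env, t)} for t in range(horizon)]

# A thousand "rooms" is just a bigger range(), not more real estate.
dataset = []
for seed in range(1_000):
    dataset.extend(collect_episode(make_env(seed)))

print(f"collected {len(dataset)} transitions from 1,000 randomized environments")
```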
In addition, you'd be surprised how often you end up with subtly correlated features when collecting data. Big learning-from-demonstration efforts often throw out a TON of data. This is a huge issue if you're collecting in the real world.
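As a toy illustration of what "subtly correlated" means (the episode fields and numbers below are invented for the example, not from any real dataset): if, say, almost every successful demo happens to come from one operator or one lighting condition, the policy can latch onto that instead of the task, and you only find out when you audit the data.

```python
from collections import Counter

# Invented episodes for illustration - not real collection logs.
episodes = [
    {"operator": "A", "lighting": "morning", "success": True},
    {"operator": "A", "lighting": "morning", "success": True},
    {"operator": "A", "lighting": "morning", "success": True},
    {"operator": "B", "lighting": "evening", "success": False},
    {"operator": "B", "lighting": "evening", "success": False},
]

def cooccurrence(episodes, feature):
    """Count how often a nuisance feature co-occurs with episode success."""
    return Counter((ep[feature], ep["success"]) for ep in episodes)

print(cooccurrence(episodes, "lighting"))
# If one lighting condition accounts for nearly all the successes, a policy can
# "solve" the task by reading the lighting rather than doing the manipulation -
# and that's the kind of batch that ends up getting thrown out.
```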
Interesting, thanks for the reply!!
Are world models the 3rd way of generating data then?
Lots of companies are doing amazing work with video gen models, and I wonder if you can fine-tune them on videos of your form factor? (unsure how to incorporate tactile information)
I know 1X and comma are working in this direction, but I'm curious what your mental model of this topic is?
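To make my question concrete, the kind of thing I have in mind is roughly: learn an action-conditioned video/world model, then roll it out to generate synthetic robot experience. A toy sketch, with an entirely made-up interface and placeholder sizes (not claiming this is what 1X or comma actually do):

```python
import torch
import torch.nn as nn

# Toy action-conditioned world model used as a data generator.
# Architecture and shapes are placeholders for illustration only.

class TinyWorldModel(nn.Module):
    def __init__(self, latent_dim=64, action_dim=7):
        super().__init__()
        self.dynamics = nn.GRUCell(latent_dim + action_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, 3 * 64 * 64)  # predicts a small RGB frame

    def step(self, latent, action):
        latent = self.dynamics(torch.cat([latent, action], dim=-1), latent)
        frame = self.decoder(latent).view(-1, 3, 64, 64)
        return latent, frame

model = TinyWorldModel()
latent = torch.zeros(1, 64)                      # in practice: encode a real starting frame
propose_action = lambda lat: torch.randn(1, 7)   # stand-in for a policy or action sampler

# "Generating data" = rolling the learned model forward under candidate actions.
synthetic_rollout = []
for t in range(50):
    action = propose_action(latent)
    latent, frame = model.step(latent, action)
    synthetic_rollout.append((frame.detach(), action))

print(f"generated {len(synthetic_rollout)} synthetic (frame, action) pairs")
```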