Discussion about this post

Nathan Lambert

This seems to be missing RLHF as an example of where we can use RL, which already shows how soft verifiers can be mixed in with verifiable ones. I think the appetite for RL is now higher on the capabilities side, so there's room for optimism.

Of course, I agree with all the limitations when it comes to applying classic robotics/RL-style ideas.

Godwyll Aikins

Amazing blog! I’d love to see a part two that explores the tools we can use to address these RL constraints. No RL algorithm can fully overcome them yet, but as you showed, we have ways to mitigate them, especially in robotics. 🦾

Of the three main constraints, input observation is the toughest, especially with RGB images and generalization. Right now, simulations have to be almost 1:1 with the real world. The main solution seems to be adding more modalities or learning better latent representations.

For validation, I’m really curious to see what people come up with using VLMs as a replacement for RLHF. There’s a lot of potential there.

Exploration is still tricky, but starting with a policy trained via supervised learning or offline RL can help bootstrap the process.

But the biggest challenge, IMO, is that the problem always has to be clearly bounded. RL just can’t handle long-horizon tasks without a sophisticated framework. We definitely need a breakthrough in hierarchical RL!
