Discussion about this post

TJ

Why isn’t continual fine-tuning of transformers on individual users’ corpora more common?

Neural Foundry

The analogy to how the human brain handles memory at different timescales is really striking here. Most attempts at lifelong learning feel forced, but this nested optimization approach seems much more natural. I wonder whether the trade-off in computational cost will be worth it, though, since transformers are so efficient at inference right now.
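
The two-timescale idea this comment refers to can be sketched roughly as follows: a fast inner loop adapts some parameters to the current user's data stream at every step, while a slow outer loop consolidates shared parameters only occasionally. This is a minimal illustration of the general nested-optimization pattern, not the post's actual method; the modules, learning rates, and update schedule below are all hypothetical.

```python
# Minimal sketch of nested (two-timescale) optimization for continual
# adaptation. All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

shared = nn.Linear(16, 16)   # slow weights: consolidated rarely
fast = nn.Linear(16, 16)     # fast weights: adapted at every step

slow_opt = torch.optim.SGD(shared.parameters(), lr=1e-4)  # slow timescale
fast_opt = torch.optim.SGD(fast.parameters(), lr=1e-2)    # fast timescale

def loss_fn(x, y):
    # Fast weights sit on top of the shared representation.
    return ((fast(shared(x)) - y) ** 2).mean()

for step in range(1000):
    # Stand-in for a single user's streaming data.
    x, y = torch.randn(8, 16), torch.randn(8, 16)

    # Inner loop: fast weights track the current user's distribution.
    fast_opt.zero_grad()
    loss_fn(x, y).backward()
    fast_opt.step()

    # Outer loop: every 50 steps the shared weights update slowly,
    # mimicking memory consolidation at a longer timescale.
    if step % 50 == 0:
        slow_opt.zero_grad()
        loss_fn(x, y).backward()
        slow_opt.step()
```

The cost concern raised in the comment is visible even in this toy version: the inner loop adds a gradient computation to every interaction, which is exactly the overhead a forward-only transformer inference pass avoids.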

