4 Comments
TJ

Why isn’t continual fine-tuning of transformers on individual users’ corpora more common?

Chris Paxton

Two reasons: 1) it loses a lot of the scale advantages current ML systems rely on, since per-user training is much more expensive than inference with all the tricks people have figured out, and 2) it has the "infinite data" issue mentioned in the article: you either need an insane amount of storage per user, or you start to run into catastrophic forgetting
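
To make that trade-off concrete, here's a toy sketch of naive per-user continual fine-tuning with a replay buffer (my own example, not from the article; the model, data, and buffer are all stand-ins):

```python
import random
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                  # stand-in for a user-adapted model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
replay_buffer = []                         # per-user storage, grows without bound

def finetune_on(new_batch):
    """One continual-learning step on the user's newest data."""
    x, y = new_batch
    loss = loss_fn(model(x), y)

    # Replaying old samples is what fights catastrophic forgetting, but it
    # means keeping (and re-training on) an ever-larger slice of user history.
    for old_x, old_y in random.sample(replay_buffer, k=min(4, len(replay_buffer))):
        loss = loss + loss_fn(model(old_x), old_y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    replay_buffer.append(new_batch)

# a new "document" arrives for this user
finetune_on((torch.randn(8, 16), torch.randn(8, 16)))
```

Drop the replay buffer and the forgetting shows up; keep it and the per-user storage and training cost keep growing.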

Neural Foundry

The analogy to how the human brain handles memory at different timescales is really striking here. Most attempts at lifelong learning feel forced, but this nested optimization approach seems much more natural. I wonder if the trade-off in computational cost will be worth it, though, since transformers are so efficient at inference right now.

Chris Paxton

Oh yeah, I agree, it's really cool and compelling. I could even imagine tweaking the learning rates lower as your system "ages", which almost certainly happens with humans... but it's very new, and it will probably take a long time to compete with transformers, if ever
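
Roughly what I mean, as a toy sketch (my own, not the article's method): a fast inner update nested inside a slow outer one, with the inner step size shrinking as the system's "age" (number of outer updates) grows.

```python
import torch
import torch.nn as nn

slow = nn.Linear(16, 16)                            # slow, long-timescale parameters
outer_opt = torch.optim.Adam(slow.parameters(), lr=1e-3)

for age in range(100):                              # outer (slow) loop
    x, y = torch.randn(32, 16), torch.randn(32, 16)

    fast = torch.zeros(16, 16, requires_grad=True)  # fast, short-timescale memory
    inner_lr = 0.1 * 0.5 ** (age / 50)              # inner step shrinks as the system "ages"

    for t in range(0, 32, 8):                       # inner (fast) loop over chunks
        xb, yb = x[t:t + 8], y[t:t + 8]
        inner_loss = ((xb @ fast.T + slow(xb) - yb) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, fast, create_graph=True)
        fast = fast - inner_lr * g                  # fast weights update every chunk

    # slow weights only see the final, adapted fast state
    outer_loss = ((x @ fast.T + slow(x) - y) ** 2).mean()
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```

A young system adapts aggressively within each sequence; an old one barely moves its fast weights anymore.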