Why isn’t continual fine-tuning of transformers on individual user corpora more common?
Two reasons: 1) it loses a lot of the scale advantages you get from ML systems right now, since per-user training is much more expensive than inference with all the serving tricks people have figured out, and 2) it has the "infinite data" issue mentioned in the article: either you need an insane amount of storage per user (effectively a separate copy of the weights for everyone), or you run into catastrophic forgetting.
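To make the storage point concrete, here's a rough back-of-envelope sketch. The model size, precision, and user count are illustrative assumptions, not numbers from the article:

```python
# Back-of-envelope: cost of storing a fully fine-tuned checkpoint per user.
# All numbers below are illustrative assumptions.
params = 7e9              # assume a 7B-parameter model
bytes_per_param = 2       # fp16/bf16 weights
users = 1_000_000         # assume a million users

per_user_gb = params * bytes_per_param / 1e9
total_pb = per_user_gb * users / 1e6

print(f"{per_user_gb:.0f} GB per user, ~{total_pb:.0f} PB for {users:,} users")
# -> 14 GB per user, ~14 PB total, and that's before any optimizer state
```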
The analogy to how the human brain handles memory at different timescales is really striking here. Most attempts at lifelong learning feel forced, but this nested optimization approach seems much more natural. I wonder whether the trade-off in computational cost will be worth it, though, since transformers are so efficient at inference right now.
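Roughly what I take the "different timescales" part to mean, as a minimal toy sketch. The two-level fast/slow split, the consolidation interval, and the learning rates are my own illustrative assumptions, not the article's actual algorithm:

```python
import numpy as np

# Toy nested optimization at two timescales:
# "fast" weights update every step (working memory),
# "slow" weights consolidate only every K steps (long-term memory).
rng = np.random.default_rng(0)
dim = 8
fast = np.zeros(dim)          # fast weights, adapt quickly
slow = np.zeros(dim)          # slow weights, change on a longer clock
K = 50                        # consolidation interval (outer-loop timescale)
lr_fast, lr_slow = 0.02, 0.05

def loss_grad(w, x, y):
    # gradient of squared error for a linear model y ~ w.x
    return 2 * (w @ x - y) * x

for step in range(1, 1001):
    x = rng.normal(size=dim)
    y = x.sum()                              # toy target: true weights are all ones
    # inner loop: fast weights adapt on top of the slow ones at every step
    g = loss_grad(slow + fast, x, y)
    fast -= lr_fast * g
    # outer loop: every K steps, fold part of the fast adaptation into the slow weights
    if step % K == 0:
        slow += lr_slow * fast               # slow consolidation of what was learned
        fast *= 0.5                          # partial decay of the working memory

print("combined weights:", np.round(slow + fast, 2))
print("slow (consolidated) weights:", np.round(slow, 2))
```

The point is just the two-clock structure: the fast weights track recent experience while the slow weights only absorb it gradually, which is (very loosely) where the working-memory vs. long-term-memory analogy comes from. The real methods are obviously far more involved.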
Oh yeah, I agree, it's really cool and compelling. I could even imagine tweaking the learning rates lower as your system "ages", which almost certainly happens in humans... but it's very new, and it will probably take a long time to compete with transformers, if ever.
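Something like this toy schedule is what I have in mind; the functional form and constants are purely illustrative:

```python
# Toy age-dependent learning-rate schedule: plasticity starts high and
# decays as the system accumulates experience. Not from the article.
def lr_at_age(step: int, lr0: float = 0.1, tau: float = 10_000) -> float:
    """Inverse-time decay: high plasticity early, increasingly conservative later."""
    return lr0 / (1.0 + step / tau)

for step in (0, 1_000, 10_000, 100_000, 1_000_000):
    print(f"step {step:>9,}: lr = {lr_at_age(step):.5f}")
# lr falls from 0.10 toward ~0.001 as the system "ages",
# so later experience perturbs consolidated memory less and less.
```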