On a technical level, it feels like we're automating post-training: we want to take a bunch of data and output a post-trained model, then iterate this process many times. Continual learning by induction. The ideal algorithm is an interpolation of OPSD and RL with a lot of
Automating post-training with continual learning and OPSD-RL interpolation
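The loop described in the text (data in, post-trained model out, repeated many times) might be sketched roughly as follows. This is a toy illustration, not the author's algorithm: `Model`, `mixed_loss`, `post_train`, `continual_learning`, and the mixing weight `alpha` are hypothetical names, the "model" is a single scalar, and the two quadratic losses are placeholders standing in for a distillation-style term and an RL-style term. Only the overall shape, an iterated update on a linearly interpolated objective, is meant to carry over.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Model:
    """Toy stand-in for a model: one scalar parameter and a round counter."""
    theta: float
    rounds: int = 0


def mixed_loss(distill_loss: float, rl_loss: float, alpha: float) -> float:
    """Linearly interpolate two losses.

    Assumed convention: alpha = 1.0 is pure distillation, alpha = 0.0 is pure RL.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * distill_loss + (1.0 - alpha) * rl_loss


def post_train(model: Model, batch: Sequence[float], alpha: float, lr: float = 0.1) -> Model:
    """One hypothetical post-training round: a gradient step on the mixed loss.

    Placeholder losses (assumptions, not real OPSD or RL objectives):
      distillation-style: (theta - mean(batch))^2
      RL-style:           (theta - max(batch))^2
    """
    teacher = sum(batch) / len(batch)
    best = max(batch)
    # Gradient of alpha * (theta - teacher)^2 + (1 - alpha) * (theta - best)^2.
    grad = 2.0 * alpha * (model.theta - teacher) + 2.0 * (1.0 - alpha) * (model.theta - best)
    return Model(theta=model.theta - lr * grad, rounds=model.rounds + 1)


def continual_learning(model: Model, stream: Iterable[Sequence[float]], alpha: float, lr: float = 0.1) -> Model:
    """Iterate post-training: each round's output model is the next round's input."""
    for batch in stream:
        model = post_train(model, batch, alpha, lr)
    return model


if __name__ == "__main__":
    stream = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
    final = continual_learning(Model(theta=0.0), stream, alpha=0.5)
    print(final)
```

The inductive structure is the part worth noting: `continual_learning` never looks at more than the current model and the current batch, so any real post-training procedure with the same signature could replace `post_train`.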