is it a big model smell you smell?
@jxmnop
-
KL Divergence and ProLong Paper KL Resets Explained
KL can mean a lot of things! and the ProLong paper does KL resets
-
KL Divergence Measurement on Reinforcement Learning Outputs
no, because it's KL measured on the RL outputs
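One possible reading of "KL measured on the RL outputs" is a Monte Carlo estimate: sample outputs from the RL policy, score each one under both the policy and a reference model, and average the log-probability gap. The sketch below assumes that reading. The function name and the numbers are hypothetical, and the original thread may have meant something different.

```python
from typing import Sequence


def sequence_kl_estimate(
    policy_logprobs: Sequence[Sequence[float]],
    ref_logprobs: Sequence[Sequence[float]],
) -> float:
    """Monte Carlo estimate of KL(policy || reference) over sampled sequences.

    Each inner sequence holds per-token log-probabilities for one output that
    was sampled from the policy, scored under the policy and the reference.
    """
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("need the same non-zero number of sequences from both models")
    total = 0.0
    for p_seq, r_seq in zip(policy_logprobs, ref_logprobs):
        if len(p_seq) != len(r_seq):
            raise ValueError("token counts must match within each sequence")
        # log pi(y|x) - log pi_ref(y|x), summed over the tokens of y
        total += sum(p - r for p, r in zip(p_seq, r_seq))
    return total / len(policy_logprobs)


# Hypothetical log-probs for two sampled outputs (made-up numbers).
policy = [[-1.0, -2.0], [-0.5]]
reference = [[-1.5, -2.5], [-1.5]]
kl_hat = sequence_kl_estimate(policy, reference)
```

Because the samples come from the policy itself, this estimates the divergence only where the policy actually puts mass, which is why the same two models can give a different "KL" on another data distribution.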
-
Scaling Prompt Optimization: Systems, Theory, and Benchmarks Needed
probably 10x more people should be working on prompt optimization systems (we need a vLLM for promptopt), theory, new techniques, benchmarks. the whole kit and caboodle
-
Model Merging as Compromise for AI Stacking Capabilities
for stacking capabilities it makes sense. in this case model merging is an odd compromise
-
Pretraining Data and Gradient-Matching Training Approaches
oh cool, is it *pretraining* data? and they do the same gradient-matching thing? i wonder what the advantage is over just mixing in and doing supervised training on a few examples every now and again
-

RLHF Training: Avoiding Model Drift Through Gradient Mixing
here's some free alpha: if we do RL for too long after pretraining, we will surely overwrite parameters and start to forget things. in the original InstructGPT paper, their best model mixed RLHF with pretraining gradients to avoid exactly this model drift issue. yet no one is
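A minimal sketch of the mixing idea, assuming the general shape of the InstructGPT objective: a KL-penalized reward term on RL samples plus a weighted log-likelihood term on pretraining text. The function name, argument names, and coefficient values here are illustrative assumptions, not the paper's settings or code.

```python
from typing import Sequence


def mixed_objective(
    rewards: Sequence[float],
    kl_terms: Sequence[float],
    pretrain_logprobs: Sequence[float],
    beta: float,
    gamma: float,
) -> float:
    """Objective to maximize: RL term plus gamma times a pretraining term.

    rewards[i]           reward model score for RL sample i
    kl_terms[i]          log pi_RL(y|x) - log pi_SFT(y|x) for RL sample i
    pretrain_logprobs[j] log pi_RL(x) for pretraining sequence j
    """
    if not rewards or len(rewards) != len(kl_terms) or not pretrain_logprobs:
        raise ValueError("need matching, non-empty RL and pretraining batches")
    # KL-penalized reward, averaged over the RL batch
    rl_term = sum(r - beta * k for r, k in zip(rewards, kl_terms)) / len(rewards)
    # Average log-likelihood on pretraining data; this is the term that
    # pulls the policy back toward what it knew before RL.
    ptx_term = sum(pretrain_logprobs) / len(pretrain_logprobs)
    return rl_term + gamma * ptx_term


# Made-up batch values and coefficients, for illustration only.
value = mixed_objective(
    rewards=[1.0, 3.0],
    kl_terms=[0.5, 1.5],
    pretrain_logprobs=[-2.0, -4.0],
    beta=0.1,
    gamma=0.5,
)
```

With `gamma=0` this reduces to the usual KL-penalized RL objective, so the pretraining term is the only part that directly rewards keeping likelihood on the original data.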
-
Reasoning Model Generalization: Math to Creative Writing Transfer
i'm looking for good examples of reasoning model generalization. for example, a model incentivized via RL to think for a while and solve math problems gets better at creative writing. is this common?
-
Prompt Length Limitations for Question Conditioning
the prompt doesn't condition on the question. and it has bounded length. so too short to contain all possible questions.
-

Models Achieving Gold Without Reinforcement Learning Through Prompt Engineering
this seems really important: it is totally plausible that a model could get IMO gold without *any* reinforcement learning, given a perfectly crafted prompt. we just don't know, and lack tools to efficiently search through prompt space. glad to see at least someone is trying
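To make "search through prompt space" concrete, here is a toy hill-climbing loop over prompt edits. Everything in it is a stand-in: the candidate phrases, the `toy_score` function, and the step count are hypothetical, and a real system would score each prompt by running a model on a benchmark, which is the expensive part.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def hill_climb_prompt(
    seed_prompt: str,
    candidate_phrases: Sequence[str],
    score: Callable[[str], float],
    steps: int = 50,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Greedy search: append a random phrase, keep it only if the score improves."""
    rng = rng or random.Random(0)
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(steps):
        candidate = best + " " + rng.choice(candidate_phrases)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical stand-in for "accuracy of the model when given this prompt".
HELPFUL = ("think step by step", "check your answer")


def toy_score(prompt: str) -> float:
    return float(sum(phrase in prompt for phrase in HELPFUL))


seed = "solve the problem."
phrases = ["think step by step", "check your answer", "be brief"]
best_prompt, best_value = hill_climb_prompt(seed, phrases, toy_score)
```

Each step costs one call to `score`, so the search budget is dominated by evaluation, and greedy acceptance can stall in local optima. Both are reasons this is only a sketch of the problem, not a proposed tool.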