AI Dynamics

Global AI News Aggregator

PPO Explanation Gaps and InstructGPT Fine-tuning Data Requirements

Thanks man, it's a good thread. I feel like people don't explain PPO very well in these threads; that's the main part that always feels handwavy. I'd also like an idea of the order of magnitude of fine-tuning data needed for InstructGPT, but I haven't seen good numbers.

→ View original post on X — @swyx