Reinforcement Pre-Training (RPT): A New Scaling Paradigm for LLMs via RL

This paper proposes Reinforcement Pre-Training (RPT), a fresh and ambitious reimagining of large language model (LLM) pre-training: instead of passive next-token prediction via supervised learning, RPT
Reinforcement Pre-Training: Novel LLM Scaling via RL
