AI Dynamics

Global AI News Aggregator

Fine-tuning and On-Policy RL for Model Optimization

We first fine-tune the model to follow instructions, stay within guardrails, and keep language consistent. Then we run on‑policy RL to improve search accuracy and tool efficiency while preserving those behaviors.
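As a rough illustration of how the RL stage could reward search accuracy and tool efficiency without eroding the fine-tuned behaviors, here is a minimal sketch of a per-rollout reward. The post does not describe the actual reward, so everything here is an assumption for illustration: the `Episode` fields, the `tool_cost` and `max_tool_calls` parameters, and the gating rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One rollout sampled from the current policy (hypothetical structure)."""
    answer_correct: bool         # final answer matched the reference
    tool_calls: int              # number of search/tool invocations used
    followed_instructions: bool  # behaviors targeted by the fine-tuning stage
    within_guardrails: bool
    language_consistent: bool


def reward(ep: Episode, tool_cost: float = 0.05, max_tool_calls: int = 10) -> float:
    """Toy reward: accuracy minus a tool-use cost, gated on fine-tuned behaviors.

    Assumed design, not the method from the post.
    """
    # A rollout that breaks any fine-tuned behavior scores below every
    # compliant rollout, so RL has no incentive to trade those behaviors away.
    if not (ep.followed_instructions and ep.within_guardrails and ep.language_consistent):
        return -1.0
    accuracy = 1.0 if ep.answer_correct else 0.0
    # Cap the counted calls so the penalty stays bounded (at most 0.5 by default).
    calls = min(ep.tool_calls, max_tool_calls)
    return accuracy - tool_cost * calls


if __name__ == "__main__":
    good = Episode(True, 2, True, True, True)
    wasteful = Episode(True, 8, True, True, True)
    off_language = Episode(True, 1, True, True, False)
    print(reward(good), reward(wasteful), reward(off_language))
```

Because the reward is computed on rollouts sampled from the policy being trained, it fits an on-policy setup. With these defaults, a correct answer reached with fewer tool calls scores higher than the same answer reached with more, and any rollout that violates a fine-tuned behavior scores lowest.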

→ View original post on X: @perplexity_ai
