AI Dynamics

Global AI News Aggregator

Self-Training: Filtering Functions and Model Ranking Systems

Self-training: F(x, y) is a filtering/ranking function, e.g., what we call a reward or return. The input x may be chosen by humans, but the model generates the y’s, and F ranks and selects them for further rounds of self-training. F can be explicit or implicit (a human in the loop, as in RLHF).
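To make the loop concrete, here is a minimal Python sketch of one round of this kind of self-training. It is an illustration under stated assumptions, not the method from the original post: the names `self_training_round`, `generate`, `num_samples`, and `keep_top` are hypothetical, the "model" is a toy noisy adder, and F is an explicit scoring function (an implicit F, such as a human rater, would plug into the same slot).

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def self_training_round(
    prompts: Sequence[X],
    generate: Callable[[X], Y],
    F: Callable[[X, Y], float],
    num_samples: int = 4,
    keep_top: int = 1,
) -> List[Tuple[X, Y]]:
    """For each input x, sample candidate y's, rank them by F(x, y), keep the best.

    All names here are hypothetical. The returned (x, y) pairs are what a
    further round of training would consume.
    """
    selected: List[Tuple[X, Y]] = []
    for x in prompts:
        # The model, not a human, produces the candidate outputs.
        candidates = [generate(x) for _ in range(num_samples)]
        # F ranks the candidates; a higher score is better.
        ranked = sorted(candidates, key=lambda y: F(x, y), reverse=True)
        selected.extend((x, y) for y in ranked[:keep_top])
    return selected


# Toy setup (an assumption for illustration): x is a pair of integers and
# the "model" guesses their sum with some random noise.
rng = random.Random(0)


def noisy_adder(x: Tuple[int, int]) -> int:
    return x[0] + x[1] + rng.randint(-2, 2)


def reward(x: Tuple[int, int], y: int) -> float:
    # Explicit F: the negative absolute error of the guess.
    return -abs(y - (x[0] + x[1]))


training_pairs = self_training_round([(1, 2), (3, 4)], noisy_adder, reward)
```

In a real system the selected pairs would be used to fine-tune the model before the next round; that update step is left out because the original post does not specify one.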

→ View original post on X (@nandodf)
