AI Dynamics

Global AI News Aggregator

RLHF and Prompting Techniques for Targeted Model Behavior

This means that if we have a target behavior (e.g., non-discrimination), we may be able to nudge models toward that target using instruction-following (IF) or chain-of-thought (CoT) prompting when RLHF alone is not sufficient. But we must be careful to check whether RLHF combined with prompting causes the models to overshoot the target.
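One way to picture the overshoot check is the minimal sketch below. It assumes a single numeric behavior score per condition, a target value, and a tolerance band. The function name, the condition names, and every number are hypothetical illustrations, not results or methods from the original post.

```python
def classify_against_target(score, baseline, target=0.0, tolerance=0.05):
    """Label a measured score relative to a target (hypothetical helper).

    score:     measured value for the condition being checked
    baseline:  measured value before any intervention
    target:    value we want the behavior score to reach
    tolerance: half-width of the band counted as "on target"
    """
    gap = score - target
    if abs(gap) <= tolerance:
        return "on target"
    start_gap = baseline - target
    # Still on the same side of the target as the baseline:
    # the intervention has not moved the score far enough.
    if gap * start_gap > 0:
        return "undershoot"
    # Otherwise the score has crossed past the target.
    return "overshoot"


# Illustrative, made-up scores (not measurements from any study).
baseline_score = 0.20
conditions = {
    "rlhf_only": 0.12,
    "rlhf_plus_if": 0.03,
    "rlhf_plus_if_cot": -0.09,
}

labels = {
    name: classify_against_target(score, baseline_score)
    for name, score in conditions.items()
}

for name, label in labels.items():
    print(f"{name}: {label}")
```

With these made-up numbers, the combined condition crosses past the target, which is the case the post warns about. In practice the score, target, and tolerance would come from whatever evaluation is used for the behavior in question.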

→ View original post on X: @anthropicai
