AI Dynamics

Global AI News Aggregator

RLHF and Prompting Techniques for Targeted Model Behavior

This means that if we have a target behavior (e.g., non-discrimination), we may be able to nudge models toward that target using instruction-following (IF) or chain-of-thought (CoT) prompting if RLHF alone is not sufficient. But we must be careful to check whether RLHF combined with prompting causes the models to overshoot the target.
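
As a rough illustration of the overshoot check described above, the sketch below labels a measured score as an undershoot, on target, or an overshoot. Everything in it is an assumption made for illustration and does not come from the original post: the `classify` helper, the signed metric where 0.0 is the target, the tolerance, and the example numbers are all hypothetical.

```python
def classify(score: float, baseline: float,
             target: float = 0.0, tolerance: float = 0.05) -> str:
    """Label a score relative to a target (hypothetical helper).

    `baseline` is the score before any intervention. A score that stays
    on the baseline's side of the target is an undershoot; a score that
    lands on the opposite side is an overshoot. If the baseline already
    equals the target, any off-target score counts as an overshoot.
    """
    if abs(score - target) <= tolerance:
        return "on target"
    # Same sign means the score has not yet crossed the target.
    if (score - target) * (baseline - target) > 0:
        return "undershoot"
    return "overshoot"


# Made-up numbers, purely illustrative; not results from any study.
baseline_score = 0.30
measurements = [
    ("RLHF only", 0.30),
    ("RLHF + IF", 0.04),
    ("RLHF + IF + CoT", -0.20),
]

for condition, score in measurements:
    print(condition, classify(score, baseline=baseline_score))
```

In practice the choice of metric and tolerance would depend on the behavior being targeted; the point is only that each prompting condition should be measured against the target on both sides, not just for movement in the desired direction.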

→ View original post on X — @anthropicai