6). Evaluating Feature Steering in LLMs – evaluates feature steering in LLMs using an experiment that artificially dials various features up and down to analyze changes in model outputs; it focuses on 29 features related to social biases and studies whether feature steering can help
Feature Steering in LLMs: Evaluating Social Bias Control