When we launched computer use, we used hierarchical summarization to evaluate usage patterns against a set of guidelines. This helped us flag individual misuses for later human review. This is a promising technique, but it's also early-stage research with room to improve.
Hierarchical Summarization for AI Computer Use Safety Evaluation
By
–
