Our findings show that our fine-tuned BrowseSafe model outperforms both off-the-shelf safety classifiers and frontier LLMs used as detectors. These gains come from fine-tuning on BrowseSafe-Bench data, which lets us avoid the reasoning latency of larger models.
BrowseSafe Fine-Tuned Model Outperforms Safety Classifiers