I did a big cleanup of some new models to add to BullshitBench – none of them are particularly interesting, tbh. Qwen scored relatively well, but below Qwen 3.5.
