4o is pretty great with individual queries, first-turn-type stuff, etc. I'm guessing the majority of lmsys votes are on shorter conversations. It falls apart when longer context is needed.