Word games yield valuable insights when evaluating LLMs. We built the SnorkleWordle benchmark to test models on 100 rare English words—and the results are