I think an interesting test of LLMs is whether they can create a coherent puzzle. I asked Claude 3 and GPT-4 to each create a complex puzzle for a D&D game. Both came very close but got lost in the complexity, producing puzzles that are almost, but not quite, solvable.