Apple test, doesn't do quite as well as GPT-4
@emollick
-

LLM Sestina Writing: Claude 3 vs GPT-4 vs Grok
A hard test of an LLM is its ability to write a sestina, the hardest poetic form. Claude 3 is very good, and a much better writer, but struggles a little more than GPT-4 with form, messing up a few lines. Neither can pull off the envoi at the end. Compare to a 3.5-class model like Grok.
-
New Model Underperforms GPT-4 in Real Use Cases
It does some things worse than GPT-4 in real use cases we have played with, but that could be prompting. We don't know yet.
-
GPT-4 Benchmark Beaten by Competing AI Leaders
The GPT-4 benchmark has now been beaten by the two other leading AI companies (even if not by a huge margin). It is very much OpenAI's move.
-
Model Shows Strong Programming Capabilities Despite Limited Testing
I have not tested programming, which is apparently a strong point. It's a really good model, though.
-

Claude 3 Joins GPT-4 Class: Three Leading AI Models Compared
And then there were three… I got access to the new Anthropic Claude 3 AI a few days ago, so not enough time for a full review, but it was obvious it was GPT-4 class even before they released the testing stats. At the same time, like Gemini Advanced, it doesn't blow GPT-4 away.
-
Prompting Art and Science: Practical Advice Guide
All of those weird prompting tricks (giving tips, threatening the AI) only work sometimes. The truth is that prompting is often more art than science, yet prompting is still very important. I try to reconcile these facts, and give some prompting advice:
-
Incentive Systems Fail Outside Games Without Monetary Rewards
It turns out that those same incentive systems don’t work well outside of games (unless the carrot is money).
-
Games and Crowdsourcing: AI-Powered Scientific Discovery Projects
Some major games-as-work efforts that got actual results:
Folding@home: https://foldingathome.org
Games with a purpose (sort of became Duolingo): https://cmu.edu/homepage/computing/2008/summer/games-with-a-purpose.shtml
Galaxy Zoo: https://zooniverse.org/projects/zookeeper/galaxy-zoo/
Project Discovery: https://eveonline.com/discovery
