Lots of cool commercial demos of LLM-enabled technologies: Devin the software coding agent, Klarna's customer service, Sora, the Figure robot… All of them may be possible, given what we know of LLMs, but it is worth taking demo claims with tons of caution until there is proof.
@emollick
-
LLM Capabilities and Limitations: The Jagged Frontier
Yes, LLMs are bad at: word games (due to tokenization), math (unless using tools), accurate citations & quotes, etc. It is easy to find failures there. The whole idea of the Jagged Frontier is that they are bad in ways you may not expect and good at other tasks that are hard for people.
-
LLMs Struggle with Word Games Due to Tokenization Issues
LLMs are notoriously bad at word games thanks to tokenization. You can find lots of things they fail at if you want.
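To make the tokenization point concrete, here is a minimal sketch. It assumes a made-up vocabulary (`TOY_VOCAB`) and a simple greedy longest-match rule; real models use learned subword schemes with far larger vocabularies, so treat this only as an illustration of why letter-level questions are awkward for a model that receives token IDs rather than characters.

```python
# Hypothetical toy vocabulary; not taken from any real model's tokenizer.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match: at each position, take the longest vocab entry."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


word = "strawberry"
print(tokenize(word))   # [0, 1]: the model-side view is two opaque IDs
print(word.count("r"))  # 3: trivial with character access, which the IDs hide
```

In this sketch the word arrives as two IDs, so nothing in the input directly says how many times "r" appears; that information has to be memorized per token rather than read off the characters.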
-
US Tech Companies Losing Traditional IT Advantage in AI
I don't think folks in large US companies realize that their traditional IT advantage over the rest of the world doesn't apply to AI.
-
Uganda’s AI Access Advantage Over US Corporate Restrictions
Everyone in Uganda with a cellphone with data has access to a better AI than most people at large companies in the US, because those companies have more restrictive AI policies and don't get access to GPT-4.
-
Microsoft Offers Free GPT-4-Turbo Access to 169 Countries
Microsoft continues to offer the only free way to access a GPT-4 class model. They just upgraded free Copilot (which used to be called Bing, which was secretly named Sydney) to GPT-4-Turbo in Creative and Precise modes. So people in 169 countries have free access to a top LLM.
-
Anthropic’s Claude Naming Convention Misses the Mark
Anthropic came so close to breaking the weird curse that makes AI companies unable to name their products well. Haiku for the smallest model, Sonnet for the mid-sized model, and then… Opus for the largest model? (Epic was available, and is an actual term for a long poem)
-
AI Feedback Tools for K12 Student Writing: GPT-3.5 Study
Study tries to answer a major question: can AI give good writing feedback to K12 students? Unfortunately, the paper just uses GPT-3.5, which it finds underperforms humans but has “potential as an evaluative tool given tradeoffs between quality and time”
https://sciencedirect.com/science/article/abs/pii/S0959475224000215 …
-
Claude 3 Excels at ASCII Art Better Than GPT-4
Every previous AI model, including GPT-4, is really bad at ASCII art. Claude 3 is really impressive. (As you can see, it does hallucinate when asked to do more "artistic" work, but does really well on more structured outcomes).
-
Gemini 1.5 Video Reasoning: Safety Detection and Temporal Analysis
I don't think people are appreciating what capabilities are possible when AI can reason over an entire video (or live video feed). I gave Gemini 1.5 a video of traffic and asked it to identify dangerous situations, and to guess the year of the video, and got accurate answers.