Skills are easily my favorite Claude feature this year
@whats_ai
-
Monthly AI Model Releases Demand Frequent Evaluation Cycles
Monthly model releases mean we're going to need monthly evals haha
-
Junior vs Senior: Different Perspectives on AI Technology Adoption
The discrepancy is always the junior reaction, "wow it's perfect", versus the more senior one, "why is it doing it like this". Hopefully the senior's reaction comes with openness rather than just thinking it's useless.
-
Embedding Leaderboard Performance Fails on Real Data
Embedding leaderboard wins evaporate the second you swap to your actual corpus haha
-
Critique of AI Model Benchmarks With Limited Testing Scope
Same energy as "we only tested on MMLU" a year ago haha
-
ChatGPT Hallucinations: How Often Does It Fabricate Facts?
Have you ever caught ChatGPT making up a fact you knew was wrong?
-
Grounding: Why Perplexity Cites Sources, ChatGPT Sometimes Doesn’t
Grounding is why Perplexity always cites a source, and ChatGPT sometimes doesn't.
-

Grounding: How Document Upload Improves AI Accuracy
Ever uploaded a document to ChatGPT and asked a question about it? The answer you got came from grounding. When you ask a model a question without any file, it answers from memory. Whatever it learned during training. Sometimes right, sometimes made up. Grounding forces the …
-
AI Engineer Europe Workshop Full Session Now Available
Huge thanks to AI Engineer Europe for inviting us.
Our full 2-hour workshop is on YouTube now: https://youtu.be/mYSRn6PC1mc?si=LY1UnXIb0WlfeQjY …
-
Claude Plays Pokémon Becomes Legitimate Agent Benchmark
Claude Plays Pokémon turning into a legit agent benchmark is awesome