Definitely a big limitation. Overall, the problem with synthetic data is generalization beyond the benchmarks that they target in the first place. This is where the most interesting results can be found.
@maximelabonne
-

Synthetic Pretraining Improves Reasoning in Sub-1B Models
Synthetic pretraining for sub-1B reasoning models
Cool write-up from Tufa Labs (Matteo Saponati) on whether synthetic data augmentation actually helps very small (<1B) models reason better. They pretrain a 0.8B model with the Qwen3 architecture from scratch on 12B tokens of
-
In-Car AI Assistant Translates Speech into Personalized Function Calls
For example, an in-car assistant that translates speech (intents) into function calls with extra personalization
-

Liquid’s Approach to Small Models and Agentic Reinforcement Learning
Quick chat about what Liquid does with small models, unique challenges, and smol agentic RL 👀
Thanks for the invitation @josephinePqt1! https://t.co/CDpKefwLPL
— Maxime Labonne (@maximelabonne) April 28, 2026
-
Real-World AI Use Cases: Function Calling and Data Extraction
Some real-world use cases from the past year: function calling, data extraction, query rewriting, personal assistants, intent classification, complexity routing, data rephrasing, image captioning, object detection, live translation, etc. We actively work in this field at @liquidai!
-
RL Datasets Addition Planned for Next Update
I'm planning to (finally) add RL datasets in the next update
-

LLM Datasets Update: New Collections and Thinking Features
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.
> Added many new datasets
> New "thinking" column
> Refreshed recommended tools
Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
-

LFMs Enable Speech and Reasoning Directly in Vehicle Edge
LFMs powering speech, language understanding, and reasoning directly inside the vehicle. Very exciting times for edge models!
-
Leap-finetune Demonstration Request for AI Model Optimization
Amazing, please do another one with leap-finetune!
-
ColBERT Memes Trend Takes Over Social Media Timeline
My timeline is full of ColBERT memes. How did that happen?