L-Mul: Addition-Only Multiplication can slash computational costs by 80%! Researchers at @MSFTResearch dropped a groundbreaking technique that could slash the energy use in transformer computations: their novel "linear-complexity multiplication" (L-Mul) algorithm
@aymericroucher
-

Simplified old-school RNNs rival modern transformers
Old-school RNNs can actually rival fancy transformers! Remember good old RNNs (Recurrent Neural Networks)? Well, researchers from Mila and @BorealisAI have just shown that simplified versions of decade-old RNNs can match the performance of today's transformers. They took a
-

Chinese AI models expanding globally but underrated
出海 ("going overseas"): Chinese AI is expanding globally. Chinese LLMs are heavily underrated. I regularly feel like Chinese AI releases do not get the recognition they deserve, for instance the recent excellent Deepseek-v2.5 or Qwen models. Luckily for us, @AdinaYakup just wrote an
-

Everyday Transformers Optimizations: KV Cache, FlashAttention, PagedAttention
This blog post is really cool for understanding everyday Transformers optimizations like KV cache, FlashAttention or PagedAttention: https://astralord.github.io/posts/transformer-inference-optimization-toolset/
Image below is the interactive visualization for KV cache!
-

Emu3: single model handles text, images, and videos
> Emu3: Next-token prediction conquers multimodal tasks This is the most important research in months: we’re now very close to having a single architecture to handle all modalities. The folks at BAAI just released Emu3, a single model that handles text, images, and videos all
-
Add source highlighting to your RAG system for trust
> Add source highlighting to your RAG system! 📄💡
— m_ric (@AymericRoucher), October 1, 2024
RAG systems are supposed to make your LLM's answer more trustworthy by inserting into the prompt some supporting documents from a knowledge base: we say that we're "adding some context".
👎 But if you don't know which part of… pic.twitter.com/5KmE7wcMua
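To make "adding some context" concrete, here's a minimal sketch of how supporting documents might be inserted into a prompt. The function name `build_rag_prompt`, the prompt wording, and the two example documents are all made up for illustration; this is not the implementation from the post, and it does not cover the source-highlighting step itself.

```python
def build_rag_prompt(question, documents):
    """Insert retrieved documents into the prompt as numbered context.

    Hypothetical helper: the numbering gives each document an id that
    an answer could later point back to.
    """
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy knowledge-base snippets, invented for this example.
docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
prompt = build_rag_prompt("What is the capital of France?", docs)
print(prompt)
```

Numbering the documents is one simple design choice: it leaves a handle for tracing which passage supported an answer.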
-

Transformers v4.45.0: lightning-fast method to build tools
Transformers v4.45.0 released: includes a lightning-fast method to build tools! During user research with colleagues @MoritzLaurer and Joffrey Thomas, we discovered that the class definition currently used to define a Tool in transformers.agents is a bit tedious to use,
-
Understanding Attention: K and V matrices, masking, and -inf for softmax
This is a must-watch to understand how attention works! Great visualization, explaining:
– Why the K and V matrices? What do they represent?
– Why mask the lower left part of the KV product?
– Why apply -inf to the lower left part of the KV product before softmax rather than just -
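Here's a minimal sketch of that last point in plain Python. It assumes score rows are queries and columns are keys, so future positions sit above the diagonal (the video may lay the matrix out differently). `softmax` and `causal_mask` are hypothetical helper names, and comparing against a fill value of 0 is my own choice of contrast, since the excerpt is cut off.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(scores, fill):
    """Replace scores for future positions (column j > row i) with `fill`."""
    n = len(scores)
    return [
        [scores[i][j] if j <= i else fill for j in range(n)]
        for i in range(n)
    ]


# Toy attention scores, invented for this example.
scores = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]

# Masking with -inf: exp(-inf) is 0, so future positions get zero weight.
masked_inf = [softmax(r) for r in causal_mask(scores, -math.inf)]

# Masking with 0: exp(0) is 1, so future positions still receive weight.
masked_zero = [softmax(r) for r in causal_mask(scores, 0.0)]

print(masked_inf[0])
print(masked_zero[0])
```

With `-inf`, the first row puts all of its weight on the first position. With `0`, the masked positions still leak into the softmax, which is why the fill value has to be applied before the softmax and not as a zero.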
IBM and NASA release open-source AI model for weather and climate
Read the announcement post: https://newsroom.ibm.com/2024-09-23-ibm-and-nasa-release-open-source-ai-model-on-hugging-face-for-weather-and-climate-applications
Model on the Hub: https://huggingface.co/Prithvi-WxC
-

First Foundation Weather Model Prithvi WxC Enables Life-Saving Predictions
> The first ever Foundation weather model: Prithvi WxC enables life-saving weather predictions! Hurricane Katrina killed hundreds of people as it made landfall on New Orleans in 2005 – many of these deaths could have been avoided if alerts had been given one day earlier.
