Meta's REFRAG is quite an interesting optimization layer that works on top of any RAG architecture. Essentially, instead of tokenizing all retrieved chunks, it compresses most into embeddings and feeds them directly to the decoder. An RL policy selectively expands only the
Meta’s REFRAG: An AI Optimization Layer for RAG Architectures
By
–