— Jiqizhixin 机器之心 (@jiqizhixin), April 9, 2026
What if we could train powerful AI retrievers to find specialized information without needing huge, expensive datasets? A team from TU Darmstadt, University of Washington, CMU, Microsoft, and Tencent AI Lab presents Revela!

Revela leverages self-supervised language modeling: it teaches retrievers to capture semantic relationships between document segments by predicting the "next chunk" of information, integrating the retriever's similarity scores directly into the language-modeling objective. Without any annotated query-document pairs, Revela surpasses larger supervised models and proprietary APIs in code and reasoning-intensive domains, and it achieves unsupervised SoTA on general benchmarks with ~1000x less data and 10x less compute!

Revela: Dense Retriever Learning via Language Modeling
Paper: openreview.net/forum?id=e7pA…
Code: github.com/TRUMANCFY/Revela
Model: huggingface.co/trumancai/Rev…
Our report: mp.weixin.qq.com/s/9TmVSNHMQ…

📬 #PapersAccepted by Jiqizhixin
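The core mechanism described above — weighting a next-chunk language-modeling loss by the retriever's in-batch similarity scores so that LM gradients shape the retriever — can be sketched in a few lines. This is a hypothetical NumPy illustration, not the authors' implementation: the toy hash-based embedder stands in for Revela's learned dense encoder, and the precomputed NLL matrix stands in for a real language model's per-context losses.

```python
import zlib
import numpy as np

def embed(chunk, dim=16):
    """Toy deterministic embedder standing in for a learned dense
    retriever (hypothetical; Revela uses a trained encoder)."""
    rng = np.random.default_rng(zlib.crc32(chunk.encode("utf-8")))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieval_weighted_nll(chunks, next_chunk_nll, tau=0.1):
    """Sketch of a Revela-style in-batch objective.

    next_chunk_nll[i, j] is assumed to be the LM negative
    log-likelihood of chunk i's continuation when chunk j is
    prepended as retrieved context. Each row of the loss is
    reweighted by the retriever's softmax similarities, so the
    language-modeling loss backpropagates through the scores.
    """
    E = np.stack([embed(c) for c in chunks])
    sims = (E @ E.T) / tau
    np.fill_diagonal(sims, -np.inf)          # a chunk cannot retrieve itself
    w = np.exp(sims - sims.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # in-batch retrieval distribution
    return float((w * next_chunk_nll).sum(axis=1).mean())

chunks = ["def add(a, b):", "    return a + b", "print(add(1, 2))"]
nll = np.full((3, 3), 2.0)                   # dummy LM losses for illustration
loss = retrieval_weighted_nll(chunks, nll)   # uniform NLL -> loss == 2.0
```

In a real training loop both the encoder and the language model would be differentiable and updated jointly; here the point is only the shape of the objective: similarity scores act as soft retrieval weights over in-batch neighbor chunks.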