here are two awesome researchers you should follow: @WeijiaShi2 at UW and @wzhao_nlp at Cornell!! some of their recent work:
weijia shi (@WeijiaShi2):
– built INSTRUCTOR, the embedding model that lots of startups / companies use (http://arxiv.org/abs/2212.09741)
– proposed a more
@jxmnop
-
Follow Top NLP Researchers: INSTRUCTOR Embedding Model Creators
-
SSM Papers on Image Generation Research
this is cool, I think @NathanYan2012 has written some papers on generating images using SSMs
-
Condensing Token Embeddings into Single Vectors for Compute Efficiency
oh yes I've been thinking about some similar things. so you want to condense sequences of token embeddings into a single vector to save compute. I think my research shows this should work in theory (you can condense lots of text into a vector without losing information)
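A minimal sketch of the simplest version of this idea, assuming plain mean pooling. The tweet does not say which method is meant, and mean pooling is lossy, so this only shows the shape of the operation (many token vectors in, one vector out). `mean_pool` and the toy vectors are hypothetical.

```python
def mean_pool(token_embeddings, mask=None):
    """Average a sequence of token vectors into one vector.

    token_embeddings: list of equal-length lists of floats, one per token.
    mask: optional list of 0/1 flags; positions with 0 (e.g. padding) are skipped.
    """
    if mask is None:
        mask = [1] * len(token_embeddings)
    kept = [vec for vec, keep in zip(token_embeddings, mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    # Average each dimension across the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
pooled = mean_pool(tokens, mask=[1, 1, 0])
```

Downstream compute then scales with one vector per sequence instead of one per token; a learned encoder would be needed to keep more of the information than an average does.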
-
Flash Attention: Why Did Breakthrough Technology Take Years?
if that were true we would have gotten flash attention years earlier
-
GPU Programming and Transformers: Essential ML Systems Knowledge
fewer than 100 people deeply understand both (i) transformers and (ii) the GPU programming model
want to learn machine learning? gain some esoteric systems knowledge; spend some time really learning CUDA
-
Data determines AI systems behavior and outcomes
no, it’s determined by the data, ur missing the point
-
Vector Embeddings vs Word Token Embeddings Distinction
I’m talking about vectors that are the output of vector-encoding models, commonly called “embeddings”. not word/token vectors, also called embeddings, which you’re thinking of
-
Embeddings Model Output vs Token Embeddings Clarification
great question; when I tweet about this, I will always be talking about the output of an embeddings model; if I mean token embeddings I will specify
-
Language Models Weights vs Embeddings Activations Information Encoding
language models encode information in their *weights* while embeddings encode information in their *activations*
this distinction is important, possibly somewhat profound
-
RLHF vs DPO: Training Methods for AI Models Compared
For those without context, I’m referring to this sort of thing. Competition going right now as to whether we need “RLHF” (which I think is just reinforcement learning w/ a trained reward model) or if we can do “DPO” (essentially supervised learning)
no one really knows
https://x.com/yoavartzi/status/1730252149370548598
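For reference, a rough sketch of the DPO objective as it is usually written: a logistic loss on how much more the policy prefers the chosen response over the rejected one, relative to a frozen reference model. That is why it looks like supervised learning, with no separate reward model or sampling loop. The function name, argument names, and the `beta=0.1` default are illustrative assumptions; each input is the summed log-probability of a full response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # How much the policy has shifted toward each response vs. the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: the margin is 0, so the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In an actual training run the log-probabilities would come from the model being trained and a frozen copy of it, averaged over a batch of preference pairs.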