@jxmnop - AI Dynamics

Condensing Token Embeddings into Single Vectors for Compute Efficiency

By

@jxmnop

–

05 December 2023 4h48

oh yes I've been thinking about some similar things. so you want to condense sequences of token embeddings into a single vector to save compute. I think my research shows this should work in theory (you can condense lots of text into a vector without losing information)
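To make the idea concrete, here is a minimal sketch of one common baseline for collapsing a sequence of token embeddings into a single vector: mean pooling. This is an illustration only, not the method the post refers to. The function name `mean_pool` and the toy numbers are hypothetical, and plain averaging is lossy, whereas the post is about condensing text without losing information.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Average a sequence of token vectors into one vector of the same width."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    if any(len(vec) != dim for vec in token_embeddings):
        raise ValueError("all token embeddings must share one dimension")
    n = len(token_embeddings)
    # Average each coordinate across the sequence positions.
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]


# Hypothetical 3-token sequence with 4-dimensional embeddings.
tokens = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 4.0],
    [2.0, 4.0, 1.0, 4.0],
]
pooled = mean_pool(tokens)  # one vector stands in for the whole sequence
```

Downstream layers then process one vector instead of three, which is where the compute saving would come from.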

→ View original post on X — @jxmnop

5 December 2023

Flash Attention: Why Did Breakthrough Technology Take Years?

By

@jxmnop

–

05 December 2023 0h50

if that were true we would have gotten flash attention years earlier

→ View original post on X — @jxmnop

5 December 2023

GPU Programming and Transformers: Essential ML Systems Knowledge

By

@jxmnop

–

04 December 2023 18h27

fewer than 100 people deeply understand both (i) transformers and (ii) the GPU programming model. want to learn machine learning? gain some esoteric systems knowledge; spend some time really learning CUDA

→ View original post on X — @jxmnop

4 December 2023

Data Determines AI System Behavior and Outcomes

By

@jxmnop

–

04 December 2023 17h33

no, it’s determined by the data, ur missing the point

→ View original post on X — @jxmnop

4 December 2023

Vector Embeddings vs Word Token Embeddings Distinction

By

@jxmnop

–

04 December 2023 16h28

I’m talking about vectors that are the output of vector-encoding models, commonly called “embeddings”. not word/token vectors, also called embeddings, which you’re thinking of

→ View original post on X — @jxmnop

4 December 2023

Embeddings Model Output vs Token Embeddings Clarification

By

@jxmnop

–

04 December 2023 12h46

great question; when I tweet about this, I will always be talking about the output of an embeddings model; if I mean token embeddings I will specify

→ View original post on X — @jxmnop

4 December 2023

Language Model Weights vs Embedding Activations: Where Information Is Encoded

By

@jxmnop

–

04 December 2023 4h11

language models encode information in their *weights* while embeddings encode information in their *activations*. this distinction is important, possibly somewhat profound

→ View original post on X — @jxmnop

4 December 2023

RLHF vs DPO: Training Methods for AI Models Compared

By

@jxmnop

–

01 December 2023 3h10

For those without context, I’m referring to this sort of thing. Competition going right now as to whether we need “RLHF” (which I think is just reinforcement learning w/ a trained reward model) or if we can do “DPO” (essentially supervised learning). no one really knows https://x.com/yoavartzi/status/1730252149370548598
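As a rough illustration of why DPO can be described as "essentially supervised learning", here is a sketch of the per-example DPO objective as it is usually written: a logistic loss on the difference between the policy's and a frozen reference model's log-probability margins for a chosen and a rejected response. The function name, the default `beta=0.1`, and the toy log-probabilities are assumptions for illustration, not values from the post.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; treated as a tunable hyperparameter
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical sequence log-probabilities.
# Policy identical to the reference: loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy shifts probability toward the chosen response: loss drops.
better = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

No reward model and no sampling loop appear here: the loss is computed directly from preference pairs, which is the contrast with RLHF that the post draws.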

→ View original post on X — @jxmnop

1 December 2023

Do We Really Need Reinforcement Learning for Language Models?

By

@jxmnop

–

01 December 2023 3h08

It’s funny to me that no one has been able to figure out if we really need reinforcement learning to train language models. are we collectively not smart enough to figure out the math? or is there some theory we don’t have that would make questions like this easier?

→ View original post on X — @jxmnop

1 December 2023

Keras 3 Release Breaks Python Package Test Suites

By

@jxmnop

–

28 November 2023 17h15

I remember when I first started training deep learning models and my whole twitter timeline was the author of keras tweeting about how no one wants to use pytorch. keras 3 came out today and the only reason I heard about it is because it broke one of my python package test suites https://x.com/fchollet/status/1047570406570307584

→ View original post on X — @jxmnop

28 November 2023