It’s funny to me that no one has been able to figure out if we really need reinforcement learning to train language models. are we collectively not smart enough to figure out the math? or is there some theory we don’t have that would make questions like this easier?
@jxmnop
-
Keras 3 Release Breaks Python Package Test Suites
I remember when I first started training deep learning models and my whole twitter timeline was the author of keras tweeting about how no one wants to use pytorch. keras 3 came out today and the only reason I heard about it is because it broke one of my python package test suites https://x.com/fchollet/status/1047570406570307584
-
Model Parameters Training Steps and Downstream Performance
those too, but i think more importantly i'm asking about the relationship between model parameters, training steps (whatever that means in the embeddings case), and downstream performance
-
Scaling Laws for Language Models Explained
no, scaling laws like the scaling laws for language models:
-
Scaling Laws for Embedding Models: Research Initiatives
who's working on scaling laws for embedding models? could be any of:
• text embeddings (DPR, GTR, GTE…)
• image embeddings (SimCLR, DINO…)
• recommendation systems (?)
• multimodal embeddings (CLIP, ImageBind…)
• any other type of embeddings…
-
Nomic AI’s latest developments in open source language models
literally @nomic_ai (cc @andriy_mulyar)
-
HuggingFace Model Quantization on 24GB GPU
I used HuggingFace and quantized the model to 4 bits. I believe it ran on a 24GB GPU (might need 48GB)
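A rough back-of-the-envelope sketch of why 4-bit quantization can decide whether a model fits on a 24GB or a 48GB card. The parameter counts below are hypothetical, picked only for illustration, and the helper `weight_memory_gib` is not part of any library:

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model sizes (assumptions, not taken from the post above)
for n_params in (7e9, 13e9, 34e9, 70e9):
    gib = weight_memory_gib(n_params, bits_per_weight=4)
    print(f"{n_params / 1e9:.0f}B params at 4 bits: ~{gib:.1f} GiB of weights")
```

This counts weights only. Activations, the KV cache, and the quantization metadata (scales and zero points) all add to the total, so real usage will be somewhat higher than these numbers.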
-
Circumventing Biden AI Model Training Restrictions Through Parameter Scaling
the biden executive order put restrictions on models that:
• contain at least 10^9 parameters
• use more than 10^26 floating-point operations
my future company will get around these restrictions by simply training a 9,999,999 billion param model comprised of 8192-bit floats
-
4096 Bit Floats: Advanced Computing Technology Innovation
shhhh, this was gonna be how i won the bet, 4096 bit floats