Totally thrilled to be alongside @GuillaumeLample and @tlacroix6 to create Mistral AI. A lot of work ahead of us!
@arthurmensch
-
ChatGPT’s Ease with Boomer Requests Raises Questions
ChatGPT indeed seems at ease with boomer requests
-
Open vs Closed Source Operating Systems Competition Ahead
If that's indeed an OS, let's prepare to see an interesting replay of closed vs open-source operating systems in the coming years 🙂
-
Deep Learning vs Tree Methods on Small Datasets and Multitask Fine-tuning
I can see how deep learning methods may struggle to catch up with tree methods on small datasets (and you say so in the thread). Did you try multitask fine-tuning, e.g. on Transformers, and see a positive benefit? (We observe one in text, see e.g. Flan/T0.)
-
Empirical Modelling: Approximation Theory and Optimization Analysis
Both are empirical modelling of experimental outcomes. The bottom one has some grounding in approximation theory and optimisation analysis. It also does not tend to 0 when N,D -> infinity, which is a sound property.
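The limit property can be written out under an assumption: the post does not spell out either formula, so the additive form below (an irreducible term plus power-law terms in N and D, as used in the Chinchilla paper) is assumed here purely for illustration, with E, A, B, α and β standing for fitted positive constants.

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad
\lim_{N, D \to \infty} L(N, D) = E > 0
```

With a form like this, the two power-law terms vanish as N and D grow, and the loss settles at E rather than at 0.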
-
TPU Dynamic Sizes Constraints and GPU Attention Efficiency Trade-offs
On TPUs you can't use dynamic sizes in a loop, so you use masks and static sizes, and therefore the first step of the loop is as costly as the last one. On GPUs you could have a faster first step, but for large models the saving from the cheaper attention is relatively small.
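A minimal sketch of the mask-plus-static-size idea, in plain Python rather than any TPU framework. The function name `masked_attention`, the `valid_len` argument and the toy shapes are assumptions made for illustration, not taken from the post.

```python
import math


def masked_attention(query, keys, values, valid_len):
    """Attend over a fixed-size cache, ignoring slots at index >= valid_len.

    Hypothetical helper: requires 1 <= valid_len <= len(keys).
    """
    # A score is computed for every slot, filled or not, so the work done
    # depends on len(keys) (the static size) and not on valid_len.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Padding slots get -inf so they receive zero weight after the softmax.
    masked = [s if i < valid_len else -math.inf for i, s in enumerate(scores)]
    peak = max(masked)
    weights = [math.exp(s - peak) for s in masked]  # exp(-inf) is 0.0
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]
```

Calling this with `valid_len=1` and with `valid_len=len(keys)` loops over the same number of slots, which is why the first step costs as much as the last one in this setting.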
-
Fixed Size Cache Initialization in Sampling Operations
Because you typically initialize caches with a fixed size before sampling
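As a rough illustration of a cache initialized with a fixed size, here is a small Python sketch. The class `FixedSizeCache` and its methods are hypothetical names introduced for this example. Real implementations would use preallocated arrays on the accelerator instead of Python lists.

```python
class FixedSizeCache:
    """Key/value cache preallocated to max_len slots (hypothetical helper)."""

    def __init__(self, max_len, dim):
        # Allocate every slot up front so the shape never changes while sampling.
        self.keys = [[0.0] * dim for _ in range(max_len)]
        self.values = [[0.0] * dim for _ in range(max_len)]
        self.length = 0  # number of slots filled so far

    def append(self, key, value):
        """Write one step into the next free slot instead of growing the cache."""
        if self.length >= len(self.keys):
            raise IndexError("cache is full")
        self.keys[self.length] = list(key)
        self.values[self.length] = list(value)
        self.length += 1

    def mask(self):
        """Return True for filled slots and False for padding."""
        return [i < self.length for i in range(len(self.keys))]
```

The cache always holds `max_len` slots. Only `length` and the mask change from one sampling step to the next.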
-
DeepMind LLM Research at NeurIPS Discussion and Recruitment
At NeurIPS this week. Reach out if you want to discuss our work on LLMs at @DeepMind (Chinchilla, Retro, Flamingo), or if you're interested in working with us!