The second critical ingredient is that while a Transformer seems ~able to act as a general-purpose computer in principle, the training objective has to be hard enough to actually force the optimization to discover and converge onto it in the "weight space" of the network.
LLMs
-
Language Modeling as Universal Learning Objective Through Text Compression
By
–
Turns out language modeling (i.e. ~next word prediction; equivalent to compression) of internet text is this excellent objective – very simple to define and collect data for at scale. It forces the neural net to learn a lot about the world, "multi-tasking" across many domains.
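The equivalence between prediction and compression can be made concrete with a toy count-based bigram model standing in for the neural net (the corpus and helper names here are illustrative, not from the original): the better a model predicts the next token, the fewer bits per token it needs to encode the text.

```python
from collections import Counter, defaultdict
import math

def train_bigram(tokens):
    # Count next-token frequencies for each context token.
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def bits_per_token(counts, tokens):
    # Average code length under the model = cross-entropy in bits;
    # a sharper predictor compresses the same text into fewer bits.
    total_bits = 0.0
    for a, b in zip(tokens, tokens[1:]):
        ctx = counts[a]
        p = ctx[b] / sum(ctx.values())
        total_bits += -math.log2(p)
    return total_bits / (len(tokens) - 1)

text = "the cat sat on the mat the cat ate".split()
model = train_bigram(text)
print(round(bits_per_token(model, text), 3))  # ≈ 0.594 bits/token here
```

A uniform guess over this 7-word vocabulary would cost ~2.8 bits per token; learning the statistics of the text is exactly what cuts that down.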
-
GPT as a General-Purpose Computer Reconfigurable via Natural Language Programs
By
–
If previous neural nets are special-purpose computers designed for a specific task, GPT is a general-purpose computer, reconfigurable at run-time to run natural language programs. Programs are given in prompts (a kind of inception). GPT runs the program by completing the document.
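A minimal sketch of the idea (the helper name and task are illustrative; no model is actually called here): the prompt is a plain-text "program" whose few-shot examples specify the task, and the model "runs" it by completing the document.

```python
# The prompt is a natural-language "program": the few-shot examples
# specify the task (here, English -> French), and an autoregressive
# model "executes" it by completing the text after the final "Output:".
def make_fewshot_prompt(examples, query):
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = make_fewshot_prompt(
    [("cheese", "fromage"), ("dog", "chien")],
    "cat",
)
print(prompt)
```

Swapping in different examples "reprograms" the same model for a different task, without touching its weights.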
-
Transformer: The Critical Unlock Technology for General-Purpose AI
By
–
So the first critical "unlock technology" is the Transformer, a neural net architecture powerful enough to become a general-purpose computer. I've written more about this here: 1) https://x.com/karpathy/status/1582807367988654081 … and 2)
-
Transformers’ In-Context Learning: Emergent Ability at Scale
By
–
The non-obvious crux of the shift is an empirical finding, emergent only at scale, and well-articulated in the GPT-3 paper (https://arxiv.org/abs/2005.14165). Basically, Transformers demonstrate the ability of "in-context" learning. At run-time, in the activations. No weight updates.
-
Neural Language Models: 20 Years of Autoregressive Architecture Evolution
By
–
E.g. ~20 years ago Bengio et al. 2003 (pdf: https://jmlr.org/papers/volume3/bengio03a/bengio03a.pdf) trained a neural language model. The state-of-the-art GPT+friends of today are the exact same (autoregressive) model, except the neural net architecture is upgraded from an MLP to a Transformer.
-
Neural Language Models: From Overlooked Niche to AI Breakthrough
By
–
An interesting historical note is that neural language models have actually been around for a very long time, but no one cared anywhere near as much as people do today. LMs were thought of as specific applications, not as mainline research unlocking new general AI paths and capabilities.
-
GALA Model Code Released on GitHub for Researchers
By
–
The code is available (https://github.com/paperswithcode/galai). So, we can try, but the general public can't.
-
LangChain 0.0.15 Release: Dark Mode and SQL Improvements
By
–
LangChain version 0.0.15 – "the @nlarusstone release":
– improve color highlighting so it looks good in dark mode @nlarusstone (see below)
– add tables to ignore/include in SQL DB chain @nlarusstone
– add concept of document metadata
– add `apply` method to all chains
-
Groq Showcases Compiler, Developer Tools, and RealScale at Supercomputing
By
–
At @Supercomputing? Drop by booth 3047 to:
– Discuss your model! Did you know 500+ models run on our #compiler?
– See dev tools like GroqFlow & GroqAPI, made for fine-grained control
– Discuss RealScale™, the technology that extends performance and #lowlatency from #chip to rack