The second critical ingredient is that while a Transformer seems ~able to act as a general-purpose computer in principle, the training objective has to be hard enough to actually force the optimization to discover and converge onto it in the "weight space" of the network.
LLMs
-
Language Modeling as Universal Learning Objective Through Text Compression
By
–
Turns out language modeling (i.e. ~next word prediction; equivalent to compression) of internet text is this excellent objective – very simple to define and collect data for at scale. It forces the neural net to learn a lot about the world, "multi-tasking" across many domains.
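The equivalence between prediction and compression can be made concrete with a toy count-based bigram model standing in for the neural net (the corpus and helper names here are illustrative, not from the original): the better a model predicts the next token, the fewer bits per token it needs to encode the text.

```python
from collections import Counter, defaultdict
import math

def train_bigram(tokens):
    # Count next-token frequencies for each context token.
    counts = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1
    return counts

def bits_per_token(counts, tokens):
    # Average code length under the model = cross-entropy in bits;
    # a sharper predictor compresses the same text into fewer bits.
    total_bits = 0.0
    for a, b in zip(tokens, tokens[1:]):
        ctx = counts[a]
        p = ctx[b] / sum(ctx.values())
        total_bits += -math.log2(p)
    return total_bits / (len(tokens) - 1)

text = "the cat sat on the mat the cat ate".split()
model = train_bigram(text)
print(round(bits_per_token(model, text), 3))  # ≈ 0.594 bits/token here
```

A uniform guess over this 7-word vocabulary would cost ~2.8 bits per token; learning the statistics of the text is exactly what cuts that down.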
-
GPT as a General-Purpose Computer Reconfigurable via Natural Language Programs
By
–
If previous neural nets are special-purpose computers designed for a specific task, GPT is a general-purpose computer, reconfigurable at run-time to run natural language programs. Programs are given in prompts (a kind of inception). GPT runs the program by completing the document.
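A minimal sketch of the idea (the helper name and task are illustrative; no model is actually called here): the prompt is a plain-text "program" whose few-shot examples specify the task, and the model "runs" it by completing the document.

```python
# The prompt is a natural-language "program": the few-shot examples
# specify the task (here, English -> French), and an autoregressive
# model "executes" it by completing the text after the final "Output:".
def make_fewshot_prompt(examples, query):
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = make_fewshot_prompt(
    [("cheese", "fromage"), ("dog", "chien")],
    "cat",
)
print(prompt)
```

Swapping in different examples "reprograms" the same model for a different task, without touching its weights.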
-
Transformer: The Critical Unlock Technology for General-Purpose AI
By
–
So the first critical "unlock technology" is the Transformer, a neural net architecture powerful enough to become a general-purpose computer. I've written more about this here: 1) https://x.com/karpathy/status/1582807367988654081 … and 2)
-
Transformers’ In-Context Learning: Emergent Ability at Scale
By
–
The non-obvious crux of the shift is an empirical finding, emergent only at scale, and well-articulated in the GPT-3 paper (https://arxiv.org/abs/2005.14165). Basically, Transformers demonstrate the ability of "in-context" learning. At run-time, in the activations. No weight updates.
-
Neural Language Models: 20 Years of Autoregressive Architecture Evolution
By
–
E.g. ~20 years ago Bengio et al. 2003 (pdf: https://jmlr.org/papers/volume3/bengio03a/bengio03a.pdf) trained a neural language model. The state-of-the-art GPT+friends of today are the exact same (autoregressive) model, except the neural net architecture is upgraded from an MLP to a Transformer.
-
Neural Language Models: From Overlooked Niche to AI Breakthrough
By
–
An interesting historical note is that neural language models have actually been around for a very long time, but no one cared anywhere near as much as people do today. LMs were thought of as specific applications, not as mainline research unlocking new general AI paths and capabilities.
-
GALA Model Code Released on GitHub for Researchers
By
–
The code is available (https://github.com/paperswithcode/galai). So, we can try, but the general public can't.
-
LangChain 0.0.15 Release: Dark Mode and SQL Improvements
By
–
LangChain version 0.0.15 – "the @nlarusstone release":
– improve color highlighting so it looks good in dark mode @nlarusstone (see below)
– add tables to ignore/include in SQL DB chain @nlarusstone
– add concept of document metadata
– add `apply` method to all chains
-
Groq Showcases Compiler, Developer Tools, and RealScale at Supercomputing
By
–
At @Supercomputing? Drop by booth 3047 to:
– Discuss your model! Did you know 500+ models run on our #compiler?
– See dev tools like GroqFlow & GroqAPI, made for fine-grained control
– Discuss RealScale™, the technology that extends performance and #lowlatency from #chip to rack