Feels like a lot of fertile ground is left in managing the "attention" of an LLM during its training via a meta-learning policy, instead of the typical "memorize the dataset uniformly at random" strategy. And in giving it a calculator and a scratchpad.
LLMs
-
Training Strategies: Skimming, Filtering Noise, and Revisiting Content
By
–
More generally, a few remarkable strategies people use during their own training:
1) skim text because they already know it
2) ignore text because it's clearly noise (e.g. they won't memorize SHA256 hashes. LLMs will.)
3) revisit parts that are learnable but not yet learned
-
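The three strategies above can be sketched as a loss-based selection policy. This is purely illustrative: the function name, thresholds, and loss-history format are all assumptions, not part of any real training setup.

```python
import numpy as np

def select_batch(losses, loss_history, low=0.5, high=8.0):
    """Toy selection policy over per-example LM losses (illustrative
    thresholds, not tuned values). Mirrors the three strategies:
      1) skim: drop examples the model already knows (loss < `low`)
      2) filter noise: drop examples whose loss stays near `high`
         across presentations (a SHA256 hash never gets easier)
      3) revisit: keep everything learnable but not yet learned
    """
    losses = np.asarray(losses, dtype=float)
    keep = losses > low                    # 1) already learned -> skim past it
    for i, hist in enumerate(loss_history):
        if len(hist) >= 2 and hist[-1] > high and hist[-1] >= hist[0] - 0.5:
            keep[i] = False                # 2) unlearnable noise -> ignore it
    return keep                            # 3) the rest gets revisited

# ex0 is already memorized, ex1 is learnable, ex2 looks like a random hash
mask = select_batch([0.1, 3.2, 9.5],
                    [[0.4, 0.1], [5.0, 3.2], [9.6, 9.5]])
```

Only the middle example survives: the memorized one is skimmed and the hash-like one is filtered out.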
Examples vs. Presentations: Spaced Repetition in LLM Training
By
–
Is it the number of examples that matters, or the number of presentations to the model during training? E.g. humans use spaced repetition to memorize facts, but there is no equivalent technique in LLMs, where the typical training regime is uniform random sampling.
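One way to make the question concrete: a toy Leitner-style scheduler over training examples, where an example's revisit interval doubles whenever the model "recalled" it (its loss improved) and resets when it didn't. Everything here is a sketch of what such a regime could look like, not an existing LLM training technique.

```python
import heapq

class SpacedRepetitionSampler:
    """Toy spaced-repetition scheduler for training examples.

    Instead of sampling uniformly at random, each example carries a
    revisit interval: recalled examples wait twice as long before
    their next presentation; forgotten ones come back immediately.
    """
    def __init__(self, example_ids):
        self.step = 0
        self.interval = {i: 1 for i in example_ids}
        self.queue = [(0, i) for i in example_ids]  # (due_step, example_id)
        heapq.heapify(self.queue)

    def next_batch(self, size):
        """Return up to `size` examples that are due at this step."""
        self.step += 1
        due = []
        while (self.queue and self.queue[0][0] <= self.step
               and len(due) < size):
            due.append(heapq.heappop(self.queue)[1])
        return due

    def report(self, example_id, improved):
        # Double the gap if the model "recalled" it, else reset to 1.
        self.interval[example_id] = (
            self.interval[example_id] * 2 if improved else 1)
        heapq.heappush(self.queue,
                       (self.step + self.interval[example_id], example_id))
```

Under this scheduler the number of presentations per example diverges from the number of examples, which is exactly the distinction the question is after.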
-
LangChain 0.0.13 Release: Vector DB QA and Documentation Updates
By
–
LangChain Version 0.0.13:
– Question/Answering w/ a vector DB chain (demo coming tmrw)
– Loading a prompt from a text file (@edmarferreira's first commit!)
– Misc cleanup (Eugene x4!!!)
– Big documentation overhaul (w/ Eugene again)
-
GPT-2’s Inscrutable Internal Matrices Remain Poorly Understood
By
–
Nobody's ever even going to understand how GPT-2 worked, except that there sure were a lot of inscrutable matrices in there.
-
Hybrid AI Systems More Practical Than Full LLM Retraining
By
–
I love the dramatic futurism here! But it's more likely they will be hybrid systems. You don't want to retrain an LLM every time a Wikipedia page is changed or a news item is published. (Plus, Google would be in the best place to deliver this new system, incrementally.)
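A minimal sketch of the hybrid idea: keep the LLM frozen and feed it fresh facts through retrieved context, so editing a document requires no retraining. The keyword-overlap retriever below is a deliberately toy stand-in for a real search index; the function names are made up for illustration.

```python
def retrieve(query, documents, k=2):
    """Toy keyword-overlap retriever; a stand-in for a real search
    index. Editing `documents` updates answers with no retraining."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # The frozen LLM reads fresh facts from retrieved context
    # instead of from its weights.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

When a Wikipedia page changes, only the document store changes; the model and its weights stay put.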
-
When We Realize We’re Just Stochastic Parrots
By
–
what happens when we realize we were just stochastic parrots all along?
-
LangChain 0.0.12 Release with AI21Labs and Manifest Integrations
By
–
LangChain version 0.0.12: Two super exciting integrations!
– @AI21Labs integration (from a friend of @YuvalinTheDeep)
– Integration with @HazyResearch's manifest library (with help from @laurel_orr1)
-
AGI Parameter Count vs. Brain Synapses: Current Models at 1/1000 Scale
By
–
A common view is that human-level AGI will require a parameter count on the order of magnitude of the brain's 100 trillion synapses. The large language models and image generators are only about 1/1000 of that, but they already contain more information than a single human.
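The arithmetic behind the 1/1000 figure works out as follows (the GPT-3 comparison is my addition, not from the tweet):

```python
synapses = 100e12             # ~100 trillion synapses in the human brain
llm_params = synapses / 1000  # "about 1/1000 of that"
# 1e14 / 1e3 = 1e11: roughly 100 billion parameters, which matches
# the scale of current large models (GPT-3, for instance, has 175B).
```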
-
LangChain 0.0.11 Release: New Embeddings and Text Splitting Features
By
–
LangChain version 0.0.11:
– @CohereAI embedding support from @abdrahman_issam
– @NLTK_org and @spacy_io support for text splitting from @deliprao
– "Optimized Prompts" from @sjwhitmore
– misc cleanup from @deliprao and @nlarusstone
https://github.com/hwchase17/langchain