AI Dynamics

Global AI News Aggregator

GPT-3 Training Data Composition Breakdown

GPT-3 was trained on 45 TB of text data drawn from several categories:
⬩ Common Crawl (8 years of raw web page crawling)
⬩ WebText (the text of web pages linked from Reddit posts with 3+ upvotes)
⬩ Books (internet-based books corpora)
⬩ Wikipedia

The data is then "weighted" as such:

→ View original post on X — @aibreakfast