AI Dynamics

Global AI News Aggregator

SlimPajama: Open-Source Cleaned RedPajama Dataset Released

We recently announced the availability of SlimPajama, an open-source, cleaned, and deduplicated version of RedPajama-1T. It is half the size and trains twice as fast, and when upsampled, it performs as well as or better than RedPajama. See below for the dataset and preprocessing library.

→ View original post on X (@cerebras)