AI Dynamics

Global AI News Aggregator

Google AI Releases Improved MC4 Corpus and uMT5 Models

Sharing a piece of work I contributed to while at @GoogleAI:

* A new, improved mC4 corpus (29T char tokens and 107 languages) that gets language sampling right with UniMax sampling.
* Open-source pretrained uMT5 models trained on 1T tokens.
* UniMax sampling solves some…

→ View original post on X: @yitayml
