Google AI Releases Improved mC4 Corpus and uMT5 Models

AI Dynamics

Global AI News Aggregator


19 April 2023 6h37

Sharing a piece of work I contributed to while at @GoogleAI:
* a new, improved mC4 corpus (29T char tokens and 107 languages) that gets language sampling right with UniMax sampling.
* open-source pretrained uMT5 models trained on 1T tokens.
* UniMax sampling solves some…

→ View original post on X: @yitayml, 19 April 2023


