AI Dynamics

Global AI News Aggregator

Why no 70B model for 100k context windows

Also, why no 70B model?
1) Because of the 4x smaller dataset (compared to Llama 2 pretraining)?
2) To make 100k context windows work hardware-wise?
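On the hardware point, a rough back-of-envelope calculation shows why 100k-token contexts are punishing for a dense 70B-class model: the KV cache alone grows linearly with context length. The model dimensions below (80 layers, hidden size 8192, fp16, standard multi-head attention with no grouped-query attention) are assumptions modeled on Llama-2-70B, not figures from the post.

```python
# Sketch: KV-cache memory for a hypothetical 70B-class dense model
# at a 100k-token context. Dimensions are assumed, not from the post.

def kv_cache_bytes(n_layers: int, hidden_size: int, seq_len: int,
                   bytes_per_param: int = 2) -> int:
    """Bytes for the K and V caches across all layers for ONE sequence,
    assuming full multi-head attention (no grouped-query attention)
    and fp16 storage (2 bytes per value)."""
    # Factor of 2 = one cache for keys plus one for values.
    return 2 * n_layers * hidden_size * seq_len * bytes_per_param

total = kv_cache_bytes(n_layers=80, hidden_size=8192, seq_len=100_000)
print(f"{total / 1e9:.0f} GB per sequence")  # ~262 GB, far beyond a single GPU
```

Grouped-query attention shrinks this proportionally to the reduction in KV heads (e.g. 64 query heads sharing 8 KV heads would cut the figure 8x, to roughly 33 GB per sequence), which is one reason long-context models lean on it.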

→ View original post on X: @rasbt
