Why no 70B model for 100k context windows
Also, why no 70B model?
1) Because of the 4x smaller dataset (compared to Llama 2 pretraining)?
2) To make 100k context windows work hardware-wise? (See the back-of-envelope sketch below.)
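
For what it's worth on point (2): the dominant long-context inference cost is the attention KV cache, which grows linearly with sequence length, layer count, and KV-head count. Here's a minimal back-of-envelope sketch; the architecture numbers (80 layers, 8 KV heads via grouped-query attention, head dim 128, fp16) are my assumptions for a Llama-2-70B-like config, not anything stated here:

```python
# Back-of-envelope KV-cache size for a hypothetical 70B-class model serving
# a 100k-token context. All architecture numbers are assumptions
# (Llama-2-70B-like: 80 layers, grouped-query attention with 8 KV heads,
# head dim 128, fp16 = 2 bytes per element).

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for the attention KV cache of one sequence.

    The leading factor of 2 accounts for storing both keys and values
    at every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

ctx = 100_000
cache = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=ctx)
print(f"KV cache per sequence at {ctx:,} tokens: {cache / 2**30:.1f} GiB")
# -> ~30.5 GiB per sequence, on top of ~140 GB of fp16 weights for 70B params.
# Without grouped-query attention (64 full KV heads) it would be ~244 GiB.
```

Even with grouped-query attention, each 100k-token sequence would need roughly 30 GiB of cache on top of the weights, forcing a multi-GPU deployment for a 70B model at that context length, so the hardware explanation seems plausible.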