
A new open-source model with a 10M-token context window is out. “We gradually increased context size from 32K →… 4M → 10M. This allowed us to prioritize pretraining with shorter sequences in the beginning, thereby offering higher utilization rates.”
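The quote describes a staged (progressive) context-length schedule: training starts at a short context and moves to longer ones in steps. The sketch below is illustrative only and is not the authors' code. The names `SCHEDULE`, `context_length_at` and `sequences_per_batch` are hypothetical. The step boundaries and the 16M-token batch budget are invented. K and M are assumed to mean powers of two. The intermediate stages the quote elides ("…") are left out rather than guessed. The sketch assumes the token budget per batch stays fixed, so a shorter context simply packs more sequences into each batch.

```python
from bisect import bisect_right

# Assumption: K and M are read as powers of two.
K = 1024
M = 1024 * 1024

# Hypothetical staged schedule: (first_step, context_length_in_tokens).
# 32K, 4M and 10M come from the quote; the step boundaries are invented,
# and the intermediate stages elided in the quote are deliberately omitted.
SCHEDULE = [
    (0, 32 * K),
    (90_000, 4 * M),
    (99_000, 10 * M),
]


def context_length_at(step, schedule=SCHEDULE):
    """Return the context length in effect at a given training step."""
    starts = [start for start, _ in schedule]
    idx = bisect_right(starts, step) - 1  # last stage whose start <= step
    if idx < 0:
        raise ValueError("step precedes the first stage")
    return schedule[idx][1]


def sequences_per_batch(step, tokens_per_batch=16 * M, schedule=SCHEDULE):
    """Hold the token budget fixed: shorter contexts mean more sequences per batch."""
    seq_len = context_length_at(step, schedule)
    return max(1, tokens_per_batch // seq_len)


if __name__ == "__main__":
    for step in (0, 95_000, 100_000):
        print(step, context_length_at(step), sequences_per_batch(step))
```

With standard self-attention, the attention cost per token grows with sequence length. Spending most early steps on many short sequences is therefore one plausible way a schedule like this keeps hardware busier. The quote does not say which attention variant or batching scheme the model actually uses.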

