It's been a week since LLaMA 3 dropped. In that time, we've:
– extended context from 8K -> 128K
– trained multiple ridiculously performant fine-tunes
– got inference working at 800+ tokens/second

If Meta keeps releasing OSS models, closed providers won't be able to compete.
LLaMA 3 Extended Context Window Reaches 128K Tokens