The model card has some more interesting info too: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
… Note that Llama 3 8B is actually somewhere in the territory of Llama 2 70B, depending on which benchmark you look at. This might seem confusing at first, but note that the former was trained for 15T tokens, while the latter was trained for only 2T tokens.