DBRX Model Uses GPT-4 Tokenizer for Improved Efficiency

One of the more interesting things about the new DBRX model is that it uses the GPT-4 tokenizer. Compared to the LLaMA tokenizer used by Mixtral, it is roughly 20% more efficient, producing fewer tokens for the same text. So while both Mixtral and DBRX offer a 32K-token context length, DBRX can effectively fit ~20% more text into that window.
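To see the efficiency gap concretely, here is a minimal sketch that counts tokens for the same text with both tokenizers. It assumes the tiktoken and transformers packages are installed and that the Mixtral tokenizer can be fetched from the Hugging Face hub (the model id below is one plausible choice; any LLaMA-family tokenizer behaves the same way, and the exact ratio will vary with the input text).

```python
import tiktoken
from transformers import AutoTokenizer

# Sample text; the measured ratio depends on the content being tokenized.
text = (
    "Databricks released DBRX, a mixture-of-experts model with a "
    "32K-token context window. "
) * 50

# GPT-4's tokenizer (cl100k_base), which DBRX also uses.
gpt4_enc = tiktoken.get_encoding("cl100k_base")
gpt4_tokens = len(gpt4_enc.encode(text))

# A LLaMA-family tokenizer, as used by Mixtral (model id is an assumption).
llama_tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
llama_tokens = len(llama_tok.encode(text, add_special_tokens=False))

print(f"cl100k_base (DBRX/GPT-4): {gpt4_tokens} tokens")
print(f"LLaMA (Mixtral):          {llama_tokens} tokens")
print(f"The LLaMA tokenizer uses {llama_tokens / gpt4_tokens - 1:.1%} "
      f"more tokens for the same text")
```

Since both models cap the context at a fixed number of tokens, a tokenizer that needs fewer tokens per character directly translates into more usable text within the same limit.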