Meta released MobileLLM – 125M, 350M, 600M, 1B model checkpoints! Notes on the release: Depth vs. Width: Contrary to the scaling law (Kaplan et al., 2020), depth is more critical than width for small LLMs, enhancing abstract concept capture and final performance Embedding
Meta Releases MobileLLM: Depth Critical Over Width
By
–
Leave a Reply