Here are some details on Meta's 24k H100 cluster pods that we use for Llama3 training.
* Network: two variants of the pods, one on RoCEv2 and one on InfiniBand; Llama3 trains on the RoCEv2 pods
* Storage: NFS/FUSE based on Tectonic/Hammerspace
* Stock PyTorch: no modifications that aren't already upstreamed
* NCCL with some modifications
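To give a feel for what NCCL is doing over that RoCEv2 fabric: the workhorse collective in data-parallel training is all-reduce, and ring all-reduce is one of the classic algorithms NCCL implements. Below is a pure-Python simulation of the ring algorithm for illustration only; it is not Meta's code, and the buffers here stand in for per-GPU gradient shards.

```python
from copy import deepcopy

def ring_allreduce(ranks):
    """Simulate ring all-reduce over `ranks`: a list of equal-length
    numeric lists, one buffer per simulated GPU. Returns the buffers
    with every position holding the sum across all ranks."""
    n = len(ranks)
    assert len(ranks[0]) % n == 0, "buffer must split evenly into n chunks"
    csize = len(ranks[0]) // n
    bufs = deepcopy(ranks)

    # Phase 1: reduce-scatter. At each step, rank r passes one chunk to
    # rank (r+1) % n, which accumulates it. After n-1 steps, rank r holds
    # the fully reduced chunk (r+1) % n.
    for step in range(n - 1):
        snap = deepcopy(bufs)          # model simultaneous sends
        for r in range(n):
            c = (r - step) % n         # chunk index travelling from rank r
            dst = (r + 1) % n
            for i in range(c * csize, (c + 1) * csize):
                bufs[dst][i] += snap[r][i]

    # Phase 2: all-gather. Each rank forwards its completed chunk around
    # the ring; receivers overwrite rather than accumulate.
    for step in range(n - 1):
        snap = deepcopy(bufs)
        for r in range(n):
            c = (r + 1 - step) % n     # completed chunk being forwarded
            dst = (r + 1) % n
            for i in range(c * csize, (c + 1) * csize):
                bufs[dst][i] = snap[r][i]

    return bufs
```

Each rank sends and receives only 2(n-1)/n of the buffer in total, which is why the ring algorithm keeps per-link bandwidth flat as the number of GPUs grows.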