Perplexity runs on NVIDIA. Nice breakdown from the team on how they’re using the CUTLASS Python stack to optimize their models for inference
By
–

Perplexity runs on NVIDIA. Nice breakdown from the team on how they’re using the CUTLASS Python stack to optimize their models for inference