Inference engines have different optimizations for different hardware. The most optimized one for DGX Spark: TensorRT-LLM. Inference engines MATTER and they are NOT EQUAL (e.g. the blog post below). BTW, running open-source models is NOT just a memory-size issue.
TensorRT-LLM Optimized for DGX Spark: Inference Engines Matter