For a specific task, such as a large matrix multiply, there are libraries that achieve nearly optimal performance across different CPU and GPU architectures, but only because programmers have gone and hand-written each case. Generally, software does “just work” across lots of
Hand-written optimization libraries achieve near-optimal performance across architectures