Large Transformers are powerful but expensive to train & use. The extremely high inference cost is a big bottleneck for adopting them for solving real-world tasks at scale. Check out my new post on some ideas on inference optimization for Transformers:
Transformer Inference Optimization: Reducing Computational Costs
By
–