3/5 Trying to run SWE-bench eval as-is on k8s at large scale wasn't trivial:
– Fresh pods have no cache, so every pod re-downloads the world (hello, HF 429s).
– “docker run inside k8s” works on paper, then dies from contention, privileges, and overhead. It worked, but
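One common way to soften the HF 429s mentioned above is to retry rate-limited requests with capped exponential backoff and jitter. That way many cold pods that get throttled at the same moment don't all retry in lockstep. The Python sketch below is a generic illustration under stated assumptions, not the setup used in this thread. `fetch_with_backoff` and its `fetch` callable, which returns a status code and body, are hypothetical names. It also does nothing about the underlying problem of every pod starting with an empty cache.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical interface: fetch() performs one request and returns (HTTP status, body).
FetchFn = Callable[[], Tuple[int, bytes]]


def fetch_with_backoff(
    fetch: FetchFn,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bytes:
    """Call fetch(), retrying on HTTP 429 with capped exponential backoff and full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            # Only rate limiting is retried; any other error fails immediately.
            if status >= 400:
                raise RuntimeError(f"request failed with HTTP {status}")
            return body
        if attempt == max_retries:
            break
        # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2^attempt)]
        # so pods throttled at the same moment spread their retries out.
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng.uniform(0, cap))
    raise RuntimeError(f"still rate-limited (HTTP 429) after {max_retries} retries")
```

The `sleep` and `rng` parameters are injectable so the retry schedule can be exercised without real waiting or network access.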
SWE-bench Evaluation Challenges at Kubernetes Scale