If a video generation model is allowed to use a non-trivial amount of inference-time compute, how much can it improve generation quality for challenging text prompts?
This is exactly what a new paper from Tsinghua University explores.
Video Generation Models Leverage Inference Compute for Quality
