I think focusing on a simple concept of information theory is missing the point. The issue here is more practical. If you need detailed captions for 5 billion videos, it would take you a huge amount of effort, incentives, and money to get good data from humans. If you google alt
Scaling Video Captioning: Practical Challenges Beyond Information Theory
By
–