Presenting AVFormer, a simple method for injecting visual information into frozen speech models for zero-shot audiovisual (AV) automatic speech recognition (ASR). Read about how AVFormer achieves state-of-the-art AV-ASR performance and more → https://goo.gle/3IU40P3
AVFormer Achieves State-of-the-Art Audiovisual Speech Recognition