CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Paper page: https://huggingface.co/papers/2306.09635
… Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text …