One-Channel Stack:
> Trained on 20M hours of audio
> Primary checkpoint initialized from a language model pretrained on 2T text tokens
> Text-pretrained model shows higher coherence in subjective evaluations

Two-Channel Hertz-lm:
> Predicts two quantized latents for two separate
Two-Channel Audio Models with Text Pretraining Architecture
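The sketch below is a rough illustration of a model that predicts two quantized latents at each step. It makes three assumptions. First, the two latents belong to two separate audio channels. Second, both latents come from one shared hidden state through two separate output heads. Third, each head picks a codebook index greedily. The function names (`predict_two_channel`, `make_head`), the sizes, and the decoding rule are hypothetical and are not taken from Hertz-lm.

```python
import random

# Hypothetical sizes for illustration only; not taken from Hertz-lm.
HIDDEN_DIM = 8
CODEBOOK_SIZE = 16


def linear(weights, bias, x):
    """Dense layer: returns weights @ x + bias as a list of logits."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def argmax(values):
    """Index of the largest value (greedy choice of a quantized latent)."""
    return max(range(len(values)), key=values.__getitem__)


def make_head(rng):
    """Hypothetical output head: random weights mapping a hidden state to codebook logits."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(CODEBOOK_SIZE)]
    bias = [0.0] * CODEBOOK_SIZE
    return weights, bias


def predict_two_channel(hidden, head_a, head_b):
    """Predict one quantized-latent index per channel from a shared hidden state."""
    weights_a, bias_a = head_a
    weights_b, bias_b = head_b
    # Both channels read the same hidden state but use their own head.
    logits_a = linear(weights_a, bias_a, hidden)
    logits_b = linear(weights_b, bias_b, hidden)
    return argmax(logits_a), argmax(logits_b)


rng = random.Random(0)
head_a, head_b = make_head(rng), make_head(rng)
hidden = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]
print(predict_two_channel(hidden, head_a, head_b))
```

Sharing the hidden state lets both channels condition on the same context, while separate heads let each channel choose its own latent. Sampling from the logits instead of taking the argmax is a common alternative decoding rule.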