AI Dynamics

Global AI News Aggregator

Training generator and cascader models on 300k+ hours of audio

More than 300k hours of audio clips were used to train two models: a "generator model" that turns text into an intermediate representation, and a "cascader model" that takes this intermediate representation and produces high-quality audio.
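To make the two-stage flow concrete, here is a minimal toy sketch in Python. It is not the actual system: the function names (`toy_generator`, `toy_cascader`, `synthesize`), the integer-token intermediate representation, the sample rate, and the token-to-pitch mapping are all assumptions made for illustration. It only shows how a text-to-representation stage and a representation-to-audio stage chain together.

```python
import math
from typing import List

SAMPLE_RATE = 8000  # assumed sample rate for this toy example


def toy_generator(text: str) -> List[int]:
    """Stand-in for the "generator model": text -> intermediate representation.

    Here the representation is one integer token per character; a real
    model would learn this mapping from training data.
    """
    return [ord(ch) % 64 for ch in text]


def toy_cascader(tokens: List[int], samples_per_token: int = 80) -> List[float]:
    """Stand-in for the "cascader model": intermediate representation -> audio.

    Each token is rendered as a short sine tone; a real model would
    synthesize a waveform learned from the training audio.
    """
    audio: List[float] = []
    for token in tokens:
        freq = 200.0 + 10.0 * token  # hypothetical token-to-pitch mapping
        for n in range(samples_per_token):
            audio.append(math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE))
    return audio


def synthesize(text: str) -> List[float]:
    """Chain the two stages: text -> tokens -> waveform samples."""
    return toy_cascader(toy_generator(text))
```

Calling `synthesize("hi")` returns a list of waveform samples, 80 per input character with the default settings. The point of the split is that the second stage never sees the text, only the intermediate representation produced by the first.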

→ View original post on X — @aibreakfast