i'll take blogposts as long as they come with open artefacts on the hub
@reach_vb
-
Anticipating a Whisper Speech Recognition Model Refresh
i was hoping for another whisper refresh actually
-
Smaller Multilingual Models Under 1B Parameters
go even smaller than 2B & multilingual – sub 1B is perfect size!
-
Autoregressive Latent Diffusion Model for Video Generation
> autoregressive latent diffusion model
> trained on large video datasets
> latent frames pass through an autoencoder to a transformer dynamics model
> uses a causal mask similar to LLMs
> inference involves frame-by-frame autoregressive sampling with past frames
— Vaibhav (VB) Srivastav (@reach_vb) December 4, 2024
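The steps quoted above describe the inference loop at a high level: encode frames to latents, run a causally masked dynamics model over the past latents, and sample the next frame conditioned only on what came before. Below is a toy numpy sketch of that loop under stated assumptions — the `dynamics_model` here is a hypothetical stand-in (a decayed average plus noise), not the actual transformer, and the context length and noise scale are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8   # size of one latent frame (toy value)
CONTEXT = 4      # how many past latent frames the dynamics model sees

def dynamics_model(past_latents):
    """Stand-in for the transformer dynamics model.

    A real model would attend over past_latents under a causal mask;
    here we just predict the next latent as a recency-weighted average."""
    weights = np.arange(1, len(past_latents) + 1, dtype=float)
    weights /= weights.sum()
    return (np.stack(past_latents) * weights[:, None]).sum(axis=0)

def sample_video(first_latent, n_frames):
    """Frame-by-frame autoregressive sampling: each new latent frame is
    conditioned only on previously generated frames, LLM-style."""
    latents = [first_latent]
    for _ in range(n_frames - 1):
        context = latents[-CONTEXT:]                    # sliding causal context
        nxt = dynamics_model(context)
        nxt = nxt + 0.01 * rng.standard_normal(LATENT_DIM)  # stochastic sampling step
        latents.append(nxt)
    return np.stack(latents)

latents = sample_video(rng.standard_normal(LATENT_DIM), n_frames=6)
print(latents.shape)  # (6, 8): one latent vector per generated frame
```

In the real model each latent would then go back through the autoencoder's decoder to produce pixels; the sketch stops at the latent sequence.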
-
DeepMind Genie 2: Multimodal World Model Generates 3D Environments
DeepMind COOKED! Genie 2, a large-scale, multi-modal foundation world model! 🔥
Capable of creating endless action-controllable, playable 3D environments – the future is going to be so, so wild!
— Vaibhav (VB) Srivastav (@reach_vb) December 4, 2024
-
Model Distribution Evolution: Stable Diffusion to Llama 3.1
Wild how the distribution of models changes so, soo much over the two years!
We went from Stable Diffusion v1.4 to Mixtral to Llama 3.1 8B 🔥
— Vaibhav (VB) Srivastav (@reach_vb) December 4, 2024
-
Indic-Parler TTS: Speech Synthesis Model for 20 Indian Languages
Introducing Indic-Parler TTS – Trained on 10K hours of data, 938M params, supports 20 Indic languages, emotional synthesis, apache 2.0 licensed! 🔥
A collaboration w/ @ai4bharat & @huggingface – w/ fully customisable speech and voice personas!
Try it out directly below or use…
— Vaibhav (VB) Srivastav (@reach_vb) December 3, 2024