Hertz-lm:
> 6.6B parameters, 32-layer decoder-only transformer
> Context of 2048 input tokens (~4.5 mins)
> Predicts 15-bit compressed versions of hertz-codec tokens
@reach_vb
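The context and token figures above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes one token per hertz-codec frame at 8 frames per second (the 8Hz rate comes from the hertz-codec entry in this feed), and the variable names are made up for illustration.

```python
# Figures quoted in the post.
CONTEXT_TOKENS = 2048   # input tokens in the context window
BITS_PER_TOKEN = 15     # each predicted token is a 15-bit code

# Assumption: one token per hertz-codec latent frame, 8 frames per second.
FRAMES_PER_SECOND = 8

# Audio duration covered by a full context window.
context_seconds = CONTEXT_TOKENS / FRAMES_PER_SECOND
context_minutes = context_seconds / 60

# Bitrate of the 15-bit token stream under the same assumption.
token_bitrate_bps = BITS_PER_TOKEN * FRAMES_PER_SECOND

print(f"context: {context_seconds:.0f} s ({context_minutes:.2f} min)")
print(f"token stream: {token_bitrate_bps} bits/s")
```

Under that assumption a full context covers a little over four minutes of audio, in the same range as the quoted ~4.5 mins.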
-
Hertz-lm: 6.6B Parameter Audio Language Model Released
-
Hertz-codec: Advanced Convolutional Audio VAE Outperforms Competitors
Hertz-codec:
> Convolutional audio VAE
> Encodes 16kHz mono speech to 8Hz latent representation at 1kbps
> 32-dim latent per 125ms frame
> Outperforms Soundstream and Encodec at 6kbps, on par with DAC at 8kbps
> 5M encoder, 95M decoder parameters
-
Hertz-dev: 8.5B Parameter Open-Source Audio Model
Hertz-dev – 8.5 billion parameters, full-duplex, audio-only base model, APACHE 2.0 licensed 🔥
— Vaibhav (VB) Srivastav (@reach_vb) November 10, 2024
> Trained on 20 million hours of audio
Train on any downstream task: speech-to-speech, translation, classification, speech recognition, text-to-speech and more!
GG @si_pbc 🤗 pic.twitter.com/MlCt6njDGY
-
Gemini 1.5 Pro Outscores OpenAI o1-preview on FrontierMath Benchmark
Wait wtf!? @GoogleDeepMind Gemini 1.5 Pro outscoring @OpenAI O1-preview on FrontierMath :O Even @AnthropicAI 3.5 Sonnet (new) beats it!
-
GPT-4o Advanced Voice Mode Launch Announcement
Open GPT-4o Advanced Voice Mode ⚡️ https://t.co/Yae2nFv3gF
— Vaibhav (VB) Srivastav (@reach_vb) November 5, 2024
-
Fish Agent v0.1: New Multilingual Speech-to-Speech Model Released
Wow! New Speech to Speech model – Fish Agent v0.1 3B by @FishAudio 🔥
— Vaibhav (VB) Srivastav (@reach_vb) November 5, 2024
> Trained on 700K hours of multilingual audio
> Continue-pretrained version of Qwen-2.5-3B-Instruct for 200B audio & text tokens
> Zero-shot voice cloning
> Text + audio input / Audio output
> Ultra-fast… pic.twitter.com/UvdwxGUm4w
-
Using PEFT Adapters in Your Machine Learning Projects
If you have a PEFT adapter, then you can use:
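One common way to do this with the Hugging Face `peft` and `transformers` libraries is sketched below. It is not necessarily the snippet the post itself showed. The function name and the repo IDs in the usage note are hypothetical placeholders, and the optional merge step assumes a LoRA-style adapter.

```python
def load_model_with_adapter(base_model_id: str, adapter_id: str, merge: bool = False):
    """Load a base causal LM and attach a PEFT adapter to it.

    base_model_id and adapter_id are Hub repo IDs or local paths
    (placeholders here, supply your own).
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Load the frozen base model the adapter was trained on.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)

    if merge:
        # Assumption: LoRA-style adapter. This folds the adapter weights into
        # the base weights and returns a plain transformers model.
        model = model.merge_and_unload()

    return model
```

Usage would look like `model = load_model_with_adapter("your-org/base-model", "your-org/your-adapter")`, with both IDs replaced by real ones. Passing `merge=True` can be convenient when the model will be exported or served without `peft` installed.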