Microsoft Open Sources VibeVoice-ASR for 60-Minute Speech Recognition - AI Dynamics


AI Dynamics

Global AI News Aggregator


Microsoft Open Sources VibeVoice-ASR for 60-Minute Speech Recognition


06 April 2026 16h12

Microsoft has addressed a major limitation in speech recognition by open sourcing VibeVoice-ASR, a speech-to-text model that processes 60 minutes of audio in a single pass.

Here's the problem with most ASR models: they slice audio into short chunks, usually 30 seconds or less, and process each chunk separately. Speaker context is lost between segments, so you get disconnected transcripts that can't track who said what across a full meeting.

VibeVoice-ASR handles 60 minutes of continuous audio without chunking, and the model maintains global context across the entire hour. The output is structured: who spoke, when they spoke, and what they said. Speaker diarization, timestamps, and transcription are all produced in one pass.

Key features:

• 60-minute single-pass processing without chunking audio
• Structured output: speaker labels, timestamps, and content combined
• Customized hotwords: provide specific names or technical terms to improve accuracy
• Multilingual support: 50+ languages
• Joint ASR, diarization, and timestamping in one model

The model has 7B parameters and is available on Hugging Face, with finetuning code included. The repo link is shared in the comments of the original post.
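
To make the contrast concrete, here is a minimal Python sketch of the two ideas in the post: how many independent pieces a 30-second chunking pipeline creates for an hour of audio, and what a combined "who, when, what" record might look like. The `Segment` class, its field names, and the sample data are assumptions made up for illustration. They are not the model's actual output schema or API, which the post does not describe.

```python
from dataclasses import dataclass
import math


@dataclass
class Segment:
    """One hypothetical transcript entry: who spoke, when, and what."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def chunk_count(duration_s: float, chunk_s: float = 30.0) -> int:
    """Number of independent chunks a chunk-based pipeline would process."""
    return math.ceil(duration_s / chunk_s)


def fmt_time(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS (fractions truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Render segments as one line per speaker turn."""
    return "\n".join(
        f"[{fmt_time(s.start)}-{fmt_time(s.end)}] {s.speaker}: {s.text}"
        for s in segments
    )


# Made-up example data, not real model output.
segments = [
    Segment("Speaker 1", 0.0, 4.2, "Let's get started."),
    Segment("Speaker 2", 4.2, 9.8, "Sure, I have the agenda."),
]

print(chunk_count(60 * 60))  # a 60-minute recording cut into 30-second chunks
print(render(segments))
```

At 30 seconds per chunk, a 60-minute recording becomes 120 separately processed pieces, each of which has to be re-linked to the right speaker afterwards. A single-pass model that emits records like the ones above sidesteps that stitching step.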

→ View original post on X — @sumanth_077, 2026-04-06 14:12 UTC


AI BIG TECH CODE INNOVATION MACHINE LEARNING MULTIMODAL AI OPEN SOURCE RESEARCH TOOLS



