
AI Dynamics

Global AI News Aggregator


@sumanth_077

Modulate’s Velma: Real-Time Deepfake Voice Detection API

By

@sumanth_077

–

05 April 2026 16h12

If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering!

Quoted post: https://nitter.net/Sumanth_077/status/2040793953927311527#m

→ View original post on X — @sumanth_077, 2026-04-05 14:12 UTC

5 April 2026
Check Out the Modulate Deepfake Detection API

By

@sumanth_077

–

05 April 2026 16h09

Check out Modulate: modulate.ai/api/deepfake-det…

→ View original post on X — @sumanth_077, 2026-04-05 14:09 UTC

5 April 2026
Modulate’s Velma API Achieves 98.9% Deepfake Detection Accuracy

By

@sumanth_077

–

05 April 2026 16h09

Massive breakthrough in voice deepfake detection! @modulate_ai just released a deepfake detection API that topped @huggingface's leaderboard at 98.9% accuracy.

Here's the problem with how most companies handle deepfake detection. They check the first 10 seconds of a call. If it passes, they assume the whole call is clean. Gate check. One scan. Done.

Fraudsters know this. So they open the call with a real voice: their own, a colleague's, a quick recording. Pass the check. Then switch to the AI-generated clone mid-call. The system already gave them the green light. They're through.

The fix is obvious. Monitor the entire call. Not just the opening. Not random spot checks. Every segment, continuously, in real time. But that was too expensive. Until now.

Velma is Modulate's real-time and batch deepfake detection API. Here's what changed:

• Real-time streaming detection. Analyzes audio every 2 seconds during live calls. Catches mid-call voice switches within seconds.
• At least 120x cheaper than competitors. $0.25 per hour instead of $30-150. Now you can actually afford to monitor full conversations instead of spot-checking.
• Only needs 2.5 seconds of audio. Faster detection, works with short segments.
• 98.9% accuracy, ranked first on HuggingFace. Lower error rate than models 10x larger.

First 1000 API credits are free. I've shared the link in the replies!
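To make the gate-check problem concrete, here is a minimal Python sketch contrasting a one-time scan of the opening seconds with continuous monitoring over overlapping windows. It does not call Velma or any real API: the `Scorer` callable, the `toy_score` stand-in, the 0.5 threshold, and the tiny sample rate are all assumptions made up for illustration. Only the 10-second gate, the 2-second cadence, and the 2.5-second window come from the post.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical scorer type: takes an audio window, returns P(synthetic) in [0, 1].
Scorer = Callable[[Sequence[float]], float]

GATE_SECONDS = 10.0   # opening span a one-time gate check inspects (from the post)
HOP_SECONDS = 2.0     # analysis cadence (from the post)
WINDOW_SECONDS = 2.5  # audio needed per decision (from the post)
THRESHOLD = 0.5       # assumed decision threshold, not from the post


def sliding_windows(
    samples: Sequence[float], sample_rate: int
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_s, window) pairs; a shorter trailing window is kept
    so the end of the call is still checked."""
    window = int(WINDOW_SECONDS * sample_rate)
    hop = int(HOP_SECONDS * sample_rate)
    for start in range(0, len(samples), hop):
        yield start / sample_rate, samples[start:start + window]


def gate_check(samples: Sequence[float], score: Scorer, sample_rate: int) -> bool:
    """Scan only the opening seconds once; True means the call is flagged."""
    opening = samples[: int(GATE_SECONDS * sample_rate)]
    return score(opening) >= THRESHOLD


def monitor_call(samples: Sequence[float], score: Scorer, sample_rate: int) -> List[float]:
    """Score every window of the call; return start times of flagged windows."""
    return [t for t, w in sliding_windows(samples, sample_rate) if score(w) >= THRESHOLD]


def toy_score(window: Sequence[float]) -> float:
    """Stand-in detector: samples are pre-labelled 0.0 (real) or 1.0 (synthetic),
    so the mean is the synthetic fraction. A real detector would analyse audio."""
    return sum(window) / len(window)


if __name__ == "__main__":
    rate = 10  # toy sample rate to keep the example small
    # 12 seconds of real voice, then 18 seconds of cloned voice.
    call = [0.0] * (12 * rate) + [1.0] * (18 * rate)
    print("gate check flagged:", gate_check(call, toy_score, rate))
    print("continuous monitoring flagged at (s):", monitor_call(call, toy_score, rate))
```

In this toy call the speaker switches after the 10-second gate, so the gate check passes while the windowed monitor flags the first window that lies fully inside the cloned segment.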

→ View original post on X — @sumanth_077, 2026-04-05 14:09 UTC

5 April 2026
VoxCPM: Open-Source Voice Cloning Without Tokenization

By

@sumanth_077

–

03 April 2026 15h15

If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering!

Quoted post: https://nitter.net/Sumanth_077/status/2040055394958286903#m

→ View original post on X — @sumanth_077, 2026-04-03 13:15 UTC

3 April 2026
VoxCPM GitHub Repository Released

By

@sumanth_077

–

03 April 2026 15h14

GitHub Repo: github.com/OpenBMB/VoxCPM

→ View original post on X — @sumanth_077, 2026-04-03 13:14 UTC

3 April 2026
VoxCPM: Real-time Voice Cloning Without Tokenization

By

@sumanth_077

–

03 April 2026 15h14

Clone a human voice in real time without tokenization! VoxCPM is an open-source text-to-speech system that models speech in continuous space instead of discrete tokens.

Most TTS systems convert speech to discrete tokens before generation. This quantization creates a fundamental trade-off: tokens provide stability but lose acoustic details like breath, vocal texture, and subtle articulation.

VoxCPM skips tokenization entirely. It models speech directly in continuous space using an end-to-end diffusion autoregressive architecture built on MiniCPM-4.

The system uses hierarchical language modeling with two specialized components: a Text-Semantic Language Model that captures high-level prosody and structure, and a Residual Acoustic Model that recovers fine-grained acoustic details. This separation eliminates dependency on external speech tokenizers and prevents error accumulation from multi-stage pipelines.

Two flagship capabilities:

1. Context-aware speech generation: The model comprehends text to infer appropriate prosody and speaking style. Explanations slow down naturally, emphasis appears in the right places, questions sound like questions.
2. Zero-shot voice cloning: With just 3-10 seconds of reference audio, it replicates speaker timbre, accent, emotional tone, rhythm, and pacing.

Key features:

• Tokenizer-free architecture with continuous speech modeling
• Context-aware prosody generation without manual tuning
• Zero-shot voice cloning from short reference audio
• Streaming synthesis support for real-time applications
• SFT and LoRA fine-tuning support

It's 100% open source. Link to the GitHub repo in the comments!
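The quantization trade-off described in the post can be illustrated with a toy example. The sketch below is not VoxCPM's architecture or API: the codebook, the frame values, and all function names are made up. It only shows that snapping continuous values to a small set of discrete tokens discards detail, and that keeping a continuous residual alongside the coarse representation restores it. In a real system the residual would presumably be predicted by a model rather than stored.

```python
from typing import List, Sequence


def quantize(frame: Sequence[float], codebook: Sequence[float]) -> List[int]:
    """Map each value to the index of its nearest codebook entry (a discrete token)."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
        for x in frame
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[float]) -> List[float]:
    """Turn tokens back into values; anything between codebook entries is lost."""
    return [codebook[t] for t in tokens]


def max_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest per-element difference between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical 5-entry codebook and a made-up "acoustic frame".
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
frame = [0.12, -0.43, 0.88, 0.49, -0.97]

tokens = quantize(frame, codebook)      # discrete path: stable but coarse
coarse = dequantize(tokens, codebook)

# Continuous residual: the fine detail the tokens could not represent.
residual = [x - c for x, c in zip(frame, coarse)]
restored = [c + r for c, r in zip(coarse, residual)]

if __name__ == "__main__":
    print("tokens:", tokens)
    print("error with tokens only:", max_abs_error(frame, coarse))
    print("error with residual added:", max_abs_error(frame, restored))
```

The coarse-plus-residual split loosely mirrors the post's description of one component capturing high-level structure and another recovering fine-grained detail, but only as an analogy.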

→ View original post on X — @sumanth_077, 2026-04-03 13:14 UTC

3 April 2026
LLaMA-Factory: Fine-Tune 100+ LLMs Without Coding

By

@sumanth_077

–

02 April 2026 15h50

If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering!

Quoted post: https://nitter.net/Sumanth_077/status/2039701710659272775#m

→ View original post on X — @sumanth_077, 2026-04-02 13:50 UTC

2 April 2026
GitHub Repo: LlamaFactory – Advanced Language Model Fine-tuning Framework

By

@sumanth_077

–

02 April 2026 15h49

GitHub Repo: github.com/hiyouga/LlamaFact…

→ View original post on X — @sumanth_077, 2026-04-02 13:49 UTC

2 April 2026
LLaMA-Factory: Fine-Tune 100+ LLMs Without Code

By

@sumanth_077

–

02 April 2026 15h49

Fine-Tune 100+ LLMs without writing a single line of code! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code.

Here's why it's a game changer for fine-tuning:

• Fine-tune 100+ LLMs/VLMs with built-in templates (LLaMA, Gemma, Qwen, Mistral, DeepSeek, and more).
• Zero-code CLI & Web UI for training, inference, merging, and evaluation.
• Supports full-tuning, LoRA, QLoRA, freeze-tuning, PPO/DPO, OFT, reward modeling, and multi-modal fine-tuning.
• Speeds up training/inference with FlashAttention-2, RoPE scaling, Liger Kernel, and vLLM backend.
• Integrates experiment tracking via LlamaBoard, TensorBoard, Weights & Biases, MLflow, and SwanLab.

It's 100% open source. Link to the GitHub repo in the comments!
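The post lists LoRA next to full-tuning among the supported methods. As a rough illustration of why the choice matters, the sketch below counts trainable weights for a single linear layer under each approach. It does not use LLaMA-Factory itself, and the layer size (4096 x 4096) and rank (8) are assumed values chosen for the example, not settings taken from the post.

```python
def full_tuning_params(d_out: int, d_in: int) -> int:
    """Full fine-tuning updates every entry of the d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA freezes the original matrix and trains two low-rank factors,
    B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Assumed sizes for illustration only.
d_out, d_in, rank = 4096, 4096, 8

full = full_tuning_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
fraction = lora / full

if __name__ == "__main__":
    print(f"full-tuning weights: {full:,}")
    print(f"LoRA weights:        {lora:,}")
    print(f"LoRA trains {fraction:.2%} of the layer's weights")
```

Under these assumed sizes the LoRA factors hold 1/256 of the weights that full tuning would update for that layer.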

→ View original post on X — @sumanth_077, 2026-04-02 13:49 UTC

2 April 2026
Build a Large Language Model from Scratch Repository

By

@sumanth_077

–

01 April 2026 15h22

If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering! nitter.net/Sumanth_077/status/203…

Sumanth (@Sumanth_077): Build a Large Language Model from scratch!

This repository contains the code examples for developing, pretraining, and finetuning an LLM from scratch. It is the official codebase for the book Build a Large Language Model (From Scratch).

Notebook examples are included for each chapter:

• Chapter 1: Understanding Large Language Models
• Chapter 2: Working with Text Data
• Chapter 3: Coding Attention Mechanisms
• Chapter 4: Implementing a GPT Model from Scratch
• Chapter 5: Pretraining on Unlabeled Data
• Chapter 6: Finetuning for Text Classification
• Chapter 7: Finetuning to Follow Instructions

Link to the repo in the comments!

Quoted post: https://nitter.net/Sumanth_077/status/2039332313910383043#m

→ View original post on X — @sumanth_077, 2026-04-01 13:22 UTC

1 April 2026
