AI Dynamics

Global AI News Aggregator

@sumanth_077

  • Modulate’s Velma: Real-Time Deepfake Voice Detection API

    If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering! nitter.net/Sumanth_077/status/204…

    https://nitter.net/Sumanth_077/status/2040793953927311527#m

    → View original post on X — @sumanth_077, 2026-04-05 14:12 UTC

  • Modulate’s Velma API Achieves 98.9% Deepfake Detection Accuracy

    Massive breakthrough in voice deepfake detection! @modulate_ai just released a deepfake detection API that topped @huggingface's leaderboard at 98.9% accuracy.

    Here's the problem with how most companies handle deepfake detection. They check the first 10 seconds of a call. If it passes, they assume the whole call is clean. Gate check. One scan. Done.

    Fraudsters know this. So they open the call with a real voice. Their own voice, a colleague, a quick recording. Pass the check. Then switch to the AI-generated clone mid-call. The system already gave them the green light. They're through.

    The fix is obvious. Monitor the entire call. Not just the opening. Not random spot checks. Every segment, continuously, in real-time. But that was too expensive. Until now.

    Velma is Modulate's real-time and batch deepfake detection API. Here's what changed.

    • Real-time streaming detection. Analyzes audio every 2 seconds during live calls. Catches mid-call voice switches instantly.
    • 120x cheaper than competitors. $0.25 per hour instead of $30-150. Now you can actually afford to monitor full conversations instead of spot-checking.
    • Only needs 2.5 seconds of audio. Faster detection, works with short segments.
    • 98.9% accuracy, ranked first on HuggingFace. Lower error rate than models 10x larger.

    First 1000 API credits are free. I've shared the link in the replies!

    → View original post on X — @sumanth_077, 2026-04-05 14:09 UTC
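
    The gap between a one-time gate check and continuous monitoring, as described in the post, can be sketched in a few lines. This is a toy illustration only: `toy_detector`, the 10 Hz sample rate, and the 0.0/1.0 "audio" values are assumptions made up for the example, and nothing here reflects Velma's actual API or detection method.

```python
from typing import Callable, List, Sequence

# Hypothetical detector type: takes one window of samples and returns True
# if the window looks synthetic. It is a stand-in, not Velma's API.
Detector = Callable[[Sequence[float]], bool]


def split_windows(samples: Sequence[float], sample_rate: int,
                  window_seconds: float) -> List[Sequence[float]]:
    """Cut the call into consecutive, non-overlapping windows."""
    size = int(sample_rate * window_seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def gate_check(samples: Sequence[float], sample_rate: int,
               detector: Detector, gate_seconds: float = 10.0) -> bool:
    """Scan only the opening of the call, as in the gate-check approach."""
    head = samples[:int(sample_rate * gate_seconds)]
    return detector(head)


def continuous_check(samples: Sequence[float], sample_rate: int,
                     detector: Detector,
                     window_seconds: float = 2.0) -> List[float]:
    """Scan every window and return the start time (s) of each flagged one."""
    windows = split_windows(samples, sample_rate, window_seconds)
    return [i * window_seconds for i, w in enumerate(windows) if detector(w)]


def toy_detector(window: Sequence[float]) -> bool:
    """Made-up rule for the demo: 1.0 marks cloned audio, 0.0 marks real."""
    return sum(window) / len(window) > 0.5


SAMPLE_RATE = 10  # toy rate so the lists stay small
# 20 seconds of "real" voice, then a switch to 10 seconds of "cloned" voice.
call = [0.0] * (20 * SAMPLE_RATE) + [1.0] * (10 * SAMPLE_RATE)

print(gate_check(call, SAMPLE_RATE, toy_detector))        # False
print(continuous_check(call, SAMPLE_RATE, toy_detector))  # [20.0, 22.0, 24.0, 26.0, 28.0]
```

    The gate check passes the call because its first 10 seconds are clean, while the per-window scan flags every 2-second window after the switch at the 20-second mark.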

  • VoxCPM: Open-Source Voice Cloning Without Tokenization

    If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering! nitter.net/Sumanth_077/status/204…

    https://nitter.net/Sumanth_077/status/2040055394958286903#m

    → View original post on X — @sumanth_077, 2026-04-03 13:15 UTC

  • VoxCPM: Real-time Voice Cloning Without Tokenization

    Clone a human voice in real time without tokenization! VoxCPM is an open-source text-to-speech system that models speech in continuous space instead of discrete tokens.

    Most TTS systems convert speech to discrete tokens before generation. This quantization creates a fundamental trade-off: tokens provide stability but lose acoustic details like breath, vocal texture, and subtle articulation.

    VoxCPM skips tokenization entirely. It models speech directly in continuous space using an end-to-end diffusion autoregressive architecture built on MiniCPM-4.

    The system uses hierarchical language modeling with two specialized components: a Text-Semantic Language Model that captures high-level prosody and structure, and a Residual Acoustic Model that recovers fine-grained acoustic details. This separation eliminates dependency on external speech tokenizers and prevents error accumulation from multi-stage pipelines.

    Two flagship capabilities:

    1. Context-aware speech generation: The model comprehends text to infer appropriate prosody and speaking style. Explanations slow down naturally, emphasis appears in the right places, questions sound like questions.
    2. Zero-shot voice cloning: With just 3-10 seconds of reference audio, it replicates speaker timbre, accent, emotional tone, rhythm, and pacing.

    Key features:

    • Tokenizer-free architecture with continuous speech modeling
    • Context-aware prosody generation without manual tuning
    • Zero-shot voice cloning from short reference audio
    • Streaming synthesis support for real-time applications
    • SFT and LoRA fine-tuning support

    It's 100% open source. Link to the GitHub repo in the comments!

    → View original post on X — @sumanth_077, 2026-04-03 13:14 UTC
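
    The trade-off the post attributes to discrete tokens (detail lost to quantization) can be illustrated with a toy example. The sketch below uses a plain uniform scalar quantizer on a made-up waveform; real speech tokenizers are learned neural codecs, and none of this is VoxCPM code. The function names and the signal are assumptions chosen purely for illustration.

```python
import math
from typing import List


def quantize(value: float, levels: int, low: float = -1.0,
             high: float = 1.0) -> float:
    """Snap a value to the nearest of `levels` evenly spaced points."""
    step = (high - low) / (levels - 1)
    index = round((value - low) / step)
    index = max(0, min(levels - 1, index))  # clamp to the valid range
    return low + index * step


def max_error(signal: List[float], levels: int) -> float:
    """Largest gap between the signal and its quantized version."""
    return max(abs(x - quantize(x, levels)) for x in signal)


# Made-up waveform: a slow, large component plus a small, fast "texture".
signal = [
    0.8 * math.sin(2 * math.pi * t / 100) + 0.05 * math.sin(2 * math.pi * t / 7)
    for t in range(200)
]

for levels in (4, 16, 64):
    print(levels, round(max_error(signal, levels), 4))
```

    With few levels the error is larger than the 0.05 "texture" component, so that detail is lost; adding levels shrinks the error but never removes it. Working with the continuous values directly avoids this particular loss, which is the motivation the post gives for a tokenizer-free design.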

  • LLaMA-Factory: Fine-Tune 100+ LLMs Without Coding

    If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering! nitter.net/Sumanth_077/status/203…

    https://nitter.net/Sumanth_077/status/2039701710659272775#m

    → View original post on X — @sumanth_077, 2026-04-02 13:50 UTC

  • LLaMA-Factory: Fine-Tune 100+ LLMs Without Code

    Fine-Tune 100+ LLMs without writing a single line of code! LLaMA-Factory lets you train and fine-tune open-source LLMs and VLMs without writing any code.

    Here's why it's a game changer for fine-tuning:

    • Fine-tune 100+ LLMs/VLMs with built-in templates (LLaMA, Gemma, Qwen, Mistral, DeepSeek, and more).
    • Zero-code CLI & Web UI for training, inference, merging, and evaluation.
    • Supports full-tuning, LoRA, QLoRA, freeze-tuning, PPO/DPO, OFT, reward modeling, and multi-modal fine-tuning.
    • Speeds up training/inference with FlashAttention-2, RoPE scaling, Liger Kernel, and vLLM backend.
    • Integrates experiment tracking via LlamaBoard, TensorBoard, Weights & Biases, MLflow, and SwanLab.

    It's 100% open source. Link to the GitHub repo in the comments!

    → View original post on X — @sumanth_077, 2026-04-02 13:49 UTC
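
    For readers unfamiliar with why LoRA appears next to full-tuning in the list above, a rough parameter count shows the difference. LoRA freezes a weight matrix W (d_out × d_in) and trains two small matrices B (d_out × r) and A (r × d_in) whose product is added to W. The layer size and rank below are illustrative assumptions, not LLaMA-Factory defaults or any specific model's configuration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative numbers only: one 4096 x 4096 projection, rank 8.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(full)                  # 16777216
print(lora)                  # 65536
print(f"{lora / full:.2%}")  # 0.39%
```

    Under these assumed sizes, LoRA trains well under 1% of the weights that full-tuning would update for the same layer, which is why it is a common choice on limited hardware.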

  • Build a Large Language Model from Scratch Repository

    If you found it useful, reshare it with your network. Follow me → @Sumanth_077 for more insights and tutorials on AI Engineering! nitter.net/Sumanth_077/status/203…

    Sumanth (@Sumanth_077): Build a Large Language Model from scratch! This repository contains the code examples for developing, pretraining, and finetuning an LLM from scratch. It is the official codebase for the book Build a Large Language Model (From Scratch).

    Notebook examples are included for each chapter:

    • Chapter 1: Understanding Large Language Models
    • Chapter 2: Working with Text Data
    • Chapter 3: Coding Attention Mechanisms
    • Chapter 4: Implementing a GPT Model from Scratch
    • Chapter 5: Pretraining on Unlabeled Data
    • Chapter 6: Finetuning for Text Classification
    • Chapter 7: Finetuning to Follow Instructions

    Link to the repo in the comments!

    https://nitter.net/Sumanth_077/status/2039332313910383043#m

    → View original post on X — @sumanth_077, 2026-04-01 13:22 UTC