
VoxCPM

Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

AI/ML - Text-to-Speech · tts · multilingual · voice-cloning · creative-design · Reviewed

Our Take

VoxCPM2 is the tokenizer-free TTS model that makes traditional text-to-speech look like a toy. Most modern TTS systems rely on discrete tokenizers: they compress speech into discrete units, predict those units from text, then decode them back into audio. That middleman step introduces errors, limits expressiveness, and honestly sounds robotic in ways users have learned to accept. VoxCPM skips all of that. It generates continuous speech representations directly from the input text, and its multilingual support works without language tags. Want to clone a voice? VoxCPM2 does true-to-life voice cloning. Want to design a brand-new voice that never existed? It handles that too.

OpenBMB built this. They're the team behind the CPM large language model series, and they specialize in making open-source AI tools that actually work. VoxCPM2 is their answer to a simple question: why are we still pretending that tokenization is necessary for good speech synthesis? The answer is: it's not. And once you hear VoxCPM2, you'll wonder why everyone's been doing it the hard way.

It's on GitHub, it's open source, and it's free for commercial use under Apache-2.0. This is the kind of project that makes the closed-source TTS giants nervous.

A tokenizer-free text-to-speech system that directly generates continuous speech representations via an end-to-end diffusion autoregressive architecture, bypassing discrete tokenization to achieve highly natural and expressive synthesis.

Key Features
Tokenizer-free diffusion autoregressive architecture; 30-language multilingual support (no language tag needed); Voice Design: create a brand-new voice from a natural-language description; Controllable Voice Cloning from a short reference clip with style guidance; Ultimate Cloning: reproduce every vocal nuance from reference audio and its transcript; 48kHz studio-quality audio output (from 16kHz input); Context-aware synthesis that infers prosody from text content; Real-time streaming (RTF ~0.3 on RTX 4090, ~0.13 with Nano-vLLM); Open-source Apache-2.0 license for commercial use; MiniCPM-4 backbone with 2B parameters
Problem It Solves
Enables realistic multilingual speech synthesis without language tags, creates new voices from text descriptions, clones voices from short reference clips, and produces studio-quality 48kHz audio
Target Customer
Developers, companies, and researchers needing text-to-speech capabilities for applications including AI assistants, content creation, accessibility tools, and commercial products
Use Cases
Multilingual speech synthesis, Creative voice design for characters/assistants, Voice cloning for personalization, Studio-quality audio production, AI assistant voice output, Accessibility applications
Pricing Details
Open-source, free for commercial use under Apache-2.0 license
Free Tier
Yes
Differentiator
Tokenizer-free architecture enabling direct generation of continuous speech representations; 48kHz output without external upsampler; Voice Design from text descriptions without reference audio; Nano-vLLM acceleration with PagedAttention and OpenAI-compatible API
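As a rough illustration of what "OpenAI-compatible API" usually means in practice, the sketch below builds (but does not send) a request shaped like OpenAI's `/v1/audio/speech` endpoint. The base URL, model name, voice name, and API key are all hypothetical placeholders, and whether a VoxCPM server accepts exactly these fields is an assumption to check against the project's own documentation.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:8000/v1",  # hypothetical local server address
    model: str = "voxcpm2",                      # hypothetical model identifier
    voice: str = "default",                      # hypothetical voice name
    api_key: str = "not-needed",                 # placeholder; local servers often ignore it
) -> urllib.request.Request:
    """Build a POST request in the shape of OpenAI's speech endpoint."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Build the request and inspect it; nothing is sent over the network here.
req = build_speech_request("Hello from a tokenizer-free TTS model.")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it would require a running server, for example:
# with urllib.request.urlopen(req) as resp:
#     with open("out.wav", "wb") as f:
#         f.write(resp.read())
```

The appeal of this shape is that existing OpenAI client code can typically be pointed at a different base URL without other changes.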
Why Now
Released April 2026 as VoxCPM2 with 2B parameters, 30 languages, Voice Design & Controllable Voice Cloning, and 48kHz audio output
Traction
Notable Metrics: 28.6k stars, 3.2k forks, 147 commits · Press Mentions: #1 GitHub Trending (December 2025), #1 HuggingFace Trending (September 2025)
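For readers unfamiliar with the real-time factor (RTF) figures quoted under Key Features: RTF is synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. The short sketch below works through that arithmetic for the two quoted values. The 60-second clip length is an arbitrary example, and the function names are illustrative only.

```python
def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of speech at a given RTF."""
    return audio_seconds * rtf


def speedup_over_realtime(rtf: float) -> float:
    """How many times faster than playback speed the synthesis runs."""
    return 1.0 / rtf


# RTF values as quoted in the listing; clip length chosen for illustration.
clip_seconds = 60.0
for label, rtf in [("RTX 4090", 0.3), ("Nano-vLLM", 0.13)]:
    secs = synthesis_seconds(clip_seconds, rtf)
    print(
        f"{label}: ~{secs:.1f} s to generate {clip_seconds:.0f} s of audio "
        f"(~{speedup_over_realtime(rtf):.1f}x faster than real time)"
    )
```

Both figures leave headroom for streaming, since audio is produced faster than it is played back.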

Key Facts

Category
AI/ML - Text-to-Speech
Discovered via
github-trending

The people behind VoxCPM

Muyleang Ing

Developer

👋 Hi, I'm Ing Muyleang - DevOps Engineering research student @ISTAD - CS graduate from RUPP - Quantum Lab researcher @PKNU - M.S. AI Convergence (ongoing)

Yixuan Zhou (周逸轩)

Developer

Focus on TTS/Speech/NLP. El Psy Congroo

Z

Developer

Wow~ You can really dance!

xliucs

Developer

A master's student at Beijing Language and Culture University

VoxCPM — SLAYREPORT