MiMo-V2.5 Voice

Bilingual ASR for dialects, code-switching, and songs

AI Voice Agent Infrastructure
Founded 2026
Eight Chinese dialects natively supported (including Wu, Cantonese, Hokkien, and Sichuanese)
Chinese-English code-switching with no language tags
Lyrics transcription under accompaniment and pitch variation
Multi-speaker and noisy environment robustness
Native punctuation, no post-processing needed
MIT license, Python API, Gradio demo, self-hostable

Our Take

Xiaomi just dropped an 8B open-source speech model that, on its reported numbers, beats Whisper on accuracy. MiMo-2.5-ASR handles eight Chinese dialects, code-switched Chinese-English speech, and song lyrics, with no language tags and no punctuation post-processing required. The numbers back it up: 5.73% WER on English versus Whisper large-v3's 7.44%, 19.55% on Wu dialect versus FunASR-1.5's 29.08%, and 3.95% on lyrics versus Gemini 2.5 Pro's 4.25%. It is MIT licensed, free, and self-hostable. It also goes after the problem that leaderboard scores tend to hide: most ASR models look amazing on clean studio data and then quietly fail in production, where audio is noisy, speakers overlap, and people switch languages mid-sentence. This is the pick for voice product teams building bilingual or Chinese-language pipelines who need accuracy that holds up outside the lab.

An 8B open-source speech recognition model from Xiaomi that transcribes Mandarin, English, eight Chinese dialects, code-switched speech, and song lyrics. Built for ML engineers, researchers, and developers building real-world voice applications.

Problem It Solves

Most ASR models are benchmarked on clean studio data but deployed into the real world where audio is noisy, speakers overlap, and people switch languages mid-sentence. The gap between benchmark accuracy and production accuracy is where voice products quietly fail.
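One way to see that gap on your own data is to score transcripts with word error rate (WER), the metric quoted throughout this listing. The following is a minimal sketch, not the evaluation code used for the published numbers: the `word_error_rate` helper and the two sample transcripts are hypothetical, and it assumes whitespace-separated words (Chinese text is usually scored per character instead).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts of the same utterance in two conditions.
reference = "turn left at the next light"
studio_wer = word_error_rate(reference, "turn left at the next light")
street_wer = word_error_rate(reference, "turn left at next night")
print(f"studio: {studio_wer:.1%}, street: {street_wer:.1%}")
```

Running the same scoring over a clean test set and a sample of real production audio gives a direct measure of how much accuracy a model loses outside the lab.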

Target Customer

ML engineers and voice product teams building bilingual or Chinese-language transcription pipelines who need accuracy that holds up outside the lab.

Use Cases

Location-based audio guides with multilingual support, travel apps requiring dialect understanding, bilingual transcription pipelines, Chinese-language voice products

Pricing Details

Open source with MIT license; self-hosting eliminates per-call API costs

Free Tier

Yes

Differentiator

Reported error rates (lower is better): 5.73% average WER on English on the Open ASR Leaderboard (vs Whisper large-v3 at 7.44%), 19.55% on Wu dialect (vs FunASR-1.5 at 29.08%), and 3.95% on lyrics (vs Gemini 2.5 Pro at 4.25%)
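To put those figures on a common scale, the short sketch below converts each pair into a relative error reduction, that is, the share of the baseline's errors that the model removes. It uses only the percentages quoted above; the `relative_reduction` helper is an illustrative name, not part of any MiMo tooling.

```python
# (model error rate, baseline error rate) in percent, as quoted in this listing.
reported = {
    "English vs Whisper large-v3": (5.73, 7.44),
    "Wu dialect vs FunASR-1.5": (19.55, 29.08),
    "Lyrics vs Gemini 2.5 Pro": (3.95, 4.25),
}


def relative_reduction(model: float, baseline: float) -> float:
    """Percentage of the baseline's errors that the model eliminates."""
    return (baseline - model) / baseline * 100


reductions = {
    name: round(relative_reduction(model, baseline), 1)
    for name, (model, baseline) in reported.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% fewer errors")
```

This works out to roughly 23% fewer errors on English, 33% on Wu dialect, and 7% on lyrics, so the dialect result is the largest margin and the lyrics result is the narrowest.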

Why Now

Open-source ASR has been catching up to closed models for years. MiMo-2.5-ASR suggests the gap is now very small, and in some scenarios gone: on lyrics, its reported 3.95% error rate edges out Gemini 2.5 Pro's 4.25%.

Traction

Notable Metrics: 114 points, 7 Day Rank, 110 followers

The people behind MiMo-V2.5 Voice

Rohan Chaubey

Maker

Links

Website, GitHub

Source: Product Hunt