Open-Source Projects
We are committed to open source and firmly believe in the power of openness and sharing. We strive to open-source our lab's research projects whenever possible, giving back to the community and jointly advancing technological progress.
As of March 2026, our open-source models have reached 70M+ downloads and 20K+ GitHub stars, thanks to the efforts of our outstanding students and collaborators.
2026
SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Real-Time Full-Duplex Speech Conversation
Code: https://github.com/Soul-AILab/SoulX-Duplug
Model: https://huggingface.co/Soul-AILab/SoulX-Duplug-0.6B
Demo: https://soulx-duplug.sjtuxlance.com/
Resonate: Reinforcing Text-to-Audio Generation with Online Feedback from Large Audio Language Models
Code: https://github.com/xiquan-li/Resonate
Model: https://huggingface.co/AndreasXi/Resonate
Demo: https://huggingface.co/spaces/chenxie95/Resonate
Audio ControlNet: Fine-Grained Audio Generation and Editing
Code: https://github.com/juhayna-zh/AudioControlNet
Model: https://huggingface.co/collections/juhayna/audio-controlnet
Demo: https://huggingface.co/spaces/chenxie95/AudioControlNet
CLSP: Fine-Grained and Multi-Level Speech–Text Style Representation Alignment Model
Code: https://github.com/yfyeung/CLSP
Model: https://huggingface.co/yfyeung/CLSP
Dataset: https://huggingface.co/datasets/yfyeung/FCaps
Habibi-TTS: Multidialect Arabic TTS Model
Code: https://github.com/SWivid/Habibi-TTS
Model: https://huggingface.co/SWivid/Habibi-TTS
Demo: https://huggingface.co/spaces/chenxie95/Habibi-TTS
2025
Emotional Dialectal TTS: Toward emotionally expressive dialectal speech synthesis
Code: https://github.com/the-bird-F/Expressive-Vectors
Demo: https://the-bird-f.github.io/Expressive-Vectors/
X-Talk: Full-duplex, low-latency, cascaded spoken dialogue system
Code: https://github.com/xcc-zach/xtalk
Demo: https://xtalk.sjtuxlance.com/
IMTalker: Efficient audio-driven talking face generation with implicit motion transfer
Code: https://github.com/cbsjtu01/IMTalker
Model: https://huggingface.co/cbsjtu01/IMTalker
Demo: https://huggingface.co/spaces/chenxie95/IMTalker
SAC: Neural speech codec with semantic-acoustic dual-stream quantization
Code: https://github.com/Soul-AILab/SAC
Model: https://huggingface.co/Soul-AILab/SAC-16k-62_5Hz
Demo: https://sac-codec.github.io/
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
Code: https://github.com/SWivid/AUV
Demo: https://swivid.github.io/AUV/
Omni-Captioner: Multi-agent-based detailed audio captioning
Code: https://github.com/ddlBoJack/Omni-Captioner
Model: https://huggingface.co/ddlBoJack/Omni-Captioner
Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Captioner-Demo
Semantic-VAE: Semantically aligned latent representations for better speech synthesis
Code: https://github.com/ZhikangNiu/Semantic-VAE
Model: https://huggingface.co/zkniu/Semantic-VAE
MeanAudio: Audio generation with Mean Flow
Code: https://github.com/xiquan-li/MeanAudio
Model: https://huggingface.co/AndreasXi/MeanAudio
Demo: https://huggingface.co/spaces/chenxie95/MeanAudio
UltraVoice: Scaling fine-grained style-controlled speech conversations for spoken dialogue models
Code: https://github.com/bigai-nlco/UltraVoice
Dataset: HuggingFace
A-DMA: Efficient F5-TTS model
Code: https://github.com/ZhikangNiu/A-DMA
MMAR: Benchmark for deep reasoning on audio signals
Code: https://github.com/ddlBoJack/MMAR
Dataset: HuggingFace
EmoVoice: Emotional TTS model with natural language descriptions
Code: https://github.com/yanghaha0908/EmoVoice
Model: HuggingFace
MagiCodec: Single-layer high-quality speech codec
Code: https://github.com/Ereboas/MagiCodec
Model: HuggingFace
E2E RAG for SLM: End-to-end RAG for speech language models
Code: https://github.com/the-bird-F/GLM-Voice-RAG
Dataset: HuggingFace
VietASR: High-quality Vietnamese speech recognition model
Code: https://github.com/zzasdf/VietASR
Model: HuggingFace
URO-Bench: Comprehensive benchmark for end-to-end spoken dialogue models
Code: https://github.com/Ruiqi-Yan/URO-Bench
Dataset: HuggingFace
MuQ: Universal music signal representation model
Code: https://github.com/tencent-ailab/MuQ
Model: MuQ-large, MuQ-MuLan-large
The MuQ models on HuggingFace have been downloaded over 9 million times.
2024
NDVQ: Robust neural audio codec
Code: https://github.com/ZhikangNiu/NDVQ
SLAM-LLM: Open-source toolkit for audio foundation models
Code: https://github.com/X-LANCE/SLAM-LLM
F5-TTS: Flow-matching based TTS model
Code: https://github.com/SWivid/F5-TTS
Model: https://huggingface.co/SWivid/F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
The F5-TTS models on HuggingFace have been downloaded over 10 million times.
EmoBox: Evaluation toolkit for multilingual speech emotion recognition
Code: https://github.com/emo-box/EmoBox
GigaSpeech 2: Large-scale speech dataset for low-resource languages (Vietnamese, Indonesian, and Thai)
Code: https://github.com/SpeechColab/GigaSpeech2
Dataset: ModelScope | HuggingFace
The dataset has been downloaded over 100,000 times.
emotion2vec: Universal speech emotion representation model
Code: https://github.com/ddlBoJack/emotion2vec
Model: emotion2vec_plus_large | emotion2vec_base_finetuned | emotion2vec_base
The models on ModelScope have been downloaded over 50 million times.
EAT: Universal audio signal representation model
Code: https://github.com/cwx-worst-one/EAT
Model: HuggingFace
The models on HuggingFace have been downloaded over 900K times.
2023
FastHuBERT: Efficient training framework for self-supervised speech representation learning
Code: https://github.com/yanghaha0908/FastHuBERT
MT4SSL: Multi-task self-supervised speech model
Code: https://github.com/ddlBoJack/MT4SSL
Text2Animation
Code: https://github.com/Moon0316/T2A
Open-source projects we contribute to
OpenMOSS MOVA: Towards Scalable and Synchronized Video–Audio Generation
Code: https://github.com/OpenMOSS/MOVA
Model: https://huggingface.co/collections/OpenMOSS-Team/mova
FISHER: A Foundation Model for Industrial Signal Comprehensive Representation
Code: https://github.com/jianganbai/FISHER
Model: https://huggingface.co/collections/jiangab/fisher
Spark-TTS: LLM-based TTS model
Code: https://github.com/SparkAudio/Spark-TTS
Model: https://huggingface.co/SparkAudio/Spark-TTS-0.5B
YuE: Open-source foundation models for music generation
Code: https://github.com/multimodal-art-projection/YuE
Model: https://huggingface.co/collections/m-a-p/yue
AniTalker: Lifelike talking-face generation model
Code: https://github.com/X-LANCE/AniTalker
StoryTTS: Highly expressive text-to-speech dataset from Mandarin storytelling show
Code: https://github.com/X-LANCE/StoryTTS
VoiceFlow-TTS: Efficient Text-to-Speech with Rectified Flow Matching
Code: https://github.com/X-LANCE/VoiceFlow-TTS