Open-Source Projects
We are committed to open source and believe in the power of openness and sharing. We open-source our lab's research projects wherever possible, giving back to the community and helping to advance the technology together.
2025
MMAR: Benchmark for deep reasoning on audio signals
Code: https://github.com/ddlBoJack/MMAR
Dataset: HuggingFace
EmoVoice: Emotional TTS model with natural language descriptions
Code: https://github.com/yanghaha0908/EmoVoice
Model: HuggingFace
MagiCodec: Single-layer high-quality speech codec
Code: https://github.com/Ereboas/MagiCodec
Model: HuggingFace
VietASR: High-quality Vietnamese speech recognition model
Code: https://github.com/zzasdf/VietASR
Model: HuggingFace
URO-Bench: Comprehensive benchmark for end-to-end spoken dialogue
Code: https://github.com/Ruiqi-Yan/URO-Bench
Dataset: HuggingFace
MuQ: Universal music signal representation model
Code: https://github.com/tencent-ailab/MuQ
Model: MuQ-large, MuQ-MuLan-large
Models on HuggingFace have been downloaded over 700,000 times.
2024
NDVQ: Robust neural audio codec
Code: https://github.com/ZhikangNiu/NDVQ
SLAM-LLM: Open-source toolkit for audio foundation models
Code: https://github.com/X-LANCE/SLAM-LLM
F5-TTS: Flow-matching based TTS model
Code: https://github.com/SWivid/F5-TTS
Model: https://huggingface.co/SWivid/F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Models on HuggingFace have been downloaded over 8 million times.
EmoBox: Evaluation toolkit for multilingual speech emotion recognition
Code: https://github.com/emo-box/EmoBox
GigaSpeech 2: Large-scale speech dataset for low-resource languages (Vietnamese, Indonesian, and Thai)
Code: https://github.com/SpeechColab/GigaSpeech2
Dataset: ModelScope | HuggingFace
The dataset has been downloaded over 50,000 times.
emotion2vec: Universal speech emotion representation model
Code: https://github.com/ddlBoJack/emotion2vec
Model: emotion2vec_plus_large | emotion2vec_base_finetuned | emotion2vec_base
Models on ModelScope have been downloaded over 1.4 million times.
EAT: Universal audio signal representation model
Code: https://github.com/cwx-worst-one/EAT
Model: HuggingFace
2023
FastHuBERT: Efficient training framework for self-supervised speech representation learning
Code: https://github.com/yanghaha0908/FastHuBERT
MT4SSL: Multi-task self-supervised speech model
Code: https://github.com/ddlBoJack/MT4SSL
Text2Animation:
Code: https://github.com/Moon0316/T2A
Open-Source Projects We Participate In
FISHER: A Foundation Model for Industrial Signal Comprehensive Representation
Code: https://github.com/jianganbai/FISHER
Spark-TTS: LLM-based TTS model
Code: https://github.com/SparkAudio/Spark-TTS
YuE: Open-source foundation models for music generation
Code: https://github.com/multimodal-art-projection/YuE
AniTalker: Lifelike talking-face model
Code: https://github.com/X-LANCE/AniTalker
StoryTTS: Highly expressive text-to-speech dataset from a Mandarin storytelling show
Code: https://github.com/X-LANCE/StoryTTS
VoiceFlow-TTS: Efficient Text-to-Speech with Rectified Flow Matching
Code: https://github.com/X-LANCE/VoiceFlow-TTS