Open-Source Projects

We are committed to embracing open source and firmly believe in the power of openness and sharing. We strive to open-source our lab’s research projects whenever possible, giving back to the community and jointly advancing technological progress.

2025

MMAR: Benchmark for deep reasoning over audio signals
Code: https://github.com/ddlBoJack/MMAR
Dataset: HuggingFace

EmoVoice: Emotional TTS model with natural language descriptions
Code: https://github.com/yanghaha0908/EmoVoice
Model: HuggingFace

MagiCodec: Single-layer high-quality speech codec
Code: https://github.com/Ereboas/MagiCodec
Model: HuggingFace

VietASR: High-quality Vietnamese speech recognition model
Code: https://github.com/zzasdf/VietASR
Model: HuggingFace

URO-Bench: Comprehensive benchmark for end-to-end spoken dialogue
Code: https://github.com/Ruiqi-Yan/URO-Bench
Dataset: HuggingFace

MuQ: Universal music signal representation model
Code: https://github.com/tencent-ailab/MuQ
Model: MuQ-large, MuQ-MuLan-large
Models on HuggingFace have been downloaded over 0.7 million times.

2024

NDVQ: Robust neural audio codec
Code: https://github.com/ZhikangNiu/NDVQ

SLAM-LLM: Open-source toolkit for audio foundation models
Code: https://github.com/X-LANCE/SLAM-LLM

F5-TTS: Flow-matching-based TTS model
Code: https://github.com/SWivid/F5-TTS
Model: https://huggingface.co/SWivid/F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Models on HuggingFace have been downloaded over 8 million times.
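
Below is a minimal sketch of pulling the released F5-TTS checkpoints from the Hugging Face Hub with the huggingface_hub library; the repo ID comes from the model link above, while the local directory name is only an illustrative choice.

    # Minimal sketch: download the F5-TTS checkpoints from the Hugging Face Hub.
    # Requires `pip install huggingface_hub`; the local_dir name is illustrative.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id="SWivid/F5-TTS", local_dir="./F5-TTS")
    print(f"Checkpoints downloaded to {local_path}")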

EmoBox: Evaluation toolkit for multilingual speech emotion recognition
Code: https://github.com/emo-box/EmoBox

GigaSpeech 2: Large-scale speech dataset for low-resource languages (Vietnamese, Indonesian, and Thai)
Code: https://github.com/SpeechColab/GigaSpeech2
Dataset: ModelScope | HuggingFace
The dataset has been downloaded over 50,000 times.
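
As one possible way to access the corpus, the sketch below streams a split with the Hugging Face datasets library; the repo ID and the language configuration shown are assumptions based on the project’s naming, so check the dataset card for the exact values.

    # Minimal sketch: stream GigaSpeech 2 with the `datasets` library
    # (`pip install datasets`). The repo ID and the "vi" (Vietnamese)
    # configuration are assumptions; see the dataset card for exact names.
    from datasets import load_dataset

    ds = load_dataset("speechcolab/gigaspeech2", "vi", split="train", streaming=True)
    for example in ds:
        print(example)  # each example holds audio plus its transcript
        break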

emotion2vec: Universal speech emotion representation model
Code: https://github.com/ddlBoJack/emotion2vec
Model: emotion2vec_plus_large | emotion2vec_base_finetuned | emotion2vec_base
Models on ModelScope have been downloaded over 1.4 million times.

EAT: Universal audio signal representation model
Code: https://github.com/cwx-worst-one/EAT
Model: HuggingFace

2023

FastHuBERT: Efficient training framework for self-supervised speech representation learning
Code: https://github.com/yanghaha0908/FastHuBERT

MT4SSL: Multi-task self-supervised speech model
Code: https://github.com/ddlBoJack/MT4SSL

Text2Animation:
Code: https://github.com/Moon0316/T2A

Open-Source Projects We Participate In

FISHER: A Foundation Model for Industrial Signal Comprehensive Representation
Code: https://github.com/jianganbai/FISHER

Spark-TTS: LLM-based TTS model
Code: https://github.com/SparkAudio/Spark-TTS

YuE: Open-source foundation models for music generation
Code: https://github.com/multimodal-art-projection/YuE

AniTalker: Model for generating lifelike talking faces
Code: https://github.com/X-LANCE/AniTalker

StoryTTS: Highly expressive text-to-speech dataset from a Mandarin storytelling show
Code: https://github.com/X-LANCE/StoryTTS

VoiceFlow-TTS: Efficient text-to-speech with rectified flow matching
Code: https://github.com/X-LANCE/VoiceFlow-TTS