Open-Source Projects
We are committed to open source and firmly believe in the power of openness and sharing. We strive to open-source our lab's research projects whenever possible, giving back to the community and jointly advancing technological progress.
As of March 2026, our open-source models have reached 70M+ downloads and 20K+ GitHub stars, thanks to the efforts of our outstanding students and collaborators.
2026
SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Real-Time Full-Duplex Speech Conversation
Code: https://github.com/Soul-AILab/SoulX-Duplug
Model: https://huggingface.co/Soul-AILab/SoulX-Duplug-0.6B
Demo: https://soulx-duplug.sjtuxlance.com/
Resonate: Reinforcing Text-to-Audio Generation with Online Feedback from Large Audio Language Models
Code: https://github.com/xiquan-li/Resonate
Model: https://huggingface.co/AndreasXi/Resonate
Demo: https://huggingface.co/spaces/chenxie95/Resonate
Audio ControlNet: Fine-Grained Audio Generation and Editing
Code: https://github.com/juhayna-zh/AudioControlNet
Model: https://huggingface.co/collections/juhayna/audio-controlnet
Demo: https://huggingface.co/spaces/chenxie95/AudioControlNet
CLSP: Fine-Grained and Multi-Level Speech–Text Style Representation Alignment Model
Code: https://github.com/yfyeung/CLSP
Model: https://huggingface.co/yfyeung/CLSP
Dataset: https://huggingface.co/datasets/yfyeung/FCaps
Habibi-TTS: Multidialect Arabic TTS Model
Code: https://github.com/SWivid/Habibi-TTS
Model: https://huggingface.co/SWivid/Habibi-TTS
Demo: https://huggingface.co/spaces/chenxie95/Habibi-TTS
2025
Emotional Dialectal TTS: Toward emotionally expressive dialectal speech synthesis
Code: https://github.com/the-bird-F/Expressive-Vectors
Demo: https://the-bird-f.github.io/Expressive-Vectors/
X-Talk: Full-duplex, low-latency, cascaded spoken dialogue system
Code: https://github.com/xcc-zach/xtalk
Demo: https://xtalk.sjtuxlance.com/
IMTalker: Efficient audio-driven talking face generation with implicit motion transfer
Code: https://github.com/cbsjtu01/IMTalker
Model: https://huggingface.co/cbsjtu01/IMTalker
Demo: https://huggingface.co/spaces/chenxie95/IMTalker
SAC: Neural speech codec with semantic-acoustic dual-stream quantization
Code: https://github.com/Soul-AILab/SAC
Model: https://huggingface.co/Soul-AILab/SAC-16k-62_5Hz
Demo: https://sac-codec.github.io/
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
Code: https://github.com/SWivid/AUV
Demo: https://swivid.github.io/AUV/
Omni-Captioner: Multi-agent-based detailed audio captioning
Code: https://github.com/ddlBoJack/Omni-Captioner
Model: https://huggingface.co/ddlBoJack/Omni-Captioner
Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Captioner-Demo
Semantic-VAE: Semantically aligned latent representations for better speech synthesis
Code: https://github.com/ZhikangNiu/Semantic-VAE
Model: https://huggingface.co/zkniu/Semantic-VAE
MeanAudio: Audio generation with Mean Flow
Code: https://github.com/xiquan-li/MeanAudio
Model: https://huggingface.co/AndreasXi/MeanAudio
Demo: https://huggingface.co/spaces/chenxie95/MeanAudio
UltraVoice: Scaling fine-grained style-controlled speech conversations for spoken dialogue models
Code: https://github.com/bigai-nlco/UltraVoice
Dataset: HuggingFace
A-DMA: Efficient F5-TTS model
Code: https://github.com/ZhikangNiu/A-DMA
MMAR: Benchmark for deep reasoning on audio signals
Code: https://github.com/ddlBoJack/MMAR
Dataset: HuggingFace
EmoVoice: Emotional TTS model with natural language descriptions
Code: https://github.com/yanghaha0908/EmoVoice
Model: HuggingFace
MagiCodec: Single-layer high-quality speech codec
Code: https://github.com/Ereboas/MagiCodec
Model: HuggingFace
E2E RAG for SLM: End-to-end RAG for speech language models
Code: https://github.com/the-bird-F/GLM-Voice-RAG
Dataset: HuggingFace
VietASR: High-quality Vietnamese speech recognition model
Code: https://github.com/zzasdf/VietASR
Model: HuggingFace
URO-Bench: Comprehensive benchmark for end-to-end spoken dialogue models
Code: https://github.com/Ruiqi-Yan/URO-Bench
Dataset: HuggingFace
MuQ: Universal music signal representation model
Code: https://github.com/tencent-ailab/MuQ
Model: MuQ-large, MuQ-MuLan-large
The MuQ models on HuggingFace have been downloaded over 9 million times.
2024
NDVQ: Robust neural audio codec
Code: https://github.com/ZhikangNiu/NDVQ
SLAM-LLM: Open-source toolkit for audio foundation models
Code: https://github.com/X-LANCE/SLAM-LLM
F5-TTS: Flow-matching based TTS model
Code: https://github.com/SWivid/F5-TTS
Model: https://huggingface.co/SWivid/F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS
The F5-TTS models on HuggingFace have been downloaded over 10 million times.
EmoBox: Evaluation toolkit for multilingual speech emotion recognition
Code: https://github.com/emo-box/EmoBox
GigaSpeech 2: Large-scale speech dataset for low-resource languages (Vietnamese, Indonesian, and Thai)
Code: https://github.com/SpeechColab/GigaSpeech2
Dataset: ModelScope | HuggingFace
The dataset has been downloaded over 100,000 times.
emotion2vec: Universal speech emotion representation model
Code: https://github.com/ddlBoJack/emotion2vec
Model: emotion2vec_plus_large | emotion2vec_base_finetuned | emotion2vec_base
The models on ModelScope have been downloaded over 50 million times.
EAT: Universal audio signal representation model
Code: https://github.com/cwx-worst-one/EAT
Model: HuggingFace
The models on HuggingFace have been downloaded over 900K times.
2023
FastHuBERT: Efficient training framework for self-supervised speech representation learning
Code: https://github.com/yanghaha0908/FastHuBERT
MT4SSL: Multi-task self-supervised speech model
Code: https://github.com/ddlBoJack/MT4SSL
Text2Animation
Code: https://github.com/Moon0316/T2A
Open-source projects we contribute to
OpenMOSS MOVA: Towards Scalable and Synchronized Video–Audio Generation
Code: https://github.com/OpenMOSS/MOVA
Model: https://huggingface.co/collections/OpenMOSS-Team/mova
FISHER: A Foundation Model for Industrial Signal Comprehensive Representation
Code: https://github.com/jianganbai/FISHER
Model: https://huggingface.co/collections/jiangab/fisher
Spark-TTS: LLM-based TTS model
Code: https://github.com/SparkAudio/Spark-TTS
Model: https://huggingface.co/SparkAudio/Spark-TTS-0.5B
YuE: Open-source foundation models for music generation
Code: https://github.com/multimodal-art-projection/YuE
Model: https://huggingface.co/collections/m-a-p/yue
AniTalker: Lifelike talking-face generation model
Code: https://github.com/X-LANCE/AniTalker
StoryTTS: Highly expressive text-to-speech dataset from Mandarin storytelling show
Code: https://github.com/X-LANCE/StoryTTS
VoiceFlow-TTS: Efficient Text-to-Speech with Rectified Flow Matching
Code: https://github.com/X-LANCE/VoiceFlow-TTS