Open-Source Projects We Lead

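We are committed to embracing open source and believe in the power of openness and sharing. We open-source as much of our lab's research as possible to give back to the community and help advance the technology together.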

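As of March 2026, the open-source models developed under our team's lead have been downloaded more than 70 million times in total and have received more than 20,000 GitHub stars; the open-source projects we have contributed to have received more than 40,000 GitHub stars in total.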

We thank our students and collaborators for their joint efforts and contributions.

2026

Audio-Oscar: a complex audio generation model based on multi-agent collaboration
Code: https://github.com/ziye26/Audio-Oscar

MMAE: a benchmark for evaluating speech editing
Code: https://github.com/ddlBoJack/MMAE
Dataset: https://huggingface.co/datasets/BoJack/MMAE

WavTTS: a waveform-based, high-quality zero-shot speech synthesis model
Code: https://github.com/cwx-worst-one/WavTTS
Model: https://huggingface.co/worstchan/WavTTS

OpenSTBench: a multi-dimensional evaluation toolkit for speech translation
Code: https://github.com/sjtuayj/OpenSTBench

X-ASR: a lightweight streaming Chinese-English speech recognition system built on million-hour-scale data
Code: https://github.com/Gilgamesh-J/X-ASR
Model: https://huggingface.co/GilgameshWind/X-ASR-zh-en
Demo: https://stream-asr.sjtuxlance.com/

WavCube: a continuous representation model for unified speech understanding and generation
Code: https://github.com/yanghaha0908/WavCube
Model: https://huggingface.co/yhaha/WavCube

X-Voice: a multilingual speech synthesis model (supports 30 languages)
Code: https://github.com/sunnyxrxrx/X-Voice
Model: https://huggingface.co/XRXRX/X-Voice
Demo: https://huggingface.co/spaces/chenxie95/X-Voice
Dataset: https://huggingface.co/datasets/XRXRX/X-Voice-Dataset-Train

X-VC: a zero-shot streaming voice conversion model based on a discrete codebook space
Code: https://github.com/Jerrister/X-VC
Model: https://huggingface.co/chenxie95/X-VC
Demo: https://x-vc.sjtuxlance.com/

FineLAP: a fine-grained audio-language model for frame-level alignment
Code: https://github.com/xiquan-li/FineLAP
Model: https://huggingface.co/AndreasXi/FineLAP
Dataset: https://huggingface.co/datasets/AndreasXi/FineLAP-100k

SoulX-Duplug: a pluggable streaming model for predicting dialogue state in full-duplex spoken dialogue (streaming semantic VAD)
Code: https://github.com/Soul-AILab/SoulX-Duplug
Model: https://huggingface.co/Soul-AILab/SoulX-Duplug-0.6B
Demo: https://soulx-duplug.sjtuxlance.com/

Resonate: an audio generation model based on online feedback from a large audio model
Code: https://github.com/xiquan-li/Resonate
Model: https://huggingface.co/AndreasXi/Resonate
Demo: https://huggingface.co/spaces/chenxie95/Resonate

Audio ControlNet: fine-grained audio generation and editing
Code: https://github.com/juhayna-zh/AudioControlNet
Model: https://huggingface.co/collections/juhayna/audio-controlnet
Demo: https://huggingface.co/spaces/chenxie95/AudioControlNet

CLSP: a speech-text representation alignment model for fine-grained, multi-level style descriptions
Code: https://github.com/yfyeung/CLSP
Model: https://huggingface.co/yfyeung/CLSP
Dataset: https://huggingface.co/datasets/yfyeung/FCaps

Habibi-TTS: a multi-dialect Arabic speech synthesis model
Code: https://github.com/SWivid/Habibi-TTS
Model: https://huggingface.co/SWivid/Habibi-TTS
Demo: https://huggingface.co/spaces/chenxie95/Habibi-TTS

2025

Emotional Dialectal TTS: an emotional dialectal speech synthesis model
Code: https://github.com/the-bird-F/Expressive-Vectors
Demo: https://the-bird-f.github.io/Expressive-Vectors/

X-Talk: a full-duplex, low-latency, easy-to-deploy cascaded dialogue system
Code: https://github.com/xcc-zach/xtalk
Demo: https://xtalk.sjtuxlance.com/

IMTalker: an efficient speech- or video-driven digital human model
Code: https://github.com/cbsjtu01/IMTalker
Model: https://huggingface.co/cbsjtu01/IMTalker
Demo: https://huggingface.co/spaces/chenxie95/IMTalker

SAC: an efficient dual-stream, single-layer discrete speech codec that decouples acoustic and semantic information
Code: https://github.com/Soul-AILab/SAC
Model: https://huggingface.co/Soul-AILab/SAC-16k-62_5Hz
Demo: https://sac-codec.github.io/

AUV: an efficient single-layer codec for speech, sound, and music
Code: https://github.com/SWivid/AUV
Demo: https://swivid.github.io/AUV/

Omni-Captioner: multi-agent-based fine-grained audio captioning and understanding
Code: https://github.com/ddlBoJack/Omni-Captioner
Model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Captioner
Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Captioner-Demo

UniVoice: a unified model for speech recognition and speech synthesis
Code: https://github.com/gwh22/UniVoice
Model: https://huggingface.co/guanwenhao/univoice-all

Semantic-VAE: a high-quality speech codec model guided by semantic information
Code: https://github.com/ZhikangNiu/Semantic-VAE
Model: https://huggingface.co/zkniu/Semantic-VAE

MeanAudio: a high-quality audio generation model
Code: https://github.com/xiquan-li/MeanAudio
Model: https://huggingface.co/AndreasXi/MeanAudio
Demo: https://huggingface.co/spaces/chenxie95/MeanAudio

UltraVoice: scaling fine-grained, style-controllable speech dialogue for spoken dialogue models
Code: https://github.com/bigai-nlco/UltraVoice
Model: https://huggingface.co/AndreasXi/MeanAudio
Dataset: HuggingFace

A-DMA: an efficient speech synthesis model
Code: https://github.com/ZhikangNiu/A-DMA

MMAR: a benchmark for deep reasoning over general audio
Code: https://github.com/ddlBoJack/MMAR
Dataset: HuggingFace

EmoVoice: an emotional speech synthesis model controlled by natural-language descriptions
Code: https://github.com/yanghaha0908/EmoVoice
Model: HuggingFace

MagiCodec: a high-quality single-layer speech codec
Code: https://github.com/Ereboas/MagiCodec
Model: HuggingFace

E2E RAG for SLM: a spoken dialogue model with end-to-end speech retrieval-augmented generation
Code: https://github.com/the-bird-F/GLM-Voice-RAG
Dataset: HuggingFace

VietASR: a Vietnamese speech recognition model
Code: https://github.com/zzasdf/VietASR
Model: HuggingFace

URO-Bench: a benchmark for end-to-end spoken dialogue
Code: https://github.com/Ruiqi-Yan/URO-Bench
Dataset: HuggingFace

MuQ: a general-purpose representation model for music signals
Code: https://github.com/tencent-ailab/MuQ
Model: MuQ-large, MuQ-MuLan-large

2024

NDVQ: a robust audio vocoder
Code: https://github.com/ZhikangNiu/NDVQ

SLAM-LLM: a toolkit for large audio models
Code: https://github.com/X-LANCE/SLAM-LLM

F5-TTS: a flow-matching-based speech synthesis model
Code: https://github.com/SWivid/F5-TTS
Model: https://huggingface.co/SWivid/F5-TTS
Demo: https://huggingface.co/spaces/mrfakename/E2-F5-TTS

EmoBox: a multilingual, general-purpose toolkit for speech emotion evaluation
Code: https://github.com/emo-box/EmoBox

GigaSpeech 2: a speech recognition dataset for low-resource languages (Vietnamese, Indonesian, and Thai)
Code: https://github.com/SpeechColab/GigaSpeech2
Dataset: ModelScope | HuggingFace

emotion2vec: a general-purpose speech emotion representation model
Code: https://github.com/ddlBoJack/emotion2vec
Model: emotion2vec_plus_large | emotion2vec_base_finetuned | emotion2vec_base

EAT: a general-purpose representation model for audio signals
Code: https://github.com/cwx-worst-one/EAT
Model: HuggingFace

2023

FastHuBERT: efficient self-supervised speech learning
Code: https://github.com/yanghaha0908/FastHuBERT

MT4SSL: a self-supervised speech model
Code: https://github.com/ddlBoJack/MT4SSL

Text2Animation:
Code: https://github.com/Moon0316/T2A

Open-Source Projects We Contribute To

OpenMOSS MOVA: an audio-video generation model
Code: https://github.com/OpenMOSS/MOVA
Model: https://huggingface.co/collections/OpenMOSS-Team/mova

FISHER: a foundation model for industrial acoustic signals
Code: https://github.com/jianganbai/FISHER
Model: https://huggingface.co/collections/jiangab/fisher

Spark-TTS: a speech synthesis model based on a large language model
Code: https://github.com/SparkAudio/Spark-TTS
Model: https://huggingface.co/SparkAudio/Spark-TTS-0.5B

YuE: an open-source music generation model
Code: https://github.com/multimodal-art-projection/YuE
Model: https://huggingface.co/collections/m-a-p/yue

AniTalker: a digital human generation model
Code: https://github.com/X-LANCE/AniTalker

StoryTTS: a highly expressive Chinese speech synthesis dataset based on pingshu (traditional Chinese storytelling)
Code: https://github.com/X-LANCE/StoryTTS

VoiceFlow-TTS: an efficient speech synthesis model based on Rectified Flow
Code: https://github.com/X-LANCE/VoiceFlow-TTS

陈谐
