Publications

You can also find my updated publications on my Google Scholar profile.

E3 TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications
Zheng Liang, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen
IEEE/ACM TASLP, 2024

CTC-Assisted LLM-Based Contextual ASR
Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen
In Proc. SLT, 2024

NDVQ: Robust neural audio codec with normal distribution-based vector quantization
Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu
In Proc. SLT, 2024

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu
In Proc. SLT, 2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu
In Proc. ACM MM, 2024

On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li, Feiyu Shen, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu
In Proc. INTERSPEECH, 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen
In Proc. INTERSPEECH, 2024

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan
In Proc. INTERSPEECH, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin
In Proc. INTERSPEECH, 2024

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain
In Proc. INTERSPEECH, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen
In Proc. INTERSPEECH, 2024

1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip Woodland, Xie Chen, Huy Phan, Thomas Hain
In Proc. Odyssey, 2024

Semi-supervised Acoustic Scene Classification with Test-Time Adaptation
Wen Huang, Anbai Jiang, Bing Han, Xinhu Zheng, Yihong Qiu, Wenxi Chen, Yuzhe Liang, Pingyi Fan, Wei-Qiang Zhang, Cheng Lu, Xie Chen, Jia Liu, Yanmin Qian
In ICME Workshop, 2024

Improving Acoustic Scene Classification via Self-Supervised and Semi-Supervised Learning with Efficient Audio Transformer
Yuzhe Liang, Wenxi Chen, Yihong Qiu, Xinhu Zheng, Boyuan Chen, Jia Liu, Wei-Qiang Zhang, Cheng Lu, Xie Chen
In ICME Workshop, 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen
In Proc. INTERSPEECH, 2024

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen
In Proc. INTERSPEECH, 2024

Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Junzhe Liu, Jianwei Yu, Xie Chen
In Proc. INTERSPEECH, 2024

emotion2vec: Self-supervised pre-training for speech emotion representation
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen
In Findings of ACL, 2024

Exploring Generation of Pronunciation Lexicon for Low-Resource Language Automatic Speech Recognition Based on Generic Phone Recognizer
Jinpeng Li, Xie Chen, Wei-Qiang Zhang
Journal of Shanghai Jiaotong University (Science), 2024

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li, Yiwei Guo, Xie Chen, Kai Yu
In Proc. ICASSP, 2024

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Sen Liu, Yiwei Guo, Xie Chen, Kai Yu
In Proc. ICASSP, 2024

Acoustic BPE for speech generation with discrete tokens
Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
In Proc. ICASSP, 2024

Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
In Proc. ICASSP, 2024

Towards universal speech discrete tokens: A case study for ASR and TTS
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen
In Proc. ICASSP, 2024

Voiceflow: Efficient text-to-speech with rectified flow matching
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu
In Proc. ICASSP, 2024

UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu
In Proc. AAAI, 2024

BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath
In Proc. ICML, 2024

Advanced long-content speech recognition with factorized neural transducer
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian
IEEE/ACM TASLP, 2024

EAT: Self-supervised pre-training with efficient audio transformer
Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen
In Proc. IJCAI, 2024

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
In Proc. ASRU, 2023

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
In Proc. ASRU, 2023

Speaker Adaptive Text-to-Speech with Timbre-Normalized Vector-Quantized Feature
Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu
IEEE/ACM TASLP, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian
In Proc. ACM MM, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen
In Proc. INTERSPEECH, 2023

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen
In Proc. INTERSPEECH, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
In Proc. INTERSPEECH, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen
In Proc. INTERSPEECH, 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen
In Proc. INTERSPEECH, 2023

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu
In Proc. INTERSPEECH, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
In Proc. INTERSPEECH, 2023

An Adapter Based Multi-Label Pre-Training for Speech Separation and Enhancement
Tianrui Wang, Xie Chen, Zhuo Chen, Shu Yu, Weibin Zhu
In Proc. ICASSP, 2023

Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR
Xun Gong, Wei Wang, Hang Shao, Xie Chen, Yanmin Qian
In Proc. ICASSP, 2023

Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
In Proc. ICASSP, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian
In Proc. ICASSP, 2023

Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition
Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng
In Proc. ICASSP, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Kai Yu, Xie Chen
In Proc. ICASSP, 2023

Internal language model adaptation with text-only data for end-to-end speech recognition
Z Meng, Y Gaur, N Kanda, J Li, X Chen, Y Wu, Y Gong
In Proc. INTERSPEECH, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu
In Proc. INTERSPEECH, 2022

Factorized neural transducer for efficient language model adaptation
Xie Chen, Zhong Meng, S Parthasarathy, Jinyu Li
In Proc. ICASSP, 2022

2021 and Before

Memory-efficient pipeline-parallel DNN training
D Narayanan, A Phanishayee, K Shi, X Chen, M Zaharia
Proc. ICML, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS
Y Deng, R Zhao, Z Meng, X Chen, B Liu, J Li, Y Gong, L He
Proc. INTERSPEECH, 2021

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
X. Chen, Y. Wu, Z. Wang, S. Liu, J. Li
Proc. ICASSP, 2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Z. Meng, N. Kanda, Y. Gaur, S. Parthasarathy, E. Sun, L. Lu, X. Chen, J. Li, Y. Gong
Proc. IEEE ICASSP, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Z. Meng, S. Parthasarathy, E. Sun, Y. Gaur, N. Kanda, L. Lu, X. Chen, R. Zhao, J. Li, Y. Gong
Proc. IEEE SLT, 2020

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition
X. Chen, S. Parthasarathy, W. Gale, S. Chang, M. Zeng
arXiv preprint arXiv:2010.11349, 2020

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers
J. Xu, X. Chen, S. Hu, J. Yu, X. Liu, H. Meng
Proceedings of ICASSP, 2020

Exploiting Future Word Contexts in Neural Network Language Model
X. Chen, X. Liu, Y. Wang, A. Ragni, M. Gales
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2019

Long-span language modeling for speech recognition
S. Parthasarathy, W. Gale, X. Chen, G. Polovets, S. Chang
arXiv preprint arXiv:1911.04571, 2019

Investigation of Sampling Techniques for Maximum Entropy Language Modeling Training
X. Chen, J. Zhang, T. Anastasakos, F. Alleva
Proceedings of ICASSP, 2019

Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition
M. Lam, X. Chen, S. Hu, J. Yu, X. Liu, H. Meng
Proceedings of ICASSP, 2019

Recurrent Neural Network Language Models Training using Natural Gradient
J. Yu, M. Lam, X. Chen, S. Hu, S. Liu, X. Wu, X. Liu, H. Meng
Proceedings of ICASSP, 2019

Active Memory Networks for Language Modeling
O. Chen, A. Ragni, M.J.F. Gales and X. Chen
Proceedings of INTERSPEECH, 2018

The Effect of Adding Authorship Knowledge in Automated Text Scoring
M. Zhang, X. Chen, R. Cummins, Ø. Andersen and T. Briscoe
BEA Workshop at NAACL, 2018

Limited-memory BFGS Optimization of Recurrent Neural Network Language Models For Speech Recognition
X. Liu, S. Liu, J. Sha, J. Yu, Z. Xu, X. Chen, H. Meng
In Proceedings of ICASSP, 2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription
Y. Wang, X. Chen, M.J.F. Gales, A. Ragni, J. Wong
In Proceedings of ICASSP, 2018

Neural Network Language Modeling with Letter-based Features and Importance Sampling
H. Xu, K. Li, Y. Wang, J. Wang, S. Kang, X. Chen, D. Povey, S. Khudanpur
Proceedings of ICASSP, 2018

Future Word Context in Neural Network Language Model
X. Chen, X. Liu, A. Ragni, Y. Wang, M.J.F. Gales
Proceedings of ASRU, 2017

Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
X. Chen, A. Ragni, X. Liu, M.J.F. Gales
Proceedings of INTERSPEECH, 2017

Recurrent Neural Network Language Models for Keyword Search
X. Chen, A. Ragni, J. Vasilakes, X. Liu, K. Knill, M.J.F. Gales
Proceedings of ICASSP, 2017

Efficient Training and Evaluation of Recurrent Neural Network Language Models for Speech Recognition
X. Chen, X. Liu, Y. Wang, M. J. F. Gales and P. C. Woodland
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2016

Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models
X. Liu, X. Chen, Y. Wang, M. J. F. Gales and P. C. Woodland
IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016

Multi-Language Neural Network Language Models
A. Ragni, E. Dakin, X. Chen, M.J.F. Gales and K.M. Knill
Proceedings of INTERSPEECH, 2016

CUED-RNNLM – An Open-Source Toolkit for Efficient Training and Evaluation of Recurrent Neural Network Language Models
X. Chen, X. Liu, Y. Qian, M.J.F. Gales and P.C. Woodland
Proceedings of ICASSP, 2016

Investigation of back-off based interpolation between Recurrent Neural Network and N-Gram Language Models
X. Chen, X. Liu, M.J.F. Gales and P.C. Woodland
Proceedings of ASRU, 2015

Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition
X. Chen, T. Tan, X. Liu, P. Lanchantin, M. Wan, M.J.F. Gales and P.C. Woodland
Proceedings of INTERSPEECH, 2015

Improving the Training and Evaluation Efficiency of Recurrent Neural Network Language Models
X. Chen, X. Liu, M.J.F. Gales, P.C. Woodland
Proceedings of ICASSP, 2015

Recurrent Neural Network Language Model Training with Noise Contrastive Estimation for Speech Recognition
X. Chen, X. Liu, M.J.F. Gales, P.C. Woodland
Proceedings of ICASSP, 2015

Paraphrastic Recurrent Neural Network Language Models
X. Liu, X. Chen, M.J.F. Gales, P.C. Woodland
Proceedings of ICASSP, 2015

Robust Excitation-based Feature for Automatic Speech Recognition
T. Drugman, Y. Stylianou, L. Chen, X. Chen, M.J.F. Gales
Proceedings of ICASSP, 2015

An Initial Investigation of Long-Term Adaptation for Meeting Transcription
X. Chen, M.J.F. Gales, K. Knill, et al.
Proceedings of INTERSPEECH, 2014

Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch
X. Chen, Y. Wang, X. Liu, M.J.F. Gales and P.C. Woodland
Proceedings of INTERSPEECH, 2014

Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M.J.F. Gales and P.C. Woodland
In Proceedings of ICASSP, 2014

Impact of Single-Microphone Dereverberation on DNN-based Meeting Transcription Systems
T. Yoshioka, X. Chen, and M.J.F. Gales
Proceedings of ICASSP, 2014

Construction of a Compact Dynamic Decoder Network for Large Vocabulary Continuous Speech Recognition
J. Liu, X. Chen, Y. Shan and Y. Shi
Tsinghua Journal of Chinese Studies, 2012

Fast Language Model Look-ahead Algorithm Using Extended N-gram Model
Y. Shan, X. Chen, Y. Shi and J. Liu
ACTA AUTOMATICA SINICA, 2012

Pipelined Back-Propagation for Context-Dependent Deep Neural Networks
X. Chen, A. Eversole, D. Yu and F. Seide
Proceedings of INTERSPEECH, 2012

An Efficient Layer-wised Beam Pruning Algorithm for Large Vocabulary Continuous Speech Recognition System
X. Chen, Y. Shan, X. Zhang, J. Liu
Proceedings of ICALIP, 2012

Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription
F. Seide, G. Li, X. Chen and D. Yu
Proceedings of ASRU, 2011