Paper Review - NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper Link: arXiv:2403.03100
This paper presents a zero-shot speech synthesis system built around FACodec, a neural codec that factorizes speech into separate subspaces for content, prosody, timbre, and acoustic details. FACodec’s architecture particularly captured my attention because it addresses two key challenges: speech tokenization and the disentanglement of speech representations.
I presented this paper at CCDS, IUB, where I had thoughtful discussions with lab members and supervisors about its methodology and implications for advancing speech research. For those interested, I’m sharing the presentation slides below.
Presentation Slides: Link to slides