[1] Douglas O'Shaughnessy, Louis Barbeau, David Bernardi, and Danièle Archambault, "Diphone Speech Synthesis," Speech Communication, vol. 7, no. 1, Mar. 1988, pp. 55-56.
[2] Sneha Lukose and Savitha S. Upadhya, "Text to Speech Synthesizer-Formant Synthesis," IEEE, 2017.
[3] Chang-Shiann Wu and Yu-Fu Hsieh, "Articulatory Speech Synthesizer," ACL Anthology, 2000, pp. 345-352.
[4] Robert E. Remez, "Sine-wave Speech," in E. M. Izhikevich (Ed.), 2008, p. 2394.
[5] Paul Taylor, "Text-to-Speech Synthesis," Cambridge University Press, New York, NY, USA, 1st edition, 2009.
[6] Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, and Mohammad Shoeybi, "Deep Voice: Real-Time Neural Text-to-Speech," arXiv:1702.07825v2 [cs.CL], Mar. 2017.
[7] Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, and Yanqi Zhou, "Deep Voice 2: Multi-Speaker Neural Text-to-Speech," arXiv:1705.08947v2 [cs.CL], Sep. 2017.
[8] Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, and John Miller, "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning," arXiv:1710.07654v3 [cs.SD], Feb. 2018.
[9] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, "A Neural Algorithm of Artistic Style," arXiv:1508.06576v2 [cs.CV], Sep. 2015.
[10] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, and Rif A. Saurous, "Tacotron: Towards End-to-End Speech Synthesis," arXiv:1703.10135v2 [cs.CL], Apr. 2017.
[11] Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions," arXiv:1712.05884v2 [cs.CL], Feb. 2018.
[12] Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara, "Voice Conversion Through Vector Quantization," IEEE Xplore, Aug. 2002.
[13] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, and Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-Based Non-Parallel Voice Conversion," arXiv:1904.04631v1 [cs.SD], Apr. 2019.
[14] Takuhiro Kaneko and Hirokazu Kameoka, "Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks," arXiv:1711.11293v2, Dec. 2017.
[15] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention Is All You Need," arXiv:1706.03762v5 [cs.CL], Dec. 2017.
[16] Po-chun Hsu and Hung-yi Lee, "WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU," arXiv:2005.07412v3 [eess.AS], Aug. 2020.
[17] Ryan Prenger, Rafael Valle, and Bryan Catanzaro, "WaveGlow: A Flow-Based Generative Network for Speech Synthesis," arXiv:1811.00002v1 [cs.SD], Oct. 2018.
[18] https://librivox.org/pages/public-domain/
[19] Chen Cai and Yusu Wang, "A Note on Over-Smoothing for Graph Neural Networks," arXiv:2006.13318v1 [cs.LG], Jun. 2020.
[20] https://datashare.ed.ac.uk/handle/10283/3061
[21] https://pypi.org/project/pyworld/