
Author: 徐梓恩
Author (English): Tzu-En Hsu
Thesis Title: 基於文字合成語音之風格轉換系統
Thesis Title (English): Style Transfer System Based on Speech Synthesis
Advisor: 江政欽
Advisor (English): Cheng-Chin Chiang
Committee Members: 林信鋒, 王明睿
Committee Members (English): Shin-Feng Lin, Morten Wang
Degree: Master's
University: National Dong Hwa University
Department: Department of Computer Science and Information Engineering
Student ID: 610821201
Year of Publication (ROC calendar): 110 (2021)
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 40
Keywords: 語音合成, 語音轉換, 人機互動
Keywords (English): Speech Synthesis, Speech Transfer, Human-Computer Interaction
Statistics:
  • Recommendations: 0
  • Views: 27
  • Downloads: 17
  • Bookmarks: 0
Abstract: Devices built on human-computer interaction are now ubiquitous in daily life; their applications can be seen everywhere, from everyday communication devices to large household appliances. As people rely on artificial intelligence to complete many tasks, the technology behind AI-enabled devices has steadily matured, and speech synthesis has become a focal development area. The voice assistants common to communication devices, such as Apple's Siri and the Google Assistant, use this technology to respond to user commands; Google's translation system likewise uses speech synthesis to show users how an input sentence is pronounced. As these technologies advance, such systems have begun to let users choose the voice an assistant uses when reading sentences; Siri, for example, already lets users switch the reading voice between male and female. This thesis uses style-transfer techniques to synthesize speech in a designated speaker's voice. Unlike Siri and the Google Assistant, which offer only a choice between male and female voices, our system lets users select a custom speaker's voice, making the assistant's spoken responses feel more personal and familiar.
Abstract (English): Nowadays, devices based on human-computer interaction are widely used in daily life; their applications can be seen everywhere, from everyday communication devices to large household appliances. Because people rely on artificial intelligence to accomplish many tasks, the technology behind AI devices has gradually matured, and speech synthesis has become a focus of development. Common voice assistants in communication devices, such as Apple's Siri and the Google Assistant, use this technology to respond to user commands; Google's translation system also uses speech synthesis to help users learn how input sentences are pronounced. With the advancement of these technologies, such systems have begun to let users choose the voice the assistant uses when reading sentences; for example, Siri can already change the gender of the reading voice to male or female. This thesis applies style-transfer techniques to select the speaker of synthesized speech. Unlike the Siri voice assistant, whose selectable voices are limited to male or female or to those provided by the system, our system lets users choose the voice in which content is read. At the same time, automatic speaker selection is used to achieve automation, so that users can employ the voices they want and the system achieves a closer, more natural human-computer interaction.
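The abstract describes a two-stage pipeline: a text-to-speech front end first synthesizes speech in a neutral voice, and a speaker-style-conversion stage then re-renders it in a user-chosen target voice. The following is a minimal, self-contained sketch of that composition only; the tone-per-symbol "synthesis" and the resampling-based "conversion" are deliberately crude stand-ins for the thesis's actual neural models (Tacotron 2-style TTS and CycleGAN-VC2-style conversion), and all function and parameter names here are hypothetical.

```python
import math

def text_to_speech(text, sample_rate=16000, base_f0=120.0):
    """Stub TTS: render each character as a 50 ms tone (in the spirit of
    sine-wave speech); a real system would produce a mel spectrogram from
    a trained model instead."""
    samples = []
    for ch in text:
        f0 = base_f0 + (ord(ch) % 32) * 5.0      # crude per-symbol pitch
        for n in range(sample_rate // 20):       # 50 ms per symbol
            samples.append(math.sin(2 * math.pi * f0 * n / sample_rate))
    return samples

def convert_style(samples, f0_ratio=1.5):
    """Stub speaker-style conversion: linear-interpolation resampling that
    shifts pitch by f0_ratio, standing in for a learned voice converter."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
        pos += f0_ratio
    return out

def synthesize(text, target_ratio):
    """Full pipeline: TTS followed by speaker-style conversion."""
    return convert_style(text_to_speech(text), f0_ratio=target_ratio)

wave = synthesize("hello", 1.5)
```

The key design point mirrored here is the decoupling: the TTS stage is trained once, while the conversion stage (and the thesis's speaker multiplexer) can target any speaker without retraining the synthesizer.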
Acknowledgements I
Abstract (Chinese) III
Abstract (English) V
Table of Contents VII
List of Figures IX
List of Tables XI
Chapter 1 Introduction 1
1.1. Research Motivation and Objectives 1
1.2. Thesis Organization 2
Chapter 2 Literature Review 3
2.1. Speech Synthesis Techniques 3
2.2. Voice Conversion Techniques 5
2.3. Problems to Be Solved and Overview of the Proposed Solutions 7
Chapter 3 Methodology 9
3.1. System Module Architecture 9
3.1.1. Architecture Design 9
3.1.2. Architectural Features 10
3.2. "Text-to-Speech" Module Design 12
3.2.1. System Functions 12
3.2.2. Architecture 12
3.2.3. Model Training 14
3.3. "Speaker Voice Style Conversion" Module Design 17
3.3.1. System Functions 17
3.3.2. Architecture 17
3.3.3. Model Training 21
3.4. Speaker Multiplexer Design 24
3.4.1. System Functions 24
3.4.2. Architecture 25
3.4.3. Model Training 27
Chapter 4 Experimental Results 29
4.1. Comparison of Speech Synthesis Results 29
4.2. Comparison of Style Transfer Results 31
4.3. Analysis of Speaker Multiplexer Results 36
Chapter 5 Conclusion 37
References 39
[1] Douglas O'Shaughnessy, Louis Barbeau, David Bernardi, and Danièle Archambault, "Diphone Speech Synthesis," Speech Communication, vol. 7, no. 1, Mar. 1988, pp. 55-56.
[2] Sneha Lukose and Savitha S. Upadhya, "Text to Speech Synthesizer-Formant Synthesis," IEEE, 2017.
[3] Chang-Shiann Wu and Yu-Fu Hsieh, "Articulatory Speech Synthesizer," ACL Anthology, 2000, pp. 345-352.
[4] Robert E. Remez, "Sine-wave Speech," in E. M. Izhikevich (Ed.), 2008, p. 2394.
[5] Paul Taylor, Text-to-Speech Synthesis, Cambridge University Press, New York, NY, USA, 1st edition, 2009.
[6] Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, and Mohammad Shoeybi, "Deep Voice: Real-Time Neural Text-to-Speech," arXiv:1702.07825v2 [cs.CL], Mar. 2017.
[7] Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, and Yanqi Zhou, "Deep Voice 2: Multi-Speaker Neural Text-to-Speech," arXiv:1705.08947v2 [cs.CL], Sep. 2017.
[8] Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, and John Miller, "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning," arXiv:1710.07654v3 [cs.SD], Feb. 2018.
[9] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, "A Neural Algorithm of Artistic Style," arXiv:1508.06576v2 [cs.CV], Sep. 2015.
[10] Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, and Rif A. Saurous, "Tacotron: Towards End-to-End Speech Synthesis," arXiv:1703.10135v2 [cs.CL], Apr. 2017.
[11] Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu, "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions," arXiv:1712.05884v2 [cs.CL], Feb. 2018.
[12] Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara, "Voice Conversion Through Vector Quantization," IEEE Xplore, Aug. 2002.
[13] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, and Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-Based Non-Parallel Voice Conversion," arXiv:1904.04631v1 [cs.SD], Apr. 2019.
[14] Takuhiro Kaneko and Hirokazu Kameoka, "Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks," arXiv:1711.11293v2, Dec. 2017.
[15] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention Is All You Need," arXiv:1706.03762v5 [cs.CL], Dec. 2017.
[16] Po-chun Hsu and Hung-yi Lee, "WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU," arXiv:2005.07412v3 [eess.AS], Aug. 2020.
[17] Ryan Prenger, Rafael Valle, and Bryan Catanzaro, "WaveGlow: A Flow-Based Generative Network for Speech Synthesis," arXiv:1811.00002v1 [cs.SD], Oct. 2018.
[18] https://librivox.org/pages/public-domain/
[19] Chen Cai and Yusu Wang, "A Note on Over-Smoothing for Graph Neural Networks," arXiv:2006.13318v1 [cs.LG], Jun. 2020.
[20] https://datashare.ed.ac.uk/handle/10283/3061
[21] https://pypi.org/project/pyworld/
 
 
 
 