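A Study on Taiwanese Indigenous Languages Machine Translation by Deep Learning | National Dong Hwa University Theses and Dissertations Full-Text Image System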

Author:	Chin-Heng Liu (劉景恆)
Thesis title:	A Study on Taiwanese Indigenous Languages Machine Translation by Deep Learning (深度學習在原住民語機器翻譯之研究)
Advisor:	Shi-Jim Yen (顏士淨)
Oral examination committee:	Cheng-Chin Chiang (江政欽), Wen-Cheng Lin (林紋正)
Degree:	Master's
Institution:	National Dong Hwa University (國立東華大學)
Department:	Department of Computer Science and Information Engineering
Student ID:	610521228
Year of publication (ROC calendar):	112 (2023)
Academic year of graduation (ROC calendar):	106 (2017-2018)
Language:	Chinese
Number of pages:	26
Keywords:	Taiwanese Indigenous Peoples, Amis, Atayal, Bunun, Deep Learning, Machine Translation

Language is an indispensable medium for the development of civilization, yet many minority languages are gradually disappearing and have become endangered languages [1]. According to a survey report published in 2016 by the Council of Indigenous Peoples of the Republic of China [2], the younger the age group among Taiwanese indigenous peoples, the lower the rate of indigenous language use, which points to a potential crisis of language loss.

This study selects several of the more populous Taiwanese indigenous groups [3], namely the Amis, the Atayal, and the Bunun, and builds a "Taiwanese Indigenous Languages Translation System." Using deep learning, the system reads input in an indigenous language and automatically converts it into Mandarin through machine translation. We hope this will help promote the indigenous languages and strengthen their transmission to future generations.

Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation
Section 3 Thesis Overview
Chapter 2 Introduction to Sequence to Sequence
Section 1 Traditional Machine Translation vs. Seq2seq
Section 2 Sequential model
Section 3 Encoder & Decoder
Chapter 3 Data Sources and Collection
Section 1 Corpus Sources
Section 2 族語E樂園
Section 3 Online Dictionary of Indigenous Languages (原住民族語言線上詞典)
Section 4 Collection Method: Web Crawling
Chapter 4 Research Methods and Procedure
Section 1 Research Environment
Section 2 Model Architecture
Section 3 Data Preprocessing
Section 4 Optimization Methods and Loss Functions
Section 5 Initial Experiments
Section 6 Data Augmentation
Section 7 Word Segmentation
Section 8 Retesting the Data
Section 9 Reverse-Direction Testing
Section 10 Language Identification
Chapter 5 Results and Discussion
Section 1 BLEU
Section 2 Integration into a Graphical Interface
Chapter 6 Conclusion and Future Work
References

[1] Endangered language. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Endangered_language

[2] Council of Indigenous Peoples (organizer), & Shih Hsin University (implementer). (2016). Summary of the Phase 3 survey report of the three-year implementation plan for the indigenous languages survey and research (原住民族語言調查研究三年實施計畫第 3 期調查研究報告摘要). New Taipei City: Council of Indigenous Peoples.

[3] Indigenous population statistics. Retrieved from https://www.apc.gov.tw/portal/docList.html?CID=940F9579765AC6A0

[4] Austronesian languages. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Austronesian_languages

[5] General explanation of the Indigenous Languages Development Act. Retrieved from https://law.apc.gov.tw/LawContent.aspx?id=GL0003

[6] Assessment of the vitality of Taiwan's indigenous languages by the UNESCO Atlas of the World's Languages in Danger website. Retrieved from https://web.alcd.tw/uploads/2017/12/03/e951b572a213fea40f8b2f75f4d9db42.pdf

[7] UNESCO Atlas of the World's Languages in Danger. Retrieved from http://www.unesco.org/languages-atlas/

[8] The legend of machine translation (機器翻譯傳奇). Retrieved from https://web.archive.org/web/20070711175213/http://blog.cnfol.com/creative/articles/175678.html

[9] I. Sutskever, O. Vinyals, and Q. V. Le. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (NIPS).

[10] AI sports commentator: notes on applying the Seq2seq model (PyTorch + Python3). Retrieved from https://medium.com/@gau820827/教電腦寫作-ai球評-seq2seq模型應用筆記-pytorch-python3-31e853573dd0

[11] 族語E樂園. Retrieved from http://web.klokah.tw

[12] Online Dictionary of Indigenous Languages (原住民族語言線上詞典). Retrieved from https://e-dictionary.apc.gov.tw

[13] Python x web crawler. Retrieved from https://medium.com/dualcores-studio/python-x-網路爬蟲-c30ffda0ad78

[14] keras-team. Keras examples. Retrieved from https://github.com/keras-team/keras/tree/master/examples

[15] Data processing: One-Hot Encoding. Retrieved from https://blog.csdn.net/google19890102/article/details/44039761

[16] Deep learning notes: a summary of optimization methods. Retrieved from https://blog.csdn.net/u014595019/article/details/52989301

[17] keras-team. Loss functions (loss). Retrieved from https://keras.io/losses/

[18] Jieba Chinese word segmentation, Taiwan Traditional Chinese version. Retrieved from https://github.com/ldkrsi/jieba-zh_TW

[19] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. (2002). BLEU: a method for automatic evaluation of machine translation. In ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.

[20] Google Seq2Seq. Retrieved from https://google.github.io/seq2seq
