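Author: 黃森洋
Author (English): SEN-YANG Huang
Title: 利用RNA病毒基因體序列的深度學習以預測其宿主
Title (English): Deep learning of sequences from RNA virus genomes for predicting reservoir hosts
Advisors: 張瑞宜, 吳建銘
Advisors (English): RUEI-YI Jhang, JIAN-MING Wu
Oral defense committee: 張瑞宜, 吳建銘, 劉長遠
Oral defense committee (English): RUEI-YI Jhang, JIAN-MING Wu, CHANG-YUAN Liou
Degree: Master's
Institution: National Dong Hwa University (國立東華大學)
Department: Department of Life Science
Student ID: 610813001
Year of publication (ROC calendar): 109
Academic year of graduation: 108
Language: Chinese
Number of pages: 25
Keywords: Viral host prediction; Deep learning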
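Abstract (translated from the Chinese): In recent years, wave after wave of deadly emerging viral diseases has appeared. Many are caused by new viruses that can cross species barriers after mutation or genetic recombination in RNA virus sequences. Fast and convenient transportation raises the rate of viral spread and leads to global pandemics. Advances in big data and information software technology make it possible, through statistical analysis, to observe and track associations between viral sequences and their hosts, and to use artificial intelligence to predict which hosts a future emerging virus may infect, so that outbreaks can be prevented early. Traditional virology cannot do this, because an ordinary virology laboratory cannot run real infection experiments on every kind of organism to prove a virus-host association. However, no well-defined method yet exists for analyzing sequence features to predict virus-host associations. This study uses deep learning to look for such a method. First, 307 viral genome sequences were selected from the National Center for Biotechnology Information (NCBI) database. They belong to eleven RNA virus genera, and their hosts include Artiodactyla, Galloanserae, Rodentia, Neoaves, Carnivora, and Primates, animals that humans are likely to come into contact with and that host many viruses. The study summarizes viral sequence features in three ways: (1) machine learning with different nucleotide frequencies as features; (2) training deep convolutional neural networks on probabilistic transition matrices and count-based sparse matrices as features to predict viral hosts; and (3) using Generative Adversarial Nets and sequence splitting to mitigate the small-data problem. These methods were used to find features associated with the virus-host relationship, and viruses with known hosts were used to test prediction accuracy. The machine-learning KNN model reached a highest accuracy of 67%, and the deep convolutional neural network reached a highest accuracy of 64%; the tests indicate that the deep convolutional neural network is more stable and faster than the machine-learning models. With the models built in this thesis, low-cost sequence analysis can predict the likely hosts of a virus in a short time, and the methods can serve as an important reference for the prevention of future emerging infectious diseases.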
Abstract (English): In recent years, deadly emerging and re-emerging viral infections have occurred repeatedly. Many of these diseases result from mutations in RNA sequences or from genetic recombination, which produce new viruses capable of interspecies transmission. Fast and convenient transportation facilitates the spread of these viruses and leads to pandemics. Large amounts of data (also known as big data) and advances in information software technology provide a new tool for tracking the relation between a virus and its host from abundant sequence data. Machine learning can then be used to predict potential hosts, so that precautions can be taken against future outbreaks. However, finding sequence characteristics that determine the relation of a virus to its host is challenging. In this study, genome sequences of 307 RNA viruses were collected from the National Center for Biotechnology Information (NCBI) database and used for training. These viruses belong to 11 RNA virus genera that infect Artiodactyla, Carnivora, Galloanserae, Neoaves, Primates, and Rodentia. Several analytical approaches were used: (i) machine learning with nucleotide frequencies as features, (ii) a convolutional neural network (CNN) with probabilistic transition matrices and count-based sparse matrices as features, and (iii) Generative Adversarial Nets (GAN) and sequence splitting to address the small-data problem. The established methods were then tested for prediction accuracy. The K-nearest neighbors (KNN) model has the highest accuracy, 67%, and the GAN model reaches 64% accuracy. Furthermore, deep CNNs are more stable and faster than the machine-learning models. Taken together, the analytical approaches established in this study provide a low-cost method to predict potential viral hosts with high accuracy. This method can serve as an important strategy for preventing future emerging infectious diseases.
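The feature types named in the abstract can be illustrated with a minimal sketch. The record does not give the thesis's exact feature definitions, so everything here is an assumption made for illustration: the function names `kmer_frequencies`, `transition_matrix`, and `split_sequence` are hypothetical, the "probabilistic transition matrix" is read as a first-order Markov transition matrix over the four nucleotides, and sequence splitting is read as overlapping fixed-length windows.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T in DNA-style records is mapped to U


def kmer_frequencies(seq, k=1):
    """Relative frequency of every k-mer over ACGU, in lexicographic order.

    k-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def transition_matrix(seq):
    """4x4 matrix P where P[i][j] estimates P(next = j | current = i).

    Rows with no observed transitions are left as zeros (an assumption).
    """
    seq = seq.upper().replace("T", "U")
    index = {n: i for i, n in enumerate(ALPHABET)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        row_sum = sum(row)
        matrix.append([c / row_sum if row_sum else 0.0 for c in row])
    return matrix


def split_sequence(seq, window, step):
    """Cut a sequence into overlapping fixed-length segments.

    Each segment would inherit the host label of its parent genome,
    which enlarges a small training set.
    """
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    genome = "AACGUUACGGAU"  # toy sequence, not real viral data
    print(kmer_frequencies(genome, k=1))
    print(transition_matrix(genome))
    print(split_sequence(genome, window=6, step=3))
```

Frequency vectors of this kind could feed a nearest-neighbor classifier, while the 4x4 matrices (or larger ones for longer contexts) are the sort of image-like input a convolutional network accepts. Whether the thesis used these exact encodings is not stated in the record.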
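Chapter 1 Introduction
1. Introduction to artificial intelligence
2. Machine learning
3. Deep learning
4. Generative Adversarial Nets (GAN)
5. Related work on predicting viral hosts
6. Research objectives
Chapter 2 Methods
1. Collecting RNA virus data
2. Examining nucleotide frequencies to predict viral hosts with machine learning
3. Predicting viral hosts with deep learning
Chapter 3 Results
1. In machine learning, the KNN model performs best when statistical formulas are used as features together with WGAN
2. Splitting sequences to augment the training data improves model stability
3. In deep learning, the transition matrix of sequence segments performs best for predicting viral hosts
4. Machine learning excels at binary classification of viral hosts
Chapter 4 Discussion
Chapter 5 References
Chapter 6 Figures and tables
Appendix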
1.Han, S.H., et al., Artificial Neural Network: Understanding the Basic Concepts without Mathematics. Dement Neurocogn Disord, 2018. 17(3): p. 83-89.
2.Angermueller, C., et al., Deep learning for computational biology. Mol Syst Biol, 2016. 12(7): p. 878.
3.Ioffe, S. and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv e-prints, 2015. arXiv:1502.03167.
4.Srivastava, N., et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 2014.
5.Goodfellow, I.J., et al., Generative Adversarial Networks. arXiv e-prints, 2014. arXiv:1406.2661.
6.Dolan, P.T., Z.J. Whitfield, and R. Andino, Mechanisms and Concepts in RNA Virus Population Dynamics and Evolution. Annu Rev Virol, 2018. 5(1): p. 69-92.
7.Kao, R.R., et al., Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends Microbiol, 2014. 22(5): p. 282-91.
8.Viana, M., et al., Assembling evidence for identifying reservoirs of infection. Trends Ecol Evol, 2014. 29(5): p. 270-9.
9.Edwards, R.A., et al., Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol Rev, 2016. 40(2): p. 258-72.
10.Pride, D.T., et al., Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics, 2006. 7: p. 8.
11.Liu, D., et al., Predicting virus-host association by Kernelized logistic matrix factorization and similarity network fusion. BMC Bioinformatics, 2019. 20(Suppl 16): p. 594.
12.Zhang, M., et al., Prediction of virus-host infectious association by supervised learning methods. BMC Bioinformatics, 2017. 18(Suppl 3): p. 60.
13.Babayan, S.A., R.J. Orton, and D.G. Streicker, Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes. Science, 2018. 362(6414): p. 577-580.
14.Burge, C., A.M. Campbell, and S. Karlin, Over- and under-representation of short oligonucleotides in DNA sequences. Proc Natl Acad Sci U S A, 1992. 89(4): p. 1358-62.
15.Arjovsky, M., S. Chintala, and L. Bottou, Wasserstein GAN. arXiv e-prints, 2017. arXiv:1701.07875.