Author: Desmon Kristanto Siahaan
Thesis Title: Plagiarism Detection of Indonesian Documents by using Cosine Similarity
Advisor: Guan-Ling Lee
Oral Defense Committee: Yao-Chung Chang; Shou-Chih Lo
Keywords: Porter Tala Algorithm; Plagiarism; Similarity Detection; Cosine Similarity
In an academic environment, the authenticity of research papers is very important. Researchers who publish a paper must confirm that it does not duplicate previously published work, and reviewers must likewise check whether a submitted paper may have been plagiarized. Document similarity comparison is therefore an important problem. Similarity comparison for English documents has been studied extensively, but few papers address the comparison of Indonesian documents. In this thesis, we study similarity comparison for Indonesian documents and propose an effective algorithm. The proposed method first uses the Porter Tala algorithm to reduce each Indonesian word to its root. Porter Tala is a well-known method, proposed by Fadillah Z Tala, for finding the roots of Indonesian words. Once the roots are found, we use cosine similarity to compute the similarity between documents. The experimental results show that the proposed method can effectively detect similar documents.
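The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`term_frequencies`, `cosine_similarity`, `document_similarity`) are hypothetical, raw term counts are assumed as weights, and the `stem` parameter is only a placeholder for a real Porter Tala stemmer, which is not reproduced here (the default leaves words unchanged).

```python
import math
import re
from collections import Counter
from typing import Callable


def term_frequencies(text: str, stem: Callable[[str], str] = lambda w: w) -> Counter:
    """Lowercase the text, split it into letter runs, stem each token, count terms.

    `stem` is a placeholder (assumption): plug in a Porter Tala stemmer here.
    The default is the identity function, i.e. no stemming at all.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(token) for token in tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def document_similarity(doc_a: str, doc_b: str,
                        stem: Callable[[str], str] = lambda w: w) -> float:
    """Stem both documents, build term-count vectors, and compare them."""
    return cosine_similarity(term_frequencies(doc_a, stem),
                             term_frequencies(doc_b, stem))
```

For example, `document_similarity("saya makan nasi", "saya makan roti")` shares two of three terms in each document and evaluates to 2/3. Stemming matters because inflected forms of the same root would otherwise be counted as different terms and lower the score.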
Acknowledgment
Abstract in Chinese
Abstract in English
Table of Contents
List of Figures
List of Tables
List of Definition
Chapter 1. Introduction
Chapter 2. Related Work
2.1 Plagiarism
2.1.1 Level of Plagiarism
2.1.2 Techniques of Plagiarism
2.2 Text Preprocessing
2.2.1 Tokenization
2.2.2 Stopword
2.2.3 Stemming
2.2.4 Term Weighting
2.3 Measuring Similarity
Chapter 3. Proposed Algorithm
3.1 Morphological Structure
3.2 Porter Tala
Chapter 4. Experimental Result
Chapter 5. Conclusion and Future Work
References