Implementation of K-Means Clustering Method for Trend Analysis of Thesis Topics (Case Study: Faculty of Computer Science, University of Jember)

  • Maulana Rafael Irianto, Information Systems Study Program, Faculty of Computer Science, University of Jember, Jln. Kalimantan 37, Jember 68121, Indonesia
  • Achmad Maududie, Information Systems Study Program, Faculty of Computer Science, University of Jember, Jln. Kalimantan 37, Jember 68121, Indonesia
  • Fajrin Nurman Arifin, Information Systems Study Program, Faculty of Computer Science, University of Jember, Jln. Kalimantan 37, Jember 68121, Indonesia

Abstract

The growth of information technology has produced a large number of digital documents, including thesis documents, which makes it more likely that students choose the same topics with little variation. Thesis documents can be grouped by topic based on their abstracts, and the resulting groups can be visualized so that the trend of each topic can be analyzed. Data were collected from the University of Jember repository by web scraping, yielding 490 thesis documents written by students of the Faculty of Computer Science, University of Jember. Preprocessing uses text mining methods that include cleaning, filtering, stemming, and tokenizing. The weight of each word is then calculated with the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, after which the data are normalized with Z-Score and reduced in dimension with the Principal Component Analysis (PCA) algorithm. Outliers are removed before the documents are clustered. The documents are then grouped using the K-Means Clustering method with Cosine Similarity as the distance measure, and the clustering is evaluated with the Silhouette Coefficient. Tests were run with various values of k, and the optimal value was k = 2, with a Silhouette value of 0.80. Topic detection then applies the Latent Dirichlet Allocation (LDA) algorithm to each cluster that has been formed. Each cluster is visualized with a line chart and the Linear Trend algorithm and analyzed to determine its trend. The analysis shows that the topic of Decision Support System Development is trending down, while the topic of IT Performance Measurement and Forecasting is trending up. It can therefore be concluded that the topic of Decision Support System Development needs to be reduced so that other topics can emerge.
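
The core of the pipeline described above (TF-IDF weighting, K-Means grouping with cosine similarity, and Silhouette evaluation) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the four toy documents, the function names `tfidf`, `cosine`, `kmeans_cosine`, and `silhouette`, and the choice of the first k vectors as initial centroids are all assumptions made for illustration. The Z-Score, PCA, outlier removal, and LDA steps are omitted.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return L2-normalized TF-IDF vectors (sparse dicts) for non-empty tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_cosine(vectors, k, iters=20):
    """K-Means that assigns each vector to the centroid with the highest cosine similarity."""
    # Assumption: the first k vectors seed the centroids, which keeps the sketch deterministic.
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    for t, w in v.items():
                        total[t] += w
                centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels

def silhouette(vectors, labels):
    """Mean Silhouette Coefficient using cosine distance (1 - cosine similarity)."""
    scores = []
    for i, v in enumerate(vectors):
        dists = {}
        for j, u in enumerate(vectors):
            if i != j:
                dists.setdefault(labels[j], []).append(1.0 - cosine(v, u))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # convention: singleton cluster or single cluster scores 0
            continue
        a = sum(own) / len(own)                              # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical, already preprocessed abstracts (not taken from the study's data).
docs = [
    "decision support system ahp".split(),
    "forecasting sales time series".split(),
    "decision support system topsis".split(),
    "forecasting demand time series".split(),
]
vectors = tfidf(docs)
labels = kmeans_cosine(vectors, k=2)
score = silhouette(vectors, labels)
print(labels, round(score, 2))
```

In practice, the Silhouette value would be computed for several candidate values of k and the k with the highest value kept, which is how the abstract describes arriving at k = 2.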

Published
2022-12-10
How to Cite
IRIANTO, Maulana Rafael; MAUDUDIE, Achmad; ARIFIN, Fajrin Nurman. Implementation of K-Means Clustering Method for Trend Analysis of Thesis Topics (Case Study: Faculty of Computer Science, University of Jember). BERKALA SAINSTEK, [S.l.], v. 10, n. 4, p. 210-226, dec. 2022. ISSN 2339-0069. Available at: <https://jurnal.unej.ac.id/index.php/BST/article/view/29524>. Date accessed: 28 mar. 2024. doi: https://doi.org/10.19184/bst.v10i4.29524.
Section
General