Classification of Underdeveloped Areas in Indonesia Using the SVM and k-NN Algorithms

Harun Al Azies; Gangga Anuraga

doi:10.19184/jid.v22i1.16928

Harun Al Azies, Department of Statistics, Faculty of Mathematics and Natural Sciences, PGRI Adi Buana University
Gangga Anuraga, Department of Statistics, Faculty of Mathematics and Natural Sciences, PGRI Adi Buana University

Abstract

Determining, or classifying, underdeveloped areas essentially means classifying a set of observations on the basis of existing indicators. The classification methods used here are k-Nearest Neighbor (k-NN) and Support Vector Machines (SVM). This study compares the classification accuracy of the SVM and k-NN algorithms in classifying underdeveloped areas in Indonesia. The data are secondary data obtained from the Central Bureau of Statistics (BPS) and cover 514 districts and municipalities in Indonesia. The analysis finds that 122 of these 514 districts and municipalities are underdeveloped. The most underdeveloped areas are on the island of Papua, followed by the islands of Bali and Nusa Tenggara, and Sulawesi. For SVM, the RBF kernel gives the best results with the parameters C = 1 and γ = 0.05, while k-NN obtains its best results with k = 15. Both methods classify underdeveloped areas very well: they reach the same precision value of 92.2%, and either can be used to determine the classification of underdeveloped areas.
Keywords: classification, machine learning, supervised learning, underdeveloped areas.
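
The two classifiers named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the label names, and the function names `rbf_kernel`, `knn_predict`, and `accuracy` are hypothetical, and only the reported settings (γ = 0.05 for the RBF kernel, k = 15 for k-NN) come from the abstract. The sketch shows the RBF kernel value that an SVM would use and a plain majority-vote k-NN with Euclidean distance, which is an assumption since the abstract does not name the distance measure; training an SVM is not shown.

```python
import math
from collections import Counter


def rbf_kernel(x, z, gamma=0.05):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    gamma = 0.05 is the value reported in the abstract.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def knn_predict(train_X, train_y, query, k=15):
    """Label a query point by majority vote among its k nearest neighbours.

    Euclidean distance is an assumption; k = 15 is the value reported
    in the abstract.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated groups of 20 points each.
# These are NOT the BPS indicators used in the study.
train_X = [(float(i), float(i % 3)) for i in range(20)]
train_X += [(float(i) + 100.0, float(i % 3)) for i in range(20)]
train_y = ["underdeveloped"] * 20 + ["developed"] * 20

test_X = [(5.0, 1.0), (110.0, 2.0)]
test_y = ["underdeveloped", "developed"]

pred = [knn_predict(train_X, train_y, q, k=15) for q in test_X]
print("k-NN predictions:", pred)
print("accuracy:", accuracy(test_y, pred))
print("RBF kernel value:", rbf_kernel((0.0, 0.0), (2.0, 0.0)))
```

In practice the parameters C, γ, and k would be tuned on the real indicator data, for example by cross-validation; the values above are simply those the abstract reports as best.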
