CLUSTERING DATA NUMERIK MENGGUNAKAN ALGORITME X-MEANS
(Clustering Numeric Data Using X-Means Algorithm)
Abstract
Data mining is the extraction of new and useful information from large data sets to support decision making. Clustering is a technique that groups data with similar characteristics into the same cluster, and it is generally applied to numeric or categorical data. K-Means is one of the algorithms that can be used for numeric data. It partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean (centroid). Its weakness is that the number of clusters k must be specified in advance by the user. To overcome this weakness, Dan Pelleg and Andrew Moore developed the X-Means algorithm. In X-Means, the user supplies a range for the number of clusters, and the algorithm estimates k within that range from the dataset itself, so no single value of k has to be fixed beforehand. The purpose of this study is to examine the X-Means algorithm. The results show that X-Means uses the Bayesian Information Criterion (BIC) to decide how clusters are split, and that supplying a range for the number of clusters can make the clustering process more efficient.
Keywords: Clustering, K-Means, numeric data, X-Means.
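The idea summarized in the abstract, choosing k from a user-supplied range by comparing BIC values, can be illustrated with a minimal sketch. It is a simplified stand-in rather than the full X-Means procedure: it runs plain K-Means once for every k in the range and keeps the k with the highest BIC, whereas X-Means proper splits individual centroids and tests each split locally. The BIC used here is one common form for a spherical Gaussian model and is an assumption, since the abstract does not state the formula. The function names (`k_means`, `bic_score`, `choose_k`), the deterministic farthest-first seeding, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Plain Lloyd iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def bic_score(points, centroids, labels):
    """BIC under a spherical Gaussian model with one shared variance
    (assumed form; higher is better). Requires more points than clusters."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    variance = max(sse / (d * (n - k)), 1e-12)  # guard against log(0)
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(s * math.log(s / n) for s in sizes if s > 0)
    log_lik -= 0.5 * n * d * math.log(2 * math.pi * variance)
    log_lik -= 0.5 * d * (n - k)
    n_params = (k - 1) + k * d + 1  # mixing weights, centroids, variance
    return log_lik - 0.5 * n_params * math.log(n)


def choose_k(points, k_min, k_max):
    """Score every k in [k_min, k_max] and return the best k and all scores."""
    scores = {}
    for k in range(k_min, k_max + 1):
        centroids, labels = k_means(points, k)
        scores[k] = bic_score(points, centroids, labels)
    return max(scores, key=scores.get), scores


# Toy data (made up): three well-separated 3x3 grids of 2-D points.
offsets = (-0.5, 0.0, 0.5)
toy_points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for dx in offsets
    for dy in offsets
]
best_k, scores = choose_k(toy_points, 1, 5)
print(best_k, scores)
```

On this toy data the BIC penalty for extra parameters outweighs the small reduction in within-cluster error once each group has its own centroid, so the search settles on three clusters without the user naming that value directly.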