PARTITIONING AROUND MEDOID (PAM) ALGORITHM WITH CALINSKI-HARABASZ INDEX FOR CLUSTERING OUTLIER DATA
Data mining is the process of extracting information from patterns in large datasets to support decision-making. Cluster analysis is a multivariate statistical technique that groups observations measured on several variables according to their degree of similarity. Clustering is a data mining technique that divides data into several clusters, so that objects with high similarity fall in the same cluster. Outliers are observations that differ markedly from the rest of the data. In statistics, outliers can bias an analysis so that it no longer reflects the actual phenomenon. Partitioning Around Medoid (PAM), also known as K-Medoids, is a non-hierarchical clustering algorithm. PAM partitions the n data objects into k clusters, each represented by a medoid, the most centrally located object of its cluster. The Calinski-Harabasz Index is one method for determining the optimal number of clusters: it compares between-cluster dispersion with within-cluster dispersion, and larger values indicate better-separated clusters. This study examines the PAM algorithm on data containing outliers, using the Calinski-Harabasz Index to select the best number of clusters. The results show that PAM combined with the Calinski-Harabasz Index is robust to outlier data.
Keywords: Calinski-Harabasz Index, Clustering, Outlier, PAM
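The abstract combines two components, PAM for partitioning and the Calinski-Harabasz (CH) index for choosing k, without showing how they fit together. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: it assumes Euclidean distance, the classic BUILD/SWAP form of PAM with a best-improvement swap rule, and the usual CH definition CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and within-cluster sums of squares computed from cluster means (centroids), even though PAM itself represents clusters by medoids. The function names `pam`, `calinski_harabasz`, and `select_k` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def total_cost(points: List[Point], medoids: List[int]) -> float:
    """Sum of Euclidean distances from each point to its nearest medoid (assumed metric)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points: List[Point], k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical PAM sketch: greedy BUILD phase, then best-improvement SWAP phase."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # BUILD: add, one at a time, the object that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(points, medoids + [i])))
    # SWAP: apply the (medoid, non-medoid) exchange that lowers the cost the most;
    # stop when no exchange improves it.
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_medoids = cost, None
        for slot in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:slot] + [o] + medoids[slot + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_medoids = trial_cost, trial
        if best_medoids is None:
            break
        medoids, cost = best_medoids, best_cost
    # Assign every object to its nearest medoid (label = medoid slot index).
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]])) for p in points]
    return medoids, labels


def _mean(pts: List[Point]) -> List[float]:
    return [sum(c) / len(pts) for c in zip(*pts)]


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)], computed with cluster means (centroids)."""
    n = len(points)
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 2 <= k <= n - 1:
        raise ValueError("the CH index needs 2 <= k <= n - 1 clusters")
    grand = _mean(points)
    between = within = 0.0
    for members in clusters.values():
        c = _mean(members)
        between += len(members) * math.dist(c, grand) ** 2
        within += sum(math.dist(p, c) ** 2 for p in members)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


def select_k(points: List[Point], k_max: int) -> Tuple[int, Dict[int, float]]:
    """Run PAM for k = 2..k_max and keep the k with the largest CH index."""
    scores: Dict[int, float] = {}
    for k in range(2, min(k_max, len(points) - 1) + 1):
        _, labels = pam(points, k)
        if len(set(labels)) >= 2:
            scores[k] = calinski_harabasz(points, labels)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy data: three unit squares far apart.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
    best, scores = select_k(data, k_max=5)
    print(best, scores)  # best is 3 for this data
```

Picking the k with the largest CH value mirrors the selection rule the abstract describes. To see how the choice reacts to an outlier, one could append a distant point such as (50, 50) to `data` and rerun `select_k`. Re-evaluating the full cost for every candidate swap keeps the sketch short but is only practical for small n.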