OLS, LASSO and PLS on Data with Multicollinearity
Abstract
Correlation between predictor variables (multicollinearity) is a problem in regression analysis. Several methods exist to address it, and each has its own complexity. This research explores the performance of OLS, LASSO and PLS on data with correlated predictor variables. OLS builds the model by minimizing the residual sum of squares. LASSO minimizes the residual sum of squares subject to the sum of the absolute coefficients being less than a constant, and PLS combines ideas from principal component analysis with multiple linear regression. Analysis of simulated and real data in R shows that, for data with serious multicollinearity (high correlations between predictor variables), LASSO tends to have a lower average bias than PLS when predicting the response variable. OLS has the largest variance of the Mean Squared Error of Prediction (MSEP), meaning it is the least consistent method in estimating MSEP. The MSEP obtained with PLS is lower than that obtained with LASSO.
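The comparison described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the authors' R code. The simulation design (equicorrelated predictors with correlation `rho`, all true coefficients equal to one, unit-variance noise), the LASSO penalty `lam` and the number of PLS components are assumptions chosen only for illustration. LASSO is fitted in its Lagrangian (penalized) form by coordinate descent, which is equivalent to the constrained form in the abstract for a matching constant. PLS is fitted as PLS1 via NIPALS. Each method is scored by the MSEP on an independent test set, repeated over several replicates so that both the mean and the variance of MSEP can be compared.

```python
import numpy as np

def simulate(n, p, rho, rng):
    # Assumed design: equicorrelated predictors (pairwise correlation rho)
    cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.ones(p)  # assumed true coefficients
    y = X @ beta + rng.normal(scale=1.0, size=n)
    return X, y

def fit_ols(X, y):
    # Minimise the residual sum of squares; intercept handled by centring
    xm, ym = X.mean(0), y.mean()
    b, *_ = np.linalg.lstsq(X - xm, y - ym, rcond=None)
    return lambda Xn: ym + (Xn - xm) @ b

def fit_lasso(X, y, lam, n_iter=200):
    # Penalised form: (1/2n)||y - Xb||^2 + lam * sum|b_j|, coordinate descent
    xm, ym = X.mean(0), y.mean()
    Xc, yc = X - xm, y - ym
    n, p = Xc.shape
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(0) / n
    r = yc.copy()  # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * b[j]  # partial residual without predictor j
            z = Xc[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= Xc[:, j] * b[j]
    return lambda Xn: ym + (Xn - xm) @ b

def fit_pls(X, y, n_comp):
    # PLS1 via NIPALS: components chosen to maximise covariance with y
    xm, ym = X.mean(0), y.mean()
    E, f = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # score vector
        tt = t @ t
        p_ = E.T @ t / tt           # X loading
        q_ = f @ t / tt             # y loading
        E = E - np.outer(t, p_)     # deflate X
        f = f - q_ * t              # deflate y
        W.append(w); P.append(p_); q.append(q_)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # coefficients in original X space
    return lambda Xn: ym + (Xn - xm) @ b

def msep(predict, X, y):
    # Mean squared error of prediction on held-out data
    return float(np.mean((y - predict(X)) ** 2))

def compare(reps=50, n=50, p=5, rho=0.95, lam=0.1, n_comp=2, seed=0):
    # Returns (mean MSEP, variance of MSEP) per method over the replicates
    rng = np.random.default_rng(seed)
    scores = {"OLS": [], "LASSO": [], "PLS": []}
    for _ in range(reps):
        Xtr, ytr = simulate(n, p, rho, rng)
        Xte, yte = simulate(n, p, rho, rng)
        scores["OLS"].append(msep(fit_ols(Xtr, ytr), Xte, yte))
        scores["LASSO"].append(msep(fit_lasso(Xtr, ytr, lam), Xte, yte))
        scores["PLS"].append(msep(fit_pls(Xtr, ytr, n_comp), Xte, yte))
    return {k: (float(np.mean(v)), float(np.var(v))) for k, v in scores.items()}

if __name__ == "__main__":
    for name, (m, v) in compare().items():
        print(f"{name}: mean MSEP = {m:.3f}, variance of MSEP = {v:.4f}")
```

Two properties are useful sanity checks on such a sketch: PLS with as many components as predictors reproduces the OLS fit, and a very large LASSO penalty shrinks every coefficient to zero so that the prediction is just the mean of the response. The numbers this sketch prints depend on the assumed design and are not the results reported in the study.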