Geospatial Approach for the Analysis of Forest Cover Change Detection using Machine Learning

Spatial data classification has become popular in recent years as a way to extract knowledge and insights from data. This popularity follows extensive experimentation with various classifiers, which has yielded significant improvements in accuracy and performance. This study aimed to analyze forest cover change detection using machine learning. Supervised and unsupervised learning methods were used to analyze spatial data: a Support Vector Machine (SVM) was used for supervised learning, and a Neural Network (NN) method was used for unsupervised learning. The Normalized Difference Vegetation Index (NDVI) was used to identify the bands and extract pixel information relevant to vegetation. The supervised method showed better results because of its robust performance and better analysis of spatial data classification using the vegetation index. The proposed system was evaluated by analyzing the results obtained from the SVM and NN methods. The results demonstrate that the use of NDVI enhances performance and considerably increases the classifiers' accuracy.


Introduction
Knowledge of social, economic, and cultural aspects is considered essential in land management and planning. In many cases, landscape changes do not become fully apparent. Therefore, geographic tools such as Geographic Information Systems (GIS) and photogrammetry are used to support land improvement. From the standpoint of forest management criteria, the population is decreasing in some regions; as a result, most of the land there becomes covered with trees and bushes, which seriously affects the landscape. Geographic information is considered a fruitful source for describing data and gathering useful information about the earth's surface. Its applications include digital image analysis, the analysis and detection of changes in environmental conditions, science, and education.
These application areas also provide a suitable domain for research.
Spatial data are gathered from satellites; they include images and information about the images' pixels. The collected data are unstructured and complex, and they are evaluated to recover the information hidden in them. This process is called spatial data analysis, and locating landscape features within spatial data is known as geospatial data analysis. Landscape types are identified and classified using deep learning and machine learning (ML) techniques. SVM is one suitable ML technique; it applies a kernel function to analyze the image dataset. Features are extracted, and the machine classifies the landscape types based on those features. The main landscape types involved are bare soil, urban land, waterbody, natural vegetation, and forest area. The spatial data gathered are difficult to handle and involve various critical issues regarding orientation, structure, and atmospheric conditions (Aubrecht et al., 2009; Addink et al., 2007).
The current literature on remote sensing data classification using machine learning techniques covers a rich range of spatial data applications, including environmental ecology, precision agriculture, science and engineering, and military use. Recently, remote sensing data classification has been carried out using improved machine learning and deep learning approaches (Pozdnoukhov & Kanevski, 2006). Diverse classification strategies have been applied to remote sensing data. In high-dimensional space, the limitation of dimensionality may nevertheless yield excellent results. Handling high-dimensional data is a fundamental task in optimization problems (Gangappa et al., 2016). Consequently, an optimization technique such as SVM may often be insensitive to high-dimensional space (Acharya & Yang, 2015). Numerous classification algorithms are regularly used to predict the class of objects in spatial data.
The supervised learning techniques for spatial data include neural networks, the decision tree method, random forest classification methods, K-means clustering and classification, rough set based data reduction and classification, and the fuzzy rough set based classification method (Singh et al., 2016; Chi et al., 2008; Pal, 2005). Fuzzy logic and neural networks are used in spatial data classification. A considerable amount of research has been devoted to fuzzy and rough set based methods (Shanthini et al., 2017; Foddy & Mathur, 2004; Al-Obeidat et al., 2015; Ham et al., 2005). Many other algorithms have been used in optimization problems. In descriptive models, training data with class labels are provided when training an algorithm. In order to be trained, these techniques (Rawat & Kumar, 2015) use spatial data features such as spatial resolution, entropy, mean elevation and mean slope, and other relevant features in the input data. We firmly contend that prediction accuracy depends on the significant features used in the model.
Various machine learning techniques were used to identify different landscape images, and they are considered supervised learning. Supervised learning involves training data: the classifier learns from the training data, and its subsequent decisions are based on what it has learned. The classifier is trained on the extracted features.
In some cases, more features improve the classifier's efficiency; in others, however, the number of features degrades its performance. The classifier takes more time to evaluate the features of the dataset, and the performance of the system falls. This may also badly affect the classifier's accuracy. For this reason, feature reduction techniques are included in the classification process alongside feature extraction. Dimensionality reduction is used to make the feature space more manageable for the spatial data. Ultimately, we emphasize the evaluation of machine learning methods suitable for spatial data pixels. This study aimed to analyze forest cover change detection using machine learning. It also aimed to identify suitable machine learning techniques that efficiently extract features, reduce them as required, and distinguish them based on the dataset's pixel information.
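As a minimal sketch of feature reduction, the snippet below drops features whose variance across pixels does not exceed a threshold. Variance thresholding is chosen here only as a simple illustration; the source does not state which reduction technique was used, and the function names, threshold, and pixel values are hypothetical.

```python
from statistics import pvariance


def select_features_by_variance(samples, threshold):
    """Return indices of feature columns whose variance exceeds `threshold`.

    `samples` is a list of equal-length feature vectors, one per pixel.
    """
    columns = list(zip(*samples))  # transpose: one tuple per feature
    return [i for i, column in enumerate(columns) if pvariance(column) > threshold]


def reduce_features(samples, keep):
    """Keep only the feature columns listed in `keep`."""
    return [[row[i] for i in keep] for row in samples]


# Hypothetical pixels: (red reflectance, NIR reflectance, a constant band).
pixels = [
    [0.10, 0.50, 1.0],
    [0.12, 0.45, 1.0],
    [0.30, 0.32, 1.0],
    [0.28, 0.30, 1.0],
]
keep = select_features_by_variance(pixels, threshold=1e-6)
reduced = reduce_features(pixels, keep)
```

The constant third band carries no information for separating classes, so it is removed while the two varying bands are kept.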

Supervised and Unsupervised Learning
In supervised learning, the data are labeled, so each entity has a specific known outcome, and the accuracy of the algorithm is computed against those outcomes. In an unsupervised approach, by contrast, unlabeled data are provided, and patterns are formed based on the extracted features.
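The accuracy computation mentioned above can be sketched as follows: predicted labels are compared with reference labels, the pairs are counted in a confusion matrix, and overall accuracy is the fraction of matching pairs. The class names and label lists are hypothetical examples, not results from the study.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (reference, predicted) label pairs."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    return Counter(zip(true_labels, predicted_labels))


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the reference label."""
    matrix = confusion_matrix(true_labels, predicted_labels)
    correct = sum(count for (t, p), count in matrix.items() if t == p)
    return correct / len(true_labels)


# Hypothetical reference and predicted labels for six pixels.
reference = ["forest", "forest", "water", "bare", "bare", "forest"]
predicted = ["forest", "bare", "water", "bare", "bare", "forest"]
matrix = confusion_matrix(reference, predicted)
accuracy = overall_accuracy(reference, predicted)
```

The off-diagonal entries of the matrix show which classes are confused with each other, for example forest pixels predicted as bare land.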

Supervised Learning
Supervised learning uses a fully labeled dataset and trains the algorithm on this labeled data (Mahmon & Ya'acob, 2014). The term fully labeled means that the training dataset contains the answer to each question or query. A complete illustration of labeled data and its use in supervised learning is shown in Figure 1. In this study, for example, the forest images are labeled according to the year and their specific characteristics, and classification then identifies which class each image belongs to. Once the model is fully trained, it is tested on a new set of images and must predict a value for each image fed into it. The Support Vector Machine (SVM) is a supervised learning method that automatically analyzes data, forms classes, and assigns each object to a class using a set of rules (Gangappa et al., 2017). In SVM, labeled data for all the classes are used to classify the data. The SVM is described by Equation 1 and Equation 2.
Supervised learning is mainly used in two scenarios: classification problems and regression problems. In classification problems, the classifier predicts the class to which each data item belongs. Regression problems involve continuous data, and the effect of one variable on another is identified, such as the expected value of the variable Y for a particular value of X.
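To make the supervised setting concrete, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss, $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_i \max(0,\, 1 - y_i(w \cdot x_i + b))$. This is the standard soft-margin formulation and may differ from the paper's Equations 1 and 2, which are not reproduced here. The two standardized features, the labels (+1 forest, -1 non-forest), and the hyperparameters are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score of sample x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge subgradient.
            if yi * decision_value(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (forest) or -1 (non-forest)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical standardized features per pixel, e.g. scaled NDVI and NIR.
X = [
    [1.0, 0.8], [1.2, 0.5], [0.9, 1.1], [1.5, 0.7],          # forest
    [-1.0, -0.6], [-1.3, -0.9], [-0.8, -1.2], [-1.1, -0.4],  # non-forest
]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

A kernel SVM, as mentioned in the Introduction, would replace the dot product with a kernel function; the linear case is shown only because it fits in a few lines.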

Unsupervised Learning
In contrast to supervised learning, unsupervised learning gives a deep learning model a dataset without explicit instructions on what to do with it. The training dataset involves no labeling and has no desired outcome. The network automatically finds useful features and then analyzes the structure of the data based on the features extracted. An illustration of unsupervised learning is given in Figure 2.
(2) Association: In this case, the algorithm tries to learn without the data being labeled. The algorithm makes its own decisions; for example, the forest images are fed into the classifier without being labeled according to their specific characteristics. This is a case of association, in which features of a data sample correlate with other features.
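As a small illustration of learning structure from unlabeled pixels, the sketch below groups pixels with k-means clustering. Note that k-means is swapped in here as a simple stand-in; it is not the neural network method used in the study. The pixel values and the initial centroids, which are passed in explicitly to keep the example deterministic, are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate centroid updates and reassignment."""
    centroids = [list(c) for c in initial_centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(axis) / len(members) for axis in zip(*members)]
        # Assignment step: stop once no pixel changes cluster.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical (red, NIR) reflectance values for eight unlabeled pixels.
pixels = [
    [0.05, 0.50], [0.06, 0.55], [0.04, 0.45],  # vegetation-like
    [0.03, 0.02], [0.02, 0.01],                # water-like
    [0.30, 0.35], [0.32, 0.38], [0.28, 0.33],  # bare-land-like
]
labels, centroids = kmeans(pixels, [pixels[0], pixels[3], pixels[5]])
```

The cluster indices carry no class names; an analyst still has to interpret which cluster corresponds to vegetation, water, or bare land.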
By looking at a few key attributes of a data point, an unsupervised learning model can predict the other attributes with which they are commonly associated (Maity, 2016).
The dataset used in this study was from Landsat 8 and consists of Earth images acquired by two instruments, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) (Lu & Yang, 2009). Data were collected in the near-infrared and panchromatic bands. The dataset covers various landscape types such as vegetation area, water area, and bare land. More details of the dataset are given in Table 1.
This section presents a framework for spatial data. It covers handling, evaluating, and validating the data in order to compute useful results. The complete procedure is illustrated in Figure 3. First, pre-processing is performed to bring the unstructured data into a structured format, with the dataset as input. The images are first digitized to represent the intensity of each pixel in each spectral band. Before the main procedure starts, the raw data require additional processing to correct errors and reduce noise. The main techniques involved are radiometric correction, geometric correction, and noise removal.
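One common radiometric step for Landsat 8 OLI data is converting raw digital numbers (DN) to top-of-atmosphere reflectance, with a correction for the sun elevation angle. The sketch below illustrates that conversion; the source does not specify which corrections were applied. The rescaling coefficients and sun elevation shown are assumed example values and should be read from the metadata file of the actual scene.

```python
import math


def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert an OLI digital number to sun-angle-corrected TOA reflectance.

    `mult` and `add` are the band's reflectance rescaling coefficients and
    `sun_elevation_deg` is the scene's sun elevation, all from the metadata.
    """
    if not 0.0 < sun_elevation_deg <= 90.0:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    uncorrected = mult * dn + add
    return uncorrected / math.sin(math.radians(sun_elevation_deg))


# Assumed example coefficients; use the values from the scene's metadata.
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1

reflectance = toa_reflectance(20000, REFLECTANCE_MULT, REFLECTANCE_ADD,
                              sun_elevation_deg=30.0)
```

Working in reflectance rather than raw DN makes pixel values comparable between acquisition dates, which matters when images from different years are compared.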
After these techniques are applied, the data are aligned to real-world coordinates. In the next step, vegetation indices are computed for use in scenarios such as climate change studies and the detection, monitoring, and modeling of vegetation. This procedure helps in combining the information of different bands, and the NDVI is found using the following equation (Equation 4).

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (4)$$
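The NDVI, the normalized difference of the near-infrared (NIR) and red reflectance, can be computed per pixel as in the sketch below. For Landsat 8 OLI, the red band is band 4 and the NIR band is band 5. The small band arrays are hypothetical, and returning 0.0 when both bands are zero is an assumption made here to avoid division by zero.

```python
def ndvi(nir, red):
    """NDVI for one pixel; returns 0.0 when both bands are zero (assumption)."""
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


def ndvi_image(nir_band, red_band):
    """Apply NDVI pixel by pixel to two bands given as nested lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance bands.
nir_band = [[0.50, 0.30], [0.02, 0.00]]
red_band = [[0.10, 0.30], [0.06, 0.00]]
index = ndvi_image(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, values near 0 are typical of bare surfaces, and negative values are typical of water.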
In the final step, the region of interest of the image is obtained and a shapefile is created. The training and testing samples are also extracted for classification. The training model is then built using machine learning techniques.
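Extracting training and testing samples can be sketched as a stratified split, so that each class is represented in both sets. The split fraction, seed, and class names below are hypothetical; the source does not state how the samples were divided.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), sampled separately per class."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one test sample per class; a class with a single sample
        # would therefore contribute nothing to the training set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels for 30 sampled pixels, 10 per class.
labels = ["forest"] * 10 + ["water"] * 10 + ["bare"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)
```

The returned indices can be used to select the matching feature vectors and labels for training and for testing.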

Results and Discussion
After the land data were pre-processed, class labels such as water, bare land, and forest were generated geometrically. The region of interest was then extracted from the image.
Land-use change from forest to bare land or rocky land occurred mostly in the northern region (Figures 9-13). These changes are caused by human activities that require wood as building material. Urbanization also increases the human need for forest wood, especially for building new settlements. Furthermore, the accessibility of forest locations is a factor in the conversion of forest land to bare land or rocky land.
Moreover, the FCC is also able to identify bare land in green color. This result is in line with the results reported by Khalil & Haque (2018), Rajani & Varadarajan (2018), and Sharma et al. (2016), who found that the FCC is also able to identify bare land.
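The forest-to-bare-land conversion discussed above can be quantified by comparing two classified maps of the same area pixel by pixel and counting the class transitions. The sketch below uses two tiny hypothetical maps; the default pixel area of 900 m² corresponds to the 30 m resolution of the Landsat 8 OLI multispectral bands.

```python
from collections import Counter


def change_matrix(before, after):
    """Count (earlier class, later class) transitions between two maps."""
    if len(before) != len(after):
        raise ValueError("maps must have the same number of rows")
    transitions = Counter()
    for row_before, row_after in zip(before, after):
        if len(row_before) != len(row_after):
            raise ValueError("maps must have the same number of columns")
        transitions.update(zip(row_before, row_after))
    return transitions


def converted_area(transitions, from_class, to_class, pixel_area_m2=900.0):
    """Area in square metres that changed from one class to another."""
    return transitions[(from_class, to_class)] * pixel_area_m2


# Hypothetical classified maps for an earlier and a later date.
map_earlier = [["forest", "forest", "water"],
               ["forest", "bare", "water"]]
map_later = [["forest", "bare", "water"],
             ["bare", "bare", "water"]]
transitions = change_matrix(map_earlier, map_later)
forest_loss_m2 = converted_area(transitions, "forest", "bare")
```

Entries where the two classes match are unchanged pixels, while the other entries describe the direction and amount of each change.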

Conclusion
The accuracy tends to increase when a new band is formed in the stack of the spatial image.
The SVM model performed better than the NN method, so we can say that supervised learning increases the system's accuracy and performance compared to unsupervised learning. The classification is performed with the spatial vegetation index related to NDVI. It is also observed that if different vegetation indices are used and evaluated on different datasets, there is a strong chance that the system's accuracy improves to a large extent.

Figure 13. Mean of Land-use/Land Cover and FCC of 2020