This tutorial covers Random Forest classification using the Semi-Automatic Classification Plugin (SCP) for QGIS. It assumes basic knowledge of SCP and familiarity with the Basic Tutorials.
Random Forest is a machine learning technique based on the iterative and random creation of decision trees (i.e. sets of rules and conditions that define a class).
WARNING: ESA SNAP is required. The ESA SNAP GPT executable must be defined in External programs settings.
The purpose of the classification is to identify the following land cover classes:
- Water;
- Built-up;
- Vegetation;
- Soil.
The following are the steps of the tutorial:
- Input Data
- Create the ROIs
- Random Forest Classification
1. Input Data
Any raster data can be used with Random Forest. In this tutorial, we are going to use a subset of a Sentinel-2 Satellite image (Copernicus land monitoring services), already converted to reflectance, and use the bands illustrated in the following table.
Sentinel-2 Bands | Central Wavelength [micrometers] | Resolution [meters] |
---|---|---|
Band 2 - Blue | 0.490 | 10 |
Band 3 - Green | 0.560 | 10 |
Band 4 - Red | 0.665 | 10 |
Band 5 - Vegetation Red Edge | 0.705 | 20 |
Band 6 - Vegetation Red Edge | 0.740 | 20 |
Band 7 - Vegetation Red Edge | 0.783 | 20 |
Band 8 - NIR | 0.842 | 10 |
Band 8A - Vegetation Red Edge | 0.865 | 20 |
Band 11 - SWIR | 1.610 | 20 |
Band 12 - SWIR | 2.190 | 20 |
You can download the image from this archive (about 20 MB, © Copernicus Sentinel data 2020 downloaded from https://scihub.copernicus.eu/), and then unzip the downloaded file. The downloaded product is already converted to reflectance and no preprocessing is required in this case.
Start QGIS and the SCP.
Open the tab Band set by clicking the button in the SCP menu or the SCP dock. Click the open file button, open the directory containing the input bands, and select all the .tif files. The selected bands will be added to the active band set.
In the table Band set definition, sort the band names in ascending order (click the button to sort bands by name automatically). Finally, select Sentinel-2 from the list Wavelength quick settings to automatically set the Center wavelength of each band and the Wavelength unit (required for spectral signature calculation).
We can display a Color Composite of the Near-Infrared, Red, and Green bands. In the Working toolbar, click the list RGB= and select the item 7-3-2 (corresponding to the band numbers in the Band set). The image colors in the map change according to the selected bands, and vegetation is highlighted in red (if the item 3-2-1 were selected, natural colors would be displayed). This color composite will be useful later for ROI creation.
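To make the effect of the quick settings concrete, the following minimal sketch pairs each band with the center wavelength listed in the band table above. The short band labels (`B02`, `B8A`, ...) and the function name are hypothetical and used only for illustration; they are not SCP identifiers.

```python
# Center wavelengths in micrometers, copied from the band table above.
# The short band labels are hypothetical; actual file names may differ.
SENTINEL2_CENTER_WAVELENGTH_UM = {
    "B02": 0.490, "B03": 0.560, "B04": 0.665, "B05": 0.705, "B06": 0.740,
    "B07": 0.783, "B08": 0.842, "B8A": 0.865, "B11": 1.610, "B12": 2.190,
}


def assign_wavelengths(band_labels):
    """Return (label, center wavelength) pairs for the given band labels."""
    return [(label, SENTINEL2_CENTER_WAVELENGTH_UM[label]) for label in band_labels]


band_set = assign_wavelengths(["B02", "B03", "B04", "B08"])
for label, wavelength in band_set:
    print(f"{label}: {wavelength:.3f} micrometers")
```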
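The relation between an RGB= item and the bands it displays can be sketched as a simple lookup. This example assumes that bands 1 to 7 of the Band set follow the order of the band table (Blue first, NIR seventh), which is what the items 7-3-2 and 3-2-1 imply; the dictionary and function names are hypothetical.

```python
# 1-based positions in the Band set, assumed to follow the band table order.
BAND_SET_POSITIONS = {
    1: "Blue",
    2: "Green",
    3: "Red",
    4: "Vegetation Red Edge (0.705)",
    5: "Vegetation Red Edge (0.740)",
    6: "Vegetation Red Edge (0.783)",
    7: "NIR",
}


def describe_composite(rgb_item):
    """Translate an item such as '7-3-2' into (red, green, blue) channel bands."""
    positions = [int(part) for part in rgb_item.split("-")]
    if len(positions) != 3:
        raise ValueError("expected three band numbers, e.g. '7-3-2'")
    return tuple(BAND_SET_POSITIONS[position] for position in positions)


false_color = describe_composite("7-3-2")
natural_color = describe_composite("3-2-1")
print(false_color)
print(natural_color)
```

Because NIR is mapped to the red display channel in 7-3-2, surfaces with high NIR reflectance, such as vegetation, appear red.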
Now we need to create the Training input in order to collect Training Areas (ROIs).
In the SCP dock select the tab Training input and click the button to create the Training input (define a name such as training.scp). The path of the file is displayed and a vector is added to QGIS layers with the same name as the Training input (in order to prevent data loss, you should not edit this layer using QGIS functions).
2. Create the ROIs
ROIs must be created by manually drawing polygons. You can also import polygons from a vector file using the tool Import vector.
WARNING: for compatibility with SNAP, only ROIs defined manually as polygons are used for classification; region growing ROIs and spectral signatures are not used as training input.
We are going to create ROIs defining the Classes and Macroclasses. Each ROI is identified by a Class ID (i.e. C ID), and each ROI is assigned to a land cover class through a Macroclass ID (i.e. MC ID). Thus, we are going to create several ROIs for each macroclass (setting the same MC ID, but assigning a different C ID to every ROI). We are going to use the Macroclass IDs defined in the following table.
Macroclasses
Macroclass name | Macroclass ID |
---|---|
Water | 1 |
Built-up | 2 |
Vegetation | 3 |
Soil | 4 |
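The relationship between C ID and MC ID can be sketched as a small data structure: every ROI carries its own C ID, and several ROIs share one MC ID. The `Roi` class, the `note` field, and the example ROIs below are hypothetical and only illustrate the ID scheme; they do not reflect how SCP stores the Training input.

```python
from collections import defaultdict
from dataclasses import dataclass

# Macroclass IDs from the table above.
MACROCLASSES = {1: "Water", 2: "Built-up", 3: "Vegetation", 4: "Soil"}


@dataclass(frozen=True)
class Roi:
    mc_id: int  # Macroclass ID, shared by all ROIs of one land cover class
    c_id: int   # Class ID, different for every ROI in this tutorial
    note: str   # hypothetical free-text description


def group_by_macroclass(rois):
    """Map each macroclass name to the C IDs of its ROIs, validating the IDs."""
    seen_c_ids = set()
    groups = defaultdict(list)
    for roi in rois:
        if roi.mc_id not in MACROCLASSES:
            raise ValueError(f"unknown MC ID: {roi.mc_id}")
        if roi.c_id in seen_c_ids:
            raise ValueError(f"duplicate C ID: {roi.c_id}")
        seen_c_ids.add(roi.c_id)
        groups[MACROCLASSES[roi.mc_id]].append(roi.c_id)
    return dict(groups)


rois = [
    Roi(mc_id=1, c_id=1, note="lake"),
    Roi(mc_id=3, c_id=2, note="forest"),
    Roi(mc_id=3, c_id=3, note="cropland"),
    Roi(mc_id=2, c_id=4, note="roofs"),
]
grouped = group_by_macroclass(rois)
print(grouped)
```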
Create a few ROIs and save them in the Training input.
Please note that classification previews are not available with Random Forest.
3. Random Forest Classification
The Random Forest tool allows for classifying a Band set using the ROI polygons in the Training input.
Open the tab Random Forest by clicking the button in the SCP menu or the SCP dock. In Select input band set, set 1 because we are going to classify the first Band set.
Check Use MC ID in order to use the Macroclass ID code of ROIs.
In Number of training samples enter 5000 as the number of training data (pixels) randomly used to train the model. You can increase this number if the ROI polygons are very large and cover more than 5000 pixels.
In Number of trees enter 100 as the number of decision trees (a higher number generally yields a more accurate model, but it also increases the calculation time). Also check Evaluate classifier to report the evaluation of the classifier at the end of the process. You can ignore the option Evaluate feature power set.
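The role of this parameter can be illustrated with a minimal sketch of random subsampling. It assumes simple uniform sampling without replacement over all pixels covered by ROIs; how SNAP actually draws the samples (for example, whether it balances classes) is not described here, and the function name and pixel grid are hypothetical.

```python
import random


def sample_training_pixels(roi_pixels, n_samples=5000, seed=0):
    """Randomly pick at most n_samples pixels from those covered by ROIs."""
    if len(roi_pixels) <= n_samples:
        # Fewer pixels than requested: every ROI pixel is used.
        return list(roi_pixels)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(roi_pixels, n_samples)


# Hypothetical ROI coverage: 8000 pixel positions as (row, column) pairs.
pixels = [(row, col) for row in range(100) for col in range(80)]
subset = sample_training_pixels(pixels)
print(len(pixels), "ROI pixels,", len(subset), "used for training")
```

With 8000 ROI pixels and the default of 5000 samples, part of the training area is left unused, which is the situation where increasing the number may help.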
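To show what the number of trees controls, here is a minimal pure-Python sketch of the general idea: each tree is trained on a random resample of the training pixels using a random band, and the trees vote on the class. This is not the SNAP implementation used by SCP. One-split trees stand in for full decision trees, and the two-band reflectance values are made up for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples, rng):
    """Fit a one-split tree on a bootstrap sample, using one random band."""
    boot = [rng.choice(samples) for _ in samples]   # resample with replacement
    band = rng.randrange(len(boot[0][0]))           # pick a random band
    values = sorted({pixel[band] for pixel, _ in boot})
    default = majority([label for _, label in boot])
    best = (len(boot) + 1, values[0], default, default)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for pixel, label in boot if pixel[band] <= threshold]
        right = [label for pixel, label in boot if pixel[band] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return band, threshold, left_label, right_label


def train_forest(samples, n_trees=100, seed=0):
    """Train n_trees one-split trees, each on its own random resample."""
    rng = random.Random(seed)
    return [train_stump(samples, rng) for _ in range(n_trees)]


def vote(forest, pixel):
    """Count the class votes of all trees for one pixel."""
    return Counter(
        left_label if pixel[band] <= threshold else right_label
        for band, threshold, left_label, right_label in forest
    )


# Made-up (NIR, SWIR) reflectance pairs labeled with MC ID 1 (Water) and 4 (Soil).
water = [((0.020 + 0.001 * i, 0.010 + 0.001 * i), 1) for i in range(10)]
soil = [((0.300 + 0.001 * i, 0.350 + 0.001 * i), 4) for i in range(10)]
forest = train_forest(water + soil, n_trees=100)

water_votes = vote(forest, (0.03, 0.02))
soil_votes = vote(forest, (0.31, 0.36))
print(water_votes.most_common(1)[0][0], soil_votes.most_common(1)[0][0])
```

Adding trees makes the vote less dependent on any single random resample, at the cost of training and evaluating more trees.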
TIP: You can save the classifier for later use (for instance, to classify a different input band set) by checking Save classifier, and later select Load classifier to open the previously saved classifier. When loading a saved classifier, no training input is required and the processing time is reduced.
Now click the button RUN and define the path of the classification output.
Also, a confidence raster is created, which assesses the reliability (from 0 minimum to 1 maximum) of the model at the pixel level.
We can see several classification errors, especially in pixels with low confidence values. Where pixels have low confidence values, we need to create new ROIs covering them.
The evaluation report allows for assessing the performance of the model (not the accuracy of the whole classification). We can also read the feature importance score, which is the importance of single bands in the Band set definition. For instance, we could try removing the bands with the lowest scores to reduce the computation time while obtaining similar results.
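Finding where new ROIs are needed amounts to thresholding the confidence raster. The sketch below uses a small made-up grid in place of the raster; the threshold of 0.5 and the function name are assumptions for illustration, not values suggested by SCP.

```python
def low_confidence_pixels(confidence, threshold=0.5):
    """Return (row, column) positions whose confidence is below the threshold."""
    return [
        (row_index, col_index)
        for row_index, row in enumerate(confidence)
        for col_index, value in enumerate(row)
        if value < threshold
    ]


# Made-up confidence values between 0 and 1 standing in for the raster.
confidence = [
    [0.95, 0.40, 0.88],
    [0.30, 0.75, 0.99],
]
candidates = low_confidence_pixels(confidence)
print(candidates)
```

The returned positions mark the areas where additional training polygons are most likely to improve the model.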
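Selecting bands by importance can be sketched as ranking the scores and dropping the lowest ones. The scores below are invented placeholders, not output from this tutorial's classification, and the band labels and function name are hypothetical; the actual values should be read from the evaluation report.

```python
def bands_to_keep(importance, n_drop=2):
    """Rank bands by importance score and drop the n_drop lowest-scoring ones."""
    if not 0 <= n_drop < len(importance):
        raise ValueError("n_drop must leave at least one band")
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[: len(ranked) - n_drop]


# Placeholder scores for illustration only (not real results).
example_scores = {
    "B02": 0.05, "B03": 0.08, "B04": 0.12,
    "B08": 0.30, "B11": 0.25, "B12": 0.20,
}
kept = bands_to_keep(example_scores, n_drop=2)
print(kept)
```

A new Band set containing only the kept bands could then be classified again to check whether the evaluation report stays similar.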
Well done! We have performed a Random Forest classification of a remote sensing image.