1. Input Data

Any raster data can be used with Random Forest. In this tutorial, we are going to use a subset of a Sentinel-2 satellite image (Copernicus land monitoring services), already converted to reflectance, with the bands listed in the following table.

Sentinel-2 Bands                 Central Wavelength [micrometers]   Resolution [meters]
Band 2 - Blue                    0.490                              10
Band 3 - Green                   0.560                              10
Band 4 - Red                     0.665                              10
Band 5 - Vegetation Red Edge     0.705                              20
Band 6 - Vegetation Red Edge     0.740                              20
Band 7 - Vegetation Red Edge     0.783                              20
Band 8 - NIR                     0.842                              10
Band 8A - Vegetation Red Edge    0.865                              20
Band 11 - SWIR                   1.610                              20
Band 12 - SWIR                   2.190                              20

You can download the image from this archive (about 20 MB, © Copernicus Sentinel data 2020 downloaded from https://scihub.copernicus.eu/), and then unzip the downloaded file. The downloaded product is already converted to reflectance and no preprocessing is required in this case.

Start QGIS and the SCP.

Open the Band set tab by clicking the button in the SCP menu or the SCP dock. Click the open_file button, open the directory containing the input bands, and select all the .tif files. The selected bands will be added to the active band set.

In the Band set definition table, sort the band names in ascending order (click the button to sort bands by name automatically). Finally, select Sentinel-2 from the Wavelength quick settings list to automatically set the Center wavelength of each band and the Wavelength unit (required for spectral signature calculation).

Band set

We can display a Color Composite of the Near-Infrared, Red, and Green bands. In the Working toolbar, click the list RGB= and select the item 7-3-2 (corresponding to the band numbers in Band set). The image colors in the map change according to the selected bands, and vegetation is highlighted in red (if the item 3-2-1 were selected, natural colors would be displayed). This color composite will be useful later for ROI creation.
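The link between an RGB= item and the actual Sentinel-2 bands can be sketched in plain Python. This is only an illustration, assuming the band set is sorted as in the band table above (position 1 is Band 2, position 10 is Band 12). The names BAND_SET and composite_bands are hypothetical and are not part of SCP.

```python
# Band set positions (1-based), assuming the bands are sorted in ascending
# order as in the band table above. Wavelengths are in micrometers.
BAND_SET = [
    ("Band 2 - Blue", 0.490),
    ("Band 3 - Green", 0.560),
    ("Band 4 - Red", 0.665),
    ("Band 5 - Vegetation Red Edge", 0.705),
    ("Band 6 - Vegetation Red Edge", 0.740),
    ("Band 7 - Vegetation Red Edge", 0.783),
    ("Band 8 - NIR", 0.842),
    ("Band 8A - Vegetation Red Edge", 0.865),
    ("Band 11 - SWIR", 1.610),
    ("Band 12 - SWIR", 2.190),
]


def composite_bands(rgb_item):
    """Return the band names shown in the red, green and blue channels."""
    positions = [int(p) for p in rgb_item.split("-")]
    return [BAND_SET[p - 1][0] for p in positions]


print(composite_bands("7-3-2"))  # ['Band 8 - NIR', 'Band 4 - Red', 'Band 3 - Green']
print(composite_bands("3-2-1"))  # ['Band 4 - Red', 'Band 3 - Green', 'Band 2 - Blue']
```

Vegetation reflects strongly in the near-infrared, which is why it appears red when the NIR band is assigned to the red channel.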

Color composite RGB=7-3-2

Now we need to create the Training input in order to collect Training Areas (ROIs).

In the SCP dock select the tab Training input and click the button to create the Training input (define a name such as training.scp). The path of the file is displayed and a vector is added to QGIS layers with the same name as the Training input (in order to prevent data loss, you should not edit this layer using QGIS functions).

Definition of Training input in SCP

2. Create the ROIs

ROIs must be created by manually drawing a polygon. You can also import polygons from a vector file using the Import vector tool.

WARNING: for compatibility with the SNAP software, only ROIs defined manually with a polygon will be used for classification; region growing ROIs and spectral signatures will not be used as training input.

We are going to create ROIs defining the Classes and Macroclasses. Each ROI is identified by a Class ID (i.e. C ID), and each ROI is assigned to a land cover class through a Macroclass ID (i.e. MC ID). Thus, we are going to create several ROIs for each macroclass (setting the same MC ID, but assigning a different C ID to every ROI). We are going to use the Macroclass IDs defined in the following table.

Macroclasses
Macroclass name    Macroclass ID
Water              1
Built-up           2
Vegetation         3
Soil               4
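
The relation between C ID and MC ID can be illustrated with a small Python sketch. The ROI names and IDs below are hypothetical examples, not values taken from the tutorial data; only the four macroclass IDs come from the table above.

```python
from collections import defaultdict

# Macroclass IDs from the table above.
MACROCLASSES = {1: "Water", 2: "Built-up", 3: "Vegetation", 4: "Soil"}

# Hypothetical ROIs: each one has its own C ID and shares its MC ID
# with the other ROIs of the same land cover class.
rois = [
    {"c_id": 1, "mc_id": 1, "c_name": "Lake"},
    {"c_id": 2, "mc_id": 1, "c_name": "River"},
    {"c_id": 3, "mc_id": 2, "c_name": "Buildings"},
    {"c_id": 4, "mc_id": 3, "c_name": "Forest"},
    {"c_id": 5, "mc_id": 3, "c_name": "Grass"},
    {"c_id": 6, "mc_id": 4, "c_name": "Bare soil"},
]

# Every ROI must have a different C ID.
assert len({roi["c_id"] for roi in rois}) == len(rois)

# Group the C IDs under the macroclass they belong to.
rois_by_macroclass = defaultdict(list)
for roi in rois:
    rois_by_macroclass[MACROCLASSES[roi["mc_id"]]].append(roi["c_id"])

print(dict(rois_by_macroclass))
# {'Water': [1, 2], 'Built-up': [3], 'Vegetation': [4, 5], 'Soil': [6]}
```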

Create a few ROIs and save them in the Training input.

Created ROIs

Please note that classification previews are not available with Random Forest.

3. Random Forest Classification

The Random Forest tool allows for classifying a Band set using the ROI polygons in the Training input.

Open the Random Forest tab by clicking the button in the SCP menu or the SCP dock. In Select input band set, enter 1 because we are going to classify the first Band set.

Check Use MC ID in order to use the Macroclass ID code of ROIs.

In Number of training samples enter 5000 as the number of training data (pixels) randomly selected to train the model. You can increase this number if the ROI polygons are very large and cover more than 5000 pixels.
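
The effect of this setting can be sketched as a simple random draw from the pixels covered by the ROIs. How the tool actually selects pixels (for example, whether the draw is stratified by class) is not stated here, so the function below is an assumption for illustration only; sample_training_pixels and the pixel tuples are hypothetical.

```python
import random


def sample_training_pixels(roi_pixels, n_samples=5000, seed=0):
    """Randomly pick at most n_samples pixels from the ROI pixels."""
    if len(roi_pixels) <= n_samples:
        # Fewer pixels than requested: use all of them.
        return list(roi_pixels)
    rng = random.Random(seed)  # fixed seed for a repeatable draw
    return rng.sample(roi_pixels, n_samples)


# Hypothetical ROI pixels as (row, column, MC ID) tuples: 8000 pixels in total.
roi_pixels = [(r, c, 1 + r // 25) for r in range(100) for c in range(80)]

training = sample_training_pixels(roi_pixels)
print(len(roi_pixels), len(training))  # 8000 5000
```

In this sketch 3000 of the 8000 ROI pixels are never seen by the model, which is why a larger value may help when the ROIs are large.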

In Number of trees enter 100 as the number of decision trees (a higher number allows for more accurate models, but it also increases the calculation time). Also check Evaluate classifier to report the evaluation of the classifier at the end of the process. You can ignore the option Evaluate feature power set.

TIP: You can save the classifier for later use (for instance, to classify a different input band set) by checking Save classifier, and later select Load classifier to open the previously saved classifier. When loading a saved classifier, no training input is required and the processing time is reduced.

Random Forest tool

Now click the button RUN and define the path of the classification output.

Random Forest classification

A confidence raster is also created, which assesses the reliability of the model (from 0, minimum, to 1, maximum) at the pixel level.

We can see several classification errors, especially in pixels with low confidence values. If pixels have low confidence values, we need to create new ROIs for these pixels.
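
Locating the low-confidence pixels can be sketched with NumPy. The small array and the 0.5 threshold below are assumptions chosen for illustration; in practice the confidence raster would be read from the output file and the threshold tuned to the data.

```python
import numpy as np

# Hypothetical 3x3 confidence raster with values between 0 and 1.
confidence = np.array([
    [0.95, 0.40, 0.88],
    [0.30, 0.76, 0.91],
    [0.99, 0.55, 0.20],
])

THRESHOLD = 0.5  # assumed cutoff, adjust to your data

# Boolean mask of pixels whose confidence is below the threshold.
low_confidence = confidence < THRESHOLD

# (row, column) positions where new ROIs may be worth drawing.
rows, cols = np.nonzero(low_confidence)
candidates = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(candidates)  # [(0, 1), (1, 0), (2, 2)]
```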

Random Forest confidence

The evaluation report allows for assessing the performance of the model (not the accuracy of the whole classification). We can also read the feature importance score, which is the importance of each band in the Band set definition. For instance, we could try removing the bands with the lowest scores to reduce the computation time while obtaining similar results.
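
To make the ideas of model evaluation and feature importance more concrete, here is a minimal sketch using scikit-learn's RandomForestClassifier on synthetic data. This is not the implementation used by the tool described in this tutorial; it is an analogous example, and the synthetic pixels, the 10 bands, and the 70/30 split are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "pixels": 4 classes, 10 bands. Only the first 3 bands carry
# class information; the other 7 are pure noise.
n_per_class, n_bands = 200, 10
labels = np.repeat(np.arange(1, 5), n_per_class)
pixels = rng.normal(0.0, 1.0, size=(labels.size, n_bands))
pixels[:, :3] += labels[:, None] * 2.0

# Hold out 30% of the samples to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(
    pixels, labels, test_size=0.3, random_state=0, stratify=labels
)

# 100 decision trees, as in the Number of trees setting above.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
importances = model.feature_importances_  # one score per band, sums to 1
ranking = np.argsort(importances)[::-1]   # most important band first

print(f"hold-out accuracy: {accuracy:.2f}")
print("band positions ranked by importance:", (ranking + 1).tolist())
```

In this synthetic case the noise bands receive very low scores, so dropping them would reduce the computation with little effect on the result. The hold-out accuracy describes the model on the sampled pixels only, which matches the distinction made above between model performance and the accuracy of the whole classification.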

Random Forest evaluation

Well done! We have performed a Random Forest classification of a remote sensing image.