From GIS to Remote Sensing: Land Cover Classification of Cropland: a Tutorial Using the Semi-Automatic Classification Plugin for QGIS

In this post, I am presenting you a tutorial for the land cover classification of cropland. In particular, we are going to classify a Landsat image acquired over the US state of Kansas, near the city of Ulysses, using the new version 2.3.2 of the Semi-Automatic Classification Plugin for QGIS, which allows for supervised classifications (an updated tutorial is available here).

Before the tutorial, please watch the following video that illustrates the study area and provides very useful information for the interpretation of the image in false colors (footage courtesy of European Space Agency/ESA). Also, a brief description of the area that we are going to classify is available here.

As shown in the previous video, the study area is mainly covered by cropland, and we are going to classify a Landsat 8 image acquired in April 2013 (available from the U.S. Geological Survey), using a semi-automatic approach (i.e. the entire image is classified automatically basing on several samples of pixels "ROIs" that are used as reference classes).
Before the classification process, we are going to convert the DN values to reflectance. The reflectance is defined as the ratio between reflected and incident energy on a surface (see here for more information about the conversion to reflectance of Landsat images).

The atmospheric correction of reflectance values is useful during the pre processing phase, because the electromagnetic energy measured by a satellite is affected by the atmospheric effects (e.g. absorption or scattering). Therefore, image conversion to reflectance allows for the comparison and mosaic of different satellite images (e.g. Landsat 5 and Landsat 8), and improves the classification results.

The Semi-automatic Classification Plugin provides a simple correction using the DOS1 method (a brief explanation about image conversion to reflectance and DOS1 atmospheric correction is illustrated in the yellow frame at the end of this post). It is worth mentioning that GRASS GIS has also a tool for the calculation of the Top Of Atmosphere reflectance (see i.landsat.toar) and other DOS methods.

First, download the image from here. The image is a subset of the entire scene, including the following Landsat bands (each band is a single 16 bit raster):

2 - Blue;

3 - Green;

4 - Red;

5 - Near-Infrared;

6 - Short Wavelength Infrared 1;

7 - Short Wavelength Infrared 2.

The following workflow illustrates the main phases of the semi-automatic classification:

1. Conversion of raster bands from DN to reflectance

This phase is part of image pre processing, which enhance the input images by converting the DN (i.e. Digital Numbers) to the physical measure of Top Of Atmosphere reflectance (TOA). Also, we are going to apply a simple atmospheric correction using the DOS1 method (Dark Object Subtraction 1), which is an image-based technique.

As an example, the following images illustrate three charts of different spectral signatures (water, vegetation, and bare soil) calculated with the original image (DN), the TOA reflectance image and the DOS1 corrected reflectance image. As you can see, the DOS1 correction modifies the reflectance values, especially in the blue band. Notice that the band number in the following images is the numerical order of used bands, thus for example: band 1 of the chart is band 2 of Landsat 8 (i.e. blue band).

Original DN spectral signature

TOA spectral signature

DOS1 corrected spectral signature

Open QGIS and start the Semi-Automatic Classification Plugin; in the main interface select the tab Pre processing > Landsat;
Select the directory that contains the Landsat bands (and also the required metafile MTL.txt), and select the output directory where converted bands are saved;
Check the option Apply DOS1 atmospheric correction, and click Perform conversion to convert Landsat bands to reflectance;
At the end of the process, converted bands are loaded in QGIS.

2. Definition of the classification inputs

Now, we create a color composite of the image. In particular, the composite RGB = 543 (that is the equivalent of RGB = 432 for Landsat 7) is useful for the interpretation of cropland in the image (as shown in the previous video by ESA), because of healthy vegetation reflects a large part of the incident light in the near-infrared wavelength.
In order to create the color composite in QGIS, we create a virtual raster. Then, we define the band set that we are going to classify. Finally, we create a training shapefile which stores the ROIs (i.e. Regions Of Interest) that define the land cover classes.

From the menu Raster select Miscellaneous > Build Virtual Raster (catalog); click the button Select... and select the bands 3, 4, and 5; select the output file (for instance rgb.vrt); check Separate (bands will be separated) and click OK;

In the plugin dock ROI creation click the button Band set beside Select an image; the tab Band set will open; click the button Select All, then Add rasters to set (order the band names in ascending order, from top to bottom using ? and ? arrows);

In order to create the training shapefile (a polygon layer having several attribute fields that will store ROIs), in the dock ROI creation click the button New shapefile, and select where to save the shapefile (for instance ROI.shp).

3. Creation of the ROIs

We need to create several ROI, considering the spectral variability of land cover classes. It is important that ROIs represent homogeneous areas of the image, therefore we are going to draw the ROIs using a region growing process (i.e. a segmentation of the image, grouping similar pixels).
The following are the land cover classes that we are going to identify in the image:

crop (e.g. fields with green vegetation)
low vegetation (e.g. fields without green vegetation, or shrubland)
built-up (e.g. artificial areas, buildings and asphalt)
farms (e.g. farm areas)
bare soil (e.g. soil without vegetation)
water (e.g. surface water)

As you can see in the color composite, the color of cropland can vary from dark red to light red. We should create several ROIs for each of these colors.

In order to create a ROI, in the dock ROI creation click the button + beside Create a ROI and then click any pixel of the image; zoom in the map and click on a red pixel of a circular field; after a few seconds the ROI polygon will appear over the image (a semitransparent orange polygon);

Under ROI definition type a brief description of the ROI inside the field Information (e.g. crop; notice that descriptions are not used during the classification process, but are useful for distinguishing ROIs); the field ID (i.e. identifier of the class) is used as reference for the land cover classification, therefore it is important that each category has a unique ID value (ROIs sharing the same ID are evaluated as a single ROI); since we need more than one ROI per class, set the ID = 11, so that the first figure represents the class, and second one the ROI number (other crop ROIs will have values from 11 to 19; in the same way we set ID from 21 to 29 for low vegetation, and so on);
In order to save the ROI to the training shapefile click the button Save ROI to shapefile;
In the plugin main interface, select the tab ROI tools > Spectral signature, which displays the plots of selected ROIs, and select the item crop; if the checkox Plot s is checked, then the plot will display the standard deviation of each ROI.

Repeat the above steps for every class (in order to get the desired ROI, change the values for Minimum ROI size and Range radius); after the collection of several ROIs, it is useful to visualize the ROI spectral signatures in order to avoid the collection of ROIs that are spectrally similar. Following, some examples of ROIs for the land cover classes.

You can download the final ROI shapefile (in which I saved 14 ROIs) from here.

4. Classification of the study area

It is useful to perform classification previews, in order to assess the quality of classification (for example a common problem is soil classified as built-up and vice versa). If the result of the preview is not good, then we need to go back to the phase of ROI creation, delete the faulty ROIs and/or create new ones.

The classification can be performed through several algorithms; this time we are going to use the Spectral Angle Mapping classification.

In the Classification dock, under Classification preview set Size = 500 (i.e. the side of the classification preview in pixel unit); click the button + and then click on the image; after a few seconds, the classification preview will be loaded (with default colors);

For a better interpretation of classes, it is useful to assign a classification color to each class.
Also, we can save a visualization style in QGIS, and set this .qml file as the default style for every classification or preview.

In the panel Layers, left click on the classification preview and select Properties > Style; select the render type Singleband pseudocolor and change the colors and labels of classes, according to the training ROIs; then, click Save style ... to save the .qml file (e.g. classification.qml);

Under Classification style click the button Select qml to select the file classification.qml; thus the next classification or preview will be loaded with this style;

When we are satisfied with the result, we can perform the final classification (the main output of a classification is a raster file .tif).

click the button Perform classification and select where to save the output (e.g. classification.tif).

Here you can download the final land cover classification.

5. Calculation of classification accuracy

It is useful (and necessary) to assess the accuracy of land cover classification, in order to understand the reliability thereof, and to identify map errors.
Please, notice that this classification is only for demonstration purposes, because an accurate land cover classification would require ancillary data and field survey.
However, in order to assess the classification accuracy, we can compare the land cover classification to the ROIs we have created (theoretically, image pixels belonging to a certain ROI should be classified as that ROI's class).

Select the tab Post processing > Accuracy of the plugin main interface; select the classification.tif beside Select a classification to assess and select the ROI shapefile beside Select the reference shapefile; then click the button Calculate error matrix and the matrix will be displayed; you can save the error matrix by clicking the button Save error matrix to file;

Generally, classification accuracy is considered good if it is > 80%. For this classification, we can see that the overall accuracy of the classification is about 91.4% (the error matrix is reported below). Also, it is useful to consider the error for single classes (i.e. omission and commission).

Error Matrix

Panel #1 of 2

     MAP1

     cat# 11 12 21 22 23 24 25 31 32 

 M    11 1068 0 0 0 0 0 0 0 0 

 A    12 36 748 0 0 0 5 0 0 1 

 P    21 0 0 568 98 20 0 1 54 19 

 2    22 0 0 5 6074 0 0 0 17 6 

      23 0 0 0 0 296 0 0 0 2 

      24 3 7 0 0 0 275 2 1 10 

      25 0 0 0 3 0 1 85 36 61 

      31 0 0 0 0 0 0 0 105 39 

      32 0 0 0 0 4 0 2 40 168 

      33 0 0 0 0 0 0 0 203 5 

      41 0 0 0 0 0 0 0 0 0 

      51 0 0 0 1 0 0 0 39 1 

      52 0 0 6 1 0 0 4 96 42 

      61 0 0 0 0 0 0 0 2 0 

Col Sum  1107 755 579 6177 320 281 94 593 354 

Panel #2 of 2

     MAP1

     cat# 33 41 51 52 61 Row Sum

 M    11 0 0 0 0 0 1068

 A    12 0 0 0 0 0 790

 P    21 0 4 26 0 0 790

 2    22 0 3 0 0 0 6105

      23 0 0 0 0 0 298

      24 0 3 0 0 0 301

      25 0 0 0 1 0 187

      31 9 0 0 0 0 153

      32 0 0 0 3 0 217

      33 69 0 0 0 23 300

      41 0 233 0 0 0 233

      51 3 0 188 0 0 232

      52 0 0 0 135 0 284

      61 0 0 0 0 116 118

Col Sum  81 243 214 139 139 11076

Cats % Commission % Ommission Estimated Kappa

11 0.000000 3.523035 1.000000

12 5.316456 0.927152 0.942946

21 28.101266 1.899827 0.703487

22 0.507781 1.667476 0.988520

23 0.671141 7.500000 0.993089

24 8.637874 2.135231 0.911373

25 54.545455 9.574468 0.449877

31 31.372549 82.293423 0.668528

32 22.580645 52.542373 0.766738

33 77.000000 14.814815 0.224327

41 0.000000 4.115226 1.000000

51 18.965517 12.149533 0.806608

52 52.464789 2.877698 0.468684

61 1.694915 16.546763 0.982835

Kappa  Kappa Variance

0.872230 0.000014

Obs Correct Total Obs % Observed Correct

10128  11076  91.440953

6. Calculation of classification statistics

The classes of the final classification are more than the main land cover classes we have defined at step 3 (this is because we needed to assign diverse IDs also to ROIs belonging to the same class).
In order to estimate the cropland in the area, we are going to reclassify the classification raster in fewer classes, according to the main land cover classes.

From the Processing toolbox navigate to SAGA > Grid - Tools > Reclassify Grid Values; in the tool window, under Grid select classification.tif; under method select [2] simple table;

Under Lookup table click the button ... to open the window Fixed Table; the lookup table has 3 columns: minimum, maximum, and new. Minimum and maximum define the reclassification range for the new value, according to the following table.

minimum	maximum	new
0	0	0	Unclassified
11	19	1	Crop
21	29	2	Low vegetation
31	39	3	Built-up
41	49	4	Farm
51	59	5	Bare soil
61	69	6	Water

Click "OK" and after a few seconds the reclassified classification will be loaded.

Now we can perform the report of the classification, which provides the percentage and the area of land cover classes.

Select the tab Post processing > Classification report of the plugin main interface; select Reclassified grid beside Select a classification and click Calculate classification report; after a few seconds the report will be displayed (see the following table).

Class	PixelSum	Percentage %	Area [metre^2]
1	820502	10.362	738451800
2	6924277	87.446	6231849300
3	19916	0.252	17924400
4	15405	0.195	13864500
5	137209	1.733	123488100
6	1055	0.013	949500
TOTAL	7918364	100	7126527600

From these results, we can see that in the study area (at the moment of image acquisition) there are about 738 square kilometers of crop class (class 1, about 10% of the study area) and 6231 square kilometers of low vegetation class (class 2, about 87%). Also, we can notice that about 14 square kilometers are occupied by farms (class 4) and about 18% is built-up (class 3).
Of course, these numbers are the result of a tutorial for demonstration purposes; for a better estimation of class areas (and possibly the identification of the crop type), we need field data which can improve the creation of ROIs and help for the assessment of classification accuracy.
However, the aim of this tutorial is to show how land cover classifications can be performed simply and rapidly using open source programs, which are useful for several activities, such as land cover monitoring or precision farming.

The following is the video of this tutorial.

Landsat image conversion to reflectance and DOS1 atmospheric correction

Radiance is the “flux of energy (primarily irradiant or incident energy) per solid angle leaving a unit surface area in a given direction”, “Radiance is what is measured at the sensor and is somewhat dependent on reflectance” (NASA, 2011, p. 47).

The Spectral Radiance at the sensor's aperture (L? ) is measured in [watts/(meter squared * ster * µm)] and for Landsat images it is given by (https://landsat.usgs.gov/Landsat8_Using_Product.php):

L? = ML * Qcal + AL

where:
- ML = Band-specific multiplicative rescaling factor from Landsat metadata (RADIANCE_MULT_BAND_x, where x is the band number)
- AL = Band-specific additive rescaling factor from Landsat metadata (RADIANCE_ADD_BAND_x, where x is the band number)
- Qcal = Quantized and calibrated standard product pixel values (DN)

“For relatively clear Landsat scenes, a reduction in between-scene variability can be achieved through a normalization for solar irradiance by converting spectral radiance, as calculated above, to planetary reflectance or albedo. This combined surface and atmospheric reflectance of the Earth is computed with the following formula” (NASA, 2011, p. 119):

?p = (p * L? * d^2 )/ (ESUN? *cos?s)

where:
- ?p = Unitless planetary reflectance, which is “the ratio of reflected versus total power energy” (NASA, 2011, p. 47)
- L? = Spectral radiance at the sensor's aperture (at-satellite radiance)
- d = Earth-Sun distance in astronomical units (provided with Landsat 8 metafile, and an excel file is available from
http://landsathandbook.gsfc.nasa.gov/excel_docs/d.xls)
- ESUN? = Mean solar exo-atmospheric irradiances
- ?s = Solar zenith angle in degrees, which is equal to ?s = 90° - ?e where ?e is the Sun elevation

It is worth pointing out that Landsat 8 images are provided with band-specific rescaling factors that allow for the direct conversion from DN to TOA reflectance. However, the effects of the atmosphere (i.e. a disturbance on the reflectance that varies with the wavelenght) should be considered in order to measure the refletance at the ground. As described by Moran et al. (1992), the land surface reflectance (?) is:

? = [p * (L? - Lp) * d^2]/ [Tv * ( (ESUN? * cos?s * Tz ) + Edown )]

where:
- Lp is the path radiance
- Tv is the atmospheric transmittance in the viewing direction
- Tz is the atmospheric transmittance in the illumination direction
- Edown is the downwelling diffuse irradiance

Therefore, we need several atmospheric measurements in order to calculate ? (physically-based corrections). Alternatively, it is possible to use image-based techniques for the calculation of these parameters, without in-situ measurements during image acquisition.
The Dark Object Subtraction (DOS) is a family of image-based atmospheric corrections.
Chavez (1996) explains that “the basic assumption is that within the image some pixels are in complete shadow and their radiances received at the satellite are due to atmospheric scattering (path radiance). This assumption is combined with the fact that very few targets on the Earth's surface are absolute black, so an assumed one-percent minimum reflectance is better than zero percent”. It is worth pointing out that the accuracy of image-based techniques is generally lower than physically-based corrections, but they are very useful when no atmospheric measurements are available as they can improve the estimation of land surface reflectance.
The path radiance is given by (Sobrino, et al., 2004):

Lp = Lmin – LDO1%

where:
- Lmin = “radiance that corresponds to a digital count value for which the sum of all the pixels with digital counts lower or equal to this value is equal to the 0.01% of all the pixels from the image considered” (Sobrino, et al., 2004, p. 437), therefore the radiance obtained with that digital count value (DNmin)
- LDO1% = radiance of Dark Object, assumed to have a reflectance value of 0.01

Therfore for Landsat images:

Lmin = ML * DNmin + AL

The radiance of Dark Object is given by (Sobrino, et al., 2004):

LDO1% = 0.01* [(ESUN? * cos?s * Tz ) + Edown] * Tv / (p * d^2)

Therefore the path radiance is:

Lp = ML* DNmin + AL – 0.01* [(ESUN? * cos?s * Tz ) + Edown] * Tv / (p * d^2)

There are several DOS techniques (e.g. DOS1, DOS2, DOS3, DOS4), based on different assumption about Tv, Tz , and Edown .
The simplest technique is the DOS1, where the following assumptions are made (Moran et al., 1992):
- Tv = 1
- Tz = 1
- Edown = 0
Therefore the path radiance is:

Lp = ML* DNmin + AL – 0.01* ESUN? * cos?s / (p * d^2)

And the resulting land surface reflectance is given by:

? = [p * (L? - Lp) * d^2]/ (ESUN? * cos?s)

ESUN [W/(m2 * µm)] values for Landsat sensors are provided in the following table.

Band	Landsat 4*	Landsat 5**	Landsat 7**
1	1957	1983	1997
2	1825	1769	1812
3	1557	1536	1533
4	1033	1031	1039
5	214.9	220	230.8
7	80.72	83.44	84.90

* from Chander & Markham (2003)
** from Finn, et al. (2012)
For Landsat 8, ESUN can be calculated as (from http://grass.osgeo.org/grass65/manuals/i.landsat.toar.html):

ESUN = (p *d2) * RADIANCE_MAXIMUM / REFLECTANCE_MAXIMUM

where RADIANCE_MAXIMUM and REFLECTANCE_MAXIMUM are provided by image metadata.

References
-Chander, G. & Markham, B. 2003. Revised Landsat-5 TM radiometric calibration procedures and postcalibration dynamic ranges Geoscience and Remote Sensing, IEEE Transactions on, 41, 2674 – 2677
-Chavez, P. S. 1996. Image-Based Atmospheric Corrections - Revisited and Improved Photogrammetric Engineering and Remote Sensing, [Falls Church, Va.] American Society of Photogrammetry, 62, 1025-1036
-Finn, M.P., Reed, M.D, and Yamamoto, K.H. 2012. A Straight Forward Guide for Processing Radiance and Reflectance for EO-1 ALI, Landsat 5 TM, Landsat 7 ETM+, and ASTER. Unpublished Report from USGS/Center of Excellence for Geospatial Information Science, 8 p
http://cegis.usgs.gov/soil_moisture/pdf/A%20Straight%20Forward%20guide%20for%20Processing%20Radiance%20and%20Reflectance_V_24Jul12.pdf
-Moran, M.; Jackson, R.; Slater, P. & Teillet, P. 1992. Evaluation of simplified procedures for retrieval of land surface reflectance factors from satellite sensor output Remote Sensing of Environment, 41, 169-184
-NASA (Ed.) 2011. Landsat 7 Science Data Users Handbook Landsat Project Science Office at NASA's Goddard Space Flight Center in Greenbelt, 186
-Sobrino, J.; Jiménez-Muñoz, J. C. & Paolini, L. 2004. Land surface temperature retrieval from LANDSAT TM 5 Remote Sensing of Environment, Elsevier, 90, 434-440

Land Cover Classification of Cropland: a Tutorial Using the Semi-Automatic Classification Plugin for QGIS

2 - Blue; 3 - Green; 4 - Red; 5 - Near-Infrared; 6 - Short Wavelength Infrared 1; 7 - Short Wavelength Infrared 2.

The following workflow illustrates the main phases of the semi-automatic classification:

1. Conversion of raster bands from DN to reflectance

2. Definition of the classification inputs

3. Creation of the ROIs

4. Classification of the study area

5. Calculation of classification accuracy

6. Calculation of classification statistics

2 - Blue;

3 - Green;

4 - Red;

5 - Near-Infrared;

6 - Short Wavelength Infrared 1;

7 - Short Wavelength Infrared 2.