We are about to finish an R&D
project in which we developed and tested a crop classification methodology
specifically suited to Estonian agricultural, ecological and climatic
conditions. We relied mostly on Sentinel-1 and Sentinel-2 data and used a neural
network to distinguish 28 different crop types. The results are promising
and the methodology is ready for operational service to automate another part
of agricultural monitoring.
Using machine learning in crop type
classification is not new, and definitely not a revolutionary breakthrough:
various classifiers (Support Vector Machine, Decision Trees,
Random Forest and many more) have been used in land cover classification for decades.
More recently neural networks, the wunderkind of machine learning and
image recognition, have also become widely used in crop discrimination. Satellite data,
the main input to classification models, has no serious alternative, since the
aim is to apply the models on a worldwide scale and in applications that run in near
real time. So why get excited about another crop type classification
study that exploits the same methods and datasets as tens of previous studies?
I can give you one reason.
Estonia has been very successful in following European Commission (EC) guidelines
and rules in modernizing the EU Common Agricultural Policy. In 2018 the EC adopted new
rules that allow physical checks on farms to be completely replaced with a system
of automated checks based on analysis of Earth observation data. The same year the Estonian Agricultural Registers and Information
Board (ARIB) launched the first nationwide, fully automated mowing detection
system, which uses Sentinel-1 and Sentinel-2 data and whose prediction
model was developed by KappaZeta. The system has been running
for 3 years; it has significantly reduced the number of on-site checks and increased
the detection of non-compliances. In short, it has saved Estonian and EU taxpayers’
money. Automated crop discrimination is the next step in pursuing the above-mentioned
vision and will probably become the foundation of all agricultural monitoring.
With a proven and tested methodology, it’s highly likely that Estonia will take
this next step in the very near future and launch it at the nationwide
level again. This is definitely a prospect to be excited about.
Now, let’s see how we tackled the
“good old” crop classification task.
Although algorithms and methods
do make a difference in prediction model performance, the
training data is the most valuable player in this game. In Estonia all farmers
who want to be eligible for subsidies need to declare their crops online (field
geometry + crop type label). This open dataset
is freely accessible to everyone and may be re-used and
redistributed for both commercial and non-commercial purposes. Since the crop type
labels are defined by farmers and most of them are not double-checked by ARIB,
there can be mistakes (in less than 5% of labels, according to ARIB’s estimates). Therefore,
for additional validation we ran our own cluster analysis on the time-series to
filter out obvious outliers in each class.
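To give an idea of what such a per-class outlier filter can look like, here is a minimal sketch. It is a simplified stand-in and not the cluster analysis used in the project: each class is treated as a single cluster whose centre is the median time-series, and parcels that lie unusually far from that centre are flagged. The function name, the threshold `k` and the use of the median absolute deviation are all assumptions made for illustration.

```python
import numpy as np

def filter_class_outliers(series, labels, k=3.0):
    """Return a boolean mask that is False for parcels far from their class centre.

    series: array (n_parcels, n_timesteps), one feature time-series per parcel
    labels: array (n_parcels,), declared crop class of each parcel
    k:      hypothetical threshold, in robust standard deviations
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    keep = np.ones(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Robust class profile: per-timestep median over all parcels of the class
        centre = np.median(series[idx], axis=0)
        # Euclidean distance of every parcel's time-series to the class profile
        dist = np.linalg.norm(series[idx] - centre, axis=1)
        med = np.median(dist)
        mad = np.median(np.abs(dist - med))
        # 1.4826 * MAD approximates one standard deviation for normal data
        limit = med + k * 1.4826 * mad
        keep[idx] = dist <= limit
    return keep
```

Parcels where the mask is False would be dropped from the training data; a real cluster analysis could allow several clusters per class, which this sketch does not.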
Once we had the parcels and labels, we calculated
time-series of various satellite-based features plus some ground-based ones (precipitation,
average temperatures, soil type). When extracting features from satellite images
there are two ways to go: pixel-based or parcel-based extraction. We selected the
latter and averaged pixel values over the parcel to obtain one numerical feature
value per statistic for each point in time (see Figure 1).
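Parcel-based extraction can be sketched in a few lines. The example below assumes the images are already co-registered stacks of red and near-infrared reflectance and that the parcel geometry has been rasterized to a boolean mask; both the function name and the input layout are hypothetical. It uses the standard NDVI definition, (NIR - Red) / (NIR + Red), and returns one mean value per acquisition date.

```python
import numpy as np

def parcel_mean_ndvi(red, nir, parcel_mask):
    """Average NDVI over one parcel for every acquisition date.

    red, nir:    arrays (n_dates, height, width) of reflectance values
    parcel_mask: boolean array (height, width), True inside the parcel
    Returns an array (n_dates,) with one mean NDVI value per date.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    valid = denom != 0
    # NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator become NaN
    ndvi = np.where(valid, (nir - red) / np.where(valid, denom, 1.0), np.nan)
    # Boolean indexing keeps only parcel pixels: shape (n_dates, n_parcel_pixels)
    parcel_pixels = ndvi[:, np.asarray(parcel_mask, dtype=bool)]
    # Mean over the parcel for each date, ignoring NaN pixels
    return np.nanmean(parcel_pixels, axis=1)
```

The same pattern applies to any other per-pixel feature: compute it for the whole image, select the parcel's pixels and reduce them to a single statistic per date.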
For preprocessing Sentinel-1 images
we have developed our own processing chain, which produces
reliable time-series for several features. Previous studies have shown
that features from Sentinel-2 images (channel values and indices) combined with
features from Sentinel-1 images (coherence, backscatter) give better
classification results than either set of features on its own.
We used data from the 2018 and 2019
seasons (more than 200 000 parcels altogether) and aggregated all crop
type labels into 28 classes, which were defined according to ARIB’s needs.
Because the dataset was very unbalanced,
we had to under-sample some classes and over-sample others for the
training data. In small classes we augmented the data by adding
noise to the existing time-series.
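The rebalancing step can be illustrated with the sketch below. It assumes Gaussian noise and a single target size for every class; the function name, `target`, `noise_std` and the noise distribution are illustrative assumptions, not the settings used in the project.

```python
import numpy as np

def rebalance(X, y, target, noise_std=0.01, seed=0):
    """Bring every class to `target` samples.

    X: array (n_parcels, ...), feature time-series per parcel
    y: array (n_parcels,), class label per parcel
    Large classes are randomly under-sampled; small classes keep all their
    parcels and are topped up with noisy copies of randomly chosen parcels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        if len(idx) >= target:
            # Under-sample: random subset without replacement
            chosen = rng.choice(idx, size=target, replace=False)
            X_cls = X[chosen]
        else:
            # Over-sample: originals plus noisy copies (hypothetical Gaussian noise)
            extra = rng.choice(idx, size=target - len(idx), replace=True)
            noisy = X[extra] + rng.normal(0.0, noise_std, size=X[extra].shape)
            X_cls = np.concatenate([X[idx], noisy])
        X_parts.append(X_cls)
        y_parts.append(np.full(target, cls))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Resampling like this should only be applied to the training split, so that validation and test scores still reflect the real class distribution.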
The model architecture was rather
simple: an input layer, a flatten layer, three fully connected dense layers (two of
them followed by a batch normalization layer) and an output layer (Figure 3). Our
experiment with adding a 1D CNN layer after the input didn’t improve the results
significantly. A more complicated ResNet (residual neural network) architecture increased
training time by approx. 30%, but the results were similar to a linear neural network.
The F1 score on the
validation set (9% of the whole dataset) was 0.85
and on test set (2% of all
. In 10 classes the recall was more than 0.9 and in 16
classes more than 0.8. See Figures 4 and 5 for more detail.
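For readers who want to reproduce this kind of per-class evaluation, the sketch below derives recall and F1 for each class from a confusion matrix. It assumes integer class labels from 0 to n_classes - 1; the function name is hypothetical. How the per-class F1 values are combined into a single score (for example, a plain mean over classes) is a choice that the text above does not specify, so the sketch returns the per-class values only.

```python
import numpy as np

def per_class_scores(y_true, y_pred, n_classes):
    """Return (recall, f1) arrays with one value per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    actual = cm.sum(axis=1)      # parcels that truly belong to each class
    predicted = cm.sum(axis=0)   # parcels assigned to each class by the model
    # Classes with no samples or no predictions get a score of 0
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return recall, f1
```

Counting the classes above a threshold is then a one-liner, e.g. `int((recall > 0.9).sum())`.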
Some features are more important than others
In a near-real-time operational
system our model and feature extraction would have to be as efficient as
possible. For an R&D project we could easily calculate 20+ features from
satellite images, feed them all to the model and let the machines compute. But
what if not all features are equally important?
They are not. We found that the 5
most important features are the two Sentinel-1 backscatter features (s1_s0vh, s1_s0vv)
and NDVI, TC Vegetation and PSRI from Sentinel-2. To our
surprise, soil type and the precipitation sum before satellite image acquisition
had low relevance.
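One common way to rank features is permutation importance: shuffle one feature across parcels and measure how much the accuracy drops. The sketch below illustrates the idea only; it is not necessarily how the ranking above was obtained. It assumes the input is shaped (parcels, time steps, features) and that `predict` is any callable returning class labels; both are assumptions for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop per feature when that feature is shuffled.

    predict: callable mapping X (n_parcels, n_timesteps, n_features)
             to predicted labels (n_parcels,)
    X, y:    evaluation data and true labels
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[2])
    for f in range(X.shape[2]):
        for _ in range(n_repeats):
            Xp = X.copy()
            # Swap whole time-series of feature f between parcels, which
            # breaks the link between that feature and the labels
            Xp[:, :, f] = X[rng.permutation(len(X)), :, f]
            drops[f] += baseline - np.mean(predict(Xp) == y)
    return drops / n_repeats
```

A feature whose shuffling barely changes the accuracy is a candidate for removal from an operational pipeline, where every extracted feature costs processing time.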
The 5 most important features played different roles
during the season: Sentinel-2 features were more important at the beginning
and at the end of the season, while Sentinel-1 features had more effect during
This project was part of a much
larger initiative called “National Program for Addressing Socio-Economic
Challenges through R&D. Using remote sensing data in favor of the public
sector services.” Several research groups all over Estonia worked on prototypes
that use remote sensing in fields like detecting forest fire hazard, mapping floods
and monitoring urban construction. Now it’s up to Estonia’s public sector
institutions to take the initiative and turn the prototypes into operational
services. With this work we have shown that satellite-based crop classification
in Estonia is possible, accurate enough and ready to be implemented as the next
monitoring service for ARIB.
If you would like to know more about
this study, our Sentinel-1 processing pipeline or our machine learning expertise,
feel free to get in touch. Our mentality is to share, not hide, our experience
and to learn together on this exciting journey.