Modern agriculture is increasingly merging with technological advancements, especially in the area of remote sensing. With climate conditions becoming more unpredictable, there is a clear need for resilient agricultural monitoring systems. Manual checks of crop damage or specific agricultural activities are tedious and inefficient, prompting both farmers and crop insurance companies to lean towards automated detection systems. These not only aid in identifying crop failures early but also assist in tracking different growth stages. At the same time, agricultural paying agencies are looking for automated ways to monitor field activities, such as ploughing and harvesting, in order to increase the cost-effectiveness of their work. A promising solution to these varied needs is the development of synthetic vegetation indices, notably the synthetic NDVI (Normalized Difference Vegetation Index).
Crafting reliable generative deep learning models comes with its challenges. One key challenge is integrating SAR (Synthetic Aperture Radar) and optical (RGB and NIR) remote sensing data, especially from satellites like Sentinel-1 and Sentinel-2 of the European Union and the European Space Agency (ESA). While this data is freely available, using it effectively requires precise methodologies. The performance of these GAN (Generative Adversarial Network) models largely depends on selecting the right input features. For SAR, the choice between 6-day and 12-day temporal baselines for interferometric coherence is crucial. Here, KappaZeta's expertise comes into play, making the complex task of processing SAR data much more manageable. Historically, KappaZeta has developed models for both coherence baselines. The 6-day models have shown good results in capturing swift changes in crop growth, especially for non-cereal crops, while the 12-day models were better suited for predicting events like crop decline at the end of the season.
The existing set of generative models would have been sufficient had Sentinel-1B not suffered a mission-ending electrical anomaly in December 2021. This incident made the 6-day coherence model unfeasible, at least until the launch of Sentinel-1C, and raised a significant question: could the existing 12-day model be adapted and improved to function as an all-encompassing synthetic NDVI generative model, bridging the gap between the two previous models? This was the starting point of my expedition into the world of synthetic vegetation indices and everything else this captivating field had to offer.
Generative Adversarial Networks (GANs), used here to generate artificial images of synthetic vegetation index values, are often perceived as rather complex neural networks to design. A primary source of this complexity stems from the dual structure involving two intertwined networks - a generator and a discriminator - that are trained simultaneously, leading to intricate training dynamics. Reaching an equilibrium where neither the generator nor the discriminator outperforms the other often necessitates meticulous tuning, and that balance can be challenging to maintain. GANs are also notorious among machine learning engineers for the inherent difficulty of assessing the quality and diversity of generated outputs. Such assessments are typically non-trivial, often demanding subjective or indirect measures, which complicates quantitative performance evaluation. To demystify this complexity for myself, I resolved to dissect the problem from three distinct perspectives: model architecture, data manipulation, and performance evaluation. The subsequent sections provide a brief recap of my explorations and some intriguing findings within these domains.
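To make the generator-discriminator interplay concrete, below is a deliberately tiny, pure-NumPy GAN fitted to a 1-D Gaussian - a didactic sketch of the alternating training loop, not the image-to-image architecture used in these experiments. All names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Real data: samples from N(2.0, 0.5); the generator must learn to map
# standard-normal noise onto this distribution.
def real_batch(n):
    return rng.normal(2.0, 0.5, size=(n, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator G(z) = z * g_w + g_b and discriminator D(x) = sigmoid(x * d_v + d_c):
# deliberately tiny so the alternating updates stay legible.
g_w, g_b = 1.0, 0.0
d_v, d_c = 0.1, 0.0

lr, batch = 0.05, 64
for step in range(500):
    # Discriminator step: binary cross-entropy pushing D(real) -> 1, D(fake) -> 0.
    z = rng.normal(size=(batch, 1))
    fake = z * g_w + g_b
    real = real_batch(batch)
    p_real = sigmoid(real * d_v + d_c)
    p_fake = sigmoid(fake * d_v + d_c)
    d_v -= lr * ((p_real - 1) * real + p_fake * fake).mean()
    d_c -= lr * ((p_real - 1) + p_fake).mean()

    # Generator step: non-saturating loss -log D(fake), i.e. fool the discriminator.
    z = rng.normal(size=(batch, 1))
    fake = z * g_w + g_b
    p_fake = sigmoid(fake * d_v + d_c)
    grad_fake = -(1.0 - p_fake) * d_v   # dLoss/dfake
    g_w -= lr * (grad_fake * z).mean()
    g_b -= lr * grad_fake.mean()

# After training, generated samples should drift toward the real mean of 2.0.
gen_mean = (rng.normal(size=(1000, 1)) * g_w + g_b).mean()
```

Even this toy shows the delicate dynamics mentioned above: a learning rate that is too high for one of the two players is enough to make the pair oscillate instead of settling.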
We always aim for our models to generate precise and accurate outputs, and this often requires delving into complex neural networks that can learn intricate patterns beyond human comprehension. With GANs, the myriad of model architecture design possibilities presents a challenge, and it's hard to predict which designs will be effective and which might end up being counterproductive. When dealing with geospatial remote sensing data, especially in the context of agricultural parcels and making estimations about their conditions, it's crucial for the model to produce accurate results while maintaining good spatial awareness and the ability to generate visually compelling, believable outputs. However, the task of measuring and assessing these attributes is dauntingly complex, not to mention the difficulty of finding solutions that actually attain them.
In networks that employ convolutional and deconvolutional layers, as GANs typically do, one strategy to at least theoretically enhance spatial awareness is to enlarge the kernel sizes within those layers. Not only can this adjustment improve spatial sensitivity, but it can also mitigate the visual checkerboard artefacts commonly observed in outputs from models that utilise deconvolutional layers. It is rarely the case that a theoretically posited solution translates seamlessly into practical efficacy, yet that's precisely what happened with the increased kernel size in this context (Fig. 1). However, seemingly straightforward solutions like this seldom come without trade-offs. Enlarging the kernel size can significantly inflate the model's memory footprint and elongate training durations - considerations of utmost importance when resources are constrained.
Figure 1. From left to right:
Original Image: The initial, untouched image.
Initial Model Output: Notice regular pattern distortions and lack of spatial detail.
Enhanced Model Output: Result from an improved model with adjusted kernel size and updated interpolation techniques. Notice the clearer and more spatially accurate representation.
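The checkerboard effect itself has a simple mechanical explanation: when a deconvolution's kernel size is not divisible by its stride, output pixels are touched by an uneven number of kernel applications. A small 1-D NumPy sketch (my illustration, not KappaZeta's code) makes the uneven overlap visible:

```python
import numpy as np

def transposed_conv_coverage(n_in, kernel, stride):
    """Count how many kernel applications touch each output position
    of a 1-D transposed convolution (no padding)."""
    n_out = stride * (n_in - 1) + kernel
    coverage = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        coverage[i * stride : i * stride + kernel] += 1
    return coverage

# Kernel not divisible by stride -> alternating overlap -> checkerboard pattern.
print(transposed_conv_coverage(5, kernel=3, stride=2))  # [1 1 2 1 2 1 2 1 2 1 1]
# Kernel divisible by stride -> uniform interior coverage.
print(transposed_conv_coverage(5, kernel=4, stride=2))  # [1 1 2 2 2 2 2 2 2 2 1 1]
```

The alternating 2-1-2-1 coverage in the first case is exactly the periodic intensity pattern that shows up as a checkerboard in 2-D, which is one reason a larger (stride-divisible) kernel can suppress it.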
A well-suited loss function often holds the key to achieving outstanding results and efficiently optimising a model. However, discovering the right one is seldom straightforward. In the realm of generative models, it's beneficial to incorporate loss functions that evaluate image-specific attributes such as luminance, contrast, and structure. These attributes help in gauging the quality of the output image in relation to the actual target image. When integrated with traditional loss functions like L1 or L2, the results can potentially be further enhanced. The application of an image-centric loss function notably improved predictions of NDVI values in these experiments, registering enhancements both metrically and visually. Yet, if our primary concern revolves around the agricultural parcels - which often represent just a small fraction of the entire image - why prioritise the whole image? Integrating masking techniques within the loss function enables the model to zero in on these specific agricultural plots, sidelining extraneous areas. This approach is particularly valuable considering that changes in index values within the parcels exhibit different temporal patterns compared to their surroundings. By masking these surrounding areas, the model homes in on the parcels with greater accuracy. However, it's a double-edged sword: while predictions for the parcels improve, those expecting a visually congruent comparison between the model's output and the original image might be left wanting. Models trained with such a specialised loss function tend to falter when predicting areas outside of the parcels.
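As an illustration of the masking idea, here is a minimal sketch of a combined loss: pixelwise L1 on parcel pixels plus simple luminance and contrast terms (mean and standard deviation matching) over the masked region. The weighting and the exact image-centric terms used in the experiments are not public, so treat every choice below as an assumption of mine:

```python
import numpy as np

def masked_combined_loss(pred, target, mask, alpha=0.5):
    """L1 over parcel pixels plus SSIM-inspired luminance/contrast terms.
    pred, target: 2-D NDVI arrays; mask: boolean array marking parcels."""
    p, t = pred[mask], target[mask]
    l1 = np.abs(p - t).mean()                 # pixelwise error on parcels only
    luminance = abs(p.mean() - t.mean())      # brightness mismatch
    contrast = abs(p.std() - t.std())         # variability mismatch
    return l1 + alpha * (luminance + contrast)

# Toy usage with a hypothetical parcel footprint in the image centre.
pred = np.random.default_rng(0).random((64, 64))
target = pred + 0.05                          # constant bias on every pixel
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
loss = masked_combined_loss(pred, target, mask)
```

Because everything outside `mask` is ignored, a model trained against such a loss has no incentive to render the surroundings plausibly, which is precisely the trade-off described above.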
If there ever was a god of machine learning, it would probably declare: 'Data is at the core of it all!' Truly grasping the nuances of data, which can often be more complex than anticipated, is the foundation of any successful model that is built upon it. Geospatial remote sensing data brings its own set of challenges, some of which one would never even think about when working with classical tabular datasets. But after days and weeks of agony it becomes evident that these challenges can also be turned into assets. Considering satellite pass direction during the model training process is a prime example of this phenomenon. Satellites, as they orbit the Earth, can capture the same location from different directions across their various passes. Depending on whether the satellite is on an ascending or descending pass when taking an image, there can be slight variations in the angular patterns of the output. These minute differences might escape the human eye, but neural networks excel at detecting them. Remarkably, integrating information about the input image's direction can significantly boost a model's accuracy and computational efficiency as well. The key takeaway here is that sometimes, challenges can be cleverly repurposed into strengths.
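One straightforward way to feed pass direction to a network is an extra constant-valued input channel. The helper below is hypothetical - the +1/-1 coding, the channel order and the function name are mine, not the actual pipeline's:

```python
import numpy as np

def add_pass_direction_channel(sar_stack, ascending):
    """Append a constant channel encoding the satellite pass direction
    (+1 ascending, -1 descending) to a (channels, height, width) stack,
    so the network can condition on acquisition geometry."""
    _, h, w = sar_stack.shape
    value = 1.0 if ascending else -1.0
    direction = np.full((1, h, w), value, dtype=sar_stack.dtype)
    return np.concatenate([sar_stack, direction], axis=0)

# Hypothetical 4-channel SAR feature stack (e.g. VV, VH, coherence, DEM).
x = np.zeros((4, 128, 128), dtype=np.float32)
x_asc = add_pass_direction_channel(x, ascending=True)
```

A single scalar channel like this is cheap, yet it lets the network separate the slightly different angular patterns of ascending and descending acquisitions instead of averaging over them.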
In standard image classification problems, evaluating neural networks' performance is usually rather straightforward with metrics like accuracy or the F1-score. However, when the focus shifts to continuous scales, typical of different vegetation indices for example, evaluation nuances arise. Common metrics like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) become essential and often satisfy the needs, but in remote sensing, there is more than just numeric accuracy to consider. Assessing the visual "goodness" and spatial integrity of the output images becomes paramount. This involves looking at the texture and smoothness of the image, its consistency with adjacent areas, and its boundary delineations, among other qualities. Beyond the visual and spatial aspects, computational efficiency is a crucial consideration as well - the time taken for the model to process, its memory usage and activation patterns are all indicators of its efficiency. In essence, remote sensing brings forth multifaceted challenges that require a combination of quantitative metrics, qualitative assessments and often rather non-standard evaluation methods.
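For completeness, the two workhorse metrics, restricted to parcel pixels via an optional boolean mask - a minimal sketch with function names of my own choosing:

```python
import numpy as np

def mae(pred, target, mask=None):
    """Mean Absolute Error, optionally over masked (e.g. parcel) pixels only."""
    err = np.abs(pred - target)
    return err[mask].mean() if mask is not None else err.mean()

def rmse(pred, target, mask=None):
    """Root Mean Squared Error, optionally over masked pixels only."""
    err = (pred - target) ** 2
    if mask is not None:
        err = err[mask]
    return np.sqrt(err.mean())
```

RMSE penalises large single-pixel misses more heavily than MAE, so reporting both gives a first hint of whether a model's errors are spread thinly or concentrated in a few badly predicted spots.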
Fundamental probability theory asserts that the probability of predicting an exact value from a continuous distribution is zero. This insight underscores the notion that while we should strive to minimise the error in predicting each pixel's NDVI value, absolute accuracy is simply unattainable. Fortunately, in many contexts, exactitude isn't even necessary. Often, synthetic NDVI serves as an alert system, signalling anomalies such as suboptimal wheat growth or a grassland not being mowed by a certain date. This leads to the idea of grouping pixels according to their index values or, going further, according to how these values have changed compared to older images of the same location, much like a traffic-light system indicates status. This grouping makes it simpler to compare a model's results with the actual image, helping differentiate between high-performing and subpar models. The objective, then, is to ensure that the class distributions in the two images align as closely as possible (Fig. 2). Furthermore, if we want to emphasise the importance of minimising significant errors, it's beneficial to analyse the proportion of pixels with substantial disagreements between the model's output and target image. This rather pragmatic simplification laid the groundwork for the Disagreement Index (DI), which emerged as a pivotal visual output evaluation metric in various experiments and helped to assess the model's ability to predict the direction of change (Fig. 3). Pursuing an all-encompassing solution is often elusive, especially in the beginning stages. Thus, strategic simplifications, despite their imperfections, can pave the path toward ultimate objectives.
Figure 2. Upper row: NDVI value changes compared to a historical image (not shown) for both the actual image (left) and model generated image (right).
Bottom row: Agreement map between the images in the upper row.
Figure 3. Agricultural parcels on the axis of DI and share of big disagreement pixels. Black dashed line represents the parcel ‘ideal zone’.
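The exact DI formula is not spelled out here, but the class binning and the share of large disagreements can be sketched as follows. The traffic-light bin edges and the "substantial disagreement" threshold are assumptions of mine, chosen only to illustrate the mechanics:

```python
import numpy as np

# Hypothetical thresholds on NDVI change, like a traffic-light scheme:
# class 0 = strong decline, 1 = mild decline, 2 = mild growth, 3 = strong growth.
BINS = np.array([-0.2, 0.0, 0.2])

def to_classes(ndvi_change):
    """Bin continuous NDVI-change values into discrete classes 0..3."""
    return np.digitize(ndvi_change, BINS)

def disagreement(pred_change, target_change, big=2):
    """Share of pixels whose class differs at all, and whose class
    differs by at least `big` steps ("substantial" disagreement)."""
    diff = np.abs(to_classes(pred_change) - to_classes(target_change))
    return (diff > 0).mean(), (diff >= big).mean()

# Example: three parcel pixels, predicted vs observed NDVI change.
share_any, share_big = disagreement(
    np.array([0.1, -0.3, 0.05]), np.array([0.12, 0.3, -0.05])
)
```

Note how the first pixel counts as a perfect prediction despite a small numeric error, while the second, where the model predicts decline but the field actually greened up, is flagged as a substantial disagreement - exactly the direction-of-change sensitivity described above.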
Topics described in the previous paragraphs make up just a small share of the experiments carried out during my two-month stay at KappaZeta, and an even smaller share of all the possible aspects of the synthetic NDVI models left to test and develop in the future. One thing is for certain - some talented and dedicated people are making sure that KappaZeta will be the leading provider of reliable synthetic vegetation index services in the forthcoming years!