Using Google Earth Engine to Map Complex Shade-Grown Coffee Landscapes in Northern Nicaragua

Lisa C. Kelley, Lincoln Pitcher, Chris Bacon
2018 Remote Sensing  
Shade-grown coffee (shade coffee) is an important component of the forested tropics, and is essential to the conservation of forest-dependent biodiversity. Despite its importance, shade coffee is challenging to map using remotely sensed data given its spectral similarity to forested land. This paper addresses this challenge in three districts of northern Nicaragua, leveraging cloud-based computing techniques within Google Earth Engine (GEE) to integrate multi-seasonal Landsat 8 satellite imagery (30 m) and physiographic variables (temperature, topography, and precipitation). Applying a random forest machine learning algorithm using reference data from two field surveys produced an overall accuracy of 90.5% across ten classes of land cover, with user's and producer's accuracies of 82.1% and 80.0%, respectively, for shade-grown coffee. Comparing classification accuracies obtained from five datasets exploring different combinations of non-seasonal and seasonal spectral data as well as physiographic data also revealed a trend of increasing accuracy when seasonal data were included in the model and a significant improvement (7.8-20.1%) when topographical data were integrated with spectral data. These results are significant in piloting an open-access and user-friendly approach to mapping heterogeneous shade coffee landscapes with high overall accuracy, even in locations with persistent cloud cover.
Remote Sens. 2018, 10, 952 2 of 19
coffee and young and mature woodland or forest, limiting an understanding of coffee production in relation to other forested and agricultural land covers (e.g., 37.5-58.7% in an early effort by the authors of [10]; see also ). Most shade coffee is also produced in plots of land smaller than the resolution of openly accessible satellite data (e.g., MODIS) under a canopy of forest trees [12]. The resulting rustic shade coffee systems thus closely resemble adjacent patches of forested land in visible/near-infrared wavelengths [15] [16] [17]. Topographical effects (e.g., shadows) and chronic cloud cover compound mapping challenges in the mountainous regions where most coffee is grown [11, 13]. A number of recent studies have explored how refined classification algorithms and high-resolution passive spaceborne and airborne data might be used to improve smallholder coffee classification.
In one study, QuickBird imagery (0.6-2.4 m spatial resolution) was combined with a neural network classifier to predict the distribution of shade coffee trees with 96.9% accuracy in New Caledonia [12]. Another study integrated aerial orthoimagery (0.25 m spatial resolution) with existing maps of forest cover, elevation data, and expert knowledge in a Bayesian predictive model to classify coffee with 87% overall accuracy in Rwanda [18]. Analysts have also explored classification using open-access Landsat data (30 m) [19], often integrating Landsat data with physiographic variables. The authors of [11], for example, showed how data on land surface temperature from Landsat ETM+ thermal imagery can help detect subtle differences in leaf and canopy density, morphology, biomass, species composition, and canopy water status in shade coffee lands (see also Reference [20]). By integrating these data with visible and infrared bands and a post-classification stratification model using elevation and precipitation data, these authors achieved producer's and user's accuracies of 91.8% and 61.1%, respectively, for shade coffee in Costa Rica's highlands. Elsewhere, the role of multi-temporal Landsat data in accurately mapping shade coffee has been explored [13, 17]. By integrating three Landsat ETM+ scenes from different seasons, for example, one study from El Salvador differentiated between coffee plantations and non-mangrove forest with 81.6% accuracy [13]. Outside the coffee sector, multi-temporal or seasonal spectral features or indices are now commonly used to identify otherwise difficult-to-classify agricultural and forested lands [21] [22] [23] [24] [25]. Such approaches can exploit information on intra-annual variation in dominant crops or vegetative covers [25] [26] [27].
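The overall, user's, and producer's accuracies quoted above are all read off a confusion matrix of reference classes against mapped classes. The following is a minimal sketch of that arithmetic, not code from the paper: the function name `accuracy_metrics`, the three class labels, and every count in `matrix` are hypothetical values chosen only for illustration. Producer's accuracy divides the correctly mapped samples by the reference total for a class (it measures omission), while user's accuracy divides them by the mapped total for that class (it measures commission).

```python
def accuracy_metrics(matrix, labels):
    """Return (overall, producers, users) accuracies.

    matrix[i][j] is the number of reference samples of class i
    that the map assigned to class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total

    producers, users = {}, {}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])                     # row sum
        mapped_total = sum(matrix[r][i] for r in range(n))   # column sum
        producers[label] = (
            matrix[i][i] / reference_total if reference_total else float("nan")
        )
        users[label] = (
            matrix[i][i] / mapped_total if mapped_total else float("nan")
        )
    return overall, producers, users


# Hypothetical counts for illustration only; these are not data from the paper.
labels = ["shade coffee", "forest", "pasture"]
matrix = [
    [40, 8, 2],   # reference shade coffee
    [4, 50, 1],   # reference forest
    [1, 2, 42],   # reference pasture
]

overall, producers, users = accuracy_metrics(matrix, labels)
print(f"overall accuracy: {overall:.3f}")
for label in labels:
    print(f"{label}: producer's {producers[label]:.3f}, user's {users[label]:.3f}")
```

In this toy matrix, shade coffee confused with forest lowers the producer's accuracy for coffee, which is the kind of confusion the studies above report.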
Multi-temporal spectral datasets have been particularly effective in achieving high classification accuracy when satellite imagery is integrated with physiographic or textural variables [22, 28, 29]. The authors of [28], for example, obtained an 86% overall accuracy across 14 land-cover classes in a heterogeneous landscape in southern Spain by combining seasonal imagery with digital terrain measures and textural data. The authors of [22] mapped larch plantations in China with 91.9% accuracy by combining multi-seasonal Landsat 8 imagery and bivariate textural features in a random forest (RF) classification model. Historically, the cost of satellite imagery precluded such approaches, particularly discouraging their application across broad spatial scales and/or in regions marked by high cloud coverage [30, 31]. Since 2008, however, open access to the entire Landsat imagery archive has resolved this challenge, in effect creating the world's largest openly accessible library of geospatial information at 30-m resolution [30]. The development of Google Earth Engine (GEE) and other cloud-based computing techniques, in turn, has allowed analysts to process and analyze dense stacks of satellite imagery on the fly. GEE, for example, hosts the entire Landsat archive, provides tools and an application programming interface for summoning, processing, and mosaicking this imagery, and runs all analyses in parallel across many machines in Google's cloud-based processing platform [32]. As prior work has illustrated, these developments enable analyses at high computational efficiency even when relying on multi-temporal image mosaics [23, 33, 34]. The authors of [33], for example, used >650,000 Landsat scenes and >1 million central processing unit (CPU) hours to produce annual maps of tree cover gain and loss, performing these computations in just several days.
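To make the idea of a multi-temporal mosaic concrete, the sketch below builds a seasonal composite for one band by taking, at each pixel, the median of the cloud-free observations. This is an assumption-laden illustration: the text does not say which compositing rule was used, the per-pixel median is simply one common choice, and the code is plain Python rather than the GEE API. The function name `seasonal_median_composite` and the reflectance values are hypothetical.

```python
from statistics import median


def seasonal_median_composite(scenes, cloud_masks):
    """Per-pixel median over the cloud-free observations of one band.

    scenes:      list of 2-D lists (rows x cols) of reflectance,
                 one per acquisition date within the season.
    cloud_masks: list of 2-D lists of the same shape,
                 True where the pixel is flagged as cloudy.
    Returns a 2-D list; None marks pixels with no clear observation.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            clear = [
                scene[r][c]
                for scene, mask in zip(scenes, cloud_masks)
                if not mask[r][c]
            ]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite


# Three hypothetical acquisitions of a 1 x 3 pixel strip (illustrative values).
scenes = [
    [[0.30, 0.10, 0.85]],
    [[0.32, 0.90, 0.88]],
    [[0.34, 0.12, 0.91]],
]
cloud_masks = [
    [[False, False, True]],
    [[False, True, True]],
    [[False, False, True]],
]

composite = seasonal_median_composite(scenes, cloud_masks)
print(composite)
```

Repeating this per season and stacking the results gives one feature per band per season, which is the kind of input a multi-seasonal classifier consumes. Pixels that are cloudy on every date stay empty, which is why dense image stacks matter in persistently cloudy regions.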
New cloud-based computational capacities also enable the efficient use of "ensemble" classifiers such as RF, which average across hundreds of individual decision trees to ensure unbiased land-cover classifications [22, 35].
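The ensemble idea can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the RF implementation used in GEE or in the paper: each "tree" is a one-split stump fitted to a bootstrap sample on one randomly chosen feature, and the ensemble prediction is the majority vote, which is how classification forests usually combine their trees. All names (`fit_stump`, `fit_forest`, `predict`), the two features, and the training values are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[f] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[f] <= t]
        right = [lab for row, lab in zip(X, y) if row[f] > t]
        if not left or not right:
            continue
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(lab != l_lab for lab in left)
        errors += sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # the feature is constant in this bootstrap sample
        lab = majority(y)
        return (f, float("inf"), lab, lab)
    return (f, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=101, seed=0):
    """Fit n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Majority vote across all stumps."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)


# Hypothetical training pixels: [seasonal index difference, elevation in m].
X = [[0.10, 900], [0.12, 950], [0.09, 1000],
     [0.30, 500], [0.28, 450], [0.33, 520]]
y = ["coffee", "coffee", "coffee", "forest", "forest", "forest"]

forest = fit_forest(X, y)
print(predict(forest, [0.05, 1100]))
print(predict(forest, [0.35, 400]))
```

Because every stump sees a different bootstrap sample and a different feature, individual stumps can be wrong while the vote remains stable, which is the property that makes ensembles attractive for noisy land-cover data.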
doi:10.3390/rs10060952 fatcat:5hbuuy6mwvcr7oyi7ut4pnshne