Function Modeling Improves the Efficiency of Spatial Modeling Using Big Data from Remote Sensing

Big Data and Cognitive Computing, 2017
Spatial modeling is an integral component of most geographic information systems (GISs). However, conventional GIS modeling techniques can require substantial processing time and storage space and offer limited statistical and machine learning functionality. To address these limitations, many practitioners have parallelized spatial models using multiple coding libraries and applied those models in multiprocessor environments. Few, however, have recognized the inefficiencies associated with the spatial modeling framework used to implement such analyses. In this paper, we identify a common inefficiency in processing spatial models and demonstrate a novel approach to address it using lazy evaluation techniques. Furthermore, we introduce a new coding library that integrates the Accord.NET and ALGLIB numeric libraries and uses lazy evaluation to facilitate a wide range of spatial, statistical, and machine learning procedures within a new GIS modeling framework called function modeling. Results from simulations show a 64.3% reduction in processing time and an 84.4% reduction in storage space attributable to function modeling. In an applied case study, this translated to a reduction in processing time from 2247 h to 488 h and a reduction in storage space from 152 terabytes to 913 gigabytes.

To address some of these limitations, advances in both GIS and statistical software have focused on integrating functionality through coding libraries that extend the capabilities of a given software package. Common examples include RSGISLib [12], GDAL [13], SciPy [14], ALGLIB [15], and Accord.NET [16]. At the same time, new processing techniques have been developed to address common challenges with big data and to more fully leverage improvements in computer hardware and software configurations. For example, parallel processing libraries such as OpenCL [17] and CUDA [18] are stable and actively used within the GIS community [19,20]. Similarly, frameworks such as Hadoop [21] are being used to facilitate cloud computing, and they offer improvements in big data processing by partitioning processes across multiple CPUs within a large server farm, thereby improving user access, affordability, reliability, and data sharing [22,23].

While the integration, functionality, and capabilities of GIS and statistical software continue to expand, the underlying framework for how procedures and methods are used within spatial models in GIS tends to remain the same, which can impose artificial limitations on the type and scale of analyses that can be performed. Spatial models are typically composed of multiple sequential operations. Each operation reads data from a given data set, transforms the data, and then creates a new data set (Figure 1). In programming, this is called eager evaluation (or strict semantics) and is characterized by a flow that evaluates all expressions (i.e., arguments) regardless of whether the values of those expressions are needed to generate the final results [24]. Though eager evaluation is intuitive and used by many traditional programming languages, creating and reading new data sets at each step of a GIS model comes at a high processing and storage cost, and it is not viable for large-area analysis outside of a supercomputing environment, which is not currently available to the vast majority of GIS users.
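The library described in the paper targets .NET (integrating Accord.NET and ALGLIB), but the contrast between the two evaluation strategies can be sketched in a few lines of Python. The sketch below is illustrative only, and every function and variable name in it is hypothetical: the eager version materializes a full-size intermediate array at each step (a conventional GIS model would additionally write each intermediate to disk as a new data set), while the lazy version composes the same operations into callables, so nothing is read, computed, or stored until a specific output window is requested.

```python
import numpy as np

# Eager evaluation: every operation produces a full-size intermediate result.
# In a conventional GIS model, each intermediate would also be written to disk
# as a new data set before the next operation reads it back.
def eager_model(red, nir):
    diff = nir - red             # full-size intermediate no. 1
    total = nir + red            # full-size intermediate no. 2
    ndvi = diff / total          # full-size intermediate no. 3
    return ndvi > 0.5            # final output

# Lazy evaluation (the idea behind function modeling): a "function raster" is a
# callable that, given a window (row/column slices), returns values for just
# that window. Operations compose functions instead of creating data sets.
def band(stack, index):
    return lambda win: stack[index][win]

def subtract(a, b):
    return lambda win: a(win) - b(win)

def add(a, b):
    return lambda win: a(win) + b(win)

def divide(a, b):
    return lambda win: a(win) / b(win)

def greater(a, threshold):
    return lambda win: a(win) > threshold

# Stand-in for a two-band raster; a real model would wrap bands on disk.
stack = np.random.rand(2, 1024, 1024)

# Eager: the whole chain runs immediately, intermediates and all.
full_result = eager_model(stack[0], stack[1])

# Lazy: building the model performs no computation and allocates nothing.
red, nir = band(stack, 0), band(stack, 1)
ndvi = divide(subtract(nir, red), add(nir, red))
mask = greater(ndvi, 0.5)

# Evaluation is deferred until output is requested, one block at a time.
block = mask((slice(0, 256), slice(0, 256)))
```

In the authors' framework, the same composition idea is applied to raster operations over arbitrarily large extents, so intermediate data sets are never created or read back; this deferral is the mechanism behind the processing time and storage savings reported above.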
doi:10.3390/bdcc1010003