Algorithms for Managing, Querying and Processing Big Data in Cloud Environments

Alfredo Cuzzocrea
Algorithms 2016, 9, 13
Big data (e.g., [1-3]) has become one of the most challenging research topics in recent years. Big data is everywhere, from social networks to web advertising, from sensor and stream systems to bioinformatics, from graph management tools to smart cities, and so forth. Cloud computing environments (e.g., [4-6]) represent the "natural" context for such data, as they embody several emerging trends, at both the research and the technological level, which comprise … high reliability, high availability, transparency, abstraction, virtualization, and so forth. At the convergence of these trends, managing, querying and processing big data in Cloud environments, which has recently received a great deal of attention from the research community (e.g., [7-9]), plays a leading role, and algorithmic approaches to these challenges look very promising. These approaches come from a rich variety of multi-disciplinary areas, ranging from mathematical models to approximation models and from resource-constrained paradigms to memory-bounded methods. Furthermore, algorithms that manage big data according to a "systematic" view of the problem are gaining momentum; algorithms for efficiently managing MapReduce tasks over Clouds are a clear example of this line of work.
Inspired by these exciting research challenges, this special issue of Algorithms, "Algorithms for Managing, Querying and Processing Big Data in Cloud Environments", focuses on the theory and practice, the design and analysis, and the tuning and experimental evaluation of algorithms for managing big data in Cloud environments. The aim is to provide a significant milestone for this topic, relevant to theory and practice alike, as well as to the applications and systems founded on such algorithms. The special issue contains four papers, which were accepted after two rigorous review rounds. In the following, we provide an overview of these papers.
The first paper [10], entitled "Multiobjective Cloud Particle Optimization Algorithm Based on Decomposition", by Li et al., builds on the multi-objective evolutionary algorithm based on decomposition (MOEA/D), which, as the authors rightly note, has received attention from many researchers in recent years. The paper presents a novel multi-objective algorithm based on decomposition and the cloud model, called multi-objective decomposition evolutionary algorithm based on Cloud Particle Differential Evolution (MOEA/D-CPDE). In each generation, the best solution found so far acts as a seed from which a cloud generator produces two individuals; a new individual is then obtained by updating the current individual with the position-vector difference of these two individuals. The algorithm is evaluated on 16 well-known multi-objective problems, and the experimental results indicate that MOEA/D-CPDE is competitive.
The second paper [11], entitled "Implementation of a Parallel Algorithm Based on a Spark Cloud Computing Platform", by Wang et al., proposes to parallelize the well-known MAX-MIN Ant System (MMAS) algorithm on a Spark Cloud computing platform in order to solve the classical, NP-hard Traveling Salesman Problem (TSP). As the authors point out, even inherently parallel algorithms such as the ant colony algorithm take a long time to solve large-scale problems. In their solution, MMAS is combined with Spark MapReduce so that path building and pheromone operations are executed on a distributed computing Cluster. In addition, to improve solution quality, the 2-opt local optimization strategy is incorporated into MMAS. Experimental results show that Spark substantially accelerates the ant colony algorithm when the number of cities in the TSP instance or the number of ants is relatively large.
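The variation step summarized for the first paper [10] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the cloud generator is assumed to be the forward normal cloud generator of cloud-model theory (expectation set to the seed, entropy `en`, hyper-entropy `he`), the update rule is assumed to be `current + F * (c1 - c2)` with a hypothetical scale factor `f_scale`, and the decomposition side of MOEA/D is reduced to a Tchebycheff scalarization with a single weight vector. ZDT1 serves only as a test problem, and all function names are hypothetical.

```python
import math
import random


def zdt1(x):
    """ZDT1 benchmark (two objectives, both minimized); used here only as a test problem."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    return [f1, g * (1.0 - math.sqrt(f1 / g))]


def tchebycheff(objs, weights, ideal):
    """Tchebycheff aggregation: scalarizes one MOEA/D subproblem for a given weight vector."""
    return max(w * abs(f - z) for f, w, z in zip(objs, weights, ideal))


def cloud_drop(ex, en, he, rng):
    """Forward normal cloud generator (assumed form): one 'drop' around expectation ex."""
    en_prime = rng.gauss(en, he)          # entropy perturbed by the hyper-entropy
    return rng.gauss(ex, abs(en_prime))   # drop sampled around the expectation


def cloud_particle_de(current, seed, en, he, f_scale, bounds, rng):
    """Hypothetical CPDE-style variation: draw two cloud individuals around the seed,
    then move the current individual by f_scale times their difference."""
    c1 = [cloud_drop(s, en, he, rng) for s in seed]
    c2 = [cloud_drop(s, en, he, rng) for s in seed]
    child = []
    for x, a, b, (lo, hi) in zip(current, c1, c2, bounds):
        v = x + f_scale * (a - b)
        child.append(min(max(v, lo), hi))  # keep the child inside the box bounds
    return child


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 5
    bounds = [(0.0, 1.0)] * dim
    pop = [[rng.random() for _ in range(dim)] for _ in range(20)]
    objs = [zdt1(x) for x in pop]
    ideal = [min(o[i] for o in objs) for i in range(2)]
    weights = [0.5, 0.5]
    # Seed for this subproblem: best solution so far under its Tchebycheff value.
    seed = min(pop, key=lambda x: tchebycheff(zdt1(x), weights, ideal))
    child = cloud_particle_de(pop[0], seed, en=0.1, he=0.01, f_scale=0.5, bounds=bounds, rng=rng)
    # MOEA/D-style replacement: keep the child only if it improves the subproblem.
    if tchebycheff(zdt1(child), weights, ideal) < tchebycheff(zdt1(pop[0]), weights, ideal):
        pop[0] = child
    print(child)
```

Setting `en` and `he` to zero collapses both cloud drops onto the seed, so the child equals the current individual; larger values widen the search around the best solution.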
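The parallel structure described for the second paper [11] can be illustrated with a compact, single-machine MMAS sketch. It is an illustration under assumptions, not Wang et al.'s implementation: independent ants build tours in a "map" stage (here Python's built-in `map`; in PySpark the analogous step would be `RDD.map` over jobs distributed with `SparkContext.parallelize`), the best ant is selected in a "reduce" stage (`functools.reduce`, analogous to `RDD.reduce`), and the driver then applies the MAX-MIN pheromone update. The parameter values, the global-best deposit rule and the simplified bound `tau_min = tau_max / (2n)` are illustrative choices; every constructed tour is refined with 2-opt.

```python
import math
import random
from functools import reduce


def tour_length(tour, dist):
    """Length of a closed tour (returns to the start city)."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, dist):
    """2-opt local search: reverse a segment whenever that strictly shortens the tour."""
    best = list(tour)
    n = len(best)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = best[i], best[i + 1]
                c, d = best[j], best[(j + 1) % n]
                if a == d:  # the two edges share a city: no valid move
                    continue
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -1e-12:
                    best[i + 1:j + 1] = best[i + 1:j + 1][::-1]
                    improved = True
    return best


def build_tour(job):
    """Map step (one ant): random-proportional tour construction, then 2-opt."""
    ant_seed, tau, eta, alpha, beta, dist = job
    rng = random.Random(ant_seed)
    n = len(dist)
    tour = [rng.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        cand = sorted(unvisited)
        weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
        nxt = rng.choices(cand, weights=weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    tour = two_opt(tour, dist)
    return tour, tour_length(tour, dist)


def update_pheromone(tau, best_tour, best_len, rho):
    """Driver-side MMAS update: evaporate, reinforce the best tour, clamp to [tau_min, tau_max]."""
    n = len(tau)
    tau_max = 1.0 / (rho * best_len)
    tau_min = tau_max / (2.0 * n)  # simplified bound (assumption)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    for k in range(n):
        a, b = best_tour[k], best_tour[(k + 1) % n]
        tau[a][b] += 1.0 / best_len
        tau[b][a] += 1.0 / best_len
    for i in range(n):
        for j in range(n):
            tau[i][j] = min(max(tau[i][j], tau_min), tau_max)


def mmas(points, n_ants=8, n_iter=15, alpha=1.0, beta=3.0, rho=0.1, seed=0):
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for it in range(n_iter):
        jobs = [(seed + it * n_ants + k, tau, eta, alpha, beta, dist) for k in range(n_ants)]
        results = list(map(build_tour, jobs))  # "map": ants are independent
        it_tour, it_len = reduce(lambda x, y: x if x[1] <= y[1] else y, results)  # "reduce": best ant
        if it_len < best_len:
            best_tour, best_len = it_tour, it_len
        update_pheromone(tau, best_tour, best_len, rho)
    return best_tour, best_len


if __name__ == "__main__":
    cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
    tour, length = mmas(cities)
    print(tour, round(length, 3))
```

On cities placed on a circle, any tour that is locally optimal for 2-opt has no crossing edges and therefore follows the circle, so the returned length should match the octagon perimeter 16 sin(π/8) ≈ 6.123.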
The third paper [12], entitled "A Data Analytic Algorithm for Managing, Querying, and Processing Uncertain Big Data in Cloud Environments", by Jiang et al., considers the problem of mining big data to support the discovery of useful information and knowledge. In this context, the authors propose a data analytic algorithm for managing, querying and processing transactions of uncertain big data in Cloud environments. The framework built on this algorithm lets users query these big data by specifying constraints that express their interests, and processes the user-specified constraints to discover useful information and knowledge. Because each item in every transaction of these uncertain big data is associated with an existential probability expressing the likelihood that the item is present in that transaction, the computation can be intensive. To cope with this issue, the algorithm uses the MapReduce model in a Cloud environment for effective data analytics on uncertain big data. Experimental results show the effectiveness of the overall solution.
Finally, the fourth paper [13], entitled "An Effective and Efficient MapReduce Algorithm for Computing BFS-based Traversals of Large-Scale RDF Graphs", by Cuzzocrea et al., focuses on Resource Description Framework (RDF) graphs as a relevant case of Big Web Data arising in the so-called Semantic Web, which gives rise to the well-known large-scale RDF graphs. The authors study the problem of effectively and efficiently computing traversals of large-scale RDF graphs over MapReduce, and propose a solution based on the Breadth-First Search (BFS) strategy in which the RDF graph is decomposed and processed according to the MapReduce framework. They show that this implementation speeds up the analysis of RDF graphs compared with competing approaches.
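The core computation behind the third paper [12], the expected support of an itemset over uncertain transactions, maps naturally onto MapReduce. The sketch below assumes, as is common in uncertain frequent-pattern mining, that items are independent, so the probability that itemset X occurs in transaction t is the product of the existential probabilities of its items, and the expected support of X is the sum of these products over all transactions. The user constraint is modeled as a hypothetical item-level predicate `allowed` that is pushed into the map phase, and mining is capped at itemsets of `max_size` items; the map and reduce functions are plain Python stand-ins, not the authors' algorithm or a Hadoop/Spark API.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def map_transaction(transaction, allowed, max_size):
    """Map: one uncertain transaction {item: existential probability} ->
    (itemset, probability that the whole itemset occurs in this transaction)."""
    items = sorted(i for i in transaction if allowed(i))  # constraint pushed into the map
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            # Independence assumption: multiply the items' existential probabilities.
            yield itemset, prod(transaction[i] for i in itemset)


def reduce_expected_support(pairs):
    """Shuffle + reduce: sum per-transaction probabilities into expected support."""
    expected = defaultdict(float)
    for itemset, p in pairs:
        expected[itemset] += p
    return expected


def mine_uncertain(db, allowed, minsup, max_size=2):
    """Return the constrained itemsets (up to max_size items) with expected support >= minsup."""
    pairs = (kv for t in db for kv in map_transaction(t, allowed, max_size))
    return {s: v for s, v in reduce_expected_support(pairs).items() if v >= minsup}


if __name__ == "__main__":
    db = [
        {"a": 0.9, "b": 0.5},
        {"a": 0.6, "c": 1.0},
        {"b": 0.8, "c": 0.4},
    ]
    # Hypothetical user constraint: the user is not interested in item "c".
    print(mine_uncertain(db, allowed=lambda i: i != "c", minsup=0.5))
```

With item c excluded and a minimum expected support of 0.5, only {a} (0.9 + 0.6 = 1.5) and {b} (0.5 + 0.8 = 1.3) qualify, while {a, b} falls short with an expected support of 0.9 × 0.5 = 0.45.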
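For the fourth paper [13], the general idea of running BFS as a sequence of MapReduce jobs can be sketched as follows. This is the textbook iterative formulation, not necessarily Cuzzocrea et al.'s decomposition: RDF triples are read as directed edges from subject to object (predicates are ignored), each round's map re-emits every node's current distance and offers `dist + 1` to its neighbors, and the reduce keeps the minimum per node; rounds repeat until no distance changes. Function names and the example IRIs are hypothetical, and the single-process loops stand in for a distributed MapReduce runtime.

```python
import math
from collections import defaultdict

INF = math.inf


def to_adjacency(triples):
    """Read RDF triples (subject, predicate, object) as directed edges subject -> object."""
    adj = defaultdict(list)
    for s, _predicate, o in triples:
        adj[s].append(o)
        adj.setdefault(o, [])  # objects with no outgoing triples are still nodes
    return dict(adj)


def bfs_map(node, dist, neighbors):
    """Map: re-emit the node's own distance; if it is reached, offer dist + 1 to each neighbor."""
    yield node, dist
    if dist < INF:
        for m in neighbors:
            yield m, dist + 1


def bfs_round(adj, dist):
    """One MapReduce job: run the map over every node, then reduce each key with min()."""
    reduced = {}
    for node, neighbors in adj.items():
        for key, d in bfs_map(node, dist[node], neighbors):
            reduced[key] = min(d, reduced.get(key, INF))
    return reduced


def mapreduce_bfs(triples, source):
    """Iterate MapReduce jobs until the distances reach a fixed point."""
    adj = to_adjacency(triples)
    dist = {n: (0 if n == source else INF) for n in adj}
    while True:
        new_dist = bfs_round(adj, dist)
        if new_dist == dist:  # no distance changed: BFS levels are final
            return dist
        dist = new_dist


if __name__ == "__main__":
    triples = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:alice", "ex:worksAt", "ex:acme"),
    ]
    print(mapreduce_bfs(triples, "ex:alice"))
```

A production job would typically emit only from the current frontier and ship adjacency lists along with distances, but the fixed-point logic is the same.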
Experimental results clearly support the reliability of these contributions. At the same time, the contributions open the door to future research challenges aimed at further improving the management of big data in Cloud environments, challenges that draw on previous research experience in related scientific areas. For instance, big data compression (e.g., [14-17]) seems a promising direction, as compressing data improves data management efficiency, but it still has to be provided with well-defined probabilistic guarantees on the accuracy of the answers derived from compressed data (e.g., [18-20]). Similarly, data fragmentation/partitioning techniques (e.g., [21-23]) deserve consideration as a further means of improving performance by exploiting the inherently distributed nature of Cloud platforms which, in our opinion, still offer interesting features, beyond those of previous distributed settings (e.g., Grids, Clusters, and so forth), that have not yet been fully exploited. In conclusion, there is still a lot of work to do in the context of managing, querying and processing big data in Cloud environments. We hope this special issue represents a reliable milestone along this difficult, yet exciting, research direction.
Acknowledgments: The Special Issue editor would like to express his gratitude to all contributors and reviewers, whose efforts made this special issue a success, as well as to the editorial staff of the MDPI Algorithms journal for their continuous and diligent assistance whenever needed.
doi:10.3390/a9010013