Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads [article]

Parmita Mehta, Sven Dorkenwald, Dongfang Zhao, Tomer Kaftan, Alvin Cheung, Magdalena Balazinska, Ariel Rokem, Andrew Connolly, Jacob Vanderplas, Yusra AlSayyad
2016 arXiv   pre-print
In this paper, we present the first comprehensive evaluation of large-scale image analysis systems using two real-world scientific image data processing use cases.  ...  It is unclear, however, how well these systems support real-world image analysis use cases, and how performant are the image analytics tasks implemented on top of such systems.  ...  Data, award from the Gordon and Betty Moore Foundation and the Alfred P Sloan Foundation, and the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery.  ... 
arXiv:1612.02485v1 fatcat:6oumkgh37fh3dbvjguaf5i7eei

Benchmarking Big Data Systems: State-of-the-Art and Future Directions [article]

Rui Han, Zhen Jia, Wanling Gao, Xinhui Tian, Lei Wang
2015 arXiv   pre-print
The great prosperity of big data systems such as Hadoop in recent years makes the benchmarking of these systems become crucial for both research and industry communities.  ...  However, most of the existing big data benchmarks can be described as attempts to solve specific problems in benchmarking systems.  ...  ACKNOWLEDGMENTS This technical report is a significant extended version of its preliminary version entitled "On Big Data Benchmarking", which is published in BPOE-4 (Co-located with ASPLOS 2014) [36]  ... 
arXiv:1506.01494v1 fatcat:3icae6wgjjfj7afsmlzppd4e2q

On Big Data Benchmarking [chapter]

Rui Han, Xiaoyi Lu, Jiangtao Xu
2014 Lecture Notes in Computer Science  
Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research and industry communities.  ...  , variety and veracity) of big data, as well as generating tests with comprehensive workloads for big data systems.  ...  Requirements and Challenges Big data benchmarks are developed to evaluate and compare the performance of big data systems and architectures.  ... 
doi:10.1007/978-3-319-13021-7_1 fatcat:f4zrxikfjrctpj6xnxw6s7alki

On Big Data Benchmarking [article]

Rui Han, Xiaoyi Lu
2014 arXiv   pre-print
Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research and industry communities.  ...  , variety and veracity) of big data, as well as generating tests with comprehensive workloads for big data systems.  ...  Requirements and Challenges Big data benchmarks are developed to evaluate and compare the performance of big data systems and architectures.  ... 
arXiv:1402.5194v1 fatcat:cbd3zzbse5c7xbc63w6dl6j5ie

Using pattern-models to guide SSD deployment for Big Data applications in HPC systems

Junjie Chen, Philip C. Roth, Yong Chen
2013 2013 IEEE International Conference on Big Data  
These benefits are needed in HPC systems, especially with the growing demand of supporting Big Data applications.  ...  Our research will be helpful in guiding designs and developments for Big Data applications in current and projected HPC systems including exascale systems.  ...  Data volumes of many scientific simulations and applications in critical research areas like astrophysics, geographic systems, climate sciences, medical image processing, and high-energy physics, have  ... 
doi:10.1109/bigdata.2013.6691592 dblp:conf/bigdataconf/ChenRC13 fatcat:oa6hhlez4fbxdeaeqnw2l3376m

Construing the big data based on taxonomy, analytics and approaches

Ajeet Ram Pathak, Manjusha Pandey, Siddharth Rautaray
2018 Iran Journal of Computer Science  
Big data have become an important asset due to its immense power hidden in analytics.  ...  of it.  ...  the scope of big data analytics to each kind of multi-media data, i.e. text, images and videos.  ... 
doi:10.1007/s42044-018-0024-3 fatcat:teiovluolngepjyebzz2wnwjxu

Employing Vertical Elasticity for Efficient Big Data Processing in Container-Based Cloud Environments

Jin-young Choi, Minkyoung Cho, Jik-Soo Kim
2021 Applied Sciences  
We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark.  ...  Recently, "Big Data" platform technologies have become crucial for distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly.  ...  Although Big Data technologies can improve every part of a business from providing insights for new analytical applications to augmenting traditional on-premise systems, as the overall scale of Big Data  ... 
doi:10.3390/app11136200 fatcat:a54hcl3ntndwpfxsmjoawjmh6y

Evaluating the Benefits of Key-Value Databases for Scientific Applications [chapter]

Pol Santamaria, Lena Oden, Eloy Gil, Yolanda Becerra, Raül Sirvent, Philipp Glock, Jordi Torres
2019 Lecture Notes in Computer Science  
The convergence of Big Data applications with High-Performance Computing requires new methodologies to store, manage and process large amounts of information.  ...  Given the computing needs, we study the effects of replacing a traditional storage system with a distributed Key-Value database on a cell segmentation application.  ...  Parallel File Systems (PFS) have performance issues on Big Data workloads even if they are HPC-oriented, such as GPFS.  ... 
doi:10.1007/978-3-030-22734-0_30 fatcat:vjlt4pqsona4njuxdob64fgg4a

A novel cloud based elastic framework for big data preprocessing

Omer Dawelbeit, Rachel McCrindle
2014 2014 6th Computer Science and Electronic Engineering Conference (CEEC)  
A number of analytical big data services based on the cloud computing paradigm such as Amazon Redshift and Google Bigquery have recently emerged.  ...  Although these big data services have addressed the issue of big data analysis, the ability to efficiently de-normalise and prepare this data to a format that can be imported into these services remains  ...  Cloud Big Data Services Another cloud based feature that enables the interactive analysis of big data, is big data services, these services are based on columnar database systems [6] which, unlike traditional  ... 
doi:10.1109/ceec.2014.6958549 fatcat:y3lqoazdm5ds3ni2xbctztwbtu

A Survey of Benchmarks to Evaluate Data Analytics for Smart-* Applications [article]

Athanasios Kiatipis, Alvaro Brandon, Rizkallah Touma, Pierre Matri, Michal Zasadzinski, Linh Thuy Nhuyen, Adrien Lebre, Alexandru Costan
2019 arXiv   pre-print
Afterwards, for each of these requirements, there is a description of the benchmarks one can use to precisely evaluate the performance of the underlying systems and technologies.  ...  They hide a staggering complexity, relying on multiple layers of data collection, transmission, aggregation, analysis and also storage, both at the network edge and on the cloud.  ...  This work is part of the "BigStorage: Storage-based Convergence between HPC and Cloud to handle Big Data" project, H2020-MSCA-ITN-2014-642963, funded by the European Commission within the Marie Skłodowska-Curie  ... 
arXiv:1910.02004v1 fatcat:l2bghlqczffspfzbfnx222gdvy

Recent advances in autonomic provisioning of big data applications on clouds

Rajiv Ranjan, Lizhe Wang, Albert Y. Zomaya, Dimitrios Georgakopoulos, Xian-He Sun, Guojun Wang
2015 IEEE Transactions on Cloud Computing  
His research interests include parallel and distributed processing, memory and I/O systems, software systems for big data applications, and performance evaluation.  ...  Wu et al. develop a prototype generic workflow system by leveraging existing technologies for a quick evaluation of scientific workflow optimization strategies.  ... 
doi:10.1109/tcc.2015.2437231 fatcat:w7xiaqzhs5eg7hd3fubpnneury

Big data research

Magdalena Balazinska
2015 Proceedings of the VLDB Endowment  
The need for effective tools for big data management and analytics continues to grow.  ...  While the ecosystem of tools is expanding many research problems remain open: they include challenges around efficient processing, flexible analytics, ease of use, and operation as a service.  ...  I would also like to thank our sponsors: the National Science Foundation, the Intel Science and Technology Center for Big Data, Amazon, Microsoft, Google, EMC, HP, and Yahoo.  ... 
doi:10.14778/2824032.2824140 fatcat:56ficva7svb2vcv6d5tcovmuui

Big data stream analysis: a systematic literature review

Taiwo Kolajo, Olawande Daramola, Ayodele Adebiyi
2019 Journal of Big Data  
for now and standard benchmark dataset for big data streaming analytics has not been widely adopted.  ...  This made it difficult for existing data mining tools, technologies, methods, and techniques to be applied directly on big data streams due to the inherent dynamic characteristics of big data.  ...  for Research and Advanced Training Fellowship, FR Number: 3240301383; and the Cape Peninsula University of Technology, South Africa.  ... 
doi:10.1186/s40537-019-0210-7 fatcat:6llv2yxdwrdl5chehm7lxm2yru

HPTMT Parallel Operators for High Performance Data Science Data Engineering [article]

Vibhatha Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox
2021 arXiv   pre-print
The HPTMT architecture that we proposed recently, identifies a set of data structures, operators, and an execution model for creating rich data applications that links all aspects of data engineering and  ...  Data-intensive applications are becoming commonplace in all science disciplines. They are comprised of a rich set of sub-domains such as data engineering, deep learning, and machine learning.  ...  Later on, these systems adopted the data analytics domain under their umbrella of big data problems.  ... 
arXiv:2108.06001v1 fatcat:qbnz7lk4mffc5mccq3xzntxkym

Distributed Intelligence on the Edge-to-Cloud Continuum: A Systematic Literature Review

Daniel Rosendo, Alexandru Costan, Patrick Valduriez, Gabriel Antoniu
2022 Journal of Parallel and Distributed Computing  
The explosion of data volumes generated by an increasing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML).  ...  While data analytics used to be mainly performed on cloud infrastructures, the rapid development of IoT infrastructures and the requirements for low-latency, secure processing has motivated the development  ...  A survey on IoT Big Data Analytics covering Big Data generation, acquisition, storage, learning, and analytics is presented in [131].  ... 
doi:10.1016/j.jpdc.2022.04.004 fatcat:mopdegh4vrgt5k47vrmc7xum24