Characterizing and subsetting big data workloads

Zhen Jia, Jianfeng Zhan, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, Jingwei Li
2014 2014 IEEE International Symposium on Workload Characterization (IISWC)  
Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures.  ...  In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data  ...  We can successfully subset big data workloads.  ... 
doi:10.1109/iiswc.2014.6983058 dblp:conf/iiswc/JiaZWHMYLL14 fatcat:bqi6cq3qw5hlnmhhdzswi5swgm

A characterization of big data benchmarks

Wen Xiong, Zhibin Yu, Zhendong Bei, Juanjuan Zhao, Fan Zhang, Yubin Zou, Xue Bai, Ye Li, Chengzhong Xu
2013 2013 IEEE International Conference on Big Data  
However, benchmarking big data systems is much more challenging than ever before. First, big data systems are still in their infant stage and consequently they are not well understood.  ...  In this paper, we first analyze the redundancy among benchmarks from ICTBench, HiBench and typical workloads from real world applications: spatio-temporal data analysis for Shenzhen transportation system  ...  BigDataBench was purposed for large-scale systems and architecture researches and for characterizing big data applications; each benchmark in BigDataBench is equal to a single big application [7].  ... 
doi:10.1109/bigdata.2013.6691707 dblp:conf/bigdataconf/XiongYBZZZBLX13 fatcat:tearw6y2cva55arivabjhhottu

Characterization and architectural implications of big data workloads

Lei Wang, Rui Ren, Jianfeng Zhan, Zhen Jia
2016 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
Big data areas are expanding in a fast way in terms of increasing workloads and runtime systems, and this situation imposes a serious challenge to workload characterization, which is the foundation of  ...  Second, corroborating the previous work, Hadoop and Spark based big data workloads have higher front-end stalls. Comparing with the traditional workloads i.e.  ...  WCRT WCRT is a comprehensive workload characterization tool, which can subset the whole workload set by removing redundant ones to facilitate workload characterization and other architecture research.  ... 
doi:10.1109/ispass.2016.7482083 dblp:conf/ispass/0004RZJ16 fatcat:u5s763kbgra5rpbv5cly3tvxgi

Benchmarking Big Data Systems: State-of-the-Art and Future Directions [article]

Rui Han, Zhen Jia, Wanling Gao, Xinhui Tian, Lei Wang
2015 arXiv   pre-print
and veracity), as well as implement application-specific but still comprehensive workloads.  ...  The complexity, diversity, and rapid evolution of big data systems gives rise to various new challenges about how we design generators to produce data with the 4V properties (i.e. volume, velocity, variety  ...  ACKNOWLEDGMENTS This technical report is a significant extended version of its preliminary version entitled "On Big Data Benchmarking", which is published in BPOE-4 (Co-located with ASPLOS 2014) [36]  ... 
arXiv:1506.01494v1 fatcat:3icae6wgjjfj7afsmlzppd4e2q

Memory system characterization of big data workloads

Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm
2013 2013 IEEE International Conference on Big Data  
This paper examines how these trends may intersect by characterizing the memory access patterns of various Hadoop and noSQL big data workloads.  ...  Two recent trends that have emerged include (1) Rapid growth in big data technologies with new types of computing models to handle unstructured data, such as mapreduce and noSQL (2) A growing focus on  ...  Recent studies have also proposed characterizing and understanding these big data usage cases.  ... 
doi:10.1109/bigdata.2013.6691693 dblp:conf/bigdataconf/DimitrovKLVW13 fatcat:x5kheseoh5cyfdclaisevd3lwq

Workload characterization for MG-RAST metagenomic data analytics service in the cloud

Wei Tang, Jared Bischof, Narayan Desai, Kanak Mahadik, Wolfgang Gerlach, Travis Harrison, Andreas Wilke, Folker Meyer
2014 2014 IEEE International Conference on Big Data (Big Data)  
In this paper, we characterize the MG-RAST workloads running in the cloud, from the perspectives of computation, I/O, and data transfer.  ...  The consequent data deluge has imposed big burdens for data analysis applications.  ...  ACKNOWLEDGMENTS This work was supported in part by the NIH award U01HG006537 "OSDF: Support infrastructure for NextGen sequence storage, analysis, and management", and U.S.  ... 
doi:10.1109/bigdata.2014.7004394 dblp:conf/bigdataconf/TangBDMGHWM14 fatcat:4qcliocqhbfyxam2ch26dmst2u

BenchCouncil's View on Benchmarking AI and Other Emerging Workloads [article]

Jianfeng Zhan, Lei Wang, Wanling Gao, Rui Ren
2019 arXiv   pre-print
This paper outlines BenchCouncil's view on the challenges, rules, and vision of benchmarking modern workloads like Big Data, AI or machine learning, and Internet Services.  ...  We conclude the challenges of benchmarking modern workloads as FIDSS (Fragmented, Isolated, Dynamic, Service-based, and Stochastic), and propose the PRDAERS benchmarking rules that the benchmarks should  ...  On the basis of the data motif methodology, we are proposing a new benchmark suite, named BENCHCPU [6], to characterize emerging workloads, including Big Data, AI, and Internet Services.  ... 
arXiv:1912.00572v2 fatcat:oc73gvvw2behdiq27ib2yvdifu

Classifying Student's Learning Experience using Improved Apriori and CART

Pooja Verma, Rajesh Boghey, Sandeep Rai
2017 International Journal of Computer Applications  
The experimental results are performed and tested on various parameters such as precision and recall and final Score.  ...  The various student's learning experience and their classification is done here using Fuzzy-Apriori and CART provide and better way to final and issue problems in various fields.  ...  According to the definition of Big Data, Big Data is characterized by volume, velocity, and variety where traditional data processing methods and tools cannot be qualified.  ... 
doi:10.5120/ijca2017915311 fatcat:blwi3ksoinbx5dzngllqupxkre

System-Level Characterization of Datacenter Applications

Manu Awasthi, Tameesh Suri, Zvika Guz, Anahita Shayesteh, Mrinmoy Ghosh, Vijay Balakrishnan
2015 Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering - ICPE '15  
A large volume of recent literature in characterizing "Big Data" applications have largely focused on two extremes of the characterization spectrum.  ...  In recent years, a number of benchmark suites have been created for the "Big Data" domain, and a number of such applications fit the client-server paradigm.  ...  , and how. • Characterization results are presented for several "Big Data" workloads, concentrating on the coarse-grain, per-server, system-level behavior rather than the fine-grained microarchitectural  ... 
doi:10.1145/2668930.2688059 dblp:conf/wosp/AwasthiSGSGB15 fatcat:sxlhqvtmljfvxnbe5y226yefzu

Resource Distribution Estimation for Data-Intensive Workloads: Give Me My Share & No One Gets Hurt! [chapter]

Alireza Khoshkbarforoushha, Rajiv Ranjan, Peter Strazdins
2016 Communications in Computer and Information Science  
Robust resource share estimation of data-intensive workloads is integral to efficient workload management in a cluster where multiple systems co-exist and share the same infrastructure.  ...  To address above challenges, we propose an inclusive framework and related techniques for workload profiling, similar job identification, and resource distribution prediction in a cluster.  ...  Modern big data clusters run a diverse mix of applications and production workloads [18], thereby characterizing similar jobs is challenging.  ... 
doi:10.1007/978-3-319-33313-7_17 fatcat:goglen56ejggrba5oxphfz2eqy

ShenZhen transportation system (SZTS): a novel big data benchmark suite

Wen Xiong, Zhibin Yu, Lieven Eeckhout, Zhengdong Bei, Fan Zhang, Chengzhong Xu
2016 Journal of Supercomputing  
Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads.  ...  We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.  ...  Background: big data and MapReduce Big data applications are often characterized using the four Vs: volume, velocity, variety and veracity.  ... 
doi:10.1007/s11227-016-1742-7 fatcat:2uszi5spwjhopi3xme75pnhfcm

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads [article]

Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Xu Wen, Xiwen He, Hainan Ye, Rui Ren
2018 arXiv   pre-print
This paper proposes a new approach to modelling and characterizing big data and AI workloads.  ...  /BigDataBench), and perform comprehensive characterization of those data motifs from perspective of data sizes, types, sources, and patterns as a lens towards fully understanding big data and AI workloads  ...  In this paper, we propose a new approach to modelling and characterizing big data and AI workloads.  ... 
arXiv:1808.08512v1 fatcat:4tmagnlfmvfbfj2kwqpdujfcbu

BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite [article]

Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Xu Wen, Rui Ren, Chen Zheng, Xiwen He, Hainan Ye, Haoning Tang, Zheng Cao, Shujie Zhang (+1 others)
2018 arXiv   pre-print
the combination of one or more data motifs---to represent diversity of big data and AI workloads.  ...  Unfortunately, complexity, diversity, frequently-changed workloads, and rapid evolution of big data and AI systems raise great challenges.  ...  We thoroughly perform workload characterizations of big data and AI benchmarks on CPUs and GPUs, respectively.  ... 
arXiv:1802.08254v2 fatcat:6ktsa3yowvaqtjbez26akp7a7e

Energy efficient job scheduling in single-ISA heterogeneous chip-multiprocessors

Ying Zhang, Lide Duan, Bin Li, Lu Peng, Srinivasan Sadagopan
2014 Fifteenth International Symposium on Quality Electronic Design  
In recent years, single-ISA heterogeneous chip multiprocessors (CMP) consisting of big high-performance cores and small power-saving cores on the same die have been proposed for the exploration of high  ...  In this work, we pay attention to reducing the energy consumption for workloads running on heterogeneous CMPs and propose a scheduling algorithm based on dynamic execution behaviors to exploit better energy-efficiency  ...  Therefore, the rules for the small core essentially characterize the execution phases that are not likely to result in extreme high power on a big core.  ... 
doi:10.1109/isqed.2014.6783390 dblp:conf/isqed/ZhangDLPS14 fatcat:dvqmwnjbmjajhjala7xwruk3v4

Big Data Benchmark Compendium [chapter]

Todor Ivanov, Tilmann Rabl, Meikel Poess, Anna Queralt, John Poelman, Nicolas Poggi, Jeffrey Buell
2016 Lecture Notes in Computer Science  
employ for Big Data systems.  ...  The goal is to understand the current state in Big Data benchmarking and guide practitioners in their approaches and use cases.  ...  The dataload is characterized by the size and the nature of the data sets used as inputs for a benchmark, and the workload is characterized by the number of concurrent clients and the distribution of the  ... 
doi:10.1007/978-3-319-31409-9_9 fatcat:n7lwtxainnblpf2xp4c5o2eynq