3,747 Hits in 5.2 sec

Throughput measurement using a synthetic job stream

David C. Wood, Ernest H. Forman
1971 Proceedings of the May 16-18, 1972, spring joint computer conference on - AFIPS '72 (Spring)  
A synthetic job stream was used to obtain this measure.  ...  The validity of using a synthetic-job stream to measure throughput has been confirmed by comparison with a representative job stream.  ... 
doi:10.1145/1479064.1479074 dblp:conf/afips/WoodF71 fatcat:o5ojsix5rzfjriyxiusobg37hm

Throughput Prediction of Asynchronous SGD in TensorFlow [article]

Zhuojin Li, Wumo Yan, Marco Paolieri, Leana Golubchik
2019 arXiv pre-print
In this paper, we present a solution to predicting training throughput from profiling traces collected from a single-node configuration.  ...  We validate our approach on TensorFlow training jobs for popular image classification neural networks, on AWS and on our in-house cluster, using nodes equipped with GPUs or only with CPUs.  ...  We also run a real cluster with W workers with GPUs and compare the measured throughput with our predictions, and with the throughput previously measured for 1 parameter server.  ... 
arXiv:1911.04650v1 fatcat:wk7wbys2czen7c77imtotkv3ve

Evaluating the efficacy of the cloud for cluster computation

D. Knight, K. Shams, G. Chang, T. Soderstrom
2012 2012 IEEE Aerospace Conference  
The cluster's local network also demonstrated sub-100 µs inter-process latency with sustained inter-node throughput in excess of 8 Gbps.  ...  These results demonstrate that while not a rival of dedicated supercomputing clusters, commercial cloud technology is now a feasible option for moderately demanding scientific workloads.  ...  processing jobs.  ... 
doi:10.1109/aero.2012.6187359 fatcat:byxlg5by2fewzhftxdm4ft6zuu

Enabling high-speed asynchronous data extraction and transfer using DART

Ciprian Docan, Manish Parashar, Scott Klasky
2010 Concurrency and Computation  
DART is a thin software layer built on RDMA technology to enable fast, low-overhead and asynchronous access to data from a running simulation, and support high-throughput, low-latency data transfers.  ...  A performance evaluation using the GTC and XGC-1 particle-in-cell based FSP simulations running on the Cray XT3/XT4 system at Oak Ridge National Laboratory demonstrates how DART can effectively and efficiently  ...  Before running these tests, we measured the link throughput between the streaming server and the remote node using the TCP transport protocol, and the peak measured value was 5.01 Gbps.  ... 
doi:10.1002/cpe.1567 fatcat:i42eqg25kbdkxlbghenkovn7vy

Elastic Stream Processing with Latency Guarantees

Bjorn Lohrmann, Peter Janacik, Odej Kao
2015 2015 IEEE 35th International Conference on Distributed Computing Systems  
To showcase the effectiveness of our approach, we provide an experimental evaluation on a large commodity cluster, using both a synthetic workload as well as an application performing real-time sentiment  ...  We describe how to continuously measure the necessary performance metrics for the model, and how it can be used to enforce latency guarantees, by determining appropriate scaling actions at runtime.  ...  We use this simple job to showcase the inherent trade-offs and difficulties in scalable real-time stream processing. A.  ... 
doi:10.1109/icdcs.2015.48 dblp:conf/icdcs/LohrmannJK15 fatcat:graw4wr3xne45ge2kj7i2wpbkq

Benchmarking Distributed Stream Data Processing Systems

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 2018 IEEE 34th International Conference on Data Engineering (ICDE)  
Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics.  ...  First, we give a definition of latency and throughput for stateful operators.  ...  We used synthetic data, which we generate with our data generator. Queries. The first query that we use is an aggregation query.  ... 
doi:10.1109/icde.2018.00169 dblp:conf/icde/KarimovRKSHM18 fatcat:yfvlfvsgvzaj7opgqin6cudxzu
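The Karimov et al. evaluation focuses on windowed operations and begins with an aggregation query over synthetic data. The Python sketch below shows roughly what such a query computes: a per-key sum over tumbling (non-overlapping) event-time windows. The record layout, window size, and function name are hypothetical. The sketch processes a finite batch, and it does not model the paper's latency and throughput definitions or any particular engine's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (event_time_ms, key, value): a hypothetical record layout for synthetic data
Event = Tuple[int, str, int]


def tumbling_window_sum(events: Iterable[Event], window_ms: int) -> Dict[Tuple[int, str], int]:
    """Sum values per key over non-overlapping (tumbling) event-time windows.
    Each result is keyed by (window_start_ms, key)."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key, value in events:
        window_start = ts - ts % window_ms  # align timestamp to its window
        totals[(window_start, key)] += value
    return dict(totals)


events = [(0, "a", 1), (500, "a", 2), (999, "b", 5), (1000, "a", 4), (1500, "b", 1)]
print(tumbling_window_sum(events, window_ms=1000))
# {(0, 'a'): 3, (0, 'b'): 5, (1000, 'a'): 4, (1000, 'b'): 1}
```

A streaming engine would emit each window's result once the window closes. The batch version only makes the window-assignment arithmetic explicit.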

Analysis of a Simple Approach to Modeling Performance for Streaming Data Applications

Jonathan C. Beard, Roger D. Chamberlain
2013 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems  
This paper outlines a computationally simple approach to modeling the overall throughput and buffering needs of a streaming application deployed on heterogeneous hardware.  ...  While necessary to ease programmability of these systems, this hidden complexity makes quantitative performance modeling a difficult task.  ...  Chamberlain and Washington University receive income based on a license of technology by the university to Exegy, Inc.  ... 
doi:10.1109/mascots.2013.49 dblp:conf/mascots/BeardC13 fatcat:aza7om3kgbemnivvlibbqtulbi

IncApprox

Dhanya R. Krishnan, Do Le Quoc, Pramod Bhatotia, Christof Fetzer, Rodrigo Rodrigues
2016 Proceedings of the 25th International Conference on World Wide Web - WWW '16  
We implemented our algorithm in a data analytics system called INCAPPROX based on Apache Spark Streaming.  ...  Approximate computation returns an approximate output for a job instead of the exact output.  ...  The throughput of the stream is tuned to measure the system throughput. The stream starts with 1000 messages/second and continues to increase throughput until the system is exhausted.  ... 
doi:10.1145/2872427.2883026 dblp:conf/www/KrishnanQBFR16 fatcat:ukywbrylyjfkxepyksllsfryka
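The IncApprox snippet describes a ramp-up measurement: the input stream starts at 1000 messages/second, and the rate keeps increasing until the system is exhausted. The Python sketch below implements that search loop. The step size, the upper bound, and the `sustains` probe are all assumptions rather than details from the paper. In a real test, `sustains` would drive the system at the given rate and check whether it keeps up. `toy_system` is a purely hypothetical stand-in.

```python
from typing import Callable, Optional


def find_saturation_rate(
    sustains: Callable[[int], bool],
    start_rate: int = 1000,
    step: int = 1000,
    max_rate: int = 1_000_000,
) -> Optional[int]:
    """Raise the offered load (messages/second) in fixed steps until the
    system under test stops keeping up; return the last rate it sustained,
    or None if it could not sustain even the starting rate."""
    last_ok: Optional[int] = None
    rate = start_rate
    while rate <= max_rate:
        if not sustains(rate):
            break  # the system is "exhausted" at this offered load
        last_ok = rate
        rate += step
    return last_ok


# Hypothetical stand-in for a real load test: a system that can absorb
# at most 7,500 messages/second.
def toy_system(rate: int) -> bool:
    return rate <= 7_500


print(find_saturation_rate(toy_system))  # 7000
```

A fixed linear step keeps the sketch simple. With a coarse step, the reported rate is only a lower bound on the true saturation point, within one step of it.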

DEBAR: A scalable high-performance de-duplication storage system for backup and archiving

Tianming Yang, Hong Jiang, Dan Feng, Zhongying Niu, Ke Zhou, Yaping Wan
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
DEBAR uses a two-phase de-duplication scheme (TPDS) that exploits memory cache and disk index properties to judiciously turn the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, hence achieving a very high de-duplication throughput.  ...  The idea that using synthetic dataset to determine the de-duplication throughput has been proposed in DDFS [30] , but DDFS just builds data duplication within a synthetic backup stream and ignores the  ... 
doi:10.1109/ipdps.2010.5470468 dblp:conf/ipps/YangJFNZW10 fatcat:2s3qdt2d25huxg2drmlpa3gswi
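The DEBAR snippet turns on fingerprint lookups and updates, the core operation of any fingerprint-based de-duplication store. The Python sketch below shows that operation against a plain in-memory index. Fixed-size 4 KiB chunking and SHA-1 fingerprints are assumptions chosen for illustration. This is not DEBAR's two-phase scheme (TPDS), which exists precisely to avoid per-chunk random lookups against an on-disk index.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(data: bytes, chunk_size: int = 4096) -> Tuple[Dict[str, bytes], List[str]]:
    """Split a backup stream into fixed-size chunks and store each distinct
    chunk once, keyed by its SHA-1 fingerprint. Returns the chunk store and
    the ordered fingerprint list ("recipe") needed to rebuild the stream."""
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in store:  # fingerprint lookup; write only new chunks
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return store, recipe


def restore(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    """Reassemble the original stream from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)


# Synthetic backup stream with deliberate duplication: A B A A.
block_a, block_b = b"A" * 4096, b"B" * 4096
stream = block_a + block_b + block_a + block_a
store, recipe = deduplicate(stream)
print(len(recipe), "chunks referenced,", len(store), "stored")  # 4 chunks referenced, 2 stored
assert restore(store, recipe) == stream
```

Once the index no longer fits in memory, each `fingerprint not in store` check in this naive loop becomes a random disk lookup. DEBAR's snippet identifies that bottleneck as the one its scheme targets.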

Efficient Strong Scaling Through Burst Parallel Training [article]

Seo Jin Park, Joshua Fried, Sunghyun Kim, Mohammad Alizadeh, Adam Belay
2022 arXiv pre-print
Second, GPU multiplexing prioritizes throughput for foreground training jobs, while packing in background training jobs to reclaim underutilized GPU resources, thereby improving cluster-wide utilization  ...  Together, these two ideas enable DeepPool to deliver a 1.2 - 2.3x improvement in total cluster throughput over standard data parallelism with a single task when the cluster scale is large.  ...  Pairwise collocation of various synthetic CUDA kernels with varied compute intensity and execution latency using stream priorities.  ... 
arXiv:2112.10065v3 fatcat:untgj3jxznh6domavntwrfslyi

User-guided symbiotic space-sharing of real workloads

Jonathan Weinberg, Allan Snavely
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
Using a large HPC platform, a representative application workload, and a sampling of expert users, we show that user inputs are of value and that for our chosen workload, user-guided symbiotic scheduling  ...  Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources.  ...  This work was also supported in part by NSF NGS Award #0406312 entitled Performance Measurement & Modeling of Deep Hierarchy Systems.  ... 
doi:10.1145/1183401.1183450 dblp:conf/ics/WeinbergS06 fatcat:pynmyy2hgvbsbpxiidgqneburm

Online template induction for machine-generated emails

Michael Whittaker, Nick Edmonds, Sandeep Tata, James B. Wendt, Marc Najork
2019 Proceedings of the VLDB Endowment  
Crusher delivers an order of magnitude more throughput than a prototype built using a stream processing engine.  ...  Whether it be a bill reminder, a hotel confirmation, or a shipping notification, our emails contain useful bits of information that enable a number of applications.  ...  Scalability We evaluate the scalability of the clustering service by measuring its peak throughput against our synthetic workload when deployed with various numbers of servers.  ... 
doi:10.14778/3342263.3342264 fatcat:jr5tp533drderplf7je6grif7a

Issues and challenges in the performance analysis of real disk arrays

E. Varki, A. Merchant, J. Xu, X. Qiu
2004 IEEE Transactions on Parallel and Distributed Systems  
We show how measurement data and baseline performance models can be used to extract information about the various features implemented in a disk array.  ...  We use standard performance techniques to develop an integrated performance model that incorporates some of the complexities of real disk arrays.  ...  This work was supported in part by Hewlett Packard Labs and by the US National Science Foundation under grants ITR-0082399 and CCR-0093111.  ... 
doi:10.1109/tpds.2004.9 fatcat:lmro2zsa2nhljli2tqt47lj6dy

Utilizing Virtualized Hardware Logic Computations to Benefit Multi-User Performance

Michael J. Hall, Neil E. Olson, Roger D. Chamberlain
2021 Electronics  
., by sharing a fixed function) provides a means to effectively utilize hardware resources by context switching the logic to support multiple data streams of computation.  ...  Multiple applications or users can take advantage of this by using the virtualized computation in an accelerator as a computational service, such as in a software as a service (SaaS) model over a network  ...  For the SHA-2 application, we target a Xilinx Virtex-7 XC7VX485T FPGA and use the Xilinx Vivado 2013.4 tools. The clock period is unconstrained in the runs.  ... 
doi:10.3390/electronics10060665 fatcat:4zonesdgiraixbh6z4gmxep47i

StreamApprox

Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, Thorsten Strufe
2017 Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference on - Middleware '17  
We evaluated StreamApprox using a set of microbenchmarks and real-world case studies.  ...  Thus, they are not well-suited for stream analytics. This motivated the design of StreamApprox-a stream analytics system for approximate computing.  ...  Experimental Setup Synthetic data stream.  ... 
doi:10.1145/3135974.3135989 dblp:conf/middleware/QuocCBFHS17 fatcat:2zlds3w2uzbzvdga6426whk474
Showing results 1–15 of 3,747.