13,757 Hits in 3.6 sec

Reduce Operations: Send Volume Balancing While Minimizing Latency

M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat
2020 IEEE Transactions on Parallel and Distributed Systems  
The reduce-communication hypergraph model suffers from failing to correctly encapsulate send-volume balancing.  ...  We propose a novel vertex weighting scheme that enables part weights to correctly encode send-volume loads of processors for send-volume balancing.  ...  Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc  ... 
doi:10.1109/tpds.2020.2964536 fatcat:vvpxizcb7rgrzcrc2np5pcqd6e

Improving all-reduce collective operations for imbalanced process arrival patterns

Jerzy Proficz
2018 Journal of Supercomputing  
Two new algorithms for the all-reduce operation, optimized for imbalanced process arrival patterns (PAPs) are presented: (i) sorted linear tree (SLT), (ii) pre-reduced ring (PRR) as well as a new way of  ...  ., more than P × τ) it performs only P − 1 send and receive operations, while every other process does 2P − 1 ones.  ...  while performing reduce operation using auxiliary short messages for signaling the PATs between the cooperating processes.  ... 
doi:10.1007/s11227-018-2356-z fatcat:o27cj73t6na33mmzviun5qav4e

Exploiting Green Energy to Reduce the Operational Costs of Multi-Center Web Search Engines

Roi Blanco, Matteo Catena, Nicola Tonellotto
2016 Proceedings of the 25th International Conference on World Wide Web - WWW '16  
Our experimental results show that the proposed solution maintains a high query throughput, while reducing by up to ∼25% the energy operational costs of multi-center search engines.  ...  We propose a mathematical model to minimize the operational costs of multi-center Web search engines by exploiting renewable energies whenever available at different locations.  ...  We aim to determine if our proposed MCF algorithm allows to markedly reduce market power consumption and operational costs with latency comparable to that achieved by the baselines.  ... 
doi:10.1145/2872427.2883021 dblp:conf/www/BlancoCT16 fatcat:xc72q5negzac3msdjgsdja7ze4

Optimizing nonzero-based sparse matrix partitioning models via reducing latency

Seher Acer, Oguz Selvitopi, Cevdet Aykanat
2018 Journal of Parallel and Distributed Computing  
Highlights: • Optimizing fine-grain hypergraph model to reduce bandwidth and latency. • Optimizing medium-grain hypergraph model to reduce bandwidth and latency. • Message net concept to encapsulate  ...  minimization of total message count. • Practical enhancements to establish a trade-off between bandwidth and latency. • Significant performance improvements validated on nearly one thousand matrices.  ...  volume) while maintaining balance on the computational loads of processors.  ... 
doi:10.1016/j.jpdc.2018.08.005 fatcat:vzetfsnabna3pm63lvzdt5rfhe

Reducing latency cost in 2D sparse matrix partitioning models

Oguz Selvitopi, Cevdet Aykanat
2016 Parallel Computing  
These models aim at minimizing total message count while maintaining a balance on communication volume loads of processors; hence, they address both bandwidth and latency costs.  ...  Among evaluated models, the models that rely on 2D jagged partitioning obtain the most promising results by striking a balance between minimizing bandwidth and latency costs. sends/receives messages to  ...  The fine-grain model correctly minimizes the total communication volume while maintaining computational load balance. For more details, see [46, 49].  ... 
doi:10.1016/j.parco.2016.04.004 fatcat:bnb5uuhfhjhovp57vhwf5yol5y

A Recursive Hypergraph Bipartitioning Framework for Reducing Bandwidth and Latency Costs Simultaneously

Oguz Selvitopi, Seher Acer, Cevdet Aykanat
2016 IEEE Transactions on Parallel and Distributed Systems  
The message nets encode the message count so that minimizing conventional cutsize captures the minimization of bandwidth and latency costs together.  ...  In this work, we propose a recursive hypergraph bipartitioning framework that reduces the total volume and total message count in a single phase.  ...  Considering both the volume and the message nets, minimizing the cutsize corresponds to reducing both the total volume and the total message count.  ... 
doi:10.1109/tpds.2016.2577024 fatcat:2w5tcebcevfxrn3um4vuf6muxe

Operating system support for distributed multimedia

David K. Y. Yau, Simon S. Lam
1998 International Journal of Intelligent Systems  
In this paper, we present the concept of input-output efficient buffers for reduced copying, the concept of fast system calls for low-latency network access, and the concept of kernel threads for flow  ...  The interface targets three areas for improvement: reduced copying, reduced reliance on explicit kernel-user interactions, and provision of rate-based flow control.  ...  Reduced System Calls Multimedia applications may send packets in bursts.  ... 
doi:10.1002/(sici)1098-111x(199812)13:12<1175::aid-int5>3.3.co;2-m fatcat:yka2t4e3rrdfpdmgyirwv3pddu

Operating system support for distributed multimedia

David K. Y. Yau, Simon S. Lam
1998 International Journal of Intelligent Systems  
In this paper, we present the concept of input-output efficient buffers for reduced copying, the concept of fast system calls for low-latency network access, and the concept of kernel threads for flow  ...  The interface targets three areas for improvement: reduced copying, reduced reliance on explicit kernel-user interactions, and provision of rate-based flow control.  ...  Reduced System Calls Multimedia applications may send packets in bursts.  ... 
doi:10.1002/(sici)1098-111x(199812)13:12<1175::aid-int5>3.0.co;2-i fatcat:6ylunp5czvc5zjbvoj2hdp5ety

Active memory operations

Zhen Fang, Lixin Zhang, John B. Carter, Ali Ibrahim, Michael A. Parker
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
AMOs can eliminate significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism.  ...  The performance of modern microprocessors is increasingly limited by their inability to hide main memory latency.  ...  On average, active messages reduce network traffic by a factor of 4.3, while AMOs reduce network traffic by 5.5X.  ... 
doi:10.1145/1274971.1275004 dblp:conf/ics/FangZCIP07 fatcat:ajzlsvdgorezbk6isb6nnlno24

Reducing Communication in Graph Neural Network Training [article]

Alok Tripathy, Katherine Yelick, Aydin Buluc
2020 arXiv   pre-print
We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods.  ...  Finally, our 1.5D algorithm optimizes communication for a given memory footprint, reducing communication volume by a factor of O(c) and latency cost by a factor of O(c^2) at the expense of asymptotically  ...  These numbers are actually optimistic and do not take into account the need to perform individualized "request and send" operations for exploiting the graph partitioning results.  ... 
arXiv:2005.03300v3 fatcat:ethsd346trgkxafz5t4sbxwqpy

Reducing Electricity Demand Charge for Data Centers with Partial Execution [article]

Hong Xu, Baochun Li
2013 arXiv   pre-print
In this paper, we study the familiar problem of reducing data center energy cost with two new perspectives.  ...  We propose a simple idea of using partial execution to reduce the peak power demand and energy cost of data centers.  ...  This represents state-of-the-art that exploits partial execution for improving latency while satisfying SLA [32], instead of using it to reduce the demand charge.  ... 
arXiv:1307.5442v2 fatcat:sdmggqvycnd4jk3vipojf3akju

Reducing energy and increasing performance with traffic optimization in many-core systems

George B. P. Bezerra, Stephanie Forrest, Payman Zarkesh-Ha
2011 International Workshop on System Level Interconnect Prediction  
Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization.  ...  As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance.  ...  This method minimizes the average distance traveled by packets and, consequently, energy and latency.  ... 
doi:10.1109/slip.2011.6135429 dblp:conf/slip/BezerraFZ11 fatcat:ozt5h564hzf3jecgoexw2idf5q

Leveraging smart phones to reduce mobility footprints

Stephen Smaldone, Benjamin Gilbert, Nilton Bila, Liviu Iftode, Eyal de Lara, Mahadev Satyanarayanan
2009 Proceedings of the 7th international conference on Mobile systems, applications, and services - Mobisys '09  
This is not required for correctness, but can reduce Horatio's mobility footprint by reducing the volume of wireless data it needs to transmit.  ...  Reducing Resume Latency Horatio can reduce resume latency by serving as a lookaside cache [15] for data state.  ... 
doi:10.1145/1555816.1555828 dblp:conf/mobisys/SmaldoneGBILS09 fatcat:4ssae5vjlrgcpaz3l3irlpn7pe

The effectiveness of request redirection on CDN robustness

Limin Wang, Vivek Pai, Larry Peterson
2002 ACM SIGOPS Operating Systems Review  
., server load, network proximity, cache locality--in an effort to reduce response time and increase the system capacity under load.  ...  This paper explores the design space of strategies employed to redirect requests, and defines a class of new algorithms that carefully balance load, locality, and proximity.  ...  The load-balanced variant of R-CHash is called LR-CHash, while the counterpart for R-HRW is called LR-HRW.  ... 
doi:10.1145/844128.844160 fatcat:fejh7rjzu5bidfjh2lr7dadf74

Anemone

Michael R. Hines, Mark Lewandowski, Kartik Gopalan
2005 Proceedings of the twentieth ACM symposium on Operating systems principles - SOSP '05  
As a consequence, application execution time is degraded due to higher disk latency involved in paging operations.  ...  In fact, typical remote memory paging latencies of 100 to 500µs can be easily achieved whereas the latency of paging to slow local disk (especially while paging in) is typically around 2 to 10ms, depending  ...  As a consequence, application execution time is degraded due to higher disk latency involved in paging operations.  ... 
doi:10.1145/1095810.1118608 fatcat:nbqkjksxpffdflehshtz2s6gom