
Scheduling dynamic parallelism on accelerators

Filip Blagojevic, Costin Iancu, Katherine Yelick, Matthew Curtis-Maury, Dimitrios S. Nikolopoulos, Benjamin Rose
2009 Proceedings of the 6th ACM conference on Computing frontiers - CF '09  
In this paper we consider multiple approaches for such scheduling problems and use the Cell BE system to demonstrate the different schedulers and the trade-offs between them.  ...  We then consider the addition of cooperative scheduling to the Linux kernel and a user-level work-stealing approach.  ...  The parallelization trade-offs for each application are analyzed in [6] .  ... 
doi:10.1145/1531743.1531769 dblp:conf/cf/BlagojevicIYCNR09 fatcat:7fyxlqxebff7zc34assd4mjkuy

On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors [chapter]

Carolina Bonacic, Carlos Garcia, Mauricio Marin, Manuel Prieto, Francisco Tirado
2011 Lecture Notes in Computer Science  
To our knowledge, computations related to capturing user preferences through their clicks on the query result webpages and include this feature in the document ranking process are currently performed in  ...  Real time search, a setting in which Web search engines are able to include among their query results documents published on the Web in the very recent past, is a clear evidence that many of the off-line  ...  Intuitively, the larger the block size, the coarser the parallelism for strategies like BP, but smaller blocks tend to improve data locality so a trade-off is in place.  ... 
doi:10.1007/978-3-642-19328-6_22 fatcat:xasicf5q3vhlxfb4bj4imow7ra

An efficient and generic reversible debugger using the virtual machine based approach

Toshihiko Koju, Shingo Takada, Norihisa Doi
2005 Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments - VEE '05  
Currently, our debugger provides four types of trade-off settings (designated by unit and optimization) to consider trade-offs between granularity, accuracy, overhead and memory requirement.  ...  The user can choose the appropriate setting flexibly during debugging without finishing and restarting the debuggee.  ...  With "Change trade-off setting", users can choose the appropriate trade-off setting to consider trade-offs between granularity, accuracy, overhead and memory requirement.  ... 
doi:10.1145/1064979.1064992 dblp:conf/vee/KojuTD05 fatcat:be6emzguanhppoti7ka2xclzju

Bio Molecular Engine

A. Gallini, C. Ferretti, G. Mauri
2005 Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05  
Evolutionary computation has been often used by computer scientists to evolve the morphologies and control systems of artificial life.  ...  In particular we discuss how to use a grid to evolutionary find a good solution to a well defined design issue: how much parallelism is good for a given problem computed in our environment.  ...  We need to map a computation onto parallel devices in such a way that we obtain the best performance under that trade-off.  ... 
doi:10.1145/1102256.1102314 dblp:conf/gecco/GalliniFM05 fatcat:vaewuvpbrfhwzht6b2op3v2aya

DPAC

Yanyan Jiang, Chang Xu, Xiaoxing Ma
2013 Proceedings of the 2013 Middleware Doctoral Symposium on - MDS '13  
This motivates us to design DPAC, an infrastructure that support in building dynamic program analysis tools for concurrency Java programs.  ...  We show two concrete case studies how our DPAC helps building existing dynamic program analysis approaches, as well as tuning subtle implementation details for supporting customized function implementation  ...  We are trying to find a good trade off which substantially improves the record-time efficiency with only a mild loss of replay guarantee.  ... 
doi:10.1145/2541534.2541591 dblp:conf/middleware/JiangXM13 fatcat:nh3efspkjjduhfagz7xldadjvm

The Risks of WebGL: Analysis, Evaluation and Detection [article]

Alex Belkin, Nethanel Gelernter, Israel Cidon
2019 arXiv   pre-print
We demonstrate in our experiments the major improvements of WebGL 2.0 over WebGL 1.0 both in performance and in convenience.  ...  We implemented a Chrome extension that proved itself effective in detecting and blocking WebGL.  ...  We also limited the WebGL performance, using the best trade-off parameters observed in Experiment 2 in Section 5.3 to prevent high GPU usage, with a minimal effect on the user experience.  ... 
arXiv:1904.13071v1 fatcat:qclshlxlqzaorcvmw7fftegzmy

Automated Space/Time Scaling of Streaming Task Graph [article]

Hossein Omidian, Guy G.F. Lemieux
2016 arXiv   pre-print
In this paper, we describe a high-level synthesis (HLS) tool that automatically allows area/throughput trade-offs for implementing streaming task graphs (STG).  ...  In addition to traditional node selection and replication methods used in prior work, we have uniquely implemented node combining and splitting to find a better area/throughput trade-off.  ...  In this paper, we describe the beginning of a high-level synthesis tool that can perform such automated space/time tradeoffs.  ... 
arXiv:1606.03717v1 fatcat:io75vcpurzdx5hqpkeibbh2jb4

On the Instrumentation of OpenMP and OmpSs Tasking Constructs [chapter]

Harald Servat, Xavier Teruel, Germán Llort, Alejandro Duran, Judit Giménez, Xavier Martorell, Eduard Ayguadé, Jesús Labarta
2013 Lecture Notes in Computer Science  
We present in this paper a fruitful synergy of a shared memory parallel compiler and runtime, and a performance extraction library.  ...  by incorporating data that is only known in the compiler and runtime side.  ...  Intel, Xeon, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other brands and names are the property of their respective  ... 
doi:10.1007/978-3-642-36949-0_47 fatcat:3xmioo3abvcyvndfp7akjazpiq

Quantitative analysis of the speed/accuracy trade-off in transaction level modeling

Gunar Schirner, Rainer Dömer
2008 ACM Transactions on Embedded Computing Systems  
In this article, we systematically analyze and quantify the speed/accuracy trade-off in TLM.  ...  The general TLM trade-off offers gains of up to four orders of magnitude in simulation speed, generally however, at the price of low accuracy.  ...  Overview: In this article, we systematically study and analyze the TLM trade-off quantitatively.  ... 
doi:10.1145/1457246.1457250 fatcat:5zwzeovp3rcnrcuzjgjpw3a6ba

Massive Social Network Analysis: Mining Twitter for Social Good

David Ediger, Karl Jiang, Jason Riedy, David A. Bader, Courtney Corley
2010 2010 39th International Conference on Parallel Processing  
On a 128-processor Cray XMT, GraphCT estimates the betweenness centrality of an artificially generated (R-MAT) 537 million vertex, 8.6 billion edge graph in 55 minutes and a real-world graph (Kwak, et al  ...  Facebook consists of over 400 million active users sharing over 5 billion pieces of information each month.  ...  Acknowledgments: This work was supported in part by the CASS-MT Center led by Pacific Northwest National Laboratory and NSF Grants CNS-0708307 and IIP-0934114.  ... 
doi:10.1109/icpp.2010.66 dblp:conf/icpp/EdigerJRBCFR10 fatcat:i6clwbmjuzetxgntubekzl3qxq

A concurrent dynamic analysis framework for multicore hardware

Jungwoo Ha, Matthew Arnold, Stephen M. Blackburn, Kathryn S. McKinley
2009 SIGPLAN notices  
It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads.  ...  We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral.  ...  With this model, user threads time-share CABs and may migrate from CAB to CAB according to the user-level scheduler, but in all cases, there is only one user thread mapped to a CAB at any given time.  ... 
doi:10.1145/1639949.1640101 fatcat:i7dgkfveqnex3ms5bvefsqn3u4

A concurrent dynamic analysis framework for multicore hardware

Jungwoo Ha, Matthew Arnold, Stephen M. Blackburn, Kathryn S. McKinley
2009 Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications - OOPSLA 09  
It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads.  ...  We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral.  ...  With this model, user threads time-share CABs and may migrate from CAB to CAB according to the user-level scheduler, but in all cases, there is only one user thread mapped to a CAB at any given time.  ... 
doi:10.1145/1640089.1640101 dblp:conf/oopsla/HaABM09 fatcat:lzkkrc3w4ne4bjznhiwaie36jm

LiteRace

Daniel Marino, Madanlal Musuvathi, Satish Narayanasamy
2009 Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation - PLDI '09  
They are precise in the sense that they only report actual data races.  ...  In this paper we present LiteRace, a very lightweight data race detector that samples and analyzes only selected portions of a program's execution.  ...  Acknowledgments: We would like to thank the anonymous reviewers for providing valuable feedback on this paper. We would also like to thank Trishul Chilimbi for helpful discussions.  ... 
doi:10.1145/1542476.1542491 dblp:conf/pldi/MarinoMN09 fatcat:addzgztxqzc4rgxrbxufzqhwky

LiteRace

Daniel Marino, Madanlal Musuvathi, Satish Narayanasamy
2009 SIGPLAN notices  
They are precise in the sense that they only report actual data races.  ...  In this paper we present LiteRace, a very lightweight data race detector that samples and analyzes only selected portions of a program's execution.  ...  Acknowledgments: We would like to thank the anonymous reviewers for providing valuable feedback on this paper. We would also like to thank Trishul Chilimbi for helpful discussions.  ... 
doi:10.1145/1543135.1542491 fatcat:dze563venne2jb6b2owvskrzua

Power-aware pipelining with automatic concurrency control

Massimo Torquati, Daniele De Sensi, Gabriele Mencagli, Marco Aldinucci, Marco Danelutto
2018 Concurrency and Computation  
In this paper we describe the design of automatic concurrency control algorithm for implementing power-efficient communications on shared-memory multicores.  ...  The selection of the algorithm used to access such queues (i.e. the concurrency control) is a critical aspect both for performance and power consumption.  ...  The optimization of the performance/power trade-off has been mainly pursued by means of dynamic reconfigurations of the system at different abstraction levels, from the hardware level up to the run-time  ... 
doi:10.1002/cpe.4652 fatcat:fjre7bmiofhatixajcz7tdjlea