Towards more efficient execution

Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras
2013 Proceedings of the 27th international ACM conference on International conference on supercomputing - ICS '13  
This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach.  ...  Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution.  ...  For applications with irregular memory accesses decoupling outperforms coupled execution both in terms of performance and power efficiency.  ... 
doi:10.1145/2464996.2465012 dblp:conf/ics/KoukosBSK13 fatcat:zw5s6xtfuzcxxfinsg36ci4sma

Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication [article]

Joseph Schuchart, Christoph Niethammer, José Gracia, George Bosilca
2021 arXiv   pre-print
In order to increase the flexibility of the RMA interface, we add the capability to duplicate windows, allowing access to the same resources encapsulated by a window using different configurations.  ...  The MPI standard has long included one-sided communication abstractions through the MPI Remote Memory Access (RMA) interface.  ...  . #1664142 and the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy Office of Science and the National Nuclear Security Administration.  ... 
arXiv:2111.08142v1 fatcat:xoy65diazrhdfb4xowsldmuwlq

Decoupled Access-Execute on ARM big.LITTLE [article]

Anton Weber, Kim-Anh Tran, Stefanos Kaxiras, Alexandra Jimborean
2017 arXiv   pre-print
DAE is a compiler technique that splits code regions into two distinct phases: a memory-bound Access phase and a compute-bound Execute phase.  ...  This proposal explores the power-savings and the performance gains that can be achieved by utilizing the ARM big.LITTLE core in combination with Decoupled Access-Execute (DAE).  ...  With Decoupled Access-Execute (DAE), Koukos et al.  ... 
arXiv:1701.05478v1 fatcat:23k4qfykqzd4xcct7nlku6zchq

Towards compositionality in execution time analysis

Sebastian Hahn, Jan Reineke, Reinhard Wilhelm
2015 ACM SIGBED Review  
Static timing analysis is therefore employed to compute upper bounds on the execution times of a program. Analysis results at high precision are required to avoid over-provisioning of resources.  ...  Therefore, recent analysis approaches often assume a certain independence of system components, referred to as timing compositionality.  ...  If at all, bounding the amount of interference in a cumulative way rather than locally classifying memory accesses as hit/miss seems more likely feasible in terms of precision and efficiency.  ... 
doi:10.1145/2752801.2752805 fatcat:xbcfcm4uvbcyth3bb3azkpah5q

Boosting mobile GPU performance with a decoupled access/execute fragment processor

José-María Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis
2012 SIGARCH Computer Architecture News  
We thus propose the migration of GPU designs towards the decoupled access-execute concept.  ...  Moreover, the trend towards better screens will inevitably lead to a higher demand for improved graphics rendering.  ...  Access/Execute Decoupling: We propose to adopt a more energy-efficient prefetching approach to hide memory latency, which is based on the access/execute architectural paradigm [24].  ... 
doi:10.1145/2366231.2337169 fatcat:f7j676py7jhehcl55u2v2k3b5u

Boosting mobile GPU performance with a decoupled access/execute fragment processor

Jose-Maria Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
We thus propose the migration of GPU designs towards the decoupled access-execute concept.  ...  Moreover, the trend towards better screens will inevitably lead to a higher demand for improved graphics rendering.  ...  Access/Execute Decoupling: We propose to adopt a more energy-efficient prefetching approach to hide memory latency, which is based on the access/execute architectural paradigm [24].  ... 
doi:10.1109/isca.2012.6237008 dblp:conf/isca/ArnauPX12 fatcat:wu7qgj5rqnh2rhftcvo5xdenge

Execution synthesis

Cristian Zamfir, George Candea
2010 Proceedings of the 5th European conference on Computer systems - EuroSys '10  
Execution synthesis is a technique for automating this detective work: given a program and a bug report, it automatically produces an execution of the program that leads to the reported bug symptoms.  ...  Using a combination of static analysis and symbolic execution, it "synthesizes" a thread schedule and various required program inputs that cause the bug to manifest.  ...  In this way, the search is consistently steered toward choosing and exploring executions that appear to be more likely to reach the intermediate goal soon.  ... 
doi:10.1145/1755913.1755946 dblp:conf/eurosys/ZamfirC10 fatcat:aqihlxdfs5d5nhbieel3q2fhaq

A Decoupled Execution Paradigm for Data-Intensive High-End Computing

Yong Chen, Chao Chen, Xian-He Sun, William D. Gropp, Rajeev Thakur
2012 2012 IEEE International Conference on Cluster Computing  
In this study, we propose a decoupled execution paradigm (DEP) to address the challenging I/O bottleneck issues.  ...  Abstract: High-end computing (HEC) applications in critical areas of science and technology tend to be more and more data intensive.  ...  The application is executed in a decoupled but fundamentally more efficient manner for data-intensive HEC with the collective support from data processing nodes and computing nodes.  ... 
doi:10.1109/cluster.2012.80 dblp:conf/cluster/ChenCSGT12 fatcat:m7znse6itjf53jowvtyld7ljca

Omniscient debugging for executable DSLs

Erwan Bousse, Dorian Leroy, Benoit Combemale, Manuel Wimmer, Benoit Baudry
2018 Journal of Systems and Software  
A generic solution must: support a wide range of executable DSLs independently of the metaprogramming approaches used for implementing their semantics; be efficient for good responsiveness.  ...  As compared to a solution that copies the model at each step, it is on average six times more efficient in memory, and at least 2.2 times faster when exploring past execution states, while only slowing down  ...  To answer RQ #3, we observe that our approach is more efficient in memory than a clone-based approach. RQ #4: Efficiency in time for exploring past states.  ... 
doi:10.1016/j.jss.2017.11.025 fatcat:thudz42e4zhztd2xuip5ijlqny

C++ Patterns: Executing Around Sequences

Kevlin Henney
2000 European Conference on Pattern Languages of Programs  
One such recurring schema, or programming cliché, is the embracing or bracketing of a sequence by a pair of actions, such as resource acquisition and release operations executed around the actual resource  ...  They are connected into a language and further explored through a narrative example.  ...  , briefer approaches.  ... 
dblp:conf/europlop/Henney00 fatcat:virjkwsukzeqfdpxql7t3wvlcq

Transactional WaveCache: Towards Speculative and Out-of-Order DataFlow Execution of Memory Operations [article]

Leandro A. J. Marzulo, Felipe M. G. França, Vítor Santos Costa
2007 arXiv   pre-print
Speedups of 33.1% and 24% were observed on more memory intensive applications, and slowdowns up to 16% arise if memory bandwidth is a bottleneck.  ...  If a hazard is detected in a speculative Wave, all the following Waves (children) are aborted and re-executed. We evaluate the WaveCache on a set of artificial benchmarks.  ...  In this work we present and study an even more aggressive approach.  ... 
arXiv:0712.1167v1 fatcat:kxhwpxcjtbbprbptznd6ssmacm

Scalable Execution of Legacy Scientific Codes [chapter]

Joy Mukherjee, Srinidhi Varadarajan, Naren Ramakrishnan
2006 Lecture Notes in Computer Science  
This paper presents Weaves, a language neutral framework for scalable execution of legacy parallel scientific codes.  ...  The more expressive collaborating partial differential equation (PDE) solvers are used to exemplify developmental aspects, while freely available Sweep3D is used for performance results.  ...  Weaves exploits this proximity to manifest efficient state sharing through direct in-memory data accesses within a single address space.  ... 
doi:10.1007/11758501_11 fatcat:tzlyaqklmnfkbfugoznc3zdi6y

Ensemble Toolkit: Scalable and Flexible Execution of Ensembles of Tasks [article]

Vivekanandan Balasubramanian, Antons Treikalis, Ole Weidner, Shantenu Jha
2016 arXiv   pre-print
Ensemble toolkit uses a scalable pilot-based runtime system that decouples workload execution and resource management details from the expression of the application, and enables the efficient and dynamic  ...  execution of ensembles on heterogeneous computing resources.  ...  We also thank Peter Kasson, Thomas Cheatham and Michael Shirts for useful discussion about adaptive execution patterns.  ... 
arXiv:1602.00678v3 fatcat:3vqdrnbqpbfoxiszvdiod5b2yy

VEAL: Virtualized Execution Accelerator for Loops

Nathan Clark, Amir Hormati, Scott Mahlke
2008 2008 International Symposium on Computer Architecture  
We conclude that using a hybrid static-dynamic compilation approach to map computation on to loop-level accelerators is a practical way to increase computation efficiency, without the overheads associated  ...  To overcome this problem, we propose decoupling the instruction set architecture from the underlying accelerators.  ...  Second, the repeating control sequence (instructions) used to configure the accelerator can be stored in a circular buffer, which is more efficient to access than a large instruction cache.  ... 
doi:10.1109/isca.2008.33 dblp:conf/isca/ClarkHM08 fatcat:w2xoaxkptned5gopvbdnc32yie

XOX Fabric: A hybrid approach to blockchain transaction execution [article]

Christian Gorenflo, Lukasz Golab, Srinivasan Keshav
2020 arXiv   pre-print
We therefore propose XOX: a novel two-pronged transaction execution approach that both minimizes invalid transactions in the Fabric blockchain and maximizes concurrent execution.  ...  execution on a subset of network nodes.  ...  CONCLUSION AND FUTURE WORK In this work, we propose a novel hybrid execution model for Hyperledger Fabric consisting of a pre-order and a post-order execution step.  ... 
arXiv:1906.11229v3 fatcat:eqrycygoergtpnd4fxmmrzw4o4