Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels [article]

Fernando Endo and Damien Couroussé and Henri-Pierre Charles
2017 arXiv   pre-print
This allows auto-tuning to pay off in very short-running applications.  ...  of machine code generation.  ...  Acknowledgments This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissement d'avenir.  ... 
arXiv:1707.04566v1 fatcat:sdtgqm6iv5ekzmxnuxtuvveisq

Pushing the Limits of Online Auto-Tuning: Machine Code Optimization in Short-Running Kernels

Fernando Endo, Damien Couroussé, Henri-Pierre Charles
2016 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC)  
ACKNOWLEDGMENTS This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissement d'avenir.  ...  To the best of our knowledge, this work is the first to propose an approach of online auto-tuning that can obtain speedups in short-running kernel-based applications.  ...  This allows auto-tuning to be successfully employed in very short-running kernels, thanks to the low runtime code generation overhead.  ... 
doi:10.1109/mcsoc.2016.11 dblp:conf/mcsoc/EndoCC16 fatcat:c36q7x4kwncxxhqku33f5zhbzy

Collective Mind, Part II: Towards Performance- and Cost-Aware Software Engineering as a Natural Science [article]

Grigori Fursin and Abdul Memon and Christophe Guillon and Anton Lokhmotov
2015 arXiv   pre-print
Nowadays, engineers have to develop software often without even knowing which hardware it will eventually run on in numerous mobile phones, tablets, desktops, laptops, data centers, supercomputers and  ...  Unfortunately, optimizing compilers are not keeping pace with ever increasing complexity of computer systems anymore and may produce severely underperforming executable codes while wasting expensive resources  ... 
arXiv:1506.06256v1 fatcat:l3xrlisen5gglk5jtiusfx67dy

GASSER: An Auto-Tunable System for General Sliding-Window Streaming Operators on GPUs

Tiziano De Matteis, Gabriele Mencagli, Daniele De Sensi, Massimo Torquati, Marco Danelutto
2019 IEEE Access  
Furthermore, Gasser provides an auto-tuning approach able to automatically find the optimal value of the configuration parameters (i.e., batch length and the degree of parallelism) needed to optimize throughput  ...  Today's stream processing systems handle high-volume data streams in an efficient manner. To achieve this goal, they are designed to scale out on large clusters of commodity machines.  ...  In the second part, we show the accuracy of the auto-tuning model and we analyze in detail the two auto-tuning strategies discussed in Section IV-B and the hard/soft reconfiguration protocols.  ... 
doi:10.1109/access.2019.2910312 fatcat:l377xhz3ybg4ray3qbwt47teii

Multi-dimensional intra-tile parallelization for memory-starved stencil computations [article]

Tareq Malas, Georg Hager, Hatem Ltaief, David Keyes
2015 arXiv   pre-print
Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades.  ...  It is thus well suited for future architectures that will be strongly challenged by the cost of data movement, be it in terms of performance or energy consumption.  ...  ACKNOWLEDGMENTS For computer time, this research used the resources of the Extreme Computing Research Center (ECRC) at KAUST. The authors thank the ECRC for supporting T. Malas.  ... 
arXiv:1510.04995v1 fatcat:twbfi3zicbe7bdu3hgn7d37h7q

The BLIS Framework

Field G. Van Zee, Vernon Austel, John A. Gunnels, Lee Killough, Tyler M. Smith, Bryan Marker, Tze Meng Low, Robert A. Van De Geijn, Francisco D. Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler
2016 ACM Transactions on Mathematical Software  
The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures.  ...  We show how, with very little effort, the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the  ...  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).  ... 
doi:10.1145/2755561 fatcat:yrv7amzpyvexdiimqutxtij5zm

Hardware and Software Solutions for Energy-Efficient Computing in Scientific Programming

Daniele D'Agostino, Ivan Merelli, Marco Aldinucci, Daniele Cesini, Cristian Mateos
2021 Scientific Programming  
programming because the local computational capabilities are typically limited and require a careful evaluation of power consumption.  ...  is currently inverting this trend due to the huge amount of data it generates, pushing computing power back to places where the data are generated—the so-called fog/edge computing.  ...  Focusing on the Energy toolbox, it analyses projects available in an online repository (e.g., GitHub) on the machine running the Docker container with regard to its energy efficiency. This means it finds  ... 
doi:10.1155/2021/5514284 fatcat:xcnglwhhabcylokuyknabd2oyu

The database architectures research group at CWI

Martin Kersten, Stefan Manegold, Sjoerd Mullender
2012 SIGMOD record  
Ongoing work aims at combining adaptive indexing techniques with the ideas of physical design and auto-tuning tools.  ...  Most likely, we can re-use many of the techniques developed in the context of MonetDB/XQuery, in particular run-time query optimization [12].  ... 
doi:10.1145/2094114.2094124 fatcat:ytc5c2o2rzegtmktclkkxeagqa

Proactive Control of Approximate Programs

Xin Sui, Andrew Lenharth, Donald S. Fussell, Keshav Pingali
2016 SIGPLAN notices  
to determine, for a desired level of approximation, knob settings that optimize metrics such as running time or energy usage.  ...  components in the program.  ...  Auto-tuning Auto-tuning explores a space of exact implementations to optimize a cost metric like running time; in contrast, the control problem defined in this paper deals with both error and cost dimensions  ... 
doi:10.1145/2954679.2872402 fatcat:tm2mzk4g55c53ghl3f3hngti4e

Proactive Control of Approximate Programs

Xin Sui, Andrew Lenharth, Donald S. Fussell, Keshav Pingali
2016 Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '16  
to determine, for a desired level of approximation, knob settings that optimize metrics such as running time or energy usage.  ...  components in the program.  ...  Auto-tuning Auto-tuning explores a space of exact implementations to optimize a cost metric like running time; in contrast, the control problem defined in this paper deals with both error and cost dimensions  ... 
doi:10.1145/2872362.2872402 dblp:conf/asplos/SuiLFP16 fatcat:ecfjykfjszc4rdgbkeei2jz7qq

Proactive Control of Approximate Programs

Xin Sui, Andrew Lenharth, Donald S. Fussell, Keshav Pingali
2016 SIGARCH Computer Architecture News  
to determine, for a desired level of approximation, knob settings that optimize metrics such as running time or energy usage.  ...  components in the program.  ...  Auto-tuning Auto-tuning explores a space of exact implementations to optimize a cost metric like running time; in contrast, the control problem defined in this paper deals with both error and cost dimensions  ... 
doi:10.1145/2980024.2872402 fatcat:n7b3d6e6ezgvlfek6fifywwjhu

Proactive Control of Approximate Programs

Xin Sui, Andrew Lenharth, Donald S. Fussell, Keshav Pingali
2016 ACM SIGOPS Operating Systems Review  
to determine, for a desired level of approximation, knob settings that optimize metrics such as running time or energy usage.  ...  components in the program.  ...  Auto-tuning Auto-tuning explores a space of exact implementations to optimize a cost metric like running time; in contrast, the control problem defined in this paper deals with both error and cost dimensions  ... 
doi:10.1145/2954680.2872402 fatcat:yyurdnqlxngbzhx7wf324refuq

D9.2.2: Final Software Evaluation Report

Jose Carlos, Guillaume Colin de Verdière, Matthieu Hautreux, Giannis Koutsou
2012 Zenodo  
The characteristics of these prototypes were selected in order to allow investigation into a number of key aspects relevant to high performance computing, namely interconnects, I/O, energy efficiency and  ...  This deliverable reports on the latest software developments in high performance computing, as identified by the PRACE-1IP, WP9 members.  ...  Thus an auto-tune tool may be interesting to use during the runtime of the application, to find the optimal block size. 4.  ... 
doi:10.5281/zenodo.6553027 fatcat:6vbrtqizm5eutmmskf44eltoqq

KIST

Rob Jansen, Matthew Traudt, John Geddes, Chris Wacek, Micah Sherr, Paul Syverson
2018 ACM Transactions on Privacy and Security  
Our analysis indicates that congestion occurs almost exclusively inside of the kernel egress socket buffers, dwarfing the Tor and the kernel ingress buffer times.  ...  Tor [14] is the most popular overlay network for communicating anonymously online.  ...  We thank Andrea Shepard for implementing an initial version of Tor's socket scheduling code.  ... 
doi:10.1145/3278121 fatcat:f5bwgv2lqnh6fg3c7hrad4qkle

Software challenges in extreme scale systems

Vivek Sarkar, William Harrod, Allan E Snavely
2009 Journal of Physics, Conference Series  
the fastest possible implementations of circuits such as adders with limited fan-in blocks (known as the Kogge-Stone adder).  ...  processor which could run in both SIMD and MIMD modes.  ...  Global Auto-tuning and Dynamic Optimizations Compilers will play an important role in auto-tuning and dynamic optimizations, systematically applying the set of optimizations described in this section to  ... 
doi:10.1088/1742-6596/180/1/012045 fatcat:iukutry2dvbitfdh6ng7kgz564