216 Hits in 6.8 sec

Can traditional programming bridge the ninja performance gap for parallel computing applications?

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Hideki Saito, Rakesh Krishnaiyer, Mikhail Smelyanskiy, Milind Girkar, Pradeep Dubey
2015 Communications of the ACM  
Large, pervasive restructurings that change how a program computes its result are outside of the purview of a traditional compiler.  ...  for modern computers.  ... 
doi:10.1145/2742910 fatcat:prco3b3iifbp7hk4s2emvtd47e

Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Hideki Saito, Rakesh Krishnaiyer, Mikhail Smelyanskiy, Milind Girkar, Pradeep Dubey
2012 SIGARCH Computer Architecture News  
We first quantify the extent of the "Ninja gap", which is the performance gap between naively written C/C++ code that is parallelism-unaware (often serial) and best-optimized code on modern multi-/many-core  ...  manycore architectures in delivering significant speedup, and close-to-optimal performance for commonly used parallel computing workloads.  ...  In this paper, we aim at quantifying the extent of the Ninja gap, analyzing the causes of the gap and investigating how much of the gap can be bridged with low effort using traditional C/C++ programming  ... 
doi:10.1145/2366231.2337210 fatcat:dljosjl2arhrhm3dlgpcm4rome

Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Hideki Saito, Rakesh Krishnaiyer, Mikhail Smelyanskiy, Milind Girkar, Pradeep Dubey
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
We first quantify the extent of the "Ninja gap", which is the performance gap between naively written C/C++ code that is parallelism-unaware (often serial) and best-optimized code on modern multi-/many-core  ...  manycore architectures in delivering significant speedup, and close-to-optimal performance for commonly used parallel computing workloads.  ...  In this paper, we aim at quantifying the extent of the Ninja gap, analyzing the causes of the gap and investigating how much of the gap can be bridged with low effort using traditional C/C++ programming  ... 
doi:10.1109/isca.2012.6237038 dblp:conf/isca/SatishKCSKSGD12 fatcat:fdby4hegwreirnzu7hubzv5puy

Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism

Mojtaba Mehrara, Po-Chun Hsu, Mehrzad Samadi, Scott Mahlke
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
In this work, to exploit hardware concurrency while retaining the traditional sequential programming model, we develop ParaScript, an automatic runtime parallelization system for JavaScript applications on  ...  Furthermore, considering the widespread deployment of multicores in today's computing systems, exploiting parallelism in these applications is a promising approach to meet their performance requirements  ...  This research was supported by the National Science Foundation grant CNS-0964478 and the Gigascale Systems Research Center, one of five research centers funded under the Focus Center Research Program,  ... 
doi:10.1109/hpca.2011.5749719 dblp:conf/hpca/MehraraHSM11 fatcat:coksrw5225g5hpther2lpqajsi

HiPC 2021 Workshop on Parallel Programming in the Exascale Era (PPEE 2021)

Vivek Kumar, Swarnendu Biswas, Vishwesh Jatala
2021 2021 IEEE 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW)  
These systems aim to solve problems that were previously out of reach and to improve the parallel performance of applications by a factor of 50.  ...  The four critical challenges for exascale systems are extreme parallelism, power demand, data movement, and reliability.  ...  While the applicability of such techniques can be broadened, a key problem for Exascale computing to succeed is the need to ensure that loop structures and operations, such as the order of loops, tiling, unrolling  ... 
doi:10.1109/hipcw54834.2021.00014 fatcat:pgtv7sjr7jewnjtgxg5j36n4li

On the Programmability and Performance of Heterogeneous Platforms

Konstantinos Krommydas, Thomas R.W. Scogland, Wu-Chun Feng
2013 2013 International Conference on Parallel and Distributed Systems  
In this paper, we characterize the performance achievable across a range of optimizations, along with their programmability, for multi- and many-core platforms, specifically an Intel Sandy Bridge CPU,  ...  General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization  ...  ACKNOWLEDGMENTS This work was supported in part by the Institute for Critical Technology and Applied Science (ICTAS) and by the National Science Foundation under Grant No. 0916719.  ... 
doi:10.1109/icpads.2013.41 dblp:conf/icpads/KrommydasSF13 fatcat:otrka733cze7fdal55cvmnqsxa

Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures

Mikhail Smelyanskiy, Jason Sewall, Dhiraj D. Kalamkar, Nadathur Satish, Pradeep Dubey, Nikita Astafiev, Ilya Burylov, Andrey Nikolaev, Sergey Maidanov, Shuo Li, Sunil Kulkarni, Charles H. Finan (+1 others)
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
The wide applicability of these models, their computational intensity, and their real-time constraints require high-throughput parallel architectures.  ...  We characterize and compare our workload's performance on two modern, parallel architectures: the Intel® Xeon® Processor E5-2680, and the recently announced Intel® Xeon Phi™ (formerly codenamed  ...  On average, the Ninja gap is 1.9x for SNB-EP and 4x for KNC.  ... 
doi:10.1109/sc.companion.2012.139 dblp:conf/sc/SmelyanskiySKSDABNMLKFG12 fatcat:wi4nkhlfnffhtm7hcbueszzvs4

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

Kaixi Hou, Hao Wang, Wu-chun Feng
2014 2014 43rd International Conference on Parallel Processing Workshops  
However, for data-intensive applications, the bandwidth constraint of MIC hinders the full utilization of computational resources, especially when massive parallelism is required to process big data sets  ...  Our study offers evidence that traditional compiler optimizations can deliver parallel programmability to the masses on the Intel Xeon Phi platform.  ...  This provides positive evidence for using "simple" techniques to fill the Ninja gap [3], which is defined as the performance gap between the code generated by expert programmers (prefer to re-design  ... 
doi:10.1109/icppw.2014.44 dblp:conf/icppw/HouWF14 fatcat:5rn33ekzvncyrpubcrokyy2kpa

Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications

Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed Aledhari, Moussa Ayyash
2015 IEEE Communications Surveys and Tutorials  
The overall picture of IoT emphasizing the vertical markets and the horizontal integration between them.  ...  energy of the device to communicate with other devices and integrate the required services.  ...  the users while bridging the gap between the divergent IoT protocols and performing opportunistic traffic analytics.  ... 
doi:10.1109/comst.2015.2444095 fatcat:5z42s2bulrhilenkh7vztnorfa

A study of mobile device utilization

Cao Gao, Anthony Gutierrez, Madhav Rajan, Ronald G. Dreslinski, Trevor Mudge, Carole-Jean Wu
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
In this paper, we investigate whether the same is true for current mobile applications. We analyze the behavior of a broad range of commonly used mobile applications on real devices.  ...  Further analysis of TLP behavior and big-little core energy efficiency suggests that current mobile workloads can benefit from an architecture that has the flexibility to accommodate both high performance  ...  with the flexibility to satisfy both high performance and good energy-efficiency for different program phases is a good choice for mobile devices.  ... 
doi:10.1109/ispass.2015.7095808 dblp:conf/ispass/GaoGRDMW15 fatcat:kmrqax7rlvexbniyii36wl5aq4

JavaGrande— High Performance Computing with Java [chapter]

Michael Philippsen, Ronald F. Boisvert, Vladimir S. Getov, Roldan Pozo, José Moreira, Dennis Gannon, Geoffrey C. Fox
2001 Lecture Notes in Computer Science  
Why isn't Java commonly used for the compute- or I/O-intensive core of grande applications? The main reason is undoubtedly performance.  ...  Today nearly every JVM for traditional computing devices uses just-in-time (JIT) compiler technology.  ...  A word of thanks to Sun Microsystems, especially to Sia Zadeh, for financial and other support.  ... 
doi:10.1007/3-540-70734-4_5 fatcat:ye2zflsahbeklexukw5riw6pyq

Evaluating automatically parallelized versions of the support vector machine

Valeriu Codreanu, Bob Dröge, David Williams, Burhan Yasar, Po Yang, Baoquan Liu, Feng Dong, Olarik Surinta, Lambert R.B. Schomaker, Jos B.T.M. Roerdink, Marco A. Wiering
2014 Concurrency and Computation  
In contrast to other SVM optimization techniques, the gradient-ascent algorithm can work on all data in parallel, allowing for a large computational performance gain when executed on GPU devices.  ...  Each new hardware generation increases the GPU-CPU performance gap, especially with regard to single-precision floating point operations [6]. This can also be observed in Figure 1.  ...  This gap is referred to as the ninja performance gap and is identified in [30]. Several possible solutions have been proposed for bridging this gap and hence for making parallel programming easier.  ... 
doi:10.1002/cpe.3413 fatcat:yb5gskvpnbguni7ydycu5nxxfy

Achieving Robust, Scalable Cluster I/O in Java [chapter]

Matt Welsh, David Culler
2000 Lecture Notes in Computer Science  
We demonstrate the applicability of Tigris through a one-pass, parallel, disk-to-disk sort exhibiting high performance.  ...  We present Tigris, a high-performance computation and I/O substrate for clusters of workstations that is implemented entirely in Java.  ...  Java has proven to be a viable platform for constructing such applications; what remains now is to bridge the gap between application demands and the mechanisms provided by the underlying platform.  ... 
doi:10.1007/3-540-40889-4_2 fatcat:k2ewe36mkvc4fk2aawu7en62de

Weaving a Formal Methods Education with Problem-Based Learning [chapter]

J Paul Gibson
2008 Communications in Computer and Information Science  
In fact, the best computing problems can be used with children (young and old), undergraduates and postgraduates.  ...  In this paper we present a process for weaving formal methods through a University curriculum that is founded on the application of problem-based learning and a library of good software engineering problems  ...  Formal Methods: Learning Objectives Through our formal methods problems we can verify our high level objective of helping students to bridge the gap between computer science and software engineering by  ... 
doi:10.1007/978-3-540-88479-8_32 fatcat:lvn6aczmafdpzmsebbo54kx4w4

The Three Pillars of Machine Programming [article]

Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B Tenenbaum, Tim Mattson
2021 arXiv pre-print
Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces.  ...  In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research.  ...  More recently, Halide has demonstrated the potential of this approach to bridge the "ninja-gap" by generating code that significantly outperforms expert-tuned codes with only a small amount of high-level  ... 
arXiv:1803.07244v3 fatcat:omlg3emt3fd7ricvr25erjei4i
Showing results 1-15 out of 216 results