Runtime pointer disambiguation

Péricles Alves, Fabian Gruber, Johannes Doerfert, Alexandros Lamprineas, Tobias Grosser, Fabrice Rastello, Fernando Magno Quintão Pereira
2015 Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications - OOPSLA 2015  
We then produce two versions of a code region: one that is aliasing-free, hence easy to optimize, and another that is not. Our checks let us safely branch to the optimizable region.  ...  To optimize code effectively, compilers must deal with memory dependencies.  ...  Acknowledgements We thank the OOPSLA referees for many constructive comments and suggestions. The Brazilian side of this cooperation has been sponsored by CNPq, FAPEMIG, CAPES and LG Electronics.  ... 
doi:10.1145/2814270.2814285 dblp:conf/oopsla/AlvesGDLGRP15 fatcat:uaqqeartwjhirmyvwy2sg7rbuq

Runtime pointer disambiguation

Péricles Alves, Fabian Gruber, Johannes Doerfert, Alexandros Lamprineas, Tobias Grosser, Fabrice Rastello, Fernando Magno Quintão Pereira
2015 SIGPLAN notices  
We then produce two versions of a code region: one that is aliasing-free, hence easy to optimize, and another that is not. Our checks let us safely branch to the optimizable region.  ...  To optimize code effectively, compilers must deal with memory dependencies.  ...  Acknowledgements We thank the OOPSLA referees for many constructive comments and suggestions. The Brazilian side of this cooperation has been sponsored by CNPq, FAPEMIG, CAPES and LG Electronics.  ... 
doi:10.1145/2858965.2814285 fatcat:2xsekwcm4rbcdlnta62w4bxl4i

GPU-accelerated simulations of isolated black holes

Adam G M Lewis, Harald P Pfeiffer
2018 Classical and quantum gravity  
At the highest level we use TLoops, a C++ library of our design, to automatically emit CUDA code equivalent to tensorial expressions written into C++ source using a syntax similar to analytic calculation  ...  Next, we trace out and cache explicit matrix representations of the numerous linear transformations in the SpEC code, which allows these to be performed on the GPU using pre-existing matrix-multiplication  ...  Acknowledgments We thank Nils Deppe and Mark Scheel for helpful discussions. Calculations were performed with the SpEC-code [32].  ... 
doi:10.1088/1361-6382/aab256 fatcat:a47tr4i7hjf2tfszdwxpyippqy

Managing Intervals Efficiently in Object-Relational Databases

Hans-Peter Kriegel, Marco Pötke, Thomas Seidl
2000 Very Large Data Bases Conference  
The height h of the virtual backbone tree corresponds to the current expansion and granularity of the data space but does not depend on n.  ...  By design, the new Relational Interval Tree 1 (RI-tree) employs built-in indexes on an as-they-are basis and is easy to implement.  ...  Therefore the effort of code development and code maintenance is minimal.  ... 
dblp:conf/vldb/KriegelPS00 fatcat:bzdvjjf36baddntyqvtlswflyi

On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia [article]

Nischay Ram Mamidi, Kumar Prasun, Dhruv Saxena, Anil Nemili, Bharatkumar Sharma, S.M. Deshpande
2021 arXiv   pre-print
The programming model CUDA is used to develop the GPU codes. The meshfree solver is based on the least squares kinetic upwind method with entropy variables (q-LSKUM).  ...  The optimised GPU codes are compared with the naive codes, and conclusions are drawn from their performance.  ...  For Fortran, Python, and Julia codes, thread index and block dimensions are used to access values stored in the shared memory.  ... 
arXiv:2108.07031v1 fatcat:riwwhg6yvfchflgavqecfwni2q

Finite Differencing of Computable Expressions

Robert Paige, Shaye Koenig
1982 ACM Transactions on Programming Languages and Systems  
Finite differencing is formally specified in terms of more basic transformations shown to preserve program semantics. Estimates of the speedup that the technique yields are given.  ...  Finite differencing is a program optimization method that generalizes strength reduction, and provides an efficient implementation for a host of program transformations including "iterator inversion."  ...  sorting and was used effectively to obtain a logarithmic speedup of the bankers algorithm (see Appendix A3).  ... 
doi:10.1145/357172.357177 fatcat:dh4gfkkgnnenvmqvspbttjs3ii

Calculation of Stochastic Heating and Emissivity of Cosmic Dust Grains with Optimization for the Intel Many Integrated Core Architecture [article]

Troy A. Porter, Andrey E. Vladimirov
2013 arXiv   pre-print
, and provide code samples and performance benchmarks for each step.  ...  Their absorption of starlight produces emission spectra from the near- to far-infrared, which depends on the sizes and properties of the dust grains, and spectrum of the heating radiation field.  ...  In order to compensate for that, when GCC is used for compiling HEATCODE, the code falls back to using the natural base exponential and logarithmic functions.  ... 
arXiv:1311.4627v1 fatcat:rdvwotcdcbeizjynby5c77oxaa

Reconstructing Hardware Transactional Memory for Workload Optimized Systems [chapter]

Kunal Korgaonkar, Prabhat Jain, Deepak Tomar, Kashyap Garimella, Veezhinathan Kamakoti
2011 Lecture Notes in Computer Science  
We would like to express our thanks to all colleagues who submitted papers and congratulate those whose papers were accepted.  ...  This creates grand challenges to architectural and system designs, as well as to methods of programming these systems, which form the core theme of APPT 2011.  ...  The logarithm algorithm uses an iterative technique that uses table lookup and polynomial approximation as described in [14].  ... 
doi:10.1007/978-3-642-24151-2_1 fatcat:32cx745cn5cfdm5sbeah6eyiey

Parallel computing in information retrieval – an updated review

A. Macfarlane, S.E. Robertson, J.A. Mccann
1997 Journal of Documentation  
We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems.  ...  Each processor is connected to its north, south, east and west neighbour processors (known as a NEWS grid) and to the row and column of the matrix by a bus system.  ...  We are also grateful to the two anonymous referees who gave us valuable comments on an earlier draft of this paper, improving it considerably.  ... 
doi:10.1108/eum0000000007201 fatcat:2zuwtehixbd6xk33hwb3j43nse

An Efficient Lock-Free Logarithmic Search Data Structure Based on Multi-dimensional List

Deli Zhang, Damian Dechev
2016 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS)  
The use of skiplists eliminates the need of rebalancing and ensures amortized logarithmic sequential search time, but concurrency is limited under write-dominated workload because the linkage between multiple  ...  Logarithmic search data structures, such as search trees and skiplists, are fundamental building blocks of many applications.  ...  Their lock-free skiplist uses rotating wheels instead of the usual towers to improve locality of reference and speedup traversals.  ... 
doi:10.1109/icdcs.2016.19 dblp:conf/icdcs/ZhangD16 fatcat:yjdvc7jrh5c6pg3ojwjdipwsu4

High Performance Computing in Chemistry

Marius Lewerenz, Uwe Harms
1998 Journal of Molecular Modeling  
In view of the fast methodological development, serial and parallel code differ marginally in the actual quantum chemical code while a specialized set of library routines supports maintenance, parallelization  ...  in a single multipole expansion.  ...  Bibliography Acknowledgements The authors like to thank Fred Manby and Peter Knowles for their collaboration and support of this work.  ... 
doi:10.1007/s0089480040147 fatcat:43vbtchwk5dddgu6hoqw4h4ijq

STEPS 4.0: Fast and memory-efficient molecular simulations of neurons at the nanoscale [article]

Weiliang Chen, Tristan Carel, Omar Awile, Nicola Cantarutti, Giacomo Castiglioni, Alessandro Cattabiani, Baudouin Del Marmol, Iain Hepburn, James G King, Christos Kotsalos, Pramod Kumbhar, Jules Lallouette (+3 others)
2022 bioRxiv   pre-print
Recent advances in computational neuroscience have demonstrated the usefulness and importance of stochastic, spatial reaction-diffusion simulations.  ...  Current and future improvements to the solver are not sustainable without proper software engineering practices.  ...  Models for validation and performance investigation presented in this publication are available at https://github.com/CNS-OIST/STEPS4ModelRelease/.  ... 
doi:10.1101/2022.03.28.485880 fatcat:ob2owqnrobhpdmsh7v7mq675yq

Multimedia applications

2004 ChoiceReviews  
We demonstrate that it is essential to recognize and parallelize filters with induction variable state to enable scalable parallelization.  ...  Such applications regularly process continuous sequences of data and can be naturally represented under the stream programming domain to take advantage of domain-specific optimizations.  ...  We added iter() to the IR so uses of it can be type-checked by StreamIt's type checker. The desugaring step (see §3.4) is performed after Graph Expansion and before Scheduling.  ... 
doi:10.5860/choice.42-0991 fatcat:kacttdfzjjbr5da342fqd5da6m

Dynamically transforming data structures

Erik Osterlund, Welf Lowe
2013 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE)  
We demonstrate the effect on performance with a transformation ArrayList data structure using an array variant and a linked hash bag variant as alternative internal representations.  ...  Fine-tuning which data structure implementation to use for a given problem is sometimes tedious work since the optimum solution depends on the context, i.e., on the operation sequences, actual parameters  ...  We also thank the anonymous reviewers who helped making this paper better and added some interesting topics for future work.  ... 
doi:10.1109/ase.2013.6693099 dblp:conf/kbse/OsterlundL13 fatcat:l2q4f3lhm5gbfm3guc535y2cii

Randomized Priority Queues for Fast Parallel Access

Peter Sanders
1998 Journal of Parallel and Distributed Computing  
This is exemplified by an application to parallel branch-and-bound.  ...  Generalizations for accessing the k n smallest elements are even more efficient. A portable implementation using MPI demonstrates that our approach is already useful for medium scale parallelism.  ...  Acknowledgements I would like to thank Gerth Brodal, Sajal Das, Wolfgang Dittrich, Christina Pinotti, Thomas Worsch, and an anonymous referee for many helpful comments.  ... 
doi:10.1006/jpdc.1998.1429 fatcat:22tvzrvy6bag3dzc54k4sfiwjm