A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code

Pramod G. Joisha, Robert S. Schreiber, Prithviraj Banerjee, Hans J. Boehm, Dhruva R. Chakrabarti
2011 SIGPLAN notices  
A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied on parallel code, for reasons of correctness.  ...  The solution has been implemented in a widely used compiler.  ...  We thank the anonymous POPL referees for their keen and valuable feedback on drafts of the abridged version, which helped improve this work.  ... 
doi:10.1145/1925844.1926457 fatcat:osqppyo7lrfrvf3vfnx6m6uhpa

A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code

Pramod G. Joisha, Robert S. Schreiber, Prithviraj Banerjee, Hans J. Boehm, Dhruva R. Chakrabarti
2011 Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '11  
A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied on parallel code, for reasons of correctness.  ...  The solution has been implemented in a widely used compiler.  ...  We thank the anonymous POPL referees for their keen and valuable feedback on drafts of the abridged version, which helped improve this work.  ... 
doi:10.1145/1926385.1926457 dblp:conf/popl/JoishaSBBC11 fatcat:sgivwwvzrrcgplicoa572d5r4u

Future wireless convergence platforms

John Glossner, Stamatis Vassiliadis, Mayan Moudgill, Daniel Iancu, Gary Nacer, Sanjay Jinturkar, Stuart Stanley, Michael Samori, Tanuj Raja, Michael Schulte
2005 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '05  
The processor is programmed in C with supercomputer-class compiler support for automatic vectorization, multithreading, and DSP semantic analysis.  ...  From a processor architecture perspective, support for signal processing (both audio and video), control code, and Java execution will be required in a convergent device.  ...  The compiler also automatically parallelizes and multithreads programs.  ... 
doi:10.1145/1084834.1084841 dblp:conf/codes/GlossnerMINJSSRSV05 fatcat:6gigbaib6zgqtht3356k4mhqfu

A software-defined communications baseband design

J. Glossner, D. Iancu, Jin Lu, E. Hokenek, M. Moudgill
2003 IEEE Communications Magazine  
Our solution is programmed in C and executed on a multithreaded processor in real-time.  ...  Software Defined Radios (SDRs) offer a programmable and dynamically reconfigurable method of reusing hardware to implement the physical layer processing of multiple communications systems.  ...  Instead, classical DSP architectures have developed a unique set of performance enhancing techniques that are optimized for their intended market.  ... 
doi:10.1109/mcom.2003.1166669 fatcat:ulgis2pdhzd3niw2vwvc3p5spa

Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading [chapter]

Sunil Shrestha, Joseph Manzano, Andres Marquez, John Feo, Guang R. Gao
2015 Lecture Notes in Computer Science  
The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism; and a data-flow inspired runtime library that allows the expression  ...  It takes advantage of polyhedral analysis and transformation in the form of PLUTO [6], combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels.  ...  In these techniques, a single thread effectively maximizes reuse from caches before heading back to the main memory.  ... 
doi:10.1007/978-3-319-17473-0_11 fatcat:z4mrrvucyrecvezcdvdkw4xlrq

A Low-Power Multithreaded Processor for Software Defined Radio

Michael Schulte, John Glossner, Sanjay Jinturkar, Mayan Moudgill, Suman Mamidi, Stamatis Vassiliadis
2006 Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology  
In this paper, we present the design of the Sandblaster Processor, a low-power multithreaded digital signal processor for software defined radio.  ...  Using a super-computer class vectorizing compiler, the SB3010 achieves real-time performance in software on a variety of communication protocols including 802.11b, GPS, AM/FM radio, Bluetooth, GPRS, and  ...  In addition to applying a number of well-known scalar and loop optimizations, the compiler applies DSP optimizations, vector optimizations, and automatic parallel multithreaded optimizations.  ... 
doi:10.1007/s11265-006-7267-1 fatcat:nfqhlyks6bhmnfyg2opfpsm4xy

Performance scalability of decoupled software pipelining

Ram Rangan, Neil Vachharajani, Guilherme Ottoni, David I. August
2008 ACM Transactions on Architecture and Code Optimization (TACO)  
To that end, this paper evaluates the performance scalability of a general-purpose PMT technique called decoupled software pipelining (DSWP) and presents a thorough analysis of the communication bottlenecks  ...  These desirable properties make PMT techniques strong candidates for program parallelization on current and future multicore processors and understanding their performance characteristics is critical to  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their valuable feedback. Their critical comments helped improve the focus and the quality of the presentation.  ... 
doi:10.1145/1400112.1400113 fatcat:5s2b7ckgynecbluwicyajswjay

Dynamic tiling for effective use of shared caches on multithreaded processors

Dimitrios S. Nikolopoulos
2004 International Journal of High Performance Computing and Networking  
The key idea is to use two tile sizes in the program, one for single-threaded execution mode and one suitable for multithreaded execution mode, and switch between tile sizes at runtime.  ...  The paper presents an implementation of these transformations along with runtime mechanisms for detecting cache contention between threads and reacting to it on-the-fly.  ...  Acknowledgement: The authors would like to thank the IJHPCN referees for several helpful suggestions. A preliminary version of this work has been published (Nikolopoulos, 2003).  ... 
doi:10.1504/ijhpcn.2004.009265 fatcat:as5smmoulnaavnchbcokabjbmu

Performance evaluation of the sparse matrix-vector multiplication on modern architectures

Georgios Goumas, Kornilios Kourtis, Nikos Anastopoulos, Vasileios Karakasis, Nectarios Koziris
2008 Journal of Supercomputing  
Based on our experiments, we extract useful conclusions that can serve as guidelines for the optimization process of both single and multithreaded versions of the kernel.  ...  However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided, and thus unsuccessful attempts for optimization  ...  Acknowledgements This research is supported by the PENED 2003 Project (EPAN), co-funded by the European Social Fund (80%) and National Resources (20%).  ... 
doi:10.1007/s11227-008-0251-8 fatcat:xp4boqj3f5c3xozcdlu4slwrrq

Performance Optimization for Android Applications on x86 [chapter]

Ryan Cohen, Tao Wang
2014 Android Application Development for the Intel® Platform  
Automatic Optimization by the Compiler Modern compilers can automatically complete the most common code optimizations, and this is the preferred way to optimize.  ...  Similarly, a search can be done on a large space using the hash method and thereby eliminate the need for a comparison operation. Performance optimization is based on various techniques.  ... 
doi:10.1007/978-1-4842-0100-8_11 fatcat:da2oaugapzhxhgyvtc5snpxcx4

Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations

Daniel Langenkämper, Tobias Jakobi, Dustin Feld, Lukas Jelonek, Alexander Goesmann, Tim W. Nattkemper
2016 Frontiers in Genetics  
FIGURE 1 | Multithreading: The 4^k = 256 4-mers (depicted by the numbers 1-256) are processed on a single computer with four processors (depicted by the rectangular boxes at the bottom).  ...  Each processor computes a quarter of all k-mer counts.  ...  The automatic optimizer ROCC compiler is immature and rather limited in the code it can transform.  ... 
doi:10.3389/fgene.2016.00005 pmid:26904094 pmcid:PMC4748744 fatcat:lcrj4hxpgnfmdhl23m6rihuj4a

XFOR: Filling the Gap between Automatic Loop Optimization and Peak Performance

Imen Fassi, Philippe Clauss
2015 2015 14th International Symposium on Parallel and Distributed Computing  
We show that such a programming structure makes it possible to fill important optimization gaps left by automatic loop optimizers.  ...  We highlight five important gaps filled by xfor which are: insufficient data locality optimization, excess of conditional branches in the generated code, too verbose code with too many machine instructions  ...  Finally, the ever evolving hardware complexity and the nature of the codes generated by back-end compilers are also important issues preventing automatic optimizers from being wholly foolproof, since they  ... 
doi:10.1109/ispdc.2015.19 dblp:conf/ispdc/FassiC15 fatcat:zakiivulafbzdim54g2hjypwdu

Dynamic analysis of java applications for multithreaded antipatterns

S. Boroday, A. Petrenko, J. Singh, H. Hallal
2005 Software engineering notes  
We use the tracing platform of the Eclipse IDE and state-of-the-art model checker Spin.  ...  We implement and compare an ad-hoc custom approach and a formal approach to detect common bug patterns in multithreaded Java software.  ...  with a modern optimizing Java compiler.  ... 
doi:10.1145/1082983.1083247 fatcat:gtfxes2365frjoiwag5s2dfeh4

MTCrossBit: A dynamic binary translation system based on multithreaded optimization

HaiBing Guan, RuHui Ma, HongBo Yang, YinDong Yang, Liang Liu, Ying Chen
2011 Science China Information Sciences  
However, almost all the existing dynamic optimization techniques or methods employed in DBT systems for a single-threaded executive environment considerably increase the complexity of the hardware or the  ...  We propose a multithreaded DBT framework with no associated hardware called the MTCross-Bit, where a helper thread for building a hot trace is employed to significantly reduce the overhead.  ...  Acknowledgements This work was supported by the National Natural Science Foundation of China (Grant Nos. 60970108, 60970107), the Science and Technology Commission of Shanghai Municipality (Grant Nos.  ... 
doi:10.1007/s11432-011-4414-5 fatcat:yaazmmdpfbbcbanlh5wh7hxokm

Efficient and thread-safe objects for dynamically-typed languages

Benoit Daloze, Stefan Marr, Daniele Bonetta, Hanspeter Mössenböck
2016 Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications - OOPSLA 2016  
We would like to thank Kevin Menard, Chris Seaton and Andreas Wöß for their careful reviews. Stefan Marr was funded by a grant of the Austrian Science Fund (FWF), project number I2491-N31.  ...  Acknowledgments: We gratefully acknowledge the support of the Virtual Machine Research Group at Oracle Labs, the Institute for System Software at JKU Linz and everyone else who has contributed to Graal  ...  It is based on the notion of self-optimizing AST interpreters that can be compiled to highly-optimized machine code by the Graal compiler [34], which uses partial evaluation.  ... 
doi:10.1145/2983990.2984001 dblp:conf/oopsla/DalozeMBM16 fatcat:225ctir33jcf3p3x2hp5bsny7m
Showing results 1 to 15 of 1,158 results