
Compiler directed memory management policy for numerical programs

Mohammad Malkawi, Janak Patel
1985 Proceedings of the tenth ACM symposium on Operating systems principles - SOSP '85  
A Compiler Directed Memory Management Policy for numerical programs is described in this paper.  ...  Using this information, the compiler can insert some directives into the operating system for effective management of the memory hierarchy.  ...  The OS uses these directives for memory management purposes. The resulting Compiler Directed Memory Management Policy (CD) works as follows.  ... 
doi:10.1145/323647.323638 dblp:conf/sosp/MalkawiP85 fatcat:zeowi6h3mngpzklqiwobr3pz7y
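The paper's own directive format is not reproduced in the snippet above. As a loose modern analogue only (an assumption, not the 1985 interface), the sketch below shows the kind of paging hints a compiler could emit around a sequential sweep, expressed with POSIX madvise():

```c
/* Minimal sketch (not the paper's interface): paging directives a compiler
 * might emit for the OS before and after a sequential array sweep. */
#include <stdio.h>
#include <sys/mman.h>

#define N (1L << 22)

int main(void)
{
    size_t bytes = N * sizeof(double);
    double *a = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) return 1;

    /* Directive 1: the region will be swept sequentially, so the pager can
     * prefetch ahead and evict pages behind the sweep. */
    madvise(a, bytes, MADV_SEQUENTIAL);

    double s = 0.0;
    for (long i = 0; i < N; i++)
        s += a[i];

    /* Directive 2: the data is no longer needed; the pager may reclaim it. */
    madvise(a, bytes, MADV_DONTNEED);

    printf("%f\n", s);
    munmap(a, bytes);
    return 0;
}
```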

Energy Management of Virtual Memory on Diskless Devices [chapter]

Jerry Hom, Ulrich Kremer
2003 Compilers and Operating Systems for Low Power  
The compiler activates and deactivates the communication card based on compile-time knowledge of the past and future memory footprint of an application.  ...  Preliminary experiments based on the SimpleScalar simulation toolset and three numerical programs indicate the potential benefits of the new technique. *  ...  In this paper, we investigate the potential benefit of compiler directed resource management for a system resource such as a wireless communication card.  ... 
doi:10.1007/978-1-4419-9292-5_6 fatcat:pupwrrau2zhjtcioue6debdc44
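The snippet describes compiler-inserted calls that keep the wireless card powered down except when remote paging is expected. The runtime interface is not given in the snippet, so the sketch below uses hypothetical card_wake()/card_sleep() helpers purely to show where such calls would be placed:

```c
/* Sketch only: card_wake()/card_sleep() are hypothetical stand-ins for the
 * runtime calls the compiler would insert; they are not from the paper. */
#include <stdio.h>

static void card_wake(void)  { puts("card: active"); }   /* power up NIC   */
static void card_sleep(void) { puts("card: sleeping"); } /* power down NIC */

#define N 1024

/* Compute phase whose footprint fits in local RAM: no remote paging needed. */
static double local_phase(const double *a)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i] * a[i];
    return s;
}

int main(void)
{
    static double a[N];

    card_sleep();            /* compiler knows this region pages nothing in  */
    double s = local_phase(a);

    card_wake();             /* compiler predicts remote pages are needed    */
    /* ... remote paging over the wireless link would happen here ...        */
    card_sleep();

    printf("%f\n", s);
    return 0;
}
```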

OpenCL vs OpenACC: Lessons from Development of Lattice QCD Simulation Code

H. Matsufuru, S. Aoki, T. Aoyama, K. Kanaya, S. Motoki, Y. Namekawa, H. Nemura, Y. Taniguchi, S. Ueda, N. Ukita
2015 Procedia Computer Science  
OpenCL and OpenACC are generic frameworks for heterogeneous programming using CPU and accelerator devices such as GPUs.  ...  In this paper, we apply these two frameworks to a general-purpose code set for numerical simulations of lattice QCD, a computational study of elementary particle physics based on the Monte Carlo method  ...  This project is supported by the Joint Institute for Computational Fundamental Science and HPCI Strategic Program Field 5 'The origin of matter and the universe'.  ... 
doi:10.1016/j.procs.2015.05.316 fatcat:7wnbfw6rfzddxdcvmjpxjhbbdi
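For orientation, the directive-based style that OpenACC brings to this comparison looks like the minimal C loop below (not taken from the lattice QCD code set); the same operation in OpenCL would require explicit device, buffer, and kernel management.

```c
/* Minimal OpenACC illustration: the loop runs on the accelerator and the
 * data clauses describe the required host<->device transfers. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static float x[N], y[N];
    const float a = 2.0f;

    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Copy x in, copy y in and back out, run the loop on the device. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```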

Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture

A. E. Eichenberger, J. K. O'Brien, K. M. O'Brien, P. Wu, T. Chen, P. H. Oden, D. A. Prener, J. C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang (+5 others)
2006 IBM Systems Journal  
streaming workloads, a local memory, and a globally coherent DMA (direct memory access) engine.  ...  , for which fast response times and a full-featured programming environment are critical.  ...  The compiler provides user-guided parallelization and compiler management of the underlying memories for code and data.  ... 
doi:10.1147/sj.451.0059 fatcat:x67guy5bpragrl3hookkbsqpn4

A highly flexible, parallel virtual machine: design and experience of ILDJIT

Simone Campanoni, Giovanni Agosta, Stefano Crespi Reghizzi, Andrea Di Biagio
2010 Software, Practice & Experience  
tasks; on the other hand, it provides a flexible, modular and adaptive framework for dynamic code optimization.  ...  Even when running on a single core, the ILDJIT adaptive optimization framework manages to speed up the computation with respect to other open source implementations of ECMA-335. key words: dynamic adaptation  ...  In particular we would like to credit Michele Tartara for ARM support, and Ettore Speziale for his contribution to the support of the CIL delegates.  ... 
doi:10.1002/spe.950 fatcat:jmtr5keofjacdbtyqrqczvshku

Programming the FlexRAM parallel intelligent memory system

Basilio B. Fraguela, Jose Renau, Paul Feautrier, David Padua, Josep Torrellas
2003 Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03  
To program it, we propose a family of high-level compiler directives inspired by OpenMP called CFlex.  ...  Such directives enable the processors in memory to execute the program in cooperation with the main processor.  ...  RELATED WORK Some proposals for programming intelligent memories [6, 10, 15] force the programmer to directly manage low-level operations such as communication via messages, cache management, data layout  ... 
doi:10.1145/781503.781505 fatcat:hojojore6faexgo3qca2lex2oq
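The snippet says the CFlex directives are inspired by OpenMP but does not show their syntax. As a reference point only, the OpenMP style they build on annotates a loop as below; CFlex applies analogous annotations to offload such loops to the processors in memory.

```c
/* OpenMP reference point, not CFlex itself: a loop annotated for parallel
 * execution in the directive style the CFlex proposal builds on. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i] + 1.0;

    printf("b[0] = %f (threads available: %d)\n", b[0], omp_get_max_threads());
    return 0;
}
```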

Optimizing UPC Programs for Multi-Core Systems

Yili Zheng
2010 Scientific Programming  
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems  ...  Second, we use two numerical computing kernels, parallel matrix–matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization for UPC applications.  ...  PGAS programming models provide programming convenience similar to shared-memory programming models and at the same time enable users to manage data locality explicitly.  ... 
doi:10.1155/2010/646829 fatcat:q63ngpj47jblhfzbfcdehsmuyi
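A minimal UPC fragment (UPC is a parallel extension of C; this example is not taken from the paper) illustrates the PGAS locality management the entry refers to: a block-distributed shared array and a upc_forall loop whose affinity expression keeps each iteration on the thread that owns the data.

```c
/* UPC sketch: block-distributed shared arrays plus owner-computes iteration. */
#include <upc.h>
#include <stdio.h>

#define BLK 256

shared [BLK] double a[BLK * THREADS];   /* one block per thread */
shared [BLK] double b[BLK * THREADS];

int main(void)
{
    int i;

    /* The affinity expression &a[i] makes the thread that owns a[i] run
     * iteration i, so the accesses below stay in local memory. */
    upc_forall (i = 0; i < BLK * THREADS; i++; &a[i])
        b[i] = 2.0 * a[i] + 1.0;

    upc_barrier;
    if (MYTHREAD == 0)
        printf("b[0] = %f\n", b[0]);
    return 0;
}
```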

Memory Affinity for Hierarchical Shared Memory Multiprocessors

Christiane Pousa Ribeiro, Jean-Francois Mehaut, Alexandre Carissimi, Marcio Castro, Luiz Gustavo Fernandes
2009 2009 21st International Symposium on Computer Architecture and High Performance Computing  
However, most of these solutions did not address optimizations for numerical scientific data (array data structures) or portability issues.  ...  Besides, these solutions provide a restricted set of memory policies to deal with data placement.  ...  It provides a wide set of memory policies to manage data allocation, distribution and access for scientific HPC applications based on the shared memory programming model over Linux ccNUMAs.  ... 
doi:10.1109/sbac-pad.2009.16 dblp:conf/sbac-pad/RibeiroMCCF09 fatcat:hfef65acazbnlo3iicepbszopu
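The memory policies mentioned here belong to the authors' own interface, whose calls are not shown in the snippet. As a clearly substituted analogue, Linux's libnuma offers the same kind of explicit placement control:

```c
/* libnuma stand-in (not the paper's interface): place an array on a chosen
 * NUMA node so threads running there access it locally.
 * Build with: gcc affinity.c -lnuma */
#include <stdio.h>
#include <numa.h>

#define N (1 << 20)

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }

    /* Bind the allocation to node 0; the paper's policies play a comparable
     * role at a higher level of abstraction. */
    double *a = numa_alloc_onnode(N * sizeof(double), 0);
    if (!a) return 1;

    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    printf("a[42] = %f\n", a[42]);
    numa_free(a, N * sizeof(double));
    return 0;
}
```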

Optimizing Compiler for the CELL Processor

A.E. Eichenberger, K. O'Brien, K. O'Brien, Peng Wu, Tong Chen, P.H. Oden, D.A. Prener, J.C. Shepherd, Byoungro So, Z. Sura, A. Wang, Tao Zhang (+2 others)
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
Developed for multimedia and game applications, as well as other numerically intensive workloads, the CELL processor provides support both for highly parallel codes, which have high computation and memory  ...  requirements, and for scalar codes, which require fast response time and a full-featured programming environment.  ...  In our approach, we attempt to abstract the concept of separate memories by allocating SPE program data in system memory and having the compiler automatically manage the movement of this data between  ... 
doi:10.1109/pact.2005.33 dblp:conf/IEEEpact/EichenbergerOOWCOPSSSWZZG05 fatcat:e54xideiibbn7dp5tqsh34oxgq
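A sketch of what the compiler-managed data movement amounts to on the SPE side, using the Cell SDK's spu_mfcio.h DMA intrinsics (illustrative only; the paper's compiler emits such transfers automatically rather than requiring them by hand):

```c
/* SPE-side sketch: pull a tile of system-memory data into the 256 KB local
 * store, compute on it, and push it back via the MFC DMA engine. */
#include <spu_mfcio.h>

#define TILE 1024

static double tile[TILE] __attribute__((aligned(128)));

void process_tile(unsigned long long ea)   /* effective address in system memory */
{
    const unsigned int tag = 3;

    /* DMA: system memory -> local store, then wait for completion. */
    mfc_get(tile, ea, sizeof(tile), tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    for (int i = 0; i < TILE; i++)
        tile[i] *= 2.0;

    /* DMA: local store -> system memory, then wait again. */
    mfc_put(tile, ea, sizeof(tile), tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}
```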

Lightweight module isolation for sensor nodes

Nirmal Weerasinghe, Geoff Coulson
2008 Proceedings of the First Workshop on Virtualization in Mobile Computing - MobiVirt '08  
In conventional systems, isolation is achieved using standard memory management hardware; but this is not a cost-effective or energy-efficient solution for small, cheap embedded nodes.  ...  This is achieved by frontloading effort into offline compilation phases and leaving only a small amount of work to be done at load time and run time.  ...  for memory management and calling across protection domains.  ... 
doi:10.1145/1622103.1629655 dblp:conf/mobisys/WeerasingheC08 fatcat:y43emd2j65f5vkdh7m57mouwb4
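The snippet does not spell out the isolation mechanism, so the sketch below shows the generic software-fault-isolation idea it relates to: confining a module's stores to its own region by address masking, with no MMU involved. The helper name is hypothetical.

```c
/* Generic software-fault-isolation sketch (not the paper's exact scheme):
 * every store the module makes is forced into its own region by masking. */
#include <stdio.h>
#include <stdint.h>

#define REGION_BITS 10                       /* 1 KB sandbox region */
#define REGION_SIZE (1u << REGION_BITS)

static uint8_t sandbox[REGION_SIZE];

/* The loader/compiler rewrites the module's stores to go through this helper. */
static inline void sandboxed_store(uintptr_t offset, uint8_t value)
{
    sandbox[offset & (REGION_SIZE - 1)] = value;   /* mask clamps the address */
}

int main(void)
{
    sandboxed_store(5, 42);                  /* in-bounds store                 */
    sandboxed_store(REGION_SIZE + 5, 7);     /* wild store, clamped to index 5  */
    printf("sandbox[5] = %u\n", sandbox[5]); /* prints 7: the clamp contained it */
    return 0;
}
```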

TARGETING HETEROGENEOUS ARCHITECTURES VIA MACRO DATA FLOW

M. ALDINUCCI, M. DANELUTTO, P. KILPATRICK, M. TORQUATI
2012 Parallel Processing Letters  
We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs.  ...  We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more "domain specific" programming  ...  Therefore, efficient caching policies may be implemented to avoid unnecessary traffic on the PCIe bus moving data to and from GPU memory.  ... 
doi:10.1142/s0129626412400063 fatcat:5pb4rp6xrzh4tf4pdzeg7obkby
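A minimal sketch of the macro data-flow firing rule such a run time system schedules around (the struct and function names are illustrative, not the paper's API): a coarse-grained instruction fires once all of its input tokens have arrived.

```c
/* Macro data-flow sketch: a task fires when its pending-input count hits zero. */
#include <stdio.h>

typedef struct mdf_instr {
    int  pending;                          /* input tokens still missing       */
    void (*fire)(struct mdf_instr *self);  /* coarse-grained task body         */
    double in[2], out;                     /* token storage                    */
} mdf_instr;

static void add_task(mdf_instr *t) { t->out = t->in[0] + t->in[1]; }

/* Deliver one token; fire the instruction when the last token arrives. */
static void deliver(mdf_instr *t, int slot, double value)
{
    t->in[slot] = value;
    if (--t->pending == 0)
        t->fire(t);                        /* a real run time would enqueue it */
}

int main(void)
{
    mdf_instr t = { .pending = 2, .fire = add_task };
    deliver(&t, 0, 1.5);                   /* not ready yet                    */
    deliver(&t, 1, 2.5);                   /* second token arrives: fires      */
    printf("result = %f\n", t.out);
    return 0;
}
```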

Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control

Sergei Morozov, Sudhanshu Mathur
2011 Computational Economics  
This complication makes the problem a suitable target for massively-parallel computation using graphics processors (GPUs).  ...  between controlling the policy target and learning system parameters.  ...  Managing memory hierarchy is another key to high performance.  ... 
doi:10.1007/s10614-011-9297-4 fatcat:q3eemudykjfc7ie2ulqafizeay

Static Compilation Analysis for Host-Accelerator Communication Optimization [chapter]

Mehdi Amini, Fabien Coelho, François Irigoin, Ronan Keryell
2013 Lecture Notes in Computer Science  
We present an automatic, static program transformation that schedules and generates efficient memory transfers between a computer host and its hardware accelerator, addressing a well-known performance bottleneck  ...  We implemented this transformation as a middle-end compilation pass in the pips/Par4All compiler.  ...  Acknowledgments We are grateful to Béatrice Creusillet, Pierre Jouvelot, and Eugene Ressler for their numerous comments and suggestions which helped us improve our presentation, to Dominique Aubert who  ... 
doi:10.1007/978-3-642-36036-7_16 fatcat:jpftk6kotjbgtnq2pux3ep7wly
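The communication optimization targeted here can be pictured with a hand-written OpenACC version of what the paper's pass derives automatically: hoisting the data region out of the time loop removes one host-accelerator transfer per iteration.

```c
/* Manual OpenACC illustration of the optimized placement: one copy in before
 * the time loop and one copy out after it, instead of transfers every step. */
#include <stdio.h>

#define N 100000
#define STEPS 50

int main(void)
{
    static float a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0f;

    #pragma acc data copy(a[0:N])
    for (int t = 0; t < STEPS; t++) {
        #pragma acc parallel loop
        for (int i = 0; i < N; i++)
            a[i] = 0.5f * a[i] + 1.0f;
    }

    printf("a[0] = %f\n", a[0]);
    return 0;
}
```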
Showing results 1 — 15 out of 34,724 results