26,482 Hits in 6.1 sec

Common runtime support for high-performance parallel languages

1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing - Supercomputing '93  
The runtime system must provide support for processes to locate data in the distributed address space and to manage the local memory.  ...  language and compiler support for parallel machines.  ... 
doi:10.1145/169627.169826 dblp:conf/sc/FoxRSMBCCCCEFFGHKKLLLOPPSSWY93 fatcat:uashkzwp65goxdjpj6jim4pjqy

Collective Mind: Towards Practical and Collaborative Auto-Tuning

Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento
2014 Scientific Programming  
spaces, excessively long exploration times, and lack of unified mechanisms for preserving and sharing optimization knowledge and research material.  ...  Empirical auto-tuning and machine learning techniques have shown high potential to improve execution time, power consumption, code size, reliability and other important metrics of various applications  ...  space for a given program or distribute tuning of a default compiler optimization heuristic across many machines using a set of shared benchmarks.  ...
doi:10.1155/2014/797348 fatcat:yeo2tkbskbgdxa3tfg6zumvnqm
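
To make the empirical auto-tuning idea above concrete, here is a minimal C sketch of one exploration step: build a program under several optimization-flag combinations, time each resulting binary, and keep the fastest. The file name kernel.c and the flag list are illustrative assumptions, not artifacts of the Collective Mind framework.

    /* Minimal empirical auto-tuning loop: compile a kernel under several
     * flag sets, time each run, and report the fastest configuration.
     * "kernel.c" and the flag list are hypothetical placeholders. */
    #define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double run_timed(const char *cmd) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (system(cmd) != 0) return -1.0;      /* command failed */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        const char *flags[] = { "-O1", "-O2", "-O3", "-O3 -funroll-loops" };
        double best = -1.0;
        int best_i = -1;
        for (int i = 0; i < 4; i++) {
            char cmd[256];
            snprintf(cmd, sizeof cmd, "cc %s -o kernel kernel.c", flags[i]);
            if (system(cmd) != 0) continue;     /* skip failed builds */
            double dt = run_timed("./kernel");  /* wall-clock time of one run */
            if (dt < 0) continue;
            printf("%-20s %8.3f s\n", flags[i], dt);
            if (best_i < 0 || dt < best) { best = dt; best_i = i; }
        }
        if (best_i >= 0) printf("best: %s (%.3f s)\n", flags[best_i], best);
        return 0;
    }

A real auto-tuner would also repeat runs to control measurement noise and record results in a shared repository, as the paper describes.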

Charlotte: Metacomputing on the Web

A. Baratloo, M. Karaul, Z.M. Kedem, P. Wijckoff
1999 Future Generation Computer Systems  
Charlotte provides distributed shared memory without relying on operating system or compiler support.  ...  We have designed and implemented Charlotte, which goes beyond providing a set of features commonly used for a network of workstations: (1) a user can execute a parallel program on a machine she does not  ...  A novel technique for providing a distributed shared memory abstraction without relying on operating system or compiler support.  ...
doi:10.1016/s0167-739x(99)00009-6 fatcat:v4cjsc7gwfeo7gj6qvhz3ymvzu

Is OpenMP for grids?

R. Eigenmann, J. Hoeflinger, R.H. Kuhn, D. Padua, A. Basumallik, Seung-Jai Min, Jiajing Zhu
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
The first part of the paper discusses a prototype compiler, now under development, that will accept OpenMP and will target TreadMarks, a Software Distributed Shared Memory System (SDSM), and Message-Passing  ...  This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory  ...  to the helpers could only include addresses from the explicitly allocated shared address space.  ...
doi:10.1109/ipdps.2002.1016571 dblp:conf/ipps/EigenmannHKPBMZ02 fatcat:sklzgnldmbfktosjv4c7rr6vwi
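
As an illustration of what such a compiler consumes, the loop below is ordinary OpenMP C; on the planned TreadMarks target, the runtime would satisfy accesses to the shared arrays through software DSM pages rather than hardware shared memory. The arrays and sizes are made up for illustration, not taken from the paper.

    /* Plain OpenMP C of the kind an OpenMP-to-SDSM compiler would translate. */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            b[i] = (double)i;                /* initialize shared data */

        /* Each iteration touches only shared addresses; an SDSM target
         * fetches remote pages on demand instead of issuing loads to
         * hardware shared memory. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }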

Exploiting and-or parallelism in Prolog: The OASys computational model and abstract architecture

I. Vlahavas
1998 Journal of Systems and Software  
OASys is an experimental parallel Prolog system that exploits and/or-parallelism and comprises a computational model, a compiler, an abstract machine and an emulator.  ...  It is based on distributed scheduling and supports recomputation of paths as well as stack copying. The system features modular design, high distribution and minimal inter-processor communication.  ...  Benis and C. Berberidis for their help in the prototype and the collection of performance data. Many thanks to Rong Yang for the help with Andorra-I.  ...
doi:10.1016/s0164-1212(98)10021-3 fatcat:kxeznivlw5dwxcc3r3zvmi4kce

Productivity and performance using partitioned global address space languages

Katherine Yelick, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta (+4 others)
2007 Proceedings of the 2007 international workshop on Parallel symbolic computation - PASCO '07  
Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing.  ...  The result is portable high-performance compilers that run on a large variety of shared and distributed memory multiprocessors.  ...  On a shared memory machine, accesses to the global address space translate into conventional load/store instructions, while on distributed memory machines, they translate into calls to the GASNet layer  ...
doi:10.1145/1278177.1278183 dblp:conf/issac/YelickBCCDDGHHHIKNSWW07 fatcat:hpedjb24vvfkbpi7fbawt6xf4u
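
The load/store-versus-GASNet distinction in the snippet is visible even in a few lines of UPC (a PGAS extension of C). In this sketch, the write to counters[MYTHREAD] is a plain local store, while the read of the neighbor's element may compile into a runtime communication call; the array name is an illustrative assumption.

    /* Minimal UPC sketch of the PGAS model: the same array syntax yields a
     * local store or a (possibly remote) runtime read depending on affinity. */
    #include <upc.h>
    #include <stdio.h>

    shared int counters[THREADS];   /* one element with affinity to each thread */

    int main(void) {
        counters[MYTHREAD] = MYTHREAD;   /* local element: conventional store */
        upc_barrier;                     /* make all writes visible */

        /* The neighbor's element usually lives on another thread, so on a
         * distributed-memory machine this read goes through GASNet. */
        int right = counters[(MYTHREAD + 1) % THREADS];
        printf("thread %d sees neighbor value %d\n", MYTHREAD, right);
        return 0;
    }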

OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices

Mingjie Lin, Ilia Lebedev, John Wawrzynek
2010 2010 International Conference on Field Programmable Logic and Applications  
To this end, we present a combination of low-level virtual machine instruction set, execution model, many-core architecture, and associated compiler to achieve high performance and power efficiency by exploiting the FPGA's distributed memories and abundant hardware structures (such as DSP blocks, long carry-chains, and registers).  ...  Local memory is a section of the address space shared by the threads within a computing core.  ...
doi:10.1109/fpl.2010.93 dblp:conf/fpl/LinLW10 fatcat:2gqc62hvpbe5jczac44zrtgwum
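
The notion of local memory in the snippet matches OpenCL's __local address space, which OpenRCL inherits from the OpenCL execution model. The kernel below is a generic OpenCL C sketch of staging data in per-core local memory; the kernel and buffer names are assumptions, not from the paper.

    /* Generic OpenCL C kernel: __local memory is visible only to work-items
     * in one work-group, i.e. the threads within a single computing core. */
    __kernel void block_sum(__global const float *in,
                            __global float *out,
                            __local float *scratch) {
        size_t lid = get_local_id(0);

        scratch[lid] = in[get_global_id(0)];   /* stage data in local memory */
        barrier(CLK_LOCAL_MEM_FENCE);          /* all work-items see staged data */

        if (lid == 0) {                        /* one work-item reduces the block */
            float s = 0.0f;
            for (size_t i = 0; i < get_local_size(0); i++)
                s += scratch[i];
            out[get_group_id(0)] = s;
        }
    }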

A New Compiler for Space-Time Scheduling of ILP Processors

Rajendra Kumar, P. K. Singh
2011 International Journal of Computer and Electrical Engineering  
Code generation for a parallel register-share architecture involves issues that are not present in sequential code compilation and is inherently complex.  ...  In this paper, we propose RPCC, a compiler for general-purpose sequential programs on the Raw machine.  ...  Inthreads defines a programming model in which threads share context to the maximal possible extent, including most of the architectural registers and the memory address space.  ...
doi:10.7763/ijcee.2011.v3.375 fatcat:zkukpxggjvbv5nhacioojp6uma

Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning [article]

Grigori Fursin
2013 arXiv   pre-print
learning based meta compiler, and unified statistical analysis and machine learning plugins in a public repository to initiate systematic, reproducible and collaborative research, development and experimentation  ...  with unified web interfaces and an on-line advice system.  ...  Grigori is grateful to Francois Bodin and CAPS Entreprise for sharing codelets from the MILEPOST project and for providing access to the latest Codelet Finder tool, to David Kuck and David Wong from  ...
arXiv:1308.2410v1 fatcat:hhlua3tdx5hy5gac4p3igwucja

The Promise of High-Performance Reconfigurable Computing

T. El-Ghazawi, E. El-Araby, Miaoqing Huang, K. Gaj, V. Kindratenko, D. Buell
2008 Computer  
In this work, we propose unified parallel programming models for HPRCs based on the Unified Parallel C programming language (UPC).  ...  For domain scientists who lack hardware design experience, programming these machines is nearly impossible.  ...  A number of threads work independently and each of them can reference any address in the shared space, and also its own private space.  ...
doi:10.1109/mc.2008.65 fatcat:6mwj5aeb6zczdd5it5rwngfwpu

Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications [article]

Kamran Idrees, Tobias Fuchs, Colin W. Glass
2016 arXiv   pre-print
DASH is a library of distributed data structures and algorithms designed for running applications on modern HPC architectures, composed of hierarchical network interconnections and stratified memory  ...  The technique of units mapping is generic and can be adopted in other DART communication substrates and on other hardware platforms.  ...  Acknowledgments This work was supported by the project DASH which is funded by the German Research Foundation (DFG) under the priority program "Software for Exascale Computing -SPPEXA" (2013.  ...
arXiv:1603.01536v1 fatcat:wjezgownqfcr3a2iw2axeydd7y

Optimizing UPC Programs for Multi-Core Systems

Yili Zheng
2010 Scientific Programming  
The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems  ...  Second, we use two numerical computing kernels, parallel matrix–matrix multiplication and parallel 3-D FFT, to demonstrate the end-to-end development and optimization for UPC applications.  ...  Casting shared pointers to local pointers: The global address space in UPC is partitioned across all threads, and each shared datum in the global address space has unique affinity to one thread.  ...
doi:10.1155/2010/646829 fatcat:q63ngpj47jblhfzbfcdehsmuyi
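
The shared-to-local cast mentioned in the snippet is a standard UPC idiom: once a shared element is known to have affinity to the calling thread, its address may be cast to an ordinary C pointer and accessed without shared-pointer overhead. The blocked layout below is a minimal sketch, not code from the paper.

    /* UPC sketch of casting a shared pointer to a local one: legal because
     * the block a[MYTHREAD*N .. MYTHREAD*N+N-1] has affinity to this thread. */
    #include <upc.h>
    #include <stdio.h>

    #define N 4
    shared [N] double a[N * THREADS];   /* N consecutive elements per thread */

    int main(void) {
        double *mine = (double *)&a[MYTHREAD * N];  /* shared-to-local cast */
        for (int i = 0; i < N; i++)
            mine[i] = MYTHREAD + 0.1 * i;           /* plain local stores */
        upc_barrier;
        printf("thread %d wrote %.1f\n", MYTHREAD, mine[0]);
        return 0;
    }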

A Service-Oriented Virtual Machine for Grid Applications

Hong Liu, Wei Li, Xiaoning Wang, Yili Gong, Tian Luo
2006 2006 Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06)  
It also virtualizes services and creates a virtual global system image for grid applications, so services can be transparently distributed and shared.  ...  Grid computing is a new paradigm for distributed computing, and services have become the building blocks of grid applications.  ...  In conventional systems, code and data sharing techniques such as shared libraries, components, and shared objects greatly ease software development.  ...
doi:10.1109/pdcat.2006.18 dblp:conf/pdcat/LiuLWGL06 fatcat:cxwjkwmj5jdftmksgtp5xlrygm

Combining Static and Dynamic Data Coalescing in Unified Parallel C

Michail Alvanos, Montse Farreras, Ettore Tiotto, Jose Nelson Amaral, Xavier Martorell
2016 IEEE Transactions on Parallel and Distributed Systems  
This paper addresses important limitations in the code generation for Partitioned Global Address Space (PGAS) languages.  ...  Significant progress has been made in the development of programming languages and tools that are suitable for hybrid computer architectures that group several shared-memory multicores interconnected through  ...  The inspector-executor strategy is a well-known optimization technique for global name space programs for distributed execution and it has been used [18] for global-address-space language, or language-targeted  ... 
doi:10.1109/tpds.2015.2405551 fatcat:isr4fuw6nvfpzfo4abngauwame
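
For readers unfamiliar with the inspector-executor strategy named in the snippet, the plain-C sketch below shows its shape: an inspector pass records which (possibly remote) indices a loop will touch, a single coalesced gather fetches them, and the executor computes on the local copies. The gather_remote stub is a hypothetical stand-in for a real one-sided communication call.

    /* Shape of the inspector-executor optimization in plain C. */
    #include <stdio.h>

    #define N 8

    /* Stand-in for a one-sided bulk gather from remote memory. */
    static void gather_remote(const double *remote, const int *idx,
                              int n, double *buf) {
        for (int i = 0; i < n; i++)
            buf[i] = remote[idx[i]];
    }

    int main(void) {
        double remote[4 * N], local[N], buf[N];
        int idx[N];
        for (int i = 0; i < 4 * N; i++)
            remote[i] = (double)i;

        /* Inspector: record the elements the loop will need. */
        for (int i = 0; i < N; i++)
            idx[i] = (3 * i) % (4 * N);

        /* One coalesced transfer instead of N fine-grained remote reads. */
        gather_remote(remote, idx, N, buf);

        /* Executor: compute using the prefetched local copies. */
        for (int i = 0; i < N; i++)
            local[i] = 2.0 * buf[i];

        printf("local[0] = %f\n", local[0]);
        return 0;
    }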

Scalable Dynamic Load Balancing Using UPC

Stephen Olivier, Jan Prins
2008 2008 37th International Conference on Parallel Processing  
Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [1] and MPI [2].  ...  However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one-sided reads and writes to minimize the impact of high-latency communication  ...  Acknowledgment The authors thank the Renaissance Computing Institute for the use of the Kitty Hawk cluster and the University of North Carolina for the use of the Topsail cluster and the SGI Altix.  ...
doi:10.1109/icpp.2008.19 dblp:conf/icpp/OlivierP08 fatcat:wgivv2ozofgjlm6fvlgkuxrqvm
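
The one-sided reads the snippet refers to look roughly like the UPC fragment below: upc_memget pulls a whole block from another thread's partition in a single operation, without the owner's participation, which is how work-stealing schemes amortize latency. The chunk size and the work array are illustrative assumptions, not the paper's data structures.

    /* UPC sketch of a one-sided bulk read, as used for low-overhead stealing. */
    #include <upc.h>
    #include <stdio.h>

    #define CHUNK 64
    shared [CHUNK] int work[CHUNK * THREADS];   /* CHUNK items per thread */
    int stolen[CHUNK];                          /* private landing buffer */

    int main(void) {
        for (int i = 0; i < CHUNK; i++)
            work[MYTHREAD * CHUNK + i] = MYTHREAD;   /* fill own chunk */
        upc_barrier;

        int victim = (MYTHREAD + 1) % THREADS;
        /* One-sided read of the victim's whole chunk in a single call. */
        upc_memget(stolen, &work[victim * CHUNK], CHUNK * sizeof(int));
        printf("thread %d copied %d items from thread %d\n",
               MYTHREAD, CHUNK, victim);
        return 0;
    }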
Showing results 1 — 15 out of 26,482 results