A central component of training in Reinforcement Learning (RL) is Experience: the data used for training. The mechanisms used to generate and consume this data have an important effect on the performance of RL algorithms. In this paper, we introduce Reverb: an efficient, extensible, and easy-to-use system designed specifically for experience replay in RL. Reverb is designed to work efficiently in distributed configurations with up to thousands of concurrent clients. The flexible API provides [...] rs with the tools to easily and accurately configure the replay buffer. It includes strategies for selecting and removing elements from the buffer, as well as options for controlling the ratio between sampled and inserted elements. This paper presents the core design of Reverb, gives examples of how it can be applied, and provides empirical results of Reverb's performance characteristics.
arXiv:2102.04736v1 fatcat:ngrmmrsv2vcivdvvgqxjx5mufi
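The ideas named in this abstract (a selection strategy, a removal strategy, and a limit on the ratio of sampled to inserted elements) can be illustrated with a minimal sketch. This is not Reverb's API: the class `ReplaySketch`, its methods, and the `samplesPerInsert` parameter are hypothetical names chosen for illustration, and the sketch assumes uniform sampling with first-in-first-out removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the Reverb API.
class ReplaySketch<T> {
    private final List<T> items = new ArrayList<>();
    private final int maxSize;             // capacity of the buffer
    private final double samplesPerInsert; // assumed cap on the sample/insert ratio
    private final Random rng;
    private long inserts = 0;
    private long samples = 0;

    ReplaySketch(int maxSize, double samplesPerInsert, long seed) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
        this.samplesPerInsert = samplesPerInsert;
        this.rng = new Random(seed);
    }

    // Removal strategy (assumed FIFO): evict the oldest element when full.
    void insert(T item) {
        if (items.size() == maxSize) {
            items.remove(0);
        }
        items.add(item);
        inserts++;
    }

    // Rate limit: one more sample is allowed only while the total number of
    // samples stays within samplesPerInsert times the number of inserts.
    boolean canSample() {
        return !items.isEmpty() && samples + 1 <= samplesPerInsert * inserts;
    }

    // Selection strategy (assumed uniform): pick a stored element at random.
    T sample() {
        if (!canSample()) {
            throw new IllegalStateException("buffer empty or sample/insert ratio exceeded");
        }
        samples++;
        return items.get(rng.nextInt(items.size()));
    }

    int size() {
        return items.size();
    }
}
```

With a capacity of 2 and a ratio of 1.0, three inserts leave the two newest elements in the buffer and permit exactly three samples before `canSample()` returns false.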
Ramos et al. Figure 14. Performance comparison of an I/O-intensive MPJ application using Blocking ("block") and Nonblocking ("nbc") collectives. Table I. ...
doi:10.1002/cpe.3279 fatcat:kkze2rl2zfaebgrjivzr7yxxri
We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, were this reward function available, it could be used directly for policy training and imitation would not be [...] ecessary. To tackle this mostly ignored problem, we propose a number of possible proxies to the external reward. We evaluate them in an extensive empirical study (more than 10,000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good enough HPs through a proxy to the reward function.
arXiv:2105.12034v1 fatcat:xndutr66kjd5dpa7trwl5xenh4
The performance and scalability of communications are key for High Performance Computing (HPC) applications in the current multi-core era. Despite the significant benefits (e.g., productivity, portability, multithreading) of Java for parallel programming, its poor communications support has hindered its adoption in the HPC community. This paper presents FastMPJ, an efficient Message-Passing in Java (MPJ) library, boosting Java for HPC by: (1) providing high-performance shared memory [...] ns using Java threads; (2) taking full advantage of high-speed cluster networks (e.g., InfiniBand) to provide low-latency and high-bandwidth communications; (3) including a scalable collective library with topology-aware primitives, automatically selected at runtime; (4) avoiding Java data buffering overheads through zero-copy protocols; and (5) implementing the most widely extended MPI-like Java bindings for a highly productive development. The comprehensive performance evaluation on representative testbeds (InfiniBand, 10 Gigabit Ethernet, Myrinet, and shared memory systems) has shown that FastMPJ com-
doi:10.1007/s10586-014-0345-4 fatcat:jxpqqaj3frbrfendjhmims72ty
This paper presents a scalable and efficient Message-Passing in Java (MPJ) collective communication library for parallel computing on multi-core architectures. The continuous increase in the number of cores per processor underscores the need for scalable parallel solutions. Moreover, current system deployments are usually multi-core clusters, a hybrid shared/distributed memory architecture which increases the complexity of communication protocols. Here, Java represents an attractive choice for [...] he development of communication middleware for these systems, as it provides built-in networking and multithreading support. As the performance gap between Java and compiled languages has been narrowing in recent years, Java is an emerging option for High Performance Computing (HPC). Our MPJ collective communication library increases the performance of Java HPC applications on multi-core clusters by: (1) providing multi-core aware collective primitives; (2) implementing several algorithms (up to six) per collective operation, whereas publicly available MPJ libraries are usually restricted to one algorithm; (3) analyzing the efficiency of thread-based collective operations; (4) selecting at runtime the most efficient algorithm depending on the specific multi-core system architecture, and the number of cores and message length involved in the collective operation; (5) supporting the automatic performance tuning of the collectives depending on the system and communication parameters; and (6) allowing its integration in any MPJ implementation, as it is based on MPJ point-to-point primitives. A performance evaluation on an InfiniBand and Gigabit Ethernet multi-core cluster has shown that the implemented collectives significantly outperform the original ones, and that their use yields higher speedups in collective-communication-intensive Java HPC applications. Finally, the presented library has been successfully integrated in MPJ Express (http://mpj-express.org), and will be distributed with the next release.
doi:10.1007/s11227-010-0464-5 fatcat:tjl7kfhidreunirkac2xohvsk4
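Point (4) of this abstract, choosing an algorithm at runtime from the number of cores and the message length, can be sketched as follows. This is an illustration under stated assumptions rather than the library's actual selection logic: the class `CollectiveSelector`, the three broadcast algorithm names, and both thresholds are hypothetical, and a real library would presumably derive such values from the automatic tuning described in point (5).

```java
// Hypothetical illustration only; not the selection logic of any MPJ library.
class CollectiveSelector {
    // Candidate broadcast algorithms (assumed set, chosen for illustration).
    enum BcastAlgorithm { FLAT_TREE, BINOMIAL_TREE, SCATTER_ALLGATHER }

    // Assumed thresholds; real values would come from tuning on the target system.
    static final int SMALL_GROUP = 4;
    static final int LARGE_MESSAGE_BYTES = 64 * 1024;

    // Picks an algorithm from the number of processes and the message length.
    static BcastAlgorithm select(int processes, int messageBytes) {
        if (processes <= 0 || messageBytes < 0) {
            throw new IllegalArgumentException("invalid communication parameters");
        }
        if (processes <= SMALL_GROUP) {
            // Few receivers: the root sends to each one directly.
            return BcastAlgorithm.FLAT_TREE;
        }
        if (messageBytes >= LARGE_MESSAGE_BYTES) {
            // Large payloads: split the message so no single link carries it all.
            return BcastAlgorithm.SCATTER_ALLGATHER;
        }
        // Default for small messages on many processes: logarithmic number of rounds.
        return BcastAlgorithm.BINOMIAL_TREE;
    }

    // Communication rounds needed by a binomial tree: ceil(log2(processes)).
    static int binomialRounds(int processes) {
        if (processes <= 0) {
            throw new IllegalArgumentException("processes must be positive");
        }
        return 32 - Integer.numberOfLeadingZeros(processes - 1);
    }
}
```

Keeping the decision in one pure function makes it straightforward to swap the hard-coded thresholds for values measured on a given cluster.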
Keywords: Message-passing in Java (MPJ) · Multi-core clusters · Scalable collective communication · High performance computing · Performance evaluation

Introduction

Java is the leading programming language in both academia and industry, and it is an alternative for High Performance Computing (HPC) due to its appealing characteristics: built-in networking and multithreading support, object orientation, automatic memory management, platform independence, portability, security, an extensive API and a wide community of developers. It is also the main training language for computer science students. Moreover, performance is no longer an obstacle. Java, in its early days, was severely criticized for its poor computational performance, reported to be within a factor of four of the equivalent Fortran code in [...]. Currently, however, thanks to advances in JVMs and Just-In-Time (JIT) compilation, which can generate native executable code from the platform-independent bytecode, Java is around 30% slower on average than natively compiled languages (e.g., C and Fortran), according to [...] and [...]. This relatively low overhead is the price of the interesting features of Java. However, although this performance gap is relatively small, it can be particularly high for communication-intensive parallel applications when relying on poorly scalable Java communication libraries, which has hindered Java adoption for HPC. Thus, this paper presents a more scalable collective communication library. Message-passing is the most widely used parallel programming paradigm, as it is highly portable, scalable and usually provides good performance. It is the preferred choice for programming distributed memory systems such as multi-core clusters, currently the most popular system deployments due to their scalability, flexibility and interesting cost/performance ratio.
Here, Java represents an attractive alternative to languages traditionally used in HPC, such as C or Fortran together with their MPI bindings, for the development of applications for these systems, as it provides built-in networking and multithreading support, key features for taking full advantage of hybrid shared/distributed memory architectures. Thus, Java can use threads in shared memory (intra-node) and its networking support for distributed memory (inter-node) communications. The increasing number of cores per system demands efficient and scalable message-passing communication middleware. However, up to now Message-Passing in Java (MPJ) implementations have focused on providing production-quality implementations of the full MPJ specification, rather than on developing scalable collective communications. MPJ application developers use collective primitives for performing standard data movements (e.g., broadcast, scatter and gather)
The scalability of High Performance Computing (HPC) applications depends heavily on the efficient support of network communications in virtualized environments. However, Infrastructure as a Service (IaaS) providers are more focused on deploying systems with higher computational power interconnected via high-speed networks than on improving the scalability of the communication middleware. This paper analyzes the main performance bottlenecks in HPC applications scalability on Amazon EC2 [...] r Compute platform: (1) evaluating the communication performance on shared memory and a virtualized 10 Gigabit Ethernet network; (2) assessing the scalability of representative HPC codes, the NAS Parallel Benchmarks, using a large number of cores, up to 512; (3) analyzing the new cluster instances (CC2) in terms of single-instance performance, scalability and cost-efficiency; (4) suggesting techniques for reducing the impact of the virtualization overhead on the scalability of communication-intensive HPC codes, such as direct access of the Virtual Machine to the network and reducing the number of processes per instance; and (5) proposing the combination of message-passing with multithreading as the most scalable and cost-effective option for running HPC applications on the Amazon EC2 Cluster Compute platform.
doi:10.1016/j.future.2012.06.009 fatcat:jtyrscbdhfaxvmeu7xfqgmcqvm
Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics, as they can help to explain genetic influences on diseases. As these studies are time-consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, none of these approaches is able to efficiently exploit the whole [...] l capacity of modern clusters that contain both GPUs and Xeon Phi coprocessors. In this paper we investigate approaches to map pairwise epistatic detection on heterogeneous clusters using both types of accelerators. The runtimes to analyze the well-known WTCCC dataset, consisting of about 500K SNPs and 5K samples, on one and two NVIDIA K20m are reduced by 27% thanks to the use of a hybrid approach with one additional Xeon Phi coprocessor.
doi:10.1109/tpds.2015.2460247 fatcat:x4xjzvjmarcwncmjpmq26mlwb4
Cloud computing is posing several challenges, such as security, fault tolerance, access interface singularity, and network constraints, both in terms of latency and bandwidth. In this scenario, the performance of communications depends both on the network fabric and its efficient support in virtualized environments, which ultimately determines the overall system performance. To solve the current network constraints in cloud services, their providers are deploying high-speed networks, such as 10 [...] igabit Ethernet. This paper presents an evaluation of High Performance Computing message-passing middleware on a cloud computing infrastructure, Amazon EC2 cluster compute instances, equipped with 10 Gigabit Ethernet. The analysis of the experimental results, compared against a similar testbed, has shown the significant impact that virtualized environments still have on communications performance, which demands more efficient communication middleware support to overcome current cloud network limitations.
doi:10.1007/s00779-012-0605-3 fatcat:6izn55mtqrcq5p2akp3qu4obbi

Cloud computing is a model that enables convenient, on-demand and self-service access to a shared pool of highly scalable, abstracted infrastructure that hosts applications, which are billed by consumption. This computing paradigm is rapidly changing the way enterprise computing is provisioned and managed, thanks to the commoditization of computing resources (e.g., networks, servers,
Current shared memory systems utilize complex memory hierarchies to maintain scalability when increasing the number of processing units. Although hardware designers aim to hide this complexity from the programmer, ignoring the detailed architectural characteristics can harm performance significantly. We propose to expose the block-based design of caches in parallel computers to middleware designers to allow semi-automatic performance tuning with the systematic translation from algorithms to an [...] nalytic performance model. For this, we design a simple interface for cache line aware (CLa) optimization, a translation methodology, and a full performance model for cache line transfers in ccNUMA systems. Algorithms developed using CLa design perform up to 14x better than vendor and open-source libraries, and 2x better than existing ccNUMA optimizations.
doi:10.1145/2749246.2749256 dblp:conf/hpdc/RamosH15 fatcat:ip43wafnwvg6rn2nj32puoauii
The rising interest in Java for High Performance Computing (HPC) is based on the appealing features of this language for programming multi-core cluster architectures, particularly the built-in networking and multithreading support, and the continuous increase in Java Virtual Machine (JVM) performance. However, its adoption in this area is being delayed by the lack of analysis of the existing programming options in Java for HPC and of thorough and up-to-date evaluations of their performance, as [...] as the unawareness of the current research projects in this field, whose solutions are needed in order to boost the adoption of Java in HPC. This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and finally evaluates the performance of current Java HPC solutions and research developments on two shared memory environments and two InfiniBand multi-core clusters. The main conclusions are that: (1) the significant interest in Java for HPC has led to the development of numerous projects, although usually quite modest ones, which may have prevented further development of Java in this field; (2) Java can achieve performance close to that of natively compiled languages, both for sequential and parallel applications, making it an alternative for HPC programming; and (3) the recent advances in the efficient support of Java communications on shared memory and low-latency networks are bridging the gap between Java and natively compiled applications in HPC. Thus, the good prospects of Java in this area are attracting the attention of both industry and academia, which can take significant advantage of Java adoption in HPC.
doi:10.1016/j.scico.2011.06.002 fatcat:c4suwxfz45cbdpvfmnzmb3rofm

development of parallel applications as it is a multithreaded language and provides built-in networking support, key features for taking full advantage of hybrid shared/distributed memory architectures. Thus, Java can use threads in shared memory (intra-node) and its networking support for distributed memory (inter-node) communication. Nevertheless, although the performance gap between Java and native languages is usually small for sequential applications, it can be particularly high for parallel applications when depending on inefficient communication libraries, which has hindered Java adoption for HPC.
Therefore, current research efforts are focused on providing scalable Java communication middleware, especially on high-speed networks commonly used in HPC systems, such as InfiniBand or Myrinet. The remainder of this paper is organized as follows. Section 2 analyzes the existing programming options in Java for HPC. Section 3 describes current research efforts in this area, with special emphasis on providing scalable communication middleware for HPC. A comprehensive performance evaluation of representative solutions in Java for HPC is presented in Section 4. Finally, Section 5 summarizes our concluding remarks.

Java for High Performance Computing

This section analyzes the existing programming options in Java for HPC, which can be classified into: (1) shared memory programming; (2) Java sockets; (3) Remote Method Invocation (RMI); and (4) Message-passing in Java. These programming options allow the development of both high-level libraries and Java parallel applications.

Java Shared Memory Programming

There are several options for shared memory programming in Java for HPC, such as the use of Java threads, OpenMP-like implementations, and Titanium. As Java has built-in multithreading support, the use of Java threads for parallel programming is quite widespread due to its high performance, although it is a rather low-level option for HPC (work parallelization and shared data access synchronization are usually hard to implement). Moreover, this option is limited to shared memory systems, which provide less scalability than distributed memory machines. Nevertheless, its combination with distributed memory programming models can overcome this restriction. Finally, in order to partially relieve programmers from the low-level details of thread programming, Java has incorporated, since the 1.5 specification, the concurrency utilities, such as thread pools, tasks, blocking queues, and low-level high-performance primitives for advanced concurrent programming like CyclicBarrier.
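Two of the concurrency utilities named above, thread pools and `CyclicBarrier`, can be combined as in the following minimal sketch. The class `BarrierSketch` and the method `runPhases` are hypothetical names for illustration; only `java.util.concurrent` classes from the standard library are used. Each worker runs a fixed number of phases and waits at the barrier at the end of each one, so no worker starts the next phase before all have finished the current one.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of phase synchronization with standard utilities.
class BarrierSketch {
    // Runs `phases` synchronized phases on `threads` workers and returns
    // how many times the barrier tripped (once per completed phase).
    static int runPhases(int threads, int phases) {
        AtomicInteger trips = new AtomicInteger();
        // The barrier action runs once each time all workers have arrived.
        CyclicBarrier barrier = new CyclicBarrier(threads, trips::incrementAndGet);
        // The pool must hold at least `threads` workers, or the barrier never trips.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                try {
                    for (int p = 0; p < phases; p++) {
                        // Per-phase work would go here.
                        barrier.await(); // block until every worker finishes this phase
                    }
                } catch (Exception e) {
                    // InterruptedException or BrokenBarrierException: stop this worker.
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return trips.get();
    }

    public static void main(String[] args) {
        System.out.println(runPhases(4, 3)); // prints 3
    }
}
```

The barrier is reusable across phases, which is what distinguishes it from a one-shot latch and makes it a natural fit for iterative parallel loops.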
The project Parallel Java (PJ) has implemented several high-level abstractions over these concurrency utilities, such as ParallelRegion (code to be executed in parallel), ParallelTeam (group of threads that execute a ParallelRegion) and ParallelForLoop (work parallelization among threads), allowing easy thread-based shared memory programming. Moreover, PJ also implements the message-passing paradigm, as it is intended for programming hybrid shared/distributed memory systems such as multi-core clusters. There are two main OpenMP-like implementations in Java, JOMP and JaMP. JOMP consists of a compiler (written in Java, and built using the JavaCC tool) and a runtime library. The compiler translates Java source code with OpenMP-like directives to Java source code with calls to the runtime library, which in turn uses Java threads to implement parallelism. The whole system is "pure" Java (100% Java), and thus can be run on any JVM. Although the development of this implementation stopped in 2000, it has been used recently to provide nested parallelism on multi-core HPC systems. Nevertheless, JOMP had to be optimized with some of the utilities of the concurrency framework, such as the replacement of the busy-wait implementation of the JOMP barrier by the more efficient java.util.concurrent.CyclicBarrier. The experimental evaluation of the hybrid Java message-passing + JOMP configuration (with a thread-safe message-passing library) showed up to 3 times higher performance than the equivalent pure message-passing scenario. Although JOMP scalability is limited to shared memory systems, its combination with distributed memory communication libraries (e.g., message-passing libraries) can overcome this issue. JaMP is the Java OpenMP-like implementation for Jackal, a software-based Java Distributed Shared Memory (DSM) implementation. Thus, this project is limited to this environment.
JaMP has followed the JOMP approach, but taking advantage of the concurrency utilities, such as tasks, as it is a more recent project. The OpenMP-like approach has several advantages over the use of Java threads, such as the higher level programming model with a code much closer to the sequential version and the exploitation of the familiarity with OpenMP,
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13
Ramos thanks the HiPEAC network, the University of A Coruña, the Ministry of Science and Innovation of Spain [Project TIN2010-16735], and the Xunta de Galicia CN2012/211, partially supported by FEDER funds ...
doi:10.1145/2493123.2462916 fatcat:5okdd5xclzbozavpsz3yykoud4
Power and energy are critical concerns for high performance computing systems from multiple perspectives, including cost, reliability/resilience and sustainability. At the same time, data locality and the cost of data movement have become dominating concerns in scientific workflows. One potential solution for reducing data movement costs is to use a data analysis pipeline based on in-situ data analysis. However, the impact of current optimizations on energy-performance-quality tradeoffs and their [...] rheads can be very hard to assess and understand at the application level. In this paper, we focus on exploring performance and power/energy tradeoffs of different data movement strategies and how to balance these tradeoffs with quality of solution and data speculation. Our experimental evaluation covers different system and application configurations and gives insights into the energy-performance-quality tradeoff space for in-situ data-intensive application workflows. The key contribution of this work is a better understanding of the interactions between different computation, data movement, energy, and quality-of-result optimizations from a power-performance perspective, and a basis for modeling and exploiting these interactions.
doi:10.1007/s00450-014-0268-6 fatcat:guukjr2c5jfktow545gkydta2u
Cloud computing is currently being explored by the scientific community to assess its suitability for High Performance Computing (HPC) environments. In this novel paradigm, compute and storage resources, as well as applications, can be dynamically provisioned on a pay-per-use basis. This paper presents a thorough evaluation of the I/O storage subsystem using the Amazon EC2 Cluster Compute platform and the recent High I/O instance type, to determine its suitability for I/O-intensive [...] The evaluation has been carried out at different layers using representative benchmarks in order to evaluate the low-level cloud storage devices available in Amazon EC2, ephemeral disks and Elastic Block Store (EBS) volumes, both on local and distributed file systems. In addition, several I/O interfaces (POSIX, MPI-IO and HDF5) commonly used by scientific workloads have also been assessed. Furthermore, the scalability of a representative parallel I/O code has also been analyzed at the application level, taking into account both performance and cost metrics. The analysis of the experimental results has shown that available cloud storage devices can have different performance characteristics and usage constraints. Our comprehensive evaluation can help scientists to significantly increase (up to several times) the performance of I/O-intensive applications in the Amazon EC2 cloud. An example of an optimal configuration that can maximize I/O performance in this cloud is the use of a RAID 0 of 2 ephemeral disks, TCP with a 9,000-byte MTU, NFS async and MPI-IO on the High I/O instance type, which provides ephemeral disks backed by Solid State Drive (SSD) technology.
doi:10.1007/s10723-013-9250-y fatcat:faydnzyldzhkxh7zwfk6mygnem
Papeles de Son Armadans
of the land, knowing how to capture what man has known for centuries and is today called existential anguish, that anguish which runs through the empty halls of the abandoned noble pazo, when Sabela passes silently before it ... We need a bouquet and a «cunca» of wine. And here is Cunqueiro Mora, who brings us the wild offering of the «silveiras» along with the lavish gift of his first surname. ...
Grande do Sul) Marc Delbarge (Lessius University College, Antwerp; College of Europe, Bruges) Sandor Albert (Université de Szeged) Icíar Alonso-Araguás (Universidad de Salamanca) Margarita Alonso Ramos ... Dunne (Kent State University) Roch Duval (Université de Montréal) Domenyk Eades (University of Salford) Álvaro Echeverri (Université de Montréal) Maureen Ehrensberger-Dow (Zürcher Fachhochschule) Sabela ...
doi:10.7202/1021219ar fatcat:4vckqhg3lbckrhkn7mfip5zd4q