31,823 Hits in 4.6 sec

Shared memory programming for large scale machines

Christopher Barton, Călin Caşcaval, George Almási, Yili Zheng, Montse Farreras, Siddhartha Chatterjee, José Nelson Amaral
2006 Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06  
• Because typically the performance of the shared memory programs lags behind and does not scale as well as the performance of MPI codes Programming Language Design and Implementation  ...  Motivation • Large scale machines (such as Blue Gene and large clusters) and parallelism (such as multi-core chips) are becoming ubiquitous • Shared memory programming is accepted as an easier programming  ...  • 64 threads on a Blue Gene/L system Outline • Brief overview of UPC features • The IBM xlupc compiler and run-time system • Brief overview of the Blue Gene/L system • Compiler optimizations •  ... 
doi:10.1145/1133981.1133995 dblp:conf/pldi/BartonCAZFCA06 fatcat:q23ngninufcv5izfloumwr3ruu

KunlunTVM: A Compilation Framework for Kunlun Chip Supporting Both Training and Inference

Jun Zeng, Mingyang Kou, Hailong Yao
2022 Proceedings of the Great Lakes Symposium on VLSI 2022  
This paper presents KunlunTVM, the first end-to-end compiler based on TVM, supporting both training and inference tasks on Kunlun Chip.  ...  With the rapid development of deep learning, training big neural network models demands huge amount of computing power. Therefore, many accelerators are designed to meet the performance requirements.  ...  The memory management algorithm in TVM is not friendly to the hierarchical memory system of Kunlun chip.  ... 
doi:10.1145/3526241.3530316 fatcat:osjrk7bribekzjnxk7zv4q6bee

Overview of the 4S Project

G. Smit, E. Schuler, J. Becker, J. Quevremont, W. Brugger
2005 2005 International Symposium on System-on-Chip  
In this paper an overview of the EU-FP6 "Smart Chips for Smart Surroundings" (4S) [7] project is given.  ...  The overall mission of the 4S project is to define and develop efficient (ultra low-power), flexible, reconfigurable core building blocks, including the supporting tools, for future ambient systems.  ...  In this section we give an overview of the design methodology and of the existing and developed tools. A.  ... 
doi:10.1109/issoc.2005.1595647 dblp:conf/issoc/SmitSBQB05 fatcat:qoe2gnfwnzblxa3qslbvwslhby

Multilevel MPSOC simulation using an MDE approach

Rabie Ben Atitallah, Eric Piel, Smail Niar, Philippe Marquet, Jean-Luc Dekeyser
2007 2007 IEEE International SOC Conference  
In this paper, we first present an efficient Multi-Processor Systems-on-Chip design methodology based on Model-Driven Engineering.  ...  The effectiveness of the methodology is illustrated by the development of an H.263 encoder.  ...  An overview of our compilation chain is available in Fig. 2. The models in the Y shape compose the high level MP-SoC model.  ... 
doi:10.1109/socc.2007.4545457 dblp:conf/socc/AtitallahPNMD07 fatcat:nbhprcmc4faongolnqxojlkq2q

A Hierarchical Architecture Description for Flexible Multicore System Simulation

Thomas Bruckschloegl, Oliver Oey, Michael Ruckauer, Timo Stripf, Jurgen Becker
2014 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications  
The platform information is provided by an architecture description language designed for the purpose of a flexible system description as well as simulation.  ...  As processors and systems on chip in the embedded world increasingly become multicore, parallel programming remains a difficult, time-consuming and complicated task.  ...  , flexible, and high performance systems on chip.  ... 
doi:10.1109/ispa.2014.33 dblp:conf/ispa/BruckschloglORSB14 fatcat:vbqtbjoqcbd4xiiwtsql5v7yqm

Dynamic Co-Processor Architecture for Software Acceleration on CSoCs

Abhishek Mitra, Zhi Guo, Anirban Banerjee, Walid Najjar
2006 Computer Design (ICCD '99), IEEE International Conference on  
The system designer is left with the task of interfacing the IP Cores to the CPU and also for realizing partial reconfiguration across the cores.  ...  By integrating one or more (hard or soft) CPU core on the chip, new generation platform FPGAs have become configurable systems on a chip (CSoC) that support a combined software and hardware execution model  ...  ROCCC Overview An overview of the ROCCC framework is depicted in Figure 3.  ... 
doi:10.1109/iccd.2006.4380805 dblp:conf/iccd/MitraGBN06 fatcat:6rertl62x5egxced2cgwzcawyu

Reconfigurable computing: its concept and a practical embodiment using newly developed dynamically reconfigurable logic (DRL) LSI

Masakazu Yamashina, Masato Motomura
2000 Proceedings of the 2000 conference on Asia South Pacific design automation - ASP-DAC '00  
This paper first outlines a broad range of reconfigurable computing research activities from a perspective of system LSI designs.  ...  Then, the paper focuses onto dynamically reconfigurable logic (DRL) LSI, a prototype chip that we developed to evaluate the reconfigurable computing concept.  ...  Taro Fujii for their indispensable efforts they have devoted in the DRL prototype LSI development. We would like to appreciate Mr. K. Wakabayashi for the stimulating discussions on RC compilers.  ... 
doi:10.1145/368434.368666 dblp:conf/aspdac/YamashinaM00 fatcat:5ic2zr6xufamfahwwhpeyuyw2q

Memory Architectures for Embedded Systems-On-Chip [chapter]

Preeti Ranjan Panda, Nikil D. Dutt
2002 Lecture Notes in Computer Science  
The memory subsystem will continue to present significant bottlenecks in the design of future embedded systems-on-chip.  ...  In this paper we present an overview of recent research in the area of memory architecture customization for embedded systems.  ...  We first present an overview of different memory architectures used in embedded systems, and then survey some of the ways in which these architectures have been customized.  ... 
doi:10.1007/3-540-36265-7_61 fatcat:mgk3773mmvdqrmgp2r7i2cd6ya

ARCHITECT-R

R. A. Gonçalves, P. A. Moraes, J. M. P. Cardoso, D. F. Wolf, M. M. Fernandes, R. A. F. Romero, E. Marques
2003 Proceedings of the 2003 ACM symposium on Applied computing - SAC '03  
Current approaches often involve the design and implementation of hardwired solutions, with the associated problems of a long development cycle and inflexibility.  ...  An increasing interest in the design of mobile robots has been observed in recent years, which is mainly motivated by technological advances that may allow their application to consumer markets, in addition  ...  The design requires 1,723 logic cells (74% of the chip).  ... 
doi:10.1145/952532.952665 dblp:conf/sac/GoncalvesMCWFRM03 fatcat:listfpiarneifgpv3uu7tug2r4

OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers [chapter]

Keiji Kimura, Masayoshi Mase, Hiroki Mikami, Takamichi Miyamoto, Jun Shirako, Hironori Kasahara
2010 Lecture Notes in Computer Science  
In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP  ...  Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.  ...  Systems Leading Research."  ... 
doi:10.1007/978-3-642-13374-9_13 fatcat:n75vuldrrzcfnpq76er5pwv6dq

ARCHITECT-R

R. A. Gonçalves, P. A. Moraes, J. M. P. Cardoso, D. F. Wolf, M. M. Fernandes, R. A. F. Romero, E. Marques
2003 Proceedings of the 2003 ACM symposium on Applied computing - SAC '03  
Current approaches often involve the design and implementation of hardwired solutions, with the associated problems of a long development cycle and inflexibility.  ...  An increasing interest in the design of mobile robots has been observed in recent years, which is mainly motivated by technological advances that may allow their application to consumer markets, in addition  ...  The design requires 1,723 logic cells (74% of the chip).  ... 
doi:10.1145/952660.952665 fatcat:lvrwtwok7zgnlkd6ahgkzwu6uu

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules [article]

Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou
2021 arXiv   pre-print
One such problem is the multi-chip partitioning problem where compilers determine the optimal partitioning and placement of operations in tensor computation graphs on chiplets in MCMs.  ...  Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip.  ...  ., 2018), on a real MCM system with 36 chips to demonstrate real system performance. We use a greedy heuristic from the production compiler as the baseline of throughput improvement.  ... 
arXiv:2112.04041v1 fatcat:2m64g7rdabevdpoc4xi6io6fa4

CentOS Linux for the ATLAS MUCTPI Upgrade [article]

R. Spiwoks, A. Armbruster, P. Czodrowski, N. Ellis, P. Farthouat, S. Haas, A. Kulinska, A. Marzin, P. Papageorgiou, T. Pauly, S. Perrella, M. Saimpert (+2 others)
2020 arXiv   pre-print
A System-on-Chip (SoC) is used for the control, configuration and monitoring of the hardware and the operation of the MUCTPI. The SoC consists of an FPGA part and a processor system.  ...  Cross-compilation together with the existing framework for building of the ATLAS trigger and data acquisition (TDAQ) software is being used in order to allow the deployment of the TDAQ software directly  ...  Wittgen, SLAC, USA, for the original idea of cross installing the CentOS root file system, and S. Kolos, University of California Irvine, USA, for providing the ATLAS TDAQ gateway application.  ... 
arXiv:2010.08105v1 fatcat:xzy7yuqluzhwtijgsza45ia3ni

CentOS Linux for the ATLAS MUCTPI Upgrade

R. Spiwoks, A. Armbruster, P. Czodrowski, N. Ellis, P. Farthouat, S. Haas, A. Kulinska, A. Marzin, P. Papageorgiou, T. Pauly, S. Perrella, M. Saimpert (+2 others)
2021 IEEE Transactions on Nuclear Science  
A System-on-Chip (SoC) is used for the control, configuration and monitoring of the hardware and the operation of the MUCTPI. The SoC consists of an FPGA part and a processor system.  ...  Cross-compilation together with the existing framework for building of the ATLAS trigger and data acquisition (TDAQ) software is being used in order to allow the deployment of the TDAQ software directly  ...  Wittgen, SLAC, USA, for the original idea of cross installing the CentOS root file system, and S. Kolos, University of California Irvine, USA, for providing the ATLAS TDAQ gateway application.  ... 
doi:10.1109/tns.2021.3084246 fatcat:otrgfljylrcbpkvnzcvrxqteji

A Survey of Different Approaches for Overcoming the Processor - Memory Bottleneck

Danijela Efnusheva, Ana Cholakoska, Aristotel Tentov
2017 International Journal of Computer Science & Information Technology (IJCSIT)  
The given development of processor's technology has brought performance improvements in computer systems, but not for all the types of applications.  ...  Within this analysis we discuss the advantages, disadvantages and the application (purpose) of several well-known memory-centric systems.  ...  in RОМ outside of chip); uses an optimized C compiler; System Features • Drawbacks: limited amount of memory in the chip (16Mb); slow memory access to the ROM outside of the chip; intended only for  ... 
doi:10.5121/ijcsit.2017.9214 fatcat:u6gztzqgyzam3np5fdyzd2sotu
Showing results 1-15 out of 31,823 results