Shared memory programming for large scale machines
2006
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation - PLDI '06
• Because typically the performance of the shared memory programs lags behind and does not scale as well as the performance of MPI codes ...
Motivation • Large scale machines (such as Blue Gene and large clusters) and parallelism (such as multi-core chips) are becoming ubiquitous • Shared memory programming is accepted as an easier programming ...
• 64 threads on a Blue Gene/L system Outline • Brief overview of UPC features • The IBM xlupc compiler and run-time system • Brief overview of the Blue Gene/L system • Compiler optimizations • ...
doi:10.1145/1133981.1133995
dblp:conf/pldi/BartonCAZFCA06
fatcat:q23ngninufcv5izfloumwr3ruu
KunlunTVM: A Compilation Framework for Kunlun Chip Supporting Both Training and Inference
2022
Proceedings of the Great Lakes Symposium on VLSI 2022
This paper presents KunlunTVM, the first end-to-end compiler based on TVM, supporting both training and inference tasks on Kunlun Chip. ...
With the rapid development of deep learning, training big neural network models demands a huge amount of computing power. Therefore, many accelerators are designed to meet the performance requirements. ...
The memory management algorithm in TVM is not friendly to the hierarchical memory system of Kunlun chip. ...
doi:10.1145/3526241.3530316
fatcat:osjrk7bribekzjnxk7zv4q6bee
Overview of the 4S Project
2005
2005 International Symposium on System-on-Chip
In this paper an overview of the EU-FP6 "Smart Chips for Smart Surroundings" (4S) [7] project is given. ...
The overall mission of the 4S project is to define and develop efficient (ultra low-power), flexible, reconfigurable core building blocks, including the supporting tools, for future ambient systems. ...
In this section we give an overview of the design methodology and of the existing and developed tools.
A. ...
doi:10.1109/issoc.2005.1595647
dblp:conf/issoc/SmitSBQB05
fatcat:qoe2gnfwnzblxa3qslbvwslhby
Multilevel MPSOC simulation using an MDE approach
2007
2007 IEEE International SOC Conference
In this paper, we first present an efficient Multi-Processor Systems-on-Chip design methodology based on Model-Driven Engineering. ...
The effectiveness of the methodology is illustrated by the development of an H.263 encoder. ...
An overview of our compilation chain is available in Fig. 2. The models in the Y shape compose the high level MP-SoC model. ...
doi:10.1109/socc.2007.4545457
dblp:conf/socc/AtitallahPNMD07
fatcat:nbhprcmc4faongolnqxojlkq2q
A Hierarchical Architecture Description for Flexible Multicore System Simulation
2014
2014 IEEE International Symposium on Parallel and Distributed Processing with Applications
The platform information is provided by an architecture description language designed for the purpose of a flexible system description as well as simulation. ...
As processors and systems on chip in the embedded world increasingly become multicore, parallel programming remains a difficult, time-consuming and complicated task. ...
, flexible, and high performance systems on chip. ...
doi:10.1109/ispa.2014.33
dblp:conf/ispa/BruckschloglORSB14
fatcat:vbqtbjoqcbd4xiiwtsql5v7yqm
Dynamic Co-Processor Architecture for Software Acceleration on CSoCs
2006
Computer Design (ICCD '99), IEEE International Conference on
The system designer is left with the task of interfacing the IP Cores to the CPU and also for realizing partial reconfiguration across the cores. ...
By integrating one or more (hard or soft) CPU core on the chip, new generation platform FPGAs have become configurable systems on a chip (CSoC) that support a combined software and hardware execution model ...
ROCCC Overview An overview of the ROCCC framework is depicted in Figure 3. ...
doi:10.1109/iccd.2006.4380805
dblp:conf/iccd/MitraGBN06
fatcat:6rertl62x5egxced2cgwzcawyu
Reconfigurable computing: its concept and a practical embodiment using newly developed dynamically reconfigurable logic (DRL) LSI
2000
Proceedings of the 2000 conference on Asia South Pacific design automation - ASP-DAC '00
This paper first outlines a broad range of reconfigurable computing research activities from a perspective of system LSI designs. ...
Then, the paper focuses onto dynamically reconfigurable logic (DRL) LSI, a prototype chip that we developed to evaluate the reconfigurable computing concept. ...
Taro Fujii for their indispensable efforts they have devoted in the DRL prototype LSI development. We would like to appreciate Mr. K. Wakabayashi for the stimulating discussions on RC compilers. ...
doi:10.1145/368434.368666
dblp:conf/aspdac/YamashinaM00
fatcat:5ic2zr6xufamfahwwhpeyuyw2q
Memory Architectures for Embedded Systems-On-Chip
[chapter]
2002
Lecture Notes in Computer Science
The memory subsystem will continue to present significant bottlenecks in the design of future embedded systems-on-chip. ...
In this paper we present an overview of recent research in the area of memory architecture customization for embedded systems. ...
We first present an overview of different memory architectures used in embedded systems, and then survey some of the ways in which these architectures have been customized. ...
doi:10.1007/3-540-36265-7_61
fatcat:mgk3773mmvdqrmgp2r7i2cd6ya
ARCHITECT-R
2003
Proceedings of the 2003 ACM symposium on Applied computing - SAC '03
Current approaches often involve the design and implementation of hardwired solutions, with the associated problems of a long development cycle and inflexibility. ...
An increasing interest in the design of mobile robots has been observed in recent years, which is mainly motivated by technological advances that may allow their application to consumer markets, in addition ...
The design requires 1,723 logic cells (74% of the chip). ...
doi:10.1145/952532.952665
dblp:conf/sac/GoncalvesMCWFRM03
fatcat:listfpiarneifgpv3uu7tug2r4
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
[chapter]
2010
Lecture Notes in Computer Science
In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP ...
Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode. ...
Systems Leading Research." ...
doi:10.1007/978-3-642-13374-9_13
fatcat:n75vuldrrzcfnpq76er5pwv6dq
ARCHITECT-R
2003
Proceedings of the 2003 ACM symposium on Applied computing - SAC '03
Current approaches often involve the design and implementation of hardwired solutions, with the associated problems of a long development cycle and inflexibility. ...
An increasing interest in the design of mobile robots has been observed in recent years, which is mainly motivated by technological advances that may allow their application to consumer markets, in addition ...
The design requires 1,723 logic cells (74% of the chip). ...
doi:10.1145/952660.952665
fatcat:lvrwtwok7zgnlkd6ahgkzwu6uu
A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules
[article]
2021
arXiv
pre-print
One such problem is the multi-chip partitioning problem where compilers determine the optimal partitioning and placement of operations in tensor computation graphs on chiplets in MCMs. ...
Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. ...
., 2018), on a real MCM system with 36 chips to demonstrate real system performance. We use a greedy heuristic from the production compiler as the baseline of throughput improvement. ...
arXiv:2112.04041v1
fatcat:2m64g7rdabevdpoc4xi6io6fa4
CentOS Linux for the ATLAS MUCTPI Upgrade
[article]
2020
arXiv
pre-print
A System-on-Chip (SoC) is used for the control, configuration and monitoring of the hardware and the operation of the MUCTPI. The SoC consists of an FPGA part and a processor system. ...
Cross-compilation together with the existing framework for building of the ATLAS trigger and data acquisition (TDAQ) software is being used in order to allow the deployment of the TDAQ software directly ...
Wittgen, SLAC, USA, for the original idea of cross installing the CentOS root file system, and S. Kolos, University of California Irvine, USA, for providing the ATLAS TDAQ gateway application. ...
arXiv:2010.08105v1
fatcat:xzy7yuqluzhwtijgsza45ia3ni
CentOS Linux for the ATLAS MUCTPI Upgrade
2021
IEEE Transactions on Nuclear Science
A System-on-Chip (SoC) is used for the control, configuration and monitoring of the hardware and the operation of the MUCTPI. The SoC consists of an FPGA part and a processor system. ...
Cross-compilation together with the existing framework for building of the ATLAS trigger and data acquisition (TDAQ) software is being used in order to allow the deployment of the TDAQ software directly ...
Wittgen, SLAC, USA, for the original idea of cross installing the CentOS root file system, and S. Kolos, University of California Irvine, USA, for providing the ATLAS TDAQ gateway application. ...
doi:10.1109/tns.2021.3084246
fatcat:otrgfljylrcbpkvnzcvrxqteji
A Survey of Different Approaches for Overcoming the Processor - Memory Bottleneck
2017
International Journal of Computer Science & Information Technology (IJCSIT)
The given development of processor's technology has brought performance improvements in computer systems, but not for all the types of applications. ...
Within this analysis we discuss the advantages, disadvantages and the application (purpose) of several well-known memory-centric systems. ...
in ROM outside of chip); uses an optimized C compiler;
System Features • Drawbacks: limited amount of memory in the chip (16Mb); slow memory access to the ROM outside of the chip; intended only for ...
doi:10.5121/ijcsit.2017.9214
fatcat:u6gztzqgyzam3np5fdyzd2sotu