A Domain-Specific On-Chip Network Design for Large Scale Cache Systems

Yuho Jin, Eun Jung Kim, Ki Hwan Yum
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
However, using a general on-chip network for a specific domain may cause underutilization of the network resources and huge network delays because the interconnects are not optimized for the domain.  ...  Motivated by our observations, we investigate how to optimize cache operations and design the network in large scale cache systems.  ...  Concluding Remarks We have presented in this paper a domain-specific on-chip network design for large scale L2 cache systems.  ... 
doi:10.1109/hpca.2007.346209 dblp:conf/hpca/JinKY07 fatcat:uqppu4bfl5dy5agnjkckigrh6m

Piton: A Manycore Processor for Multitenant Clouds

Michael McKeown, Yaosheng Fu, Tri Nguyen, Yanqi Zhou, Jonathan Balkind, Alexey Lavrov, Mohammad Shahrad, Samuel Payne, David Wentzlaff
2017 IEEE Micro  
It is designed not only as a single chip, but also as a large, scalable system of up to 8,192 Piton chips (204,800 cores) connected together.  ...  The conventional architecture of such large-scale systems is relatively hierarchical and rigid, with boundaries between chips, boards, nodes, and racks, and it follows a scale-out approach.  ... 
doi:10.1109/mm.2017.36 fatcat:4wc35ryzufbk3br3xc4fglz374

Design and Analysis of On-Chip Networks for Large-Scale Cache Systems

Yuho Jin, Eun Jung Kim, Ki Hwan Yum
2010 IEEE Transactions on Computers  
Index Terms: On-chip interconnection networks, nonuniform cache architecture, domain-specific design.  ...  Motivated by our observations, we investigate both router architecture and network topology for communication behaviors in large-scale cache systems.  ...  A preliminary version of this paper [33] was presented at the 13th International Symposium on High-Performance Computer Architecture (HPCA-13), February 2007.  ... 
doi:10.1109/tc.2009.130 fatcat:g2t3w5uenjhrvcr3uqmt53jgje

Extreme-scale computer architecture: Energy efficiency from the ground up

Josep Torrellas
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014  
To construct such a chip, we need to rethink the whole compute stack from the ground up for energy efficiency, and attain Extreme-Scale Computing.  ...  Moreover, we also need techniques to reduce the leakage of on-chip memories and to lower the voltage guardbands of logic.  ...  Minimizing Energy in the On-Chip Network The on-chip interconnection network in a large chip is another significant source of energy consumption.  ... 
doi:10.7873/date.2014.213 dblp:conf/date/Torrellas14 fatcat:ddx74wgoajgvvdq24jt2uv24ny

Abstract Machine Models and Proxy Architectures for Exascale Computing

J.A. Ang, R.F. Barrett, R.E. Benner, D. Burke, C. Chan, J. Cook, D. Donofrio, S.D. Hammond, K.S. Hemmert, S.M. Kelly, H. Le, V.J. Leung (+6 others)
2014 2014 Hardware-Software Co-Design for High Performance Computing  
The most significant consequence of this assertion is the impact on the scientific applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain  ...  knowledge and refinements for contemporary computer systems.  ...  For large parallel systems, the inter-node network is the dominant factor in determining how well an application will scale.  ... 
doi:10.1109/co-hpc.2014.4 dblp:conf/sc/AngBBBCCDHHKLLR14 fatcat:sot6sfvdhbcwfbspps77auhwum

Extreme-scale computer architecture

Josep Torrellas
2016 National Science Review  
Moreover, we also need techniques to reduce the leakage of on-chip memories and to lower the voltage guardbands of logic.  ...  Hence, we need to design it from the ground up for energy efficiency. First of all, we want to operate at low voltage, since this is the point of maximum energy efficiency.  ...  Minimizing Energy in the On-Chip Network The on-chip interconnection network in a large chip is another significant source of energy consumption.  ... 
doi:10.1093/nsr/nwv085 fatcat:myxfovlfhfeonfcpxpmlr2vmuy


Rafael K. V. Maeda, Peng Yang, Xiaowen Wu, Zhe Wang, Jiang Xu, Zhehui Wang, Haoran Li, Luan H. K. Duong, Zhifei Wang
2016 Proceedings of the 1st International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems - AISTECS '16  
for complex systems evaluation.  ...  While there exist quite a few multiprocessor simulators available, they often rely on individual input specification, demanding extensive input enumeration and simulation runs, diminishing their effectiveness  ...  Also, Ruby is coupled with a domain specific language to implement cache coherence protocols called SLICC (Specification Language for Implementing Cache Coherence), providing an outstanding environment  ... 
doi:10.1145/2857058.2857066 dblp:conf/hipeac/MaedaYWW0WLDW16 fatcat:ogxjh6ztovh2zbsycxt76dnctq

Editorial: Networks on chips

D. Bertozzi, K. Goossens
2009 IET Computers & Digital Techniques  
NoCs are already used for Multi-Processor Systems-on-Chip (MPSoC) in the embedded systems domain, where multiple programmable processors are accompanied by large numbers of hardware accelerators.  ...  By distilling the most applicable concepts from this domain and by applying them in a way that suits the constraints of semiconductor design, Networks-on-chip (NoCs) have been proposed as the communication  ...  Non-uniform cache architectures (NUCA) have been proposed as a novel design paradigm for large last-level on-chip caches in order to reduce the effects of wire delays, which significantly limit the performance  ... 
doi:10.1049/iet-cdt.2009.9039 fatcat:u7ijbpjsvnfxzb3rtskmtf7bpa

Cache-Coherent Heterogeneous Multiprocessing as Basis for Streaming Applications [chapter]

Jos van Eijndhoven, Jan Hoogerbrugge, M.N. Jayram, Paul Stravers, Andrei Terechko
2005 Philips Research  
These chips will support the execution of a mix of concurrent applications that are not known in detail at chip design time.  ...  New generation System-on-Chips will be extremely complex devices, composed from complex subsystems, relying on abstraction from implementation details.  ...  This is an important aspect, as for domain-specific SoCs the silicon design cost will not be negligible in comparison with the production cost.  ... 
doi:10.1007/1-4020-3454-7_3 fatcat:f65fpatwlrdipi27dc3ongnxze

High-Performance Energy-Efficient Multicore Embedded Computing

A. Munir, S. Ranka, A. Gordon-Ross
2012 IEEE Transactions on Parallel and Distributed Systems  
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high-transistor density for high performance.  ...  Finally, we present design challenges and future research directions for HPEEC system development.  ...  On a large scale, networked embedded systems can enable HPEC for solving complex large problems traditionally handled only by supercomputers (e.g., climate research, weather forecasting, molecular modeling  ... 
doi:10.1109/tpds.2011.214 fatcat:vagqmojdsjevvc2u2ewqrcjjpq

Exascale Computing Technology Challenges [chapter]

John Shalf, Sudip Dosanjh, John Morrison
2011 Lecture Notes in Computer Science  
This article will describe the technology challenges on the road to exascale, their underlying causes, and their effect on the future of HPC system design.  ...  Consequently computer companies are dramatically increasing on-chip parallelism to improve performance.  ...  Creating a chip that has a large coherence domain and minimal NUMA effects would require a substantial increase in power budget to over-design the on-chip interconnection network.  ... 
doi:10.1007/978-3-642-19328-6_1 fatcat:abivlcxkpzchrn2dgbg5qbzt4e

Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards [article]

Rajeev Muralidhar and Renata Borovica-Gajic and Rajkumar Buyya
2020 arXiv   pre-print
We have now entered the era of domain-specific architectures for new workloads like AI and ML.  ...  This survey aims to bring these domains together and is composed of a systematic categorization of key aspects of building energy efficient systems - (a) specification - the ability to precisely specify  ...  SOC and full system simulators Largely, accelerators are integrated with processors on the same chip or on a system-on-chip (SoC).  ... 
arXiv:2007.09976v2 fatcat:enrfj2qgerhyteapwykxcb5pni

Energy-Efficient Computing for Extreme-Scale Science

David Donofrio, Leonid Oliker, John Shalf, Michael F. Wehner, Chris Rowen, Jens Krueger, Shoaib Kamil, Marghoob Mohiyuddin
2009 Computer  
A many-core processor design for high-performance systems draws from embedded computing's low-power architectures and design processes, providing a radical alternative to cluster solutions.  ...  To test our design philosophy, we chose a truly exascale problem: kilometer-scale models of the global atmosphere system requiring simulations 1,000 times faster than real time.  ...  We thank the Berkeley Wireless Research Center for early and ongoing assistance with the RAMP platform.  ... 
doi:10.1109/mc.2009.353 fatcat:qerwoknemnaivcn2j55l2oc7iu

Emerging Accelerator Platforms for Data Centers

Muhammet Mustafa Ozdal
2018 IEEE Design & Test  
Today's server architectures are designed considering the needs of a wide range of applications.  ...  For example, a workload that exhibits a regular execution pattern (e.g. a dense linear algebra kernel) may not require the expensive ILP control logic for parallelism.  ...  It is possible to design domain-specific hardware to achieve significant power and performance improvements for specific workloads.  ... 
doi:10.1109/mdat.2017.2779742 fatcat:uf32ils7incthma5kju74wzwr4

EMC2: Extending Magny-Cours coherence for large-scale servers

Alberto Ros, Blas Cuesta, Ricardo Fernandez-Pascual, Maria E. Gomez, Manuel E. Acacio, Antonio Robles, Jose M. Garcia, Jose Duato
2010 2010 International Conference on High Performance Computing  
Evaluation results for up to a 32-node system show how the performance offered by our solution scales with the increment in the number of nodes, enhancing the Probe Filter effectiveness by filtering additional  ...  They include a directory cache (Probe Filter) that increases the scalability of the coherence protocol applied by Opterons, based on coherent HyperTransport interconnect (cHT). cHT limits up to 8 the number  ...  Antonio Robles is taking a sabbatical granted by the Universidad Politécnica de Valencia for updating his teaching and research activities.  ... 
doi:10.1109/hipc.2010.5713176 dblp:conf/hipc/RosCPGARGD10 fatcat:zj767a4rnnccxc543a47icpy4u