Exploiting Parallelism and Structure to Accelerate the Simulation of Chip Multi-processors

D.A. Penry, D. Fay, D. Hodgdon, R. Wells, G. Schelle, D.I. August, D. Connors
The Twelfth International Symposium on High-Performance Computer Architecture, 2006.  
This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models.  ...  CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model.  ...  We would also like to acknowledge Krista Marks and Glenn Steiner of Xilinx Corporation for the donation of FPGA prototyping equipment and software.  ... 
doi:10.1109/hpca.2006.1598110 dblp:conf/hpca/PenryFHWSAC06 fatcat:5zwiqfb4bjgprcr6flvbtoetqa

High-Performance Embedded Architecture and Compilation Roadmap [chapter]

Koen De Bosschere, Wayne Luk, Xavier Martorell, Nacho Navarro, Mike O'Boyle, Dionisios Pnevmatikatos, Alex Ramirez, Pascal Sainrat, André Seznec, Per Stenström, Olivier Temam
2007 Lecture Notes in Computer Science  
The roadmap details several of the key challenges that need to be tackled in the coming decade, in order to achieve scalable performance in multi-core systems, and in order to make them a practical mainstream  ...  It concisely describes the key research challenges ahead of us and it will be used to steer the HiPEAC research efforts.  ...  In the multi-core roadmap, the processor becomes the functional unit, and just like floating-point units were added to single-core processors to accelerate scientific computations, special-purpose computing  ... 
doi:10.1007/978-3-540-71528-3_2 fatcat:ywmebvj7wrfb3ojghsjs4w3fy4

A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators

Rainer Buchty, Vincent Heuveline, Wolfgang Karl, Jan-Philipp Weiss
2011 Concurrency and Computation  
In particular, we characterize the discrepancy to conventional parallel platforms with respect to hierarchical memory sub-systems, fine-grained parallelism on several system levels, and chip- and system-level  ...  This problem is impaired by increasing heterogeneity of hardware platforms on both, processor level, and by adding dedicated accelerators.  ...  Acknowledgements The Shared Research Group 16-1 received financial support by the Concept for the Future of Karlsruhe Institute of Technology in the framework of the German Excellence Initiative and the  ... 
doi:10.1002/cpe.1904 fatcat:fwg2vjaobral3b2v46vq4x2c3q

81.6 GOPS Object Recognition Processor Based on a Memory-Centric NoC

Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, Se-Joong Lee, Hoi-Jun Yoo
2009 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Based on an analysis of the target application, the chip architecture and hardware features are decided. The proposed processor aims to support both task-level and data-level parallelism.  ...  Ten processing elements are integrated for the task-level parallelism and single instruction multiple data (SIMD) instruction is added to exploit the data-level parallelism.  ...  Meanwhile, a single instruction is applied repeatedly to the all pixels in each Gaussian filtering task, which can be accelerated by an SIMD structure exploiting data-level parallelism.  ... 
doi:10.1109/tvlsi.2008.2011226 fatcat:3q5aphon3vgpbfi5oi6lsqvl4u

Heterogeneous Multi-core Architectures

Tulika Mitra
2015 IPSJ Transactions on System LSI Design Methodology  
In this context, heterogeneous multi-core architectures combining functionality and performance-wise divergent mix of processing cores (CPU, GPU, special-purpose accelerators, and reconfigurable computing  ...  This article presents an overview of the state-of-the-art in heterogeneous multi-core landscape.  ...  The cores can operate in coupled mode when they act as a VLIW processor that will help exploit the hybrid forms of parallelism found in the code.  ... 
doi:10.2197/ipsjtsldm.8.51 fatcat:wgiuptlmvvgnhdt2bjrcio6oqi

Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting

Shivani Raghav, Andrea Marongiu, Christian Pinto, David Atienza, Martino Ruggiero, Luca Benini
2012 Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5  
Modern system-on-chips are evolving towards complex and heterogeneous platforms with general purpose processors coupled with massively parallel manycore accelerator fabrics (e.g. embedded GPUs).  ...  More specifically, QEMU runs on the host CPU and the simulation of manycore accelerators is offloaded, through semi-hosting, to the host GPU.  ...  It uses semihosting technique to interface between QEMU and manycore simulator by offloading the parallel section of the program to the co-processor simulator.  ... 
doi:10.1145/2159430.2159442 dblp:conf/asplos/RaghavMPARB12 fatcat:57upavxshfdv3a4ozy4uf5izsu

Designing Application-Specific Heterogeneous Architectures from Performance Models

Thanh Cong, Francois Charot
2019 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)  
These models are implemented on FPGAs to take advantage of their parallelism and speed up the simulation when architecture complexity increases.  ...  This approach aims to ease the design of multi-core multi-accelerator architecture, consequently contributes to explore the design space by automating the design steps.  ...  ACKNOWLEDGMENTS The authors would like to thank Bluespec for providing us the Bluespec tools and also Intel Labs for giving us access to a cluster of the integrated BDW/FPGAs, within IL's vLab academic  ... 
doi:10.1109/mcsoc.2019.00045 dblp:conf/mcsoc/CongC19 fatcat:nxpx56t2yna7rhpev4qhutu6ei

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Nachiket Kapre, Andre DeHon
2009 2009 International Conference on Field Programmable Logic and Applications  
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel  ...  Our Verilog AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in  ...  SPICE does not parallelize easily on conventional processors due to the irregular structure of the computation, limited peak floating-point capacities and scarce memory bandwidth.  ... 
doi:10.1109/fpl.2009.5272548 dblp:conf/fpl/KapreD09 fatcat:jvbqmtrwlva25fqnnomv7f42lu

Accelerating large-scale DEVS-based simulation on the cell processor

Qi Liu, Gabriel Wainer
2010 Proceedings of the 2010 Spring Simulation Multiconference on - SpringSim '10  
By taking a performance-centered approach, the technique allows for exploitation of multi-dimensional parallelism to overcome the bottlenecks in the simulation process.  ...  Our preliminary experiments have already produced promising results, accelerating the baseline PPE-only simulation of a fire model and a flood model by a factor of up to 70.6 and 83.32 respectively.  ...  New techniques are required to exploit multi-dimensional parallelism in large-scale PDES on the Cell processor.  ... 
doi:10.1145/1878537.1878667 fatcat:eexfdkiqnzd6povb3yw3zvdm24

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor [chapter]

Liu Peng, Guangming Tan, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta, Dongrui Fan, Ninghui Sun
2011 Lecture Notes in Computer Science  
; (3) an on-chip locality-aware parallel algorithm to enhance data reuse.  ...  We propose three incremental optimizations: (1) a divide-and-conquer algorithm adaptive to on-chip memory; (2) a novel data-layout to re-organize linked-list cell data structures to improve data locality  ...  To maximally exploit parallelism in a multi-core cluster, our EDC-STEP-HCD scheme has employed a multi-level parallelization strategy [2, 3].  ... 
doi:10.1007/978-3-642-21878-1_43 fatcat:ypkg7p44tjehxpteza63b34vxy

Some essential techniques for developing efficient petascale applications

L V Kalé
2008 Journal of Physics, Conference Series  
Multiple PetaFLOPS class machines will appear during the coming year, and many multi-PetaFLOPS machines are on the anvil.  ...  I will review a set of techniques that have proved useful in my work on multiple parallel applications that have scaled to tens of thousands of processors, on machines like Blue Gene/L, Blue Gene/P, Cray  ...  The third trend is the development and incorporation of specialized accelerator chips and accelerator-like features in mainstream chips.  ... 
doi:10.1088/1742-6596/125/1/012036 fatcat:i57rn27hjnf7vontusg47p74km

Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor

Liu Peng, Aiichiro Nakano, Guangming Tan, Priya Vashishta, Dongrui Fan, Hao Zhang, Rajiv K. Kalia, Fenglong Song
2011 Proceedings of the 8th ACM International Conference on Computing Frontiers - CF '11  
This paper presents a joint application/architecture study to enhance on-chip parallelism of MD on Godson-T-like many-core architecture.  ...  Then we propose three incremental optimization strategies: (1) a novel data-layout to re-organize linked-list cell data structures to improve data locality; (2) an on-chip locality-aware parallel algorithm  ...  To maximally exploit parallelism in a multi-core cluster, our EDC-STEP-HCD scheme has employed a multi-level parallelization strategy, that is, inter-node parallelism using spatial decomposition and on-chip  ... 
doi:10.1145/2016604.2016643 dblp:conf/cf/PengNTVFZKS11 fatcat:rdfduwfryrffvb7jpzfic6qhnu

The SARC Architecture

Alex Ramirez, Felipe Cabarcas, Ben Juurlink, Mauricio Alvarez Mesa, Friman Sanchez, Arnaldo Azevedo, Cor Meenderinck, Catalin Ciobanu, Sebastian Isaza, Georgi Gaydadjiev
2010 IEEE Micro  
This issue is attributable to the use of inadequate parallel programming abstractions and the lack of runtime support to manage and exploit parallelism.  ...  the SARC architecture's potential for a broad range of parallel computing scenarios, and its performance scalability to hundreds of on-chip processors.  ...  Acknowledgments We thank the rest of the team that developed the TaskSim  ... 
doi:10.1109/mm.2010.79 fatcat:xle4zkaarnbdvlryyq7f674544

Hardware accelerators for biocomputing: A survey

Souradip Sarkar, Turbo Majumder, Ananth Kalyanaraman, Partha Pratim Pande
2010 Proceedings of 2010 IEEE International Symposium on Circuits and Systems  
Various hardware platforms, such as FPGA, Graphics Processing Unit (GPU), the Cell Broadband Engine (CBE) and multi-core processors are being explored.  ...  An emerging area is the investigation of hardware accelerators for speeding up the massive scale of computation needed in large-scale biocomputing applications.  ...  An effective way to address this would be to integrate huge number of PEs on a single chip for exploiting the massive scale of fine-grain parallelism inherent in bioinformatics applications.  ... 
doi:10.1109/iscas.2010.5537736 dblp:conf/iscas/SarkarMKP10 fatcat:abxldburrvcgxe2rs4uhgqr2ka

Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations

Qi Liu, Gabriel Wainer
2010 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation  
We propose a computing technique for efficient parallel simulation of compute-intensive DEVS models on the IBM Cell processor, combining multi-grained parallelism and various optimizations to speed up  ...  Together, the parallelization and optimization strategies produced promising experimental results, accelerating the simulation of a 3D environmental model by a factor of up to 33.06.  ...  Nevertheless, new parallelization strategies are still needed in order to exploit multi-grained parallelism for PDES systems on the Cell processor.  ... 
doi:10.1109/pads.2010.5471652 dblp:conf/pads/LiuW10 fatcat:azanuptxwrcgxiktzukp6ix4ya