Intel Embedded Hardware Platform [chapter]

Ryan Cohen, Tao Wang
2014 Android Application Development for the Intel® Platform  
Due to the embedded system's specialized features, constrained resources, and integration of hardware and software, you need to understand the working principles and mechanisms of the hardware and hardware  ...  As the world's leader in silicon innovation, Intel has been designing high-performance processors and related hardware for general-purpose computers and embedded systems.  ...  Medfield SoC uses a 32 nm processor; integrates a single-core Intel Atom processor, 512 KB L2 cache, PowerVR SGX540 GPU by Imagination Technologies, and dual-channel LPDDR2 memory controller; and supports  ... 
doi:10.1007/978-1-4842-0100-8_2 fatcat:7edur5ngtvannclprqpceijfgq

HPC Accelerators with 3D Memory

Manuel Ujaldon
2016 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES)  
After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have conquered the top500 and green500 lists, providing us unprecedented levels of computational power and memory  ...  This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory  ...  In contrast, a typical CPU operating on a 4 channel motherboard hardly reaches 64 GB/s (see also  ...  Another similarity between memory and processor is the multi-core evolution.  ... 
doi:10.1109/cse-euc-dcabes.2016.203 dblp:conf/cse/Ujaldon16 fatcat:6am4onanlvfsndet4ahjzemlka

A Low-Power Integrated x86–64 and Graphics Processor for Mobile Computing Devices

D. Foley, P. Bansal, D. Cherepacha, R. Wasmuth, A. Gunasekar, S. Gutta, A. Naini
2012 IEEE Journal of Solid-State Circuits  
, a media accelerator, an integrated NorthBridge (NB), integrated DisplayPort, LVDS, and VGA display interfaces, a PCIe ® Gen1 or Gen2 I/O interface, and a single 64-bit memory channel at up to DDR3-1066  ...  on a single die implemented in a 40 nm bulk CMOS process.  ...  FUSION BASICS The traditional model of a processor chip (with integrated NB) coupled with an integrated graphics processor has a number of shortfalls.  ... 
doi:10.1109/jssc.2011.2167776 fatcat:c7lyh6gemfbenokg5ybdnt47la

AMD Fusion APU: Llano

Alexander Branover, Denis Foley, Maurice Steinman
2012 IEEE Micro  
Llano represents the combined effort of many talented AMD engineers across multiple locations in the US, Canada, India, and Germany.  ...  Acknowledgments We thank the remaining authors of the LN APU presentation at Hot Chips: Antonio Asaro (AMD fellow), Greg Smaus (AMD principal member of technical staff), Ljubisa Bajic (senior manager at AMD), and  ...  The Llano variant combines four x86 processor cores, a Unified Video Decoder, an integrated DirectX11 Graphics core, and an integrated two-head display controller.  ... 
doi:10.1109/mm.2012.2 fatcat:t7p6vuydp5grlm3vs2crktxdyi

45-year CPU evolution: one law and two equations [article]

Daniel Etiemble
2018 arXiv   pre-print
Moore's law and two equations allow to explain the main trends of CPU evolution since MOS technologies have been used to implement microprocessors.  ...  On current SoC circuits can be found CPUs, DSP, GPU, Memory controllers, crypto components, specialized interfaces for the different standards of transmission, graphics, media, etc. III.  ...  SIMD instructions support 8-16-32 integer types according to SIMD register size and simple and double precision floating point numbers. The graphics processors (GPU) launched by the end of 90' use the SIMT  ... 
arXiv:1803.00254v1 fatcat:mquk7tfhjrfz7ldokb2alpuevq


Henry Wong, Hong Wang, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur Khandelwal Groen, Hong Jiang
2008 Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08  
On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware has the area of 9 GPU cores and total power consumption of 5 GPU cores.  ...  extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores.  ...  Henry Wong and Tor Aamodt are partly supported by the Natural Sciences and Engineering Research Council of Canada.  ... 
doi:10.1145/1454115.1454125 dblp:conf/IEEEpact/WongBSACWCGJW08 fatcat:p37zbpaobza7pngzkxogk37fyy

Heterogeneous Multi-core Architectures

Tulika Mitra
2015 IPSJ Transactions on System LSI Design Methodology  
In this context, heterogeneous multi-core architectures combining functionality and performance-wise divergent mix of processing cores (CPU, GPU, special-purpose accelerators, and reconfigurable computing  ...  ) offer a promising option.  ...  First, as mentioned before, CPU and GPU share a unified memory address space. A system bus connects CPU, GPU, and the memory controller together as shown in Fig. 6.  ... 
doi:10.2197/ipsjtsldm.8.51 fatcat:wgiuptlmvvgnhdt2bjrcio6oqi

Exploring the Vision Processing Unit as Co-Processor for Inference

Sergio Rivas-Gomez, Antonio J. Pena, David Moloney, Erwin Laure, Stefano Markidis
2018 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
Preliminary results indicate that a multi-VPU configuration provides similar performance compared to reference CPU and GPU implementations, while reducing the thermal-design power (TDP) up to 8× in comparison  ...  In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations.  ...  In addition, we integrate the specific Caffe project forks optimized for Intel processors and NVIDIA graphics cards to conduct our experiments.  ... 
doi:10.1109/ipdpsw.2018.00098 dblp:conf/ipps/Rivas-GomezPMLM18 fatcat:o7iusdh4bfbppakyyvrupek76u

Exploring the Vision Processing Unit as Co-processor for Inference [article]

Sergio Rivas-Gomez, Antonio J. Peña, David Moloney, Erwin Laure, Stefano Markidis
2018 arXiv   pre-print
Preliminary results indicate that a multi-VPU configuration provides similar performance compared to reference CPU and GPU implementations, while reducing the thermal-design power (TDP) up to 8x in comparison  ...  In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations.  ...  In addition, we integrate the specific Caffe project forks optimized for Intel processors and NVIDIA graphics cards to conduct our experiments.  ... 
arXiv:1810.04150v1 fatcat:z5mm4kpv6fag7iutgtctyol5z4

Evaluating the Efficiency of CPUs, GPUs and FPGAs on a Near-Duplicate Document Detection Via OpenCL

Ercan Canhasi
2018 Journal of Computer Science  
In this study, a real-time solution for a simhash calculation in OpenCL is presented. We also show how it can be utilized by multi-CPUs, GPUs and FPGAs.  ...  Simhash is a widely used technique, able to attribute a bit-string identity to a text, such that similar texts have similar identities.  ...  Some of the most promising alternative technologies include multi-core CPUs, Graphics Processing Units (GPUs) and FPGAs.  ... 
doi:10.3844/jcssp.2018.699.704 fatcat:ocb3uf2gavfstkhshqrz5kryuy

Efficient complex operators for irregular codes

Jack Sampson, Ganesh Venkatesh, Nathan Goulding-Hotta, Saturnino Garcia, Steven Swanson, Michael Bedford Taylor
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance.  ...  They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications including several SPECINT benchmarks.  ...  Acknowledgements This research was funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.  ... 
doi:10.1109/hpca.2011.5749754 dblp:conf/hpca/SampsonVGGST11 fatcat:yqjxqk44jba4tjjtwweqcwpypi

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym
2008 IEEE Micro  
Parallel granularity Higher levels of parallelism use multiple GPUs per CPU and clusters of multi-GPU nodes.  ...  The modern 3D graphics processing unit (GPU) has evolved from a fixed-function graphics pipeline to a programmable parallel processor with computing power exceeding that of multicore CPUs.  ... 
doi:10.1109/mm.2008.31 fatcat:dfatzl4dwzcjvg7e5ozkbygrli

A Survey of Neural Network Hardware Accelerators in Machine Learning

Fatimah Jasem, Manar AlSaraf
2021 Machine Learning and Applications: An International Journal  
accelerators for heterogeneous multi-cores may become a main micro-architecture research issue.  ...  For ease of use and privacy restrictions, the requested image processing should be limited to a local embedded computer platform and with a high accuracy. Furthermore, less energy should be consumed.  ...  2017. The TITAN Xp is a high-end graphics card by NVIDIA. Created on a 16 nm process, and based on the GP102 graphics processor, in its GP102-450-A1 variant, the chip supports DirectX 12.0. The GP102 graphics  ... 
doi:10.5121/mlaij.2021.8402 fatcat:vaya6cwywjaq3jefppxt6w2nuu

A Coarse-Grained Array Accelerator for Software-Defined Radio Baseband Processing

Bruno Bougard, Bjorn De Sutter, Diederik Verkest, Liesbet Van der Perre, Rudy Lauwereins
2008 IEEE Micro  
This accelerator exploits the high ILP available in SDR kernels, combined with simple and effective DLP support. Its programming flow is fully integrated with that of the main CPU.  ...  The main CPU in this processor is a three-issue VLIW. The processor has an asynchronous reset, a single external system clock, and a half-speed (AMBA) bus clock.  ...  He has an MSc and a PhD in computer science from Ghent University, Belgium.  ... 
doi:10.1109/mm.2008.49 fatcat:j3lcc5uscrfjfegerkmgctinf4

Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM

Donghyuk Lee, Lavanya Subramanian, Rachata Ausavarungnirun, Jongmoo Choi, Onur Mutlu
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
In our DDMA design, main memory has two independent data channels, of which one is connected to the processor (CPU channel) and the other to the IO devices (IO channel), enabling CPU and IO accesses to  ...  By effectively decoupling accesses for CPU-GPU communication and in-memory communication from CPU accesses, our DDMA-based design achieves significant performance improvement across a wide variety of system  ...  Donghyuk Lee is supported in part by a Ph.D. scholarship from Samsung and the John and Claire Bertucci Graduate Fellowship.  ... 
doi:10.1109/pact.2015.51 dblp:conf/IEEEpact/LeeSACM15 fatcat:sm7bb67vnneyrkqerox66ck7ve