An Intelligent Task Scheduling Mechanism for Autonomous Vehicles via Deep Learning

Gomatheeshwari Balasekaran, Selvakumar Jayakumar, Rocío Pérez de Prado
2021 Energies  
The single-layer feedforward neural network (SLFN) and lightweight learning approaches are designed to distribute each task to the appropriate processor based on their emergency and CPU utilization.  ...  We developed this intelligent task management module in Python and experimentally tested it on multicore SoCs (Odroid XU4 and NVIDIA Jetson embedded platforms). Connected Autonomous Vehicles (CAV) and Internet  ...  Supervision, project administration, and final approval of the version to be published were conducted by R.P.d.P. All authors have read and agreed to the published version of the manuscript.  ... 
doi:10.3390/en14061788 fatcat:q7ksrt7tdbbgfckyklcdrjlq5e

Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach

Shuangde Fang, Chengyong Wu, Zidong Du, Yuntan Fang, Yuanjie Huang, Yang Chen, Lieven Eeckhout, Olivier Temam, Huawei Li, Yunji Chen
2014 ACM Transactions on Architecture and Code Optimization (TACO)  
Using a set of benchmarks run on a real heterogeneous SoC composed of a multicore processor and a GPU, we show that the runtime overhead is fairly small at 5.1% for the GPU and 6.4% for the multi-core.  ...  Because of tight power and energy constraints, industry is progressively shifting toward heterogeneous system-on-chip (SoC) architectures composed of a mix of general-purpose cores along with a number  ...  While programming multicores is already a challenging task, programming a heterogeneous SoC multicore architecture with accelerators is even more complex.  ... 
doi:10.1145/2608253 fatcat:ekgjnxiy6jdoxim3t2snisx2ly

Boosting Single Thread Performance in Mobile Processors via Reconfigurable Acceleration [chapter]

Geoffrey Ndu, Jim Garside
2012 Lecture Notes in Computer Science  
Mobile processors, a subclass of embedded processors, are increasingly employing multicore designs to improve performance.  ...  This paper presents the design of an architecture with such accelerators and evaluates the cost/performance implications of the design.  ...  The Configurable Compute Array (CCA) [11] is a matrix of simple, coarse-grained functional units coupled to a host CPU.  ... 
doi:10.1007/978-3-642-28365-9_10 fatcat:qyhbvqw7gvdafijct7oimidfku

Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Inputs

Junio C. R. da Silva, Lorena Leao, Vinicius Petrucci, Abdoulaye Gamatie, Fernando M. Q. Pereira
2020 2020 X Brazilian Symposium on Computing Systems Engineering (SBESC)  
Heterogeneous multicore systems, such as ARM big.LITTLE, use different types of processors to conciliate high performance with low energy consumption.  ...  A question that concerns such systems is how to find the best hardware configuration (type and frequency of processors) for a program.  ...  Among these technologies, two stand out today: dynamic voltage & frequency scaling [2] and heterogeneous architectures in which different processors are combined into the same chip.  ... 
doi:10.1109/sbesc51047.2020.9277863 fatcat:hsyhtdugindytpos5qi3r6u5n4

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Dominik Göddeke, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Sven H.M. Buijssen, Matthias Grajewski, Stefan Turek
2007 Parallel Computing  
The first part of this paper surveys co-processor approaches for commodity based clusters in general, not only with respect to raw performance, but also in view of their system integration and power consumption  ...  We then extend previous work on a small GPU cluster by exploring the heterogeneous hardware approach for a large-scale system with up to 160 nodes.  ...  Thanks to NVIDIA and ATI for donating hardware that was used in developing the serial version of the GPU backend, and thanks to Mark Harris and Mike Houston for clarifying hardware details.  ... 
doi:10.1016/j.parco.2007.09.002 fatcat:s7z3hudamjds5bxbebsdzkmyb4

Arnold: an eFPGA-Augmented RISC-V SoC for Flexible and Low-Power IoT End-Nodes [article]

Pasquale Davide Schiavone, Davide Rossi, Alfio Di Mauro, Frank Gurkaynak, Timothy Saxe, Mao Wang, Ket Chong Yap, Luca Benini
2020 arXiv   pre-print
The proposed SoC provides 3.4x better performance and 2.9x better energy efficiency than other fabricated heterogeneous re-configurable SoCs of the same class.  ...  We demonstrate the flexibility of the System-on-Chip (SoC) to tackle the challenges of many emerging IoT applications, such as (i) interfacing sensors and accelerators with non-standard interfaces, (ii)  ...  [28] implemented a 180 nm, 20 mm² SoC, where eFPGA is integrated with the CPU pipeline to implement a reconfigurable Application Specific Instruction Processor (ASIP) SoC, with the eFPGA implementing  ... 
arXiv:2006.14256v1 fatcat:e7zuiqpiinesjco4t6cizhklya

FPGA-Based Processor Acceleration for Image Processing Applications

Fahad Siddiqui, Sam Amiri, Umar Minhas, Tiantai Deng, Roger Woods, Karen Rafferty, Daniel Crookes
2019 Journal of Imaging  
The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro) which can operate up to 337 MHz on a high-end Xilinx FPGA family and gives details of the  ...  We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than ARM Cortex-A7 CPU, nVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded  ...  Funding: This work has been undertaken in collaboration with Heriot-Watt University in a project funded by the Engineering and Physical Science Research Council (EPSRC) through the EP/K009583/1 grant.  ... 
doi:10.3390/jimaging5010016 pmid:34465705 fatcat:rjevyyjetjfllofqb3b4qnqmse

Processing Panorama Video in Real-time

Håkon Kvale Stensland, Vamsidhar Reddy Gaddam, Marius Tennøe, Espen Helgedagsrud, Mikkel Næss, Henrik Kjus Alstad, Carsten Griwodz, Pål Halvorsen, Dag Johansen
2014 International Journal of Semantic Computing (IJSC)  
The P2G framework is designed for multimedia workloads and supports heterogeneous architectures. To demonstrate the feasibility of the framework, we construct a proof-of-concept implementation.  ...  To overcome this problem, processors were designed with reduced clock frequencies but with multiple cores and, later, heterogeneous processing elements.  ...  Mobile devices have followed the same trend and several processor designs, such as Nvidia's Tegra 4 mobile system on a chip (SoC) [91], have a quad-core general-purpose processor.  ... 
doi:10.1142/s1793351x14400054 fatcat:hafewx3ekrcfpat2osb67fjugi

Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster

Dominik Göddeke, Dimitri Komatitsch, Markus Geveler, Dirk Ribbrock, Nikola Rajovic, Nikola Puzovic, Alex Ramirez
2013 Journal of Computational Physics  
We evaluate weak and strong scalability on a cluster of 96 ARM Cortex-A9 dual-core processors and demonstrate that the ARM-based cluster can be more efficient in terms of energy to solution when executing  ...  Power consumption and energy efficiency are becoming critical aspects in the design and operation of large scale HPC facilities, and it is unanimously recognised that future exascale supercomputers will  ...  Buijssen for help with debugging and compilers, Thomas Rohkämper for setting up some of the FEAST coarse grids, Manh Ha Nguyen and Harald Servat for help with ParaVer, Harald Servat for detailed explanations  ... 
doi:10.1016/j.jcp.2012.11.031 fatcat:74qyyt5duvaa5nw2ude3rpmqea

OpenMDSP: Extending OpenMP to Program Multi-Core DSP

Jiangzhou He, Wenguang Chen, Guangri Chen, Weimin Zheng, Zhizhong Tang, Handong Ye
2011 2011 International Conference on Parallel Architectures and Compilation Techniques  
We implement the compiler and runtime system for OpenMDSP on Freescale MSC8156. Benchmarking results show that seven out of nine benchmarks achieve a speedup of more than 5 with 6 threads.  ...  Compared with general-purpose multiprocessors, multicore DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory.  ...  We would also thank Dehao Chen, Jidong Zhai and Tianwei Sheng for their useful comments on this paper. Finally, we would thank Ziang Hu, Qian Tan and Libin Sun for their help on experiments.  ... 
doi:10.1109/pact.2011.60 dblp:conf/IEEEpact/HeCCZTY11 fatcat:4rwjfzd7ofg5receklzmznekvq

Modeling and mitigation of extra-SoC thermal coupling effects and heat transfer variations in mobile devices

Francesco Paterna, Tajana Simunic Rosing
2015 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)  
This influences the SoC thermal envelope, because of the absence of a fan. Thus the thermal conditions of the SoC cannot simply be modeled as only a function of the SoC component power.  ...  In smartphones and tablets, a number of components, such as display and communication subsystem, dissipate a significant amount of heat.  ...  ACKNOWLEDGMENT The research has been supported by NSF SHF: Small: Cooling, energy and performance management in computing systems.  ... 
doi:10.1109/iccad.2015.7372657 dblp:conf/iccad/PaternaR15 fatcat:o6r55mdb75gqfjfz5bb2gprwfq

An Application- and Platform-agnostic Runtime Management Framework for Multicore Systems

Graeme M. Bragg, Charles Leech, Domenico Balsamo, James J. Davis, Eduardo Wachter, Geoff V. Merrett, George A. Constantinides, Bashir M. Al-Hashimi
2018 Proceedings of the 8th International Joint Conference on Pervasive and Embedded Computing and Communication Systems  
The operation of the proposed framework is experimentally validated using a basic runtime controller and two heterogeneous platforms, to show how it is application- and platform-agnostic and easy to use  ...  Heterogeneous multiprocessor systems have increased in complexity to provide both high performance and energy efficiency for a diverse range of applications.  ...  An open source implementation of the framework can be found at https://github.com/PRiMEproject/PRiME-Framework. The authors would like to thank Joshua M. Levine and James R. B.  ... 
doi:10.5220/0006939101950204 dblp:conf/peccs/BraggLBDWMCA18 fatcat:owdqxm3wl5dkphqcejqebwjwz4

Qualitative Precipitation Estimation from Satellite Data Based on Distributed Domain-Specific Architecture

Sethakarn Prongnuch, Theerayod Wiangtong, Suchada Sitjongsataporn, Angelos Markopoulos
2021 Modelling and Simulation in Engineering  
The aim of this research is to decrease the QPE processing time by using distributed domain-specific architecture (DDSA), in which 9 small computing boards are connected to a gigabit switch.  ...  The QPE process consists of receiving and managing the raw data from the satellite every 10 minutes and calculating the rain-temperature relationship.  ...  We applied the Microserver Parallella board [22] in the form of DSA. Each contains a ZYNQ SoC processor and a 16-core Epiphany RISC coprocessor.  ... 
doi:10.1155/2021/8827900 fatcat:3nmh4e7esfdd3psfjmqitqa6fy

ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning [article]

Joshua Klein, Irem Boybat, Yasir Qureshi, Martino Dazzi, Alexandre Levisse, Giovanni Ansaloni, Marina Zapater, Abu Sebastian, David Atienza
2022 arXiv   pre-print
We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5x/20.8x performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional  ...  With the goal of bridging this gap in flexibility, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems  ...  ACKNOWLEDGMENTS We thank Geethan Karunaratne, Pier Andrea Francese and Riduan Khaddam-Aljameh for technical discussions.  ... 
arXiv:2205.10042v1 fatcat:qiirzfbcxzcrbhdtyjdsss4krq

D2.3 Power models, energy models and libraries for energy-efficient concurrent data structures and algorithms [article]

Phuong Hoai Ha, Vi Ngoc-Nha Tran, Ibrahim Umar, Aras Atalar, Anders Gidenstam, Paul Renaud-Goud, Philippas Tsigas, Ivan Walulya
2018 arXiv   pre-print
The work has been conducted on two main EXCESS platforms: Intel platforms with recent Intel multicore CPUs and Movidius Myriad platforms.  ...  It reports i) the latest results of Task 2.2-2.4 on providing programming abstractions and libraries for developing energy-efficient data structures and algorithms and ii) the improved results of Task  ...  This prediction is validated for the two SpMV algorithms on two HPC platforms with nine different input matrix types from the Florida matrix collection.  ... 
arXiv:1801.10556v2 fatcat:y5n53z4gz5de3lzrmvd7soa26y