Rapid performance re-engineering of distributed embedded systems via latency analysis and k-level diagonal search

Jungkeun Park, Minsoo Ryu, Seongsoo Hong, Lucia Lo Bello
2006 Journal of Parallel and Distributed Computing  
This paper presents a systematic methodology aimed at rapid and cost-effective re-engineering of distributed embedded systems.  ...  We also propose a k-level diagonal search algorithm that allows us to trade optimality for search time. Our experimental results show the effectiveness of the proposed re-engineering approach.  ...  We are focusing on performance re-engineering of distributed embedded systems which requires strict latency analysis and efficient latency reduction techniques.  ... 
doi:10.1016/j.jpdc.2005.06.004 fatcat:jycs5va6ebdfnpknqcx7ggcp5e

Embedded Content Based Image Retrieval Method on FPGA Card

2020 International Journal of Engineering and Advanced Technology  
FPGAs have shown very high performance in spite of their low operational frequency and high parallelism in applications.  ...  We are conducting a study on the advantages of this embedded approach and discussed its effectiveness for a set of benchmark images publicly available dataset PH2.  ...  This provides the applications-specific solution's re-programmability while maintaining the performance benefit. In real-time applications as size of image. In 2016. S. Warghade, M. Vyas, M. K.  ... 
doi:10.35940/ijeat.c5527.029320 fatcat:24dgrvsf25h2zobevxyg6d7qqi

GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis

Hasindu Gamaarachchi, Chun Wai Lam, Gihan Jayatilaka, Hiruna Samarakoon, Jared T. Simpson, Martin A. Smith, Sri Parameswaran
2020 BMC Bioinformatics  
By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3-5 × faster than an optimised version of the original CPU-only implementation of ABEA in  ...  We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs.  ...  Acknowledgements We thank NVIDIA for providing the Jetson TX2 board (to University of New South Wales) and Tesla K40 GPU (to University of Peradeniya) through the GPU donation programme. We thank Dr.  ... 
doi:10.1186/s12859-020-03697-x pmid:32758139 fatcat:k4ylm4rwy5ejnip3lgrlq4qh2a

Performance Portable Solid Mechanics via Matrix-Free p-Multigrid [article]

Jed Brown, Valeria Barra, Natalie Beams, Leila Ghaffari, Matthew Knepley, William Moses, Rezgar Shakeri, Karen Stengel, Jeremy L. Thompson, Junchao Zhang
2022 arXiv   pre-print
Finite element analysis of solid mechanics is a foundational tool of modern engineering, with low-order finite element methods and assembled sparse matrices representing the industry standard for implicit  ...  We use performance models and numerical experiments to demonstrate that high-order methods greatly reduce the costs to reach engineering tolerances while enabling effective use of GPUs; these data structures  ...  applications, hardware, advanced system engineering and early testbed platforms, in support of the nation's exascale computing imperative.  ... 
arXiv:2204.01722v3 fatcat:4wmiw475tndila4lpk27iurxrm

Space and High Energy Experiments Advanced Electronic Systems 2012

Ryszard S. Romaniuk
2012 International Journal of Electronics and Telecommunications  
and Internet Engineering.  ...  The symposium is an annual summary in the development of numerable Ph.D. theses carried out in this country in the area of advanced electronic and photonic systems.  ...  The odd event packets at the FEEE level are first routed via the backplane and then outbound to the CPU level via a proper trunk line.  ... 
doi:10.2478/v10177-012-0060-0 fatcat:2nrsjwpharcuva4642wi3zmbim

D1.1 - State of the Art Analysis

Danilo Ardagna
2021 Zenodo  
In the last part of the deliverable, we report an overview of the performance modelling solutions, security, and privacy problems for AI applications in edge environments.  ...  ), providing resource efficiency, performance, data privacy, and security guarantees.  ...  Re-build and re-run the make_plan program on the target system to generate a plan file www.ai-sprint-project.eu PyCOMPSs Developing applications able to efficiently use distributed infrastructure has  ... 
doi:10.5281/zenodo.6372377 fatcat:f6ldfuwivbcltew4smiiwphfty

Applications and Techniques for Fast Machine Learning in Science

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik (+35 others)
2022 Frontiers in Big Data  
This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.  ...  training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms.  ...  A thorough analysis of system-level drivers for FPGA is out of the scope of our white paper.  ... 
doi:10.3389/fdata.2022.787421 pmid:35496379 pmcid:PMC9041419 fatcat:5w2exf7vvrfvnhln7nj5uppjga

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.  ...  training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms.  ...  to overwhelm system-level performance of (much denser) ReRAM based circuits [551].  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

ESSEX: Equipping Sparse Solvers For Exascale [chapter]

Christie L. Alappat, Andreas Alvermann, Achim Basermann, Holger Fehske, Yasunori Futamura, Martin Galgon, Georg Hager, Sarah Huber, Akira Imakura, Masatoshi Kawai, Moritz Kreutzer, Bruno Lang (+6 others)
2020 Lecture Notes in Computational Science and Engineering  
Starting without the burden of legacy code, a holistic performance engineering process could be deployed across the traditional software layers to identify efficient implementations and guide sustainable  ...  The ESSEX project has investigated programming concepts, data structures, and numerical algorithms for scalable, efficient, and robust sparse eigenvalue solvers on future heterogeneous exascale systems  ...  We are grateful for computer time granted on the LRZ SuperMUC and SuperMUC-NG, the CSCS Piz Daint, and the OakForest PACS systems.  ... 
doi:10.1007/978-3-030-47956-5_7 fatcat:srs4qavwcvezbovdhpbhf6d2ni

Self-adapting numerical software (SANS) effort

J. Dongarra, G. Bosilca, Z. Chen, V. Eijkhout, G. E. Fagg, E. Fuentes, J. Langou, P. Luszczek, J. Pjesivac-Grbovic, K. Seymour, H. You, S. S. Vadhiyar
2006 IBM Journal of Research and Development  
Self-Adapting Numerical Software (SANS) systems are intended to meet this significant challenge.  ...  Additionally, at any of these levels we can decide to rearrange the user's data. In this paper we look at a number of efforts at the University of Tennessee that are investigating these areas.  ...  In search space of R n , we start n+1 searches. The initial simplexes are uniformly distributed along the diagonal of the search space.  ... 
doi:10.1147/rd.502.0223 fatcat:uklej3fgarguzktd7rhv6aqoya

Gradient Descent Effects on Differential Neural Architecture Search: A Survey

Santanu Santra, Jun-Wei Hsieh, Chi-Fang Lin
2021 IEEE Access  
Gradient Descent, an effective way to search for the local minimum of a function, can minimize training and validation loss of neural architectures and also be incited in an appropriate order to decrease  ...  In view of this, an in-depth survey is necessary to cover the usefulness of gradient descent method and how this can benefit neural architecture search.  ...  TABLE 4. Performance analysis of gradient based architecture search approaches on Cifar10, Cifar100, and ImageNet datasets.  ... 
doi:10.1109/access.2021.3090918 fatcat:v5yrxbpzjvdozauevowlxsjeya

Building the second generation of parallel/distributed virtual reality systems

Shaun Bangay, James Gain, Greg Watkins, Kevan Watkins
1997 Parallel Computing  
This thesis describes the development of a performance analysis technique that is able to provide measures of both interaction latency and cycle time for a model of a Virtual Reality system.  ...  Most Virtual Reality systems employ some form of parallel processing, making use of multiple processors which are often distributed over large areas geographically, and which communicate via various forms  ...  Discussions with members of the virtual reality community across the world, both via e-mail, and through public forums have been an important source of information.  ... 
doi:10.1016/s0167-8191(97)00040-9 fatcat:txe7b7xatbcbpdxrolsn4rs6ky

A Unified Programmable Edge Matrix Processor for Deep Neural Networks and Matrix Algebra

Biji George, Om ji Omer, Ziaul Choudhury, Anoop V, Sreenivas Subramoney
2022 ACM Transactions on Embedded Computing Systems  
Aggressive and novel microarchitecture techniques along with block-level sparsity support optimize compute and data-reuse to minimize bandwidth and power requirements enabling ultra-low latency applications  ...  We submit MxCore as the generalized approach to facilitate the flexible acceleration of multiple Matrix Algebra and Deep-learning applications across a range of sparsity levels.  ...  values of diagonal and non-diagonal elements, respectively.  ... 
doi:10.1145/3524453 fatcat:miqhwzep3fey5admehib4md5ly

A Survey on Green Deep Learning [article]

Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
2021 arXiv   pre-print
In recent years, larger and deeper models are springing up and continuously pushing state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and computer vision (CV)  ...  This paper focuses on presenting a systematic review of the development of Green deep learning technologies.  ...  Then, attention takes Q, K, and V as inputs and is responsible for generating vector via the following equation: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ (2.1) where Q, K, V are 3-dimension tensors  ... 
arXiv:2111.05193v2 fatcat:t2blz24y2jakteeeawqqogbkpy

Software challenges in extreme scale systems

Vivek Sarkar, William Harrod, Allan E Snavely
2009 Journal of Physics, Conference Series  
them, and scaling multiple chips to complete systems, for a range of real system applications, from highly scalable deep space exploration to trans-petaflops level supercomputing.  ...  Carlson is a member of the research staff at the IDA Center for Computing Sciences where, since 1990, his focus has been on applications and system tools for large-scale parallel and distributed computers  ...  On-line collection and analysis of data steers the search for the most appropriate mapping, with a tiny subset of results added to the performance database.  ... 
doi:10.1088/1742-6596/180/1/012045 fatcat:iukutry2dvbitfdh6ng7kgz564