Efficient Realization of Householder Transform Through Algorithm-Architecture Co-Design for Acceleration of QR Factorization

Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S. K. Nandy, Ranjani Narayan
2018 IEEE Transactions on Parallel and Distributed Systems  
We present efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design where we achieve performance improvement of 3-90x in terms of Gflops/watt  ...  Theoretical and experimental analysis of classical HT is performed for opportunities to exhibit higher degree of parallelism where parallelism is quantified as a number of parallel operations per level  ...  We use this PE as a CFU for REDEFINE for parallel realization to show scalability of algorithms and architecture.  ... 
doi:10.1109/tpds.2018.2803820 fatcat:axzkxm5q7nc6jfr6rnz7hrupee

ADVANCED FEATURES OF NVIDIA KEPLER ARCHITECTURE AND PARALLEL COMPUTATION PLATFORM CUDA FOR DEVELOPING SCIENTIFIC COMPUTE-INTENSIVE APPLICATIONS

V.A. Dudnik, V.I. Kudryavtsev, S.A. Us, M.V. Shestakov
2019 Problems of Atomic Science and Technology  
New capabilities of the parallel computation platform CUDA are also described, in particular, regarding a set of program development tool extensions for the Fortran, C and C++ languages.  ...  The paper describes additional features offered by new Kepler architecture of NVIDIA graphic processors, and their usage for creating high performance programs in a wide range of scientific compute-intensive  ...  CONCLUSIONS The GPU Kepler GK110 architecture much contributes to a high performance computing efficiency  ...  provided simultaneous execution of up to 16 independent streams, but in this case a simple hardware  ... 
doi:10.46813/2019-121-105 fatcat:k6dxzetqvjb25chihqqccvbs6m

Conceptual design upgrade on hybrid powertrains resulting from electric improvements

M. Passalacqua, D. Lanzarotto, M. Repetto, M. Marchesoni
2017 International Journal on Transport Development and Integration  
Hybrid vehicles have experienced a great boom in recent years thanks to the increasing spread of 'parallel' architectures, often realized by a planetary gear train (Hybrid Synergy Drive). At the same time  ...  In the current scenario, this architecture could benefit from the above-mentioned technology, becoming a competitive alternative to the actual powertrain configurations.  ...  INTRODUCTION The development of hybrid power trains in the automotive industry has been characterized by a large spread of parallel architectures, mostly realized thanks to a planetary gear train [1-3]  ... 
doi:10.2495/tdi-v2-n2-146-154 fatcat:qzrt3xf5ujekhl7qz27xazzma4

Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design [article]

Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan
2016 arXiv   pre-print
For efficient sequential realization of BLAS, we present design of a Processing Element (PE) and perform micro-architectural enhancements in the PE to achieve up to 74% of the theoretical peak performance  ...  In this paper, we present acceleration of Level-1 (vector operations), Level-2 (matrix-vector operations), and Level-3 (matrix-matrix operations) BLAS through algorithm-architecture co-design on a Coarse-grained  ...  Considering inability of GPGPU and multicore architectures in exploiting parallelism available in BLAS, we recommend algorithm-architecture co-design for BLAS as a solution for efficient realization of  ... 
arXiv:1610.06385v5 fatcat:4c4fmt3qszhvlif752o3jkclja

Design of a Massively Parallel Vision Processor based on Multi-SIMD Architecture

Kota Yamaguchi, Yoshihiro Watanabe, Takashi Komuro, Masatoshi Ishikawa
2007 2007 IEEE International Symposium on Circuits and Systems  
to process on a conventional massively parallel SIMD architecture.  ...  The proposed architecture consists of two SIMD parallel processing modules and a shared memory, allowing highly parallelized and flexible computation of complicated recognition tasks, which were difficult  ...  A massively parallel SIMD architecture usually has a memory only locally in each PE.  ... 
doi:10.1109/iscas.2007.378381 dblp:conf/iscas/YamaguchiWKI07 fatcat:duwuysukpvawtdfapw27sachaa

Optimal design of power-split transmissions for hydraulic hybrid passenger vehicles

Kai Loon Cheong, Perry Y. Li, Thomas R. Chase
2011 Proceedings of the 2011 American Control Conference  
Power-split or hydro-mechanical transmissions (HMT) have advantages over series and parallel architectures.  ...  This captures different architectures such as input coupled, output coupled and compound configurations. Generic kinematic relations are shown to be mechanically realizable.  ...  Parallel hybrids, in contrast, are advantageous in that a significant portion of the power is transferred through the highly efficient mechanical path.  ... 
doi:10.1109/acc.2011.5991509 fatcat:go2zzil4orde5ecfnm5a5dxcl4

Automatic generation of high throughput energy efficient streaming architectures for arbitrary fixed permutations

Ren Chen, Viktor K. Prasanna
2015 2015 25th International Conference on Field Programmable Logic and Applications (FPL)  
[Figure: (a) Parallel architecture with a processing element and memory banks 1, 2, ..., r; (b) Shared memory architecture]  Data Permutation in Streaming Architectures: Streaming architecture, High  ...  Multistage network to realize all !  ... 
doi:10.1109/fpl.2015.7293944 dblp:conf/fpl/ChenP15 fatcat:sh5ba7pezfdlhm7ohtecn323gq

Total Eclipse - an Efficient Architectural Realization of the Parallel Random Access Machine [chapter]

Martti Forsell
2010 Parallel and Distributed Computing  
Conclusion We have introduced the TOTAL ECLIPSE CMP architecture providing an efficient realization of PRAM.  ...  Total Eclipse - an Efficient Architectural Realization of the Parallel Random Access Machine, Parallel and Distributed Computing, Alberto Ros (Ed.), ISBN: 978-953-307-057-5, InTech, Available from: http  ... 
doi:10.5772/9446 fatcat:xc6rf24okjcdzodaia72aey7ky

Design of cloud computing architecture for DIOT

Liang Chen, Xueping Gu, Jing Qiu
2012 2012 First National Conference for Engineering Sciences (FNCES 2012)  
At last, the performances of data storage, data analysis and data mining of proposed architecture are tested and the results prove its validity and efficiency.  ...  Besides, MapReduce technology is applied to realize data analysis and data mining. Accordingly, a complete cloud computing framework for the domestic internet of things is established.  ...  And this requires an efficient communication management strategy, a storage platform, which can store vast amounts of data, and a powerful cloud computing platform.  ... 
doi:10.1109/nces.2012.6543861 fatcat:bs2q2yinvvfqbayk7gon3dhuba

Quality-driven methodology for demanding accelerator design

Lech Jozwiak, Yahya Jan
2010 2010 11th International Symposium on Quality Electronic Design (ISQED)  
This paper focuses on mastering the architecture development of hardware accelerators for demanding applications.  ...  Based on the results of our analysis, we formulate the main requirements that have to be satisfied by an adequate methodology for demanding accelerator design, and propose an architecture design methodology  ...  Effectiveness is the degree to which a solution attains its goals. Efficiency is the degree to which a solution uses resources in order to realize its aims.  ... 
doi:10.1109/isqed.2010.5450546 dblp:conf/isqed/JozwiakJ10 fatcat:bjw3ejdtf5bcnkkcbnm5peifsi

SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators

Masayuki SHIMODA, Youki SADA, Ryosuke KURAMOCHI, Shimpei SATO, Hiroki NAKAHARA
2020 IEICE transactions on information and systems  
The equalization enables inter-filter parallelism, where a processing block for a layer executes filters concurrently with straightforward architecture.  ...  To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize hardware-aware network architecture with comparable accuracy.  ...  To realize high-speed calculation, we must utilize the above three types of parallelisms flexibly based on network architectures.  ... 
doi:10.1587/transinf.2020pap0013 fatcat:ezpelksin5c4lkqy73hoocc4mq

Exploiting Coarse-grained Parallelism in Multi-transform Architectures for H.264/AVC High Profile Codecs

Tiago Dias, Nuno Roma, Leonel Sousa
2014 Procedia Technology - Elsevier  
A parallel Multi-Transform Architecture (MTA) for the computation of the 2-D transforms adopted in modern digital video standards is proposed in this paper.  ...  The advantages offered by the proposed parallel architecture were assessed by implementing in a Xilinx Virtex-7 FPGA a proof-of-concept transform core compliant with the High Profiles of the H.264/AVC  ...  Acknowledgements This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) under the project "HELIX: Heterogeneous Multi-Core Architecture for Biological Sequence  ... 
doi:10.1016/j.protcy.2014.10.223 fatcat:sifgqlyrhzbnlffulpjfef6tyq

A multi-keyword parallel ciphertext retrieval scheme based on inverted index under the robot distributed system

Jiyue Wang, Xi Zhang, Yonggang Zhu
2021 EAI Endorsed Transactions on Scalable Information Systems  
By combining the characteristics of distribution, the traditional single-machine retrieval architecture is extended and multi-keyword parallel retrieval is realized.  ...  In this paper, a multi-keyword parallel ciphertext retrieval system based on inverted index is proposed.  ...  On this basis, this paper uses the inverted index segmentation and the distributed model architecture to realize the multi-keyword parallel ciphertext retrieval scheme in the distributed environment.  ... 
doi:10.4108/eai.17-12-2021.172438 fatcat:ixjgjgqu3fgzjkhf4hajn3w6m4

New conception and algorithm of allocation mapping for processor arrays implemented into multi-context FPGA devices

Piotr Ratuszniak, Oleg Maslennikow
2009 2009 International Multiconference on Computer Science and Information Technology  
Processor matrix efficiency depends on both allocation and schedule mapping.  ...  In the paper authors present new concept of realization of algorithms with regular graphs of information dependencies, in form of systolic arrays realized in multi-context programmable devices.  ...  One of the models of parallel architectures created for linear algebra algorithms, is a parallel architecture model with a virtual topology.  ... 
doi:10.1109/imcsit.2009.5352752 dblp:conf/imcsit/RatuszniakM09 fatcat:zrqzorliprazrashjqygd5eqyy

Design of Polyphase Channelization Algorithm Based on CUDA Stream Architecture

Yongqiang Chen, Hong Ma, Yiwen Jiao, Hongjie Dang
2021 DEStech Transactions on Materials Science and Engineering  
In order to improve the reconfigurability and computing efficiency of the polyphase channelization system, a new algorithm based on CUDA stream architecture was designed and optimized.  ...  Firstly, the principle of parallel channelization algorithm without blind zones is introduced.  ...  It should be noted that, the parallel channelization algorithm based on streaming architecture can effectively improve the processing efficiency from the structural level, but it cannot affect the efficiency  ... 
doi:10.12783/dtmse/ameme2020/35540 fatcat:jjdundcqqjbcfdowqdrm5q2ory