The Bit-Nibble-Byte MicroEngine (BnB) for Efficient Computing on Short Data

Dilip Vasudevan, Andrew A. Chien
2015 Proceedings of the 25th edition on Great Lakes Symposium on VLSI - GLSVLSI '15  
We propose and evaluate a new microarchitecture called "Bit-Nibble-Byte" (BnB). We describe our design, which includes both long fixed-point vectors and novel variable-length instructions.  ...  We evaluate BnB with a detailed design of 5 vector sizes (128, 256, 512, 1024, 2048) mapped into 32 nm and 7 nm transistor technologies, and in combination with a variety of memory systems (DDR3 and HMC).  ...  METHODOLOGY AND EXPERIMENT Design Configurations: BNB micro-architecture was implemented using the Synopsys CAD flow.  ... 
doi:10.1145/2742060.2742106 dblp:conf/glvlsi/VasudevanC15 fatcat:76fg6v3jjnb5vmlpmwapcnoxtq

Single-Issue 1500MIPS Embedded DSP with Ultra Compact Codes

Li-Chun Lin, Shih-Hao Ou, Tay-Jyi Lin, Siang-Den Deng, Chih-Wei Liu
2007 2007 Asia and South Pacific Design Automation Conference  
In this paper, we borrow some ideas from old vector machines and propose a novel DSP architecture with very compact codes.  ...  The performance of single-issue RISC cores can be improved significantly with multi-issue architectures (i.e. superscalar or VLIW) by activating the parallel functional units concurrently.  ...  Micro-architecture Design & Performance Evaluation The micro-architecture design begins after the ISA is finalized.  ... 
doi:10.1109/aspdac.2007.357965 dblp:conf/aspdac/LinOLDL07 fatcat:k6xsj6yydjdr3g3yheqyftlhju

Evaluating architecture and compiler design through static loop analysis

Yuriy Kashnikov, Pablo de Oliveira Castro, Emmanuel Oseret, William Jalby
2013 2013 International Conference on High Performance Computing & Simulation (HPCS)  
The distributions of these features on a large representative code corpus can be used to evaluate compilers and architectures and tune them for the most frequently used assembly patterns.  ...  We evaluate register allocation and vectorization on two compilers and propose a method to tune loop buffer size and stream prefetcher based on static analysis of benchmarks.  ...  This paper is a result of work performed in the Application Characterization team from the Exascale Computing Research Lab with support provided by CEA, GENCI, Intel, and UVSQ.  ... 
doi:10.1109/hpcsim.2013.6641465 dblp:conf/ieeehpcs/KashnikovCOJ13 fatcat:mc5rwflgivfszgivrhgj4tg4a4

Experiences with Mobile Processors for Energy Efficient HPC

Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, Alex Ramirez
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013  
In this paper we present our first experiences with the use of mobile processors and accelerators for the HPC domain based on the research that was performed in the project.  ...  We show initial evaluation of NVIDIA Tegra 2 and Tegra 3 mobile SoCs and the NVIDIA Quadro 1000M GPU with a set of HPC microbenchmarks to evaluate their potential for energy-efficient HPC.  ...  Another way for increasing the usability of ARM-based mobile MPSoCs in the HPC domain is the use of integrated GPU: most ARM-based mobile MPSoCs have an integrated GPU, such as NVIDIA ULP GeForce in Tegra  ... 
doi:10.7873/date.2013.103 dblp:conf/date/RajovicRVGPR13 fatcat:sg5tfqttdbathf7xf4legdroey

Meta-implementation of vectorized logarithm function in binary floating-point arithmetic

Hugues de Lassus Saint-Genies, Nicolas Brunie, Guillaume Revy
2018 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Third it shows how to automate this implementation process using the MetaLibm framework, on SSE/AVX and AVX2 supporting micro-architectures.  ...  This article focuses on the design of vectorized implementation of log(x) function, and more particularly on its automation for different formats and micro-architectures.  ...  To evaluate these performances, we used either automated micro-benchmarks provided by MetaLibm or custom microbenchmarks for the handmade version and Libmvec routines.  ... 
doi:10.1109/asap.2018.8445102 dblp:conf/asap/Saint-GeniesBR18 fatcat:ylt3ewupirhslcatpn4e4iwyoe

Pruning hardware evaluation space via correlation-driven application similarity analysis

Rosario Cammarota, Arun Kejariwal, Paolo D'Alberto, Sapan Panigrahi, Alexander V. Veidenbaum, Alexandru Nicolau
2011 Proceedings of the 8th ACM International Conference on Computing Frontiers - CF '11  
We evaluate the proposed methodology on three different micro-architectures, viz., Intel's Harpertown, Nehalem and Westmere, using industry-standard SPEC CINT2006.  ...  System evaluation is routinely performed in industry to select one amongst a set of different systems to improve performance of proprietary applications.  ...  The micro-architectures under evaluation are referred to as candidate systems, whereas the micro-architecture currently in use is referred to as the current system.  ... 
doi:10.1145/2016604.2016610 dblp:conf/cf/CammarotaKDPVN11 fatcat:mfqdasfdgrau7nf4rrot5pbriq

Performance Optimizations of Recursive Electronic Structure Solvers targeting Multi-Core Architectures (LA-UR-20-26665) [article]

Adetokunbo A. Adedoyin, Christian F. A. Negre, Jamaludin Mohd-Yusof, Nicolas Bock, Daniel Osei-Kuffuor, Jean-Luc Fattebert, Michael E. Wall, Anders M. N. Niklasson, Susan M. Mniszewski
2021 arXiv   pre-print
After introducing these optimizations, we benchmark the micro-kernels and compare the run-time before and after optimization for several target architectures.  ...  With that in mind, we proceed with this investigation by performing a survey of the entirety of the BML code-base, and extract, in form of micro-kernels, common snippets of code.  ...  This work was performed under the U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), U.S.  ... 
arXiv:2102.08505v1 fatcat:z24bxjsjzzgxnlg3fzskoqjnm4

Area-Power Efficient Multi Staged Pipelined CORDIC Using Micro-Rotation Selection

Mahendra Kumar
2013 IOSR Journal of VLSI and Signal processing  
The CORDIC method is the most versatile of all the algorithms that can be used to evaluate elementary functions.  ...  Synthesis and implementation results are shown, and we also use a most-significant-1 bit detection technique for micro-rotation sequence generation to reduce the number of iterations.  ...  The pipelined architecture has shift registers that perform a fixed number of shifts every time. Registers are used to store the angle for a particular micro rotation.  ... 
doi:10.9790/4200-0262936 fatcat:t43ifrv755f6dkhna4otmfswie

Page 1031 of IEEE Transactions on Computers Vol. 52, Issue 8 [page]

2003 IEEE Transactions on Computers  
Her research interests include high performance microprocessor architecture, memory systems, computer performance evaluation and benchmarking, workload characterization, and optimization of architectures  ...  (ICS), IEEE Micro Symposium (MICRO), IEEE High Performance Computer Architecture Symposium (HPCA), etc., and has a patent for a field programmable memory cell array chip.  ... 

ADvISE: Architectural Decay in Software Evolution

Salima Hassaine, Yann-Gaël Guéhéneuc, Sylvie Hamel, Giuliano Antoniol
2012 2012 16th European Conference on Software Maintenance and Reengineering  
Therefore, stability or resilience is a primary criterion for evaluating an architecture.  ...  We use these triplets as basic unit to measure the stability of an architecture.  ...  Step 5: Architectural Decay Evaluation We perform a pairwise matching of subsequent program architectures to identify the sets of stable triplets and stable micro-architectures.  ... 
doi:10.1109/csmr.2012.34 dblp:conf/csmr/HassaineGHA12 fatcat:6aze66oa4jhyrbsxx6cweqqrde

The BLIS Framework

Field G. Van Zee, Vernon Austel, John A. Gunnels, Lee Killough, Tyler M. Smith, Bryan Marker, Tze Meng Low, Robert A. Van De Geijn, Francisco D. Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler
2016 ACM Transactions on Mathematical Software  
We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures.  ...  BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries.  ...  This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S.  ... 
doi:10.1145/2755561 fatcat:yrv7amzpyvexdiimqutxtij5zm

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads [article]

Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Zhen Jia, Daoyi Zheng, Chen Zheng, Xiwen He, Hainan Ye, Haibin Wang, Rui Ren
2018 arXiv   pre-print
The evaluation results show that our proxy benchmarks shorten the execution time by 100s times on real systems while maintaining the average system and micro-architecture performance data accuracy above  ...  For the architecture community, reasonable simulation time is a strong requirement in addition to performance data accuracy.  ...  Accuracy We evaluate the accuracy of proxy benchmarks from system and micro-architecture perspectives, using all metrics listed in Table V. System and Micro-architecture Data Accuracy.  ... 
arXiv:1810.09376v1 fatcat:ingrwyjpobavzhujbukj2gwfkm

Design and Analysis of Digital Wave Generator using CORDIC Algorithm with Pipelining and Angle Recoding Technique

Navdeep Prashar
2012 Computer Science & Engineering An International Journal  
Pipelined architectures are used in the CORDIC algorithm to reduce the critical path and increase the clock speed.  ...  In this paper, a hardware-efficient digital sine and cosine wave generator is designed and implemented using a pipelined CORDIC architecture.  ...  It contains registers at every stage to store the angle for a particular micro rotation. Every stage performs a single micro rotation; hence, the i-th stage performs the i-th micro rotation.  ... 
doi:10.5121/cseij.2012.2310 fatcat:3q5un5akdrfdbacxvy4ntklioa

A Dwarf-based Scalable Big Data Benchmarking Methodology [article]

Wanling Gao, Lei Wang, Jianfeng Zhan, Chunjie Luo, Daoyi Zheng, Zhen Jia, Biwei Xie, Chen Zheng, Qiang Yang, Haibin Wang
2017 arXiv   pre-print
Our proxy benchmarks preserve the micro-architecture, memory, and I/O characteristics, and they shorten the simulation time by 100s times while maintaining the average micro-architectural data accuracy above  ...  For the purpose of architecture simulation, we construct and tune big data proxy benchmarks using the directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic  ...  We can find that the memory bandwidth with sparse vectors is nearly half of the memory bandwidth with dense vectors, which confirms the data input's impacts on micro-architectural performance.  ... 
arXiv:1711.03229v1 fatcat:t6vgjqxomrhbdju225pfc5yreu

Multi-Scale Characterization for Micro-Architectures

David Raymont, Liang Hao, Philippe G Young
2011 Procedia Engineering  
Synthetic micro-architectures such as metal/polyurethane foams, computationally generated micro-architectures and composites are becoming increasingly popular for applications requiring tailored material  ...  For the case of large inhomogeneous micro-architectures direct computational simulations become intractable due to the resolution required to sufficiently represent the geometry.  ...  However, in an iterative optimization process where the 'performance' of the structure may be evaluated thousands of times the use of full FEA simulations becomes highly impractical.  ... 
doi:10.1016/j.proeng.2011.04.521 fatcat:kavrbbpgtnb5raifj2knwsgrxy