Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures

Evgeni Krimer, Patrick Chiang, Mattan Erez
2012 SIGARCH Computer Architecture News  
The proposed lane decoupling enables each SIMD lane to tolerate timing errors independent of other adjacent lanes, resulting in higher throughput and improved scalability.  ...  Unfortunately, applying the same timing-speculative approach to wide-SIMD architectures, such as those used in highly efficient GPUs, may not provide similar gains.  ...  We thank Robert Pawlowski, Joseph Crop, and Jacob Postman from the Oregon State University for their effort to implement the test-chip and provide the measurements.  ...
doi:10.1145/2366231.2337187 fatcat:szgtciutjzbsdims4ngglsrtxu

Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures

Evgeni Krimer, Patrick Chiang, Mattan Erez
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
The proposed lane decoupling enables each SIMD lane to tolerate timing errors independent of other adjacent lanes, resulting in higher throughput and improved scalability.  ...  Unfortunately, applying the same timing-speculative approach to wide-SIMD architectures, such as those used in highly efficient GPUs, may not provide similar gains.  ...  We thank Robert Pawlowski, Joseph Crop, and Jacob Postman from the Oregon State University for their effort to implement the test-chip and provide the measurements.  ...
doi:10.1109/isca.2012.6237021 dblp:conf/isca/KrimerCE12 fatcat:ljoj7h6hcbfcvh46bwdpwdrzx4

Energy-Efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-Based Computing

Abbas Rahimi, Amirali Ghofrani, Miguel Angel Lastras-Montano, Kwang-Ting Cheng, Luca Benini, Rajesh K. Gupta
2014 Proceedings of the 51st Annual Design Automation Conference on Design Automation Conference - DAC '14
Together, we enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback  ...  Compared to voltage overscaling, this technique enhances robustness against timing errors with 39% average energy saving.  ...  For instance, [3] decouples the SIMD lanes through private queues that prevent error events in any single lane from stalling all other lanes.  ...
doi:10.1145/2593069.2593132 dblp:conf/dac/RahimiGLCBG14 fatcat:ssnbhugtebe55bq4cb24wc2kei

Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing

Abbas Rahimi, Amirali Ghofrani, Miguel Angel Lastras-Montano, Kwang-Ting Cheng, Luca Benini, Rajesh K. Gupta
2014 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)  
Together, we enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback  ...  Compared to voltage overscaling, this technique enhances robustness against timing errors with 39% average energy saving.  ...  For instance, [3] decouples the SIMD lanes through private queues that prevent error events in any single lane from stalling all other lanes.  ...
doi:10.1109/dac.2014.6881522 fatcat:hw5e3uj4zfa77dhytiqboj5ceq

Runtime-Aware Architectures [chapter]

Marc Casas, Miquel Moreto, Lluc Alvarez, Emilio Castillo, Dimitrios Chasapis, Timothy Hayes, Luc Jaulmes, Oscar Palomar, Osman Unsal, Adrian Cristal, Eduard Ayguade, Jesus Labarta (+1 others)
2015 Lecture Notes in Computer Science  
The runtime system of the parallel programming model has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores  ...  When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA).  ...  Different maximum vector lengths (MVL) and lanes are considered. Fig. 5. CG execution example with a single error occurring at the same time for all implemented mechanisms.  ...
doi:10.1007/978-3-662-48096-0_2 fatcat:mx7lemvbwvgflotiwgstl4jzmy

Variability Mitigation in Nanometer CMOS Integrated Systems: A Survey of Techniques From Circuits to Software

Abbas Rahimi, Luca Benini, Rajesh K. Gupta
2016 Proceedings of the IEEE  
We provide a comparative evaluation of methods for deployment across various layers of the system from circuits, architecture, to application software.  ...  First, we provide a review of key concepts with particular emphasis on timing errors caused by various variability sources.  ...  Recently, a 45 nm decoupled 10-lane SIMD processor utilizes Razor for every lane in the specific context of data-level parallel architectures [19] .  ... 
doi:10.1109/jproc.2016.2518864 fatcat:sxrsu3excbdg5p7sk4iczz262y

Near-Threshold Voltage Design Techniques for Heterogenous Manycore System-on-Chips

Sriram Vangal, Somnath Paul, Steven Hsu, Amit Agarwal, Ram Krishnamurthy, James Tschanz, Vivek De
2020 Journal of Low Power Electronics and Applications  
We discuss application of NTV design techniques, necessary for reliable operation over a wide supply voltage range—from nominal down to the NTV regime, and for a variety of IPs.  ...  the threshold voltage (VT) of the CMOS transistors.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/jlpea10020016 fatcat:wuwirnk4ljc7tjllpzc3ng7jei

Runtime Aware Architectures

Mateo Valero Cortes
2018 Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation - SIGSIM-PADS '18  
The runtime of the parallel application has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have.  ...  When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined Instruction Set Architecture (ISA).  ...  Acknowledgments This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council  ... 
doi:10.1145/3200921.3204479 dblp:conf/pads/Cortes18 fatcat:ctgvsceil5cgxpba7hhoy5f3ae

Hierarchically Focused Guardbanding: An Adaptive Approach to Mitigate PVT Variations and Aging

Abbas Rahimi, Luca Benini, Rajesh K. Gupta
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013  
This paper proposes a new model of functional units for variation-induced timing errors due to PVT variations and device Aging (PVTA).  ...  We demonstrate the effectiveness of HFG on GPU architecture at two granularities of observation and adaptation: (i) fine-grained instruction-level; and (ii) coarse-grained kernel-level.  ...  Decoupling SIMD queues in [8] prevent error events in any single lane from stalling all other lanes, thus enables each lane to tolerate errors independently.  ... 
doi:10.7873/date.2013.342 dblp:conf/date/RahimiBG13 fatcat:weh35ouuyba4dh2cr6d5mg5lym

Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP

Alexandru Voicu
2014 International Journal of Computer Applications  
This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment  ...  Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing  ...  register file divided into 512 32-bit entries for each 16-wide SIMD, where the SU can write, for example, the results of a comparison for each element in a batch.  ...
doi:10.5120/17529-8100 fatcat:vc3r5elwpjek5kkj4xyxzap4xm

On-Board Decision Making in Space with Deep Neural Networks and RISC-V Vector Processors

Stefano Di Mascio, Alessandra Menicucci, Eberhard Gill, Gianluca Furano, Claudio Monteleone
2021 Journal of Aerospace Information Systems  
The workload of DNNs for on-board image and telemetry analysis is analyzed, and the results are used to drive the preliminary design of a RISC-V vector processor to be employed as a generic platform to  ...  The use of deep neural networks (DNNs) in terrestrial applications went from niche to widespread in a few years, thanks to relatively inexpensive hardware for both training and inference, and large datasets  ...  Acknowledgments This work was supported by the European Space Agency under the NPI Program, Cobham Gaisler AB, and Delft University of Technology.  ... 
doi:10.2514/1.i010916 fatcat:u4kjrzl7ozaihoswvdsbn2ezoa

Fast and Accurate Error Simulation for CNNs against Soft Errors [article]

Cristiana Bolchini and Luca Cassano and Antonio Miele and Alessandro Toschi
2022 arXiv   pre-print
We compared our methodology against SASSIFI for the accuracy of functional error simulation w.r.t. fault injection, and against TensorFI in terms of speedup for the error simulation strategy.  ...  The great quest for adopting AI-based computation for safety-/mission-critical applications motivates the interest towards methods for assessing the robustness of the application w.r.t. not only its training  ...  When considering the high regularity of the SIMD architecture of a GPU, the setup allows one to obtain the same statistical results of an almost exhaustive injection campaign in the entire architecture  ... 
arXiv:2206.02051v1 fatcat:2iiyrlm7qncbro3u2n7t4j74ly

GPU-Based, LDPC Decoding for 5G and Beyond

Chance Tarver, Matthew Tonnemacher, Hao Chen, Jianzhong Zhang, Joseph R. Cavallaro
2021 IEEE Open Journal of Circuits and Systems  
In vRAN, the hardware computational resources will become decoupled from the specific computational functions in the RAN through virtualization, allowing for benefits such as load-balancing, improved scalability  ...  In 5G New Radio (NR), low-density parity-check (LDPC) codes are included as the error correction codes (ECC) for the data channel.  ...  However, this work also takes advantage of the AVX SIMD instructions for the architecture.  ... 
doi:10.1109/ojcas.2020.3042448 fatcat:dxiaef7eijarrlwzwo2h5mb6ri

Exa-Dune—Flexible PDE Solvers, Numerical Methods and Applications [chapter]

Peter Bastian, Mirco Altenbernd, Nils-Arne Dreier, Christian Engwer, Jorrit Fahlke, René Fritze, Markus Geveler, Dominik Göddeke, Oleg Iliev, Olaf Ippisch, Jan Mohring, Steffen Müthing (+4 others)
2020 Lecture Notes in Computational Science and Engineering  
Continuous improvement of the underlying hardware-oriented numerical methods have included GPU-based sparse approximate inverses, matrix-free sum-factorisation for high-order discontinuous Galerkin discretisations  ...  In order to cope with the increased probability of hardware failures, one aim of the project was to add flexible, application-oriented resilience capabilities into the framework.  ...  Timings on a Haswell-EP (E5-2698v3, 16 cores, AVX2, 4 lanes).  ...
doi:10.1007/978-3-030-47956-5_9 fatcat:iwfk3gsln5endcqe3uq42fzxwa

Hardware Developments I - A Survey Of State-Of-The-Art Hardware And Software

Daniel Borgis, Liang Liang, Leon Petit, Michael Lysaght, Alan O'Cais
2016 Zenodo  
Review of actual hardware and software solutions and recommendations to software vendors  ...  using SIMD instructions and to describe how to create versions of functions that can be invoked across SIMD lanes.  ...  At the time of writing, the Intel Xeon Phi product line so far includes two generations of architecture.  ... 
doi:10.5281/zenodo.929532 fatcat:cpuc7mplurcqtkunarlbitvdqu
Showing results 1 to 15 out of 46 results