A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures
2012
SIGARCH Computer Architecture News
The proposed lane decoupling enables each SIMD lane to tolerate timing errors independent of other adjacent lanes, resulting in higher throughput and improved scalability. ...
Unfortunately, applying the same timing-speculative approach to wide-SIMD architectures, such as those used in highly efficient GPUs, may not provide similar gains. ...
We thank Robert Pawlowski, Joseph Crop, and Jacob Postman from the Oregon State University for their effort to implement the test-chip and provide the measurements. ...
doi:10.1145/2366231.2337187
fatcat:szgtciutjzbsdims4ngglsrtxu
Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures
2012
2012 39th Annual International Symposium on Computer Architecture (ISCA)
The proposed lane decoupling enables each SIMD lane to tolerate timing errors independent of other adjacent lanes, resulting in higher throughput and improved scalability. ...
Unfortunately, applying the same timing-speculative approach to wide-SIMD architectures, such as those used in highly efficient GPUs, may not provide similar gains. ...
We thank Robert Pawlowski, Joseph Crop, and Jacob Postman from the Oregon State University for their effort to implement the test-chip and provide the measurements. ...
doi:10.1109/isca.2012.6237021
dblp:conf/isca/KrimerCE12
fatcat:ljoj7h6hcbfcvh46bwdpwdrzx4
Energy-Efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-Based Computing
2014
Proceedings of the 51st Annual Design Automation Conference - DAC '14
Together, we enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback ...
Compared to voltage overscaling, this technique enhances robustness against timing errors with 39% average energy saving. ...
For instance, [3] decouples the SIMD lanes through private queues that prevent error events in any single lane from stalling all other lanes. ...
doi:10.1145/2593069.2593132
dblp:conf/dac/RahimiGLCBG14
fatcat:ssnbhugtebe55bq4cb24wc2kei
Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing
2014
2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)
Together, we enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback ...
Compared to voltage overscaling, this technique enhances robustness against timing errors with 39% average energy saving. ...
For instance, [3] decouples the SIMD lanes through private queues that prevent error events in any single lane from stalling all other lanes. ...
doi:10.1109/dac.2014.6881522
fatcat:hw5e3uj4zfa77dhytiqboj5ceq
Runtime-Aware Architectures
[chapter]
2015
Lecture Notes in Computer Science
The runtime system of the parallel programming model has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores ...
When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined Instruction Set Architecture (ISA). ...
Different maximum vector lengths (MVL) and lanes are considered.
Fig. 5. CG execution example with a single error occurring at the same time for all implemented mechanisms. ...
doi:10.1007/978-3-662-48096-0_2
fatcat:mx7lemvbwvgflotiwgstl4jzmy
Variability Mitigation in Nanometer CMOS Integrated Systems: A Survey of Techniques From Circuits to Software
2016
Proceedings of the IEEE
We provide a comparative evaluation of methods for deployment across various layers of the system from circuits, architecture, to application software. ...
First, we provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. ...
Recently, a 45 nm decoupled 10-lane SIMD processor utilizes Razor for every lane in the specific context of data-level parallel architectures [19]. ...
doi:10.1109/jproc.2016.2518864
fatcat:sxrsu3excbdg5p7sk4iczz262y
Near-Threshold Voltage Design Techniques for Heterogenous Manycore System-on-Chips
2020
Journal of Low Power Electronics and Applications
We discuss application of NTV design techniques, necessary for reliable operation over a wide supply voltage range—from nominal down to the NTV regime, and for a variety of IPs. ...
the threshold voltage (VT) of the CMOS transistors. ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/jlpea10020016
fatcat:wuwirnk4ljc7tjllpzc3ng7jei
Runtime Aware Architectures
2018
Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation - SIGSIM-PADS '18
The runtime of the parallel application has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have. ...
When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined Instruction Set Architecture (ISA). ...
Acknowledgments This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council ...
doi:10.1145/3200921.3204479
dblp:conf/pads/Cortes18
fatcat:ctgvsceil5cgxpba7hhoy5f3ae
Hierarchically Focused Guardbanding: An Adaptive Approach to Mitigate PVT Variations and Aging
2013
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
This paper proposes a new model of functional units for variation-induced timing errors due to PVT variations and device Aging (PVTA). ...
We demonstrate the effectiveness of HFG on GPU architecture at two granularities of observation and adaptation: (i) fine-grained instruction-level; and (ii) coarse-grained kernel-level. ...
Decoupling SIMD queues in [8] prevent error events in any single lane from stalling all other lanes, thus enables each lane to tolerate errors independently. ...
doi:10.7873/date.2013.342
dblp:conf/date/RahimiBG13
fatcat:weh35ouuyba4dh2cr6d5mg5lym
Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP
2014
International Journal of Computer Applications
This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment ...
Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing ...
register file divided into 512 32-bit entries for each 16-wide SIMD, where the SU can write, for example, the results of a comparison for each element in a batch. ...
doi:10.5120/17529-8100
fatcat:vc3r5elwpjek5kkj4xyxzap4xm
On-Board Decision Making in Space with Deep Neural Networks and RISC-V Vector Processors
2021
Journal of Aerospace Information Systems
The workload of DNNs for on-board image and telemetry analysis is analyzed, and the results are used to drive the preliminary design of a RISC-V vector processor to be employed as a generic platform to ...
The use of deep neural networks (DNNs) in terrestrial applications went from niche to widespread in a few years, thanks to relatively inexpensive hardware for both training and inference, and large datasets ...
Acknowledgments This work was supported by the European Space Agency under the NPI Program, Cobham Gaisler AB, and Delft University of Technology. ...
doi:10.2514/1.i010916
fatcat:u4kjrzl7ozaihoswvdsbn2ezoa
Fast and Accurate Error Simulation for CNNs against Soft Errors
[article]
2022
arXiv
pre-print
We compared our methodology against SASSIFI for the accuracy of functional error simulation w.r.t. fault injection, and against TensorFI in terms of speedup for the error simulation strategy. ...
The great quest for adopting AI-based computation for safety-/mission-critical applications motivates the interest towards methods for assessing the robustness of the application w.r.t. not only its training ...
When considering the high regularity of the SIMD architecture of a GPU, the setup allows one to obtain the same statistical results of an almost exhaustive injection campaign in the entire architecture ...
arXiv:2206.02051v1
fatcat:2iiyrlm7qncbro3u2n7t4j74ly
GPU-Based, LDPC Decoding for 5G and Beyond
2021
IEEE Open Journal of Circuits and Systems
In vRAN, the hardware computational resources will become decoupled from the specific computational functions in the RAN through virtualization, allowing for benefits such as load-balancing, improved scalability ...
In 5G New Radio (NR), low-density parity-check (LDPC) codes are included as the error correction codes (ECC) for the data channel. ...
However, this work also takes advantage of the AVX SIMD instructions for the architecture. ...
doi:10.1109/ojcas.2020.3042448
fatcat:dxiaef7eijarrlwzwo2h5mb6ri
Exa-Dune—Flexible PDE Solvers, Numerical Methods and Applications
[chapter]
2020
Lecture Notes in Computational Science and Engineering
Continuous improvement of the underlying hardware-oriented numerical methods have included GPU-based sparse approximate inverses, matrix-free sum-factorisation for high-order discontinuous Galerkin discretisations ...
In order to cope with the increased probability of hardware failures, one aim of the project was to add flexible, application-oriented resilience capabilities into the framework. ...
Timings on a Haswell-EP (E5-2698v3, 16 cores, AVX2, 4 lanes). ...
doi:10.1007/978-3-030-47956-5_9
fatcat:iwfk3gsln5endcqe3uq42fzxwa
Hardware Developments I - A Survey Of State-Of-The-Art Hardware And Software
2016
Zenodo
Review of actual hardware and software solutions and recommendations to software vendors ...
using SIMD instructions and to describe how to create versions of functions that can be invoked across SIMD lanes. ...
At the time of writing, the Intel Xeon Phi product line so far includes two generations of architecture. ...
doi:10.5281/zenodo.929532
fatcat:cpuc7mplurcqtkunarlbitvdqu