Eliminating redundancies in sum-of-product array computations

2001
Proceedings of the 15th international conference on Supercomputing - ICS '01
doi:10.1145/377792.377807
dblp:conf/ics/DeitzCS01
fatcat:hphzyfjawnhjlmejhrrcug2yii
###
Using of Redundant Signed-Digit Numeral System for Accelerating and Improving the Accuracy of Computer Floating-Point Calculations

2020
*
International Journal of Advanced Computer Science and Applications
*

doi:10.14569/ijacsa.2020.0110942
fatcat:45qklytg6bbzbhizkrabykwbwy
The article proposes a method for software implementation of floating-point computations on a graphics processing unit (GPU) with increased accuracy, which eliminates the sharp increase in rounding errors. If the array contains k numbers, then this summation method requires k-1 synchronizations in the process of summing this array.
###
Subregion Analysis and Bounds Check Elimination for High Level Arrays
[chapter]

2011
*
Lecture Notes in Computer Science
*

doi:10.1007/978-3-642-19861-8_14
fatcat:vvondsmiknes3a6ha2opwe27vu
For example, high-level arrays in the X10 language support rank-independent specification of multidimensional loop and array computations using regions and points. For simplicity, an additional loop is introduced to compute the weighted sum using the elements in the stencil, but this loop could be replaced by a high level array sum() operation as well.
###
Fast multiplication without carry-propagate addition

1990
*
IEEE transactions on computers
*

doi:10.1109/12.61047
fatcat:njn6aqomh5cvldx7qnnhn4xvtm
Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation in the last step. Since a product of n bits is to be computed, those digits of the array that do not influence the result can be eliminated.
###
GLORE: generalized loop redundancy elimination upon LER-notation

2017
*
Proceedings of the ACM on Programming Languages
*

doi:10.1145/3133898
dblp:journals/pacmpl/DingS17
fatcat:jeam5uobn5cincs2lbqcp2jfby
This paper presents GLORE, a novel approach to enabling the detection and removal of large-scope redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops.
###
Algorithm-Based Fault Tolerance for Matrix Operations

1984
*
IEEE transactions on computers
*

doi:10.1109/tc.1984.1676475
fatcat:esqcnwz4nff7xbbxbaisezj2jm
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple copies of low-cost processors to provide a large amount of computational capability for a small cost. In addition to achieving high performance, high reliability is also important to ensure that the results of long computations are valid.
###
Compiling stencils in high performance Fortran

1997
*
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97
*

doi:10.1145/509593.509605
dblp:conf/sc/RothMKB97
fatcat:f5vd27hdvvhu7cdbvay6y57tym
[...] from the translation of Fortran90 array constructs. For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils.
###
Fast Multiplication Based on Different Compressors

IJIREEICE - Electrical, Electronics, Instrumentation and Control

2015
*
IJIREEICE
*

In many [...] of digital systems like graphic processors, digital signal processors fast parallel multiplication using adder trees are present. To speed up the [...] computation like addition is very important. This approach is defined in parameterizable HDL code, which makes it compatible with any FPGA family. Figure 2 shows the CSA compute flow and Table 1 shows the CSA working. The computation can be [...] in two steps: first we compute S and C using a CSA, and then we use a CPA to compute the total sum.
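The two-step scheme in the excerpt above (reduce the operands to a sum word S and a carry word C with a carry-save adder, then finish with one carry-propagate addition) can be sketched in software. This is a minimal Python illustration on non-negative integers, not the paper's HDL design; the function names `carry_save_add` and `multi_operand_sum` are hypothetical and chosen only for this sketch.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three non-negative integers to (S, C) with a + b + c == S + C.

    No carry ripples between bit positions: every bit of S and C depends
    only on the three input bits at one position.
    """
    s = a ^ b ^ c                                # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved up one position
    return s, carry


def multi_operand_sum(operands) -> int:
    """Add many operands: CSA steps first, one carry-propagate add at the end."""
    s, c = 0, 0
    for x in operands:
        # Step 1 (CSA): fold the next operand into the redundant (S, C) pair.
        s, c = carry_save_add(s, c, x)
    # Step 2 (CPA): a single ordinary addition resolves all pending carries.
    return s + c


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(multi_operand_sum([5, 3, 6, 7]))
```

In hardware the benefit is that each CSA stage has constant delay regardless of word length, so only the final CPA pays for carry propagation; Python integers hide that cost, so the sketch shows the data flow only.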

###
FPGA-Based Data Storage System on NAND Flash Memory in RAID 6 Architecture for In-Line Pipeline Inspection Gauges

2018
*
IEEE transactions on computers
*

doi:10.1109/tc.2018.2794986
fatcat:p5uqu2wpifeizgefxzqrhey2qi
At the hardware level, we interleaved 8 NAND flash chips in a Redundant Array of Independent Disks (RAID) type-6 architecture. Our controller computes the ECC and redundancy bytes while it transfers the information to the cache register of the selected die in the memory chips.
###
Nanofabric PLA Architecture with Double Variable Redundancy

2007
*
2007 IEEE Region 5 Technical Conference
*

doi:10.1109/tpsd.2007.4380347
fatcat:sn2f5lhzhrbzzdngtmeaub4cvq
It has been shown that fundamental electronic crosspoint can be programmed ON or OFF by applying a voltage differential of 3.6 V. We have successfully simulated the configuration of this DVR in MATLAB to verify the product-sum terms using DVR-based PLA.
###
Space-time trade-off optimization for a class of electronic structure calculations

2002
*
Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02
*

doi:10.1145/512529.512551
dblp:conf/pldi/CociorvaBLSRNBH02
fatcat:hdy6zbuuhrggjf7kwlalazbcmi
Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory. In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost of the product of several input arrays.
###
Arithmetic operators based on the binary stored-carry-or-borrow representation

2010
*
2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers
*

In the latter design, the conventional initial AND matrix is transformed and expressed with a redundant radix-2 representation. Several BSCB arithmetic elements, including full-adder, ripple-carry adder, and carry-lookahead adder are presented, followed by detailed design of an array multiplier. Redundant number representations allow fast addition by eliminating the carry propagation chains [Aviz61].

###
Optimizing array bound checks using flow analysis

1993
*
ACM Letters on Programming Languages and Systems
*

doi:10.1145/176454.176507
fatcat:r4wbngdpfvhb5jzq5u4rjwzsm4
The optimizations reduce the program execution time through elimination of checks and propagation of checks out of loops. Bound checks are introduced in programs for the run-time detection of array bound violations. Compile-time optimizations are employed to reduce the execution-time overhead due to bound checks. The range information is used to eliminate redundant bound checks on array subscripts.
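As a rough illustration of what moving bound checks out of a loop can look like, the sketch below contrasts a loop that checks every subscript with one that checks the whole index range once. It is a hand-written Python analogy under simple assumptions (a contiguous index range `lo..hi-1`), not the flow analysis described in the paper; `sum_checked` and `sum_hoisted` are hypothetical names.

```python
def sum_checked(a, lo, hi):
    """Sum a[lo:hi] with an explicit bound check on every iteration."""
    total = 0
    for i in range(lo, hi):
        if i < 0 or i >= len(a):      # per-iteration bound check
            raise IndexError(i)
        total += a[i]
    return total


def sum_hoisted(a, lo, hi):
    """Same result, but the check is done once on the loop's index range."""
    # The loop visits exactly lo .. hi-1, so if the range is non-empty and
    # 0 <= lo and hi <= len(a) hold, every subscript inside it is in bounds.
    if lo < hi and (lo < 0 or hi > len(a)):
        raise IndexError((lo, hi))
    total = 0
    for i in range(lo, hi):
        total += a[i]                 # no explicit check needed here
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(sum_checked(data, 1, 4), sum_hoisted(data, 1, 4))
```

The single range test replaces `hi - lo` individual tests, which is the kind of saving the excerpt attributes to using range information.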
