Eliminating redundancies in sum-of-product array computations

Steven J. Deitz, Bradford L. Chamberlain, Lawrence Snyder
2001 Proceedings of the 15th international conference on Supercomputing - ICS '01
[Figure residue: axis label "Number of Processors"; panels "Sweep 1" and "Sweep 2"]  ...  3.1 Normalized array statement sequences  ...

Using of Redundant Signed-Digit Numeral System for Accelerating and Improving the Accuracy of Computer Floating-Point Calculations

Otsokov Sh. A, Magomedov Sh.G
2020 International Journal of Advanced Computer Science and Applications
The effect of accelerating computations is obtained for the problems of calculating the sum of an array of numbers and determining the dot product of vectors.  ...  The article proposes a method for software implementation of floating-point computations on a graphics processing unit (GPU) with an increased accuracy, which eliminates sharp increase in rounding errors  ...  If the array contains k numbers, then this summation method requires k-1 synchronizations in the process of summing this array.  ...

Subregion Analysis and Bounds Check Elimination for High Level Arrays [chapter]

Mackale Joyner, Zoran Budimlić, Vivek Sarkar
2011 Lecture Notes in Computer Science
For example, high-level arrays in the X10 language support rank-independent specification of multidimensional loop and array computations using regions and points.  ...  | R ::= restriction of array onto region R A.sum(), A.max() ::= sum/max of elements in array A1 A2 ::= result of applying point-wise op on A1 and A2, when A1.region = A2.region ( can include +, -, *  ...  For simplicity, an additional loop is introduced to compute the weighted sum using the elements in the stencil, but this loop could be replaced by a high-level array sum() operation as well.  ...

M.D. Ercegovac, T. Lang
1990 IEEE transactions on computers
Abstract: Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation  ...  in the last step.  ...  Since a product of n bits is to be computed, those digits of the array that do not influence the result can be eliminated.  ...

GLORE: generalized loop redundancy elimination upon LER-notation

Yufei Ding, Xipeng Shen
2017 Proceedings of the ACM on Programming Languages
This paper presents GLORE, a novel approach to enabling the detection and removal of large-scope redundant computations in nested loops.  ...  GLORE works on LER-notation, a new representation of computations in both regular and irregular loops.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE or NSF.  ...

Algorithm-Based Fault Tolerance for Matrix Operations

Kuang-Hua Huang, Abraham
1984 IEEE transactions on computers
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple copies of low-cost processors to provide a large amount of computational capability for a small cost.  ...  In addition to achieving high performance, high reliability is also important to ensure that the results of long computations are valid.  ...  of the computed sum of the row or column data elements and the checksum to the erroneous element in the information part, (ii) or by replacing the checksum by the computed sum of the information elements  ...
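
The checksum idea in this snippet can be illustrated with a minimal sketch. The assumptions here are ours, not the paper's: integer data (so equality tests are exact), at most one corrupted data element, and the hypothetical helper names `with_checksums` and `locate_and_correct`. The sketch appends a checksum row and column, finds the single row and column whose sums disagree with their checksums, and adds the difference back to the element at their intersection.

```python
def with_checksums(data):
    """Return a copy of `data` with a checksum column and a checksum row appended."""
    rows = [row + [sum(row)] for row in data]       # row checksums
    rows.append([sum(col) for col in zip(*rows)])   # column checksums (and grand total)
    return rows


def locate_and_correct(m):
    """Repair one corrupted data element of a full-checksum matrix in place.

    Returns the (row, column) that was repaired, or None if the data part is
    consistent or the error pattern is not a single data element.
    """
    n_rows, n_cols = len(m) - 1, len(m[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(m[i][:n_cols]) != m[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(m[i][j] for i in range(n_rows)) != m[n_rows][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Add (checksum - computed row sum) to the erroneous element.
        m[i][j] += m[i][n_cols] - sum(m[i][:n_cols])
        return (i, j)
    return None
```

With floating-point data the equality tests would need a tolerance, which this sketch does not attempt.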

Compiling stencils in high performance Fortran

Gerald Roth, John Mellor-Crummey, Ken Kennedy, R. Gregg Brickner
1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97
from the translation of Fortran90 array constructs.  ...  For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils.  ...  Acknowledgments This work has been supported in part by the IBM Corporation, the Center for Research on Parallel Computation (an NSF Science and Technology Center), and DARPA Contract DABT63-92-C-0038.  ...

Fast Multiplication Based on Different Compressors

Shalu George, Jinu Isaac Kuruvilla
2015 IJIREEICE
Many digital systems, such as graphics processors and digital signal processors, use fast parallel multiplication based on adder trees. Speeding up computations such as addition is therefore very important.  ...  This approach is defined in parameterizable HDL code, which makes it compatible with any FPGA family.  ...  Figure 2 shows the CSA compute flow and Table 1 shows how the CSA works. The computation proceeds in two steps: first we compute S and C using a CSA, and then we use a CPA to compute the total sum.  ...
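
The two-step scheme in this snippet (a carry-save adder producing S and C, then a carry-propagate adder producing the total) can be sketched in software. This is only an illustration on non-negative Python integers, not the paper's HDL; the function names `carry_save_add` and `add_three` are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative ints.

    Returns (S, C) such that a + b + c == S + C, with no carry
    propagating between bit positions inside this step.
    """
    s = a ^ b ^ c                                   # per-bit sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1      # per-bit majority, moved up one weight
    return s, carry


def add_three(a, b, c):
    """Step 1: CSA gives S and C. Step 2: one carry-propagate addition gives the total."""
    s, carry = carry_save_add(a, b, c)
    return s + carry                                # the CPA step
```

In a multiplier's adder tree, the CSA step is repeated until only two operands remain, so the slow carry-propagate addition happens once at the end.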

FPGA-Based Data Storage System on NAND Flash Memory in RAID 6 Architecture for In-Line Pipeline Inspection Gauges

N. A. Rodriguez-Olivares, A. Gomez-Hernandez, L. Nava-Balanzar, H. Jimenez-Hernandez, J. A. Soto-Cajiga
2018 IEEE transactions on computers
At the hardware level, we interleaved 8 NAND flash chips in a Redundant Array of Independent Disks (RAID) type-6 architecture.  ...  Our controller computes the ECC and redundancy bytes while it transfers the information to the cache register of the selected die in the memory chips.  ...  All authors would like to thank Joseph Moeller for his help in improving the English manuscript.  ...

Nanofabric PLA Architecture with Double Variable Redundancy

2007 2007 IEEE Region 5 Technical Conference
It has been shown that fundamental electronic structures such as diodes and FETs can be constructed using  ...  crosspoint can be programmed ON or OFF by applying a voltage differential of 3.6 V.  ...  OUR APPROACH: DOUBLE VARIABLE REDUNDANCY  ...  determines the working of the array as AND array or OR array, as seen in  ...  We have successfully simulated the configuration of this DVR in MATLAB to verify the  ...  product-sum terms using DVR-based PLA.  ...  We allocate two vertical nanowires per product (or sum) term in  ...  Pcp = 0.05 to 0.2 (probability that a single crosspoint is non-programmable)  ...  Expression (3) gives  ...

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02
Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.  ...  In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost  ...  of the product of several input arrays.  ...

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
2002 SIGPLAN notices
Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.  ...  In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost  ...  of the product of several input arrays.  ...

Arithmetic operators based on the binary stored-carry-or-borrow representation

Daniel Torno, Behrooz Parhami
2010 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers
In the latter design, the conventional initial AND matrix is transformed and expressed with a redundant radix-2 representation.  ...  Several BSCB arithmetic elements, including full-adder, ripple-carry adder, and carry-lookahead adder are presented, followed by detailed design of an array multiplier.  ...  Introduction: Redundant number representations allow fast addition by eliminating the carry propagation chains [Aviz61].  ...

Optimizing array bound checks using flow analysis

Rajiv Gupta
1993 ACM Letters on Programming Languages and Systems
The optimizations reduce the program execution time through elimination of checks and propagation of checks out of loops.  ...  Bound checks are introduced in programs for the run-time detection of array bound violations. Compile-time optimizations are employed to reduce the execution-time overhead due to bound checks.  ...  The range information is used to eliminate redundant bound checks on array subscripts.  ...