6,206 Hits in 5.3 sec

Controlling and sequencing a heavily pipelined floating-point operator

André Seznec, Karl Courtel
1992 ACM SIGMICRO Newsletter  
HAL Id: inria-00075007, https://hal.inria.fr/inria-00075007 (submitted on 24 May 2006).
doi:10.1145/144965.145008 fatcat:ehecjdcpkjbnfmwj573b6arbxu

A practical measure of FPGA floating point acceleration for High Performance Computing

John D. Cappello, Dave Strenski
2013 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors  
by the Xilinx Floating Point Operator tool.  ...  Similarly, a floating point adder requires three DSP48s. Together, these operators make up a floating point MAC unit encompassing 14 DSP48s.  ... 
doi:10.1109/asap.2013.6567570 dblp:conf/asap/CappelloS13 fatcat:5q4x7j3isfasrkvqtziwaepv4u

Formally verified 32- and 64-bit integer division using double-precision floating-point arithmetic [article]

David Monniaux
2022 arXiv   pre-print
We advocate instead using the processor's floating-point unit, and propose code that the compiler can easily interleave with other computations.  ...  We fully proved the correctness of our algorithm, which mixes floating-point and fixed-bitwidth integer computations, using the Coq proof assistant and successfully integrated it into the CompCert formally  ...  Even if all other operations (integer, floating-point, memory, ...) are fully pipelined, division is typically handled differently: only one division can be handled at a given time (no pipelining), and  ...
arXiv:2207.08420v1 fatcat:7phbzwxucjdedkblynmqtbvkia
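
The Monniaux entry above rests on a simple observation: 32-bit operands are exactly representable in double precision, so one FP division plus an integer fix-up recovers the exact quotient. Below is a minimal C sketch of that idea, not the paper's formally verified algorithm; the function name is chosen here for illustration.

    /* Minimal sketch (not the paper's verified algorithm): unsigned 32-bit
     * division computed through the double-precision FPU.  Both operands fit
     * exactly in a double (2^32 < 2^53), so the rounded quotient is at most
     * one above the true floor, and a single integer fix-up suffices. */
    #include <stdint.h>

    uint32_t udiv32_via_double(uint32_t a, uint32_t b)   /* requires b != 0 */
    {
        uint32_t q = (uint32_t)((double)a / (double)b);  /* may be one too large */
        if ((uint64_t)q * b > a)                         /* correct the off-by-one */
            q--;
        return q;
    }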

A closer look at GPUs

Kayvon Fatahalian, Mike Houston
2008 Communications of the ACM  
A GPU texture-filtering unit accepts a point within the texture's parameterization (represented by a floating-point tuple, such as {.5,.75}) and loads array values surrounding the coordinate from memory  ...  A GPU's processing resources and accompanying memory system are heavily optimized to execute large numbers of operations in parallel.  ... 
doi:10.1145/1400181.1400197 fatcat:alxras5gpbagnbwkmbxgcumoyu
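
The texture-filtering step described in the snippet above is, in software terms, a bilinear blend of the four texels surrounding the sample coordinate. The rough C sketch below assumes normalized coordinates in [0,1], clamp-to-edge addressing, and single-channel texels; real GPU units also handle wrap modes, mipmaps, and fixed-point weights.

    /* Illustrative bilinear sample: map a coordinate such as {0.5, 0.75} into
     * texel space, fetch the four surrounding texels, and blend by the
     * fractional offsets.  Names and the clamp policy are assumptions,
     * not a real GPU API. */
    #include <math.h>

    float bilinear_sample(const float *tex, int w, int h, float u, float v)
    {
        float x = u * (float)(w - 1);          /* map [0,1] onto texel space */
        float y = v * (float)(h - 1);
        int x0 = (int)floorf(x), y0 = (int)floorf(y);
        int x1 = x0 + 1 < w ? x0 + 1 : w - 1;  /* clamp to edge */
        int y1 = y0 + 1 < h ? y0 + 1 : h - 1;
        float fx = x - (float)x0, fy = y - (float)y0;

        float t00 = tex[y0 * w + x0], t10 = tex[y0 * w + x1];
        float t01 = tex[y1 * w + x0], t11 = tex[y1 * w + x1];

        float top    = t00 + fx * (t10 - t00); /* blend along x ... */
        float bottom = t01 + fx * (t11 - t01);
        return top + fy * (bottom - top);      /* ... then along y */
    }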

Intel® Itanium® floating-point architecture

Marius Cornea, John Harrison, Ping Tak Peter Tang
2003 Proceedings of the 2003 Workshop on Computer Architecture Education (WCAE '03), held in conjunction with the 30th International Symposium on Computer Architecture
The present paper focuses on the floating-point architecture of the Itanium processor family, and points out a few remarkable features suitable to be the focus of a lecture, lab session, or project in  ...  Launched in 2001, the Intel Itanium processor was followed in 2002 by the Itanium 2 processor, with increased integer and floating-point performance.  ...  A 64-bit Floating-Point Status Register (FPSR) controls floating-point operations and records exceptions that occur.  ... 
doi:10.1145/1275521.1275526 dblp:conf/wcae/CorneaHT03 fatcat:bdfttqljtfewbctaja3ed5rtey
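
The FPSR mentioned in the snippet records IEEE-754 exception flags alongside control bits. Itanium exposes it through architecture-specific instructions, but C's <fenv.h> offers a portable way to observe the same sticky flags; the sketch below uses that analogue rather than the Itanium register itself.

    /* Portable analogue (not Itanium-specific FPSR access): <fenv.h> exposes
     * the IEEE-754 sticky exception flags that a hardware floating-point
     * status register records. */
    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON   /* required for well-defined flag tests */

    int main(void)
    {
        feclearexcept(FE_ALL_EXCEPT);
        volatile double x = 1.0, y = 0.0;
        volatile double r = x / y;                 /* raises divide-by-zero */
        if (fetestexcept(FE_DIVBYZERO))
            printf("divide-by-zero flag set (r = %g)\n", r);
        return 0;
    }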

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads [article]

Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini
2020 arXiv   pre-print
The FREP extension decouples the floating-point and integer pipeline by sequencing instructions from a micro-loop buffer.  ...  : stream semantic registers (SSR) and a floating-point repetition instruction (FREP).  ...  ACKNOWLEDGMENTS This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement number 732631, project "OPRECOMP".  ... 
arXiv:2002.10143v1 fatcat:jrugjgr4yzdyro4tka3czt6x64

GPUs

Kayvon Fatahalian, Mike Houston
2008 Queue  
Disappointed, the user exits the game and returns to a computer desktop that exhibits the stylish 3D look-and-feel of a modern window manager.  ...  between GPUs and CPUs begins to blur, it's important to understand what makes GPUs tick.  ...  A GPU texture-filtering unit accepts a point within the texture's parameterization (represented by a floating-point tuple, such as {.5,.75}) and loads array values surrounding the coordinate from memory  ...
doi:10.1145/1365490.1365498 fatcat:kj6hngzwpzayjimauglzxna2xy

A floating-point accumulator for FPGA-based high performance computing applications

Song Sun, Joseph Zambreno
2009 2009 International Conference on Field-Programmable Technology  
A floating-point accumulator for FPGA-based high performance computing applications is proposed and evaluated.  ...  In this paper, we describe how the adder accumulator operator can be heavily pipelined to achieve a high clock speed when mapped to FPGA technology, while still maintaining the original input ordering.  ...  Designing a floating-point accumulator with pipeline stalls is trivial. One straightforward way is to use a standard k-stage floating-point adder as shown in Figure 2 (b).  ... 
doi:10.1109/fpt.2009.5377624 fatcat:pqbx74i6ina3dknxmu6mdtcwvq
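
The stall problem described above comes from the loop-carried dependency through a k-stage adder: the next addition cannot issue until the previous one drains from the pipeline. A common software-level workaround, sketched below with an assumed depth of 8, keeps k independent partial sums so one input can issue per cycle; note that this reorders the additions, which is precisely what the paper's accumulator is designed to avoid.

    /* Software model of the interleaved partial-sum workaround (not the
     * paper's order-preserving design): each of the K_STAGES lanes has its
     * dependent additions spaced K_STAGES inputs apart, so a k-stage
     * pipelined adder could accept one input per cycle without stalling. */
    #define K_STAGES 8   /* assumed adder pipeline depth */

    float accumulate_interleaved(const float *x, int n)
    {
        float partial[K_STAGES] = {0};
        for (int i = 0; i < n; i++)
            partial[i % K_STAGES] += x[i];
        float sum = 0.0f;
        for (int i = 0; i < K_STAGES; i++)
            sum += partial[i];           /* final reduction */
        return sum;
    }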

Transparent mode flip-flops for collapsible pipelines

Eric L. Hill, Mikko H. Lipasti
2007 2007 25th International Conference on Computer Design  
Transparency is achieved either by decoupling the master and slave clocks to keep both latches transparent, or by using a bypass mux that routes around the flip-flop.  ...  Detailed analysis shows that the decoupled-clock flip-flop is the most attractive in terms of energy and delay.  ...  It is possible for a workload to asymmetrically issue operations to the floating point unit such that the overall utilization is low, but the majority of operations arrive in back-to-back cycles (meaning  ...
doi:10.1109/iccd.2007.4601952 dblp:conf/iccd/HillL07 fatcat:gzk44o6yzbgrpfukfleulmfvjy

The reconfigurable arithmetic processor

S. Fiske, W. J. Dally
1988 SIGARCH Computer Architecture News  
Its datapath is designed to sustain high rates of floating-point operations, while requiring only a fraction of the I/O bandwidth required by a conventional floating-point datapath.  ...  Average floating-point performance is 3.40 million floating-point operations per second (MFLOPS).  ...  Pipelining RAPs. Reorganizing Control and Operation of the Data Path. Reorganizing how the RAP datapath is controlled can lead both to improved bandwidth performance and to improved floating-point performance  ...
doi:10.1145/633625.52404 fatcat:vawacqazgrhz7atywocqlrsyfy

Characterization of simultaneous multithreading (SMT) efficiency in POWER5

H. M. Mathis, A. E. Mericas, J. D. McCalpin, R. J. Eickemeyer, S. R. Kunkel
2005 IBM Journal of Research and Development  
Because SMT has the potential of increasing processor efficiency and correspondingly increasing the amount of work done for a given time span, the reader might suppose that SMT would exhibit a performance  ...  In SMT mode, the processor resources (register sets, caches, queues, translation buffers, and the system memory nest) must be shared by both threads, and conditions can occur that degrade or even obviate  ...  Fixed-point execution (FXU) pipelines. The POWER5 processor contains two fixed-point execution pipelines, and both are capable of multiplication and basic arithmetic, logical, and shifting operations.  ...
doi:10.1147/rd.494.0555 fatcat:ksza4m4i2zay3ja2x334mr672i

Alpha AXP architecture

Richard L. Sites
1993 Communications of the ACM  
The DECchip 21064 runs multiple operating systems and runs native-compiled programs that were translated from the VAX and MIPS architectures.  ...  Capability to run both VMS and UNIX operating systems; easy migration from VAX and MIPS architectures. These goals directly influenced our key decisions in designing the architecture.  ...  A combined file is somewhat more flexible, especially for programs that are heavily skewed toward integer-only or floating-point-only computation.  ...
doi:10.1145/151220.151226 fatcat:b7rg4jqlejgkbaoiof2leaenea

Examining the Viability of FPGA Supercomputing

Stephen Craven, Peter Athanas
2007 EURASIP Journal on Embedded Systems  
This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general  ...  For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) can produce significant performance improvements over processors, leading some in academia and  ...  The design is pipelined to a depth of 12, permitting operation at a frequency up to 200 MHz.  ...
doi:10.1155/2007/93652 fatcat:zod32z5cdjbk7ntcidcwildgcq

Examining the Viability of FPGA Supercomputing

Stephen Craven, Peter Athanas
2007 EURASIP Journal on Embedded Systems  
This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general  ...  For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) can produce significant performance improvements over processors, leading some in academia and  ...  The design is pipelined to a depth of 12, permitting operation at a frequency up to 200 MHz.  ...
doi:10.1186/1687-3963-2007-093652 fatcat:3dwi6pyj6bcmzcfmcq7o3pdkhu

Scientific Computing on the Itanium® Processor

Bruce Greer, John Harrison, Greg Henry, Wei Li, Peter Tang
2002 Scientific Programming  
Features such as extensive arithmetic support, predication, speculation, and explicit parallelism can be used to provide a sound infrastructure for supercomputing.  ...  A large number of high-performance computer companies are offering Itanium®-based systems, some capable of peak performance exceeding 50 GFLOPS.  ...  x × y + z in a single floating-point operation with no intermediate rounding of the product.  ...
doi:10.1155/2002/193478 fatcat:zyskz2m4efgsnifoiivhhmgeca
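
The x × y + z operation quoted above is the fused multiply-add, which rounds only once. C99 exposes it as fma(), and the small example below shows a case where the single rounding preserves a result that the separately rounded expression loses entirely.

    /* Fused multiply-add vs. separately rounded multiply and add:
     * x*y + z with one rounding keeps the tiny residual -2^-60, whereas
     * rounding the product first collapses the result to zero. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double x = 1.0 + 0x1p-30, y = 1.0 - 0x1p-30, z = -1.0;
        double fused   = fma(x, y, z);   /* exact result: -2^-60 */
        double unfused = x * y + z;      /* product rounds to 1.0, result is 0 */
        printf("fused = %a, unfused = %a\n", fused, unfused);
        return 0;
    }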
Showing results 1–15 of 6,206.