Improving the throughput of synchronization by insertion of delays

R. Rajwar, A. Kagi, J.R. Goodman
Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550)  
Efficiency of synchronization mechanisms can limit the parallel performance of many shared-memory applications.  ...  Making use of the pervasiveness of the Load-Linked/Store-Conditional primitives, we present a series of hardware mechanisms to optimize performance for sharing patterns exhibited by locks and associated  ...  Finally, this research was supported in part by an NSF Grant No. CCR-9810114, NSF Grant No. CDA-9623632 (MID-SHIPS), and various support from Intel Corporation.  ... 
doi:10.1109/hpca.2000.824348 dblp:conf/hpca/RajwarKG00 fatcat:5lcpko7dxvfcrcpa42jir3r2om

Performance enhancement in phased logic circuits using automatic slack-matching buffer insertion

Kenneth Fazel, Lun Li, Mitch Thornton, Robert B. Reese, Cherrice Traver
2004 Proceedings of the 14th ACM Great Lakes symposium on VLSI - GLSVLSI '04  
Examples of the buffer insertion technique are given and the effectiveness of the technique is evaluated through a set of experimental results.  ...  Based on the analysis of results provided by the simulator and the topological characteristics of a PL circuit, an algorithm for automatic slack matching buffer placement is devised.  ...  The chief advantage offered by PL designs is the capability to provide performance improvement when comparing average case throughput to synchronous circuit throughput.  ... 
doi:10.1145/988952.989051 dblp:conf/glvlsi/FazelLTRT04 fatcat:vy2fnpy24bbtxc6ox7pgkthpfy

Performance optimization of elastic systems using buffer resizing and buffer insertion

Dmitry Bufistov, Jorge Julvez, Jordi Cortadella
2008 2008 IEEE/ACM International Conference on Computer-Aided Design  
Both techniques increase the storage capacity and can potentially contribute to improve the throughput of the system. Each technique offers a different tradeoff between area cost and latency.  ...  Buffer resizing and buffer insertion are two transformation techniques for the performance optimization of elastic systems.  ...  This research has been funded by a grant from Intel Corp., research project CICYT TIN2007-66523, FPU grant AP2005-4866, and a Juan de la Cierva fellowship from the Spanish Ministry of Education and Science  ... 
doi:10.1109/iccad.2008.4681613 dblp:conf/iccad/BufistovJC08 fatcat:w3sjptrgxrgujppsu53nc5telq

Analysis and optimization of pausible clocking based GALS design

Xin Fan, Milos Krstic, Eckhard Grass
2009 2009 IEEE International Conference on Computer Design  
In this paper, we analyze the throughput reduction and synchronization failures introduced by the widely used pausible clocking scheme, and propose an optimized scheme for higher throughput and more reliable  ...  The local clock generator is improved to minimize the acknowledge latency, and a novel input port is applied to maximize the safe timing region for the clock tree insertion.  ...  ACKNOWLEDGEMENT This work has been supported by the European Project GALAXY under grant reference number FP7-ICT-214364 (  ... 
doi:10.1109/iccd.2009.5413130 dblp:conf/iccd/FanKG09 fatcat:byy7dpbm5jf7zltlc46i6pqpnu

Resynchronization for multiprocessor DSP systems

S.S. Bhattacharyya, S. Sriram, E.A. Lee
2000 IEEE Transactions on Circuits and Systems I Fundamental Theory and Applications  
requirements are ensured by other synchronizations in the system.  ...  The goal of resynchronization is to introduce new synchronizations in such a way that the number of original synchronizations that become redundant exceeds the number of new synchronizations that are added  ...  We refer to the source and sink actors of a DFG edge e by src(e) and snk(e), we denote the delay on e by delay(e), and we frequently represent e by the ordered pair (src(e), snk(e)).  ... 
doi:10.1109/81.895327 fatcat:i2vc3kxgqrazxpo6as4n5f3g2i

Analysis of Min Sum Iterative Decoder using Buffer Insertion

Saravanan Swapna, M. Anbuselvi, S. Salivahanan
2012 International Journal of Computer Applications  
The designed architecture is optimized using Wave pipelining, specifically buffer insertion. Timing optimization is done with the proper placement of buffer, at the various paths of the architecture.  ...  The maximum and minimum delay path is analyzed in the architecture. The performance metrics such as the clock frequency, power and delay are analyzed.  ...  Pipelining of a circuit into N stages can result in speedup in throughput up to a factor of N. The inserted synchronizing elements increase the area and power consumption of the logic.  ... 
doi:10.5120/5601-7855 fatcat:yla6oe7kbvcxtitofbcovm42g4

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays

Jabulani Nyathi, Ray Robert Rydberg, Jose G. Delgado-Frias
2006 The ... Midwest Symposium on Circuits and Systems conference proceedings  
This paper explores some potential methods for reducing global interconnect delays and improving throughput between communicating modules.  ...  The design of the communication channel is based on the assumption that the computing elements employ synchronous clocking while the communication channels are driven by locally generated clocks.  ...  In addition to being able to break the RC delay of long interconnects the scheme supports multiple data waves and thus improves throughput.  ... 
doi:10.1109/mwscas.2006.382033 fatcat:2gn7t6n6bfa67p4ig467m4vlky

Fast Universal Synchronizers [chapter]

Rostislav Dobkin, Ran Ginosar
2009 Lecture Notes in Computer Science  
The most well-known synchronizer consists of two sequentially connected flip-flops that should eliminate the propagation of metastability into the receiver clock domain.  ...  Novel faster synchronizers are described next and their use and improved performance are explained. The fast synchronizers enable shorter data cycles, measuring only 2 to 4 clock cycles.  ...  Forward latencies of the simple and fast synchronizers are shorter than the FIFO's. In Fig. 12, the throughput and latency depend linearly on the interconnect delay.  ... 
doi:10.1007/978-3-540-95948-9_20 fatcat:ndpze7wsejhnhljpukw7nmjvbm

Link pipelining strategies for an application-specific asynchronous NoC

Daniel Gebhardt, Junbok You, Kenneth S. Stevens
2011 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip - NOCS '11  
Results show improved large-message network latency and output buffer delay of the network adapter.  ...  There was a slight power increase with the addition of pipeline buffers, but our proposal is a complexity-effective improvement by the power*latency product metric.  ...  Acknowledgments This research is supported by the U.S. National Science Foundation under grant CCF-0810408, CCF-0702539, and Semiconductor Research Corporation under task 1817.001.  ... 
doi:10.1145/1999946.1999976 dblp:conf/nocs/GebhardtYS11 fatcat:uj4pjbdbubavlem27as7ar6msy

Improving RWA-OBS Formulation and Solution

Thomas Coutelen, Brigitte Jaumard, Gérard Hébuterne
2009 Proceedings of the 6th International ICST Conference on Broadband Communications, Networks, and Systems  
The efficiency of the column generation approach allows the integration of an additional constraint to control the compromise between the burst insertion delay and the throughput.  ...  We first enhance the original RWA-OBS formulation and then describe an iterative greedy heuristic and a column generation approach to improve the computing time and the scalability of the model.  ...  By the way, the impact of the insertion delay must be compared with the end-to-end delay achieved with retransmission of the dropped bursts.  ... 
doi:10.4108/icst.broadnets2009.7870 dblp:conf/broadnets/CoutelenJH09a fatcat:tr5oplfy3vb7dnnl3w7jdhf4rq

Wave-pipelined intra-chip signaling for on-FPGA communications

Terrence Mak, Pete Sedcole, Peter Y.K. Cheung, Wayne Luk
2010 Integration  
Based on the model, throughput and power consumption of a wave-pipelined link have been derived analytically and compared to the conventional synchronous links.  ...  It is shown that the wave-pipelined approach can achieve up to 5.7 times improvement in throughput and 13% improvement in power consumption versus conventional delay-based on-chip communication schemes  ...  Comparing wave-pipelining with delay-based signaling It is well known that interconnection throughput can be increased by inserting registers into the long line.  ... 
doi:10.1016/j.vlsi.2010.01.002 fatcat:pkbbox7cxzb4rn65uivmhd4gvi

How to Live with Uncertainties: Exploiting the Performance Benefits of Self-Timed Logic In Synchronous Design

G. Paci, A. Nackaerts, F. Catthoor, L. Benini, P. Marchal
2008 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools  
These systems require the highest possible energy efficiency of logic, which can only be achieved by operating in moderate inversion.  ...  Experimental results of our approach demonstrate performance benefits up to 2x and significant energy savings at low throughput rates.  ...  In a synchronous design, the system's throughput is defined by the clock frequency, set by the critical path for the worst-case corner.  ... 
doi:10.1109/dsd.2008.114 dblp:conf/dsd/PaciNCBM08 fatcat:pl5bnopk65fztn5crwv4axtwx4

Two-phase synchronization with sub-cycle latency

Rostislav (Reuven) Dobkin, Ran Ginosar
2009 Integration  
Simulations of best- and worst-case scenarios are presented which demonstrate the improved performance of the novel synchronizers.  ...  Synchronizers typically incur long latency of multiple-clock cycles, resulting in low throughput.  ...  Fig. 4 exemplifies the timing of such a synchronizer: the TX data is sampled on the first RX clock following the transfer. Interconnect delay affects both latency and throughput of the synchronizer.  ... 
doi:10.1016/j.vlsi.2008.11.006 fatcat:6gn5th56avht3jz7or3krjyshe

Elastic Circuits

J. Carmona, J. Cortadella, M. Kishinevsky, A. Taubin
2009 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design.  ...  Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.  ...  Recycling cannot always attain the same throughput improvement as buffer sizing, because a zero-marked forward arc is inserted that may degrade the ratio of the cycle involved. C.  ... 
doi:10.1109/tcad.2009.2030436 fatcat:6anbrdoea5hjhf3gsjp3myxvuq

An implementation of an asychronous FPGA based on LEDR/four-phase-dual-rail hybrid architecture

Yoshiya Komatsu, Shota Ishihara, Masanori Hariyama, Michitaka Kameyama
2011 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)  
The proposed FPGA is designed using the e-Shuttle 65nm CMOS process and the simulation result shows that the throughput is 3.91 GHz.  ...  Each logic block consists of LEDR-FPDR protocol converter, FPDR-LEDR protocol converter and two pipelined FPDR LUTs that alternately operate.  ...  Also, this work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with STARC, e-Shuttle, Inc., Fujitsu Ltd., Cadence Design Systems, Inc. and Synopsys, Inc  ... 
doi:10.1109/aspdac.2011.5722311 dblp:conf/aspdac/KomatsuIHK11 fatcat:ttheydymdnfrhlygh2egbetyda