2,117 Hits in 10.2 sec

Exploiting address compression and heterogeneous interconnects for efficient message management in tiled CMPs

Antonio Flores, Manuel E. Acacio, Juan L. Aragón
2010 Journal of Systems Architecture
Moreover, wires used in such interconnect can be designed with varying latency, bandwidth and power characteristics.  ...  In particular, our proposal is based on applying an address compression scheme that dynamically compresses the addresses within coherence messages allowing for a significant area slack.  ...  Acknowledgments This work has been jointly supported by the Spanish MEC and European Commission FEDER funds under grants "Consolider Ingenio-2010 CSD2006-00046" and "TIN2006-15516-C4-03", and also by  ... 
doi:10.1016/j.sysarc.2010.05.006 fatcat:rf35lf4tcjhbfcimkcdwywwqj4
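The snippets do not spell out the compression scheme itself. As a hedged illustration of the general idea of dynamically compressing addresses in coherence messages, the sketch below assumes a generic scheme in which sender and receiver keep identical small tables of recently seen high-order address bits: on a hit only a table index plus the low-order bits travel on the link, and on a miss the full address is sent and both tables are updated the same way. The class name `AddressTable`, the FIFO replacement policy and the table and field sizes are hypothetical, not taken from the paper.

```python
class AddressTable:
    """Hypothetical per-link table of recently seen high-order address bits.

    Sender and receiver each hold one instance; because both apply the same
    update on every miss, the table contents (and thus the indices) stay in sync.
    """

    def __init__(self, entries: int, low_bits: int):
        self.slots = [None] * entries      # cached high-order address bits
        self.next_victim = 0               # FIFO replacement pointer
        self.low_bits = low_bits
        self.mask = (1 << low_bits) - 1

    def _insert(self, high: int) -> None:
        self.slots[self.next_victim] = high
        self.next_victim = (self.next_victim + 1) % len(self.slots)

    def compress(self, addr: int):
        high, low = addr >> self.low_bits, addr & self.mask
        if high in self.slots:
            # Hit: send only the table index and the low-order bits.
            return ("hit", self.slots.index(high), low)
        # Miss: send the full address and remember its high-order bits.
        self._insert(high)
        return ("miss", addr)

    def decompress(self, msg):
        if msg[0] == "hit":
            _, idx, low = msg
            return (self.slots[idx] << self.low_bits) | low
        _, addr = msg
        self._insert(addr >> self.low_bits)  # mirror the sender's update
        return addr


sender = AddressTable(entries=4, low_bits=12)
receiver = AddressTable(entries=4, low_bits=12)
addrs = [0x12345040, 0x12345080, 0x9ABC0000, 0x123450C0]
msgs = [sender.compress(a) for a in addrs]
assert [receiver.decompress(m) for m in msgs] == addrs
```

Here two of the four addresses share their high-order bits with an earlier one, so they travel as a 2-bit index plus 12 low-order bits instead of a full address.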

Address Compression and Heterogeneous Interconnects for Energy-Efficient High-Performance in Tiled CMPs

Antonio Flores, Manuel E. Acacio, Juan L. Aragón
2008 2008 37th International Conference on Parallel Processing  
In this work, we present a proposal for performance- and energy-efficient message management in tiled CMPs that combines address compression with a heterogeneous interconnect.  ...  Moreover, wires used in such interconnect can be designed with varying latency, bandwidth and power characteristics.  ...  Acknowledgments This work has been jointly supported by the Spanish MEC and European Commission FEDER funds under grants "Consolider Ingenio-2010 CSD2006-00046" and "TIN2006-15516-C4-03".  ... 
doi:10.1109/icpp.2008.33 dblp:conf/icpp/FloresAA08 fatcat:icsnub6qoba2fapgvr2lbe425a

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps [article]

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu
2016 arXiv pre-print
Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive  ...  We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck.  ...  Special thanks to Evgeny Bolotin and Kevin Hsieh for their feedback during various stages of this project.  ... 
arXiv:1602.01348v1 fatcat:qbzuknzcyncrticap55x4i5dhi

Highly compressed multi-pattern string matching on the cell broadband engine

Xinyan Zha, Daniele Paolo Scarpazza, Sartaj Sahni
2011 2011 IEEE Symposium on Computers and Communications (ISCC)  
To counter that, we propose a technique that employs compressed Aho-Corasick automata to perform fast, exact multipattern string matching with very large dictionaries.  ...  With its 9 cores per chip, the IBM Cell/Broadband Engine (Cell) can deliver an impressive amount of compute power and benefit the string-matching kernels of network security, business analytics and natural  ...  We present an optimized software design that exploits compressed AC automata to perform high-throughput multipattern string matching on the IBM Cell Broadband Engine.  ... 
doi:10.1109/iscc.2011.5983850 dblp:conf/iscc/ZhaSS11 fatcat:jkgqr5y3enenhfrmvrohhe4wtq
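For readers unfamiliar with Aho-Corasick, the following sketch builds a plain, uncompressed automaton (goto, failure and output tables) and scans a text for every occurrence of a set of patterns in a single pass. It does not reproduce the compressed automaton layout or the Cell-specific optimizations described in the paper; the function names are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a plain Aho-Corasick automaton."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the root
    fail = [0]         # failure link of each state
    out = [set()]      # patterns recognized on entering each state
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(p)
    # Breadth-first pass: a state's failure link points to the longest proper
    # suffix of its string that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out


def search(text, patterns):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in out[state]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


print(search("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The scan never backtracks in the text: on a mismatch it follows failure links, so each input character is consumed exactly once.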

IBM POWER7 multicore server processor

B. Sinharoy, R. Kalla, W. J. Starke, H. Q. Le, R. Cargnoni, J. A. Van Norstrand, B. J. Ronchetti, J. Stuecheli, J. Leenstra, G. L. Guthrie, D. Q. Nguyen, B. Blaner (+3 others)
2011 IBM Journal of Research and Development  
The IBM POWER processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years.  ...  A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.  ...  Acknowledgments This paper is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.  ... 
doi:10.1147/jrd.2011.2127330 fatcat:kztcasllyvgs5cuvzyf54myeyy

Practical Data Compression for Modern Memory Hierarchies [article]

Gennady Pekhimenko
2016 arXiv pre-print
In this thesis, we describe a new, practical approach to integrating hardware-based data compression within the memory hierarchy, including on-chip caches, main memory, and both on-chip and off-chip interconnects  ...  First, we propose a new compression algorithm, Base-Delta-Immediate Compression (BDI), that achieves high compression ratio with very low compression/decompression latency.  ...  We then discuss prior work that aims to address different challenges in efficiently applying data compression. Low Power DRAM and Interconnects.  ... 
arXiv:1609.02067v1 fatcat:i4z7m2ydtjgwvlwmglno26nb54
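Base-Delta-Immediate compression exploits the observation that values within a cache line often differ only slightly from one another. The sketch below is a simplified model of that idea, not the thesis's exact encoding: it assumes 4-byte words, tries 1-byte and then 2-byte deltas, and lets each word be encoded either as a small immediate (a delta from an implicit zero base) or as a delta from a single explicit base, recording the choice in a one-bit-per-word mask. The actual BDI design considers several base and delta sizes; the names and sizes here are assumptions.

```python
WORD_BYTES = 4  # assumption: the cache line is viewed as 4-byte words


def bdi_compress(words, delta_bytes):
    """Encode words as (base, [(is_immediate, delta), ...]) or return None.

    A word is an 'immediate' if it fits in delta_bytes as a delta from zero;
    otherwise it must fit as a delta from the single explicit base.
    """
    lo = -(1 << (8 * delta_bytes - 1))
    hi = (1 << (8 * delta_bytes - 1)) - 1
    base, encoded = None, []
    for w in words:
        if lo <= w <= hi:
            encoded.append((True, w))          # delta from implicit zero base
            continue
        if base is None:
            base = w                           # first large word becomes the base
        delta = w - base
        if not lo <= delta <= hi:
            return None                        # not compressible at this delta size
        encoded.append((False, delta))
    return base, encoded


def bdi_decompress(base, encoded):
    return [v if is_imm else base + v for is_imm, v in encoded]


def compressed_size(words):
    """Return (size_in_bytes, delta_bytes), trying the narrowest deltas first."""
    for delta_bytes in (1, 2):
        if bdi_compress(words, delta_bytes) is not None:
            mask_bytes = (len(words) + 7) // 8   # one base-selector bit per word
            return WORD_BYTES + len(words) * delta_bytes + mask_bytes, delta_bytes
    return len(words) * WORD_BYTES, None       # stored uncompressed


line = [0x10000000, 0x10000008, 0, 0x10000010, 4, 0x10000070, 0, 0x10000020]
print(compressed_size(line))  # (13, 1): a 32-byte line stored in 13 bytes
```

Decompression is a single addition per word, which is why this family of encodings suits latency-sensitive structures such as caches.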

Energy-aware Reprogramming of Sensor Networks Using Incremental Update and Compression

Milosh Stolikj, Pieter J.L. Cuijpers, Johan J. Lukkien
2012 Procedia Computer Science  
In this paper, we investigate the problem of improving energy efficiency and delay of reprogramming by using data compression and incremental updates.  ...  Our results show that the classic Lempel-Ziv-77 compression algorithm with Bsdiff for delta encoding has the best overall performance compared to other compression algorithms; on average reducing energy  ...  Acknowledgement The authors would like to thank Martijn van den Heuvel and Richard Verhoeven for their valuable discussions and improvements on this article.  ... 
doi:10.1016/j.procs.2012.06.026 fatcat:3ef7g7w55jcovnfcwwnnxxqepi
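The LZ77 algorithm named in the snippet replaces repeated byte sequences with back-references into a sliding window. A toy version emitting (offset, length, next byte) triples is sketched below; the window size and maximum match length are arbitrary assumptions, and the bsdiff delta-encoding step that the paper pairs with LZ77 is not modelled.

```python
def lz77_compress(data: bytes, window: int = 255, max_len: int = 15):
    """Toy LZ77: emit (offset, length, next_byte) triples."""
    tokens, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one byte early so a literal next byte always remains;
            # matches may overlap the current position, as in standard LZ77.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])   # byte-by-byte copy handles overlapping matches
        out.append(nxt)
    return bytes(out)


data = b"abababababcabc"
tokens = lz77_compress(data)
print(tokens)  # [(0, 0, 97), (0, 0, 98), (2, 8, 99), (11, 2, 99)]
assert lz77_decompress(tokens) == data
```

The 14-byte input collapses to four tokens; the third token copies eight bytes from an overlapping window only two bytes back.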

Impact of SCHC Compression and Fragmentation in LPWAN: A Case Study with LoRaWAN

Jesus Sanchez-Gomez, Jorge Gallego-Madrid, Ramon Sanchez-Iborra, José Santa, Antonio Fernando Skarmeta Gómez
2020 Sensors  
However, certain deployments such as those employing Low-Power Wide-Area Network (LPWAN)-based technologies may present severe network restrictions in terms of throughput and supported packet length.  ...  For that reason, the IETF's LPWAN working group has proposed a Static Context Header Compression (SCHC) scheme that permits compression and fragmentation of IPv6/UDP/CoAP packets with the aim of making  ...  Then, three different fragment sizes were employed to test the performance of these levels of fragmentation: high, medium, and low.  ... 
doi:10.3390/s20010280 pmid:31947852 pmcid:PMC6982818 fatcat:m26lvftprnaenf55squolzinku
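SCHC compresses headers against rules that both ends share in advance: each rule lists header fields with a target value, a matching operator and a compression/decompression action, and only the rule ID plus a small residue crosses the link. The sketch below assumes a single hypothetical rule, a subset of the operators (equal, ignore, MSB) and actions (not-sent, value-sent, LSB), and 16-bit fields throughout; field names, values and widths are illustrative, and the bit-level packing defined by the IETF specification is omitted.

```python
FIELD_BITS = 16  # assumption: every field is treated as 16 bits wide

# One hypothetical rule: (field, target value, matching operator, action).
RULES = [
    {"id": 1, "fields": [
        ("ipv6.next_header", 17, "equal", "not-sent"),
        ("udp.dst_port", 5683, "equal", "not-sent"),
        ("udp.src_port", 0xF0B0, ("msb", 12), "lsb"),
        ("coap.message_id", None, "ignore", "value-sent"),
    ]},
]


def matches(value, target, mo):
    if mo == "equal":
        return value == target
    if mo == "ignore":
        return True
    shift = FIELD_BITS - mo[1]               # ("msb", n): compare the top n bits
    return value >> shift == target >> shift


def compress(header):
    """Return (rule_id, residue) for the first matching rule, else None."""
    for rule in RULES:
        if all(matches(header[f], t, mo) for f, t, mo, _ in rule["fields"]):
            residue = []
            for f, t, mo, cda in rule["fields"]:
                if cda == "value-sent":
                    residue.append(header[f])
                elif cda == "lsb":
                    residue.append(header[f] & ((1 << (FIELD_BITS - mo[1])) - 1))
            return rule["id"], residue
    return None  # no rule applies; the packet would go out uncompressed


def decompress(rule_id, residue):
    rule = next(r for r in RULES if r["id"] == rule_id)
    header, it = {}, iter(residue)
    for f, t, mo, cda in rule["fields"]:
        if cda == "not-sent":
            header[f] = t                    # value comes entirely from the rule
        elif cda == "value-sent":
            header[f] = next(it)
        else:                                # "lsb": rule supplies the high bits
            shift = FIELD_BITS - mo[1]
            header[f] = (t >> shift << shift) | next(it)
    return header


pkt = {"ipv6.next_header": 17, "udp.dst_port": 5683,
       "udp.src_port": 0xF0B7, "coap.message_id": 0x1234}
rule_id, residue = compress(pkt)
print(rule_id, residue)  # 1 [7, 4660]
assert decompress(rule_id, residue) == pkt
```

Because the context is static, the receiver needs no per-flow state beyond the rule set itself, which is what makes the approach attractive on constrained LPWAN links.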

Energy-Efficient Design of the Reorder Buffer [chapter]

Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose
2002 Lecture Notes in Computer Science  
These techniques are: 1) dynamic ROB resizing; 2) the use of low-power comparators that dissipate energy mainly on a full match of the comparands; and 3) the use of zero-byte encoding.  ...  This paper proposes three relatively independent techniques for ROB power reduction with no or minimal impact on performance.  ...  Third, we noticed that a high percentage of bytes within the data items travelling on the result, dispatch and commit buses contain all zeroes.  ... 
doi:10.1007/3-540-45716-x_29 fatcat:242nzaheyrcmbccys2xvsghvuq
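Zero-byte encoding, as summarized in the snippet, attaches an indicator bit to each byte so that bytes that are all zeroes need not be stored or driven on a bus. The sketch below models only that encoding logic in software (it says nothing about the circuit-level gating in the paper) and assumes 8-byte data items in little-endian byte order; the function names are hypothetical.

```python
def zero_byte_encode(value: int, width_bytes: int = 8):
    """Return (mask, nonzero_bytes); bit i of mask is set if byte i is non-zero."""
    mask, nonzero = 0, bytearray()
    for i, b in enumerate(value.to_bytes(width_bytes, "little")):
        if b != 0:
            mask |= 1 << i
            nonzero.append(b)
    return mask, bytes(nonzero)


def zero_byte_decode(mask: int, nonzero: bytes, width_bytes: int = 8) -> int:
    it = iter(nonzero)
    # Re-insert a zero byte wherever the mask bit is clear.
    data = bytes(next(it) if mask >> i & 1 else 0 for i in range(width_bytes))
    return int.from_bytes(data, "little")


mask, payload = zero_byte_encode(0x12000034)
print(bin(mask), len(payload))  # 0b1001 2
assert zero_byte_decode(mask, payload) == 0x12000034
```

Only two of the eight bytes carry information here, so in hardware the other six byte lanes could stay idle.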

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 SIGARCH Computer Architecture News  
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems.  ...  Targeted at the "low-end" of the architecture spectrum, the SS-5 contains a single-scalar MicroSparc CPU with single-level, small, on-chip caches (16KByte instruction, 8KByte data).  ...  Schenfeld, Sanjay Vishin, the engineers of the Sparc Technology Business organization and the reviewers.  ... 
doi:10.1145/232974.232984 fatcat:w5c3hi3725dpdpc76725f5pyqq

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems.  ...  Targeted at the "low-end" of the architecture spectrum, the SS-5 contains a single-scalar MicroSparc CPU with single-level, small, on-chip caches (16KByte instruction, 8KByte data).  ...  Schenfeld, Sanjay Vishin, the engineers of the Sparc Technology Business organization and the reviewers.  ... 
doi:10.1145/232973.232984 dblp:conf/isca/SaulsburyPN96 fatcat:ut72ah2zxzh73onrac3vems5aq

Robust header compression (ROHC) in next-generation network processors

D.E. Taylor, A. Herkersdorf, A. Doring, G. Dittmann
2005 IEEE/ACM Transactions on Networking  
Robust Header Compression (ROHC) provides for more efficient use of radio links for wireless communication in a packet switched network.  ...  We explore the design tradeoffs for hardware assists in the form of reconfigurable hardware, Application-Specific Instruction-set Processors (ASIPs), and Application-Specific Integrated Circuits (ASICs  ...  We also would like to thank Peter Buchmann for his assistance with CAD tool flows for the ASIC hardware assists evaluation.  ... 
doi:10.1109/tnet.2005.852887 fatcat:zvpswaywtjbxfbzkfanza6zqrq

Low-power processor architecture exploration for online biomedical signal analysis

A.Y. Dogan, J. Constantin, D. Atienza, A. Burg, L. Benini
2012 IET Circuits, Devices & Systems  
IMs for each core, a shared DM and an interconnection crossbar between the cores and the DM.  ...  The results show that with respect to the single-core architecture, the multi-core solution consumes 62% less power for high computation requirements (167 MOps/s), while consuming 46% more power for extremely  ...  The second reference benchmark, DWT-based data compression [3] , performs a 50% compression on a block of ECG data per lead similar to the CS-based data compression.  ... 
doi:10.1049/iet-cds.2012.0011 fatcat:fdjk6h7fczb4rhwuz22hx2obry

Energy-Efficient System-Level Design [chapter]

Luca Benini, Giovanni De Micheli
2002 Power Aware Design Methodologies  
The complexity of current and future integrated systems requires a paradigm shift towards component-based design technologies that enable the integration of large computational cores, memory hierarchies  ...  and communication channel as well as system and application software onto a single chip.  ...  Instruction re-ordering for low-energy can be done by exploiting the degrees of freedom allowed by the partial order.  ... 
doi:10.1007/0-306-48139-1_16 fatcat:rikxlmoqmjfd3o3whfmbnvymwm

A compressed memory hierarchy using an indirect index cache

Erik G. Hallnor, Steven K. Reinhardt
2004 Proceedings of the 3rd workshop on Memory performance issues in conjunction with the 31st international symposium on computer architecture - WMPI '04  
We propose and analyze a memory hierarchy that increases both the effective capacity of memory structures and the effective bandwidth of interconnects by storing and transmitting data in compressed form  ...  Compressed bus transfers alone account for up to 59% of this improvement, with the remainder coming from increased effective cache capacity.  ...  Recently, a number of designs for cache data compression have been proposed. Most of these schemes have applied compression for power/energy savings rather than performance.  ... 
doi:10.1145/1054943.1054945 dblp:conf/wmpi/HallnorR04 fatcat:vtgte3gjpjcexbad6wqa2qma2q
Showing results 1–15 out of 2,117 results