A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Exploiting a Computation Reuse Cache to Reduce Energy in Network Processors
[chapter]
2005
Lecture Notes in Computer Science
Caches, on the other hand, are meant to help latency, not throughput, in a traditional processor, and they provide no additional throughput for a balanced network processor design. ...
This is why most high end routers do not use caches for their data plane algorithms. In this paper we examine how to use a cache for a balanced high bandwidth network processor. ...
This work was funded in part by NSF grant CNS-0509546, and grants from Microsoft and Intel Corporation to the University of California, San Diego and NSF grant CCF-0208756, and grants from Intel Corp., ...
doi:10.1007/11587514_17
fatcat:awyd3uttwrdqjdocxpg3msj3ma
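The entry above concerns a computation reuse cache: a table that stores the results of previously executed computations so that repeated inputs skip the computation entirely. As a rough illustration of the idea (the class and method names below are hypothetical, not taken from the paper), a software sketch might look like:

```python
# Minimal sketch of a computation reuse cache. Results of a pure function
# (here a stand-in for a packet-processing computation) are memoized in a
# bounded table; a repeated input is a "reuse hit" and skips recomputation.
# ReuseCache and its methods are illustrative names, not from the paper.

class ReuseCache:
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.table = {}  # maps (function name, operands) -> cached result

    def compute(self, fn, *operands):
        key = (fn.__name__, operands)
        if key in self.table:           # reuse hit: no recomputation
            return self.table[key]
        result = fn(*operands)          # reuse miss: compute and cache
        if len(self.table) >= self.capacity:
            # evict the oldest entry (FIFO-style) to bound table size
            self.table.pop(next(iter(self.table)))
        self.table[key] = result
        return result

def checksum(a, b):
    """Stand-in for an expensive, repeatable data-plane computation."""
    return (a + b) & 0xFFFF

cache = ReuseCache()
assert cache.compute(checksum, 10, 20) == 30
assert cache.compute(checksum, 10, 20) == 30  # served from the reuse table
```

In hardware, the energy saving comes from the reuse-table lookup being cheaper than re-executing the computation, which is the trade-off the paper evaluates for network processors.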
On the effectiveness of prefetching and reuse in reducing L1 data cache traffic
2004
Proceedings of the 3rd workshop on Memory performance issues in conjunction with the 31st international symposium on computer architecture - WMPI '04
and (ii) load Instruction Reuse (IR), in reducing data cache traffic. ...
matching engine found in many network processors. ...
In this paper, we compare two techniques, prefetching and Instruction Reuse [19], in terms of their ability to reduce L1 data cache traffic in a popular network IDS called Snort [16]. ...
doi:10.1145/1054943.1054955
dblp:conf/wmpi/SurendraBN04
fatcat:7526wjzttng7hmtpllpfktwxtm
On Improving Efficiency and Utilization of Last Level Cache in Multicore Systems
2018
Information Technology and Control
Maintaining an energy-efficient system is a crucial challenge for multicore processors. ...
With the increasing need for computational power, the trend towards multicore processors is ubiquitous. ...
LLCs spend a large fraction of their energy as leakage energy and hence need techniques that turn off part of the cache to reduce leakage energy consumption. ...
doi:10.5755/j01.itc.47.3.18433
fatcat:pgrmyliv3ra5vjlkqqv3vhuudu
An Energy-Efficient Processor Architecture for Embedded Systems
2008
IEEE computer architecture letters
The data register organization captures reuse and locality in different levels of the hierarchy to reduce the cost of delivering data. ...
The processor architecture uses instruction registers to reduce the cost of delivering instructions, and a hierarchical and distributed data register organization to deliver data. ...
Data Supply: The distributed and hierarchical data register organization exploits reuse and locality in computations to satisfy most references from the operand register files (ORFs) located at the inputs ...
doi:10.1109/l-ca.2008.1
fatcat:efpogiee7nhu7jlg22awd6c6hm
Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors—A Case Study †
2017
Journal of Low Power Electronics and Applications
Data reuse exploration aims at reducing the pressure on the memory subsystem by exploiting the temporal locality in data accesses. ...
In this paper, we investigate the effects on performance and energy from a data reuse methodology combined with parallelization and vectorization in multi- and many-core processors. ...
Author Contributions: All authors contributed extensively to the work presented in this paper. ...
doi:10.3390/jlpea7010005
fatcat:grbddqazojasvgscajioyyrtsq
DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
2014
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '14
a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. ...
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). ...
While a cache is an excellent storage structure for a general-purpose processor, it is a sub-optimal way to exploit reuse because of the cache access overhead (tag check, associativity, line size, speculative ...
doi:10.1145/2541940.2541967
dblp:conf/asplos/ChenDSWWCT14
fatcat:ersjbr5ovrbybifa3fzj322pbi
Low Power Coarse-Grained Reconfigurable Instruction Set Processor
[chapter]
2003
Lecture Notes in Computer Science
Preliminary results show that the presented coarse-grained processor can achieve on average 2.5x the performance of a RISC processor at an 18% overhead in energy consumption. ...
In this paper, we present a novel coarse-grained reconfigurable processor and study its power consumption. ...
Acknowledgements This work is in part supported by MESA under MEDEA+. ...
doi:10.1007/978-3-540-45234-8_23
fatcat:4usoc63ulra2df3jx6n6yunlxy
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
[article]
2018
arXiv
pre-print
This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. ...
Techniques to do in-situ arithmetic in SRAM arrays, create efficient data mapping and reducing data movement are proposed. ...
This work was supported in part by the NSF CAREER-1652294 award, and Intel gift award. ...
arXiv:1805.03718v1
fatcat:d72fse5przg43h5ojhqydsl64i
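The Neural Cache entry above relies on bit-serial arithmetic: SRAM arrays operate on one bit position of many operands per step, with a carry propagated across steps. A minimal functional sketch of a bit-serial add (purely illustrative; the function name and fixed-width behavior are assumptions, not the paper's implementation) is:

```python
def bit_serial_add(a, b, width=8):
    """Add two unsigned integers one bit position per step, as a bit-serial
    in-cache ALU would: each step computes a sum bit and a carry using only
    single-bit logic operations."""
    carry = 0
    result = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry                      # full-adder sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))  # full-adder carry
        result |= s << i
    return result

assert bit_serial_add(13, 29) == 42
assert bit_serial_add(255, 1) == 0  # wraps at the fixed 8-bit width
```

The appeal for in-cache computing is that the same single-bit step can be applied to every column of an SRAM array simultaneously, giving massive data parallelism at the cost of latency per operation.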
Non-uniform power access in large caches with low-swing wires
2009
2009 International Conference on High Performance Computing (HiPC)
The proposed mechanisms reduce cache bank energy by 42% while incurring a minor 1% drop in performance. ...
While there have been a number of proposals to minimize energy consumption in the inter-bank network, very little attention has been paid to the optimization of intra-bank network power that contributes ...
All of the above schemes do little to reduce energy in the H-tree, a major contributor to cache energy. ...
doi:10.1109/hipc.2009.5433222
dblp:conf/hipc/UdipiMB09
fatcat:e4qmsg74wjcbzctvrr6dbgqzly
Toward application-specific memory reconfiguration for energy efficiency
2013
Proceedings of the 1st International Workshop on Energy Efficient Supercomputing - E2SC '13
The end of Dennard scaling has made energy-efficiency a critical challenge in the continued increase of computing performance. ...
Finally, as a first step towards automatic reconfiguration, we explore application characterization via reuse distance as a guide to select the best memory hierarchy configuration; we show that reuse distance ...
This work was supported in part by the DOE Office of Science through the Advanced Scientific Computing Research (ASCR) award titled "Thrifty: An Exascale Architecture for Energy-Proportional Computing" ...
doi:10.1145/2536430.2536434
dblp:conf/sc/CicottiCC13
fatcat:ssw2vucenzdm7fk2452j5p4z3i
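The entry above uses reuse distance — the number of distinct addresses touched between two successive accesses to the same address — to characterize applications. A simple (unoptimized, illustrative) way to compute it over an access trace:

```python
def reuse_distances(trace):
    """Return the reuse distance of each access in the trace: the number of
    distinct addresses touched since the previous access to the same address,
    or None for a first-time access. O(n^2) for clarity; production tools
    use tree-based structures for efficiency."""
    last_seen = {}   # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            window = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(window)))   # distinct addresses between
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances

# a, b, c, a: two distinct addresses (b, c) between the two uses of a
assert reuse_distances(["a", "b", "c", "a"]) == [None, None, None, 2]
```

A reuse-distance histogram predicts hit rates for any cache size under LRU, which is why it can guide memory hierarchy configuration as the paper describes.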
Unified performance and power modeling of scientific workloads
2013
Proceedings of the 1st International Workshop on Energy Efficient Supercomputing - E2SC '13
doi:10.1145/2536430.2536435
dblp:conf/sc/SongBK13
fatcat:al4dkkcccrettiv3cmaacktety
Exploiting temporal loads for low latency and high bandwidth memory
2005
IEE Proceedings - Computers and digital Techniques
The paper proposes a novel technique, called the 'temporal load cache architecture', to reduce load latencies and provide higher memory bandwidths. ...
When a load is predicted to be temporal, the data predicted to be accessed by it are read early in the pipeline from a small temporal load cache that stores the temporal data. ...
This is mainly due to the reduced activity in the clock network and instruction window (note that they are dominant consumers of dynamic energy in current high-performance processors [26]). ...
doi:10.1049/ip-cdt:20045124
fatcat:gspxg53qa5cpboqc5vl5f2vnrq
Runtime-Aware Architectures: A First Approach
2014
Supercomputing Frontiers and Innovations
... instruction-level parallelism (ILP) in superscalar processors. ...
In this paper, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective. ...
This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, the HiPEAC Network of Excellence, and by the European Research Council under the European ...
doi:10.14529/jsfi140102
fatcat:4bh33566cfbz7iylsf2ufppsfa
Location-aware cache management for many-core processors with deep cache hierarchy
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
Our instructions provide a 1.07× speedup and a 1.24× energy efficiency boost, on average, according to simulations on a 64-core system with private L1 and L2 caches. ...
With a large shared L3 cache added, the benefits increase, providing 1.33× energy reduction on average. ...
Acknowledgements The authors would like to thank Samantika Subramaniam and Rob F. Van der Wijngaart for discussion during the initial stage of our project. ...
doi:10.1145/2503210.2503224
dblp:conf/sc/ParkYKHK13
fatcat:yvtqvwtg3rbnbcfgdbamqq5dy4
Load Miss Prediction - Exploiting Power Performance Trade-offs
2007
2007 IEEE International Parallel and Distributed Processing Symposium
However, cache hierarchies do not necessarily benefit sparse scientific computing codes, which tend to have limited data locality and reuse. ...
We therefore propose a new memory architecture with a Load Miss Predictor (LMP), which includes a data bypass cache and a predictor table, to reduce access latencies by determining whether a load should ...
This allows better efficiency to be maintained upon scaling to multiple processors where network latencies can dominate. ...
doi:10.1109/ipdps.2007.370536
dblp:conf/ipps/MalkowskiLRI07
fatcat:prn3i7s4yvgh5nmhnob5cdfmgu
Showing results 1 — 15 out of 4,744 results