Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines

Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
We propose Line Distillation (LDIS), a technique that retains only the used words and evicts the unused words in a cache line.  ...  We also propose Distill Cache, a cache organization to utilize the capacity created by LDIS.  ...  This work was supported by gifts from IBM, Intel, and the Cockrell Foundation. Moinuddin Qureshi was supported by an IBM PhD fellowship during this work.  ... 
doi:10.1109/hpca.2007.346202 dblp:conf/hpca/QureshiSP07 fatcat:u7zdaf6mrna2zezcx7sv7nq4fu

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Snehasish Kumar, Hongzhou Zhao, Arrvindh Shriraman, Eric Matthews, Sandhya Dwarkadas, Lesley Shannon
2012 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture  
The Amoeba-Cache effectively filters out unused words in a conventional block and prevents them from being inserted into the cache, allowing the resulting free space to be used to hold tags or data of  ...  Unused words occupy between 17-80% of a 64K L1 cache and between 1%-79% of a 1MB private LLC. This effectively shrinks the cache size, increases miss rate, and wastes on-chip bandwidth.  ...  Line distillation [30] filters out unused words from the cache at evictions using a separate word-granularity cache.  ... 
doi:10.1109/micro.2012.42 dblp:conf/micro/KumarZSMDS12 fatcat:yhczj576pbbc7gfk4jqzsp5sou

Knowledge Distillation: A Survey [article]

Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao
2021 arXiv   pre-print
It has received rapid increasing attention from the community.  ...  Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.  ...  In other words, the quality of knowledge acquisition and distillation from teacher to student is also determined by how to design the teacher and student networks.  ... 
arXiv:2006.05525v6 fatcat:aedzaeln5zf3jgjsgsn5kvjrri

Increasing cache capacity via critical-words-only cache

Cheng-Chieh Huang, Vijay Nagarajan
2014 2014 IEEE 32nd International Conference on Computer Design (ICCD)  
In this paper, we propose a novel cache design known as the critical-words-only cache (co-cache) for increasing the effective cache capacity.  ...  The first-level cache (L1) is typically small, in order to match the speed of the processor. The lower level caches, on the other hand, are typically large, in order to reduce capacity misses.  ...  For instance, Line Distillation [3] attempts to discard such unused words in a cache line to improve cache capacity.  ... 
doi:10.1109/iccd.2014.6974671 dblp:conf/iccd/HuangN14 fatcat:gn5ak65cxjhl3fo3weko3xm7v4

CHOP: Adaptive filter-based DRAM caching for CMP server platforms

Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravishankar Iyer, Srihari Makineni, Donald Newell, Yan Solihin, Rajeev Balasubramonian
2010 HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture  
, (b) several magnitudes lower area overhead in tag space required for cache-line based DRAM caches, (c) significantly lower memory bandwidth consumption as compared to page-granular DRAM caches.  ...  In this paper, we propose CHOP (Caching HOt Pages) in DRAM caches to address these challenges.  ...  [31] propose line distillation to filter out the unused words in a cache line to increase effective cache capacity. Moshovos et al.  ... 
doi:10.1109/hpca.2010.5416642 dblp:conf/hpca/JiangMZUIMNSB10 fatcat:o7fotbnqdbeinf5kbou6hpkvse

An adaptable storage slicing algorithm for content delivery networks

Andre Moreira, Ernani Azevedo, Judith Kelner, Djamel Sadok, Arthur Callado, Victor Souza
2015 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM)  
CDNsim tutorial: trace_file50000 (first 5000 lines) Provided by CDNsim tutorial: trace_file50000 Link Capacity 100 Mbit/s 100 Mbit/s Table I.  ...  In other words, insertion and removal of objects depends solely on the cache replacement technique used in the surrogate cache.  ...  The number of requests, clients and objects depends on the trace used, as shown in Table II .1.  ... 
doi:10.1109/inm.2015.7140355 dblp:conf/im/MoreiraAKSCS15 fatcat:u2d3ub67zrgozhao4ft6pgyk6y

Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training [article]

Mark Zhao, Niket Agarwal, Aarti Basant, Bugra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, Sundaram Narayanan, Jack Langman (+5 others)
2022 arXiv   pre-print
data, will dominate and constrain training capacity.  ...  As innovations in DSAs continue to increase training efficiency and throughput, the data storage and ingestion (DSI) pipeline, the systems and hardware responsible for storing and preprocessing training  ...  In other words, we must provision significantly more storage capacity per datacenter than is required, in order to meet IOPS demands.  ... 
arXiv:2108.09373v3 fatcat:wwnk7w5t7rbldheztu6v6kunna

Controlling Energy Demand in Mobile Computing Systems

Carla Schlatter Ellis
2007 Synthesis Lectures on Mobile and Pervasive Computing  
Mobile computing and pervasive computing represent major evolutionary steps in distributed systems, a line of research and development that dates back to the mid-1970s.  ...  These include: unpredictable variation in network quality, lowered trust and robustness of mobile elements, limitations on local resources imposed by weight and size constraints, and concern for battery  ...  Deferred writes are one of the ways in which the cache filters the disk request stream.  ... 
doi:10.2200/s00089ed1v01y200704mpc002 fatcat:myednkwcj5h5jizajmmhrj6hmy

Mining Hardware Assertions With Guidance From Static Analysis

S. Hertz, D. Sheridan, S. Vasudevan
2013 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
GoldMine assertions distill the random input stimulus space and can be used for calibrating directed tests. They can be used in a regression test suite of an evolving RTL.  ...  These candidate assertions are then passed through a formal verification engine to filter out the spurious candidates.  ...  The methodology discovers word-level features by computing the weakest precondition for word-level predicates in the RTL source code.  ... 
doi:10.1109/tcad.2013.2241176 fatcat:ter7nec5eneh7dycxldysdmcsu

Microkernel Mechanisms for Improving the Trustworthiness of Commodity Hardware

Yanyan Shen, Kevin Elphinstone
2015 2015 11th European Dependable Computing Conference (EDCC)  
that can malfunction when the hardware is impacted by transient hardware faults.  ...  The thesis presents microkernel-based software-implemented mechanisms for improving the trustworthiness of computer systems based on commercial off-the-shelf (COTS) hardware  ...  Nevertheless, the system SER of DRAM remains constant or even increases due to the greatly increased memory density and capacity.  ... 
doi:10.1109/edcc.2015.16 dblp:conf/edcc/ShenE15 fatcat:xq65e72x7zcnjbbmrwpgebqnxa

High-Performance and Scalable GPU Graph Traversal

Duane Merrill, Michael Garland, Andrew Grimshaw
2015 ACM Transactions on Parallel Computing  
Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations  ...  This scheme relies upon capacity and conflict misses to update stale bitmask data within the read-only texture caches.  ...  As described in Algorithm 5, each thread attempts to vie for control of its warp by writing its thread-identifier into a single word shared by all threads of that warp.  ... 
doi:10.1145/2717511 fatcat:yspiacetirfpjnbrmungchwyue

Basic Concepts [chapter]

2011 Oracle Database Performance and Scalability  
Each title brings the principles and theory of programming in-the-large and industrial strength software into focus.  ...  ORACLE 11g NEW FEATURES Zero-Size Unusable Indexes and Index Partitions From an administrative perspective, one often is concerned with the space occupied by unusable indexes and unusable index partitions  ...  Dividing a tablespace's capacity by a data block size gives the total number of blocks contained in that tablespace.  ... 
doi:10.1002/9781118135532.ch1 fatcat:itsggugltfdqvis5qndwmthbku

Final Version Of Core Transport System

Naeem Khademi, Zdravko Bozakov, Anna Brunstrom, Øystein Dale, Dragana Damjanovic, Kristian Riktor Evensen, Gorry Fairhurst, Andreas Fischer, Karl-Johan Grinnemo, Tom Jones, Simone Mangiante, Andreas Petlund (+6 others)
2017 Zenodo  
This document updates Deliverable D2.2; in particular, the descriptions of NEAT components presented here correspond to their implementation status by the end of WP2, and as such they supersede those in  ...  architecture defined in Task 1.2.  ...  This is done for pre-filtering purposes, to avoid running DNS lookups for non-feasible candidates since this may increase setup latency.  ... 
doi:10.5281/zenodo.1216124 fatcat:ekzx6nrsd5h7vedrne3rlv27ba

Scalable GPU graph traversal

Duane Merrill, Michael Garland, Andrew Grimshaw
2012 SIGPLAN notices  
Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations  ...  This scheme relies upon capacity and conflict misses to update stale bitmask data within the read-only texture caches.  ...  Enlistment operates by having each thread attempt to vie for control of its warp by writing its thread-identifier into a single word shared by all threads of that warp.  ... 
doi:10.1145/2370036.2145832 fatcat:u36wmlchhnfljiivpf3u56xjxa

Scalable GPU graph traversal

Duane Merrill, Michael Garland, Andrew Grimshaw
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations  ...  This scheme relies upon capacity and conflict misses to update stale bitmask data within the read-only texture caches.  ...  Enlistment operates by having each thread attempt to vie for control of its warp by writing its thread-identifier into a single word shared by all threads of that warp.  ... 
doi:10.1145/2145816.2145832 dblp:conf/ppopp/MerrillGG12 fatcat:dn7judc27nawnpf7iwjpmh3vqa