
Dynamic memory instruction bypassing

Daniel Ortega, Eduard Ayguadé, Mateo Valero
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
This mechanism's benefits increase in the presence of memory prefetching or good memory behaviour, since these scenarios allow for the bypassing of more loads.  ...  Its main aim is to move data from L1 and L2 silently and ahead of time into the register file so that the load instruction can be subsequently bypassed (hence the name).  ...  Memory Instruction Bypassing.  ... 
doi:10.1145/782814.782858 dblp:conf/ics/OrtegaAV03 fatcat:mzmdt6ar5rglbbmqq7z3v2dzdi
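
The snippet above describes staging data in the register file ahead of time so that the load itself can be bypassed. A minimal Python sketch of that idea, assuming a hypothetical per-PC staging table (not the paper's actual hardware):

```python
# Illustrative sketch (not the paper's implementation): a table, keyed by load PC,
# that stages a value in the register file ahead of time; if the staged address
# still matches when the load issues, the load is bypassed (no cache access).

class BypassTable:
    def __init__(self):
        self.staged = {}  # load PC -> (address, value) staged in the register file

    def stage(self, pc, addr, value):
        """Silently move data toward the register file for a predicted load."""
        self.staged[pc] = (addr, value)

    def try_bypass(self, pc, addr, cache_read):
        """Return the load's value; bypass the cache when the staged entry matches."""
        entry = self.staged.get(pc)
        if entry and entry[0] == addr:
            return entry[1], True        # load bypassed
        return cache_read(addr), False   # normal cache access

# Toy usage
memory = {0x100: 42}
table = BypassTable()
table.stage(pc=0x4000, addr=0x100, value=memory[0x100])   # ahead of time
value, bypassed = table.try_bypass(0x4000, 0x100, memory.get)
print(value, bypassed)  # 42 True
```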

Dynamic Memory Instruction Bypassing

Daniel Ortega, Mateo Valero, Eduard Ayguadé
2004 International journal of parallel programming  
This mechanism's benefits increase in the presence of memory prefetching or good memory behaviour, since these scenarios allow for the bypassing of more loads.  ...  Its main aim is to move data from L1 and L2 silently and ahead of time into the register file so that the load instruction can be subsequently bypassed (hence the name).  ...  The mechanism presented in this paper, Dynamic Memory Instruction Bypassing, tries to eliminate loads from the critical path.  ... 
doi:10.1023/b:ijpp.0000029273.49634.19 fatcat:7srx7nmugnhnljo6t4zekjemiq

Coordinated static and dynamic cache bypassing for GPUs

Xiaolong Xie, Yun Liang, Yu Wang, Guangyu Sun, Tao Wang
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
Our dynamic bypassing technique modulates the ratio of thread blocks that cache or bypass at run-time. We choose to modulate at the thread block level in order to avoid memory divergence problems.  ...  In this paper, we propose coordinated static and dynamic cache bypassing to improve application performance.  ...  At run-time, the dynamic cache bypassing component honors the cache or bypass decisions for the memory requests with strong preferences to cache or bypass, but has the flexibility to adjust the behavior  ... 
doi:10.1109/hpca.2015.7056023 dblp:conf/hpca/XieLWSW15 fatcat:bican3tomzamlc4yswnfaxl6xe
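
The snippet describes run-time modulation of how many thread blocks cache versus bypass. A minimal sketch of such a modulator, with hypothetical names and a fixed block count; the paper's coordination with static (compiler) decisions is only approximated here by the `static_pref` argument:

```python
# Illustrative sketch (hypothetical names, not the paper's design): assign a
# cache-or-bypass decision per thread block so that roughly `cache_ratio` of
# the blocks use the L1 cache; requests with a strong static preference keep
# their compiler-assigned decision.

def block_decision(block_id, cache_ratio, num_blocks):
    """True = this thread block's requests go through the cache."""
    return (block_id % num_blocks) < int(cache_ratio * num_blocks)

def route_request(block_id, static_pref, cache_ratio, num_blocks=16):
    if static_pref in ("cache", "bypass"):      # strong static preference honored
        return static_pref
    use_cache = block_decision(block_id, cache_ratio, num_blocks)
    return "cache" if use_cache else "bypass"

# Toy usage: modulate the ratio at run time (e.g. after observing contention)
for ratio in (0.75, 0.25):
    decisions = [route_request(b, None, ratio) for b in range(8)]
    print(ratio, decisions)
```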

The D30V/MPEG multimedia processor

H. Takata, T. Watanabe, T. Nakajima, T. Takagaki, H. Sato, A. Mohri, A. Yamada, T. Kanamoto, Y. Matsuda, S. Iwade, Y. Horiba
1999 IEEE Micro  
The operand fetched from memory in the M stage of the memory unit pipe is bypassed to the following instructions both in the memory unit and integer unit pipes.  ...  Memory access instructions execute in the instruction fetch (I), instruction decode and address generation (D/A), memory access (M), and write-back (W) stages.  ...  He helped implement the system LSIs, focusing on the clock skew analysis and the dynamic circuit design.  ... 
doi:10.1109/40.782566 fatcat:3u2475fqp5ayta5l7i7tpl5ajq
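
The snippet describes forwarding (bypassing) an operand fetched in the M stage to later instructions before write-back. A minimal sketch of that forwarding check, with hypothetical data structures rather than the D30V's actual pipeline logic:

```python
# Illustrative sketch of operand bypassing (forwarding) between pipeline stages,
# loosely following the I / D-A / M / W stages mentioned in the snippet.
# Hypothetical structures; not the D30V implementation.

def read_operand(reg, regfile, in_flight):
    """Return a source operand, preferring a value bypassed from the M stage."""
    for producer in in_flight:                 # instructions past M but before W
        if producer["dest"] == reg:
            return producer["value"]           # bypassed from the M stage
    return regfile[reg]                        # fall back to the register file

regfile = {"r1": 0, "r2": 7}
in_flight = [{"dest": "r1", "value": 99}]      # a load whose result is in M
print(read_operand("r1", regfile, in_flight))  # 99 (bypassed)
print(read_operand("r2", regfile, in_flight))  # 7  (register file)
```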

Reducing memory latency via read-after-read memory dependence prediction

A. Moshovos, G.S. Sohi
2002 IEEE transactions on computers  
Moreover, a combined RAW- and RAR-based cloaking/bypassing mechanism improves performance by 6.44 percent (integer) and 4.66 percent (floating-point) over a highly aggressive dynamically scheduled superscalar  ...  This is done whenever RAR dependences are predicted among the load instructions.  ...  Dynamic Instruction Distance Distribution: In this section, we measure the distance in dynamic instructions between the loads that get a correct value via cloaking and the source instruction that supplied  ... 
doi:10.1109/12.990129 fatcat:yy22nedq4vbafgctr5an45r62i
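
The snippet describes obtaining a load's value via a predicted read-after-read (RAR) dependence instead of going to memory. A minimal sketch, assuming hypothetical PC- and address-indexed tables; for simplicity the RAR dependence here is between dynamic instances of the same static load, whereas the paper also handles dependences between different loads:

```python
# Illustrative sketch of read-after-read (RAR) value "cloaking": if a later load
# is predicted to read the same address as an earlier load, it takes the earlier
# load's value instead of going to the memory hierarchy. The structures below
# are hypothetical, not the paper's hardware.

class RARCloaking:
    def __init__(self):
        self.last_load = {}   # address -> value seen by the most recent load
        self.rar_pred = {}    # load PC -> address it is predicted to re-read

    def execute_load(self, pc, addr, memory):
        predicted = self.rar_pred.get(pc)
        if predicted == addr and addr in self.last_load:
            value = self.last_load[addr]       # value obtained via cloaking
        else:
            value = memory[addr]               # normal memory access
        self.last_load[addr] = value
        self.rar_pred[pc] = addr               # train the predictor
        return value

mem = {0x80: 5}
c = RARCloaking()
c.execute_load(pc=0x10, addr=0x80, memory=mem)          # first instance trains
print(c.execute_load(pc=0x10, addr=0x80, memory=mem))   # 5, obtained via cloaking
```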

New Two-Level L1 Data Cache Bypassing Technique for High Performance GPUs

Gwang Bok Kim, Cheol Hong Kim
2021 Journal of Information Processing Systems  
On-chip caches of graphics processing units (GPUs) have contributed to improved GPU performance by reducing long memory access latency.  ...  Our two-level bypassing technique can be applied to recent GPU models and improves the performance by 6% on average compared to the architecture without bypassing.  ...  Dynamic bypassing schemes for GPUs can be classified into two categories.  ... 
doi:10.3745/jips.01.0062 dblp:journals/jips/KimK21 fatcat:lodv2kq2gbd3lfgwkiu4cetlii

Run-time adaptive cache hierarchy management via reference analysis

Teresa L. Johnson, Wen-mei W. Hwu
1997 SIGARCH Computer Architecture News  
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism.  ...  We introduce the concept of a macroblock, which allows us to feasibly characterize the memory locations accessed by a program, and a Memory Address Table, which performs the dynamic reference analysis.  ...  The bypassing choices are made by a Memory Address Table (MAT), which performs dynamic reference analysis in a location-sensitive manner.  ... 
doi:10.1145/384286.264213 fatcat:elusjog2t5btlfchbwgpejfvpq
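
The snippet describes a Memory Address Table (MAT) that tracks macroblocks and makes location-sensitive bypassing choices. A minimal sketch under an assumed macroblock size, counter width, and threshold (none of which are taken from the paper):

```python
# Illustrative sketch: addresses are grouped into macroblocks and a Memory
# Address Table (MAT) keeps a saturating reuse counter per macroblock; blocks
# with little observed reuse bypass the cache. Sizes/thresholds are assumptions.

MACROBLOCK_BITS = 10          # assumed macroblock size: 1 KiB

def macroblock(addr):
    return addr >> MACROBLOCK_BITS

class MAT:
    def __init__(self, threshold=2, ctr_max=7):
        self.counters = {}
        self.threshold = threshold
        self.ctr_max = ctr_max

    def access(self, addr):
        """Update the reuse counter and return True if this access should bypass."""
        mb = macroblock(addr)
        ctr = self.counters.get(mb, 0)
        bypass = ctr < self.threshold          # low reuse so far -> bypass
        self.counters[mb] = min(ctr + 1, self.ctr_max)
        return bypass

mat = MAT()
print([mat.access(0x1234) for _ in range(4)])  # [True, True, False, False]
```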

Run-time adaptive cache hierarchy management via reference analysis

Teresa L. Johnson, Wen-mei W. Hwu
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism.  ...  We introduce the concept of a macroblock, which allows us to feasibly characterize the memory locations accessed by a program, and a Memory Address Table, which performs the dynamic reference analysis.  ...  The bypassing choices are made by a Memory Address Table (MAT), which performs dynamic reference analysis in a location-sensitive manner.  ... 
doi:10.1145/264107.264213 dblp:conf/isca/JohnsonH97 fatcat:bgpkx6mt4zefvf64oydyavsdge

DaCache

Bin Wang, Weikuan Yu, Xian-He Sun, Xinning Wang
2015 Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15  
DaCache also adopts a constrained replacement policy with L1D bypassing to sustain a good supply of Fully Cached Warps (FCW), along with a dynamic mechanism to adjust FCW during runtime.  ...  The benchmarks are categorized into memory-divergent and memory-coherent ones, depending on the dynamic divergence of load instructions in these benchmarks.  ...  In the rest of this paper, the memory instructions that incur more than 2 uncoalescable memory accesses are called divergent instructions, while the others are called coherent instructions.  ... 
doi:10.1145/2751205.2751239 dblp:conf/ics/WangYSW15 fatcat:bz3db7ty3fac7hgak6jilvfurm
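
The snippet defines divergent versus coherent memory instructions by the number of uncoalescable accesses they incur. A minimal sketch of that classification, assuming a 128-byte coalescing granularity (an assumption, not a figure from the paper):

```python
# Illustrative sketch of the divergent/coherent classification described in the
# snippet: a warp-level load is "divergent" if its per-lane addresses coalesce
# into more than 2 memory transactions, otherwise "coherent".

LINE_SIZE = 128  # bytes, assumed coalescing granularity

def num_transactions(lane_addrs):
    """Number of distinct cache lines touched by one warp-level memory instruction."""
    return len({addr // LINE_SIZE for addr in lane_addrs})

def classify(lane_addrs):
    return "divergent" if num_transactions(lane_addrs) > 2 else "coherent"

coalesced = [0x1000 + 4 * i for i in range(32)]      # consecutive 4-byte accesses
scattered = [0x1000 + 4096 * i for i in range(32)]   # one line per lane
print(classify(coalesced))   # coherent (1 transaction)
print(classify(scattered))   # divergent (32 transactions)
```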

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

Yijie Huangfu, Wei Zhang
2017 Journal of Computing Science and Engineering  
Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality.  ...  bypassing with considerably less profiling cost.  ...  [21] implemented a PC-based dynamic GPU cache bypassing predictor. Xie et al.  ... 
doi:10.5626/jcse.2017.11.2.69 fatcat:pudydgokind6dpxyk6hdsssgmm

Comparing static and dynamic code scheduling for multiple-instruction-issue processors

Pohua P. Chang, William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu
1991 Proceedings of the 24th annual international symposium on Microarchitecture - MICRO 24  
The overall result is that the dynamic and static approaches are comparable in performance.  ...  When applied to a four-instruction-issue processor, both methods achieve more than two times speedup over a high performance single-instruction-issue processor.  ...  Memory load operations are allowed to bypass preceding memory store operations if the memory addresses do not conflict. Cache misses do not stall the instruction pipeline.  ... 
doi:10.1145/123465.123471 dblp:conf/micro/ChangCMH91 fatcat:nr5ulvdutfantipi344kiofhhi
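
The snippet states that loads may bypass preceding stores when their addresses do not conflict. A minimal sketch of that disambiguation check, reduced to exact address comparison; real compilers and hardware use alias analysis or dynamic memory disambiguation:

```python
# Illustrative sketch of the load/store bypassing rule mentioned in the snippet:
# a load may issue ahead of earlier stores only if their addresses do not
# conflict (simplified here to exact address comparison).

def may_bypass(load_addr, pending_store_addrs):
    """True if the load can bypass all earlier, not-yet-retired stores."""
    return all(load_addr != s for s in pending_store_addrs)

pending_stores = [0x200, 0x208]
print(may_bypass(0x100, pending_stores))  # True  -> load may issue early
print(may_bypass(0x208, pending_stores))  # False -> must wait for the store
```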

Adaptive GPU cache bypassing

Yingying Tian, Sooraj Puthoor, Joseph L. Greathouse, Bradford M. Beckmann, Daniel A. Jiménez
2015 Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015  
We show that, with a 16KB L1 data cache, dynamic bypassing achieves similar performance to a double-sized L1 cache while reducing energy consumption by 25% and power by 18%.  ...  We give a case study to demonstrate the inefficiency of current GPU caches compared to programmer-managed scratchpad memories and show the extent to which cache bypassing can make up for the potential  ...  Using only the PC of the last memory instruction to access a block is sufficient for a compact GPU bypassing predictor.  ... 
doi:10.1145/2716282.2716283 dblp:conf/ppopp/TianPGBJ15 fatcat:dsin33y6zzeg7mwejrtiticoyq
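
The snippet notes that the PC of the last memory instruction to access a block suffices for a compact bypassing predictor. A minimal sketch of a PC-indexed predictor with saturating counters; table size, counter width, and training policy are assumptions, not the paper's design:

```python
# Illustrative sketch of a compact PC-indexed bypass predictor: each entry holds
# a saturating counter trained on whether cache blocks brought in by that load
# PC were reused before eviction; dead-on-arrival PCs are predicted to bypass.

class PCBypassPredictor:
    def __init__(self, threshold=2, ctr_max=3):
        self.ctr = {}                  # load PC -> saturating reuse counter
        self.threshold = threshold
        self.ctr_max = ctr_max

    def predict_bypass(self, pc):
        return self.ctr.get(pc, self.ctr_max) < self.threshold

    def train(self, pc, reused):
        """Called on eviction: was the block inserted by `pc` reused?"""
        c = self.ctr.get(pc, self.ctr_max)
        self.ctr[pc] = min(c + 1, self.ctr_max) if reused else max(c - 1, 0)

p = PCBypassPredictor()
for _ in range(4):
    p.train(0x40, reused=False)        # blocks from PC 0x40 keep dying unused
print(p.predict_bypass(0x40))          # True  -> bypass the L1 for this PC
print(p.predict_bypass(0x44))          # False -> unseen PC defaults to caching
```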

Energy efficient special instruction support in an embedded processor with compact ISA

Dongrui She, Yifan He, Henk Corporaal
2012 Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems - CASES '12  
Though the proposed design imposes extra constraints on the operation patterns, the experimental results show that the average dynamic instruction count is reduced by over 25%, which is only about 2% less  ...  However, in an embedded generic processor with a compact instruction set architecture (ISA), such instructions may lead to large overhead as: i) more bits are needed to encode the extra opcodes and operands  ...  [Figure 14: dynamic instruction count (overhead included); Figure 15: normalized memory energy consumption.]  ... 
doi:10.1145/2380403.2380430 dblp:conf/cases/SheHC12 fatcat:r72lcor4tzajfgmw5olaf7tk3a

An L2 Cache Architecture Supporting Bypassing for Low Energy and High Performance

Jungwoo Park, Soontae Kim, Jong-Uk Hou
2021 Electronics  
Bypassing the L2 cache for those small programs has two benefits. When only a single program runs, bypassing the L2 cache allows it to be powered down, removing its leakage energy consumption.  ...  When multiple programs run simultaneously on multiple cores, small programs bypass the L2 cache while large programs use it.  ...  Dynamic memory energy can also be reduced, since bypassing reduces read and write activation power.  ... 
doi:10.3390/electronics10111328 fatcat:7a2shldttrcctgyaukmy2gd77q
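
The snippet describes letting small programs bypass the L2 and powering the L2 down when no running program needs it. A minimal sketch of that policy, assuming a working-set estimate per program and a 32 KB L1 (both hypothetical):

```python
# Illustrative sketch of the policy the snippet describes: programs whose
# working set fits comfortably in L1 bypass the L2, and the L2 is powered down
# when no running program needs it. Thresholds and fields are assumptions.

L1_SIZE = 32 * 1024   # assumed per-core L1 capacity in bytes

def uses_l2(working_set_bytes):
    """A program uses the L2 only if its working set exceeds the L1."""
    return working_set_bytes > L1_SIZE

def l2_power_state(running_programs):
    """Power the L2 down when every running program bypasses it."""
    return "on" if any(uses_l2(ws) for ws in running_programs) else "off"

print(l2_power_state([16 * 1024]))               # off: single small program
print(l2_power_state([16 * 1024, 4 * 1024**2]))  # on: the large program uses L2
```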
Showing results 1 — 15 out of 28,938 results