A Method for Hiding the Increased Non-Volatile Cache Read Latency [article]

Apostolos Kokolis, Namrata Mantri, Shrikanth Ganapathy, Josep Torrellas, John Kalamatianos
2021 arXiv   pre-print
The increased memory demands of workloads are putting high pressure on Last Level Caches (LLCs). Unfortunately, there is limited opportunity to increase the capacity of LLCs due to the area and power requirements of the underlying SRAM technology. Interestingly, emerging Non-Volatile Memory (NVM) technologies promise a feasible alternative to SRAM for LLCs due to their higher area density. However, NVMs have substantially higher read and write latencies, which offset their area density benefit. Although researchers have proposed methods to tolerate NVM's increased write latency, little emphasis has been placed on reducing the critical NVM read latency. To address this problem, this paper proposes Cloak. Cloak exploits data reuse in the LLC at the page level to hide NVM read latency. Specifically, on certain L1 TLB misses to a page, Cloak transfers LLC-resident data belonging to the page from the LLC NVM array to a set of small SRAM Page Buffers that will service subsequent requests to this page. Further, to enable the high-bandwidth, low-latency transfer of lines of a page to the page buffers, Cloak uses an LLC layout that accelerates the discovery of LLC-resident cache lines from the page. We evaluate Cloak with full-system simulations of a 4-core processor across 14 workloads. We find that, on average, Cloak outperforms an SRAM LLC by 23.8% and an NVM-only LLC by 8.9%, in both cases with negligible additional area. Further, Cloak's ED^2 is 39.9% and 17.5% lower, respectively, than these designs.
arXiv:2112.10632v1 fatcat:zqf5b4gbffcofojvikl6qpwg6y

MODEST

Shrikanth Ganapathy, Ramon Canal, Antonio Gonzalez, Antonio Rubio
2010 Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design - ISLPED '10  
Estimation of static and dynamic energy of caches is critical for high-performance low-power designs. Commercial CAD tools performing energy estimation statically are not aware of the changing operating and environmental conditions, which makes the problem of energy estimation more dynamic in nature. It is worsened by process-induced variations of low-level parameters like threshold voltage and channel length. In this paper we present MODEST, a proposal for estimating the static and dynamic energy of caches taking into account spatial variations of physical parameters, temporal changes of supply voltage and environmental factors like temperature. It can be used to estimate the energy of different blocks of a cache based on a combination of empirical data and analytical equations. The observed maximum and median error between MODEST and HSPICE energy estimates for 22,500 samples is around 7.8% and 0.5% respectively. As a case study, using MODEST, we propose a two-step iterative optimization procedure involving Dual-Vth assignment and standby supply voltage minimization for reclaiming energy-constrained caches. The observed energy reduction is around 50.8% for the most leaky cache. A speed-up of 750X over conventional hard-coded implementation for such optimizations is achieved.
doi:10.1145/1840845.1840873 dblp:conf/islped/GanapathyCGR10 fatcat:qy72ti5rlng77g64j46d3qt7au

Approximate computing with unreliable dynamic memories

Shrikanth Ganapathy, Adam Teman, Robert Giterman, Andreas Burg, Georgios Karakonstantis
2015 2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS)  
doi:10.1109/newcas.2015.7182027 dblp:conf/newcas/GanapathyTGBK15 fatcat:njnkvw47avdufmahjalyszkv3m

Circuit propagation delay estimation through multivariate regression-based modeling under spatio-temporal variability

Shrikanth Ganapathy, Ramon Canal, Antonio Gonzalez, Antonio Rubio
2010 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)  
With every process generation, the problem of variability in physical parameters and environmental conditions poses a great challenge to the design of fast and reliable circuits. Propagation delays, which decide circuit performance, are likely to suffer the most from this phenomenon. While statistical static timing analysis (SSTA) is used extensively for this purpose, it does not account for dynamic conditions during operation. In this paper, we present a multivariate regression-based technique that computes the propagation delay of circuits subject to manufacturing process variations in the presence of temporal variations like temperature. It can be used to predict the dynamic behavior of circuits under changing operating conditions. The median error between the proposed model and circuit-level simulations is below 5%. With this model, we ran a study of the effect of temperature on access time delays for 500 cache samples. The study was run in 0.557 seconds, compared to the 20 h and 4 min of the SPICE simulation, achieving a speedup of over 1X10^5. As a case study, we show that the access times of caches can vary as much as 2.03X at high temperatures in future technologies under process variations.
doi:10.1109/date.2010.5457167 dblp:conf/date/GanapathyCGR10 fatcat:m5lqmp6wbjfe5bismmwrgw2spa

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 SIGARCH Computer Architecture News  
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). The fast-increasing size of the RF makes the area cost and power consumption unaffordable for traditional SRAM designs in future technologies. In this paper, we propose to use embedded-DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time in eDRAM poses new challenges. Periodic refresh operations are needed to maintain data integrity. This is exacerbated with the scaling of eDRAM density, process variations and temperature. Unlike conventional CPUs, which make use of multi-ported RFs, most of the RFs in modern GPGPUs are heavily banked but not multi-ported to reduce the hardware cost. This provides a unique opportunity to hide the refresh overhead. We propose two different eDRAM implementations based on 3T1D and 1T1C memory cells. To mitigate the impact of periodic refresh, we propose two novel refresh solutions using bank bubble and bank walk-through. In addition, for the 1T1C RF, we design an interleaved bank organization together with an intelligent warp scheduling strategy to reduce the impact of the destructive reads. The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.
doi:10.1145/2508148.2485952 fatcat:nbaghtk2rvhfri6wkpah4t27ta

Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors

Shrikanth Ganapathy, Ramon Canal, Antonio Gonzalez, Antonio Rubio
2011 2011 IEEE 29th International Conference on Computer Design (ICCD)  
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce standby leakage power in first-level data caches under process variations. Accessed physical arrays are forward body biased (FBB) to improve latency while idle (unaccessed) arrays are reverse body biased (RBB) to reduce standby leakage power. The bias voltage to be applied is computed at design time and updated at run-time to counter the negative effects of process variations. This ensures that under all scenarios, the cache will consume the lowest leakage power for the target access latency computed at design time. A sensor-like hardware mechanism measures the variation in latency and leakage at run-time and this measurement is used to update the bias voltage. The backbone of the hardware used for measurement is a three-transistor one-diode (3T1D) DRAM cell embedded into a regular cache array. By measuring the access and retention time of the 3T1D cell, we show that it is possible to classify cache arrays based on run-time latency/leakage profiles. Our technique reduces leakage energy consumption and access latency of the cache on average by 20% and 18% respectively. Finally, we show that our technique will improve parametric yield by a maximum of 38% in the worst-case scenario.
doi:10.1109/iccd.2011.6081420 dblp:conf/iccd/GanapathyCGR11 fatcat:bodlkc2txbbfvftlurhxgkzrea

Mitigating the impact of faults in unreliable memories for error-resilient applications

Shrikanth Ganapathy, Georgios Karakonstantis, Adam Teman, Andreas Burg
2015 Proceedings of the 52nd Annual Design Automation Conference on - DAC '15  
Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overhead incurred by conventional error correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories to meet target yield. The proposed approach uses a bit-shuffling mechanism to ...e faults into bit locations with lower significance. This skews the bit-error distribution towards the low-order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading off quality for power, area, and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area, when applied to various data mining applications in 28 nm process technology.
doi:10.1145/2744769.2744871 dblp:conf/dac/GanapathyKTB15 fatcat:mwwngmb4pvfvhoh7bpynbncy6a

iRMW: A low-cost technique to reduce NBTI-dependent parametric failures in L1 data caches

Shrikanth Ganapathy, Ramon Canal, Antonio Gonzalez, Antonio Rubio
2014 2014 IEEE 32nd International Conference on Computer Design (ICCD)  
Negative bias temperature instability (NBTI) is a major cause of concern for chip designers because of its inherent ability to drastically reduce silicon reliability over the lifetime of the processor. Coupled with statistical variations of process parameters, it can potentially render systems dysfunctional in certain scenarios. Data caches suffer the most from such phenomena because of the unbalanced duty cycle ratio of SRAM cells and maximum intrinsic susceptibility to process variations. In this paper, we propose a novel NBTI-aware technique, invert-Read-Modify-Write (iRMW), that can improve the functional yield of the data cache significantly over its lifetime. Using architecture-level benchmarks, we first analyse the impact of activity factor and workload variation on NBTI-induced failures in data caches. iRMW is then used as a means to balance the duty cycle by alternating between recovery and stress cycles upon successive read accesses to the cache line. The highly transient nature of the data stored in the L1 data cache aids this process of recovery when using iRMW. A unique feature of iRMW is its intelligent use of low-leakage and NBTI-tolerant embedded-DRAM cells as an alternative to SRAM cells for storing important state information. Our experiments conducted using SPEC2006 and PhysicsBench workloads show that on average the cache failure probability can be reduced by 22%, 33% and 36% after two, four and eight years of processor usage respectively. In addition to being extremely power-frugal, use of eDRAM reduces the total area footprint of iRMW tremendously. • A novel technique, iRMW, that leverages duty-cycle imbalance and r/w access patterns in caches to lower the impact of NBTI-dependent failures.
doi:10.1109/iccd.2014.6974664 dblp:conf/iccd/GanapathyCGR14 fatcat:5aoa5a5hdnfpzpyb2svxqjvcbm

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). The fast-increasing size of the RF makes the area cost and power consumption unaffordable for traditional SRAM designs in future technologies. In this paper, we propose to use embedded-DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time in eDRAM poses new challenges. Periodic refresh operations are needed to maintain data integrity. This is exacerbated with the scaling of eDRAM density, process variations and temperature. Unlike conventional CPUs, which make use of multi-ported RFs, most of the RFs in modern GPGPUs are heavily banked but not multi-ported to reduce the hardware cost. This provides a unique opportunity to hide the refresh overhead. We propose two different eDRAM implementations based on 3T1D and 1T1C memory cells. To mitigate the impact of periodic refresh, we propose two novel refresh solutions using bank bubble and bank walk-through. In addition, for the 1T1C RF, we design an interleaved bank organization together with an intelligent warp scheduling strategy to reduce the impact of the destructive reads. The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.
doi:10.1145/2485922.2485952 dblp:conf/isca/JingSLGMGCL13 fatcat:niswlskhwbgxvc27zdnc5bnq54

On the concept of simultaneous execution of multiple applications on hierarchically based cluster and the silicon operating system

N. Venkateswaran, Vinoth Krishnan Elangovan, Karthik Ganesan, TP Ramnath Sai Sagar, Sriram Aananthakrishanan, Shreyas Ramalingam, Shyamsundar Gopalakrishnan, Madhavan Manivannan, Deepak Srinivasan, Viswanath Krishnamurthy, Karthik Chandrasekar, Vishwanath Venkatesan (+6 others)
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
In this paper we present a novel cluster paradigm and silicon operating system. Our approach in developing the competent cluster design revolves around an execution model to aid the execution of multiple independent applications simultaneously on the cluster, leading to cost sharing across applications. The execution model should envisage simultaneous execution of multiple applications (running traces of multiple independent applications in the same node at an instant, without time sharing) and on all the partitions (nodes) of a single cluster, without sacrificing the performance of individual applications, unlike in the current cluster models. Performance scalability is achieved as we increase the number of nodes and the problem size of the individual independent applications, due to non-dependency across applications and hence an increase in the number of non-dependent operations (as the problem sizes of the applications increase), and this leads to better utilization of the unused resources within the node. This execution model is very much dependent on the node architecture for performance scalability. This would be a major initiative towards achieving performance Cost-Effective Supercomputing.
doi:10.1109/ipdps.2008.4536347 dblp:conf/ipps/VenkateswaranEGSARGMSKCVSSVGMT08 fatcat:yoib5rzpy5f6rcm3pytl5sykp4

Energy vs. Reliability Trade-offs Exploration in Biomedical Ultra-Low Power Devices

Loris Duch, P. Garcia del Valle, Shrikanth Ganapathy, Andreas Burg, David Atienza
2016 Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)   unpublished
State-of-the-art wearable devices such as embedded biomedical monitoring systems apply voltage scaling to lower their energy consumption as much as possible and achieve longer battery lifetimes. While embedded memories often rely on Error Correction Codes (ECC) for error protection, in this paper we explore how the characteristics of biomedical applications can be exploited to develop new techniques with lower power overhead. We then introduce the Dynamic eRror compEnsation And Masking (DREAM) technique, which provides partial memory protection with less area and power overhead than ECC. Different tradeoffs between the error correction ability of the techniques and their energy consumption are examined to conclude that, when properly applied, DREAM consumes 21% less energy than a traditional ECC with Single Error Correction and Double Error Detection (SEC/DED) capabilities.
doi:10.3850/9783981537079_0053 fatcat:zwfgdjazcfcvfm6hzsfnrsxcaa

Table of contents

2011 2011 IEEE 29th International Conference on Computer Design (ICCD)  
Ganapathy, Ramon Canal, Antonio Gonzalez and Antonio Rubio Systems Potpourri (Systems Track) Session 8 Memory and Cache Architectures (Systems Track) A Morphable Phase Change Memory Architecture  ...  Panagopoulos, Georgios Karakonstantis, Dag Wisland, Hamid Mahmoodi, Jens Kargaard Madsen and Kaushik Roy Dynamic Fine-Grain Body Biasing of Caches with Latency and Leakage 3T1D-Based Monitors . . . 332 Shrikanth  ... 
doi:10.1109/iccd.2011.6081365 fatcat:bvwkllkwzngmtimda4zlgl3f3m

Improvements to the IBM speech activity detection system for the DARPA RATS program

Samuel Thomas, George Saon, Maarten Van Segbroeck, Shrikanth S. Narayanan
2015 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
The authors thank Brian Kingsbury, Sriram Ganapathy, Hagen Soltau and Tomas Beran for useful discussions.  ... 
doi:10.1109/icassp.2015.7178822 dblp:conf/icassp/ThomasSSN15 fatcat:lvbz7xfvbjhzzd4c5ng7envdiq

Acknowledgment to Reviewers of the International Journal of Molecular Sciences in 2020

International Journal of Molecular Sciences Editorial Office
2021 International Journal of Molecular Sciences  
Fukunaga, Ryuya Gachomo, Emma Fukunaga, Tsukasa Gachon, Frederic Fukuoka, Hidenori Gackowski, Daniel Fukushima, Atsushi Gaczynska, Maria Fukushima, Hiroshi Gad, Ahmed Fukushima, Kentaro Gadad, Shrikanth  ...  Samuel Gamian, Andrzej Garcia, Victor Gamper, Armin García-Álvarez, Yolanda Gan, Renyou García-Aranda, Marilina Gan, Samuel García-Barrado, María José Ganaie, Safder Garcia-Borron, Jose Carlos Ganapathy  ... 
doi:10.3390/ijms22031123 pmid:33498748 fatcat:7dgl7jrq7rdkxco62ryiwqceay

Development and Commercialization of CMS Pigeonpea Hybrids [chapter]

KB Saxena, D Sharma, MI Vales
2018 Plant Breeding Reviews  
Ganapathy et al. (2012) reported monogenic recessive resistance in one cross, and non-allelic digenic with complementary epistasis in another cross.  ...  Fitzg) Maesen was used as a male parent (Shrikanth et al., 2015). The CMS A9 appears to be better than A5, since both the maintainers and restorers have already been identified.  ... 
doi:10.1002/9781119414735.ch3 fatcat:q5t34r35djb7pftarkin4cudza