62 Hits in 8.2 sec

Architecture and Optimal Configuration of a Real-Time Multi-Channel Memory Controller

Manil Dev Gomony, Benny Akesson, Kees Goossens
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013  
However, there is currently no real-time memory controller for multichannel memories, and there is no methodology to optimally configure multi-channel memories in real-time systems.  ...  We also demonstrate configuring a multi-channel Wide IO DRAM in a High-Definition (HD) video and graphics processing system to emphasize the effectiveness of our approach.  ...  ACKNOWLEDGMENT This work was partially funded by projects EU FP7 288008 T-CREST and 288248 Flextiles, Catrene CA104 COBRA, ARTEMIS 100202 RECOMP, PT FCT, and NL STW 10346 NEST.  ... 
doi:10.7873/date.2013.270 dblp:conf/date/GomonyAG13 fatcat:d5xjk6pizbfnbgmunx7rp4577y

MIMS: Towards a Message Interface Based Memory System

Li-Cheng Chen, Ming-Yu Chen, Yuan Ruan, Yong-Bing Huang, Ze-Han Cui, Tian-Yue Lu, Yun-Gang Bao
2014 Journal of Computer Science and Technology  
Memory system is often the main bottleneck in chip-multiprocessor (CMP) systems in terms of latency, bandwidth and efficiency, and recently additionally facing capacity and power problems in an era of big  ...  A lot of research works have been done to address part of these problems, such as photonics technology for bandwidth, 3D stacking for capacity, and NVM for power as well as many micro-architecture level  ...  Zhang et al. proposed heterogeneous multi-channel (HMC) [25] to balance the performance and power consumption of the DRAM system.  ... 
doi:10.1007/s11390-014-1428-7 fatcat:ywlupx7r2vhsnngdh3uozw4omy

Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications

Brice Goglin
2016 Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16  
High-performance computing requires a deep knowledge of the hardware platform to fully exploit its computing power. The performance of data transfer between cores and memory is becoming critical.  ...  It correctly exposes new heterogeneous architectures with high-bandwidth or non-volatile memories to applications, while still being convenient for affinity-aware HPC runtimes.  ...  ACKNOWLEDGMENTS We would like to thank Intel for providing us with hints for designing our new hwloc model.  ... 
doi:10.1145/2989081.2989115 dblp:conf/memsys/Goglin16 fatcat:eev2v2bomzcdri2gnsnfyn3fey

High performance sparse matrix-vector multiplication on FPGA

Dan Zou, Yong Dou, Song Guo, Shice Ni
2013 IEICE Electronics Express  
memory subsystem is capable of obtaining similar performance by using a single FPGA to that of a highly optimized BFS implementation on a commercial heterogeneous system containing four FPGAs.  ...  This paper presents the design and implementation of a high performance sparse matrix-vector multiplication (SpMV) on field-programmable gate array (FPGA).  ...  Combining the flexibility of software and the high performance of customized hardware design, FPGA can offer superior performance and power efficiency for many specific applications.  ... 
doi:10.1587/elex.10.20130529 fatcat:rjhtbcwllbe3jbigtfxrfo7aby

A distributed interleaving scheme for efficient access to WideIO DRAM memory

Ciprian Seiculescu, Luca Benini, Giovanni De Micheli
2012 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '12  
Achieving the main memory (DRAM) required bandwidth at acceptable power levels for current and future applications is a major challenge for System-on-Chip designers for mobile platforms.  ...  We propose a new distributed interleaved access method that leverages the on-chip interconnect to simplify the design and implementation of the DRAM controller, without impacting performance compared to  ...  and to analyze how to implement low overhead Quality-of-Service support to aggressively reduce latency for critical flows.  ... 
doi:10.1145/2380445.2380467 dblp:conf/codes/SeiculescuBM12 fatcat:5czvs4lzjff63gw4fscr4i4cxy

NERO: Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric [article]

Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Christoph Hagleitner, Sander Stuijk, Henk Corporaal, Onur Mutlu
2021 arXiv pre-print
We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.  ...  NERO reduces the energy consumption by 12x and 35x for the same two kernels over the POWER9 system with an energy efficiency of 1.61 GFLOPS/Watt and 21.01 GFLOPS/Watt.  ...  For multi-channel designs, we observe that the best-performing multi-channel-single PE design (i.e., using 3 PEs with 12 HBM channels for both workloads) has 4.7× and 3.1× lower performance than the best-performing  ... 
arXiv:2107.08716v1 fatcat:2kthhs3t4rblbmmlmthnqbfdq4

A Review of Near-Memory Computing Architectures: Opportunities and Challenges

Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra
2018 2018 21st Euromicro Conference on Digital System Design (DSD)  
The conventional approach of moving stored data to the CPU for computation has become a major performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse.  ...  Using a case study, we present our methodology and also identify topics for future research to unlock the full potential of near-memory computing.  ...  ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program and is funded by European Commission under Marie Sklodowska-Curie Innovative Training Networks European Industrial Doctorate  ... 
doi:10.1109/dsd.2018.00106 dblp:conf/dsd/SinghCCASJCB18 fatcat:26ucg3klobahff5mguj25lh44m

A Real-Time Multichannel Memory Controller and Optimal Mapping of Memory Clients to Memory Channels

Manil Dev Gomony, Benny Akesson, Kees Goossens
2015 ACM Transactions on Embedded Computing Systems  
However, there is currently no real-time memory controller for multi-channel memories, and there is no methodology to optimally configure multi-channel memories in real-time systems.  ...  Ever increasing demands for main memory bandwidth and memory speed/power trade-off led to the introduction of memories with multiple memory channels, such as Wide IO DRAM.  ...  In Section 7, we present both the experimental evaluation of the multi-channel memory controller architecture and the performance evaluation of our two mapping methods.  ... 
doi:10.1145/2661635 fatcat:rfvs5rcx4vbdfj6xqcqorst67y

Near-Memory Computing: Past, Present, and Future [article]

Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra
2019 arXiv pre-print
The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse.  ...  In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues with future research directions.  ...  ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program for the project "Near-Memory Computing (Ne-MeCo)" and is funded by European Commission under Marie Sklodowska-Curie Innovative  ... 
arXiv:1908.02640v1 fatcat:nvppe5zx2vbb5k4phfi4o5qoqq

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Pietro Cicotti, Erwin Laure, Stefano Markidis
2017 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific  ...  For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory.  ...  This work was supported by the DOE Office of Science, Advanced Scientific Computing Research, under the ARGO project (award number 66150) and the CENATE project (award number 64386).  ... 
doi:10.1109/ipdpsw.2017.115 dblp:conf/ipps/PengGKCLM17 fatcat:awswvmbqurctpazotrxgii5ece

Survey of Memory Management Techniques for HPC and Cloud Computing

Anna Pupykina, Giovanni Agosta
2019 IEEE Access  
However, for this scenario to succeed in practice, resources, including memory, need to be allocated with a vision that includes both the application requirements and the current and future state of the  ...  overall system.  ...  Such memory architecture consists in the HBM implemented with 3D-stacked Multi-Channel DRAM (MCDRAM) and conventional DRAM.  ... 
doi:10.1109/access.2019.2954169 fatcat:hwtpltrdrffqdjdofhr3shjkla

Row buffer locality aware caching policies for hybrid memories

HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, Onur Mutlu
2012 2012 IEEE 30th International Conference on Computer Design (ICCD)  
Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row.  ...  Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type  ...  RBLA-Dyn efficiently bridges this gap in performance and fairness using a small (1 GB) DRAM cache by exploiting the strengths of both DRAM and PCM.  ... 
doi:10.1109/iccd.2012.6378661 dblp:conf/iccd/YoonMAHM12 fatcat:h6kyo3hs5zdc5caruau3xzuyy4

2014 Index IEEE Transactions on Parallel and Distributed Systems Vol. 25

2015 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS Sept. 2014 2264-2274 Max-Min Lifetime Optimization for Cooperative Communications in Multi-Channel Wireless Networks.  ...  Arabnejad, Hamid, +, TPDS March 2014 682-694 Max-Min Lifetime Optimization for Cooperative Communications in Multi-Channel Wireless Networks.  ...  Psaras, I., +, TPDS Nov. 2014 2920-2931 Max-Min Lifetime Optimization for Cooperative Communications in Multi-Channel Wireless Networks.  ... 
doi:10.1109/tpds.2014.2371591 fatcat:qxyljogalrbfficryqjowgv3je

Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors: A Case Study

Abdullah Al Hasib, Lasse Natvig, Per Kjeldsberg, Juan Cebrián
2017 Journal of Low Power Electronics and Applications  
Thread-level and data-level parallel architectures have become the design of choice in many of today's energy-efficient computing systems.  ...  While single-threaded execution serves as a common reference point for all architectures to analyze the effects of data reuse on both scalar and vector codes, scalability with thread count is also discussed  ...  A.A.H. designed and implemented the experiments under the supervision of L.N. and P.G.K. for multi-core systems and J.M.C. for the KNL coprocessor.  ... 
doi:10.3390/jlpea7010005 fatcat:grbddqazojasvgscajioyyrtsq

2020 Index IEEE Transactions on Circuits and Systems II: Express Briefs Vol. 67

2020 IEEE Transactions on Circuits and Systems - II - Express Briefs  
Zhang, L., Quantized Fuzzy Finite-Time Control for Nonlinear Semi-Markov Switching Systems; TCSII Nov. 2020 2622-2626 Qi, X., see Liu, W., 1249-1253 Qian, G., see Dong, F., TCSII Dec. 2020 3587-3591  ...  Chip PWM Driver Circuit for Inverter Welding Power Source; TCSII April 2020 720-724 Jacobsson, S., see Castaneda, O., TCSII May 2020 891-895 Jafari, E., and Binazadeh, T., Robust Output Regulation in  ...  ., +, TCSII Oct. 2020 1924-1928 Fine-Grained Bit-Flipping Decoding for LDPC Codes.  ... 
doi:10.1109/tcsii.2020.3047305 fatcat:ifjzekeyczfrbp5b7wrzandm7e