
MICA: a memory and interconnect simulation environment for cache-based architectures

Hung-Chang Hsiao, Chung-Ta King
Proceedings 33rd Annual Simulation Symposium (SS 2000)  
It runs on the inexpensive Linux-based PCs. MICA uses application traces as inputs and provides a core scheduler and memory and interconnect interfaces.  ...  MICA is a new-generation simulation environment, which provides complete simulation facilities for simulating distributed shared memory (DSM) multiprocessors.  ...  In this paper, we introduce MICA, a memory and interconnect simulation environment for cache-based architecture [15]. MICA was developed for studying DSM multiprocessors.  ... 
doi:10.1109/simsym.2000.844930 dblp:conf/anss/HsiaoK00 fatcat:nu5rhslszzhoxd44m26vcwjcjm

Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Sheng Li, Pradeep Dubey, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, O. Seongil, Sukhan Lee
2015 SIGARCH Computer Architecture News  
and that copies bear this notice and the full citation on the first page.  ...  Copyrights for third-party components of this work must be honored.  ...  This work was supported in part by the National Science Foundation under award CNS-1345305 and by the Intel Science and Technology Center for Cloud Computing.  ... 
doi:10.1145/2872887.2750416 fatcat:gvbhknk66rcmne66ccebbuahv4

Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Sheng Li, Pradeep Dubey, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, O. Seongil, Sukhan Lee
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
and that copies bear this notice and the full citation on the first page.  ...  Copyrights for third-party components of this work must be honored.  ...  This work was supported in part by the National Science Foundation under award CNS-1345305 and by the Intel Science and Technology Center for Cloud Computing.  ... 
doi:10.1145/2749469.2750416 dblp:conf/isca/LiLLAKKASLD15 fatcat:el2q5w7665hnhouwqcrj2eg2xi

Characterizing Optimizations to Memory Access Patterns using Architecture-Independent Program Features [article]

Aditya Chilukuri, Josh Milthorpe, Beau Johnston
2020 arXiv   pre-print
The Architecture-Independent Workload Characterization (AIWC) tool is a plugin for the Oclgrind OpenCL simulator that gathers metrics of OpenCL programs that can be used to understand and predict program  ...  The new metric can be used to distinguish between the OpenDwarfs benchmarks based on the memory access patterns affecting their performance on various architectures.  ...  cost characteristics of the target code. The simulation environment used by CuMAPz and the attached analysis framework is highly specific to CUDA enabled GPUs.  ... 
arXiv:2003.06064v1 fatcat:24y6blwtofb6njvhq3ny6dccvu

Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform

Sheng Li, Pradeep Dubey, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, Seongil O, Sukhan Lee
2016 ACM Transactions on Computer Systems  
We craft a set of design principles for future platform architectures, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our  ...  Our system delivers the best performance and energy efficiency (RPS/watt) demonstrated to date over existing KVSs including the best-published FPGA-based and GPU-based claims.  ...  ACKNOWLEDGMENTS We thank Luke Chang, Patrick Lu, Srinivas Sridharan, Karthikeyan Vaidyanathan, Venkyand Venkatesan, Amir Zinaty, and the anonymous reviewers for their valuable feedback.  ... 
doi:10.1145/2897393 fatcat:57tqejsyubbc5ktu3yobifqkhu

AIWC: OpenCL-based Architecture-Independent Workload Characterisation [article]

Beau Johnston, Josh Milthorpe
2018 arXiv   pre-print
The tool, AIWC, is a plugin for the open-source Oclgrind simulator. It supports parallel workloads and is capable of characterizing OpenCL codes currently in use in the supercomputing setting.  ...  This work presents the first architecture-independent workload characterization framework for heterogeneous compute platforms, proposing a set of metrics determining the suitability and performance of  ...  L2 and L3 cache or main memory of a typical current-generation CPU architecture.  ... 
arXiv:1805.04207v2 fatcat:fxlid5m47ncstp6mlfaf72atfm

A Review of Near-Memory Computing Architectures: Opportunities and Challenges

Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra
2018 2018 21st Euromicro Conference on Digital System Design (DSD)  
Using a case study, we present our methodology and also identify topics for future research to unlock the full potential of near-memory computing.  ...  The conventional approach of moving stored data to the CPU for computation has become a major performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse.  ...  ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program and is funded by European Commission under Marie Sklodowska-Curie Innovative Training Networks European Industrial Doctorate  ... 
doi:10.1109/dsd.2018.00106 dblp:conf/dsd/SinghCCASJCB18 fatcat:26ucg3klobahff5mguj25lh44m

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems

Yuhwan Ro, Minchul Sung, Yongjun Park, Jung Ho Ahn
2017 IEICE Electronics Express  
Satisfying a demand for higher memory capacity is a major problem for computing systems.  ...  Conventional solutions are reaching those limits; instead, DRAM/NVM hybrid main memory systems which consist of emerging Non-Volatile Memory for large capacity and DRAM last-level cache for high access  ...  On a PCM-based DRAM/NVM hybrid main memory system, we show through system-level simulation that OBYST improves memory bandwidth, IPC, and EDP by up to 22%, 21%, and 26% over the baseline (without any bandwidth  ... 
doi:10.1587/elex.14.20170437 fatcat:4nk2wxep2bfwbgaa65npptp4cy

SENFIS: a Sensor Node File System for increasing the scalability and reliability of Wireless Sensor Networks applications

Soledad Escolar Díaz, Florin Isaila, Alejandro Calderón Mateos, Luis Miguel Sanchez García, David E. Singh
2009 Journal of Supercomputing  
This paper proposes Sensor Node File System (SENFIS), a novel file system for sensor nodes, which addresses both scalability and reliability concerns.  ...  First, it can transparently be employed as a permanent storage for distributed TinyDB queries, in order to increase the reliability and scalability.  ...  SENFIS has been implemented on TinyOS and is based on the Atmel AT45DB [1] flash memory chip, employed in Mica and Telos A motes.  ... 
doi:10.1007/s11227-009-0275-8 fatcat:fi5zh7k5wng4vfjghe5ijkdcky

Services Everywhere: an Object-Oriented Distributed Platform to Support Pervasive Access to HW and SW Objects in Ambient Intelligence Environments [chapter]

Jesus Barba, Felix Jesus, David Villa, Francisco Moya, Fernando Rincon, Maria Jose, Juan Carlos
2010 Ambient Intelligence  
Input and output messages are handled on-the-fly using a generated byte-stream processor which saves memory since the incoming message does not even need to be cached by the device.  ...  The base architecture of a hardware node in OOPAmI is shown in Figure X and comprises:  The objects, implemented as hardware units.  ...  In this book a selection of unsolved problems which are considered key for ambient intelligence to become a reality, is analyzed and studied in depth.  ... 
doi:10.5772/8675 fatcat:ju3ywpifavbkzp6ka3vtaxvyni

An in-memory object caching framework with adaptive load balancing

Yue Cheng, Aayush Gupta, Ali R. Butt
2015 Proceedings of the Tenth European Conference on Computer Systems - EuroSys '15  
While individual load balancing approaches are being leveraged in in-memory caches, MBal goes beyond the extant systems and offers a holistic solution wherein the load balancing model tracks hotspots and  ...  Performance evaluation on an 8-core commodity server shows that compared to a state-of-the-art approach, MBal scales with number of cores and executes 2.3× and 12× more queries/second for GET and SET operations  ...  Acknowledgements We are grateful to the anonymous reviewers and our shepherd, Thomas Moscibroda, for their valuable feedback, which significantly improved the paper. We also thank Ali Anwar and M.  ... 
doi:10.1145/2741948.2741967 dblp:conf/eurosys/ChengGB15 fatcat:5jfkyowt5rcxzmb6igdf4rvila

Fiber-based architecture for NFV cloud databases

Vaidas Gasiunas, Alexander Nozdrin, Weijie Ou, Nir Pachter, Dima Sivov, Eliezer Levy, David Dominguez-Sal, Ralph Acker, Aharon Avitzur, Ilan Bronshtein, Rushan Chen, Eli Ginot (+2 others)
2017 Proceedings of the VLDB Endowment  
Therefore, we designed a special shared-nothing architecture that is based on cooperative multi-tasking using user-level threads (fibers).  ...  Furthermore, fibers yield a simpler-to-maintain software and enable controlling a trade-off between long-duration computations and real-time requests.  ...  For example, MICA architecture assigns a data partition to each core and gives exclusive usage of one NIC to each CPU core [12].  ... 
doi:10.14778/3137765.3137774 fatcat:tlzqna6ukzdy7f7mfqz3qtsipu

Quo Vadis, SLD? Reasoning About the Trends and Challenges of System Level Design

Alberto Sangiovanni-Vincentelli
2007 Proceedings of the IEEE  
He had been a great companion in research, vision, industrial interaction, and life in general. He will be sorely missed.  ...  Acknowledgment This paper is dedicated to the memory of my long-time friend and colleague, Richard Newton, who passed away in January.  ...  ; 3) base tools for simulation and design imports.  ... 
doi:10.1109/jproc.2006.890107 fatcat:gtargbuwf5hztmjvzoecubtbea

The Case for RackOut

Stanko Novakovic, Alexandros Daglis, Edouard Bugnion, Babak Falsafi, Boris Grot
2016 Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16  
In addition, we implement a RackOut proof-of-concept key-value store, evaluate it on two experimental platforms based on RDMA and Scale-Out NUMA, and use these results to validate the model.  ...  Our results show that RackOut can increase throughput up to 6× for RDMA and 8.6× for Scale-Out NUMA compared to a scale-out deployment, while respecting tight tail latency service-level objectives.  ...  Acknowledgements The authors thank the anonymous reviewers for their precious comments and feedback.  ... 
doi:10.1145/2987550.2987577 dblp:conf/cloud/NovakovicDBFG16 fatcat:j6px37guazcx7o6q5hwm3byuna

ZygOS

George Prekas, Marios Kogias, Edouard Bugnion
2017 Proceedings of the 26th Symposium on Operating Systems Principles - SOSP '17  
For a service-level objective of 1000µs latency at the 99th percentile, ZYGOS can deliver a 1.63× speedup over Linux (because of its dataplane architecture) and a 1.26× speedup over IX, a state-of-the-art  ...  FCFS) for 10µs tasks, and 88% for 25µs tasks.  ...  This work was funded in part by the Microsoft-EPFL Joint Research Center, the NanoTera YINS project, and a VMware Research Grant. George Prekas is supported by a Google Graduate Research Fellowship.  ... 
doi:10.1145/3132747.3132780 dblp:conf/sosp/PrekasKB17 fatcat:ocyr44ciajgdhd3td7gaeaihnq