Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 SIGARCH Computer Architecture News  
We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture  ...  We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads.  ...  We thank the DSLab for their assistance with Cloud9, Emre Özer and Rustam Miftakhutdinov for their feedback and suggestions, and Aamer Jaleel and Carole Jean-Wu for their assistance with understanding  ... 
doi:10.1145/2189750.2150982 fatcat:26l7woyutjhodbffqiidze5i2e

Clearing the clouds

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture  ...  We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads.  ...  We thank the DSLab for their assistance with Cloud9, Emre Özer and Rustam Miftakhutdinov for their feedback and suggestions, and Aamer Jaleel and Carole Jean-Wu for their assistance with understanding  ... 
doi:10.1145/2150976.2150982 dblp:conf/asplos/FerdmanAKVAJKPAF12 fatcat:z37fymq7dzgzxhnrwjudviuzwi

A PRET architecture supporting concurrent programs with composable timing properties

Isaac Liu, Jan Reineke, Edward A. Lee
2010 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers  
Our realization employs a thread-interleaved pipeline with scratchpad memories, and has a predictable DRAM controller.  ...  Modern computer architectures introduce timing interference between functions due to unrestricted access of shared hardware resources, such as pipelines and caches.  ...  Thus, a huge part of analyzing execution time depends on whether or not a memory access can be accurately classified as a cache access or memory access.  ... 
doi:10.1109/acssc.2010.5757922 fatcat:tbqmh3mmjngq3f5gwi75mcugky

A characterization of data mining algorithms on a modern processor

Amol Ghoting, Gregory Buehrer, Srinivasan Parthasarathy, Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, Pradeep Dubey
2005 Proceedings of the 1st international workshop on Data management on new hardware - DAMON '05  
In this paper, we characterize the performance and memory access behavior of several data mining algorithms.  ...  Consequently, all these algorithms grossly under-utilize a modern day processor.  ...  PERFORMANCE CHARACTERIZATION To analyze the performance of the selected data mining algorithms, we use a system with an Intel Pentium 4 processor with HT technology and 1.5GB of physical memory.  ... 
doi:10.1145/1114252.1114258 fatcat:dprmlukme5ev7gsyycgjqcvlvm

PM-DB: Partition-based Multi-instance Database System for Multicore Platforms

Fang Xi, Takeshi Mishima, Haruo Yokota
2015 Proceedings of the 17th International Conference on Enterprise Information Systems  
In this paper, we analyze the bottlenecks in existing database engines on a modern multicore platform using the mixed workload of the TPC-W benchmark and describe strategies for higher scalability and  ...  The continued evolution of modern hardware has brought several new challenges to database management systems (DBMSs).  ...  As the processor-memory gap is becoming larger on modern multicore platforms, we analyzed the possibility of optimizing the processor cache performance for concurrent queries.  ... 
doi:10.5220/0005370901280138 dblp:conf/iceis/XiMY15 fatcat:eqhqg3fd3zgqhkalyc6ujklxsu

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 ACM Transactions on Computer Systems  
We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core microarchitecture  ...  We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor microarchitecture is inefficient for running these workloads.  ...  APPENDIX A Comment: We multiply the number of LLC misses per cycle with the number of bytes fetched (64 bytes) and the frequency of the processor in Hertz.  ... 
doi:10.1145/2382553.2382557 fatcat:huy2nlmwibftnbrk32z77noowq

DeLoc: A Locality and Memory-congestion-aware Task Mapping Method for Modern NUMA Systems

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa
2020 IEEE Access  
On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and  ...  Conventional work on task mapping mostly focuses on improving the locality of memory accesses.  ...  Each processor consists of a group of processor cores that is physically associated with one or more memory controllers and memory devices.  ... 
doi:10.1109/access.2019.2963726 fatcat:xy2vshqhkvcc7fk7s3f4tmzdd4

Reducing Competitive Cache Misses in Modern Processor Architectures

Milcho Prisagjanec, Pece Mitrevski
2016 International Journal of Computer Science & Information Technology (IJCSIT)  
Inevitably, the development of modern processor architectures leads to an increased number of cache misses.  ...  The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses  ...  But, in modern processor architectures, there is competitive access of the threads to the first level cache with the prefetching technique.  ... 
doi:10.5121/ijcsit.2016.8605 fatcat:kxwdrc2pczet5bwh6j5waclezi

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi
2014 IEEE Micro  
We thank the DSLab for their assistance with SAT Solver, and Aamer Jaleel and Carole Jean-Wu for their assistance with understanding the Intel prefetchers and configuration.  ...  We thank the PARSA lab for continual support and feedback, in particular Pejman Lotfi-Kamran and Javier Picorel for their assistance with the SPEC-web09 and SAT Solver benchmarks.  ...  These results indicate that the memory accesses in scale-out workloads are replete with complex dependencies, limiting the MLP that can be found by modern aggressive processors.  ... 
doi:10.1109/mm.2014.41 fatcat:gowz5x2fjvbobhcm2p4qsy2nlu

Classifying and alleviating the communication overheads in matrix computations on large-scale NUMA multiprocessors

Yi-Min Wang, Hsiao-Hsi Wang, Ruei-Chuan Chang
1998 Journal of Systems and Software  
Therefore, we conclude that rectangular processor allocation policy performs better than other popular policies, and that the combination of rectangular processor allocation policy with software interleaving  ...  This methodology may reduce a lot of unnecessary memory accesses to the memory modules.  ...  Conclusion As we know, modern large-scale, shared-memory multiprocessors have non-uniform memory access costs.  ... 
doi:10.1016/s0164-1212(98)10040-7 fatcat:uhsirwqzsnfv5jthfqa4vjtmju

Hypervisor-assisted Atomic Memory Acquisition in Modern Systems

Michael Kiperberg, Roee Leon, Amit Resh, Asaf Algawi, Nezer Zaidenberg
2019 Proceedings of the 5th International Conference on Information Systems Security and Privacy  
We describe a hypervisor-based memory acquisition method that solves the two aforementioned deficiencies. We analyze the memory usage and performance of the proposed method.  ...  Unfortunately, the proposed method has two deficiencies: (1) the method does not support multiprocessing and (2) the method does not support modern operating systems featuring address space layout randomization  ...  Unfortunately, two problems arise with the described method in modern systems. The first problem is the availability of multiple processors.  ... 
doi:10.5220/0007566101550162 dblp:conf/icissp/KiperbergLRAZ19 fatcat:fnh5ba6ppbfibol5n2ni2o5kgm

Time-predictable Cache Organization

Martin Schoeberl
2009 2009 Software Technologies for Future Dependable Distributed Systems  
Caches are a mandatory feature of current processors to deliver instructions and data to a fast processor pipeline.  ...  The author would like to acknowledge the discussions with Wolfgang Puffitsch, who is currently implementing and evaluating a fully associative data cache for JOP.  ...  for constants Furthermore, the integration of a program- or compiler-managed scratchpad memory can help to tighten the bounds for hard to analyze memory access patterns.  ... 
doi:10.1109/stfssd.2009.10 fatcat:c675ggrxdnh4hbv3vxwyscz3hi

Sequence Alignment Through the Looking Glass

Raja Appuswamy, Jacques Fellay, Nimisha Chaturvedi
2018 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
In this work, we answer the question "How do sequence aligners utilize modern processors?"  ...  We examine four state-of-the-art aligners running on an Intel processor and identify that all aligners leave the processor substantially underutilized.  ...  However, there is no work that analyzes the efficiency of sequence aligners with respect to processor utilization.  ... 
doi:10.1109/ipdpsw.2018.00050 dblp:conf/ipps/AppuswamyFC18 fatcat:s4443chwingkrdxovhkwlpkruq

How to stop under-utilization and love multicores

Anastasia Ailamaki, Erietta Liarou, Pinar Tözün, Danica Porobic, Iraklis Psaroudakis
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
In addition, it examines the sources of under-utilization in a modern processor and presents insights and hardware/software techniques to better exploit the microarchitectural resources of a processor  ...  Then, it demonstrates the data and work sharing opportunities for analytical workloads, and reviews advanced scheduling mechanisms that are aware of nonuniform memory accesses and alleviate bandwidth saturation  ...  Additionally, memory is accessed through memory controllers of individual processors.  ... 
doi:10.1145/2588555.2588892 dblp:conf/sigmod/AilamakiLTPP14 fatcat:rkcyemhxhvh45d6gbzorxfinvi

Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

Muhammad Yasir Qadri, Klaus D. McDonald-Maier
2009 EURASIP Journal on Embedded Systems  
Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device.  ...  These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a  ...  Modern 16-bit and 32-bit embedded processors increasingly contain cache memories to further instruction throughput and performance of the device.  ... 
doi:10.1155/2009/725438 fatcat:zdd2kmf635arnj3yt72u2zuq3i