
A scheduling policy for preserving cache locality in a multiprogrammed system

Inbum Jung, Jongwoong Hyun, Joonwon Lee
2000 Journal of systems architecture  
To meet this requirement, we propose a preemption-safe policy to exploit the cache locality of blocked programs in a multiprogrammed system.  ...  In a multiprogrammed system, when the operating system switches contexts, in addition to the cost of handling the processes being swapped out and in, the cache performance of the processors can also be affected  ...  Acknowledgements This work was supported in part by National Research Laboratory Program funded by Ministry of Science and Technology and university S/W research center program by Ministry of Information  ... 
doi:10.1016/s1383-7621(00)00020-5 fatcat:bgqgndhm45defdhs2ylfos3nxe
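The policy summarized above aims to keep the warm cache state of blocked programs intact across context switches. As a purely illustrative sketch (Python, hypothetical names and data, not the authors' algorithm), one way to express the underlying intuition is to have the scheduler preempt the task whose resident cache footprint on the current processor is cheapest to lose:

```python
# Hypothetical illustration: when a processor must be preempted, pick the task
# whose warm-cache state on that CPU is cheapest to lose, so blocked programs
# with large resident footprints keep their cache contents.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    footprint: dict = field(default_factory=dict)  # cpu id -> estimated resident cache lines

def pick_preemption_victim(running, cpu):
    """Return the task on `cpu` with the smallest estimated cache footprint there."""
    return min(running, key=lambda t: t.footprint.get(cpu, 0))

if __name__ == "__main__":
    tasks = [Task("A", {0: 4096}), Task("B", {0: 128}), Task("C", {0: 1024})]
    print(pick_preemption_victim(tasks, cpu=0).name)  # -> B
```

A real policy would estimate footprints from hardware counters or reference history rather than keeping exact per-CPU counts.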

Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs [chapter]

Christos D. Antonopoulos, Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou
2004 Lecture Notes in Computer Science  
In this paper we reformulate the thread scheduling problem on multiprogrammed SMPs.  ...  Therefore, we present and evaluate two realistic scheduling policies which treat memory bandwidth as a first-class resource.  ...  Acknowledgements The first author is supported by a grant from 'Alexander S. Onassis' public benefit foundation and the European Commission through IST grant No. 2001-33071.  ... 
doi:10.1007/978-3-540-30474-6_33 fatcat:elyq5hocivhazeb54xkn4imndi

Processor allocation policies for message-passing parallel computers

Cathy McCann, John Zahorjan
1994 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems - SIGMETRICS '94  
When multiple jobs compete for processing resources  ...  Acknowledgements The authors thank Martin Tompa for valuable discussions regarding the analysis of the Folding rotation scheme.  ...  In contrast, Equipartition shows some tendency towards a local minimum, especially for high loads.  ... 
doi:10.1145/183018.183022 dblp:conf/sigmetrics/McCannZ94 fatcat:zi3jy5wq3baelpwdzw76n4dszm
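The excerpt above mentions the Equipartition allocation policy, which divides the machine's processors as evenly as possible among the competing jobs. A minimal sketch of that allocation rule (Python, illustrative only; leftover processors go to the first few jobs):

```python
# Illustrative Equipartition: split P processors as evenly as possible among jobs;
# the first `extra` jobs each receive one of the leftover processors.
def equipartition(processors, jobs):
    base, extra = divmod(processors, len(jobs))
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}

if __name__ == "__main__":
    print(equipartition(10, ["j1", "j2", "j3"]))  # {'j1': 4, 'j2': 3, 'j3': 3}
```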

Processor allocation policies for message-passing parallel computers

Cathy McCann, John Zahorjan
1994 Performance Evaluation Review  
When multiple jobs compete for processing resources  ...  Acknowledgements The authors thank Martin Tompa for valuable discussions regarding the analysis of the Folding rotation scheme.  ...  In contrast, Equipartition shows some tendency towards a local minimum, especially for high loads.  ... 
doi:10.1145/183019.183022 fatcat:hlhwokoafra23ed4z67wl4hizy

Scheduling algorithms with bus bandwidth considerations for SMPs

C.D. Antonopoulos, D.S. Nikolopoulos, T.S. Papatheodorou
2003 2003 International Conference on Parallel Processing, 2003. Proceedings.  
The new scheduling policies improve system throughput by up to 68% (26% on average) in comparison with the standard Linux scheduler.  ...  However, both software and scheduling policies for these systems generally focus on memory hierarchy optimizations and do not address the bus bandwidth limitations directly.  ...  The same philosophy is followed in SMP operating systems for scheduling multiprogrammed workloads with time-sharing. All SMP schedulers use cache affinity links for each thread.  ... 
doi:10.1109/icpp.2003.1240622 dblp:conf/icpp/AntonopoulosNP03 fatcat:vqmmhe3ztfhzxbx5ymk5fn7q54
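The bandwidth-conscious policies summarized above co-schedule threads so that their combined bus bandwidth demand does not saturate the shared bus. The sketch below is a greedy illustration of that idea in Python, with hypothetical thread names and bandwidth figures; the paper's actual policies are more elaborate:

```python
# Illustrative greedy co-scheduling: fill the available processors with threads
# whose combined measured bus-bandwidth demand stays within the bus capacity.
def select_coschedule(threads, bus_capacity, num_cpus):
    """threads: list of (name, bandwidth_demand) pairs; returns the names chosen."""
    chosen, used_bw = [], 0.0
    # Consider the most bandwidth-hungry threads first so they are not starved.
    for name, demand in sorted(threads, key=lambda t: -t[1]):
        if len(chosen) == num_cpus:
            break
        if used_bw + demand <= bus_capacity:
            chosen.append(name)
            used_bw += demand
    return chosen

if __name__ == "__main__":
    workload = [("fft", 2.5), ("lu", 1.0), ("light", 0.2), ("stream", 3.5)]
    print(select_coschedule(workload, bus_capacity=4.0, num_cpus=2))  # ['stream', 'light']
```

With the numbers above, the bandwidth-hungry "stream" thread ends up paired with the light thread rather than with another bandwidth-hungry one.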

Scheduling Algorithms with Bus Bandwidth Considerations for SMPs [chapter]

Christos D. Antonopoulos, Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou
2006 High-Performance Computing  
The new scheduling policies improve system throughput by up to 68% (26% on average) in comparison with the standard Linux scheduler.  ...  However, both software and scheduling policies for these systems generally focus on memory hierarchy optimizations and do not address the bus bandwidth limitations directly.  ...  The same philosophy is followed in SMP operating systems for scheduling multiprogrammed workloads with time-sharing. All SMP schedulers use cache affinity links for each thread.  ... 
doi:10.1002/0471732710.ch16 fatcat:oq2pxgeq2bbmnclpyngg5tw4k4

Adaptive two-level thread management for fast MPI execution on shared memory machines

Kai Shen, Hong Tang, Tao Yang
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
There is also work on OS scheduling to exploit cache affinity [30]. We combine these two ideas together and extend them for the MPI runtime system.  ...  SGI machines, thread yielding and resumption are cumbersome and fairly slow (e.g. the thread yield function resumes a kernel thread in a non-deterministic manner and the shortest sleep interval for a nanosleep  ...  We would like to thank Bill Gropp, Eric Salo, and anonymous referees for their helpful comments, and Claus Jeppesen for his help in using Origin 2000 at UCSB.  ... 
doi:10.1145/331532.331581 dblp:conf/sc/ShenTY99 fatcat:x3sbdt6cmnclzgxftkoavp4v3u
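The excerpt notes that yielding and resuming kernel threads on the studied SGI machines is expensive. A common mitigation in two-level thread systems, spinning briefly in user space before falling back to a sleep or yield, is sketched below (Python, purely illustrative; the paper's MPI runtime is a native implementation and its exact waiting strategy may differ):

```python
# Illustrative spin-then-yield wait: spin briefly in user space before paying
# the (much higher) cost of blocking/yielding the underlying kernel thread.
import threading
import time

def wait_for(flag_is_set, spin_iterations=2000, sleep_s=0.001):
    spins = 0
    while not flag_is_set():
        if spins < spin_iterations:
            spins += 1              # cheap user-level spin while the wait is short
        else:
            time.sleep(sleep_s)     # give the processor up once spinning is futile

if __name__ == "__main__":
    done = threading.Event()
    threading.Timer(0.05, done.set).start()
    wait_for(done.is_set)
    print("woke up after the flag was set")
```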

A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors

Shoaib Akram, Alexandros Papakonstantinou, Rakesh Kumar, Deming Chen
2010 International Journal of Reconfigurable Computing  
In this paper, we first motivate the need for workload-adaptive interconnection networks.  ...  Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads.  ...  The IO cache snoops the requests in a similar Figure 16 : The system architecture used for full-system mode simulation. manner as the rest of L2 caches in the system.  ... 
doi:10.1155/2010/205852 fatcat:xzmi24sg3zgbzok3rrfyrowtha

Cache restoration for highly partitioned virtualized systems

David Daly, Harold W. Cain
2012 IEEE International Symposium on High-Performance Computer Architecture  
While most systems allow for partitioning at the relatively coarse grain of a single core, some systems also support multiprogrammed virtualization, whereby a system can be more finely partitioned through  ...  Through cycle-accurate simulation of a POWER7 system, we show that when applied to its private per-core L3 last-level cache, the warm cache translates into an average performance improvement of 20% for a  ...  For example, in a typical Nehalem EX system, an L3 cache miss requires 79 ns to receive the data from DRAM.  ... 
doi:10.1109/hpca.2012.6169029 dblp:conf/hpca/DalyC12 fatcat:oj3lparhifffnmfytdnhdbwjbu
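The mechanism described above restores ("warms") a partition's private last-level cache when its virtual machine is dispatched again, so that the first accesses after the switch hit instead of miss. A conceptual software-level sketch of that idea (Python; the paper itself evaluates a hardware mechanism through cycle-accurate simulation of POWER7):

```python
# Conceptual cache-restoration sketch: remember which block addresses a partition
# had resident at switch-out and prefetch them at switch-in to warm the cache.
class WarmableCache:
    def __init__(self):
        self.resident = set()   # block addresses currently in the cache
        self.saved = {}         # partition id -> snapshot of its resident blocks

    def switch_out(self, partition):
        self.saved[partition] = set(self.resident)
        self.resident.clear()   # the next partition will replace these blocks

    def switch_in(self, partition, prefetch):
        for block in self.saved.get(partition, ()):
            prefetch(block)     # issue prefetches so the first accesses hit
            self.resident.add(block)

if __name__ == "__main__":
    cache = WarmableCache()
    cache.resident.update({0x1000, 0x2000})
    cache.switch_out("lpar1")
    cache.switch_in("lpar1", prefetch=lambda b: print(f"prefetch block {b:#x}"))
```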

The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors

Raj Vaswani, John Zahorjan
1991 Proceedings of the thirteenth ACM symposium on Operating systems principles - SOSP '91  
A scheduling policy that ignores this affinity may waste processing power by causing excessive cache refilling.  ...  In a shared memory multiprocessor with caches, executing tasks develop "affinity" to processors by filling their caches with data and instructions during execution.  ...  Using an analytic model of cache footprint behavior, and an analytic model of a multiprogrammed system and its workload, they concluded that affinity scheduling can have a pronounced effect on performance  ... 
doi:10.1145/121132.121140 dblp:conf/sosp/VaswaniZ91 fatcat:c2jnnrx5mnat3no5tmuepnm2hy
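Affinity scheduling, as discussed in the excerpt, prefers to run a task on the processor that still holds its cache contents. A minimal sketch of such a dispatch decision (Python, illustrative data structures only):

```python
# Illustrative affinity scheduling: when a processor becomes free, prefer a
# ready task that last ran there, so its cache contents can be reused.
from collections import deque

def pick_next(ready, cpu):
    """Scan the ready queue for a task with affinity to `cpu`; else take the head."""
    for task in ready:
        if task.get("last_cpu") == cpu:
            ready.remove(task)
            return task
    return ready.popleft() if ready else None

if __name__ == "__main__":
    ready = deque([{"name": "A", "last_cpu": 1}, {"name": "B", "last_cpu": 0}])
    print(pick_next(ready, cpu=0)["name"])  # -> B
```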

A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models [chapter]

Ioannis E. Venetis, Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou
2001 Lecture Notes in Computer Science  
Our experiments show that using these services in a multiprogrammed SMP yields a throughput improvement of up to 41.2%.  ...  This paper defines a unified set of services, implemented at the operating system level, which can be used to embed adaptability in any thread-based programming paradigm.  ...  Local scheduling is also performed if a thread controlled by our scheduler is dequeued from the run-queue of the native scheduler.  ... 
doi:10.1007/3-540-44681-8_75 fatcat:kc6gucwhzjg6pcb6le7aixdtdm
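The adaptability services described above let a runtime react when the operating system changes the number of processors granted to the application. A schematic sketch of the resulting resize step (Python, hypothetical names; the paper implements these services at the OS level together with a user-level scheduler):

```python
# Schematic adaptability step: resize the pool of running worker threads to
# match the number of processors the OS layer currently grants the application.
def adapt_workers(granted_cpus, running_workers, suspended_workers):
    """Resume or suspend workers so that len(running_workers) == granted_cpus."""
    while len(running_workers) < granted_cpus and suspended_workers:
        running_workers.append(suspended_workers.pop())    # resume a worker
    while len(running_workers) > granted_cpus:
        suspended_workers.append(running_workers.pop())    # yield a processor back

if __name__ == "__main__":
    running, suspended = ["w0", "w1", "w2", "w3"], []
    adapt_workers(granted_cpus=2, running_workers=running, suspended_workers=suspended)
    print(running, suspended)  # ['w0', 'w1'] ['w3', 'w2']
```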

Flex memory: Exploiting and managing abundant off-chip optical bandwidth

Ying Wang, Lei Zhang, Yinhe Han, Huawei Li, Xiaowei Li
2011 2011 Design, Automation & Test in Europe  
To further preserve locality and maintain service parallelism for different workloads, a page folding technique is employed to achieve adaptive data mapping in photonics-connected DRAM chips via optical  ...  However, current DRAM organization has mainly been optimized for higher storage capacity and package pin utilization.  ...  Both open-page and close-page management policies with first-ready-first-come-first-serve (FR-FCFS) and batching scheduling are evaluated. The simulated system is organized as shown in Table II.  ... 
doi:10.1109/date.2011.5763157 dblp:conf/date/WangZHLL11 fatcat:v2d3w7fjxrfhndfnmukxsdiag4
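The excerpt mentions first-ready first-come-first-serve (FR-FCFS) memory scheduling. Its two-level priority, serve a pending row-buffer hit if one exists, otherwise the oldest request, can be sketched for a single DRAM bank as follows (Python, simplified request format assumed):

```python
# Simplified FR-FCFS for one DRAM bank: serve a row-buffer hit if one is pending
# ("first ready"); otherwise serve the oldest request (first come, first served).
def fr_fcfs_pick(pending, open_row):
    """pending: list of dicts with 'row' and 'arrival'; open_row: currently open row."""
    hits = [r for r in pending if r["row"] == open_row]
    candidates = hits if hits else pending
    return min(candidates, key=lambda r: r["arrival"])

if __name__ == "__main__":
    pending = [{"row": 7, "arrival": 3}, {"row": 2, "arrival": 1}, {"row": 7, "arrival": 2}]
    print(fr_fcfs_pick(pending, open_row=7))  # the row-7 hit that arrived at time 2
```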

Informing algorithms for efficient scheduling of synchronizing threads on multiprogrammed SMPs

C.D. Antonopoulos, D.S. Nikolopoulos, T.S. Papatheodorou
2001 International Conference on Parallel Processing, 2001.  
The applications are given the opportunity to influence, in a non-intrusive manner, the scheduling decisions concerning their threads.  ...  We present novel algorithms for efficient scheduling of synchronizing threads on multiprogrammed SMPs. The algorithms are based on intra-application priority control of synchronizing threads.  ...  The problem of scheduling synchronizing threads in a multiprogramming environment has not been adequately addressed, if at all, in contemporary commercial SMP schedulers for small- and medium-scale systems  ... 
doi:10.1109/icpp.2001.952054 dblp:conf/icpp/AntonopoulosNP01 fatcat:itlttyxpynbrrpariy3642ori4
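The intra-application priority control referred to above can be illustrated by raising a thread's priority while it holds a lock, so the scheduler does not preempt it while other threads wait on that lock. A schematic Python sketch with a hypothetical set_priority callback (the paper's algorithms coordinate with the kernel scheduler and are considerably richer):

```python
# Schematic priority control around a critical section: boost the lock holder so
# the scheduler is less likely to preempt it while other threads wait on the lock.
import threading

class PriorityBoostLock:
    def __init__(self, set_priority):
        self._lock = threading.Lock()
        self._set_priority = set_priority    # callback into a (hypothetical) scheduler

    def __enter__(self):
        self._lock.acquire()
        self._set_priority("boosted")        # ask not to be preempted inside the region
        return self

    def __exit__(self, *exc):
        self._set_priority("normal")         # restore priority before releasing
        self._lock.release()

if __name__ == "__main__":
    lock = PriorityBoostLock(set_priority=lambda level: print("priority ->", level))
    with lock:
        print("inside critical section")
```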

Optimizing virtual machine scheduling in NUMA multicore systems

Jia Rao, Kun Wang, Xiaobo Zhou, Cheng-Zhong Xu
2013 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)  
We propose a Bias Random vCPU Migration (BRM) algorithm that dynamically migrates vCPUs to minimize the system-wide uncore penalty. We have implemented the scheme in the Xen virtual machine monitor.  ...  Experimental results on a two-way Intel NUMA multicore system with various workloads show that BRM is able to improve application performance by up to 31.7% compared with the default Xen credit scheduler  ...  Acknowledgements We are grateful to the anonymous reviewers for their constructive comments. This research was supported in part by the U.S.  ... 
doi:10.1109/hpca.2013.6522328 dblp:conf/hpca/RaoWZX13 fatcat:jz2yyayw4jgzbozgdvvj7hsivm
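Bias Random vCPU Migration (BRM), as summarized above, migrates vCPUs at random but biases the moves toward reducing the system-wide uncore penalty. The following is an illustrative sketch of one such step (Python; not the exact algorithm the authors implemented in Xen):

```python
# Illustrative biased random migration step: occasionally move a randomly chosen
# vCPU from the node with the highest uncore penalty to the node with the lowest.
import random

def brm_step(penalty_by_node, vcpus_by_node, migrate_prob=0.3):
    """penalty_by_node: node -> measured uncore penalty; vcpus_by_node: node -> vCPU list."""
    if random.random() > migrate_prob:
        return None                                  # most of the time, do nothing
    src = max(penalty_by_node, key=penalty_by_node.get)
    dst = min(penalty_by_node, key=penalty_by_node.get)
    if src == dst or not vcpus_by_node[src]:
        return None
    vcpu = random.choice(vcpus_by_node[src])         # random pick, biased by node choice
    vcpus_by_node[src].remove(vcpu)
    vcpus_by_node[dst].append(vcpu)
    return (vcpu, src, dst)

if __name__ == "__main__":
    nodes = {"node0": ["v0", "v1", "v2"], "node1": ["v3"]}
    print(brm_step({"node0": 0.8, "node1": 0.2}, nodes, migrate_prob=1.0), nodes)
```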

The Locality Principle [chapter]

Peter J. Denning
2006 Communication Networks and Computer Systems  
It remains a rich source of inspiration for contemporary research in architecture, caching, Bayesian inference, forensics, web-based business processes, context-aware software, and network science.  ...  Locality is among the oldest systems principles in computer science. It was discovered in 1967 during efforts to make early virtual memory systems work well.  ...  A feedback control system can stabilize the multiprogramming level and prevent thrashing. The amount of free space is monitored and fed back to the scheduler.  ... 
doi:10.1142/9781860948947_0004 fatcat:t2wgrmpozja7lj2fqsmwq5ignm
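The feedback control loop mentioned in the excerpt, monitoring free space and feeding it back to the scheduler, keeps the multiprogramming level below the point where thrashing sets in. A schematic sketch of such a controller (Python; the thresholds are arbitrary placeholders):

```python
# Schematic thrashing control: admit another job only while free memory stays
# comfortably high, and suspend one when free memory falls too low.
def adjust_multiprogramming_level(free_pages, mpl, low_water=1000, high_water=4000):
    """Return the new multiprogramming level given the monitored free-page count."""
    if free_pages < low_water and mpl > 1:
        return mpl - 1     # memory is overcommitted: swap a job out
    if free_pages > high_water:
        return mpl + 1     # plenty of headroom: admit another job
    return mpl             # otherwise hold the level steady

if __name__ == "__main__":
    print(adjust_multiprogramming_level(free_pages=500, mpl=5))   # -> 4
    print(adjust_multiprogramming_level(free_pages=6000, mpl=5))  # -> 6
```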
Showing results 1–15 out of 393 results