29,024 Hits in 3.9 sec

Increasing Memory Utilization with Transient Memory Scheduling

Qi Wang, Jiguo Song, Gabriel Parmer, Andrew Sweeney, Guru Venkataramani
2012 IEEE 33rd Real-Time Systems Symposium  
This paper introduces the TMEM system for increasing memory utilization while optimizing for application end-to-end constraints such as meeting deadlines.  ...  In addition to the traditional spatial multiplexing of memory, TMEM introduces the predictable temporal multiplexing of memory within caches in a system component, and memory scheduling to continually  ...  CONCLUSIONS AND FUTURE WORK This paper presents the TMEM system for scheduling transient memory to increase effective memory capacity while optimizing for end-to-end application constraints.  ... 
doi:10.1109/rtss.2012.76 dblp:conf/rtss/WangSPSV12 fatcat:phhbuwl3q5dz7cmvin6gaod67e
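The TMEM entry above describes temporally multiplexing spare memory among component caches and reclaiming it on demand. A minimal Python sketch of that idea (all class names and the largest-cache-first reclaim policy here are illustrative, not the paper's actual API):

```python
# Toy model of transient-memory scheduling in the spirit of TMEM:
# spare pages are granted to component caches and reclaimed on demand.
# Names and the reclaim policy are illustrative, not the paper's design.

class Cache:
    def __init__(self, name, pages=0):
        self.name = name
        self.pages = pages          # transient pages currently held

class TransientMemoryScheduler:
    """Grants spare pages to caches and reclaims them when needed."""
    def __init__(self, spare_pages):
        self.spare = spare_pages
        self.caches = []

    def register(self, cache):
        self.caches.append(cache)

    def grant(self, cache, n):
        """Give up to n spare pages to a cache; return pages granted."""
        granted = min(n, self.spare)
        self.spare -= granted
        cache.pages += granted
        return granted

    def reclaim(self, n):
        """Take back up to n transient pages, largest holder first."""
        reclaimed = 0
        for c in sorted(self.caches, key=lambda c: c.pages, reverse=True):
            take = min(c.pages, n - reclaimed)
            c.pages -= take
            reclaimed += take
            if reclaimed == n:
                break
        self.spare += reclaimed
        return reclaimed
```

The point of the sketch is only the shape of the mechanism: memory lent to caches is a soft allocation that the scheduler can revoke, which is what lets effective capacity exceed a static partitioning.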

Hedge Your Bets: Optimizing Long-term Cloud Costs by Mixing VM Purchasing Options [article]

Pradeep Ambati, Noman Bashir, David Irwin, Mohammad Hajiesmaili, Prashant Shenoy
2020 arXiv pre-print
However, longer and less flexible time commitments can increase cloud costs for users if future workloads cannot utilize the VMs they committed to buying.  ...  Cloud platforms offer the same VMs under many purchasing options that specify different costs and time commitments, such as on-demand, reserved, sustained-use, scheduled reserve, transient, and spot block  ...  The figure shows that the overall cost increased relative to the mix with the transient option because transient VMs were by far the cheapest option.  ... 
arXiv:2004.04302v1 fatcat:yulmxe6zivdwdbx2hxxot2kzy4
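The entry above is about trading a cheap but inflexible commitment against flexible pay-per-use capacity. A small sketch of that trade-off with hypothetical prices (the rates and the brute-force optimizer are illustrative, not any provider's rate card or the paper's algorithm):

```python
# Hypothetical $/VM-hour rates; only the shape of the trade-off matters.
RESERVED_RATE = 0.05   # committed capacity, paid whether used or not
ONDEMAND_RATE = 0.10   # flexible capacity, paid only when used

def mixed_cost(demand, num_reserved):
    """Cost of covering an hourly VM demand series with a fixed
    reserved pool plus on-demand VMs for the overflow."""
    reserved_cost = num_reserved * RESERVED_RATE * len(demand)
    overflow = sum(max(0, d - num_reserved) for d in demand)
    return reserved_cost + overflow * ONDEMAND_RATE

def best_reservation(demand):
    """Brute-force the reserved-pool size minimizing total cost."""
    return min(range(max(demand) + 1), key=lambda r: mixed_cost(demand, r))
```

For a bursty demand series the optimum commits only to the baseline and buys the burst on demand, which is the paper's core observation: over-committing wastes money when future workloads cannot use the reserved VMs.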

Review Paper on Fault Tolerant Scheduling in Multicore System

2018 VFAST Transactions on Software Engineering  
Most real-time systems use shared memory as a dominant component.  ...  Another approach used for scheduling in real-time systems is hybrid scheduling.  ...  Figure 1: Fault-tolerant task scheduling algorithm for a multicore system (ready task queue). Recovery and re-execution are more convenient when checkpoints are used, and can be utilized with soft real-time systems  ... 
doi:10.21015/vtse.v13i2.509 fatcat:4nxrmvlkprdrncejeo2mfthzay

Out-of-core Data Management for Path Tracing on Hybrid Resources

Brian Budge, Tony Bernardin, Jeff A. Stuart, Shubhabrata Sengupta, Kenneth I. Joy, John D. Owens
2009 Computer graphics forum (Print)  
The path tracer scales well with respect to CPUs, GPUs and memory per node as well as scaling with the number of nodes.  ...  The use of GPUs speeds up the runtime of these components by factors ranging from two to twenty, resulting in a substantial overall increase in rendering speed.  ...  Thanks also to Per Christensen at Pixar for assistance with RenderMan. The authors gratefully acknowledge funding from the Department of Energy's Early Career Principal Investigator Award  ... 
doi:10.1111/j.1467-8659.2009.01378.x fatcat:rxqz7v7ctbadrcgand7oaigd4y

Characterizing Co-located Datacenter Workloads: An Alibaba Case Study [article]

Yue Cheng, Zheng Chai, Ali Anwar
2018 arXiv pre-print
Warehouse-scale cloud datacenters co-locate workloads with different and often complementary characteristics for improved resource utilization.  ...  Two types of workload---long-running, user-facing, containerized production jobs, and transient, highly dynamic, non-containerized, and non-production batch jobs---are running on a shared cluster of 1313  ...  To improve resource utilization and thereby reduce costs, leading cloud infrastructure operators such as Google and Alibaba co-locate transient batch jobs with long-running, latency-sensitive, user-facing  ... 
arXiv:1808.02919v2 fatcat:etzdkl5nyze7rijeshodqtpr2i

The RCU-Reader Preemption Problem in VMs

Aravinda Prasad, K. Gopinath, Paul E. McKenney
2017 USENIX Annual Technical Conference  
The resulting CPU utilization and memory footprint increases can negate the server-consolidation benefits of virtualization.  ...  Our evaluation shows 50% increase in the peak memory footprint and 155% increase in fragmentation for a microbenchmark, 23.71% increase in average kernel CPU utilization, 2.9× increase in the CPU time  ...  Transient memory spikes: As discussed earlier, when using call_rcu(), GP delay due to vCPU preemption can cause transient memory-footprint spikes, which can in turn increase peak memory footprint.  ... 
dblp:conf/usenix/PrasadGM17 fatcat:vtsowsqskrbjxbhl6uaghvrzjy
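The entry above attributes memory spikes to grace periods completing late while call_rcu()-style deferred frees keep queueing. A toy model of that effect (illustrative only, not the Linux kernel API):

```python
# Toy model of deferred reclamation: frees queue up until a grace
# period ends, so a delayed grace period inflates peak memory.
# This imitates the call_rcu() pattern in spirit, not the real API.

class DeferredReclaimer:
    def __init__(self):
        self.pending = []      # bytes awaiting a grace period
        self.live = 0
        self.peak = 0

    def call_rcu(self, nbytes):
        """Schedule nbytes for freeing after the next grace period."""
        self.pending.append(nbytes)
        self.live += nbytes
        self.peak = max(self.peak, self.live)

    def grace_period_end(self):
        """All pre-existing readers finished: run the queued frees."""
        self.live -= sum(self.pending)
        self.pending.clear()

def peak_footprint(allocs_per_batch, gp_delay, batches=12):
    """Peak bytes held when a grace period completes only every
    `gp_delay` batches (e.g. because a vCPU was preempted)."""
    r = DeferredReclaimer()
    for i in range(1, batches + 1):
        for nbytes in allocs_per_batch:
            r.call_rcu(nbytes)
        if i % gp_delay == 0:
            r.grace_period_end()
    return r.peak
```

Stretching the grace-period interval by a factor of k multiplies the peak backlog by k, which is the mechanism behind the footprint spikes the paper measures.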

Locality-information-based scheduling in shared-memory multiprocessors [chapter]

Frank Bellosa
1996 Lecture Notes in Computer Science  
While CPU utilization of processes still determines scheduling decisions of contemporary schedulers, we propose novel scheduling policies based on cache miss rates and information about synchronization  ...  The distribution of data structures and the usage of locality information characterize the proposed memory-conscious scheduling architecture.  ...  The ELiTE project in cooperation with Convex Computer Corp. is supported by the Bavarian Consortium for High Performance Scientific Computing (FORTWIHR).  ... 
doi:10.1007/bfb0022298 fatcat:a2tqyztdcjapzmupo4sbvnq77e
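The entry above proposes dispatch decisions driven by cache locality rather than CPU utilization alone. A minimal sketch of such a decision (the threshold policy and the affinity estimate are illustrative, not the paper's exact scheme):

```python
# Sketch of a locality-conscious dispatch decision: prefer a runnable
# thread that still has substantial cache state on this CPU, falling
# back to plain FIFO order. Policy details are illustrative only.

def pick_next(run_queue, cpu, affinity, hot_threshold=0.5):
    """run_queue: thread ids in FIFO order.
    affinity[(tid, cpu)]: estimated fraction of the thread's working
    set still resident in this CPU's cache (e.g. derived from cache
    miss counters)."""
    hot = [t for t in run_queue
           if affinity.get((t, cpu), 0.0) >= hot_threshold]
    return hot[0] if hot else run_queue[0]
```

Picking a cache-hot thread trades strict FIFO fairness for fewer cold-cache misses, which is the core bet of memory-conscious scheduling.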

Real-world design and evaluation of compiler-managed GPU redundant multithreading

Jack Wadden, Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan, Kevin Skadron
2014 SIGARCH Computer Architecture News  
Finally, we demonstrate the benefit of architectural support for RMT with a specific example of fast, register-level thread communication.  ...  We further analyze the individual costs of redundant work scheduling, redundant computation, and inter-thread communication, showing that no single component in general is responsible for high overheads  ...  An increase in radiation-induced transient faults due to shrinking process technologies and increasing operating frequencies [8, 27], coupled with the increasing node count in supercomputers, has promoted  ... 
doi:10.1145/2678373.2665686 fatcat:sp5qrkfdzzfhrg5pho3oo6wjne
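Redundant multithreading, as described in the entry above, detects transient faults by running the same computation twice and comparing results. A minimal duplicate-and-compare sketch (the fault-injection hook is purely illustrative; real RMT duplicates at the thread level with hardware or compiler support):

```python
# Minimal software sketch of RMT-style duplicate-and-compare:
# run the kernel twice on the same input and flag mismatches.
# The inject_fault hook simulates a transient fault for testing.

def run_redundant(kernel, data, inject_fault=None):
    """Return (leading results, indices where the trailing copy
    disagreed). A non-empty mismatch list signals a transient fault."""
    leading = [kernel(x) for x in data]
    trailing = [kernel(x) for x in data]
    if inject_fault is not None:          # corrupt one trailing result
        idx, bad_value = inject_fault
        trailing[idx] = bad_value
    mismatches = [i for i, (a, b) in enumerate(zip(leading, trailing))
                  if a != b]
    return leading, mismatches
```

The overhead structure the paper analyzes maps onto this sketch: the second list comprehension is the redundant computation, and the final comparison is the inter-thread communication that architectural support can accelerate.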

Real-world design and evaluation of compiler-managed GPU redundant multithreading

Jack Wadden, Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan, Kevin Skadron
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
Finally, we demonstrate the benefit of architectural support for RMT with a specific example of fast, register-level thread communication.  ...  We further analyze the individual costs of redundant work scheduling, redundant computation, and inter-thread communication, showing that no single component in general is responsible for high overheads  ...  An increase in radiation-induced transient faults due to shrinking process technologies and increasing operating frequencies [8, 27], coupled with the increasing node count in supercomputers, has promoted  ... 
doi:10.1109/isca.2014.6853227 dblp:conf/isca/WaddenLGSS14 fatcat:ccshcrfm5rembof45ujamghcoy

Architectural Support for Fault Tolerance in a Teradevice Dataflow System

Sebastian Weis, Arne Garbade, Bernhard Fechner, Avi Mendelson, Roberto Giorgi, Theo Ungerer
2014 International journal of parallel programming  
Therefore, future many-core systems will require fault-tolerance techniques capable of scaling with the number of cores and the increasing failure probability on a chip in conjunction with a reasonable  ...  In detail, we provide methods to dynamically detect and manage permanent, intermittent, and transient faults during runtime.  ...  However, with an increasing node size, fib (36) is no longer able to utilize all cores.  ... 
doi:10.1007/s10766-014-0312-y fatcat:kygdzmqyvrbonia2cu7n4glnsu

Workload propagation - overload in bursty servers

Qi Zhang, A. Riska, E. Riedel
2005 Second International Conference on the Quantitative Evaluation of Systems (QEST'05)  
We illustrate the effectiveness of a scheduling mechanism at the disk and provide a proof of concept that such self-adaptive scheduling mechanisms at the lower levels are a step toward faster system recovery
doi:10.1109/qest.2005.43 dblp:conf/qest/ZhangRR05 fatcat:nephgsqkgrhrjlvblifpagilu4
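The entry above argues that smarter scheduling at a lower level can speed recovery from a transient burst. A toy single-server comparison of FCFS against shortest-job-first during a burst illustrates why (this is a generic queueing sketch, not the paper's disk scheduler):

```python
# Toy single-server queue: all jobs arrive at time 0 in a burst.
# Compares FCFS with shortest-job-first mean response time to show
# why lower-level scheduling can drain a transient overload faster.

def mean_response(service_times, policy="fcfs"):
    """Mean completion time of a burst of jobs under one policy."""
    order = sorted(service_times) if policy == "sjf" else list(service_times)
    clock, total = 0.0, 0.0
    for s in order:
        clock += s            # job finishes after all earlier jobs
        total += clock
    return total / len(order)
```

With one long job heading a burst of short ones, reordering cuts the mean response time sharply even though the total work, and hence the recovery instant, is unchanged.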

Predictable, system-level fault tolerance in composite

Jiguo Song, Gabriel Parmer
2013 ACM SIGBED Review  
We note that the costs of recovery are small (35 and 46 µsec) with small variation, and are dominated (over 80% of the cost) by mem_cpy and mem_set to reset the service's memory to an initial state.  ...  A checkpoint of the service has periods of inconsistency with the state of a task: a scheduler that dispatches a task after a checkpoint will lose the accounting for that time if it rolled back.  ... 
doi:10.1145/2518148.2518169 fatcat:2ouz64ppnfdobhqmjukvcvk4sa
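The entry above describes recovery dominated by copying a memory snapshot back over the service's state. A minimal checkpoint-and-rollback sketch of that pattern (class and method names are illustrative, not the Composite system's API):

```python
# Sketch of memcpy/memset-style state reset for checkpoint-based
# recovery: snapshot the service's memory, then roll back to it
# after a fault. Names are illustrative, not Composite's API.

class Service:
    def __init__(self, size):
        self.memory = bytearray(size)          # live state
        self.checkpoint = bytes(self.memory)   # initial-state snapshot

    def take_checkpoint(self):
        self.checkpoint = bytes(self.memory)   # mem_cpy equivalent

    def recover(self):
        """Roll live memory back to the last checkpoint."""
        self.memory[:] = self.checkpoint       # bulk copy dominates cost
```

The bulk slice copy in recover() is the analogue of the mem_cpy/mem_set step the paper measures as over 80% of recovery cost; the scheduler-accounting inconsistency the snippet mentions is exactly the state that such a raw memory copy cannot capture.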

Holmes

Aidi Pi, Xiaobo Zhou, Chengzhong Xu
2022 Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing  
Co-location of latency-critical services with best-effort batch jobs is commonly adopted in production systems to increase resource utilization.  ...  Holmes tackles two challenges: accurately measuring SMT interference on memory access, and efficiently adjusting CPU allocation to achieve low latency and high resource utilization at the same time.  ...  For the thread with the maximum RPS, its maximum RPS decreases from ∼70,000 to ∼45,000 with the increasing RPS on its sibling thread. Its memory access latency also increases.  ... 
doi:10.1145/3502181.3531464 fatcat:xmcmdsuinbbjxpaxbq3qsffm2y

Exploring the Impact of Task Preemption on Dependability in Time-Triggered Embedded Systems: A Pilot Study

Michael Short, Michael J. Pont, Jianzhong Fang
2008 Euromicro Conference on Real-Time Systems  
Our particular focus in this exploratory study is on static-priority, time-triggered scheduler architectures.  ...  The study is empirical in nature and we employ a hardware-in-the-loop (HIL) testbed, representing a cruise control system for a passenger vehicle, in conjunction with fault-injection to perform the dependability  ...  The most significant differences are the increase in RAM requirements (651.1%) and scheduler overheads (563.8%), followed by an increased CPU utilization (13.4%).  ... 
doi:10.1109/ecrts.2008.14 dblp:conf/ecrts/ShortPF08 fatcat:qzrbneshzbhrxjkvv4dbuu7dyq

Efficient high throughput decoding architecture for non-binary LDPC codes

C Arul Murugan, B Banuselvasaraswathy, K Gayathree, M Ishwarya Niranjana
2018 International Journal of Engineering & Technology  
Consequently, it is necessary to increase the throughput for improving the efficiency of the system.  ...  This article deals with an efficient trellis-based decoding architecture for non-binary Low-Density Parity-Check (LDPC) codes.  ...  It consists of a permutation network, message compression and decompression unit, forward and backward memory unit, and filter circuit. Figure 12 shows the transient response of the decoding architecture  ... 
doi:10.14419/ijet.v7i2.8.10407 fatcat:ilvkh4iixbbjdeqxu2odqr7tay
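The entry above concerns decoder hardware for non-binary LDPC codes; the underlying check the hardware implements is that a codeword satisfies every row of a parity-check matrix. A tiny binary bit-flipping example of that check (the paper's decoder is non-binary and trellis-based; this sketch shows only the basic principle on a toy matrix):

```python
# Toy binary parity-check example: verify H @ c = 0 (mod 2) and
# correct a single error by flipping the bit involved in the most
# unsatisfied checks. Purely illustrative; the paper's decoder is
# non-binary and trellis-based.

H = [[1, 1, 0, 1, 0, 0],   # toy parity-check matrix
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

def syndrome(word):
    """One parity bit per check row; all-zero means a valid codeword."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def bit_flip_decode(word, iters=10):
    """Gallager-style bit flipping: repeatedly flip the bit touched
    by the largest number of unsatisfied checks."""
    word = list(word)
    for _ in range(iters):
        s = syndrome(word)
        if not any(s):
            return word
        votes = [sum(s[i] * H[i][j] for i in range(len(H)))
                 for j in range(len(word))]
        word[votes.index(max(votes))] ^= 1
    return word
```

High-throughput architectures like the one in the entry pipeline and parallelize exactly these check-node and variable-node updates across the memory and permutation units listed in the snippet.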
Showing results 1 — 15 out of 29,024 results