
Extending OpenMP for NUMA Machines

John Bircsak, Peter Craig, RaeLyn Crowell, Zarka Cvetanovic, Jonathan Harris, C. Alexander Nelson, Carl D. Offner
2000 Scientific Programming  
OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures.  ...  Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data  ...  dec$ migrate next touch (X) This user-directed form of page migration avoids the delays associated with gathering statistics for automatic page migration.  ... 
doi:10.1155/2000/464182 fatcat:axpjz3lj55awrgp35f2vurngy4

Extending OpenMP For NUMA Machines

J. Bircsak, P. Craig, R. Crowell, Z. Cvetanovic, J. Harris, C.A. Nelson, C.D. Offner
2000 ACM/IEEE SC 2000 Conference (SC'00)  
OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures.  ...  Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data  ...  dec$ migrate next touch (X) This user-directed form of page migration avoids the delays associated with gathering statistics for automatic page migration.  ... 
doi:10.1109/sc.2000.10019 dblp:conf/sc/BircsakCCCHNO00 fatcat:6zmlgcpnfbbfrjevezaoe7pmmm

Distance-aware round-robin mapping for large NUCA caches

Alberto Ros, Marcelo Cintra, Manuel E. Acacio, Jose M. Garcia
2009 2009 International Conference on High Performance Computing (HiPC)  
Our policy tries to map the pages accessed by a core to its closest (local) bank, as in a first-touch policy.  ...  We also show that the private cache indexing commonly used in many-core architectures is not the most appropriate for OS-managed distance-aware mapping policies, and propose to employ different bits for  ...  Alberto Ros is supported by a research grant from Spanish MEC under the FPU national plan (AP2004-3735).  ... 
doi:10.1109/hipc.2009.5433220 dblp:conf/hipc/RosCAG09 fatcat:scsn6jbwcvesddpspej3bphrda

The Effect of Multi-core on HPC Applications in Virtualized Systems [chapter]

Jaeung Han, Jeongseob Ahn, Changdae Kim, Youngjin Kwon, Young-ri Choi, Jaehyuk Huh
2011 Lecture Notes in Computer Science  
As such non-uniformity in memory systems increases, NUMA and cache awareness in VM scheduling will be critical for shared memory applications.  ...  Due to the lack of support for non-uniform memory access (NUMA) in the Xen hypervisor, shared memory applications suffer from a significant performance degradation by virtualization.  ...  To eliminate the effect of vCPU migration, we fix each vCPU to a physical core. In this case, the Xen scheduler cannot migrate vCPUs.  ... 
doi:10.1007/978-3-642-21878-1_76 fatcat:6gjkpgfgzzgu3pedabuykdxzem

CODA

Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel Loh
2018 ACM Transactions on Architecture and Code Optimization (TACO)  
In today's systems, where no computation occurs in memory modules, the physical address space is interleaved at a fine granularity among all memory modules to help improve the utilization of processor-memory  ...  In order to address this new challenge, we propose a set of techniques that (1) enable collections of OS pages to either be fine-grain interleaved among memory modules (as is done today) or to be placed  ...  The OS should be aware of the dual-mode address mapping (1) to indicate the granularity information in the PTEs and TLB entries, and (2) for page management, such as free page management or page replacement  ... 
doi:10.1145/3232521 fatcat:vrmsepasrfgadanadruj6bvuoq

The Impact of Dynamic Directories on Multicore Interconnects

Matthew Schuchhardt, Abhishek Das, Nikos Hardavellas, Gokhan Memik, Alok Choudhary
2013 Computer  
Dynamic Directories exhibit the same gains as PCD on private pages.  ...  This eliminates a large fraction of on-chip interconnect traversals, thus reducing interconnect power and energy consumption by up to 37.3% (22.9% on average for scientific workloads, and 8.0% for Map-Reduce  ...  The placement is performed at the granularity of pages, i.e., the directory entries for all blocks within a page are placed at the same tile.  ... 
doi:10.1109/mc.2013.334 fatcat:j53gelrml5fddb6gpempritg7i

Page Classifier and Placer: A Scheme of Managing Hybrid Caches [chapter]

Xin Yu, Xuanhua Shi, Hai Jin, Xiaofei Liao, Song Wu, Xiaoming Li
2014 Lecture Notes in Computer Science  
We propose a new HCA approach that enables the OS to be aware of the underlying hybrid cache architecture and to control data placement, at OS page level, onto different cache regions.  ...  Our approach employs a lightweight hardware profiler to monitor cache behaviors at OS page level and to capture the hot pages.  ...  Page Placer The page placer is designed to determine where and when to migrate a candidate page to a new physical page.  ... 
doi:10.1007/978-3-662-44917-2_2 fatcat:2kfksbpiszejrjefjbtdss2u4e

When physical is not real enough

Frank Bellosa
2004 Proceedings of the 11th workshop on ACM SIGOPS European workshop: beyond the PC - EW11  
Furthermore, the programmable memory controller is responsible for the allocation and migration of memory according to power and performance demands.  ...  This position paper argues that policies for physical memory management and for memory power mode control should be relocated to the system software of a programmable memory management controller (MMC)  ...  The mapping is done on the granularity of pages.  ... 
doi:10.1145/1133572.1133573 dblp:conf/sigopsE/Bellosa04 fatcat:qxfjhs4rnrcofgdak7m67n57da

CloudSSI

Mansoor Alicherry, Ashok Anand, Shoban Preeth Chandrabose, Theophilius Benson
2013 Proceedings of the 4th annual Symposium on Cloud Computing - SOCC '13  
We make a case for leveraging the old idea of single system image (SSI) in the cloud context.  ...  A potential way to deal with these issues is to keep the backups of remote memory pages in local disk; the hypervisor (or OS or middleware) should be made aware of these backups to retrieve from local  ... 
doi:10.1145/2523616.2525959 dblp:conf/cloud/AlicherryACB13 fatcat:jdng5d24grb2lfvuiig4xmojki

User Interface Migration Based on the Use of Logical Descriptions [chapter]

Giuseppe Ghiani, Fabio Paternò, Carmen Santoro
2011 Migratory Interactive Applications for Ubiquitous Environments  
In our solution we opted for a distribution down to the granularity of the single interactor but no deeper, since we judged such fine granularity unimportant for our goals.  ...  In this case, a partial migration will be done since the user interactively selects the parts of the UI that are of interest for him/her.  ... 
doi:10.1007/978-0-85729-250-6_5 dblp:series/hci/GhianiPS11 fatcat:j3xsl5tdjnhkfnirgkzaditrni

ICE: Managing cold state for big data applications

Badrish Chandramouli, Justin Levandoski, Eli Cortez
2016 2016 IEEE 32nd International Conference on Data Engineering (ICDE)  
We present ICE (incremental cold-state engine), a framework that allows an SPE to seamlessly migrate cold state to secondary storage (disk or flash).  ...  A stream processing engine (SPE) enables such a seamless M3 loop for applications such as targeted advertising, recommender systems, risk analysis, and call-center analytics.  ...  Migration-Aware Streaming Join As a concrete example, this section describes the implementation details for a migration-aware streaming join operator.  ... 
doi:10.1109/icde.2016.7498262 dblp:conf/icde/ChandramouliLC16 fatcat:mzm3bppvbjgbjmyromqoklcnpa

Asymmetric-access aware optimization for STT-RAM caches with process variations

Yi Zhou, Chao Zhang, Guangyu Sun, Kun Wang, Yu Zhang
2013 Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI - GLSVLSI '13  
First, we demonstrate that a write-oriented data migration is preferred. Second, a block remapping is necessary to prevent some cache sets from being significantly affected by process variations.  ...  In order to overcome the problem of process variations, we propose to apply the variable-latency access method to STT-RAM caches by introducing a variation-aware LRU (Least Recently Used) policy.  ...  Sun et al. proposed variation-aware data management for non-uniform cache architectures. They compensate write time variations via dynamic data migration [11].  ... 
doi:10.1145/2483028.2483079 dblp:conf/glvlsi/ZhouZSWZ13 fatcat:bvdhvl3xijgrpgyzkchnxu6jwq

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices

Jiacheng Zhang, Jiwu Shu, Youyou Lu
2016 USENIX Annual Technical Conference  
ParaFS is a log-structured file system over a simplified block-level FTL that exposes the physical layout.  ...  Evaluations show that ParaFS effectively improves system performance for write-intensive workloads by 1.6× to 3.1×, compared to the flash-optimized F2FS file system.  ...  Acknowledgments We thank our shepherd Haryadi Gunawi and anonymous reviewers for their feedback and suggestions.  ... 
dblp:conf/usenix/ZhangSL16 fatcat:pbhzssee2ja4jgkxjandliwbva

Toward dependency-aware live virtual machine migration

Anthony Nocentino, Paul M. Ruth
2009 Proceedings of the 3rd international workshop on Virtualization technologies in distributed computing - VTDC '09  
This paper proposes a novel dependency-aware approach to live virtual machine migration and presents the results of the initial investigation into its ability to reduce migration latency and overhead.  ...  Unfortunately, the need for live migration increases during times when resources are most scarce.  ...  This case has a persistent external dependency and should not perform any better using dependency-aware migration.  ... 
doi:10.1145/1555336.1555347 dblp:conf/icac/NocentinoR09 fatcat:g3vv7a7xrzhvvh7obc6mrdf5y4

PsmArena: Partitioned shared memory for NUMA-awareness in multithreaded scientific applications

Zhang Yang, Aiqing Zhang, Zeyao Mo
2021 Tsinghua Science and Technology  
In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes and develop an accompanying NUMA-aware  ...  heap manager which eliminates false page-sharing and minimizes fragmentation.  ...  The authors thank the reviewers for their helpful comments.  ... 
doi:10.26599/tst.2019.9010036 fatcat:kodfvslnmbd4thfqyrjxuy47hq