
Modeling communication in cache-coherent SMP systems

Sabela Ramos, Torsten Hoefler
2013 Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13  
A PERFORMANCE MODEL FOR COMMUNICATION IN CACHE-COHERENT SYSTEMS  ...  In most multi-core systems, the only way to communicate data from one thread, T0, to another thread, T1, is to issue load and store instructions  ...  We developed an intuitive performance model for cache-coherent architectures and demonstrate its use with the currently most scalable cache-coherent many-core architecture, Intel Xeon Phi.  ...  Acknowledgments We thank the Swiss National Supercomputing Center (CSCS), especially Hussein Harake, Thomas Schoenemeyer, and Thomas Schulthess, for providing access to and support with Xeon Phi hardware  ... 
doi:10.1145/2493123.2462916 fatcat:5okdd5xclzbozavpsz3yykoud4

Performance Evaluation of Massively Parallel Systems Using SPECOMP Suite

Dheya Mustafa
2022 Computers  
On the other hand, the Intel Xeon Phi coprocessor, armed with 61 on-chip x86 cores, provides high theoretical peak performance, as well as software development flexibility with existing high-level programming  ...  Performance analysis plays an essential role in achieving a scalable performance of applications on massively parallel supercomputers equipped with thousands of processors.  ...  In [32], the authors studied the performance and scalability of OpenMP programs on Xeon Phi in stand-alone mode, and they compared it with a two-socket Xeon-based system.  ... 
doi:10.3390/computers11050075 fatcat:4lcuefuno5fwbdibxht43taw74

An early performance evaluation of many integrated core architecture based SGI rackable computing system

Subhash Saini, Haoqiang Jin, Dennis Jespersen, Huiyu Feng, Jahed Djomehri, William Arasin, Robert Hood, Piyush Mehrotra, Rupak Biswas
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
NASA has deployed a 128-node SGI Rackable system where each node has two Intel Xeon E5-2670 8-core Sandy Bridge processors along with two Xeon Phi 5110P coprocessors.  ...  In this paper we present preliminary results based on our performance evaluation of various aspects of a Phi-based system.  ...  The cores in each of the Phi coprocessors share an 8-GB cache-coherent memory system. Each Phi is connected to other devices on the node via a separate 16-lane PCI Express (PCIe) bus [4] [5].  ... 
doi:10.1145/2503210.2503272 dblp:conf/sc/SainiJJFDAHMB13 fatcat:v5e5l3uvlzbd3c7gzhushuu3ku

A programming system for Xeon Phis with runtime SIMD parallelization

Xin Huo, Bin Ren, Gagan Agrawal
2014 Proceedings of the 28th ACM international conference on Supercomputing - ICS '14  
In this paper, we consider the problem of accelerating applications involving different communication patterns on Xeon Phis, with an emphasis on effectively using available SIMD parallelism.  ...  The Intel Xeon Phi offers a promising solution to coprocessing, since it is based on the popular x86 instruction set.  ...  MIMD Parallelization Issues: A Xeon Phi can be viewed as an SMP machine, in which all the cores not only share the same memory address, but also a coherent  ...  Particularly, applications with different communication  ... 
doi:10.1145/2597652.2597682 dblp:conf/ics/HuoRA14 fatcat:klizb5inbzgi5ipwgzic6n334i

Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor

Pawel Gepner, Victor Gamayunov, David L. Fraser, Eric Houdard, Ludovic Sauge, Damien Declat, Mathieu Dubois
2014 Journal of Computers  
In this paper we will present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) utilizing the Intel Xeon Phi Coprocessor.  ...  We discuss a DGEMM algorithm implementation running "natively" on the coprocessor, minimizing communication with the host CPU.  ...  PLATFORM ARCHITECTURE AND SYSTEM CONFIGURATION In our study we utilized two systems, the first one based on a dual socket Intel Xeon E5-2687W CPU processor with a single coprocessor card and the second  ... 
doi:10.4304/jcp.9.7.1566-1571 fatcat:lkjta53agzf2fiaeo7nfh4rtpm

An Empirical Study of Intel Xeon Phi [article]

Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu
2013 arXiv   pre-print
With at least 50 cores, Intel Xeon Phi is a true many-core architecture.  ...  Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPS and over 240 GB/s.  ...  Being an x86 SMP-on-a-chip architecture, Xeon Phi offers the full capability to use the same tools, programming languages, and programming models as a regular Intel Xeon processor.  ... 
arXiv:1310.5842v2 fatcat:dmez6j673nfj5iqoqwvignbkxi

Native Mode-Based Optimizations of Remote Memory Accesses in OpenSHMEM for Intel Xeon Phi

Naveen Namashivayam, Sayan Ghosh, Dounia Khaldi, Deepak Eachempati, Barbara Chapman
2014 Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14  
Given the importance of communication in parallel architectures, this paper describes a novel methodology for optimizing remote-memory accesses for execution of OpenSHMEM programs on Intel Xeon Phi processors  ...  Moreover, we study different reduction algorithms and exploit local load/store to optimize data transfers in these algorithms for Xeon Phi which permits improvement of up to 22% compared to MVAPICH and  ...  A performance model for cache-coherent SMP systems is developed in [18]. Xeon Phi is used to showcase the applicability of this model.  ... 
doi:10.1145/2676870.2676881 dblp:conf/pgas/NamashivayamGKEC14 fatcat:nrrn5wksxfgqphco47ywlzjs44

Test-driving Intel Xeon Phi

Jianbin Fang, Henk Sips, LiLun Zhang, Chuanfu Xu, Yonggang Che, Ana Lucia Varbanescu
2014 Proceedings of the 5th ACM/SPEC international conference on Performance engineering - ICPE '14  
Next, we choose a medical imaging application (Leukocyte Tracking) as a case study.  ...  Given its promised ease-of-use and high performance, we took Xeon Phi out for a test drive.  ...  The authors would like to thank Sabela Ramos Garea from University of A Coruña and Evghenii Gaburov from SURF-sara for the numerous on-line discussions.  ... 
doi:10.1145/2568088.2576799 dblp:conf/wosp/FangSZXCV14 fatcat:gvrvvmqusnfprcclbgb3u4skky

Retargeting of the Open Community Runtime to Intel Xeon Phi

Jiri Dokulil, Siegfried Benkner
2015 Procedia Computer Science  
Since manycore architectures like the Intel Xeon Phi are likely to play a major role in future high performance systems, we have implemented the OCR API for shared-memory machines, including the Xeon Phi  ...  The Open Community Runtime (OCR) is a recent effort in the search for a runtime for extreme scale parallel systems.  ...  pay in such a case (i.e., some cache misses).  ... 
doi:10.1016/j.procs.2015.05.335 fatcat:z7m2uxvnbbac5icj5hj7mwhskq

Scaling the capacity of memory systems; evolution and key approaches

Kyriakos Paraskevas, Andrew Attwood, Mikel Lujan, John Goodacre
2019 Proceedings of the International Symposium on Memory Systems - MEMSYS '19  
Despite such efforts, the fundamental problem of maintaining cache coherence across a scaled system with thousands of nodes is not something that any of the current approaches are capable of efficiently  ...  Computer clusters were the first configurations to eventually provide a Distributed Shared Memory (DSM) system at a linear cost while also being more scalable than the traditional cache-coherent NUMA systems  ...  A version of HMC named Multi-Channel DRAM (MCDRAM) was developed in partnership with Intel and Micron to be used in the Intel Xeon Phi processor codenamed Knights Landing.  ... 
doi:10.1145/3357526.3357555 dblp:conf/memsys/ParaskevasALG19 fatcat:qlsuo7csujg3rhbwf3efqljhta

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor

Alexander Heinecke, Karthikeyan Vaidyanathan, Mikhail Smelyanskiy, Alexander Kobotov, Roman Dubtsov, Greg Henry, Aniruddha G. Shet, George Chrysos, Pradeep Dubey
2013 2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
In this paper we describe how several flavors of the Linpack benchmark are accelerated on Intel's recently released Intel® Xeon Phi™ co-processor (code-named Knights Corner) in both native and hybrid  ...  This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators.  ...  We would also like to thank Catherine Djunaedi, Ravi Murty and Susan Meredith for their continuous support in building, debugging and analyzing the systems.  ... 
doi:10.1109/ipdps.2013.113 dblp:conf/ipps/HeineckeVSKDHSCD13 fatcat:ggcaf5r7zbadzbcdhigrvllamu

Rhymes: A shared virtual memory system for non-coherent tiled many-core architectures

King Tin Lam, Jinghao Shi, Dominic Hung, Cho-Li Wang, Zhiquan Lai, Wangbin Zhu, Youliang Yan
2014 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)  
Rhymes features a two-way cache coherence protocol to enforce release consistency for pages allocated in shared physical memory (SPM) and scope consistency for pages in per-core private memory.  ...  This paper presents a shared virtual memory (SVM) system, dubbed Rhymes, tailored to new processor kinds of non-coherent and hybrid memory architectures.  ...  ., Ltd. for their kind support of the SCC platform in their Wuxi data centers for this work.  ... 
doi:10.1109/padsw.2014.7097807 dblp:conf/icpads/LamSHWLZY14 fatcat:sadkzvqywjepzenwnp32nr65fi

Tibidabo: Making the case for an ARM-based HPC system

Nikola Rajovic, Alejandro Rico, Nikola Puzovic, Chris Adeniyi-Jones, Alex Ramirez
2014 Future generations computer systems  
In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, over time designed for maximum energy efficiency, which now show promise for competitive  ...  We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores.  ...  In addition, the authors would like to thank Bernard Ortiz de  ... 
doi:10.1016/j.future.2013.07.013 fatcat:psrd2ufjffgzhlrjiwbtu2tfhm

Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors

Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim
2017 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
These optimizations are portable on four distinct cache-coherent architectures and result in up to 5.6x performance enhancements on Intel Xeon Phi processor 7250P (KNL), 5.7x on Intel Xeon Phi coprocessor  ...  Then by blocking SoA objects, we optimize cache reuse and get sustained throughput for a range of problem sizes.  ...  Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.  ... 
doi:10.1109/ipdps.2017.33 dblp:conf/ipps/MathuriyaLBSK17 fatcat:6km46ws4yvahbjhawam7jy6dr4

Optimising simulation data structures for the Xeon Phi

Mozhgan K. Chimeh, Paul Cockshott
2016 2016 International Conference on High Performance Computing & Simulation (HPCS)  
Index Terms: Xeon Phi; many integrated core (MIC); Gate-level simulation; Parallel logic simulation  ...  All gates are two-input, NOT is represented by a NAND with duplicate inputs, 3-input ANDs made up of  ...  In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines.  ...  Interconnection among cores is based on a ring network model that allows the L2 caches for each core to be accessible by all others. In all, a total coherent cache of over 30MB is available.  ... 
doi:10.1109/hpcsim.2016.7568358 dblp:conf/ieeehpcs/ChimehC16 fatcat:dflqcw2cljg4veeofc43w2msfa