Modeling communication in cache-coherent SMP systems
2013
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13
A PERFORMANCE MODEL FOR COMMUNICATION IN CACHE-COHERENT SYSTEMS In most multi-core systems, the only way to communicate data from one thread, T0, to another thread, T1, is to issue load and store instructions ...
We developed an intuitive performance model for cache-coherent architectures and demonstrate its use with the currently most scalable cache-coherent many-core architecture, Intel Xeon Phi. ...
Acknowledgments We thank the Swiss National Supercomputing Center (CSCS), especially Hussein Harake, Thomas Schoenemeyer, and Thomas Schulthess, for providing access to and support with Xeon Phi hardware ...
doi:10.1145/2493123.2462916
fatcat:5okdd5xclzbozavpsz3yykoud4
Performance Evaluation of Massively Parallel Systems Using SPECOMP Suite
2022
Computers
On the other hand, the Intel Xeon Phi coprocessor armed with 61 on-chip x86 cores, provides high theoretical peak performance, as well as software development flexibility with existing high-level programming ...
Performance analysis plays an essential role in achieving a scalable performance of applications on massively parallel supercomputers equipped with thousands of processors. ...
In [32], the authors studied the performance and scalability of OpenMP programs on Xeon Phi in stand-alone mode, and they compared it with a two-socket Xeon-based system. ...
doi:10.3390/computers11050075
fatcat:4lcuefuno5fwbdibxht43taw74
An early performance evaluation of many integrated core architecture based SGI rackable computing system
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
NASA has deployed a 128-node SGI Rackable system where each node has two Intel Xeon E5-2670 8-core Sandy Bridge processors along with two Xeon Phi 5110P coprocessors. ...
In this paper we present preliminary results based on our performance evaluation of various aspects of a Phi-based system. ...
The cores in each of the Phi coprocessors share an 8-GB cache-coherent memory system. Each Phi is connected to other devices on the node via a separate 16-lane PCI Express (PCIe) bus [4] [5]. ...
doi:10.1145/2503210.2503272
dblp:conf/sc/SainiJJFDAHMB13
fatcat:v5e5l3uvlzbd3c7gzhushuu3ku
A programming system for xeon phis with runtime SIMD parallelization
2014
Proceedings of the 28th ACM international conference on Supercomputing - ICS '14
In this paper, we consider the problem of accelerating applications involving different communication patterns on Xeon Phis, with an emphasis on effectively using available SIMD parallelism. ...
The Intel Xeon Phi offers a promising solution to coprocessing, since it is based on the popular x86 instruction set. ...
MIMD Parallelization Issues A Xeon Phi can be viewed as an SMP machine, in which all the cores not only share the same memory address, but also a coherent ... Particularly, applications with different communication ...
doi:10.1145/2597652.2597682
dblp:conf/ics/HuoRA14
fatcat:klizb5inbzgi5ipwgzic6n334i
Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor
2014
Journal of Computers
In this paper we will present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) utilizing the Intel Xeon Phi Coprocessor. ...
We discuss a DGEMM algorithm implementation running "natively" on the coprocessor, minimizing communication with the host CPU. ...
PLATFORM ARCHITECTURE AND SYSTEM CONFIGURATION In our study we utilized two systems, the first one based on a dual socket Intel Xeon E5-2687W CPU processor with a single coprocessor card and the second ...
doi:10.4304/jcp.9.7.1566-1571
fatcat:lkjta53agzf2fiaeo7nfh4rtpm
An Empirical Study of Intel Xeon Phi
[article]
2013
arXiv
pre-print
With at least 50 cores, Intel Xeon Phi is a true many-core architecture. ...
Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can get a theoretical peak of 1000 GFLOPs and over 240 GB/s. ...
Being an x86 SMP-on-a-chip architecture, Xeon Phi offers the full capability to use the same tools, programming languages, and programming models as a regular Intel Xeon processor. ...
arXiv:1310.5842v2
fatcat:dmez6j673nfj5iqoqwvignbkxi
Native Mode-Based Optimizations of Remote Memory Accesses in OpenSHMEM for Intel Xeon Phi
2014
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14
Given the importance of communication in parallel architectures, this paper describes a novel methodology for optimizing remote-memory accesses for execution of OpenSHMEM programs on Intel Xeon Phi processors ...
Moreover, we study different reduction algorithms and exploit local load/store to optimize data transfers in these algorithms for Xeon Phi which permits improvement of up to 22% compared to MVAPICH and ...
A performance model for cache-coherent SMP systems is developed in [18]. Xeon Phi is used to showcase the applicability of this model. ...
doi:10.1145/2676870.2676881
dblp:conf/pgas/NamashivayamGKEC14
fatcat:nrrn5wksxfgqphco47ywlzjs44
Test-driving Intel Xeon Phi
2014
Proceedings of the 5th ACM/SPEC international conference on Performance engineering - ICPE '14
Next, we choose a medical imaging application (Leukocyte Tracking) as a case study. ...
Given its promised ease-of-use and high performance, we took Xeon Phi out for a test drive. ...
The authors would like to thank Sabela Ramos Garea from University of A Coruña and Evghenii Gaburov from SURF-sara for the numerous on-line discussions. ...
doi:10.1145/2568088.2576799
dblp:conf/wosp/FangSZXCV14
fatcat:gvrvvmqusnfprcclbgb3u4skky
Retargeting of the Open Community Runtime to Intel Xeon Phi
2015
Procedia Computer Science
Since manycore architectures like the Intel Xeon Phi are likely to play a major role in future high performance systems, we have implemented the OCR API for shared-memory machines, including the Xeon Phi ...
The Open Community Runtime (OCR) is a recent effort in the search for a runtime for extreme scale parallel systems. ...
pay in such a case (i.e., some cache misses). ...
doi:10.1016/j.procs.2015.05.335
fatcat:z7m2uxvnbbac5icj5hj7mwhskq
Scaling the capacity of memory systems; evolution and key approaches
2019
Proceedings of the International Symposium on Memory Systems - MEMSYS '19
Despite such efforts, the fundamental problem of maintaining cache coherence across a scaled system with thousands of nodes is not something that any of the current approaches are capable of efficiently ...
Computer clusters were the first configurations to eventually provide a Distributed Shared Memory (DSM) system at a linear cost while also being more scalable than the traditional cache coherent NUMA systems ...
A version of HMC named Multi-Channel DRAM (MCDRAM) was developed in partnership with Intel and Micron to be used in the Intel Xeon Phi processor codenamed Knights Landing. ...
doi:10.1145/3357526.3357555
dblp:conf/memsys/ParaskevasALG19
fatcat:qlsuo7csujg3rhbwf3efqljhta
Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor
2013
2013 IEEE 27th International Symposium on Parallel and Distributed Processing
In this paper we describe how several flavors of the Linpack benchmark are accelerated on Intel's recently released Intel® Xeon Phi™ coprocessor (code-named Knights Corner) in both native and hybrid ...
This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators. ...
We would also like to thank Catherine Djunaedi, Ravi Murty and Susan Meredith for their continuous support in building, debugging and analyzing the systems. ...
doi:10.1109/ipdps.2013.113
dblp:conf/ipps/HeineckeVSKDHSCD13
fatcat:ggcaf5r7zbadzbcdhigrvllamu
Rhymes: A shared virtual memory system for non-coherent tiled many-core architectures
2014
2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
Rhymes features a two-way cache coherence protocol to enforce release consistency for pages allocated in shared physical memory (SPM) and scope consistency for pages in per-core private memory. ...
This paper presents a shared virtual memory (SVM) system, dubbed Rhymes, tailored to new processor kinds of non-coherent and hybrid memory architectures. ...
., Ltd. for their kind support of the SCC platform in their Wuxi data centers for this work. ...
doi:10.1109/padsw.2014.7097807
dblp:conf/icpads/LamSHWLZY14
fatcat:sadkzvqywjepzenwnp32nr65fi
Tibidabo: Making the case for an ARM-based HPC system
2014
Future Generation Computer Systems
In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, over time designed for maximum energy efficiency, which now show promise for competitive ...
We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores. ...
In addition, the authors would like to thank Bernard Ortiz de ...
doi:10.1016/j.future.2013.07.013
fatcat:psrd2ufjffgzhlrjiwbtu2tfhm
Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors
2017
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
These optimizations are portable on four distinct cache-coherent architectures and result in up to 5.6x performance enhancements on Intel Xeon Phi processor 7250P (KNL), 5.7x on Intel Xeon Phi coprocessor ...
Then by blocking SoA objects, we optimize cache reuse and get sustained throughput for a range of problem sizes. ...
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. ...
doi:10.1109/ipdps.2017.33
dblp:conf/ipps/MathuriyaLBSK17
fatcat:6km46ws4yvahbjhawam7jy6dr4
Optimising simulation data structures for the Xeon Phi
2016
2016 International Conference on High Performance Computing & Simulation (HPCS)
Index Terms: Xeon Phi; many integrated core (MIC); gate-level simulation; parallel logic simulation • All gates are two input, NOT is represented by a NAND with duplicate inputs, 3 input ANDs made up of ...
In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. ...
Interconnection among cores is based on a ring network model that allows the L2 caches for each core to be accessible by all others. In all, a total coherent cache of over 30MB is available. ...
doi:10.1109/hpcsim.2016.7568358
dblp:conf/ieeehpcs/ChimehC16
fatcat:dflqcw2cljg4veeofc43w2msfa