
NUMA Time Warp

Alessandro Pellegrini, Francesco Quaglia
2015 Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15  
closer, others farther • This has an effect on access latency • Time Warp systems are highly demanding for memory  ...  Time Warp Architectural Context • Optimistic PDES systems based on the multi-thread  ...  • The worker thread managing the destination simulation object accesses it more frequently ⇒ keep pages close to it • Yet the pages are not guaranteed to be located on the node closest to the CPU running  ... 
doi:10.1145/2769458.2769479 dblp:conf/pads/PellegriniQ15a fatcat:2y5rfnzh2rc2vnlwx5jcd75ddm

Time-Sharing Time Warp via Lightweight Operating System Support

Alessandro Pellegrini, Francesco Quaglia
2015 Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15  
In this article we present the design and implementation of a time-sharing Time Warp platform, to be run on multi-core machines, where the platform-level software is allowed to take back control on a periodical  ...  However, common Time Warp platforms usually execute events as atomic actions.  ...  and its final effectiveness in terms of improvements in the execution speed of Time Warp simulations on multi-core machines.  ... 
doi:10.1145/2769458.2769478 dblp:conf/pads/PellegriniQ15 fatcat:fufmvumap5hsxm43kpyzv3aocu

A Fine-Grain Time-Sharing Time Warp System

Alessandro Pellegrini, Francesco Quaglia
2017 ACM Transactions on Modeling and Computer Simulation  
In this article we present the design and realization of a fine-grain time-sharing Time Warp system, to be run on multi-core Linux machines, which makes systematic use of event preemption in order to dynamically  ...  Although Parallel Discrete Event Simulation (PDES) platforms relying on the Time Warp (optimistic) synchronization protocol already allow for exploiting parallelism, several techniques have been proposed  ...  This is compliant with the idea that a single process (namely, the multi-thread Time Warp platform running on the multi-core machine) needs to use the facilities offered by the special device file for supporting  ... 
doi:10.1145/3013528 fatcat:h4rbo6cqnfffxi7duqkeudvvyu

Parallel neighbourhood search on many-core platforms

Yuet Ming Lam, Kuen Hung Tsoi, Wayne Luk
2013 International Journal of Computational Science and Engineering (IJCSE)  
We evaluate the performance of our approach against a multi-threaded CPU implementation on a server containing two Intel Xeon X5650 CPUs (12 cores in total).  ...  This paper presents a parallel search parallel move approach to parallelise neighbourhood search algorithms on many-core platforms.  ...  Multi-core CPU-based multi-threaded designs To evaluate the performance of our PSPM implementation on GPU, two reference multi-threaded designs for multicore CPU are implemented.  ... 
doi:10.1504/ijcse.2013.055354 fatcat:7z5xoayxmne6rkj5ju257dpvna

On the Relevance of Wait-free Coordination Algorithms in Shared-Memory HPC: The Global Virtual Time Case [article]

Alessandro Pellegrini, Francesco Quaglia
2020 arXiv   pre-print
High-performance computing on shared-memory/multi-core architectures could suffer from non-negligible performance bottlenecks due to coordination algorithms, which are nevertheless necessary to ensure  ...  In this paper we explore the relevance of this paradigm shift in shared-memory architectures, by focusing on the context of Parallel Discrete Event Simulation, where the Global Virtual Time (GVT) represents  ...  Overall, the wait-free GVT protocol is aimed at better coping with scalability aspects of Time Warp platforms to be run on top of multi/many-core shared-memory platforms.  ... 
arXiv:2004.10033v1 fatcat:g2qwt25qufgexefke2fdgawbwm
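The abstract above describes computing the Global Virtual Time (GVT) as a lower bound derived from per-worker local clocks, with a wait-free protocol so that no thread blocks another. The following minimal single-process sketch illustrates that idea only; the class and method names are invented for illustration and are not the authors' API, and a real protocol must additionally account for in-flight (transient) messages.

```python
# Hypothetical sketch: each worker publishes its local virtual time (LVT),
# i.e. the minimum timestamp among its pending/unacknowledged events, and
# any thread may compute GVT as the minimum over all published LVTs
# without blocking the others (the wait-free flavour the abstract refers to).

class GVTBoard:
    def __init__(self, n_workers):
        # One slot per worker; float('inf') means "no pending events yet".
        self.lvt = [float('inf')] * n_workers

    def publish(self, worker_id, local_virtual_time):
        # A plain store: no lock is taken, so no worker can block another.
        self.lvt[worker_id] = local_virtual_time

    def gvt(self):
        # A plain scan: the result is a safe lower bound on event timestamps,
        # usable to reclaim checkpoints and antimessages up to that point.
        return min(self.lvt)

board = GVTBoard(3)
board.publish(0, 42.0)
board.publish(1, 17.5)
board.publish(2, 99.0)
print(board.gvt())
```

Fossil collection (freeing state saved before GVT) can then proceed independently on each worker using the returned bound.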

Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems

Long Chen, Oreste Villa, Guang R. Gao
2011 2011 IEEE International Conference on Cluster Computing  
over other solutions based on the standard CUDA programming methodologies.  ...  means for solving the above issues and efficiently utilizing multi-GPU systems.  ...  Merge [14] is a programming framework proposed for heterogeneous multi-core systems.  ... 
doi:10.1109/cluster.2011.50 dblp:conf/cluster/ChenVG11 fatcat:nk3qod6lengutiveiihw55zkjq

Sparse LU factorization for parallel circuit simulation on GPU

Ling Ren, Xiaoming Chen, Yu Wang, Chenxi Zhang, Huazhong Yang
2012 Proceedings of the 49th Annual Design Automation Conference on - DAC '12  
Sparse solvers have become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency.  ...  On matrices whose factorization involves many floating-point operations, our GPU-based sparse LU factorization achieves 7.90× speedup over 1-core CPU and 1.49× speedup over 8-core CPU.  ...  ), multi-core CPU and GPU.  ... 
doi:10.1145/2228360.2228565 dblp:conf/dac/RenCWZY12 fatcat:q2j7b5yapvcp5mzn6qvzdpundm
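For readers unfamiliar with the operation being accelerated above, here is a toy dense Doolittle LU factorization. It only illustrates what "LU factorization" computes; the paper's solver is a *sparse* LU on GPU exploiting column-level parallelism, which this sketch does not model.

```python
# Toy dense LU factorization A = L @ U (Doolittle form: unit-diagonal L).
# Purely illustrative; real circuit-simulation solvers operate on sparse
# matrices with pivoting and symbolic analysis, none of which is shown here.

def lu(A):
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):        # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):    # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

L, U = lu([[4.0, 3.0], [6.0, 3.0]])
print(L, U)
```

The data dependency the abstract mentions is visible even here: each column of L depends on all previously computed columns of U, which is what makes GPU parallelization hard.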

Thread warping

Greg Stitt, Frank Vahid
2007 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis - CODES+ISSS '07  
We summarize experiments on an extensive architectural simulation framework we developed, showing application speedups of 4x to 502x, averaging 130x compared to a multiprocessor system having four ARM11  ...  We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable  ...  The simulator currently does not simulate arbitration overhead for multi-core microprocessor memory accesses, and instead assumes all cores can simultaneously access memory.  ... 
doi:10.1145/1289816.1289841 dblp:conf/codes/StittV07 fatcat:clqzdsg76fedxihf6phlsjcxqm

Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting

Shivani Raghav, Andrea Marongiu, Christian Pinto, David Atienza, Martino Ruggiero, Luca Benini
2012 Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units - GPGPU-5  
Software emulators, such as the open-source QEMU project, cope quite well in terms of simulation speed and functional accuracy with homogeneous coarse-grained multi-cores.  ...  Platform developers are looking for efficient full-system simulators capable of simulating complex applications, middleware and operating systems on these heterogeneous targets.  ...  We measure wall clock time at the boundaries of this call to account for the actual simulation time of many-core accelerator spent on GPU platform.  ... 
doi:10.1145/2159430.2159442 dblp:conf/asplos/RaghavMPARB12 fatcat:57upavxshfdv3a4ozy4uf5izsu

Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures

Guillermo Vigueras, Juan M Orduña, Miguel Lozano, José M Cecilia, José M García
2013 The international journal of high performance computing applications  
The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations for both enhancing crowd rendering and simulating continuum crowds.  ...  On the other hand, the comparison shows that the GPU greatly accelerates the collision test with respect to any other implementation optimized for multi-core CPUs.  ...  The warp is the scheduled unit, so the threads of the same thread-block are scheduled in a given multi-processor warp by warp.  ... 
doi:10.1177/1094342013476119 fatcat:36bbeamzpffc7cfzomsgyyqipi
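The record above concerns accelerating the collision test for crowds. A common broad-phase approach, sketched here in plain Python purely for illustration (the paper's actual GPU kernels are not reproduced), is to bin agents into a uniform grid so that only agents sharing a cell are pair-tested.

```python
from collections import defaultdict

# Illustrative broad-phase collision test: agents are binned by grid cell,
# and exact distance checks run only within each cell. A fuller version
# would also test neighbouring cells; this sketch keeps only the core idea.

def colliding_pairs(agents, cell_size, radius):
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(agents):
        grid[(int(x // cell_size), int(y // cell_size))].append(idx)

    pairs = set()
    for members in grid.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                (x1, y1), (x2, y2) = agents[i], agents[j]
                # Two agents of equal radius collide when centre distance
                # is at most 2 * radius (compared squared to avoid sqrt).
                if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (2 * radius) ** 2:
                    pairs.add((i, j))
    return pairs

agents = [(0.5, 0.5), (0.7, 0.5), (5.0, 5.0)]
print(colliding_pairs(agents, cell_size=2.0, radius=0.2))
```

On a GPU, each cell (or each agent) maps naturally to a thread block or warp, which is why this structure suits the many-core architectures the paper targets.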

FastLanes: An FPGA accelerated GPU microarchitecture simulator

Kuan Fang, Yufei Ni, Jiayuan He, Zonghui Li, Shuai Mu, Yangdong Deng
2013 2013 IEEE 31st International Conference on Computer Design (ICCD)  
Such a mechanism makes it possible to simulate hundreds of GPU cores on a single FPGA evaluation board.  ...  FastLanes consists of a function model and a timing model, both implemented on FPGA.  ...  Second, software simulation of many-core processors on a CPU platform is even more time-consuming due to the lack of sufficient computing resources.  ... 
doi:10.1109/iccd.2013.6657049 dblp:conf/iccd/FangNHLMD13 fatcat:li4p6f2ebrfmlcx7m7u3sb7dc4

Acceleration of bulk memory operations in a heterogeneous multicore architecture

JongHyuk Lee, Ziyi Liu, Xiaonan Tian, Dong Hyuk Woo, Weidong Shi, Dainis Boumber, Yonghong Yan, Kyeong-An Kwon
2012 Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12  
The performance results based on our solution showed that offloaded bulk memory operations outperform CPU up to 4.3 times in micro benchmarks while still using less resources.  ...  Offloading the bulk memory operations to the GPU has many advantages, i) the throughput driven GPU outperforms the CPU on the bulk memory operations; ii) for on-die GPU with unified cache between the GPU  ...  ARCHITECTURE AND DESIGN Our baseline architecture is a heterogeneous platform that has multiple CPU cores and GPU cores on a chip as shown in Figure 1 .  ... 
doi:10.1145/2370816.2370877 dblp:conf/IEEEpact/LeeLTWSBYK12 fatcat:http6nmpcjdljfadjhpailq76m

Experiments with Hardware-based Transactional Memory in Parallel Simulation

Joshua Hay, Philip A. Wilsey
2015 Proceedings of the 3rd ACM Conference on SIGSIM-Principles of Advanced Discrete Simulation - SIGSIM-PADS '15  
Parallel Discrete Event Simulation is a problem space that has been studied for many years, but still suffers from significant lock contention on SMP platforms.  ...  This thesis explores the use of transactional memory as an alternative to conventional synchronization mechanisms for managing the pending event set in a time warp synchronized parallel simulator.  ...  It was recently redesigned for parallel execution on multi-core processing nodes [16] . It has many configuration options and utilizes many different algorithms of the Time Warp protocol [9] .  ... 
doi:10.1145/2769458.2769462 dblp:conf/pads/HayW15 fatcat:in3vxjwjrfgvfmqwifayl3xmga
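The thesis summarized above replaces conventional locking on the Time Warp pending event set with hardware transactional memory. The sketch below shows the lock-based baseline only, with comments marking where an HTM transaction would substitute for the lock; the class is invented for illustration and is not the thesis code.

```python
import heapq
import threading

# Illustrative shared pending event set for a Time Warp simulator,
# protected here by a conventional mutex. With hardware transactional
# memory, the lock acquire/release around each critical section would be
# replaced by a transaction begin/commit over the same heap operations,
# letting non-conflicting schedule/pop operations proceed concurrently.

class PendingEventSet:
    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def schedule(self, timestamp, event):
        with self._lock:                      # HTM: transactional region
            heapq.heappush(self._heap, (timestamp, event))

    def next_event(self):
        with self._lock:                      # HTM: transactional region
            return heapq.heappop(self._heap) if self._heap else None

pes = PendingEventSet()
pes.schedule(5.0, "arrive")
pes.schedule(2.0, "depart")
print(pes.next_event())  # lowest-timestamp event first
```

The lock contention the abstract mentions arises exactly here: every worker thread funnels through this one critical section to insert and extract events.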

A Power Cap Oriented Time Warp Architecture

Stefano Conoci, Davide Cingolani, Pierangelo Di Sanzo, Bruno Ciciani, Alessandro Pellegrini, Francesco Quaglia
2018 Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation - SIGSIM-PADS '18  
In this article we present an innovative Time Warp architecture oriented to efficiently run parallel simulations under a power cap.  ...  Controlling power usage has become a core objective in modern computing platforms.  ...  Warp simulations would consist in running the Time Warp platform on top of a group of CPU-cores with properly tuned-down performance states (i.e. operating frequency and voltage).  ... 
doi:10.1145/3200921.3200930 dblp:conf/pads/ConociCSC0Q18 fatcat:dwqv2rxitnckzfdksdq2uvejha

GPU-TLS: An Efficient Runtime for Speculative Loop Parallelization on GPUs

Chenggang Zhang, Guodong Han, Cho-Li Wang
2013 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
Thread Level Speculation (TLS) is a technique proposed to parallelize loops with dynamic parallelism on multi-core or multi-processor CPUs.  ...  Speculative Loop Parallelization on CPUs The BOP system [12] proposed by Ding et al. uses a process based runtime model to speculatively execute Potentially Parallel Regions (PPRs) on multi-core CPUs  ... 
doi:10.1109/ccgrid.2013.34 dblp:conf/ccgrid/ZhangHW13 fatcat:gqqdzklyk5aedmt2zpzd3emlmi
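The last record concerns thread-level speculation (TLS) for loops. The following pure-Python sketch conveys only the general TLS idea under simplifying assumptions (write-write conflict tracking only, no read-set tracking, sequential stand-in for the parallel phase); it is not the GPU-TLS runtime and its function names are invented for illustration.

```python
# Hypothetical sketch of thread-level speculation for a loop: run each
# iteration against a snapshot of shared state with writes buffered in a
# private log, then commit logs in iteration order, flagging iterations
# whose writes overlap an earlier commit and re-executing them in order.

def speculative_loop(body, n, shared):
    write_logs = []
    for i in range(n):                        # speculative "parallel" phase
        log = {}
        body(i, dict(shared), log)            # reads a snapshot, writes a log
        write_logs.append(log)

    committed, conflicts = set(), []
    for i, log in enumerate(write_logs):
        if committed & set(log):              # write-write conflict detected
            conflicts.append(i)               # -> must re-execute in order
        else:
            shared.update(log)
            committed |= set(log)

    for i in conflicts:                       # sequential fallback
        body(i, shared, shared)
    return shared

def body(i, view, out):
    out[f"x{i}"] = view.get(f"x{i}", 0) + i   # independent keys: no conflict

print(speculative_loop(body, 4, {}))
```

When iterations are independent (as with `body` above) everything commits speculatively; when they all write the same location, only the first commits and the rest fall back to ordered re-execution, matching sequential semantics.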