Improving Parallel I/O Performance Using Multithreaded Two-Phase I/O with Processor Affinity Management [chapter]

Yuichi Tsujita, Kazumi Yoshinaga, Atsushi Hori, Mikiko Sato, Mitaro Namiki, Yutaka Ishikawa
2014 Lecture Notes in Computer Science  
Overlapping between read and data exchange phases  ...  CPU core affinity management in multithreaded two-phase I/O  ...  file systems such as Lustre or PVFS2 through an ADIO interface layer  ...  Our proposal: multithreaded two-phase I/O using a Pthreads library; overlapping file I/O with data  ... 
doi:10.1007/978-3-642-55224-3_67 fatcat:cjazxneiuned3ng2f5dw7cjp2m

Improved Parallel Scanner for the Concurrent Execution of Lexical Analysis Tasks on Multi-Core Systems

T. Vaikunta Pai, P. S. Nethravathi, P. S. Aithal
2022 Zenodo  
Modern eras of computing are driven by elevated parallel processing by the revolution of multi-core processors.  ...  This is done by recognizing tokens in different lines of the source program in parallel along with auto detection of keyword in a character stream.  ...  This prevents processor in the I/O queue from experiencing a delay, resulting in a significant improvement in file I/O performance.  ... 
doi:10.5281/zenodo.6375532 fatcat:vtq7zwm6nfea7ewn2mf7ssy36a

Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures

Guangming Tan, Ninghui Sun, G.R. Gao
2009 IEEE Transactions on Parallel and Distributed Systems  
In order to predict the execution time, we formulate an analytical performance model of the parallel algorithm.  ...  Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems.  ...  ACKNOWLEDGMENTS The authors would like to thank the editor, all the reviewers, Russo Andew, and Yungang Bao for the help in improving this paper.  ... 
doi:10.1109/tpds.2008.78 fatcat:tnaje4pgmrcuhn5x32txbktkcq

Improving the scalability of parallel N-body applications with an event driven constraint based execution model [article]

Chirag Dekate, Matthew Anderson, Maciej Brodowicz, Hartmut Kaiser, Bryce Adelstein-Lelbach, Thomas Sterling
2011 arXiv pre-print
We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.  ...  This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads.  ...  A process generally runs on a processor core and perhaps communicates with other processes via messages in the system I/O distributed name space.  ... 
arXiv:1109.5190v1 fatcat:ashsmlawhbbglfif4ykcu5vc3q

Improving the design flow for parallel and heterogeneous architectures running real-time applications: The PHARAON FP7 project

Héctor Posadas, Alejandro Nicolás, Pablo Peñil, Eugenio Villar, Florian Broekaert, Michel Bourdelles, Albert Cohen, Mihai T. Lazarescu, Luciano Lavagno, Andrei Terechko, Miguel Glassee, Manuel Prieto
2014 Microprocessors and microsystems  
This tool chain will offer the possibility to propose and implement several parallelization strategies and drive the designer into implementation steps.  ...  Acknowledgments This work is being performed in the framework of the FP7-288307 project PHARAON.  ...  The main new issues to be covered are heterogeneity, parallelization, I/O support and run-time power management. Thus, specific enhancements are proposed for all these points.  ... 
doi:10.1016/j.micpro.2014.05.003 fatcat:6kbn3sgvkjglhgr6yjbdmhqkhu

Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Nicholas Moore, Miriam Leeser, Laurie Smith King
2013 2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
This dissertation introduces GPGPU kernel specialization, a technique that can be used to describe highly adaptable kernels that work across different generations of GPUs with high performance.  ...  reference implementations, kernel specialization is shown to maintain adaptability while providing performance improvements in terms of speedups and reduction in per-thread register usage.  ...  with kernel specialization, (3) using C with POSIX Threads-based multithreading, and (4) using MATLAB.  ... 
doi:10.1109/ipdps.2013.31 dblp:conf/ipps/MooreLK13 fatcat:5wxiwkpk2zdbte2rjwmqspp7um

Virtual Aggregated Processor in Multi-core Computers

Zhiyi Huang, Andrew Trotman, Jiaqi Zhang, Xiangfei Jia, Mariusz Nowostawski, Nathan Rountree, Paul Werstein
2008 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies  
We have proposed and implemented two techniques, helper thread and I/O specialization, to demonstrate the potential effectiveness of the Virtual Aggregated Processor technology.  ...  In this paper, we have proposed a Virtual Aggregated Processor that is aiming at speeding up execution of a thread through exploiting the fine-grained parallelism in I/O tasks and memory accesses.  ...  I/O tasks such as disk I/O.  ... 
doi:10.1109/pdcat.2008.27 dblp:conf/pdcat/HuangTZJNRW08 fatcat:tloo6s7hjrcqzelj42f5odekw4

Optimization of atmospheric transport models on HPC platforms

Raúl de la Cruz, Arnau Folch, Pau Farré, Javier Cabezas, Nacho Navarro, José María Cela
2016 Computers & Geosciences  
In addition, we consider further improvements in WARIS such as hybrid MPI-OMP parallelization, spatial blocking, auto-tuning and thread affinity.  ...  The performance and scalability of atmospheric transport models on high performance computing environments is often far from optimal for multiple reasons including, for example, sequential input and output  ...  Osores from the Argentinean National Scientific and Technical Research Council (CONICET) for providing hourly column heights for the Cordón Caulle eruption simulation and the constructive comments from two  ... 
doi:10.1016/j.cageo.2016.08.019 fatcat:wwey4zcggfhrfc5f23urabxox4

Evaluating the scalability of Java event-driven Web servers

V. Beltran, D. Carrera, J. Torres, E. Ayguade
2004 International Conference on Parallel Processing, 2004. ICPP 2004.  
The new 1.4 release of the J2SE introduces the NIO (New I/O) API to help in the development of event-driven I/O intensive applications.  ...  and using only one or two worker threads.  ...  The 1.4 release of the Java 2 Standard Edition includes a new set of I/O capabilities created to improve the performance and scalability of intense I/O applications.  ... 
doi:10.1109/icpp.2004.1327913 dblp:conf/icpp/BeltranCTA04 fatcat:44zrbjulljad5mtenzw2t57zru

Mitigating Amdahl's Law through EPI Throttling

Murali Annavaram, Ed Grochowski, John Shen
2005 SIGARCH Computer Architecture News  
Using the equation, Power=Energy per instruction (EPI) * Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly  ...  , during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed.  ...  Acknowledgments This paper benefited from several stimulating discussions on EPI throttle ideas we had with Bryan Black, Richard Hankins, Norman Oded, Ryan Rakvic, Ronny Ronen, Hong Wang and Uri Weiser  ... 
doi:10.1145/1080695.1069995 fatcat:xq5sy7gfajh3pe3sj7vetlfa3e


Edmund B. Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt
2009 Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles - SOSP '09  
We show up to a 28% performance improvement by offloading tasks to the XScale I/O card.  ...  To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers.  ...  Aaron Stern's development efforts, which added ARM instruction sets to the Bartok compiler, made it possible to target Singularity code on the XScale I/O board.  ... 
doi:10.1145/1629575.1629597 dblp:conf/sosp/NightingaleHMHH09 fatcat:pcbyzqrw2bdhbf67h6ryqasofu

Midpoint routing algorithms for Delaunay triangulations

Weisheng Si, Albert Y. Zomaya
2010 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
One promising approach to address the performance issue of I/O virtualization is to use single root I/O virtualization (SR-IOV) devices which have been standardized by the PCI-SIG.  ...  Using this methodology, we examine two I/O-intensive scientific computations from cosmology and climate science, and demonstrate that our approach can identify application and middleware performance deficiencies  ...  All our solutions on a P-processor PEM model provide an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.  ... 
doi:10.1109/ipdps.2010.5470471 dblp:conf/ipps/SiZ10 fatcat:yuchdc4zp5borm5vs7j4rqgmzy

The case for hardware transactional memory in software packet processing

Martin Labrecque, J. Gregory Steffan
2010 Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems - ANCS '10  
Relative to NetThreads [2], our two-processor four-way-multithreaded system with conventional lock-based synchronization, we find that adding HTM achieves 6%, 54% and 57% increases in packet throughput  ...  With increasing numbers of programmable processor and accelerator cores per network node, it is a challenge to support sharing and synchronization across them in a way that is scalable and easy-to-program  ...  We only monitor the processing for each packet and ignore the packet and console I/O routines.  ... 
doi:10.1145/1872007.1872053 dblp:conf/ancs/LabrecqueS10 fatcat:bvf53xyzhventa2htuipfhoxku

An Introduction to the Gilgamesh PIM Architecture [chapter]

Thomas Sterling
2001 Lecture Notes in Computer Science  
A multithreaded task switching and management capability provides overlapping use of parallel on-chip resources for high efficiency of memory and I/O channels.  ...  In this mode, it may be used in conjunction with the streaming I/O interface described below.  ... 
doi:10.1007/3-540-44681-8_4 fatcat:nq7bwa5lb5d53bqzreh7jel3sa

A pipeline virtual environment architecture for multicore processor systems

Eric Acosta, Alan Liu
2011 The Visual Computer  
This paper describes our approach, and shows it is efficient and scalable with performance experiments.  ...  The representation enables VEs to be processed in parallel using a multistage, dual-frame pipeline.  ...  As expected, performance improved in most cases under the heavy load condition since more work can be performed in parallel. No improvement was seen with Cilk Plus.  ... 
doi:10.1007/s00371-011-0661-0 fatcat:st3oov2mczgnte7c5furybpdie