Improving Parallel I/O Performance Using Multithreaded Two-Phase I/O with Processor Affinity Management
[chapter]
2014
Lecture Notes in Computer Science
[Figure: Overlapping between read and data exchange phases. Exch.: data exchange phase; Read: file read phase.]
CPU Core Affinity Management in Multithreaded Two-Phase I/O. Prevention ...
file systems such as Lustre or PVFS2 through an ADIO interface layer
Motivation (2). Our proposal: Multithreaded Two-Phase I/O by using a Pthreads library; overlapping file I/O with data ...
doi:10.1007/978-3-642-55224-3_67
fatcat:cjazxneiuned3ng2f5dw7cjp2m
Improved Parallel Scanner for the Concurrent Execution of Lexical Analysis Tasks on Multi-Core Systems
2022
Zenodo
Modern eras of computing are driven by elevated parallel processing by the revolution of multi-core processors. ...
This is done by recognizing tokens in different lines of the source program in parallel along with auto-detection of keywords in a character stream. ...
This prevents the processor in the I/O queue from experiencing a delay, resulting in a significant improvement in file I/O performance. ...
doi:10.5281/zenodo.6375532
fatcat:vtq7zwm6nfea7ewn2mf7ssy36a
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures
2009
IEEE Transactions on Parallel and Distributed Systems
In order to predict the execution time, we formulate an analytical performance model of the parallel algorithm. ...
Dynamic programming (DP) is a popular technique which is used to solve combinatorial search and optimization problems. ...
ACKNOWLEDGMENTS The authors would like to thank the editor, all the reviewers, Russo Andew, and Yungang Bao for the help in improving this paper. ...
doi:10.1109/tpds.2008.78
fatcat:tnaje4pgmrcuhn5x32txbktkcq
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
[article]
2011
arXiv
pre-print
We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing. ...
This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. ...
A process generally runs on a processor core and perhaps communicates with other processes via messages in the system I/O distributed name space. ...
arXiv:1109.5190v1
fatcat:ashsmlawhbbglfif4ykcu5vc3q
Improving the design flow for parallel and heterogeneous architectures running real-time applications: The PHARAON FP7 project
2014
Microprocessors and microsystems
This tool chain will offer the possibility to propose and implement several parallelization strategies and drive the designer into implementation steps. ...
Acknowledgments This work is being performed in the framework of the FP7-288307 project PHARAON. ...
The main new issues to be covered are heterogeneity, parallelization, I/O support and run-time power management. Thus, specific enhancements are proposed for all these points. ...
doi:10.1016/j.micpro.2014.05.003
fatcat:6kbn3sgvkjglhgr6yjbdmhqkhu
Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)
2013
2013 IEEE 27th International Symposium on Parallel and Distributed Processing
This dissertation introduces GPGPU kernel specialization, a technique that can be used to describe highly adaptable kernels that work across different generations of GPUs with high performance. ...
reference implementations, kernel specialization is shown to maintain adaptability while providing performance improvements in terms of speedups and reduction in per-thread register usage. ...
with kernel specialization, (3) using C with POSIX Threads-based multithreading, and (4) using MATLAB. ...
doi:10.1109/ipdps.2013.31
dblp:conf/ipps/MooreLK13
fatcat:5wxiwkpk2zdbte2rjwmqspp7um
Virtual Aggregated Processor in Multi-core Computers
2008
2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies
We have proposed and implemented two techniques, helper thread and I/O specialization, to demonstrate the potential effectiveness of the Virtual Aggregated Processor technology. ...
In this paper, we have proposed a Virtual Aggregated Processor that is aiming at speeding up execution of a thread through exploiting the fine-grained parallelism in I/O tasks and memory accesses. ...
I/O tasks such as disk I/O. ...
doi:10.1109/pdcat.2008.27
dblp:conf/pdcat/HuangTZJNRW08
fatcat:tloo6s7hjrcqzelj42f5odekw4
Optimization of atmospheric transport models on HPC platforms
2016
Computers & Geosciences
In addition, we consider further improvements in WARIS such as hybrid MPI-OMP parallelization, spatial blocking, auto-tuning and thread affinity. ...
The performance and scalability of atmospheric transport models on high performance computing environments is often far from optimal for multiple reasons including, for example, sequential input and output ...
Osores from the Argentinean National Scientific and Technical Research Council (CONICET) for providing hourly column heights for the Cordón Caulle eruption simulation and the constructive comments from two ...
doi:10.1016/j.cageo.2016.08.019
fatcat:wwey4zcggfhrfc5f23urabxox4
Evaluating the scalability of Java event-driven Web servers
2004
International Conference on Parallel Processing, 2004. ICPP 2004.
The new 1.4 release of the J2SE introduces the NIO (New I/O) API to help in the development of event-driven I/O intensive applications. ...
and using only one or two worker threads. ...
The 1.4 release of the Java 2 Standard Edition includes a new set of I/O capabilities created to improve the performance and scalability of intense I/O applications. ...
doi:10.1109/icpp.2004.1327913
dblp:conf/icpp/BeltranCTA04
fatcat:44zrbjulljad5mtenzw2t57zru
Mitigating Amdahl's Law through EPI Throttling
2005
SIGARCH Computer Architecture News
Using the equation Power = Energy per instruction (EPI) * Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. ...
Acknowledgments This paper benefited from several stimulating discussions on EPI throttle ideas we had with Bryan Black, Richard Hankins, Norman Oded, Ryan Rakvic, Ronny Ronen, Hong Wang and Uri Weiser ...
doi:10.1145/1080695.1069995
fatcat:xq5sy7gfajh3pe3sj7vetlfa3e
We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. ...
To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. ...
Aaron Stern's development efforts, which added ARM instruction sets to the Bartok compiler, made it possible to target Singularity code on the XScale I/O board. ...
doi:10.1145/1629575.1629597
dblp:conf/sosp/NightingaleHMHH09
fatcat:pcbyzqrw2bdhbf67h6ryqasofu
Midpoint routing algorithms for Delaunay triangulations
2010
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
One promising approach to address the performance issue of I/O virtualization is to use single root I/O virtualization (SR-IOV) devices which have been standardized by the PCI-SIG. ...
Using this methodology, we examine two I/O-intensive scientific computations from cosmology and climate science, and demonstrate that our approach can identify application and middleware performance deficiencies ...
All our solutions on a P-processor PEM model provide an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts. ...
doi:10.1109/ipdps.2010.5470471
dblp:conf/ipps/SiZ10
fatcat:yuchdc4zp5borm5vs7j4rqgmzy
The case for hardware transactional memory in software packet processing
2010
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems - ANCS '10
Relative to NetThreads [2], our two-processor four-way-multithreaded system with conventional lock-based synchronization, we find that adding HTM achieves 6%, 54% and 57% increases in packet throughput ...
With increasing numbers of programmable processor and accelerator cores per network node, it is a challenge to support sharing and synchronization across them in a way that is scalable and easy-to-program ...
We only monitor the processing for each packet and ignore the packet and console I/O routines. ...
doi:10.1145/1872007.1872053
dblp:conf/ancs/LabrecqueS10
fatcat:bvf53xyzhventa2htuipfhoxku
An Introduction to the Gilgamesh PIM Architecture
[chapter]
2001
Lecture Notes in Computer Science
A multithreaded task switching and management capability provides overlapping use of parallel on-chip resources for high efficiency of memory and I/O channels. ...
In this mode, it may be used in conjunction with the streaming I/O interface described below. ...
doi:10.1007/3-540-44681-8_4
fatcat:nq7bwa5lb5d53bqzreh7jel3sa
A pipeline virtual environment architecture for multicore processor systems
2011
The Visual Computer
This paper describes our approach, and shows it is efficient and scalable with performance experiments. ...
The representation enables VEs to be processed in parallel using a multistage, dual-frame pipeline. ...
As expected, performance improved in most cases under the heavy load condition since more work can be performed in parallel. No improvement was seen with Cilk Plus. ...
doi:10.1007/s00371-011-0661-0
fatcat:st3oov2mczgnte7c5furybpdie