
Datarol-II: A fine-grain massively parallel architecture [chapter]

Tetsuo Kawano, Shigeru Kusakabe, Rin-ichiro Taniguchi, Makoto Amamiya
1994 Lecture Notes in Computer Science  
In this paper, we introduce the Datarol-II processor, that can efficiently execute a fine-grain multi-thread program, called Datarol.  ...  The simulation results show that the Datarol-II processor can tolerate remote memory access latencies and execute a fine-grain multi-thread program efficiently.  ...  Therefore, to eliminate the PEs' idling time caused by these latencies, a new architecture, that realizes more efficient context switching among fine-grain concurrent processes, should be developed.  ... 
doi:10.1007/3-540-58184-7_156 fatcat:thh3ez4jlrfh3ghdvj73n6ctoq

An Abstraction Methodology for the Evaluation of Multi-core Multi-threaded Architectures

Ruken Zilan, Javier Verdu, Jorge Garcia, Mario Nemirovsky, Rodolfo A. Milito, Mateo Valero
2011 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems  
Keywords-component; Fine grain modeling; a methodology to build simulators; a simulation tool for multi-levels of shared resource architecture modeling; UltraSPARC T2, queueing modeling; "whatif"s for  ...  Analytical models overcome these limitations but do not provide the fine grain details needed for a deep analysis of these architectures.  ...  To our knowledge, this is the first work that defines queueing based techniques to model current and future massive multi-threaded architectures at fine-grain, especially integrating three different elements  ... 
doi:10.1109/mascots.2011.11 dblp:conf/mascots/ZilanVGNMV11 fatcat:mahx7e4a5bdj3bvdbclclzf5r4

IMPROVING PERFORMANCE IN HPC SYSTEM UNDER POWER CONSUMPTIONS LIMITATIONS

Muhammad Usman Ashraf
2019 International Journal of Advanced Research in Computer Science  
Consequently, we have suggested a massive parallel programming mechanism which is promising to achieve HPC Exascale system goals.  ...  Today's High-Performance Computing (HPC) systems require significant usage of "supercomputers" and extensive parallel processing approaches for solving complicated computational tasks at the Petascale level  ...  OpenMP is used to achieve fine-grain parallelism and to parallelize CPU threads over intranode. CUDA is used to achieve finer grain parallelism by executing data over accelerated GPU cores.  ... 
doi:10.26483/ijarcs.v10i2.6397 fatcat:k3l3lk5kuzhnldn5b2qzkh4eia

CHALLENGES IN PARALLEL GRAPH PROCESSING

ANDREW LUMSDAINE, DOUGLAS GREGOR, BRUCE HENDRICKSON, JONATHAN BERRY
2007 Parallel Processing Letters  
Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems.  ...  The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.  ...  and some SMPs and fine-grained parallelism that performs well on massively multi-threaded architectures like the MTA-2.  ... 
doi:10.1142/s0129626407002843 fatcat:samtlwojnjccfg7fhvzvjh6xmq

Survey on microprocessor architecture and development trends

YaoYingbiao, Zhang Jianwu, Zhao Danying
2008 2008 11th IEEE International Conference on Communication Technology  
To improve the performance of microprocessor, many kinds of novel architecture, such as multi-thread processor, CMP, PIM, and reconfigurable computing processor, have been proposed.  ...  These new processors improve performance mainly dependant on making use of all kinds of parallelism of workloads, solving the speed mismatch between processor and external memory, reconfigurable computing  ...  Thread level parallelism (TLP) For the next generation high-performance processor, parallelism should not be limited to the fine-grained ILP of single program.  ... 
doi:10.1109/icct.2008.4716247 fatcat:3mrfuvgwcrcyjbcm2an6nuymri

Implementing a non-strict functional programming language on a threaded architecture [chapter]

Shigeru Kusakabe, Kentaro Inenaga, Makoto Amamiya, Xinan Tang, Andres Marquez, Guang R. Gao
1999 Lecture Notes in Computer Science  
The combination of a language with fine-grain implicit parallelism and a dataflow evaluation scheme is suitable for high-level programming on massively parallel architectures.  ...  Since overhead caused by fine-grain processing may degrade performance for programs with little parallelism, we have adopted a thread merging rule. The preliminary performance results are encouraging.  ...  The language abstracts the timing problems in writing massively parallel programs, while fine-grain multithread evaluation supports efficient execution of a large number of fine-grain processes for implicit  ... 
doi:10.1007/bfb0097894 fatcat:kjwlaxfjd5citccnro6auetqga

GPU Parallel Computation in Bioinspired Algorithms: A Review [chapter]

M. G. Arenas, G. Romero, A. M. Mora, P. A. Castillo, J. J. Merelo
2012 Studies in Computational Intelligence  
In this sense, recently there has been a growing interest in developing parallel algorithms using graphic processing units (GPU) also referred as GPU computation.  ...  Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units that possess more memory bandwidth and computational capability than central processing  ...  Now, to manage CPU power dissipation, processor makers favor multi-core chip designs, and software has to be written in a multi-threaded or multi-process manner to take full advantage of the hardware.  ... 
doi:10.1007/978-3-642-30154-4_6 fatcat:hs6jd4uvavcfxl7e374fjcufg4

Frameworks for Multi-core Architectures: A Comprehensive Evaluation Using 2D/3D Image Registration [chapter]

Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, Wieland Eckert
2011 Lecture Notes in Computer Science  
In this paper, we present five such frameworks for parallelization on shared memory multi-core architectures, namely OpenMP, Cilk++, Threading Building Blocks, RapidMind, and OpenCL.  ...  In an empirical study, a fine-grained data parallel and a coarse-grained task parallel parallelization approach are used to evaluate and estimate different aspects like usability, performance, and overhead  ...  We are indebted to the RRZE (Regional Computing Center Erlangen) and their HPC team for granting computational resources and providing access to preproduction hardware.  ... 
doi:10.1007/978-3-642-19137-4_6 fatcat:guuyi6d3ercqvbgyfs3v5p5lya

Scalable, parallel computers: Alternatives, issues, and challenges

Gordon Bell
1994 International journal of parallel programming  
Programming environments that operate on all computer structures, including networks, have been developed for multi-processing, such as the Parallel Virtual Machine (PVM), Linda, and Parasoft.  ...  KEY WORDS: Scalable multiprocessors and multicomputers; massive parallelism; distributed or shared virtual memory; high performance computers; computer architecture.  ...  massively parallel processing.  ... 
doi:10.1007/bf02577791 fatcat:jnvgpsftabcnnabkmpcm5kifqq

Multigrain parallel Delaunay Mesh generation

Christos D. Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios S. Nikolopoulos, Nikos Chrisochoides
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain at the element level.  ...  Our experimental evaluation shows that current SMTs are not capable of executing fine-grain parallelism in PCDM.  ...  We would like to thank Chaman Verma for his initial implementation of the medium-grain PCDM algorithm and the anonymous referees for their valuable comments.  ... 
doi:10.1145/1088149.1088198 dblp:conf/ics/AntonopoulosDCBNC05 fatcat:2rfdnb2w75ewfmuv7n2cyvfbmm

Analysis and performance results of computing betweenness centrality on IBM Cyclops64

Guangming Tan, Vugranam C. Sreedhar, Guang R. Gao
2009 Journal of Supercomputing  
By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massive multi-threading many-core architecture.  ...  We demonstrate several optimization strategies including multi-grain parallelism, just-in-time locality with explicit memory hierarchy and nonpreemptive thread execution, and fine-grain data synchronization  ...  The main contribution of this paper includes: • Being aware of massive light-weight hardware threads, we developed a fine-grained parallel algorithm by combining multi-level parallelism.  ... 
doi:10.1007/s11227-009-0339-9 fatcat:rtslfwssvzasleg55erlq6mpoa

A Comparative Study of Multithreading APIs for Software of ICT Equipment

Isma Farah Siddiqui, Asad Abbas, Abdul Rahim Mohamed Ariffin, Scott Uk-Jin Lee
2016 Indian Journal of Science and Technology  
On the other hand, various application level lightweight thread models are being offered with lighter mechanism for high parallelism and massive concurrency.  ...  fine-grain parallel processing environment calls for optimized and effective multithreading strategies for ICT's software implementations.  ...  These approaches are found to be more feasible for fine-grained parallel codes and nested task parallel structures.  ... 
doi:10.17485/ijst/2016/v9i48/108873 fatcat:vbco5p36xbbwjpbp5dbgzgi72e

Coarse-grained component concurrency in Earth System modeling

V. Balaji, R. Benson, B. Wyman, I. Held
2016 Geoscientific Model Development Discussions  
Each component can further be parallelized on the fine grain, potentially offering a major increase in scalability of Earth system models.  ...  Climate models represent a large variety of processes on a variety of time and space scales, a canonical example of multi-physics multi-scale modeling.  ...  Within the distributed domains, further fine-grained concurrency is achieved between processors sharing physical memory, with execution threads accessing the same memory locations, using protocols such  ... 
doi:10.5194/gmd-2016-114 fatcat:i5j5cnbyhzbblhyveg5i6hrala

Using Runtime Systems Tools to Implement Efficient Preconditioners for Heterogeneous Architectures

Adrien Roussel, Jean-Marc Gratien, Thierry Gautier, S. de Chaisemartin
2016 Oil & Gas Science and Technology  
Nevertheless, algorithms need to be well suited for these massively parallel architectures.  ...  step is to propose a massively parallel implementations of these techniques.  ...  We present the way we manage multi-level parallelisms both at a coarse and a fine grain parallelism.  ... 
doi:10.2516/ogst/2016020 fatcat:vxx6rbkflnhhrcemgu32zeebge

Spanning Large Graphs by Combining Work-stealing with Multiple Parallel Granularities on GPU

2016 Revista Técnica de la Facultad de Ingeniería Universidad del Zulia  
On the other hand work-stealing mechanism can keep workload balanced among stream multi-processors inside GPU and maintain efficient execution of each multi-processor.  ...  Moreover, this coarse-grained spanning can reduce branch divergence and scatter memory accesses.  ...  Spanning Graph with Fine-grained Parallel In this fine-grained parallel spanning (Duane et al., 2012), a shared array of column indices offsets was constructed for all threads in each threadblock.  ... 
doi:10.21311/001.39.5.42 fatcat:7gurjh3jarbg3bk3cf4kjoofgu