Fast Local Flow-based Method using Parallel Multi-core CPUs Architecture

Rashed Salem, Menoufia University, Wafaa Abdel-Moneim, Mohamed Hassan, Zagazig University, Zagazig University
2021 International Journal of Intelligent Engineering and Systems  
The SimpleLocal (SL) algorithm detects best-conductance cuts close to a set of seed vertices. In this paper, a new Parallel SimpleLocal (PSL) system is proposed using multicore CPUs.  ...  Traditional methods of clustering are not suitable to tackle the problem of clustering large graphs because the computation is very costly, which is solved by local graph clustering using a given vertex  ...  Author Contributions The entire work of conceptualization, formal analysis, validation, implementation, writing, editing and modification of manuscript was done by Rashed Salem and Wafaa Abdel-Moneim  ... 
doi:10.22266/ijies2021.0831.01 fatcat:aekbnk5ssrakbacien6jq3gzke

Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy

Hiroyuki Yoshida, Yin Wu, Wenli Cai, Maria Y. Law, Tessa S. Cook
2014 Medical Imaging 2014: PACS and Imaging Informatics: Next Generation and Innovations  
computing systems such as the multicore, cluster, and cloud computing systems.  ...  Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores  ...  Acknowledgments The project described was supported in part by grants R01CA131718 and R01CA166816 from the National Cancer Institute at the National Institutes of Health.  ... 
doi:10.1117/12.2043869 pmid:24910506 pmcid:PMC4043288 fatcat:6iqegdm5nremnnnq7gclo6ucla

Exploiting Parallelism by Data Dependency Elimination: A Case Study of Circuit Simulation Algorithms

Wei Wu, Fang Gong, Rahul Krishnan, Hao Yu, Lei He
2013 IEEE design & test  
To efficiently parallelize these algorithms on multicore CPUs and many-core GPUs, a few recent innovations of parallelization have been proposed [4]-[7] by reformulating the original irregular or coupled  ...  Recently, multicore CPUs and many-core GPUs have become widely adopted with largely reduced cost.  ...  Current multicore CPUs are usually integrated with one to four cores, or even six cores, on a single die.  ... 
doi:10.1109/mdt.2012.2226201 fatcat:ff5h4qyj45cinbloyhupsnf3ua

Fast Processing of Large Graph Applications Using Asynchronous Architecture [article]

Michel A. Kinsy, Rashmi S. Agrawal, Hien D. Nguyen
2017 arXiv   pre-print
We create a specialized ISA to support these operations. (2) The application compilation and mapping process uses a graph clustering algorithm to optimize parallel computing of graph operations and load  ...  Through the clustering process, we make scalability an inherent property of the architecture where task-to-element mapping can be done at the graph node level or at node cluster level.  ...  It has two FIFO structures, one to communicate with neighbors and one internal FIFO to emulate multiple graph nodes (node cluster mode execution).  ... 
arXiv:1706.09953v1 fatcat:t6mnivvyzbfmpmkmufq2n4ou2u

Scalable Force Directed Graph Layout Algorithms Using Fast Multipole Methods

Enas Yunis, Rio Yokota, Aron Ahmadia
2012 2012 11th International Symposium on Parallel and Distributed Computing  
We have been able to leverage the scalability and architectural adaptability of the ExaFMM library to create a Force-Directed Graph Layout implementation that runs efficiently on distributed multicore  ...  Solving layout problems for truly large graphs with millions of vertices still requires a scalable algorithm and implementation.  ...  On the GPU, good strong scalability is achieved only when |V| > 10^6. The breakdown of the CPU and GPU runtime for |V| = 10^7 is shown in Fig. 4.  ... 
doi:10.1109/ispdc.2012.32 dblp:conf/ispdc/YunisYA12 fatcat:7pxwxgofovh7fhxzpsac44hlb4

Fast Parallel All-Subgraph Enumeration Using Multicore Machines

Saeed Shahrivari, Saeed Jalili
2015 Scientific Programming  
Subenum enumerates subgraphs using edges instead of vertices, and this approach leads to a parallel and load-balanced enumeration algorithm that can have efficient execution on current multicore and multiprocessor  ...  Hence, Subenum can handle large input graphs and subgraph sizes that other solutions cannot handle. Several experiments are done using real-world input graphs.  ...  Then, we evaluate scalability and parallelism performance of Subenum on multicore and multiprocessor machines.  ... 
doi:10.1155/2015/901321 fatcat:xy4vdw24k5g2hhl223xcrl4ddq

Deployment of query plans on multicores

Jana Giceva, Gustavo Alonso, Timothy Roscoe, Tim Harris
2014 Proceedings of the VLDB Endowment  
Efficient resource scheduling of multithreaded software on multicore hardware is difficult given the many parameters involved and the hardware heterogeneity of existing systems.  ...  In this paper we explore the efficient deployment of query plans over a multicore machine. We focus on shared query systems, and implement the proposed ideas using SharedDB.  ...  As presented in Figure 4, the algorithm consists of four phases: (1) operator graph collapsing, (2) bin-packing of relational operators to clusters based on the CPU utilization dimension of the RAVs,  ... 
doi:10.14778/2735508.2735513 fatcat:icoyrvjbljbldo4byn3nwxu7ii

Virtual private supercomputer: Design and evaluation

Ivan Gankevich, Vladimir Gaiduchok, Dmitry Gushchanskiy, Yuri Tipikin, Vladimir Korkhov, Alexander Degtyarev, Alexander Bogdanov, Valeriy Zolotarev
2013 Ninth International Conference on Computer Science and Information Technologies Revised Selected Papers  
A virtual private supercomputer is an efficient way of conducting experiments in a high-performance computational environment, and the main role in this approach is played by virtualization and data consolidation  ...  Between experiments, data consolidation is used to store initial data and results in a distributed storage system and offers an API for distributed data processing.  ...  for Basic Research (project N 13-07-00747) and St.  ... 
doi:10.1109/csitechnol.2013.6710358 fatcat:gbeqt3ls3feebegnwljmohhk5a

Scalable HMM based inference engine in large vocabulary continuous speech recognition

Jike Chong, Kisun You, Youngmin Yi, Ekaterina Gonina, Christopher Hughes, Wonyong Sung, Kurt Keutzer
2009 2009 IEEE International Conference on Multimedia and Expo  
We propose four application-level implementation alternatives we call "algorithm styles", and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor  ...  Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs.  ...  parallel on the multicore and manycore processors.  ... 
doi:10.1109/icme.2009.5202871 dblp:conf/icmcs/ChongYYGHSK09 fatcat:f7xpdimcwbam3nl2atjdfktbd4

FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs [article]

Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal Burns
2017 arXiv   pre-print
We evaluate FlashR on a variety of machine learning and statistics algorithms on inputs of up to four billion data points. FlashR out-of-core tracks closely the performance of FlashR in-memory.  ...  To reduce data movement between CPU and SSDs, FlashR evaluates matrix operations lazily, fuses operations at runtime, and uses cache-aware, two-level matrix partitioning.  ...  FlashR-IM and FlashR-EM run on one EC2 i2.8xlarge instance (16 CPU cores) and Spark MLlib runs on a cluster of four EC2 c4.8xlarge instances (72 CPU cores).  ... 
arXiv:1604.06414v4 fatcat:dsobnkm2tbbn5oe4h4orqzdzfa

Studying multicore processor scaling via reuse distance analysis

Meng-Ju Wu, Minshu Zhao, Donald Yeung
2013 SIGARCH Computer Architecture News  
This paper applies RD analysis to study the scalability of multicore cache hierarchies.  ...  We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling.  ...  Acknowledgment The authors would like to thank the anonymous reviewers for their helpful comments, and Abdel-Hameed Badawy for insightful discussions.  ... 
doi:10.1145/2508148.2485965 fatcat:vjuzzdw2rrekpf7ofcd76nyo2i

Studying multicore processor scaling via reuse distance analysis

Meng-Ju Wu, Minshu Zhao, Donald Yeung
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
This paper applies RD analysis to study the scalability of multicore cache hierarchies.  ...  We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling.  ...  Acknowledgment The authors would like to thank the anonymous reviewers for their helpful comments, and Abdel-Hameed Badawy for insightful discussions.  ... 
doi:10.1145/2485922.2485965 dblp:conf/isca/WuZY13 fatcat:g6p2y66rjndv7natn4a5fv3atq

Molecular Docking for Ligand-Receptor Binding Process Based on Heterogeneous Computing

Jianhua Li, Guanlong Liu, Zhiyuan Zhen, Zihao Shen, Shiliang Li, Honglin Li, Basilio B. Fraguela
2022 Scientific Programming  
The scalability of the parallel program is also verified in multiple nodes on a distributed memory system and is approximately linear.  ...  The results of the experiments for the ligand-receptor binding process show that on a multicore server with GPUs the parallel program has achieved a speedup ratio as high as 45 times in flexible docking  ...  In computing the interaction energy, because one atom of the ligand and one atom belonging to the residues must interact with each other, the two atoms are considered as an atom pair and the interaction  ... 
doi:10.1155/2022/9197606 fatcat:wxdgjbicmve6lncosgpc6wqkwq

Parallel scalability in speech recognition

Kisun You, Jike Chong, Youngmin Yi, Ekaterina Gonina, Christopher Hughes, Yen-Kuang Chen, Wonyong Sung, Kurt Keutzer
2009 IEEE Signal Processing Magazine  
We propose four application-level implementation alternatives called algorithm styles and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and  ...  Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs.  ...  ACKNOWLEDGMENTS The authors would like to thank Pradeep Dubey, Lynda Grindstaff, and Yasser Rasheed at Intel for initiating and supporting this research and Nelson Morgan, Andreas Stolcke, and Adam Janin  ... 
doi:10.1109/msp.2009.934124 fatcat:jfqfjdhpjbfz5ehnxefvlkjeaq

Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers

Qingyu Meng, Alan Humphrey, John Schmidt, Martin Berzins
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for CPU cores and/or accelerators/coprocessors on a node.  ...  Present trends in high performance computing present formidable challenges for applications code using multicore nodes possibly with accelerators and/or co-processors and reduced memory while still attaining  ...  This model is similar to running on most other CPU-only clusters. For the symmetric model, programs can run on both the host CPU and the Xeon Phi co-processor card natively.  ... 
doi:10.1145/2503210.2503250 dblp:conf/sc/MengHSB13 fatcat:g36u5ewh7na3zgdem377ztrqdm