Fast Local Flow-based Method using Parallel Multi-core CPUs Architecture
2021
International Journal of Intelligent Engineering and Systems
The SimpleLocal (SL) algorithm detects best-conductance cuts close to a set of seed vertices. In this paper, a new Parallel SimpleLocal (PSL) system is proposed using multicore CPUs. ...
Traditional methods of clustering are not suitable to tackle the problem of clustering large graphs because the computation is very costly, which is solved by local graph clustering using a given vertex ...
Author Contributions The entire work of conceptualization, formal analysis, validation, implementation, writing, editing and modification of the manuscript was done by Rashed Salem and Wafaa Abdel-Moneim ...
doi:10.22266/ijies2021.0831.01
fatcat:aekbnk5ssrakbacien6jq3gzke
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
2014
Medical Imaging 2014: PACS and Imaging Informatics: Next Generation and Innovations
computing systems such as the multicore, cluster, and cloud computing systems. ...
Analysis of performance scalability based on the Amdahl's law for symmetric multicore chips showed the potential of a high performance scalability of the HPC 3D-MIP platform when a larger number of cores ...
Acknowledgments The project described was supported in part by grant R01CA131718 and R01CA166816 from the National Cancer Institute at the National Institutes of Health. ...
doi:10.1117/12.2043869
pmid:24910506
pmcid:PMC4043288
fatcat:6iqegdm5nremnnnq7gclo6ucla
Exploiting Parallelism by Data Dependency Elimination: A Case Study of Circuit Simulation Algorithms
2013
IEEE Design & Test
To efficiently parallelize these algorithms on multicore CPUs and many-core GPUs, a few recent innovations of parallelization have been proposed [4]-[7] by reformulating the original irregular or coupled ...
Recently, multicore CPUs and many-core GPUs have become widely adopted with largely reduced cost. ...
Current multicore CPUs are usually integrated with one to four cores, or even six cores, on a single die. ...
doi:10.1109/mdt.2012.2226201
fatcat:ff5h4qyj45cinbloyhupsnf3ua
Fast Processing of Large Graph Applications Using Asynchronous Architecture
[article]
2017
arXiv
pre-print
We create a specialized ISA to support these operations. (2) The application compilation and mapping process uses a graph clustering algorithm to optimize parallel computing of graph operations and load ...
Through the clustering process, we make scalability an inherent property of the architecture where task-to-element mapping can be done at the graph node level or at node cluster level. ...
It has two FIFO structures, one to communicate with neighbors and one internal FIFO to emulate multiple graph nodes (node cluster mode execution). ...
arXiv:1706.09953v1
fatcat:t6mnivvyzbfmpmkmufq2n4ou2u
Scalable Force Directed Graph Layout Algorithms Using Fast Multipole Methods
2012
2012 11th International Symposium on Parallel and Distributed Computing
We have been able to leverage the scalability and architectural adaptability of the ExaFMM library to create a Force-Directed Graph Layout implementation that runs efficiently on distributed multicore ...
Solving layout problems for truly large graphs with millions of vertices still requires a scalable algorithm and implementation. ...
On the GPU, good strong scalability is achieved only when |V| > 10^6. The breakdown of the CPU and GPU runtime for |V| = 10^7 is shown in Fig. 4. ...
doi:10.1109/ispdc.2012.32
dblp:conf/ispdc/YunisYA12
fatcat:7pxwxgofovh7fhxzpsac44hlb4
Fast Parallel All-Subgraph Enumeration Using Multicore Machines
2015
Scientific Programming
Subenum enumerates subgraphs using edges instead of vertices, and this approach leads to a parallel and load-balanced enumeration algorithm that can have efficient execution on current multicore and multiprocessor ...
Hence, Subenum can handle large input graphs and subgraph sizes that other solutions cannot handle. Several experiments are done using real-world input graphs. ...
Then, we evaluate scalability and parallelism performance of Subenum on multicore and multiprocessor machines. ...
doi:10.1155/2015/901321
fatcat:xy4vdw24k5g2hhl223xcrl4ddq
Deployment of query plans on multicores
2014
Proceedings of the VLDB Endowment
Efficient resource scheduling of multithreaded software on multicore hardware is difficult given the many parameters involved and the hardware heterogeneity of existing systems. ...
In this paper we explore the efficient deployment of query plans over a multicore machine. We focus on shared query systems, and implement the proposed ideas using SharedDB. ...
As presented in Figure 4, the algorithm consists of four phases: (1) operator graph collapsing, (2) bin-packing of relational operators to clusters based on the CPU utilization dimension of the RAVs, ...
doi:10.14778/2735508.2735513
fatcat:icoyrvjbljbldo4byn3nwxu7ii
Virtual private supercomputer: Design and evaluation
2013
Ninth International Conference on Computer Science and Information Technologies Revised Selected Papers
Virtual private supercomputer is an efficient way of conducting experiments on high-performance computational environment and the main role in this approach is played by virtualization and data consolidation ...
In between experiments data consolidation is used to store initial data and results in a distributed storage system and offers API for distributed data processing. ...
for Basic Research (project N 13-07-00747) and St. ...
doi:10.1109/csitechnol.2013.6710358
fatcat:gbeqt3ls3feebegnwljmohhk5a
Scalable HMM based inference engine in large vocabulary continuous speech recognition
2009
2009 IEEE International Conference on Multimedia and Expo
We propose four application-level implementation alternatives we call "algorithm styles", and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor ...
Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. ...
parallel on the multicore and manycore processors. ...
doi:10.1109/icme.2009.5202871
dblp:conf/icmcs/ChongYYGHSK09
fatcat:f7xpdimcwbam3nl2atjdfktbd4
FlashR: R-Programmed Parallel and Scalable Machine Learning using SSDs
[article]
2017
arXiv
pre-print
We evaluate FlashR on a variety of machine learning and statistics algorithms on inputs of up to four billion data points. FlashR out-of-core tracks closely the performance of FlashR in-memory. ...
To reduce data movement between CPU and SSDs, FlashR evaluates matrix operations lazily, fuses operations at runtime, and uses cache-aware, two-level matrix partitioning. ...
FlashR-IM and FlashR-EM run on one EC2 i2.8xlarge instance (16 CPU cores) and Spark MLlib runs on a cluster of four EC2 c4.8xlarge instances (72 CPU cores). ...
arXiv:1604.06414v4
fatcat:dsobnkm2tbbn5oe4h4orqzdzfa
Studying multicore processor scaling via reuse distance analysis
2013
SIGARCH Computer Architecture News
This paper applies RD analysis to study the scalability of multicore cache hierarchies. ...
We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. ...
Acknowledgment The authors would like to thank the anonymous reviewers for their helpful comments, and Abdel-Hameed Badawy for insightful discussions. ...
doi:10.1145/2508148.2485965
fatcat:vjuzzdw2rrekpf7ofcd76nyo2i
Studying multicore processor scaling via reuse distance analysis
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
This paper applies RD analysis to study the scalability of multicore cache hierarchies. ...
We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. ...
Acknowledgment The authors would like to thank the anonymous reviewers for their helpful comments, and Abdel-Hameed Badawy for insightful discussions. ...
doi:10.1145/2485922.2485965
dblp:conf/isca/WuZY13
fatcat:g6p2y66rjndv7natn4a5fv3atq
Molecular Docking for Ligand-Receptor Binding Process Based on Heterogeneous Computing
2022
Scientific Programming
The scalability of the parallel program is also verified in multiple nodes on a distributed memory system and is approximately linear. ...
The results of the experiments for the ligand-receptor binding process show that on a multicore server with GPUs the parallel program has achieved a speedup ratio as high as 45 times in flexible docking ...
In computing the interaction energy, because one atom of the ligand and one atom belonging to the residues must interact with each other, the two atoms are considered as an atom pair and the interaction ...
doi:10.1155/2022/9197606
fatcat:wxdgjbicmve6lncosgpc6wqkwq
Parallel scalability in speech recognition
2009
IEEE Signal Processing Magazine
We propose four application-level implementation alternatives called algorithm styles and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and ...
Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. ...
ACKNOWLEDGMENTS The authors would like to thank Pradeep Dubey, Lynda Grindstaff, and Yasser Rasheed at Intel for initiating and supporting this research and Nelson Morgan, Andreas Stolcke, and Adam Janin ...
doi:10.1109/msp.2009.934124
fatcat:jfqfjdhpjbfz5ehnxefvlkjeaq
Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for CPU cores and/or accelerators/coprocessors on a node. ...
Present trends in high performance computing present formidable challenges for applications code using multicore nodes possibly with accelerators and/or co-processors and reduced memory while still attaining ...
This model is similar to running on most other CPU-only clusters. For the symmetric model, programs can run on both the host CPU and the Xeon Phi co-processor card natively. ...
doi:10.1145/2503210.2503250
dblp:conf/sc/MengHSB13
fatcat:g36u5ewh7na3zgdem377ztrqdm