CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications

Andréa Matsunaga, Maurício Tsugawa, José Fortes
2008 2008 IEEE Fourth International Conference on eScience  
The results encourage the use of the proposed approach for the execution of large-scale bioinformatics applications on emerging distributed environments that provide access to computing resources as a  ...  Both versions demonstrated performance gains as the number of available processors increased, with CloudBLAST delivering speedups of 57 against 52.4 of MPI version, when 64 processors on 2 sites were used  ...  Grid-computing approaches have been developed with the primary goal of enabling the use of distributed pools of resources.  ... 
doi:10.1109/escience.2008.62 dblp:conf/eScience/MatsunagaTF08 fatcat:nrvf6gbjrjcznnjd6f5k5aqqym

Parallelizing BLAST and SOM Algorithms with MapReduce-MPI Library

Seung-Jin Sul, Andrey Tovchigrechko
2011 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum  
We built a parallel BLAST implementation that calls the high-level methods of unmodified NCBI C++ Toolkit.  ...  We demonstrated scaling for up to 1000 cores on TACC Ranger cluster when processing the sufficiently large input datasets.  ...  The database partitions are created by running the standard NCBI BLAST tool formatdb on the entire database in FASTA format.  ... 
doi:10.1109/ipdps.2011.180 dblp:conf/ipps/SulT11 fatcat:vji37cumivgc5huw7xbcqmbdim

Protein construction-based data partitioning scheme for alignment of protein macromolecular structures through distributed querying in federated databases

Dariusz Mrozek, Jacek Kwiendacz, Bozena Malysiak-Mrozek
2019 IEEE Transactions on Nanobioscience  
We solve the problem of finding proteins in Oracle relational database on the basis of the similarity of 3D protein structures with the use of distributed PAR-P3D-SQL queries.  ...  The data partitioning scheme relies on protein construction, which requires data preprocessing but results in shorter exploration times through querying federated databases.  ...  Query execution in such an environment usually relies on Distributed Partitioned Views and Distributed SQL.  ... 
doi:10.1109/tnb.2019.2930494 pmid:31329125 fatcat:idgrhqx65zewfc776j2vhpiwsi

Distributed BLAST in a Grid Computing Context [chapter]

Micha M. Bayer, Richard Sinnott
2005 Lecture Notes in Computer Science  
BLAST is one of the best known sequence comparison programs available in bioinformatics.  ...  Input consisting of multiple query sequences is partitioned into sub-jobs on the basis of the number of idle compute nodes available and then processed on these in batches.  ...  One of the better known implementations of approach 3 is mpiBLAST [10] , an MPI-based implementation of BLAST in which the target database is segmented into a number of fragments that the input is then  ... 
doi:10.1007/11560500_22 fatcat:wlegxcqxqraxpnpzaee4qpqn2e

Using OGSA-DQP to Support Scientific Applications for the Grid [chapter]

M. Nedim Alpdemir, Arijit Mukherjee, Anastasios Gounaris, Norman W. Paton, Alvaro A. A. Fernandes, Rizos Sakellariou, Paul Watson, Peter Li
2005 Lecture Notes in Computer Science  
Each partition in the distributed query plan is assigned to one or more execution nodes.  ...  The number of GQES instances and their location on the grid is specified by the GDQS, based on the decisions made by a query optimiser and represented as an execution schedule for query partitions (i.e  ... 
doi:10.1007/11423287_2 fatcat:xrkmyvrwvva7hhvwvirclpmtfm

Distributed Database Research at COPPE/UFRJ

Marta Mattoso, Vanessa Braganholo, Alexandre A. B. Lima, Leonardo Murta
2011 Journal of Information and Data Management  
Our group has been working with different aspects of distributed and parallel processing of databases in the relational, object-oriented, and XML data models.  ...  Our group has been addressing these challenges by capitalizing on our extensive experience in distributed data management.  ...  Finally, we would like to thank all the students we formed in our group, and list the ones that are currently under our supervision: Anderson Marinho, Carla Rodrigues, Carlos Paulino, Daniel de  ... 
dblp:journals/jidm/MattosoBLM11 fatcat:rekvtcyqhfgfljyebqzktgzw5u

Service-Based Distributed Querying on the Grid [chapter]

M. Nedim Alpdemir, Arijit Mukherjee, Norman W. Paton, Paul Watson, Alvaro A. A. Fernandes, Anastasios Gounaris, Jim Smith
2003 Lecture Notes in Computer Science  
query plans on the one hand, and to their execution over the Grid on the other.  ...  This paper explores one aspect of service-based computing and data management, viz., how to integrate query processing technology with a service-based Grid.  ...  The ideas in this paper have benefited greatly from, and build upon, our collaboration with colleagues, from IBM, Oracle, the UK National e-Science Centre and the Edinburgh Parallel Computing Centre, in  ... 
doi:10.1007/978-3-540-24593-3_32 fatcat:2cnyu6gayjhjppw5owpd6ecjd4

A Pluggable Framework for Parallel Pairwise Sequence Search

Jeremy Archuleta, Wu-chun Feng, Eli Tilevich
2007 IEEE Engineering in Medicine and Biology Society. Conference Proceedings  
Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems.  ...  The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort.  ...  This hybrid approach to parallelizing BLAST realized the benefit of both fitting each fragment in memory through DF and executing subqueries in parallel via QS.  ... 
doi:10.1109/iembs.2007.4352239 pmid:18001905 fatcat:to6tu7dagvcxxnp5ixmgbhbumq

Analysis of Biological Sequence Search Performance in NoSQL Database

Quezia N. Flach, Arthur F. Lorenzon, Marcelo C. Luizelli, Fabio D. Rossi
2020 International Journal of Computer Applications  
The results showed that NoSQL databases have superior scalability and performance to relational databases, and perform very closely with high-performance applications over multiprocessing environments.  ...  In this work, we intend to evaluate the performance of a distributed NoSQL database and possibly present a more feasible performance solution by analyzing the behavior of the NoSQL DynamoDB database when  ...  The mpiBLAST parallelization strategy is based on partitioning the database entry into many fragments, as many as the number of computers on which the application will run.  ... 
doi:10.5120/ijca2020920416 fatcat:di5h2nabpvb4vgfidrdgtgh5yy

Communication Protocols and Message Formats for BLAST Parallelization on Cluster Systems

Hong-Soog Kim, Woo-Hyuk Jang, Dong-Soo Han
2008 22nd International Conference on Advanced Information Networking and Applications - Workshops (aina workshops 2008)  
With the widespread use of BLAST, many parallel versions of BLAST on cluster systems are announced, but little work has been done for the parallel execution in the search for individual query sequence  ...  on BLAST on cluster systems.  ...  Communication Protocol and Message Format Hyper-BLAST invokes processes on remote node that search similar sequence(s) for the given query sequence within own partitioned sub-database.  ... 
doi:10.1109/waina.2008.238 dblp:conf/aina/KimJH08 fatcat:z7zt6ed3rfcypcjjkf5vxxlzsy

A case study of parallel I/O for biological sequence search on Linux clusters

Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson
2004 International Journal of High Performance Computing and Networking  
is also found that although the performance of the two variations improves consistently when initially increasing the number of servers, this performance gain from parallel I/O becomes insignificant with  ...  to 10 and 21 folds, respectively; whereas, the variation based on CEFT-PVFS only suffered a two-fold performance degradation.  ...  Each worker copies the assigned database fragments to its local storage device and then executes the NCBI blastall to search through the database fragments.  ... 
doi:10.1504/ijhpcn.2004.008350 fatcat:7rpd45uil5dslh6oh74yvylzlq

A case study of parallel I/O for biological sequence search on Linux clusters

Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson
2003 Proceedings IEEE International Conference on Cluster Computing CLUSTR-03  
is also found that although the performance of the two variations improves consistently when initially increasing the number of servers, this performance gain from parallel I/O becomes insignificant with  ...  to 10 and 21 folds, respectively; whereas, the variation based on CEFT-PVFS only suffered a two-fold performance degradation.  ...  Each worker copies the assigned database fragments to its local storage device and then executes the NCBI blastall to search through the database fragments.  ... 
doi:10.1109/clustr.2003.1253329 dblp:conf/cluster/ZhuJQS03 fatcat:ieipwwqjyvb6ppnl66cv2tnomq

Semantics-based distributed I/O for mpiBLAST

Pavan Balaji, Wu-chun Feng, Jeremy Archuleta, Heshan Lin, Rajkumar Kettimuthu, Rajeev Thakur, Xiaosong Ma
2008 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08  
BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation to allow different worker processes  ...  to search (in parallel) unique segments of the database.  ...  In spite of recent enhancements, I/O processing in mpiBLAST is still a concern, especially in environments that use distributed filesystems with limited I/O capabilities.  ... 
doi:10.1145/1345206.1345262 dblp:conf/ppopp/BalajiFALKTM08 fatcat:jugxa4vmenhmjcgyqbikuqapru

Survey of MapReduce frame operation in bioinformatics

Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, K. Chen
2013 Briefings in Bioinformatics  
reliable computing performance on Linux clusters and on cloud computing services.  ...  The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and  ...  The Apache Hadoop gives researchers a possibility of achieving scalable, efficient and reliable computing performance on Linux clusters and cloud computing services.  ... 
doi:10.1093/bib/bbs088 pmid:23396756 fatcat:sro4pk6aobcotoeozbbtpo6x5q

Semantic-based distributed I/O with the ParaMEDIC framework

Pavan Balaji, Wu-chun Feng, Heshan Lin
2008 Proceedings of the 17th international symposium on High performance distributed computing - HPDC '08  
Many large-scale applications simultaneously rely on multiple resources for efficient execution.  ...  Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network.  ...  Archuleta for his technical support on this project.  ... 
doi:10.1145/1383422.1383444 dblp:conf/hpdc/BalajiFL08 fatcat:qu22aql4k5as5cv6amho4b7sza