A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce

Tahir, Sardaraz
2020 Genes  
In this research, we propose a fast and scalable workflow that integrates Bowtie aligner with Hadoop based Heap SNP caller to improve the SNPs detection in genome sequences.  ...  Moreover, SNP mining has also been performed to identify specific regions in genome sequences. All the frameworks are implemented with the default configuration of memory management.  ...  Acknowledgments: All the authors are grateful to those who provide guidelines and suggestions throughout this research work. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/genes11020166 pmid:32033366 pmcid:PMC7074349 fatcat:nuh3dedypvfgfn52czvwg6dss4

Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

Emad A Mohammed, Behrouz H Far, Christopher Naugler
2014 BioData Mining  
Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability.  ...  This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields  ...  towards scalable solutions [74].  ... 
doi:10.1186/1756-0381-7-22 pmid:25383096 pmcid:PMC4224309 fatcat:zpis7kklerh2vna5le2gtxc5vi

Big Data Knowledge System in Healthcare [chapter]

Gunasekaran Manogaran, Chandu Thota, Daphne Lopez, V. Vijayakumar, Kaja M. Abbas, Revathi Sundarsekar
2017 Studies in Big Data  
The chapter proposes a big data based knowledge management system to develop the clinical decisions.  ...  The proposed methodology asynchronously communicates with different data sources and produces many alternative decisions to the doctor.  ...  In order to achieve the above task, there is a need to develop the scalable computing methodologies for processing such huge amount of genomics data.  ... 
doi:10.1007/978-3-319-49736-5_7 fatcat:d7zfowy2prc7hbvitso7zcasga

DLA: a Distributed, Location-based and Apriori-based Algorithm for Biological Sequence Pattern Mining

Eirini Stamoulakatou, Andrea Gulino, Pietro Pinoli
2018 2018 IEEE International Conference on Big Data (Big Data)  
With the rapid growth of genomic data, the need for scalable data mining algorithms has increased.  ...  Experimental results on real-world datasets confirm our performance expectations, showing a better scalability when compared to other distributed solutions.  ...  ACKNOWLEDGMENT The authors would like to thank all the colleagues of the GeCo ERC project for their help.  ... 
doi:10.1109/bigdata.2018.8622007 dblp:conf/bigdataconf/StamoulakatouGP18 fatcat:krt7muw74bcpbcpoj2arte5feq

Towards an environment for data mining based analysis processes in bioinformatics and personalized medicine

Dennis Wegener, Simona Rossi, Francesca Buffa, Mauro Delorenzi, Stefan Rüping
2013 Network Modeling Analysis in Health Informatics and Bioinformatics  
In this paper, we present the challenges for data mining based analysis in bio- and medical informatics and our approach towards a data mining environment addressing these requirements in the p-medicine  ...  To serve these new and diverse needs, bioinformatics and data mining are teaming up to generate tools and procedures for prediction of disease recurrence and progression, response to treatment, as well  ...  Acknowledgments The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013  ... 
doi:10.1007/s13721-013-0022-1 dblp:journals/netmahib/WegenerRBDR13 fatcat:xnr6jsiko5ff5px2mxbk7ckwfy

Protein Sequence Search based on N-gram Indexing

Mi-Nyeong Hwang, Jinsuk Kim
2006 Interdisciplinary Bio Central  
According to the advancement of experimental techniques in molecular biology, genomic and protein sequence databases are increasing in size exponentially, and mean sequence lengths are also increasing.  ...  Because the sizes of these databases become larger, it is difficult to search similar sequences in biological databases with significant homologies to a query sequence.  ...  Availability The Protein Sequence Search (ProSeS) service is available at http://proses.kisti.re.kr.  ... 
doi:10.4051/ibce.2009.1.0008 fatcat:ncupnj5vubd5xechftrs6epqky

Data mining and search techniques in the biotechnology and Web environment: a comparison

B.W. Koester
2001 South African Journal of Information Management  
Like statistics, data mining is not a business solution, it is just a technology.  ...  interest in genomic sequence is increasing too fast to be managed by traditional methods.  ... 
doi:10.4102/sajim.v3i2.137 fatcat:uiadvvlfhnepnl3uaqy3gclcfa

Big Data Analytics in Bioinformatics: A Machine Learning Perspective [article]

Hirak Kashyap, Hasin Afzal Ahmed, Nazrul Hoque, Swarup Roy, Dhruba Kumar Bhattacharyya
2015 arXiv pre-print
These methods can be scaled to handle big data using the distributed and parallel computing technologies.  ...  Similarly, graph-based architectures and in-memory big data tools have been developed to minimize I/O cost and optimize iterative processing.  ...  ACKNOWLEDGMENTS The authors would like to thank the Ministry of HRD, Govt. of India for funding as a Centre of Excellence with thrust area in Machine Learning Research and Big Data Analytics for the period  ... 
arXiv:1506.05101v1 fatcat:oix7d5hecbfgthzhepznwyi6fm

Survey of MapReduce frame operation in bioinformatics

Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, K. Chen
2013 Briefings in Bioinformatics  
The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and  ...  Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing.  ...  The Apache Hadoop gives researchers a possibility of achieving scalable, efficient and reliable computing performance on Linux clusters and cloud computing services.  ... 
doi:10.1093/bib/bbs088 pmid:23396756 fatcat:sro4pk6aobcotoeozbbtpo6x5q

Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences

S. Rajasekaran, L. Arockiam
2014 International Journal of Computer Applications  
The challenging task in pattern finding of bio-sequences is to find FCP. FCP gives clues for genetic discovery, functional analysis and also helps to assemble a whole genome of species.  ...  There is the difference in pattern finding algorithms of these two sequences. The chances of repeatedly occurring small patterns are high in bio-sequences than in the transaction sequences.  ...  Thus frequent pattern mining provides the solution for association rules formation.  ... 
doi:10.5120/16661-6646 fatcat:37rwdfdh3fawhlayuyckt4q2ju

Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data [chapter]

Niko Välimäki, Eric Rivals
2013 Lecture Notes in Computer Science  
Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main-memory.  ...  In practice, the compressed Gk arrays scale up to much larger inputs with highly competitive query  ...  Most efficient programs for mapping reads onto a reference genome resort to a FM-index of the genome [7], which is small enough to fit in memory (e.g. [16, 20]).  ... 
doi:10.1007/978-3-642-38036-5_24 fatcat:fy75rgw7wvh4bamvmcukik6yqe

An Imminent Approach for Genome Sequence and Analysis using Map Reduce

C.J. Kavithapriya
2015 International Journal of Computer Applications  
Cancer prevails as a challenging issue because of its different mutations.  ...  The recent trend of BigData in Healthcare is overpowering and necessity increasing rapidly because of its data type diversity in addition to its volume, managing speed and leads to improving care even  ...  to its smaller size and ability to be indexed for search.  ... 
doi:10.5120/ijca2015906595 fatcat:blzn5hnelbhsbhirr4zo7tfwti

Cloud Based Metalearning System for Predictive Modeling of Biomedical Data

Milan Vukićević, Sandro Radovanović, Miloš Milovanović, Miroslav Minović
2014 The Scientific World Journal  
On the other side analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms.  ...  This problem is addressed by proposing a cloud based system that integrates metalearning framework for ranking and selection of best predictive algorithms for data at hand and open source big data technologies  ...  Essentially, it provides a parallel read-mapping algorithm optimized for mapping sequence data to the human genome and other reference genomes, intended for use in a biological analysis including SNP discovery  ... 
doi:10.1155/2014/859279 pmid:24892101 pmcid:PMC4032768 fatcat:pv6bldhbgvakzdyximx5qt45de

Parallel computing for genome sequence processing

You Zou, Yuejie Zhu, Yaohang Li, Fang-Xiang Wu, Jianxin Wang
2021 Briefings in Bioinformatics  
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing.  ...  The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing.  ...  Pattern mining and matching Pattern mining and matching is an important case of genome sequence processing. It refers to finding out patterns from the number of sequences.  ... 
doi:10.1093/bib/bbab070 pmid:33822883 fatcat:a4hj2fhybrc6zlsq6xyiu6snmy

Report on the First and Second Interdisciplinary Time Series Analysis Workshop (ITISA)

Themis Palpanas, Volker Beckmann
2019 SIGMOD record  
In order to analyze the existing and (more importantly) future very large time series collections, new technologies and the development of more efficient and smarter algorithms are required.  ...  The analysis of time-series data associated with modern-day industrial operations and scientific experiments is now pushing both computational power and resources to their limits.  ...  genome sequences).  ... 
doi:10.1145/3377391.3377400 fatcat:bwlnr4bi3zd4lagtacev5ao4sm