A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce
2020
Genes
In this research, we propose a fast and scalable workflow that integrates the Bowtie aligner with a Hadoop-based Heap SNP caller to improve SNP detection in genome sequences. ...
Moreover, SNP mining has also been performed to identify specific regions in genome sequences. All the frameworks are implemented with the default configuration of memory management. ...
Acknowledgments: All the authors are grateful to those who provided guidelines and suggestions throughout this research work.
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/genes11020166
pmid:32033366
pmcid:PMC7074349
fatcat:nuh3dedypvfgfn52czvwg6dss4
Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends
2014
BioData Mining
Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. ...
The paper concludes by summarizing the potential uses of the MapReduce programming framework and Hadoop platform for processing huge volumes of clinical data in medical and health informatics-related fields ...
towards scalable solutions [74]. ...
doi:10.1186/1756-0381-7-22
pmid:25383096
pmcid:PMC4224309
fatcat:zpis7kklerh2vna5le2gtxc5vi
Big Data Knowledge System in Healthcare
[chapter]
2017
Studies in Big Data
The chapter proposes a big data based knowledge management system to support clinical decision-making. ...
The proposed methodology communicates asynchronously with different data sources and presents the doctor with multiple alternative decisions. ...
To achieve this, scalable computing methodologies must be developed for processing such huge amounts of genomic data. ...
doi:10.1007/978-3-319-49736-5_7
fatcat:d7zfowy2prc7hbvitso7zcasga
DLA: a Distributed, Location-based and Apriori-based Algorithm for Biological Sequence Pattern Mining
2018
2018 IEEE International Conference on Big Data (Big Data)
With the rapid growth of genomic data, the need for scalable data mining algorithms has increased. ...
Experimental results on real-world datasets confirm our performance expectations, showing better scalability than other distributed solutions. ...
ACKNOWLEDGMENT The authors would like to thank all the colleagues of the GeCo ERC project for their help. ...
doi:10.1109/bigdata.2018.8622007
dblp:conf/bigdataconf/StamoulakatouGP18
fatcat:krt7muw74bcpbcpoj2arte5feq
Towards an environment for data mining based analysis processes in bioinformatics and personalized medicine
2013
Network Modeling Analysis in Health Informatics and Bioinformatics
In this paper, we present the challenges for data mining based analysis in bio- and medical informatics and our approach towards a data mining environment addressing these requirements in the p-medicine ...
To serve these new and diverse needs, bioinformatics and data mining are teaming up to generate tools and procedures for prediction of disease recurrence and progression, response to treatment, as well ...
Acknowledgments The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013 ...
doi:10.1007/s13721-013-0022-1
dblp:journals/netmahib/WegenerRBDR13
fatcat:xnr6jsiko5ff5px2mxbk7ckwfy
Protein Sequence Search based on N-gram Indexing
2006
Interdisciplinary Bio Central
With advances in experimental techniques in molecular biology, genomic and protein sequence databases are growing exponentially in size, and mean sequence lengths are also increasing. ...
As these databases grow, it becomes difficult to search them for sequences with significant homology to a query sequence. ...
Availability The Protein Sequence Search (ProSeS) service is available at http://proses.kisti.re.kr. ...
doi:10.4051/ibce.2009.1.0008
fatcat:ncupnj5vubd5xechftrs6epqky
Data mining and search techniques in the biotechnology and Web environment: a comparison
2001
South African Journal of Information Management
Like statistics, data mining is not a business solution, it is just a technology. ...
interest in genomic sequence is increasing too fast to be managed by traditional methods. ...
doi:10.4102/sajim.v3i2.137
fatcat:uiadvvlfhnepnl3uaqy3gclcfa
Big Data Analytics in Bioinformatics: A Machine Learning Perspective
[article]
2015
arXiv
pre-print
These methods can be scaled to handle big data using the distributed and parallel computing technologies. ...
Similarly, graph-based architectures and in-memory big data tools have been developed to minimize I/O cost and optimize iterative processing. ...
ACKNOWLEDGMENTS The authors would like to thank the Ministry of HRD, Govt. of India for funding as a Centre of Excellence with thrust area in Machine Learning Research and Big Data Analytics for the period ...
arXiv:1506.05101v1
fatcat:oix7d5hecbfgthzhepznwyi6fm
Survey of MapReduce frame operation in bioinformatics
2013
Briefings in Bioinformatics
The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and ...
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. ...
Apache Hadoop gives researchers the possibility of achieving scalable, efficient and reliable computing performance on Linux clusters and cloud computing services. ...
doi:10.1093/bib/bbs088
pmid:23396756
fatcat:sro4pk6aobcotoeozbbtpo6x5q
Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences
2014
International Journal of Computer Applications
A challenging task in finding patterns in bio-sequences is identifying frequent contiguous patterns (FCPs). FCPs give clues for genetic discovery and functional analysis, and also help assemble the whole genome of a species. ...
Pattern-finding algorithms differ for these two kinds of sequences: small patterns are more likely to occur repeatedly in bio-sequences than in transaction sequences. ...
Thus frequent pattern mining provides the solution for association rules formation. ...
doi:10.5120/16661-6646
fatcat:37rwdfdh3fawhlayuyckt4q2ju
Scalable and Versatile k-mer Indexing for High-Throughput Sequencing Data
[chapter]
2013
Lecture Notes in Computer Science
Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main-memory. ...
In practice, the compressed Gk arrays scale up to much larger inputs with highly competitive query ...
Most efficient programs for mapping reads onto a reference genome resort to an FM-index of the genome [7], which is small enough to fit in memory (e.g. [16, 20]). ...
doi:10.1007/978-3-642-38036-5_24
fatcat:fy75rgw7wvh4bamvmcukik6yqe
An Imminent Approach for Genome Sequence and Analysis using Map Reduce
2015
International Journal of Computer Applications
Cancer remains a challenging issue because of its many different mutations. ...
Big data in healthcare is a rapidly growing trend, driven by the diversity of its data types as well as its volume and the speed at which it must be managed, and it leads to improved care even ...
to its smaller size and ability to be indexed for search. ...
doi:10.5120/ijca2015906595
fatcat:blzn5hnelbhsbhirr4zo7tfwti
Cloud Based Metalearning System for Predictive Modeling of Biomedical Data
2014
The Scientific World Journal
On the other hand, analyzing such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. ...
This problem is addressed by proposing a cloud based system that integrates a metalearning framework for ranking and selecting the best predictive algorithms for the data at hand with open source big data technologies ...
Essentially, it provides a parallel read-mapping algorithm optimized for mapping sequence data to the human genome and other reference genomes, intended for use in biological analyses including SNP discovery ...
doi:10.1155/2014/859279
pmid:24892101
pmcid:PMC4032768
fatcat:pv6bldhbgvakzdyximx5qt45de
Parallel computing for genome sequence processing
2021
Briefings in Bioinformatics
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. ...
The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. ...
Pattern mining and matching: pattern mining and matching is an important case of genome sequence processing. It refers to finding patterns across multiple sequences. ...
doi:10.1093/bib/bbab070
pmid:33822883
fatcat:a4hj2fhybrc6zlsq6xyiu6snmy
Report on the First and Second Interdisciplinary Time Series Analysis Workshop (ITISA)
2019
SIGMOD record
In order to analyze the existing and (more importantly) future very large time series collections, new technologies and the development of more efficient and smarter algorithms are required. ...
The analysis of time-series data associated with modern-day industrial operations and scientific experiments is now pushing both computational power and resources to their limits. ...
genome sequences). ...
doi:10.1145/3377391.3377400
fatcat:bwlnr4bi3zd4lagtacev5ao4sm