A Fast Randomized Algorithm for Finding the Maximal Common Subsequences [article]

Jin Cao, Dewei Zhong
2020 arXiv   pre-print
In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of Maximal Common Subsequence (MCS) of multiple strings.  ...  A well-known result states that finding a Longest Common Subsequence (LCS) for L strings is NP-hard, e.g., the computational complexity is exponential in L.  ...  For this experiment, we use the basic dynamic programming method to compute LCS, and run our RandomMCS algorithm 1000 times to select the longest one and compare the result with the real LCS.  ... 
arXiv:2009.03352v1 fatcat:w4gyd2r53nb23k6t5date6wviu

Approximating the true evolutionary distance between two genomes

Krister M. Swenson, Mark Marron, Joel V. Earnest-Deyoung, Bernard M. E. Moret
2008 ACM Journal of Experimental Algorithmics  
good enough to enable the simple neighbor-joining procedure to reconstruct our test trees with high accuracy.  ...  In this paper we generalize our approach to compute distances between two arbitrary genomes, but focus on approximating the true evolutionary distance rather than the edit distance.  ...  Acknowledgments This work is supported by the National Science Foundation under grants DEB 01-20709 (on a subcontract to U.  ... 
doi:10.1145/1227161.1402297 fatcat:bzmtyf7t75ha7pn62bpe5e4xae

Objective Assessment of Surgical Technical Skill and Competency in the Operating Room

S. Swaroop Vedula, Masaru Ishii, Gregory D. Hager
2017 Annual Review of Biomedical Engineering  
The algorithms and validation methodologies used for OCASE-T are highly varied; there is no uniform consensus.  ...  Traditional models to train surgeons are being challenged by rapid advances in technology, an intensified patient-safety culture, and a need for value-driven health systems.  ...  Narges Ahmidi for her insightful comments on earlier versions of this review and assistance with illustrations. LITERATURE CITED  ... 
doi:10.1146/annurev-bioeng-071516-044435 pmid:28375649 pmcid:PMC5555216 fatcat:rlspmdequzgdjlxkaqt4vcp4la

Mining Time Series Data [chapter]

Chotirat Ann Ratanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn Keogh, Michail Vlachos, Gautam Das
2009 Data Mining and Knowledge Discovery Handbook  
This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.  ...  While these many different techniques used to solve these problems use a multitude of different techniques, they all have one common factor; they require some high level representation of the data, rather  ...  Longest Common Subsequence Similarity The longest common subsequence similarity measure, or LCSS, is a variation of edit distance used in speech recognition and text pattern matching.  ... 
doi:10.1007/978-0-387-09823-4_56 fatcat:52km7o7aw5dw7awgsa4qa6badm

Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT)

R. Durbin
2014 Bioinformatics  
Motivation: Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly.  ...  Meanwhile, algorithmic development for genotype data has concentrated on statistical methods for phasing and imputation, based on probabilistic matching to hidden Markov model representations of the reference  ...  One approach to more efficient phasing and imputation may be to use computationally efficient approaches such as the positional prefix array methods to seed matches for statistical genotype algorithms,  ... 
doi:10.1093/bioinformatics/btu014 pmid:24413527 pmcid:PMC3998136 fatcat:vems5carsfdxpivtg5l4o7dq7e

Learning deterministic context free grammars: The Omphalos competition

Alexander Clark
2006 Machine Learning  
Our approach integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency.  ...  We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to  ...  We also would like to thank Remi Eyraud and Jean Christophe Janodet for pointing out the literature on NTS grammars.  ... 
doi:10.1007/s10994-006-9592-9 fatcat:dipqknik5needkx4cpm3owjloy

Pluribus—Exploring the Limits of Error Correction Using a Suffix Tree

Daniel Savel, Thomas LaFramboise, Ananth Grama, Mehmet Koyuturk
2017 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
In this paper, we present a novel and effective method called PLURIBUS, for correcting sequencing errors using a generalized suffix trie.  ...  Furthermore, PLURIBUS can be used in conjunction with other contemporary error correction methods to achieve higher levels of accuracy than either tool alone.  ...  Libraries of Medicine, the Center for Science of Information (CSoI), an US National Science Foundation Science and Technology Center, under grant agreement CCF-0939370, and by American Cancer Society  ... 
doi:10.1109/tcbb.2016.2586060 pmid:27362987 pmcid:PMC5754272 fatcat:b25k3nb4hfbt3kruiiouysqrma

A Heuristic Approach for Finding Similarity Indexes of Multivariate Data Sets

Rahim Khan, Muhammad Zakarya, Ayaz Ali Khan, Izaz Ur Rahman, Mohd Amiruddin Abd Rahman, Muhammad Khalis Abdul Karim, Mohd Shafie Mustafa
2020 IEEE Access  
INDEX TERMS Similarity index, multivariate data set, outliers, the longest common subsequence.  ...  Therefore, the development of an efficient and reliable algorithm for MDSs, with minimum time and space complexity, is highly encouraged by the research community.  ...  In the literature, different methods were proposed to solve the longest common subsequence problem particularly for multivariate data sets [19]. Some of these techniques are described below.  ... 
doi:10.1109/access.2020.2968222 fatcat:x4dqd47nvrd7vbsqs7geww3ara

Parameterized Algorithms in Bioinformatics: An Overview

Laurent Bulteau, Mathias Weller
2019 Algorithms  
This work surveys recent developments of parameterized algorithms and complexity for important NP-hard problems in bioinformatics.  ...  Bioinformatics regularly poses new challenges to algorithm engineers and theoretical computer scientists.  ...  Acknowledgments: The authors want to thank Fran Rosamond for suggesting the topic as well as everyone who helped collect interesting results for the manuscript, in particular Jesper Jansson, Steven Kelk  ... 
doi:10.3390/a12120256 fatcat:4dhjdnpibzh43iifgan2fu6bwa

Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays

Thomas D. Wu
2016 Algorithms for Molecular Biology  
Our results on the fly, chicken, and human genomes show that bytecoding with an exception guide array is the fastest method for retrieving auxiliary information.  ...  Enhanced suffix arrays (ESAs) provide fast search speed, but require large auxiliary data structures for storing longest common prefix and child interval information.  ...  Acknowledgements The author thanks Simon Gog for advice on using his SDSL package. Competing interests The author declares that he has no competing interests.  ... 
doi:10.1186/s13015-016-0068-6 pmid:27110277 pmcid:PMC4842304 fatcat:pavthg3vu5dfrn5lqkppnzaosy

The Deletion-Insertion model applied to the genome rearrangement problem

Abra Brisbin, Manda Riehl, Noah Williams
2019 Pure Mathematics and Applications  
We use combinatorial reasoning and permutation statistics to develop a polynomial-time algorithm to approximate the minimum number of transpositions required in the transposition model and to analyze the  ...  Applying one restriction to this model, we obtain the transposition model for genome rearrangement, which was shown to be NP-hard in [4].  ...  Using the method of [10], getLdc can be run in O(n log log n) time, so algMinLdc can be run in O(n^5 log log n).  ... 
doi:10.1515/puma-2015-0030 fatcat:44lksv2gifbgrhrw2onyidbb54

The Average Common Substring Approach to Phylogenomic Reconstruction

Igor Ulitsky, David Burstein, Tamir Tuller, Benny Chor
2006 Journal of Computational Biology  
We present an algorithm for efficiently computing these distances. In principle, the distance of two long sequences can be calculated in O( ) time. We implemented the algorithm, using suffix arrays.  ...  The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings.  ...  ACKNOWLEDGEMENTS We would like to thank Eran Bacharach, Tal Pupko, and Jacob Ziv for helpful discussions.  ... 
doi:10.1089/cmb.2006.13.336 pmid:16597244 fatcat:l2y4ypheo5bbncforxo55ffqdi

Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning [article]

Zheng Wang, Cheng Long, Gao Cong, Yiding Liu
2020 arXiv   pre-print
We conduct experiments on real-world trajectory datasets, which verify the effectiveness and efficiency of the proposed algorithms.  ...  Among those approximate algorithms, two that are based on deep reinforcement learning stand out and outperform those non-learning based algorithms in terms of effectiveness and efficiency.  ...  The authors would like to thank Eamonn Keogh for pointing out some references to the time series literature and also the anonymous reviewers for their constructive comments.  ... 
arXiv:2003.02542v2 fatcat:wupyxy3odremho5okuvg7ymolq

Analysis of Work-Stealing and Parallel Cache Complexity [article]

Yan Gu, Zachary Napier, Yihan Sun
2021 arXiv   pre-print
Our second and main contribution is some new parallel cache complexity for algorithms using the RWS scheduler.  ...  The theoretical efficiency of the RWS scheduler has been analyzed for a variety of settings, but most of them are quite complicated.  ...  A counterexample is the edit distance problem (or longest common subsequence). The recurrence for the divide-and-conquer algorithm is ( ) = 4 ( /2) + (1).  ... 
arXiv:2111.04994v1 fatcat:yjvrumbacjdcnifdf6qvwwblmq

Information Theoretic Approaches to Whole Genome Phylogenies [chapter]

David Burstein, Igor Ulitsky, Tamir Tuller, Benny Chor
2005 Lecture Notes in Computer Science  
We present an algorithm for efficiently computing these distances. In principle, the distance of two long sequences can be calculated in O( ) time. We implemented the algorithm, using suffix arrays.  ...  The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings.  ...  Acknowledgements We would like to thank Eran Bacharach, Tal Pupko, and Jacob Ziv for helpful discussions.  ... 
doi:10.1007/11415770_22 fatcat:fhdjl343eje7de2y2bheevty44