
On Exact and Approximation Algorithms for Distinguishing Substring Selection [chapter]

Jens Gramm, Jiong Guo, Rolf Niedermeier
2003 Lecture Notes in Computer Science  
The NP-complete Distinguishing Substring Selection problem (DSSS for short) asks, given a set of "good" strings and a set of "bad" strings, for a solution string which is, with respect to Hamming metric  ...  By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm solving the problem efficiently.  ...  [14] initiated the research on the algorithmic complexity of distinguishing string selection problems.  ... 
doi:10.1007/978-3-540-45077-1_19 fatcat:iuu2ckc23vfdvatj4tis5o4i64

Highly Scalable Algorithms for Robust String Barcoding [article]

Bhaskar DasGupta, Kishori M. Konwar, Ion I. Mandoiu, Alex A. Shvartsman
2005 arXiv   pre-print
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further  ...  Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds  ...  Acknowledgments The authors would like to thank Claudia Prajescu for her help with the implementation of the multi-step rounding algorithm in [2].  ... 
arXiv:cs/0502065v1 fatcat:52q5euki5zcf7an5domlp4grme

Highly scalable algorithms for robust string barcoding

Bhaskar DasGupta, Kishori M. Konwar, Ion I. Mandoiu, Alex A. Shvartsman
2005 International Journal of Bioinformatics Research and Applications  
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further  ...  Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds  ...  To see the effect of length and edit distance requirements on the number of distinguishers, for each redundancy requirement we computed both an unconstrained solution, and a solution in which distinguishers  ... 
doi:10.1504/ijbra.2005.007574 pmid:18048126 fatcat:yfwxwidwlnayxf23xxydoognwm

Highly Scalable Algorithms for Robust String Barcoding [chapter]

B. DasGupta, K. M. Konwar, I. I. Măndoiu, A. A. Shvartsman
2005 Lecture Notes in Computer Science  
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further  ...  Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds  ...  To see the effect of length and edit distance requirements on the number of distinguishers, for each redundancy requirement we computed both an unconstrained solution, and a solution in which distinguishers  ... 
doi:10.1007/11428848_129 fatcat:a7gcpo3nfbd4jeeevwromvh5qi

Parameterized Intractability of Distinguishing Substring Selection

Jens Gramm, Jiong Guo, Rolf Niedermeier
2004 Theory of Computing Systems  
In this way, we also exhibit a sharp border between fixed-parameter tractability and intractability  ...  (* An extended abstract of this paper appeared under the title "On exact and approximation algorithms for  ...)  ...  This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short) which asks, given a set of "good" strings and a set of "bad" strings, for a solution string which  ...  Table 1: Overview on results concerning the approximation and the parameterized complexity of Closest (Sub)String and Distinguishing Substring Selection for alphabets of constant size.  ... 
doi:10.1007/s00224-004-1185-z fatcat:n2jdzmlmarastelx4frzhr4s3y

Distinguishing string selection problems

J. Kevin Lanctot, Ming Li, Bin Ma, Shaojiu Wang, Louxin Zhang
2003 Information and Computation  
There is a polynomial-time $(\frac{4}{3} + \epsilon)$-approximation algorithm for the Closest String Problem for any small constant $\epsilon > 0$.  ...  Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem.  ...  Acknowledgments We would thank Forbes Burkowski, Tao Jiang, Paul Kearney, Ian Munro and Lusheng Wang for discussions on this research and especially Todd Wareham for helpful comments and providing us with  ... 
doi:10.1016/s0890-5401(03)00057-9 fatcat:a4pf3tvpkrestalzhto7u2o2oe

Complexity of Approximating Closest Substring Problems [chapter]

Patricia A. Evans, Andrew D. Smith
2003 Lecture Notes in Computer Science  
For this last problem of length maximization, the approximation bound of 2 is proved to be tight by presenting a 2-approximation algorithm.  ...  on their approximability proved.  ...  In [8], Sagot presents an exponential exact algorithm for the decision problem version of closest substring, also known as common approximate substring.  ... 
doi:10.1007/978-3-540-45077-1_20 fatcat:zztwrvffgvhuflwxeb33qaipwq

Bounded-Length Smith-Waterman Alignment

Alexander Tiskin, Michael Wagner
2019 Workshop on Algorithms in Bioinformatics  
Our algorithms rely on the techniques of fast window-substring alignment and implicit unit-Monge matrix searching, developed previously by the author and others.  ...  They proposed a dynamic programming algorithm solving the problem in time O(mn^2), and also an approximation algorithm running in time O(rmn), where r is a parameter controlling the accuracy of approximation  ...  In fact, our exact algorithm is asymptotically faster not only than the exact algorithm of [1], but also than their approximation algorithm.  ... 
doi:10.4230/lipics.wabi.2019.16 dblp:conf/wabi/Tiskin19 fatcat:kguhdyuwcjcenlbptgr4wa43fy

Entity Extraction with Knowledge from Web Scale Corpora [article]

Zeyi Wen, Zeyu Huang, Rui Zhang
2019 arXiv   pre-print
A popular method for entity extraction is by comparing substrings from free text against a dictionary of entities.  ...  Experiments show that our techniques bring a notable improvement on efficiency and effectiveness.  ...  A common approach for approximate entity extraction is by comparing a substring against an entity.  ... 
arXiv:1911.09373v1 fatcat:jl4yd2iknvatjkgo3ehhrpbadi

Fast Exact Search in Hamming Space with Multi-Index Hashing [article]

Mohammad Norouzi, Ali Punjani, David J. Fleet
2014 arXiv   pre-print
We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space.  ...  The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes.  ...  The authors would also like to thank Mohamed Aly, Rob Fergus, Ryan Johnson, Abbas Mehrabian, and Pietro Perona for useful discussions about this work.  ... 
arXiv:1307.2982v3 fatcat:ntijza35bja47jpxzlvntcf5py
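
The snippet above mentions building multiple hash tables on binary code substrings. As a rough illustration of why that can give exact results, here is a minimal sketch, not the authors' implementation. It assumes fixed-radius search (all codes within Hamming distance r of a query) rather than full k-nearest-neighbor search, and the function names (`split_code`, `build_index`, `search`) are hypothetical. It relies on a pigeonhole argument: if two codes split into m disjoint chunks differ in at most r bits, at least one pair of chunks differs in at most floor(r / m) bits.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, bits, m):
    """Split a `bits`-bit integer into m equal chunks (assumes m divides bits)."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


def build_index(codes, bits, m):
    """Build one hash table per chunk position, mapping chunk value -> code indices."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, chunk in zip(tables, split_code(code, bits, m)):
            table[chunk].append(idx)
    return tables


def neighbors_within(chunk, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of `chunk`."""
    for k in range(radius + 1):
        for positions in combinations(range(width), k):
            flipped = chunk
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query, r, codes, tables, bits, m):
    """Return sorted indices of all codes within Hamming distance r of `query`."""
    width = bits // m
    sub_radius = r // m  # pigeonhole bound on the closest chunk
    candidates = set()
    for table, chunk in zip(tables, split_code(query, bits, m)):
        for probe in neighbors_within(chunk, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Verify candidates against the full codes to discard false positives.
    return sorted(i for i in candidates if bin(codes[i] ^ query).count("1") <= r)


codes = [0b00000000, 0b00000011, 0b11110000, 0b11111111]
tables = build_index(codes, bits=8, m=2)
print(search(0b00000001, 1, codes, tables, bits=8, m=2))  # prints [0, 1]
```

Probing each table only up to radius floor(r / m) keeps the number of lookups small, and the final verification step against the full codes is what keeps the result exact in this sketch.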

Fast Exact Search in Hamming Space With Multi-Index Hashing

Mohammad Norouzi, Ali Punjani, David J. Fleet
2014 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space.  ...  The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes.  ...  The authors would also like to thank Mohamed Aly, Rob Fergus, Ryan Johnson, Abbas Mehrabian, and Pietro Perona for useful discussions about this work.  ... 
doi:10.1109/tpami.2013.231 pmid:26353274 fatcat:zpziarqtt5cxjfadmndbyr3uju

Contrasting Sequence Groups by Emerging Sequences [chapter]

Kang Deng, Osmar R. Zaïane
2009 Lecture Notes in Computer Science  
Evaluating against two learning algorithms based on frequent subsequences and exact matching subsequences, the experiments on two datasets show that our similar ESs-based classification model outperforms  ...  There are two challenges to distinguish sequence classes: the extraction of ESs is not trivially efficient and only exact matches of sequences are considered.  ...  For comparison, we design two other models, one based on frequencies, where frequent subsequences in the positive class are considered discriminant, and one identical to our approach but doing exact matches  ... 
doi:10.1007/978-3-642-04747-3_29 fatcat:s2qkguafmjc37pme5x7cyhvbsq

Iterative Dictionary Construction for Compression of Large DNA Data Sets

S. Kuruppu, Bryan Beresford-Smith, T. Conway, J. Zobel
2012 IEEE/ACM Transactions on Computational Biology & Bioinformatics  
COMRAD allows for random access to individual sequences and subsequences without decompressing the whole data set.  ...  However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected.  ...  The authors thank Simon Puglisi for his assistance with this work.  ... 
doi:10.1109/tcbb.2011.82 pmid:21576758 fatcat:dpg74y6gf5gilddqqh256nmgwm

An Efficient Algorithm for Discovering Motifs in Large DNA Data Sets

Qiang Yu, Hongwei Huo, Xiaoyang Chen, Haitao Guo, Jeffrey Scott Vitter, Jun Huan
2015 IEEE Transactions on Nanobioscience  
To cater this need, we propose a new planted motif discovery algorithm named MCES, which identifies motifs by mining and combining emerging substrings.  ...  motif discovery algorithms, such as F-motif and TraverStringsR; ii) MCES is able to identify motifs without known lengths, and has a better identification accuracy than the competing algorithm CisFinder  ...  Then, we repeatedly select a new node (substring) linked to an already selected node (substring) and align to , until all substrings are selected and aligned .  ... 
doi:10.1109/tnb.2015.2421340 pmid:25872217 fatcat:l5confpodbaozeife2ydjxckxu

Fast Discerning Repeats in DNA Sequences with a Compression Algorithm

Éric Rivals, Jean-Paul Delahaye, Max Dauchet, Olivier Delgrange
1997 Genome Informatics Series  
We present a new heuristic algorithm, Search_Repeats, where the selection of exact repeats is guided by two biologically sound criteria: their length and the absence of overlap between those repeats.  ...  Search_Repeats detects approximate repeats, as clusters of exact sub-repeats, and points out large insertions/deletions in them.  ...  Acknowledgments: We thank gratefully Herrn Bornberg, Heber, Muller, Spang and Vingron for commenting the manuscript. E.  ... 
doi:10.11234/gi1990.8.215 fatcat:oo3b543bxfbxhmayer6oteveoy