A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017.
The file type is application/pdf.
On Exact and Approximation Algorithms for Distinguishing Substring Selection
[chapter]
2003
Lecture Notes in Computer Science
The NP-complete Distinguishing Substring Selection problem (DSSS for short) asks, given a set of "good" strings and a set of "bad" strings, for a solution string which is, with respect to the Hamming metric ...
By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm solving the problem efficiently. ...
[14] initiated the research on the algorithmic complexity of distinguishing string selection problems. ...
doi:10.1007/978-3-540-45077-1_19
fatcat:iuu2ckc23vfdvatj4tis5o4i64
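The DSSS definition in the snippet above is cut off. As a minimal sketch, assuming the standard formulation (find a length-L string whose Hamming distance is at least dg from every length-L substring of every good string, and at most db from some length-L substring of every bad string), the Python below checks candidates and searches exhaustively on tiny inputs. The names `is_dsss_solution` and `brute_force_dsss` are hypothetical, and this is not the fixed-parameter algorithm from the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def windows(s: str, length: int) -> list[str]:
    """All substrings of s with the given length."""
    return [s[i:i + length] for i in range(len(s) - length + 1)]


def is_dsss_solution(cand: str, good, bad, dg: int, db: int) -> bool:
    """Check the (assumed) DSSS conditions for one candidate string."""
    L = len(cand)
    # Far from good: every length-L substring of every good string is >= dg away.
    far_from_good = all(hamming(cand, w) >= dg for g in good for w in windows(g, L))
    # Close to bad: every bad string has some length-L substring <= db away.
    close_to_bad = all(any(hamming(cand, w) <= db for w in windows(b, L)) for b in bad)
    return far_from_good and close_to_bad


def brute_force_dsss(alphabet: str, good, bad, L: int, dg: int, db: int):
    """Exhaustive search over alphabet^L; exponential, only for tiny inputs."""
    for letters in product(alphabet, repeat=L):
        cand = "".join(letters)
        if is_dsss_solution(cand, good, bad, dg, db):
            return cand
    return None


print(brute_force_dsss("ACGT", ["AAAA"], ["CCCT", "TCCC"], L=3, dg=2, db=0))  # CCC
```

The search space has |alphabet|^L candidates, which is why exact algorithms for DSSS are only practical for special cases or small parameters.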
Highly Scalable Algorithms for Robust String Barcoding
[article]
2005
arXiv
pre-print
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further ...
Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds ...
Acknowledgments The authors would like to thank Claudia Prajescu for her help with the implementation of the multi-step rounding algorithm in [2]. ...
arXiv:cs/0502065v1
fatcat:52q5euki5zcf7an5domlp4grme
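The snippets do not spell out the selection algorithm itself; the acknowledgment only mentions a multi-step rounding algorithm. As a hedged illustration, assuming that a distinguisher separates two genomes when it occurs in exactly one of them, and that a redundancy requirement r means every genome pair must be separated by at least r chosen distinguishers, a simple greedy heuristic (not the authors' method) could look like this. The function name `greedy_barcode` is hypothetical.

```python
from itertools import combinations


def greedy_barcode(genomes, candidates, redundancy=1):
    """Greedily pick substrings until every genome pair is separated by at
    least `redundancy` chosen distinguishers; return None if impossible."""
    pairs = list(combinations(range(len(genomes)), 2))
    need = {p: redundancy for p in pairs}
    # presence[c][i] is True if candidate c occurs as a substring of genome i.
    presence = {c: tuple(c in g for g in genomes) for c in candidates}
    remaining = list(dict.fromkeys(candidates))  # de-duplicated, order kept
    chosen = []

    def gain(c):
        # Number of still-deficient pairs that candidate c would separate.
        pv = presence[c]
        return sum(1 for i, j in pairs if need[(i, j)] > 0 and pv[i] != pv[j])

    while any(v > 0 for v in need.values()):
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            return None  # the candidate set cannot meet the requirement
        chosen.append(best)
        remaining.remove(best)
        pv = presence[best]
        for i, j in pairs:
            if need[(i, j)] > 0 and pv[i] != pv[j]:
                need[(i, j)] -= 1
    return chosen


print(greedy_barcode(["ACGT", "ACGA", "TTTT"], ["A", "T", "CG", "GA", "GT"]))  # ['A', 'T']
```

For n genomes and redundancy 1, any valid set needs at least ⌈log₂ n⌉ distinguishers, since the n presence/absence barcodes must all be distinct; this is presumably the simplest case of the information-theoretic lower bounds the snippets compare against.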
Highly scalable algorithms for robust string barcoding
2005
International Journal of Bioinformatics Research and Applications
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further ...
Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds ...
To see the effect of length and edit distance requirements on the number of distinguishers, for each redundancy requirement we computed both an unconstrained solution, and a solution in which distinguishers ...
doi:10.1504/ijbra.2005.007574
pmid:18048126
fatcat:yfwxwidwlnayxf23xxydoognwm
Highly Scalable Algorithms for Robust String Barcoding
[chapter]
2005
Lecture Notes in Computer Science
Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further ...
Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds ...
To see the effect of length and edit distance requirements on the number of distinguishers, for each redundancy requirement we computed both an unconstrained solution, and a solution in which distinguishers ...
doi:10.1007/11428848_129
fatcat:a7gcpo3nfbd4jeeevwromvh5qi
Parameterized Intractability of Distinguishing Substring Selection
2004
Theory of Computing Systems
In this way, we also exhibit a sharp border between fixed-parameter tractability and intractability ... (An extended abstract of this paper appeared under the title "On exact and approximation algorithms for distinguishing substring selection".)
This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short) which asks, given a set of "good" strings and a set of "bad" strings, for a solution string which ...
Table 1: Overview of results concerning the approximation and the parameterized complexity of Closest (Sub)String and Distinguishing Substring Selection for alphabets of constant size. ...
doi:10.1007/s00224-004-1185-z
fatcat:n2jdzmlmarastelx4frzhr4s3y
Distinguishing string selection problems
2003
Information and Computation
There is a polynomial-time $(\frac{4}{3} + \epsilon)$-approximation algorithm for the Closest String Problem for any small constant $\epsilon > 0$. ...
Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem. ...
Acknowledgments We would like to thank Forbes Burkowski, Tao Jiang, Paul Kearney, Ian Munro and Lusheng Wang for discussions on this research and especially Todd Wareham for helpful comments and providing us with ...
doi:10.1016/s0890-5401(03)00057-9
fatcat:a4pf3tvpkrestalzhto7u2o2oe
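For context, the Closest String Problem asks for a center string minimizing the maximum Hamming distance to a set of equal-length input strings. The sketch below, assuming small inputs and a known alphabet, contrasts exhaustive search with the folklore baseline of returning the best input string, which is within a factor 2 of optimal by the triangle inequality (any input s satisfies d(s, t) ≤ d(s, c*) + d(c*, t) for an optimal center c*). It is not the (4/3 + ε)-approximation from the snippet, and all function names are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings) -> int:
    """Largest Hamming distance from center to any input string."""
    return max(hamming(center, s) for s in strings)


def exact_closest_string(strings, alphabet: str):
    """Exhaustive search over alphabet^n; exponential in the string length n."""
    n = len(strings[0])
    best = None
    for letters in product(alphabet, repeat=n):
        center = "".join(letters)
        r = radius(center, strings)
        if best is None or r < best[1]:
            best = (center, r)
    return best


def input_string_baseline(strings):
    """Best input string used as center: radius at most twice the optimum."""
    return min(((s, radius(s, strings)) for s in strings), key=lambda t: t[1])


strings = ["AC", "CA"]
print(exact_closest_string(strings, "AC"))  # ('AA', 1)
print(input_string_baseline(strings))       # ('AC', 2)
```

The two-string example shows the factor 2 is tight for the baseline, which is the gap that LP-rounding approximations like the one in the snippet narrow.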
Complexity of Approximating Closest Substring Problems
[chapter]
2003
Lecture Notes in Computer Science
For this last problem of length maximization, the approximation bound of 2 is proved to be tight by presenting a 2-approximation algorithm. ...
on their approximability proved. ...
In [8], Sagot presents an exponential exact algorithm for the decision problem version of closest substring, also known as common approximate substring. ...
doi:10.1007/978-3-540-45077-1_20
fatcat:zztwrvffgvhuflwxeb33qaipwq
Bounded-Length Smith-Waterman Alignment
2019
Workshop on Algorithms in Bioinformatics
Our algorithms rely on the techniques of fast window-substring alignment and implicit unit-Monge matrix searching, developed previously by the author and others. ...
They proposed a dynamic programming algorithm solving the problem in time $O(mn^2)$, and also an approximation algorithm running in time $O(rmn)$, where $r$ is a parameter controlling the accuracy of approximation ...
In fact, our exact algorithm is asymptotically faster not only than the exact algorithm of [1], but also than their approximation algorithm. ...
doi:10.4230/lipics.wabi.2019.16
dblp:conf/wabi/Tiskin19
fatcat:kguhdyuwcjcenlbptgr4wa43fy
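For reference, the classic Smith-Waterman recurrence computes the best local alignment score in O(mn) time. The sketch below uses a linear gap penalty with hypothetical scores (match +2, mismatch -1, gap -1) and does not impose the length bound that the paper studies; it only shows the baseline dynamic program that bounded-length variants build on.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (linear gap penalty), O(len(a) * len(b))."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0 (start a fresh alignment).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TACGTA", "GACGTC"))  # 8, from aligning the shared "ACGT"
```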
Entity Extraction with Knowledge from Web Scale Corpora
[article]
2019
arXiv
pre-print
A popular method for entity extraction is to compare substrings from free text against a dictionary of entities. ...
Experiments show that our techniques bring a notable improvement in efficiency and effectiveness. ...
A common approach for approximate entity extraction is to compare a substring against an entity. ...
arXiv:1911.09373v1
fatcat:jl4yd2iknvatjkgo3ehhrpbadi
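As a hedged sketch of the generic approach the snippets describe (comparing substrings of the text against dictionary entities), the code below enumerates token windows and keeps those whose token-set Jaccard similarity with an entity reaches a threshold. Jaccard similarity, whitespace tokenization, and the function names are assumptions for illustration; the paper's own techniques are not shown in the snippet.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two token collections (treated as sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def extract_entities(text: str, dictionary, threshold: float = 0.8):
    """Return (start, end, entity) for token windows similar to a dictionary entity."""
    tokens = text.lower().split()
    entities = [tuple(e.lower().split()) for e in dictionary]
    max_len = max((len(e) for e in entities), default=0)
    matches = []
    for i in range(len(tokens)):
        # Only windows up to the longest entity's token count are compared.
        for j in range(i + 1, min(len(tokens), i + max_len) + 1):
            window = tokens[i:j]
            for entity in entities:
                if jaccard(window, entity) >= threshold:
                    matches.append((i, j, " ".join(entity)))
    return matches


print(extract_entities("The New York Times reported", ["New York Times"], threshold=1.0))
# [(1, 4, 'new york times')]
```

This naive loop compares every window with every entity; the efficiency work the snippets mention is aimed at avoiding exactly this all-pairs comparison.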
Fast Exact Search in Hamming Space with Multi-Index Hashing
[article]
2014
arXiv
pre-print
We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. ...
The approach is storage-efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. ...
The authors would also like to thank Mohamed Aly, Rob Fergus, Ryan Johnson, Abbas Mehrabian, and Pietro Perona for useful discussions about this work. ...
arXiv:1307.2982v3
fatcat:ntijza35bja47jpxzlvntcf5py
Fast Exact Search in Hamming Space With Multi-Index Hashing
2014
IEEE Transactions on Pattern Analysis and Machine Intelligence
We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. ...
The approach is storage-efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. ...
The authors would also like to thank Mohamed Aly, Rob Fergus, Ryan Johnson, Abbas Mehrabian, and Pietro Perona for useful discussions about this work. ...
doi:10.1109/tpami.2013.231
pmid:26353274
fatcat:zpziarqtt5cxjfadmndbyr3uju
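The pigeonhole idea behind multi-index hashing can be sketched as follows: split each b-bit code into m disjoint substrings; if two codes are within Hamming distance r, at least one pair of corresponding substrings is within ⌊r/m⌋, so probing each substring table with that small radius finds every true neighbor as a candidate. This Python sketch, assuming integer codes and m dividing b, implements only fixed-radius (r-neighbor) search rather than the k-nearest-neighbor procedure in the abstract; the class and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code: int, b: int, m: int) -> list[int]:
    """Split a b-bit code into m contiguous, disjoint (b // m)-bit substrings."""
    s = b // m
    mask = (1 << s) - 1
    return [(code >> (k * s)) & mask for k in range(m)]


def within_radius(value: int, bits: int, radius: int):
    """Yield every bits-wide value within Hamming distance `radius` of value."""
    for r in range(radius + 1):
        for positions in combinations(range(bits), r):
            flipped = value
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class MultiIndexHash:
    def __init__(self, codes, b: int, m: int):
        if b % m:
            raise ValueError("this sketch assumes m divides b")
        self.codes, self.b, self.m = list(codes), b, m
        # One hash table per substring position, keyed by the substring value.
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(self.codes):
            for k, chunk in enumerate(split_code(code, b, m)):
                self.tables[k][chunk].append(idx)

    def r_neighbors(self, query: int, r: int) -> list[int]:
        """Indices of all stored codes within Hamming distance r of query."""
        # Pigeonhole: some substring must be within floor(r / m) of the query's.
        sub_r, s = r // self.m, self.b // self.m
        candidates = set()
        for k, chunk in enumerate(split_code(query, self.b, self.m)):
            for key in within_radius(chunk, s, sub_r):
                candidates.update(self.tables[k].get(key, ()))
        # Verify candidates with the full Hamming distance.
        return sorted(i for i in candidates if bin(self.codes[i] ^ query).count("1") <= r)


codes = [0b00000000, 0b00000011, 0b11110000, 0b11111111]
index = MultiIndexHash(codes, b=8, m=2)
print(index.r_neighbors(0b00000001, r=1))  # [0, 1]
```

Because probing is exact on the substrings and every candidate is verified on the full code, the search returns the same set as a linear scan while touching far fewer codes when the data are spread out.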
Contrasting Sequence Groups by Emerging Sequences
[chapter]
2009
Lecture Notes in Computer Science
Evaluating against two learning algorithms based on frequent subsequences and exact matching subsequences, the experiments on two datasets show that our similar ESs-based classification model outperforms ...
There are two challenges to distinguish sequence classes: the extraction of ESs is not trivially efficient and only exact matches of sequences are considered. ...
For comparison, we design two other models, one based on frequencies, where frequent subsequences in the positive class are considered discriminant, and one identical to our approach but doing exact matches ...
doi:10.1007/978-3-642-04747-3_29
fatcat:s2qkguafmjc37pme5x7cyhvbsq
Iterative Dictionary Construction for Compression of Large DNA Data Sets
2012
IEEE/ACM Transactions on Computational Biology & Bioinformatics
COMRAD allows for random access to individual sequences and subsequences without decompressing the whole data set. ...
However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. ...
The authors thank Simon Puglisi for his assistance with this work. ...
doi:10.1109/tcbb.2011.82
pmid:21576758
fatcat:dpg74y6gf5gilddqqh256nmgwm
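The snippets describe building a dictionary over the whole collection, so that repeats spanning different sequences are detected, while individual sequences remain decodable on their own. A Re-Pair-style sketch (repeatedly replacing the most frequent adjacent symbol pair across all sequences with a new symbol) illustrates that idea. It is a swapped-in technique, not COMRAD's actual construction, and all names and parameters (`max_rules`, `min_count`) are hypothetical.

```python
from collections import Counter


def build_dictionary(seqs, max_rules=100, min_count=2):
    """Replace frequent adjacent pairs collection-wide; return encoded seqs and rules."""
    data = [list(s) for s in seqs]
    rules = {}  # new symbol -> (left symbol, right symbol)
    for rule_id in range(max_rules):
        pairs = Counter()
        for seq in data:  # count pairs across ALL sequences, not one at a time
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        sym = ("R", rule_id)
        rules[sym] = pair
        for idx, seq in enumerate(data):
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(sym)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            data[idx] = out
    return data, rules


def expand(symbol, rules) -> str:
    """Recursively expand a symbol back into the original characters."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


def decode_one(encoded, rules) -> str:
    """Decode a single sequence using only its symbols and the shared rules."""
    return "".join(expand(s, rules) for s in encoded)


seqs = ["ACGTACGT", "TTACGTAA", "GGGACGT"]
encoded, rules = build_dictionary(seqs)
print(decode_one(encoded[1], rules) == seqs[1])  # True
```

Decoding one sequence needs only its encoded symbols and the shared rule table, which mirrors the random access to individual sequences that the snippet highlights.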
An Efficient Algorithm for Discovering Motifs in Large DNA Data Sets
2015
IEEE Transactions on Nanobioscience
To cater to this need, we propose a new planted motif discovery algorithm named MCES, which identifies motifs by mining and combining emerging substrings. ...
motif discovery algorithms, such as F-motif and TraverStringsR; ii) MCES is able to identify motifs without known lengths, and has a better identification accuracy than the competing algorithm CisFinder ...
Then, we repeatedly select a new node (substring) linked to an already selected node (substring) and align to , until all substrings are selected and aligned. ...
doi:10.1109/tnb.2015.2421340
pmid:25872217
fatcat:l5confpodbaozeife2ydjxckxu
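The snippet names "emerging substrings" without defining them. Borrowing the usual emerging-pattern definitions as an assumption (support is the fraction of sequences containing a pattern, and growth rate is the ratio of its support in the target set to its support in the background set), fixed-length emerging substrings can be mined as below. This is a hypothetical sketch with made-up thresholds, not MCES itself, which also combines substrings into motifs.

```python
from collections import Counter


def kmer_support(seqs, k: int) -> dict:
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return {kmer: c / len(seqs) for kmer, c in counts.items()}


def emerging_substrings(pos, neg, k: int, min_support=0.5, min_growth=2.0) -> dict:
    """k-mers frequent in `pos` whose support grows by >= min_growth versus `neg`."""
    sp, sn = kmer_support(pos, k), kmer_support(neg, k)
    result = {}
    for kmer, s in sp.items():
        if s < min_support:
            continue
        n = sn.get(kmer, 0.0)
        growth = float("inf") if n == 0 else s / n  # absent in background: infinite
        if growth >= min_growth:
            result[kmer] = growth
    return result


print(emerging_substrings(["ACGTAC", "TTACGT", "ACGTTT"], ["TTTTTT", "ACGAAA"], k=4))
# {'ACGT': inf}
```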
Fast Discerning Repeats in DNA Sequences with a Compression Algorithm
1997
Genome Informatics Series
We present a new heuristic algorithm, Search_Repeats, where the selection of exact repeats is guided by two biologically sound criteria: their length and the absence of overlap between those repeats. ...
Search_Repeats detects approximate repeats, as clusters of exact sub-repeats, and points out large insertions/deletions in them. ...
Acknowledgments: We gratefully thank Messrs. Bornberg, Heber, Muller, Spang and Vingron for commenting on the manuscript. E. ...
doi:10.11234/gi1990.8.215
fatcat:oo3b543bxfbxhmayer6oteveoy
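One simple reading of the two selection criteria in the Search_Repeats snippet (prefer longer exact repeats, forbid overlap between selected repeats) is a greedy longest-first selection. The sketch below is an illustration under that assumption, with a hypothetical function name and a naive running time; it does not reproduce the compression-based guidance or the approximate-repeat clustering of Search_Repeats.

```python
def select_exact_repeats(s: str, min_len: int):
    """Greedy, longest first: pick pairs of identical, non-overlapping substrings
    whose positions are not yet used. Returns (first_start, second_start, length)."""
    n = len(s)
    covered = [False] * n
    selected = []
    for length in range(n // 2, min_len - 1, -1):
        first_seen = {}  # substring -> earliest still-usable start position
        for i in range(n - length + 1):
            if any(covered[i:i + length]):
                continue
            sub = s[i:i + length]
            j = first_seen.get(sub)
            if j is None or any(covered[j:j + length]):
                first_seen[sub] = i  # no usable earlier copy yet
            elif j + length <= i:  # the earlier copy does not overlap this one
                selected.append((j, i, length))
                for p in range(j, j + length):
                    covered[p] = True
                for p in range(i, i + length):
                    covered[p] = True
                del first_seen[sub]
    return selected


print(select_exact_repeats("ACGTTTACGTGG", 3))  # [(0, 6, 4)]
```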
Showing results 1 to 15 out of 3,473 results