
Multiple Sequence Alignment Using Probcons and Probalign [chapter]

Usman Roshan
2013 Methods in Molecular Biology
Sequence alignment remains a fundamental task in bioinformatics. The literature contains programs that employ a host of exact and heuristic strategies available in computer science. Probcons was the first program to construct maximum expected accuracy sequence alignments with hidden Markov models, and at the time of its publication it achieved the highest accuracies on standard protein multiple alignment benchmarks. Probalign followed this strategy except that it used a partition function approach instead of hidden Markov models. Several programs employing both strategies have been published since then. In this chapter we describe Probcons and Probalign.
doi:10.1007/978-1-62703-646-7_9 pmid:24170400 fatcat:xl6br6tznfcdhcc5exk7wr3sxe
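Both programs share the maximum expected accuracy framework: compute a posterior probability for every pair of residues being aligned (from a pair-HMM in Probcons, from a partition function in Probalign), then find the alignment maximizing the expected number of correctly aligned pairs. A minimal sketch of that final dynamic programming step, using a small hypothetical posterior matrix (the real programs compute these posteriors from the sequences themselves):

```python
import numpy as np

def mea_align(P):
    """Maximum expected accuracy alignment sketch: given posterior match
    probabilities P[i][j], find the alignment maximizing the sum of matched
    posteriors via Needleman-Wunsch-style DP with zero gap cost."""
    m, n = P.shape
    S = np.zeros((m + 1, n + 1))
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i, j] = max(S[i-1, j-1] + P[i-1, j-1], S[i-1, j], S[i, j-1])
    # traceback to recover the matched residue pairs
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if S[i, j] == S[i-1, j-1] + P[i-1, j-1]:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif S[i, j] == S[i-1, j]:
            i -= 1
        else:
            j -= 1
    return S[m, n], pairs[::-1]

# Hypothetical 3x3 posterior matrix favoring the diagonal
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.7]])
score, pairs = mea_align(P)
```

With this matrix the DP recovers the diagonal matching, since it carries the highest total posterior probability.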

Robust binary classification with the 01 loss [article]

Yunzhe Xue, Meiyan Xie, Usman Roshan
2020 arXiv   pre-print
The 01 loss is robust to outliers and tolerant to noisy data compared to convex loss functions. We conjecture that the 01 loss may also be more robust to adversarial attacks. To study this empirically we have developed a stochastic coordinate descent algorithm for a linear 01 loss classifier and a single hidden layer 01 loss neural network. Due to the absence of a gradient we iteratively update coordinates on random subsets of the data for a fixed number of epochs. We show our algorithms to be fast and comparable in accuracy to the linear support vector machine and the logistic loss single hidden layer network for binary classification on several image benchmarks, thus establishing that our method is on par in test accuracy with convex losses. We then subject them to accurately trained substitute model black box attacks on the same image benchmarks and find them to be more robust than their convex counterparts. On the CIFAR10 binary classification task between classes 0 and 1 with an adversarial perturbation of 0.0625, the MLP01 network loses 27% in accuracy whereas the MLP-logistic counterpart loses 83%. Similarly, on STL10 and ImageNet binary classification between classes 0 and 1, the MLP01 network loses 21% and 20% while MLP-logistic loses 67% and 45% respectively. On MNIST, which is a well-separable dataset, we find MLP01 comparable to MLP-logistic and show under simulation how and why our 01 loss solver is less robust there. We then propose adversarial training for our linear 01 loss solver that significantly improves its robustness on MNIST and all other datasets while retaining clean test accuracy. Finally we show practical applications of our method to deter traffic sign and facial recognition adversarial attacks. We discuss attacks with the 01 loss, substitute model accuracy, and several future avenues such as multiclass classification, 01 loss convolutions, and further adversarial training.
arXiv:2002.03444v1 fatcat:ylvdwpi2kvc5phf4nutnwqvk2y
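The coordinate descent idea can be sketched minimally: since the 01 loss has no gradient, perturb one random coordinate at a time and keep the move only if the loss does not increase. This toy version is a stand-in under stated assumptions, not the authors' actual solver (which updates coordinates over random subsets of the data); the data below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_one_loss(w, b, X, y):
    # fraction of points whose predicted sign disagrees with the label
    return float(np.mean(np.sign(X @ w + b) != y))

def scd_01(X, y, epochs=500, step=0.5):
    """Toy stochastic coordinate descent for a linear 01 loss classifier:
    perturb one random coordinate (or the bias) and keep the move only if
    the 01 loss does not increase (plateau moves allowed)."""
    n, d = X.shape
    w, b = rng.standard_normal(d), 0.0
    best = zero_one_loss(w, b, X, y)
    for _ in range(epochs):
        j = int(rng.integers(d + 1))
        delta = step * float(rng.standard_normal())
        if j < d:
            w[j] += delta
        else:
            b += delta
        cur = zero_one_loss(w, b, X, y)
        if cur <= best:
            best = cur          # accept the move
        elif j < d:
            w[j] -= delta       # revert
        else:
            b -= delta
    return w, b, best

# Toy data: labels in {-1, +1}, with one point placed as an outlier
X = np.array([[0., 0.], [0., 1.], [2., 2.], [2., 3.], [-5., 5.]])
y = np.array([-1, -1, 1, 1, 1])
w, b, loss = scd_01(X, y)
```

The accept-if-not-worse rule lets the search wander across the flat plateaus of the 01 loss surface, which is what makes direct minimization feasible despite the missing gradient.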

Towards adversarial robustness with 01 loss neural networks [article]

Yunzhe Xue, Meiyan Xie, Usman Roshan
2020 arXiv   pre-print
Motivated by the general robustness properties of the 01 loss we propose a single hidden layer 01 loss neural network trained with stochastic coordinate descent as a defense against adversarial attacks in machine learning. One measure of a model's robustness is the minimum distortion required to make the input adversarial. This can be approximated with the Boundary Attack (Brendel et al. 2018) and HopSkipJump (Chen et al. 2019) methods. We compare the minimum distortion of the 01 loss network to that of the binarized neural network and the standard sigmoid activation network with cross-entropy loss, all trained with and without Gaussian noise, on the CIFAR10 benchmark binary classification between classes 0 and 1. Both with and without noise training we find our 01 loss network to have the largest adversarial distortion of the three models by non-trivial margins. To further validate these results we subject all models to substitute model black box attacks under different distortion thresholds and find that the 01 loss network is the hardest to attack across all distortions. At a distortion of 0.125 both the sigmoid activated cross-entropy loss and binarized networks have almost 0% accuracy on adversarial examples whereas the 01 loss network is at 40%. Even though both the 01 loss and the binarized network use sign activations, their training algorithms are different, which in turn gives different solutions for robustness. Finally we compare our network to simple convolutional models under substitute model black box attacks and find their accuracies to be comparable. Our work shows that the 01 loss network has the potential to defend against black box adversarial attacks better than convex loss and binarized networks.
arXiv:2008.09148v1 fatcat:lkvy5tztazco7djd7r2rnwcdgq
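The minimum-distortion measure can be illustrated with a much simpler stand-in than the Boundary Attack or HopSkipJump: given any misclassified point, bisect along the segment back toward the clean input to find the closest label flip. The linear model and points below are hypothetical; the real attacks work on image classifiers with only label access:

```python
import numpy as np

def predict(w, b, x):
    # label of a toy linear classifier
    return 1 if float(np.dot(w, x) + b) >= 0 else -1

def min_distortion_linesearch(w, b, x_clean, x_adv, iters=30):
    """Estimate minimum adversarial distortion by bisecting on the segment
    between a clean input and any misclassified point. This captures the
    spirit of boundary-style attacks, not the actual algorithms."""
    y_clean = predict(w, b, x_clean)
    assert predict(w, b, x_adv) != y_clean   # need a misclassified start
    lo, hi = 0.0, 1.0                        # fraction of the way to x_adv
    for _ in range(iters):
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_clean + mid * x_adv
        if predict(w, b, x_mid) != y_clean:
            hi = mid                         # still adversarial: move closer
        else:
            lo = mid                         # back on the clean side
    x_boundary = (1 - hi) * x_clean + hi * x_adv
    return float(np.linalg.norm(x_boundary - x_clean))

w, b = np.array([1.0, 0.0]), 0.0             # decision boundary: x0 = 0
x_clean = np.array([2.0, 0.0])               # predicted +1
x_adv = np.array([-2.0, 0.0])                # predicted -1
d = min_distortion_linesearch(w, b, x_clean, x_adv)
```

A larger estimated distortion means the attacker must move the input further before the label flips, which is the sense in which the 01 loss network is measured as more robust.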

Weighted Maximum Variance Dimensionality Reduction [chapter]

Turki Turki, Usman Roshan
2014 Lecture Notes in Computer Science  
We wrote our code in C and R and make it freely available at  ... 
doi:10.1007/978-3-319-07491-7_2 fatcat:we6lqkjjmbajhia76dllojqms4

Sequence-Length Requirements for Phylogenetic Methods [chapter]

Bernard M.E. Moret, Usman Roshan, Tandy Warnow
2002 Lecture Notes in Computer Science  
We study the sequence lengths required by neighbor-joining, greedy parsimony, and a phylogenetic reconstruction method (DCMNJ+MP) based on disk-covering and the maximum parsimony criterion. We use extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity, to collect data on the scaling of sequence-length requirements for each of the three methods as a function of the number of taxa, the rate of evolution on the tree, and the deviation from ultrametricity. Our experiments show that DCMNJ+MP has consistently lower sequence-length requirements than the other two methods when trees of high topological accuracy are desired, although all methods require much longer sequences as the deviation from ultrametricity or the height of the tree grows. Our study has significant implications for large-scale phylogenetic reconstruction (where sequence-length requirements are a crucial factor), but also for future performance analyses in phylogenetics (since deviations from ultrametricity are proving pivotal).
doi:10.1007/3-540-45784-4_26 fatcat:aemoqpgdgjflvet77j7pyyjvnu

Defending against substitute model black box adversarial attacks with the 01 loss [article]

Yunzhe Xue, Meiyan Xie, Usman Roshan
2020 arXiv   pre-print
Substitute model black box attacks can create adversarial examples for a target model just by accessing its output labels. This poses a major challenge to machine learning models in practice, particularly in security sensitive applications. The 01 loss model is known to be more robust to outliers and noise than the convex models typically used in practice. Motivated by these properties we present 01 loss linear and 01 loss dual layer neural network models as a defense against transfer based substitute model black box attacks. We compare the accuracy of adversarial examples from substitute model black box attacks targeting our 01 loss models and their convex counterparts for binary classification on popular image benchmarks. Our 01 loss dual layer neural network has an adversarial accuracy of 66.2%, 58%, 60.5%, and 57% on MNIST, CIFAR10, STL10, and ImageNet respectively, whereas the sigmoid activated logistic loss counterpart has accuracies of 63.5%, 19.3%, 14.9%, and 27.6%. Except for MNIST the convex counterparts have substantially lower adversarial accuracies. We show practical applications of our models to deter traffic sign and facial recognition adversarial attacks. On GTSRB street sign and CelebA facial detection our 01 loss network has 34.6% and 37.1% adversarial accuracy respectively whereas the convex logistic counterpart has accuracies of 24% and 1.9%. Finally we show that our 01 loss network can attain robustness on par with simple convolutional neural networks, and much higher than its convex counterpart, even when attacked with a convolutional network substitute model. Our work shows that 01 loss models offer a powerful defense against substitute model black box attacks.
arXiv:2009.09803v1 fatcat:ez3mtguokbhhvcllhzf4w63udy

Top-k Parametrized Boost [chapter]

Turki Turki, Muhammad Ihsan, Nouf Turki, Jie Zhang, Usman Roshan, Zhi Wei
2014 Lecture Notes in Computer Science  
Ensemble methods such as AdaBoost are popular machine learning methods that create a highly accurate classifier by combining the predictions of several classifiers. We present a parametrized variant of AdaBoost that we call Top-k Parametrized Boost. We evaluate our method and other popular ensemble methods from a classification perspective on several real datasets. Our empirical study shows that our method gives the minimum average error, with statistical significance, on these datasets.
doi:10.1007/978-3-319-13817-6_10 fatcat:fx6efvpnonekrpk2bxhvs73pde
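For context, the AdaBoost baseline being parametrized works as follows: fit a weak learner on weighted data, upweight the points it gets wrong, and repeat. This is a sketch of plain decision-stump AdaBoost; the paper's own Top-k variant is not reproduced here, and the toy data is hypothetical:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=5):
    """Plain AdaBoost with decision stumps; y in {-1, +1}.
    Returns a list of (alpha, feature, threshold, sign) weak learners."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                # uniform sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # exhaustively search stumps: feature, threshold, orientation
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] >= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-12), 1 - 1e-12)      # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # learner weight
        pred = s * np.where(X[:, j] >= t, 1, -1)
        w *= np.exp(-alpha * y * pred)             # boost the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    # weighted vote of all stumps
    agg = sum(a * s * np.where(X[:, j] >= t, 1, -1) for a, j, t, s in ensemble)
    return np.where(agg >= 0, 1, -1)

X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([-1, -1, 1, 1])
model = adaboost_stumps(X, y)
```

The reweighting step is the part that boosting variants typically parametrize, since it controls how aggressively the ensemble focuses on hard examples.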

Estimating the Deviation from a Molecular Clock [chapter]

Luay Nakhleh, Usman Roshan, Lisa Vawter, Tandy Warnow
2002 Lecture Notes in Computer Science  
We address the problem of estimating the degree to which the evolutionary history of a set of molecular sequences violates the strong molecular clock hypothesis. We quantify this deviation formally, by defining the "stretch" of a model tree with respect to the underlying ultrametric tree (indicated by time). We then define the "minimum stretch" of a dataset on a tree, and show how this can be computed optimally in polynomial time. We also present a polynomial time algorithm for computing a lower bound on the stretch of a given dataset on any tree. We then explore the performance of standard techniques in systematics for estimating the deviation of a dataset from a molecular clock. We show that standard methods, whether based upon maximum parsimony or maximum likelihood, can return infeasible values (i.e., values for the stretch which cannot be realized on any model tree), and often under-estimate the true stretch. This suggests that current estimations of the degree to which datasets deviate from a molecular clock may significantly underestimate these deviations. We conclude with some suggestions for further research.
doi:10.1007/3-540-45784-4_22 fatcat:mkj74tyc7bfabbkocnijdisl3m

Performance of Supertree Methods on Various Data Set Decompositions [chapter]

Usman Roshan, Bernard M. E. Moret, Tiffani L. Williams, Tandy Warnow
2004 Computational Biology  
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems (such as Maximum Parsimony (MP) and Maximum Likelihood (ML)), but they are severely limited by the number of taxa that they can handle in a reasonable time frame. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the dataset into smaller subsets, solve the subsets (i.e., use MP or ML on each subset to obtain trees), then combine the solutions to the subsets into a solution to the original dataset. This last step, combining given trees into a single tree, is known as supertree construction in computational phylogenetics. The traditional application of supertree methods is to combine existing, published phylogenies into a single phylogeny. Here, we study supertree construction in the context of divide-and-conquer methods for large-scale tree reconstruction. We study several divide-and-conquer approaches and experimentally demonstrate their advantage over Matrix Representation Parsimony (MRP), a traditional supertree technique, and over global heuristics such as the parsimony ratchet. On the ten large biological datasets under investigation, our study shows that the techniques used for dividing the dataset into subproblems, as well as those used for merging them into a single solution, strongly influence the quality of the supertree construction. In most cases, our merging technique, the Strict Consensus Merger (SCM), outperforms MRP with respect to MP scores and running time. Divide-and-conquer techniques are also a highly competitive alternative to global heuristics such as the parsimony ratchet, especially on the more challenging datasets. Supertree methods combine smaller, overlapping subtrees into a larger tree. Their traditional application has been to combine existing, published phylogenies, on which the community agrees, into a tree leaf-labeled by the entire set of species. The most popular supertree method is Matrix Representation Parsimony (MRP) (Baum, 1992; Ragan, 1992), which has been used in a number of phylogenetic studies (Purvis, 1995;
doi:10.1007/978-1-4020-2330-9_15 fatcat:vg7ayumoczbtjkbs3k3bgxjnvy

Reconstruction of large phylogenetic trees: A parallel approach

Zhihua Du, Feng Lin, Usman W. Roshan
2005 Computational biology and chemistry  
It was previously shown that Rec-I-DCM3 was able to improve upon the unboosted default heuristics of TNT (Roshan, 2004b).  ...  We defined the "best" MP score on each dataset to be the best score found to date ( .html).  ... 
doi:10.1016/j.compbiolchem.2005.06.003 pmid:16040277 fatcat:mi2aej3uu5dpfdv5zjw57mgd5y

The Accuracy of Fast Phylogenetic Methods for Large Datasets

Luay Nakhleh, Bernard M. E. Moret, Usman Roshan, Katherine St. John, Jerry Sun, Tandy Warnow
2001 Biocomputing 2002  
Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes. In particular, sequence-based reconstruction will play an important role, especially in resolving more recent events. But using sequences at the level of whole genomes means working with very large amounts of data (large numbers of sequences) as well as large phylogenetic distances, so that reconstruction methods must be both fast and robust as well as accurate. We study the accuracy, convergence rate, and speed of several fast reconstruction methods: neighbor-joining, Weighbor (a weighted version of neighbor-joining), greedy parsimony, and a new phylogenetic reconstruction method based on disk-covering and parsimony search (DCM-NJ+MP). Our study uses extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity. We find that Weighbor, thanks to its sophisticated handling of probabilities, outperforms other methods for short sequences, while our new method is the best choice for sequence lengths above 100. For very large sequence lengths, all four methods have similar accuracy, so that the speed of neighbor-joining and greedy parsimony makes them the two methods of choice.
doi:10.1142/9789812799623_0020 fatcat:7feoggxspjhy7jthfez2jnpsra
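Neighbor-joining, the fastest of the compared methods, can be sketched compactly: repeatedly join the pair of nodes minimizing the Q-criterion and replace it with a new internal node until the tree is resolved. This sketch returns only the topology as nested tuples; real implementations also compute branch lengths, and the distance matrix below is a hypothetical additive example:

```python
def neighbor_joining(D, names):
    """Minimal neighbor-joining sketch over a distance matrix.
    Returns the unrooted topology as nested tuples (no branch lengths)."""
    D = [list(row) for row in D]
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]        # row sums
        # Q-criterion: pick the pair minimizing (n-2)*d(i,j) - r_i - r_j
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # distances from the new internal node to every remaining node
        new_row = [(D[bi][k] + D[bj][k] - D[bi][bj]) / 2
                   for k in range(n) if k not in (bi, bj)]
        new_node = (nodes[bi], nodes[bj])
        # drop the joined rows/columns, then append the new node
        keep = [k for k in range(n) if k not in (bi, bj)]
        D = [[D[a][b] for b in keep] for a in keep]
        for idx, row in enumerate(D):
            row.append(new_row[idx])
        D.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    return (nodes[0], nodes[1])

# Additive toy distances on four taxa: A,B close together, C,D close together
names = ["A", "B", "C", "D"]
D = [[0, 3, 7, 8],
     [3, 0, 6, 7],
     [7, 6, 0, 3],
     [8, 7, 3, 0]]
tree = neighbor_joining(D, names)
```

On additive distances like these, neighbor-joining recovers the true topology, grouping A with B and C with D.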

The Performance of Phylogenetic Methods on Trees of Bounded Diameter [chapter]

Luay Nakhleh, Usman Roshan, Katherine St. John, Jerry Sun, Tandy Warnow
2001 Lecture Notes in Computer Science  
doi:10.1007/3-540-44696-6_17 fatcat:vgibqp67uvexdpb66sd2fe3zea

Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

Usman Roshan, Satish Chikkagoudar, Dennis R Livesay
2008 BMC Bioinformatics  
Contact: usman  ...  Background: The importance of RNA within cellular machinery and regulation is well established (1,2).  ...  The RNA-genome alignment benchmark, training benchmark, false positive datasets, and the modified Probalign program are available at  ...  However, to produce multiple alignments of  ... 
doi:10.1186/1471-2105-9-61 pmid:18226231 pmcid:PMC2248559 fatcat:f76xlgo5fvctti4ovnf2bpdvha

Laparoscopic cholecystectomy in a type Va Mirizzi syndrome patient

Fahad Yasin, Wasim Hayat Khan, Usman Ismat Butt, Muhammad Umar, Roshan Butt, Abid Klasra
Mirizzi syndrome is a rare syndrome caused by compression from gallstones, which may result in common bile duct (CBD) obstruction or fistula formation. It may sometimes present without any prior symptoms. It has been classified into five types by Csendes. An open surgical approach is usually recommended for the condition, especially for types III-V. We present the case of a patient who presented with right hypochondrial pain, was intra-operatively discovered to have type Va Mirizzi syndrome, and was managed successfully laparoscopically. Key words: Mirizzi syndrome, type Va, laparoscopy.
doi:10.47391/jpma.3775 fatcat:jqxw5hfu6ney7ox5qorv6gxuzi

On the transferability of adversarial examples between convex and 01 loss models [article]

Yunzhe Xue, Meiyan Xie, Usman Roshan
2020 arXiv   pre-print
The 01 loss gives different and more accurate boundaries than convex loss models in the presence of outliers. Could this difference of boundaries translate to adversarial examples that are non-transferable between 01 loss and convex models? We explore this empirically in this paper by studying the transferability of adversarial examples between linear 01 loss and convex (hinge) loss models, and between dual layer neural networks with sign activation and 01 loss vs. sigmoid activation and logistic loss. We first show that white box adversarial examples do not transfer effectively between convex and 01 loss models, or between 01 loss models, compared to between convex models. As a result of this non-transferability we see that convex substitute model black box attacks are less effective on 01 loss models than on convex models. Interestingly, we also see that 01 loss substitute model attacks are ineffective on both convex and 01 loss models, most likely due to the non-uniqueness of 01 loss models. We show intuitively by example how the presence of outliers can cause different decision boundaries between 01 and convex loss models, which in turn produces adversaries that are non-transferable. Indeed we see on MNIST that adversaries transfer between 01 loss and convex models more easily than on CIFAR10 and ImageNet, which are likely to contain outliers. We show intuitively by example how the non-continuity of the 01 loss makes adversaries non-transferable in a dual layer neural network. We discretize CIFAR10 features to be more like MNIST and find that this does not improve transferability, suggesting that different boundaries due to outliers are the more likely cause of non-transferability. As a result of this non-transferability we show that our dual layer sign activation network with 01 loss can attain robustness on par with simple convolutional networks.
arXiv:2006.07800v2 fatcat:gjqy6juvijhn7fqqyp5g7xathq