The Complexity of the Dirichlet Model for Multiple Alignment Data

Yi-Kuo Yu, Stephen F. Altschul
2011 Journal of Computational Biology  
We here derive, in the limit of large n and c, a closed-form expression for the complexity of the Dirichlet model applied to such data.  ...  Although our results are confined to the Dirichlet model, they may cast light as well on the complexity of Dirichlet mixture models, which have been applied fruitfully to the study of protein multiple  ...  John Spouge and Xugang Ye for helpful conversations. This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health.  ... 
doi:10.1089/cmb.2011.0039 pmid:21702692 pmcid:PMC3145953 fatcat:5wod5hfizfgqpj4ltfcdkofaiu

On the Inference of Dirichlet Mixture Priors for Protein Sequence Comparison

Xugang Ye, Yi-Kuo Yu, Stephen F. Altschul
2011 Journal of Computational Biology  
To apply the Minimum Description Length principle to the first question, we extend an analytic formula for the complexity of a Dirichlet model to Dirichlet mixtures by informal argument.  ...  Dirichlet mixtures provide an elegant formalism for constructing and evaluating protein multiple sequence alignments.  ...  ACKNOWLEDGMENTS This work was supported by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health.  ... 
doi:10.1089/cmb.2011.0040 pmid:21702690 pmcid:PMC3145951 fatcat:wnbe4derarbvdo333g762bs3ta

Using Dirichlet mixture priors to derive hidden Markov models for protein families

M Brown, R Hughey, A Krogh, I S Mian, K Sjölander, D Haussler
1993 Proceedings. International Conference on Intelligent Systems for Molecular Biology  
A Bayesian method for estimating the amino acid distributions in the states of a hidden Markov model (HMM) for a protein family or the columns of a multiple alignment of that family is introduced.  ...  This method uses Dirichlet mixture densities as priors over amino acid distributions. These mixture densities are determined from examination of previously constructed HMMs or multiple alignments.  ...  Acknowledgments We wish to thank James Bowie and his colleagues for providing us with an additional 18 distributions spanning a large variety of secondary structures.  ... 
pmid:7584370 fatcat:6gdt6fwlwfa5liv3b5l7y3tpmq

Topic Model Stability for Hierarchical Summarization

John Miller, Kathleen McCoy
2017 Proceedings of the Workshop on New Frontiers in Summarization  
To that end we developed a methodology for aligning multiple hierarchical structure topic models run over the same corpus under similar conditions, calculating a representative centroid model, and reporting  ...  stability of the centroid model.  ...  Test LL(x) shows the predictability of words on test data given the model fit to training data (corpus topics and compositions).  ... 
doi:10.18653/v1/w17-4509 dblp:conf/emnlp/MillerM17 fatcat:z4m7tkjzknavdgljx3urcylxca

A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart J. Russell
2002 Neural Information Processing Systems  
Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component  ...  We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way.  ...  We also propose a framework for decomposing the general motif model into a local alignment model for motif pattern and a global model for motif instance distribution, which allows complex models to be  ... 
dblp:conf/nips/XingJKR02 fatcat:u66zv4zvwbanleljeihts3gf74

Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology

Kimmen Sjölander, Kevin Karplus, Michael Brown, Richard Hughey, Anders Krogh, I.Saira Mian, David Haussler
1996 Bioinformatics  
We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions.  ...  This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures  ...  Richard Lathrop made numerous suggestions that improved the quality of the manuscript greatly, as did the anonymous referees.  ... 
doi:10.1093/bioinformatics/12.4.327 fatcat:2wwura4ut5az3j6paswxuohjte

The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment

Stephen F. Altschul, John C. Wootton, Elena Zaslavsky, Yi-Kuo Yu, Adam Siepel
2010 PLoS Computational Biology  
BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and  ...  Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form.  ...  Acknowledgments The authors thank Dr. Richa Agarwala for assistance in the benchmarking of Program 1 and other multiple alignment programs. Author Contributions  ... 
doi:10.1371/journal.pcbi.1000852 pmid:20657661 pmcid:PMC2904766 fatcat:j5axcknmanajhhglybtiun45hi

Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space

Viet-An Nguyen, Jordan Boyd-Graber, Stephen F. Altschul
2013 Journal of Computational Biology  
The resulting Dirichlet mixtures model multiple alignment data substantially better than do previously derived ones.  ...  The Dirichlet process is used to model probability distributions that are mixtures of an unknown number of components.  ...  One such model is the Dirichlet process (DP), which we apply to multiple alignment data.  ... 
doi:10.1089/cmb.2012.0244 pmid:23294268 pmcid:PMC3541698 fatcat:hyrcqj7r4nhpvms4e2245awpem

LOGOS: A MODULAR BAYESIAN MODEL FOR DE NOVO MOTIF DETECTION

ERIC P. XING, WEI WU, MICHAEL I. JORDAN, RICHARD M. KARP
2004 Journal of Bioinformatics and Computational Biology  
a Not to be confused with model-based motif scan, the task of searching known motifs based on given position weight matrices, as addressed by Frith et al. 10 and Huang et al. 15 c Heuristics are generally  ...  employed, such as throwing away overlapping sampled motifs (in the Gibbs sampler) or rescaling the joint posterior of x (in MEME), to enforce the non-overlapping constraint.  ...  Michael Eisen for helpful discussions on motif structures, and two anonymous reviewers for careful examination of the manuscript and for many valuable comments and suggestions.  ... 
doi:10.1142/s0219720004000508 pmid:15272436 fatcat:3w6isq5oqnaxfdagygptsw7iia

PhyloBayes MPI: Phylogenetic Reconstruction with Infinite Mixtures of Profiles in a Parallel Environment

Nicolas Lartillot, Nicolas Rodrigue, Daniel Stubbs, Jacques Richer
2013 Systematic Biology  
for the Dirichlet process prior.  ...  The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models.  ...  ACKNOWLEDGMENTS We wish to thank Éric Fournier, Bastien Boussau, Andrew Roger, Matthew Brown and Hervé Philippe for their extensive testing of the code, as well as Jeremy Brown and Leonardo Martins for  ... 
doi:10.1093/sysbio/syt022 pmid:23564032 fatcat:uminwpp6inhvvob6aiqegjy4g4

Bayesian Analysis of Partitioned Data [article]

Brian R. Moore, Jim McGuire, Fredrik Ronquist, John P. Huelsenbeck
2014 arXiv   pre-print
is estimated by integrating over all possible process partitions for the specified data subsets.  ...  Variation in the evolutionary process across the sites of nucleotide sequence alignments is well established, and is an increasingly pervasive feature of datasets composed of gene regions sampled from  ...  We also wish to thank Nicolas Rodrigue for offering an exceptionally helpful review, and for encouraging us to discuss the relationship between the DPP approach and finite mixture models.  ... 
arXiv:1409.0906v1 fatcat:nhfqs5dvlre6dlf6y2zggpazma

Cytosine Variant Calling with High-throughput Nanopore Sequencing [article]

Arthur C Rand, Miten Jain, Jordan Eizenga, Audrey Musselman-Brown, Hugh E Olsen, Mark Akeson, Benedict Paten
2016 bioRxiv   pre-print
The Oxford Nanopore MinION is a portable single-molecule DNA sequencer that can sequence long fragments of genomic DNA.  ...  We present a probabilistic method that enables expansion of the nucleotide alphabet to include bases containing chemical modifications.  ...  All the data was base-called using Metrichor (versions 1.15.0 and 1.19.0) . For this manuscript, we restricted all downstream analysis to pass 2D reads.  ... 
doi:10.1101/047134 fatcat:adfuoag4hzdwzdn2zosspklxwy

Bayesian Restoration of a Hidden Markov Chain with Applications to DNA Sequencing

GARY A. CHURCHILL, BETTY LAZAREVA
1999 Journal of Computational Biology  
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data.  ...  The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail.  ...  Variations on this model can easily be developed to allow for multiple occurrences (or absence) of the pattern in some of the sequences. A Gibbs sampling algorithm for the model of Lawrence et al.  ... 
doi:10.1089/cmb.1999.6.261 pmid:10421527 fatcat:75ihgbkthzddppuvb5pt3l7g3y

Efficient Word Alignment with Markov Chain Monte Carlo

Robert Östling, Jörg Tiedemann
2016 Prague Bulletin of Mathematical Linguistics  
Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency  ...  More generally we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others  ...  Acknowledgments Computational resources for this project were provided by CSC, the Finnish IT Center for Science.  ... 
doi:10.1515/pralin-2016-0013 fatcat:ouqsa6gysfbzxbeo342uksyicq

LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system

Renaud Vanhoutreve, Arnaud Kress, Baptiste Legrand, Hélène Gass, Olivier Poch, Julie D. Thompson
2016 BMC Bioinformatics  
A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference.  ...  Conclusions: LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies.  ...  Acknowledgements We would like to thank the members of the BISTRO and BICS Bioinformatics Platforms in Strasbourg for their support.  ... 
doi:10.1186/s12859-016-1146-y pmid:27387560 pmcid:PMC4936259 fatcat:d7rlg76umnhbhbzzx75rk37pum