Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis [article]

Yang Shi, Mengqiao Wang, Weiping Shi, Ji-Hyun Lee, Huining Kang and Hui Jiang
2018 arXiv   pre-print
We propose a general approach for accurately and efficiently calculating small p-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain  ...  Small p-values are often required to be accurately estimated in large scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical  ...  Maureen Sartor and Xiaoquan Wen (University of Michigan) for reading and helpful discussions on Section 2, which is part of his doctoral dissertation.  ... 
arXiv:1803.03373v2 fatcat:hlbtjqfh4ndf5hbk3d2aeytnxq

Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer

Dharmesh D. Bhuva, Joseph Cursons, Gordon K. Smyth, Melissa J. Davis
2019 Genome Biology  
This has motivated the move from co-expression to differential co-expression analysis and numerous methods have been developed subsequently to address this task; however, evaluation of methods and interpretation  ...  of the resulting networks has been hindered by the lack of known context-specific regulatory interactions.  ...  All methods were applied to the dataset with the same parameters as those used for simulated data. An adjusted p value threshold of 1 × 10⁻¹⁰ was applied to generate the DC network.  ... 
doi:10.1186/s13059-019-1851-8 pmid:31727119 pmcid:PMC6857226 fatcat:b7r56eln7fct5cg3vv5qckkixu

Information Theory in Living Systems, Methods, Applications, and Challenges

Robert A. Gatenby, B. Roy Frieden
2006 Bulletin of Mathematical Biology  
Initial biological applications of information theory (IT) used Shannon's methods to measure the information content in strings of monomers such as genes, RNA, and proteins.  ...  Insights into evolution may be gained by analysis of the fitness contributions from specific segments of genetic information as well as the optimization process in which the fitness are constrained  ...  The ability of cells to import energy and export entropy requires, among other things, accurate identification of atomic and molecular structures so that carbon chains can be imported and efficiently metabolized  ... 
doi:10.1007/s11538-006-9141-5 pmid:17083004 fatcat:rxeaq4kcmjfmlpcefuwexo72de

A reexamination of information theory-based methods for DNA-binding site identification

Ivan Erill, Michael C O'Neill
2009 BMC Bioinformatics  
Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial  ...  Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes.  ...  Acknowledgements The authors wish to thank Andrew Cameron and Rosie Redfield for kindly providing the sequences of CRP sites of H. influenzae.  ... 
doi:10.1186/1471-2105-10-57 pmid:19210776 pmcid:PMC2680408 fatcat:syrx7l2m6vdijlv5rbfv2cxnii

A Computational Method Including Protein Flexibility to Estimate Affinities with Small Ligands

Ariane Nunes-Alves, Guilherme M. Arantes
2014 Biophysical Journal  
We describe an efficient method to obtain highly accurate conformational free energies of biopolymers having arbitrary ratios of contour length L to persistence length P.  ...  Obtaining accurate values of the conformational free energy of macromolecular systems is one of the most challenging problems in computational chemistry and biology.  ... 
doi:10.1016/j.bpj.2013.11.2301 fatcat:obvuojxem5h6hnqzp2b7447v2e

Genomic prediction with the additive-dominant model by dimensionality reduction methods

Jaquicele Aparecida da Costa, Camila Ferreira Azevedo, Moysés Nascimento, Fabyano Fonseca e Silva, Marcos Deon Vilela de Resende, Ana Carolina Campana Nascimento
2020 Pesquisa Agropecuária Brasileira  
Abstract: The objective of this work was to evaluate the application of different dimensionality reduction methods in the additive-dominant model and to compare them with the genomic best linear unbiased  ...  However, none of the methodologies are able to recover true genomic heritabilities and all of them present biased estimates, under- or overestimating the genomic genetic values.  ...  The objective of this work was to evaluate the application of different dimensionality reduction methods in the additive-dominant model and to compare them with the G-BLUP method.  ... 
doi:10.1590/s1678-3921.pab2020.v55.01713 fatcat:33h3vzmau5a7vbaaktqyw3hkxq

A comprehensive survey on computational learning methods for analysis of gene expression data in genomics [article]

Nikita Bhandari, Rahee Walambe, Ketan Kotecha, Satyajeet Khare
2022 arXiv   pre-print
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine.  ...  We discuss the types of missing values and the methods and approaches usually employed in their imputation.  ...  An accurate estimation of missing values is an essential step for further analysis of microarray gene expression data.  ... 
arXiv:2202.02958v4 fatcat:uipvs7ribzdondwraf64n5mzf4

A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation

M. G. B. Blum, M. A. Nunes, D. Prangle, S. A. Sisson
2013 Statistical Science  
the observed data with minimal loss of information.  ...  We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.  ...  The dimension reduction methods are compared through the analysis of three challenging models and data sets.  ... 
doi:10.1214/12-sts406 fatcat:5jw7eozqyjdmxk2toiw5jfevgm

Change-Point Detection in Autoregressive Processes via the Cross-Entropy Method

Lijing Ma, Georgy Sofronov
2020 Algorithms  
In this paper, we develop a flexible method to estimate the unknown number and the locations of change-points in autoregressive time series.  ...  In order to find the optimal value of a performance function, which is based on the Minimum Description Length principle, we develop a Cross-Entropy algorithm for the combinatorial optimization problem  ...  In this paper, we apply the Cross-Entropy (CE) method with the MDL principle to identify the number of and locations of change-points.  ... 
doi:10.3390/a13050128 fatcat:hlj7irtytnchrhsczsc7knomd4

A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values

Pål V Johnsen, Signe Riemer-Sørensen, Andrew Thomas DeWan, Megan E Cahill, Mette Langaas
2021 BMC Bioinformatics  
A set of independent cross-validation runs are used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype.  ...  The identification of gene-gene and gene-environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of  ...  Availability of data and materials The research has been conducted using the UK Biobank Resource under Application Number 32285.  ... 
doi:10.1186/s12859-021-04041-7 pmid:33947323 pmcid:PMC8097909 fatcat:34hxhgfqovenvgdjf5c7r4manm

Fast Inference of Admixture Coefficients Using Sparse Non-negative Matrix Factorization Algorithms [article]

Eric Frichot, François Mathieu, Théo Trouillon, Guillaume Bouchard, Olivier François
2013 arXiv   pre-print
We implemented our method in the computer program sNMF, and applied it to human and plant genomic data sets.  ...  With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention.  ...  With the use of dense genomic data and increased sample sizes, reducing the time lag necessary to perform estimation is a major challenge in population genetic data analysis.  ... 
arXiv:1309.6208v1 fatcat:utimsf7pb5hp3klsohvh2nghte

Reverse Engineering Cellular Networks with Information Theoretic Methods

Alejandro Villaverde, John Ross, Julio Banga
2013 Cells  
nonlinear relations or feedback loops, and computational burden of dealing with large data sets.  ...  A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas.  ...  project "BioREDES" (PIE-201170E018), and the National Science Foundation grant CHE 0847073.  ... 
doi:10.3390/cells2020306 pmid:24709703 pmcid:PMC3972682 fatcat:f2hu6lcbgjfcxcubvxnaculjoq

Fast and Efficient Estimation of Individual Ancestry Coefficients

Eric Frichot, François Mathieu, Théo Trouillon, Guillaume Bouchard, Olivier François
2014 Genetics  
We implemented our method in the computer program sNMF, and applied it to human and plant data sets.  ...  With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention.  ...  With the use of dense genomic data and increased sample sizes, reducing the time lag necessary to perform estimation is a major challenge of population genetic data analysis.  ... 
doi:10.1534/genetics.113.160572 pmid:24496008 pmcid:PMC3982712 fatcat:kjxm5g6eenflrf43su3usf2iri

Supervised learning with decision tree-based methods in computational and systems biology

Pierre Geurts, Alexandre Irrthum, Louis Wehenkel
2009 Molecular Biosystems  
During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications  ...  Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of  ...  This paper presents research results of the Belgian Network BIOMAGNET (Bioinformatics and Modeling: from Genomes to Networks), funded by the Interuniversity Attraction Poles Programme, initiated by the  ... 
doi:10.1039/b907946g pmid:20023720 fatcat:25bpsowcznco5f6xs2cn73ke4u

Efficient n-gram analysis in R with cmscu

David W. Vinson, Jason K. Davis, Suzanne S. Sindi, Rick Dale
2016 Behavior Research Methods  
We end by highlighting the important use of new efficient tools to explore behavioral phenomena in large, relatively noisy data sets.  ...  We present a new R package, cmscu, which implements a Count-Min-Sketch with conservative updating (Cormode and Muthukrishnan Journal of Algorithms, 55(1), 58-75, 2005), and its application to n-gram analyses  ...  accuracy of specific Information-Theoretic models on estimating unseen data that vary in the length of n or the complexity of the algorithm can be determined by measuring its cross-entropy or more specifically  ... 
doi:10.3758/s13428-016-0766-5 pmid:27496173 fatcat:7skgmebau5gyxl2baponubpq2i