Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis
[article]
2018
arXiv
pre-print
We propose a general approach for accurately and efficiently calculating small p-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain ...
Small p-values are often required to be accurately estimated in large scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical ...
Maureen Sartor and Xiaoquan Wen (University of Michigan) for reading and helpful discussions on Section 2, which is part of his doctoral dissertation. ...
arXiv:1803.03373v2
fatcat:hlbtjqfh4ndf5hbk3d2aeytnxq
Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer
2019
Genome Biology
This has motivated the move from co-expression to differential co-expression analysis and numerous methods have been developed subsequently to address this task; however, evaluation of methods and interpretation ...
of the resulting networks has been hindered by the lack of known context-specific regulatory interactions. ...
All methods were applied to the dataset with the same parameters as those used for simulated data. An adjusted p value threshold of 1 × 10⁻¹⁰ was applied to generate the DC network. ...
doi:10.1186/s13059-019-1851-8
pmid:31727119
pmcid:PMC6857226
fatcat:b7r56eln7fct5cg3vv5qckkixu
Information Theory in Living Systems, Methods, Applications, and Challenges
2006
Bulletin of Mathematical Biology
Initial biological applications of information theory (IT) used Shannon's methods to measure the information content in strings of monomers such as genes, RNA, and proteins. ...
Insights into evolution may be gained by analysis of the fitness contributions from specific segments of genetic information as well as the optimization process in which the fitness are constrained ...
The ability of cells to import energy and export entropy requires, among other things, accurate identification of atomic and molecular structures so that carbon chains can be imported and efficiently metabolized ...
doi:10.1007/s11538-006-9141-5
pmid:17083004
fatcat:rxeaq4kcmjfmlpcefuwexo72de
A reexamination of information theory-based methods for DNA-binding site identification
2009
BMC Bioinformatics
Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial ...
Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. ...
Acknowledgements The authors wish to thank Andrew Cameron and Rosie Redfield for kindly providing the sequences of CRP sites of H. influenzae. ...
doi:10.1186/1471-2105-10-57
pmid:19210776
pmcid:PMC2680408
fatcat:syrx7l2m6vdijlv5rbfv2cxnii
A Computational Method Including Protein Flexibility to Estimate Affinities with Small Ligands
2014
Biophysical Journal
We describe an efficient method to obtain highly accurate conformational free energies of biopolymers having arbitrary ratios of contour length L to persistence length P. ...
Obtaining accurate values of the conformational free energy of macromolecular systems is one of the most challenging problems in computational chemistry and biology. ...
doi:10.1016/j.bpj.2013.11.2301
fatcat:obvuojxem5h6hnqzp2b7447v2e
Genomic prediction with the additive-dominant model by dimensionality reduction methods
2020
Pesquisa Agropecuária Brasileira
Abstract: The objective of this work was to evaluate the application of different dimensionality reduction methods in the additive-dominant model and to compare them with the genomic best linear unbiased ...
However, none of the methodologies are able to recover true genomic heritabilities and all of them present biased estimates, under- or overestimating the genomic genetic values. ...
The objective of this work was to evaluate the application of different dimensionality reduction methods in the additive-dominant model and to compare them with the G-BLUP method. ...
doi:10.1590/s1678-3921.pab2020.v55.01713
fatcat:33h3vzmau5a7vbaaktqyw3hkxq
A comprehensive survey on computational learning methods for analysis of gene expression data in genomics
[article]
2022
arXiv
pre-print
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. ...
We discuss the types of missing values and the methods and approaches usually employed in their imputation. ...
An accurate estimation of missing values is an essential step for further analysis of microarray gene expression data. ...
arXiv:2202.02958v4
fatcat:uipvs7ribzdondwraf64n5mzf4
A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation
2013
Statistical Science
the observed data with minimal loss of information. ...
We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets. ...
The dimension reduction methods are compared through the analysis of three challenging models and data sets. ...
doi:10.1214/12-sts406
fatcat:5jw7eozqyjdmxk2toiw5jfevgm
Change-Point Detection in Autoregressive Processes via the Cross-Entropy Method
2020
Algorithms
In this paper, we develop a flexible method to estimate the unknown number and the locations of change-points in autoregressive time series. ...
In order to find the optimal value of a performance function, which is based on the Minimum Description Length principle, we develop a Cross-Entropy algorithm for the combinatorial optimization problem ...
In this paper, we apply the Cross-Entropy (CE) method with the MDL principle to identify the number of and locations of change-points. ...
doi:10.3390/a13050128
fatcat:hlj7irtytnchrhsczsc7knomd4
A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values
2021
BMC Bioinformatics
A set of independent cross-validation runs are used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype. ...
The identification of gene-gene and gene-environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of ...
Availability of data and materials The research has been conducted using the UK Biobank Resource under Application Number 32285. ...
doi:10.1186/s12859-021-04041-7
pmid:33947323
pmcid:PMC8097909
fatcat:34hxhgfqovenvgdjf5c7r4manm
Fast Inference of Admixture Coefficients Using Sparse Non-negative Matrix Factorization Algorithms
[article]
2013
arXiv
pre-print
We implemented our method in the computer program sNMF, and applied it to human and plant genomic data sets. ...
With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. ...
With the use of dense genomic data and increased sample sizes, reducing the time lag necessary to perform estimation is a major challenge in population genetic data analysis. ...
arXiv:1309.6208v1
fatcat:utimsf7pb5hp3klsohvh2nghte
Reverse Engineering Cellular Networks with Information Theoretic Methods
2013
Cells
nonlinear relations or feedback loops, and computational burden of dealing with large data sets. ...
A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas. ...
project "BioREDES" (PIE-201170E018), and the National Science Foundation grant CHE 0847073. ...
doi:10.3390/cells2020306
pmid:24709703
pmcid:PMC3972682
fatcat:f2hu6lcbgjfcxcubvxnaculjoq
Fast and Efficient Estimation of Individual Ancestry Coefficients
2014
Genetics
We implemented our method in the computer program sNMF, and applied it to human and plant data sets. ...
With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. ...
With the use of dense genomic data and increased sample sizes, reducing the time lag necessary to perform estimation is a major challenge of population genetic data analysis. ...
doi:10.1534/genetics.113.160572
pmid:24496008
pmcid:PMC3982712
fatcat:kjxm5g6eenflrf43su3usf2iri
Supervised learning with decision tree-based methods in computational and systems biology
2009
Molecular Biosystems
During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications ...
Among supervised learning methods, decision tree-based methods stand out as non-parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of ...
This paper presents research results of the Belgian Network BIOMAGNET (Bioinformatics and Modeling: from Genomes to Networks), funded by the Interuniversity Attraction Poles Programme, initiated by the ...
doi:10.1039/b907946g
pmid:20023720
fatcat:25bpsowcznco5f6xs2cn73ke4u
Efficient n-gram analysis in R with cmscu
2016
Behavior Research Methods
We end by highlighting the important use of new efficient tools to explore behavioral phenomena in large, relatively noisy data sets. ...
We present a new R package, cmscu, which implements a Count-Min-Sketch with conservative updating (Cormode and Muthukrishnan Journal of Algorithms, 55(1), 58-75, 2005), and its application to n-gram analyses ...
accuracy of specific Information-Theoretic models on estimating unseen data that vary in the length of n or the complexity of the algorithm can be determined by measuring its cross-entropy or more specifically ...
doi:10.3758/s13428-016-0766-5
pmid:27496173
fatcat:7skgmebau5gyxl2baponubpq2i