Identifying Statistical Bias in Dataset Replication [article]

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry
2020 arXiv   pre-print
In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations.  ...  We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication.  ...  Acknowledgements We thank Will Fithian for discussions and advice, particularly around the spline modeling done in Section 5.  ... 
arXiv:2005.09619v2 fatcat:tuq3yuqy3vez5knc5k74nc3uii

A hierarchical statistical modeling approach to analyze proteomic isobaric tag for relative and absolute quantitation data

Cong Zhou, Michael J. Walker, Andrew J. K. Williamson, Andrew Pierce, Carlo Berzuini, Caroline Dive, Anthony D. Whetton
2013 Computer applications in the biosciences : CABIOS  
Further validated using a biological iTRAQ dataset including multiple biological replicates from varied murine cell lines, WHATraq performed consistently and identified 375% more proteins as being differentially  ...  Technical variation inherent in this iTRAQ dataset was systematically investigated.  ...  Statistical modeling of variation present in iTRAQ datasets In general terms, the purpose of modeling an iTRAQ dataset was to estimate the technical variation inherent in the dataset so that the quality  ... 
doi:10.1093/bioinformatics/btt722 pmid:24344193 fatcat:4fwyaozczjauldk6lyzghbmpcq

Navigating sample overlap, winner's curse and weak instrument bias in Mendelian randomization studies using the UK Biobank [article]

Ildar I Sadreev, Benjamin L Elsworth, Ruth E Mitchell, Lavinia Paternoster, Eleanor Sanderson, Neil M Davies, Louise AC Millard, George Davey Smith, Philip C Haycock, Jack Bowden, Tom R Gaunt, Gibran Hemani
2021 medRxiv   pre-print
We designed a process of pseudo-replication within the UK Biobank data to generate GWAS estimates that minimise bias in MR studies using these data.  ...  Abstract: We performed GWAS on 2514 complex traits from the UK Biobank using a linear mixed model, identifying 40,620 independent significant associations (p < 5×10⁻⁸).  ...  JB is supported by an Expanding Excellence in England (E3) research grant awarded to the University of Exeter.  ...
doi:10.1101/2021.06.28.21259622 fatcat:ivo3hbeh45ac3f4ropfqoxum64

Fully synthetic neuroimaging data for replication and exploration

Kenneth I. Vaden, Mulugeta Gebregziabher, Dyslexia Data Consortium, Mark A. Eckert
2020 NeuroImage  
The magnitude and spatial distribution of gray matter effects in the observed imaging data were replicated in the pooled results from the synthetic datasets.  ...  The synthetic predictor tables demonstrated pooled variance and statistical estimates that closely approximated the observed data, as reflected in measures of efficiency and statistical bias.  ...  in a facility constructed with support from Research Facilities Improvement Program (C06 RR 014516) from the NIH / National Center for Research Resources.  ... 
doi:10.1016/j.neuroimage.2020.117284 pmid:32828925 pmcid:PMC7688496 fatcat:pjfjbfrmu5burlzvcliijkr7w4

Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression

Sébastien Lemieux
2006 BMC Bioinformatics  
This accuracy was achieved both using the three replicates per conditions available in the dataset and using only one replicate per condition.  ...  Conclusion: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition.  ...  Halfon for providing the Choe et al. dataset. This work was supported by a NSERC discovery grant to SL and the Institute for Research in Immunology and Cancer at the Université de Montréal.  ... 
doi:10.1186/1471-2105-7-391 fatcat:z4uc2i4wlbgobmvsvjqhruqxci

qSVA framework for RNA quality correction in differential expression analysis

Andrew E. Jaffe, Ran Tao, Alexis L. Norris, Marc Kealhofer, Abhinav Nellore, Joo Heon Shin, Dewey Kim, Yankai Jia, Thomas M. Hyde, Joel E. Kleinman, Richard E. Straub, Jeffrey T. Leek (+1 others)
2017 Proceedings of the National Academy of Sciences of the United States of America  
In this article, we describe an analysis framework for identifying and removing previously uncharacterized quality biases in measurements of RNA.  ...  We show that this approach results in greatly improved replication rates (>3×) across two large independent postmortem human brain studies of schizophrenia and also removes potential RNA quality biases  ...  Data deposition: The sequences reported in this paper have been deposited with the National Center for Biotechnology Information (NCBI BioProject number PRJNA389171 and NCBI SRA project SRP108559).  ... 
doi:10.1073/pnas.1617384114 pmid:28634288 pmcid:PMC5502589 fatcat:fvh5xzyxvrf6xlg3kux3ojte2a

Consistent Reanalysis of Genome-wide Imprinting Studies in Plants Using Generalized Linear Models Increases Concordance across Datasets [article]

Stefan Wyder, Michael T Raissig, Ueli Grossniklaus
2017 bioRxiv   pre-print
We show that our statistical pipeline outperforms other methods in identifying imprinted genes in simulated and real data.  ...  Here, we adopt a statistical approach, frequently used in RNA-seq data analysis, which properly models count overdispersion and considers replicate information of reciprocal crosses.  ...  There are several ways to prioritize a gene list after statistically calling genes with an allelic bias in a given tissue.  ... 
doi:10.1101/180745 fatcat:e22iszt6rnhippued6fg6z33ee

Internal replication of computational workflows in scientific research

Jade Benjamin-Chung, John M. Colford, Jr., Andrew Mertens, Alan E. Hubbard, Benjamin F. Arnold
2020 Gates Open Research  
External replication of published findings by outside investigators has emerged as a method to detect errors and bias in the published literature.  ...  Here we summarize the rationale and best practices for internal replication, a process in which multiple independent data analysts replicate an analysis and correct errors prior to publication.  ...  in each dataset and same values in each column).  ... 
doi:10.12688/gatesopenres.13108.1 pmid:32803129 pmcid:PMC7403855 fatcat:jzi2s27tgvhovanpf7uq6clpta

RECAP reveals the true statistical significance of ChIP-seq peak calls

Justin G. Chitpin, Aseel Awdeh, Theodore J Perkins, Inanc Birol
2019 Bioinformatics  
A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified.  ...  classical statistical hypothesis testing.  ...  This work was supported in part by Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btz150 pmid:30824903 pmcid:PMC6761936 fatcat:okx6zzvpm5ggblvap7d7p2mnjm

Bayesian integrated modeling of expression data: a case study on RhoG

Rashi Gupta, Dario Greco, Petri Auvinen, Elja Arjas
2010 BMC Bioinformatics  
The model was applied and tested on two different datasets.  ...  , which properly accounts for various sources of potential error in the process.  ...  Assessing dye bias Dataset-2 was used to assess the dye-biasness (β_i) as it has three replicates of which one has dye orientation reversed.  ...
doi:10.1186/1471-2105-11-295 pmid:20515463 pmcid:PMC2894040 fatcat:3sjoywa42vexfjpvr2ca4hhuku

Internal replication of computational workflows in scientific research

Jade Benjamin-Chung, John M. Colford, Jr., Andrew Mertens, Alan E. Hubbard, Benjamin F. Arnold
2020 Gates Open Research  
External replication of published findings by outside investigators has emerged as a method to detect errors and bias in the published literature.  ...  Here we summarize the rationale and best practices for internal replication, a process in which multiple independent data analysts replicate an analysis and correct errors prior to publication.  ...  Thus, we believe that internal replication is more likely to identify failures to replicate results due to coding errors and biases than pair programming." Details in order of appearance: 1.  ... 
doi:10.12688/gatesopenres.13108.2 fatcat:4mn3nl2fcvbc7ncyewwrmnsu2a

Confronting false discoveries in single-cell differential expression [article]

Jordan W. Squair, Matthieu Gautier, Claudi Kathe, Mark A. Anderson, Nicholas D. James, Thomas H. Hutson, Rémi Hudelle, Taha Qaiser, Kaya J.E. Matson, Quentin Barraud, Ariel J. Levine, Gioele La Manno (+2 others)
2021 bioRxiv   pre-print
While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear.  ...  Methods that ignore this inevitable variation are biased and prone to false discoveries.  ...  To address this concern comprehensively, we identified a total of fourteen datasets that included at least six replicates in the control condition.  ... 
doi:10.1101/2021.03.12.435024 fatcat:uu5kny5p6zes3i53uakxyqpjwy

Comprehensive Field Synopsis and Systematic Meta-analyses of Genetic Association Studies in Cutaneous Melanoma

F. Chatzinasiou, C. M. Lill, K. Kypreou, I. Stefanaki, V. Nicolaou, G. Spyrou, E. Evangelou, J. T. Roehr, E. Kodela, A. Katsambas, H. Tsao, J. P. A. Ioannidis (+2 others)
2011 Journal of the National Cancer Institute  
The main reasons for low grades in the last criterion (protection from bias) were the presence of a summary odds ratio less than 1.15 that can easily be dissipated even by relatively small biases in a  ...  CM-GWAS and/or GWAS-replication datasets were available (27) (28) (29) .  ... 
doi:10.1093/jnci/djr219 pmid:21693730 pmcid:PMC4719704 fatcat:mfpqwiotzjevnld245tolnlisq

Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression

Michele A. Busby, Chip Stewart, Chase A. Miller, Krzysztof R. Grzeda, Gabor T. Marth
2013 Computer applications in the biosciences : CABIOS  
ACKNOWLEDGEMENTS We thank our system administrator Tony Schreiner for substantial help in deploying Scotty on the web and Kourosh Zarringhalam, Patricia Smith, and Igor Lasic for helpful suggestions.  ...  many replicates, are too expensive, do not have sufficient power or result in datasets where large subsets of genes are measured with a high measurement bias.  ...  Further, measuring a large fraction of the genes with low read counts can produce a dataset that is biased against identifying differentially expressed genes with low read counts because these genes will  ... 
doi:10.1093/bioinformatics/btt015 pmid:23314327 pmcid:PMC3582267 fatcat:p4tygpyg3ngudomueihcctsfhm

How biased is the sample? Reverse engineering the ranking algorithm of Facebook's Graph application programming interface

Justin Chun-Ting Ho
2020 Big Data & Society  
assess the bias caused by the new limitation.  ...  Top-term analysis reveals that there are significant differences in the most prominent terms between the full and partial dataset.  ...  The summary of this paper was published in the Proceedings of the 30th ACM Conference on Hypertext and Social Media in Hof, Germany, 2019.  ... 
doi:10.1177/2053951720905874 fatcat:n2qo3kgcpfbr3jku5qapevawya