Identifying Statistical Bias in Dataset Replication
[article]
2020
arXiv
pre-print
In this work, we present unintuitive yet significant ways in which standard approaches to dataset replication introduce statistical bias, skewing the resulting observations. ...
We conclude with concrete recommendations for recognizing and avoiding bias in dataset replication. ...
Acknowledgements: We thank Will Fithian for discussions and advice, particularly around the spline modeling done in Section 5. ...
arXiv:2005.09619v2
fatcat:tuq3yuqy3vez5knc5k74nc3uii
A hierarchical statistical modeling approach to analyze proteomic isobaric tag for relative and absolute quantitation data
2013
Computer applications in the biosciences : CABIOS
Further validated using a biological iTRAQ dataset including multiple biological replicates from varied murine cell lines, WHATraq performed consistently and identified 375% more proteins as being differentially ...
Technical variation inherent in this iTRAQ dataset was systematically investigated. ...
Statistical modeling of variation present in iTRAQ datasets: In general terms, the purpose of modeling an iTRAQ dataset was to estimate the technical variation inherent in the dataset so that the quality ...
doi:10.1093/bioinformatics/btt722
pmid:24344193
fatcat:4fwyaozczjauldk6lyzghbmpcq
Navigating sample overlap, winner's curse and weak instrument bias in Mendelian randomization studies using the UK Biobank
[article]
2021
medRxiv
pre-print
We designed a process of pseudo-replication within the UK Biobank data to generate GWAS estimates that minimise bias in MR studies using these data. ...
Abstract: We performed GWAS on 2514 complex traits from the UK Biobank using a linear mixed model, identifying 40,620 independent significant associations (p < 5×10⁻⁸). ...
JB is supported by an Expanding Excellence in England (E3) research grant awarded to the University of Exeter. ...
doi:10.1101/2021.06.28.21259622
fatcat:ivo3hbeh45ac3f4ropfqoxum64
Fully synthetic neuroimaging data for replication and exploration
2020
NeuroImage
The magnitude and spatial distribution of gray matter effects in the observed imaging data were replicated in the pooled results from the synthetic datasets. ...
The synthetic predictor tables demonstrated pooled variance and statistical estimates that closely approximated the observed data, as reflected in measures of efficiency and statistical bias. ...
in a facility constructed with support from Research Facilities Improvement Program (C06 RR 014516) from the NIH / National Center for Research Resources. ...
doi:10.1016/j.neuroimage.2020.117284
pmid:32828925
pmcid:PMC7688496
fatcat:pjfjbfrmu5burlzvcliijkr7w4
Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression
2006
BMC Bioinformatics
This accuracy was achieved both using the three replicates per conditions available in the dataset and using only one replicate per condition. ...
Conclusion: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition. ...
Halfon for providing the Choe et al. dataset. This work was supported by a NSERC discovery grant to SL and the Institute for Research in Immunology and Cancer at the Université de Montréal. ...
doi:10.1186/1471-2105-7-391
fatcat:z4uc2i4wlbgobmvsvjqhruqxci
qSVA framework for RNA quality correction in differential expression analysis
2017
Proceedings of the National Academy of Sciences of the United States of America
In this article, we describe an analysis framework for identifying and removing previously uncharacterized quality biases in measurements of RNA. ...
We show that this approach results in greatly improved replication rates (>3×) across two large independent postmortem human brain studies of schizophrenia and also removes potential RNA quality biases ...
Data deposition: The sequences reported in this paper have been deposited with the National Center for Biotechnology Information (NCBI BioProject number PRJNA389171 and NCBI SRA project SRP108559). ...
doi:10.1073/pnas.1617384114
pmid:28634288
pmcid:PMC5502589
fatcat:fvh5xzyxvrf6xlg3kux3ojte2a
Consistent Reanalysis of Genome-wide Imprinting Studies in Plants Using Generalized Linear Models Increases Concordance across Datasets
[article]
2017
bioRxiv
pre-print
We show that our statistical pipeline outperforms other methods in identifying imprinted genes in simulated and real data. ...
Here, we adopt a statistical approach, frequently used in RNA-seq data analysis, which properly models count overdispersion and considers replicate information of reciprocal crosses. ...
There are several ways to prioritize a gene list after statistically calling genes with an allelic bias in a given tissue. ...
doi:10.1101/180745
fatcat:e22iszt6rnhippued6fg6z33ee
Internal replication of computational workflows in scientific research
2020
Gates Open Research
External replication of published findings by outside investigators has emerged as a method to detect errors and bias in the published literature. ...
Here we summarize the rationale and best practices for internal replication, a process in which multiple independent data analysts replicate an analysis and correct errors prior to publication. ...
in each dataset and same values in each column). ...
doi:10.12688/gatesopenres.13108.1
pmid:32803129
pmcid:PMC7403855
fatcat:jzi2s27tgvhovanpf7uq6clpta
RECAP reveals the true statistical significance of ChIP-seq peak calls
2019
Bioinformatics
A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. ...
classical statistical hypothesis testing. ...
This work was supported in part by Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). Conflict of Interest: none declared. ...
doi:10.1093/bioinformatics/btz150
pmid:30824903
pmcid:PMC6761936
fatcat:okx6zzvpm5ggblvap7d7p2mnjm
Bayesian integrated modeling of expression data: a case study on RhoG
2010
BMC Bioinformatics
The model was applied and tested on two different datasets. ...
, which properly accounts for various sources of potential error in the process. ...
Assessing dye bias: Dataset-2 was used to assess the dye-biasness (β_i) as it has three replicates of which one has dye orientation reversed. ...
doi:10.1186/1471-2105-11-295
pmid:20515463
pmcid:PMC2894040
fatcat:3sjoywa42vexfjpvr2ca4hhuku
Internal replication of computational workflows in scientific research
2020
Gates Open Research
External replication of published findings by outside investigators has emerged as a method to detect errors and bias in the published literature. ...
Here we summarize the rationale and best practices for internal replication, a process in which multiple independent data analysts replicate an analysis and correct errors prior to publication. ...
Thus, we believe that internal replication is more likely to identify failures to replicate results due to coding errors and biases than pair programming." Details in order of appearance: 1. ...
doi:10.12688/gatesopenres.13108.2
fatcat:4mn3nl2fcvbc7ncyewwrmnsu2a
Confronting false discoveries in single-cell differential expression
[article]
2021
bioRxiv
pre-print
While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. ...
Methods that ignore this inevitable variation are biased and prone to false discoveries. ...
To address this concern comprehensively, we identified a total of fourteen datasets that included at least six replicates in the control condition. ...
doi:10.1101/2021.03.12.435024
fatcat:uu5kny5p6zes3i53uakxyqpjwy
Comprehensive Field Synopsis and Systematic Meta-analyses of Genetic Association Studies in Cutaneous Melanoma
2011
Journal of the National Cancer Institute
The main reasons for low grades in the last criterion (protection from bias) were the presence of a summary odds ratio less than 1.15 that can easily be dissipated even by relatively small biases in a ...
CM-GWAS and/or GWAS-replication datasets were available (27-29). ...
doi:10.1093/jnci/djr219
pmid:21693730
pmcid:PMC4719704
fatcat:mfpqwiotzjevnld245tolnlisq
Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression
2013
Computer applications in the biosciences : CABIOS
Acknowledgements: We thank our system administrator Tony Schreiner for substantial help in deploying Scotty on the web and Kourosh Zarringhalam, Patricia Smith, and Igor Lasic for helpful suggestions. ...
many replicates, are too expensive, do not have sufficient power or result in datasets where large subsets of genes are measured with a high measurement bias. ...
Further, measuring a large fraction of the genes with low read counts can produce a dataset that is biased against identifying differentially expressed genes with low read counts because these genes will ...
doi:10.1093/bioinformatics/btt015
pmid:23314327
pmcid:PMC3582267
fatcat:p4tygpyg3ngudomueihcctsfhm
How biased is the sample? Reverse engineering the ranking algorithm of Facebook's Graph application programming interface
2020
Big Data & Society
assess the bias caused by the new limitation. ...
Top-term analysis reveals that there are significant differences in the most prominent terms between the full and partial dataset. ...
The summary of this paper was published in the Proceedings of the 30th ACM Conference on Hypertext and Social Media in Hof, Germany, 2019. ...
doi:10.1177/2053951720905874
fatcat:n2qo3kgcpfbr3jku5qapevawya