Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation

Julián Urbano
2015 Information retrieval (Boston)  
Third, through large-scale simulation from TREC data, we analyze the bias of a range of estimators of test collection accuracy.  ...  Second, we detail a method for stochastic simulation of evaluation results under different statistical assumptions, which can be used for a variety of evaluation research where we need to know the true  ...  Acknowledgements This work was supported by an A4U postdoctoral grant, a Juan de la Cierva postdoctoral fellowship and the Spanish Government (HAR2011-27540).  ... 
doi:10.1007/s10791-015-9274-y fatcat:t2ukbg7ekbhrdhskvkpikfsyxu

Stochastic Simulation of Test Collections

Julián Urbano, Thomas Nagler
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
To overcome these limitations, we propose a method based on vine copulas for stochastic simulation of evaluation results where the true system distributions are known upfront.  ...  In the basic use case, it takes the scores from an existing collection to build a semi-parametric model representing the set of systems and the population of topics, which can then be used to make realistic  ...  ACKNOWLEDGMENTS This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. JU dedicates this work to the brave people of Namek.  ... 
doi:10.1145/3209978.3210043 dblp:conf/sigir/UrbanoN18 fatcat:5rfrgu3ss5dspazsoumdxt7g4y

Collective Risk Minimization via a Bayesian Model for Statistical Software Testing [article]

Joachim Haensel, Christian M. Adriano, Johannes Dyck, Holger Giese
2020 arXiv   pre-print
We studied this problem from the perspective of reliability engineering in which a given risk of an accident has severity and probability of occurring.  ...  Our contributions comprise (1) a set of strategies to monitor the operational data of multiple autonomous vehicles, (2) a Bayesian model that estimates changes in the risk of accidents, and (3) a feedback  ...  Rare events were also obtained via simulation and used to test autonomous vehicles for scalability [43].  ... 
arXiv:2005.07460v1 fatcat:dubxpmatk5hbtc7tvralnmg7ta

Reliability Assessment and Safety Arguments for Machine Learning Components in Assuring Learning-Enabled Autonomous Systems [article]

Xingyu Zhao, Wei Huang, Vibhav Bharti, Yi Dong, Victoria Cox, Alec Banks, Sen Wang, Sven Schewe, Xiaowei Huang
2021 arXiv   pre-print
Typically assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling  ...  We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose practical solutions.  ...  for Certification of Assets [EP/R026173/1, EP/W001136/1] and End-to-End Conceptual Guarding of Neural Architectures [EP/T026995/1]).  ... 
arXiv:2112.00646v1 fatcat:ggujgl5exbh7rck4sxavdoyziu

Reliable Off-policy Evaluation for Reinforcement Learning [article]

Jie Wang, Rui Gao, Hongyuan Zha
2021 arXiv   pre-print
non-asymptotic and asymptotic guarantees under stochastic or adversarial environments.  ...  In this paper, we propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged trajectories data.  ...  Motivated by these problems, the main goal of this paper is to develop reliable confidence interval (CI) estimates for OPE with provable statistical guarantees.  ... 
arXiv:2011.04102v2 fatcat:hd4mbmxxtjd65iprydrf2xygxy

An inquiry into the reliability of window operation models in building performance simulation

Farhang Tahmasebi, Ardeshir Mahdavi
2016 Building and Environment  
To address this issue, the present study deploys long-term monitored data from an office area and its calibrated simulation model to conduct an external evaluation of a number of stochastic and non-stochastic  ...  window operation models in view of their a) potential in predicting occupants' operation of windows, and b) effectiveness to enhance the reliability of building performance simulation efforts.  ...  Conclusions We studied a number of stochastic and non-stochastic window operation models to evaluate their predictive performance and their effectiveness to enhance the reliability of common building performance  ... 
doi:10.1016/j.buildenv.2016.06.013 fatcat:x55xkehnuvdnxgzoadbojr2tay

How Reliable are Bootstrap-based Heteroskedasticity Robust Tests? [article]

Benedikt M. Pötscher, David Preinerstorfer
2021 arXiv   pre-print
This allows us to assess the reliability of a large variety of wild bootstrap-based tests in an extensive numerical study.  ...  We develop theoretical finite-sample results concerning the size of wild bootstrap-based heteroskedasticity robust tests in linear regression models.  ...  While these studies provide helpful information, there is, as always with simulation studies, an issue to what extent conclusions of such a study generalize.  ... 
arXiv:2005.04089v2 fatcat:m3sanxsgefgmbcjj3uuqaoh2uy

Assessing the reliability of species distribution projections in climate change research

Luca Santini, Ana Benítez‐López, Luigi Maiorano, Mirza Čengić, Mark A. J. Huijbregts, Yoan Fourcade
2021 Diversity and Distributions: A journal of biological invasions and biodiversity  
In this study, we provide an overview of common modelling practices in the field and assess the reliability of model predictions using a virtual species approach. Location: Global.  ...  A robust validation by spatially independent samples is required, but does not rule out inflation of model accuracy by assumption violation.  ...  ; Fourcade et al., 2018; Vale et al., 2014; Wenger & Olden, 2012), to our knowledge, no study has yet tested the reliability of both present and future predictions while considering the effects of different  ... 
doi:10.1111/ddi.13252 fatcat:5dca4qiazjfkffjvwozcgfotdy

Formal models of source reliability

Christoph Merdes, Momme von Sydow, Ulrike Hahn
2020 Synthese  
All are Bayesian models seeking to provide normative guidance, yet they differ subtly in assumptions and resulting behavior.  ...  The paper introduces, compares and contrasts formal models of source reliability proposed in the epistemology literature, in particular the prominent models of Bovens and Hartmann (Bayesian epistemology  ...  This project was funded by the Humboldt Foundation's "Anneliese Meier Research Award" to Ulrike Hahn.  ... 
doi:10.1007/s11229-020-02595-2 fatcat:rohc5flcgzgspkez626dze5hxq

Implementation of Fog computing for reliable E-health applications

Razvan Craciunescu, Albena Mihovska, Mihail Mihaylov, Sofoklis Kyriazakos, Ramjee Prasad, Simona Halunga
2015 2015 49th Asilomar Conference on Signals, Systems and Computers  
The final evaluation is done with computer simulation of a multi-cell system.  ...  An important aspect is robust and resource efficient preamble design to minimize missed detection and false alarm probabilities of service requests.  ...  and bias a community.  ... 
doi:10.1109/acssc.2015.7421170 dblp:conf/acssc/CraciunescuMMKP15 fatcat:qm6mki5z6bcvrfimkmqjyrxaxm

Fast and Reliable Primary Frequency Reserves From Refrigerators with Decentralized Stochastic Control [article]

Evangelos Vrettos, Charalampos Ziras, Göran Andersson
2016 arXiv   pre-print
In addition, we propose a procedure to dynamically reset the thermostat temperature limits in order to provide reliable PFC reserves, as well as a corrective temperature feedback loop to build robustness  ...  to biased frequency deviations.  ...  In the future, we plan to investigate the controller's robustness to excessive compressor locking, and perform dynamic simulation studies in a two-area power system model.  ... 
arXiv:1610.00953v1 fatcat:retiv2wjfngrzf3avpo2wc2pgu

A survey of design methods for failure detection in dynamic systems

1977 Microelectronics and reliability  
The methods surveyed range from the design of specific failure-sensitive filters, to the use of statistical tests on filter innovations, to the development of jump process formulations.  ...  In this paper we survey a number of methods for the detection of abrupt changes (such as failures) in stochastic dynamical systems.  ...  Acknowledgements: The author is in the debt of many colleagues for their comments during numerous discussions on the subject of failure detection. In particular, special thanks must go to Dr. R. C.  ... 
doi:10.1016/0026-2714(77)90326-2 fatcat:y7dr2nmoavhjhdbmxl6ppasxa4

The latent state hazard model, with application to wind turbine reliability

Ramin Moghaddass, Cynthia Rudin
2015 Annals of Applied Statistics  
The ability to identify the underlying latent state can help better understand the effects of external sources and thus lead to more robust decision-making.  ...  We present a new model for reliability analysis that is able to distinguish the latent internal vulnerability state of the equipment from the vulnerability caused by temporary external sources.  ...  We would also like to thank Şeyda Ertekin and Ken Cohn for helpful discussions. APPENDIX A: PROOF OF PROPOSITION 1.  ... 
doi:10.1214/15-aoas859 fatcat:ycfnjkknejdcloqpo4mmp2wzde

Akaike-type criteria and the reliability of inference: Model selection versus statistical model specification

Aris Spanos
2010 Journal of Econometrics  
so as to render the particular data a truly typical realization of the stochastic process underlying the model in question.  ...  The paper argues for a return to the original statistical model specification problem, as envisaged by Fisher (1922), where the task is understood as one of selecting a statistical model in such a way  ...  and pre-test bias The question sometimes raised is whether the above error statistical strategies of M-S testing and respecification are vulnerable to the charge of pre-test bias.  ... 
doi:10.1016/j.jeconom.2010.01.011 fatcat:pmpkf7ypffcztpdcys77o6dh4e

Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation

Désirée Baumann, Knut Baumann
2014 Journal of Cheminformatics  
For the simulated data, a bias-variance decomposition is provided.  ...  As compared to a single test set, double cross-validation provided a more realistic picture of model quality and should be preferred over a single test set.  ...  Similar to the theoretical simulation study, the additional 'oracle' data set was used as a large and truly independent test set in order to investigate the validity and performance of double cross-validation  ... 
doi:10.1186/s13321-014-0047-1 pmid:25506400 pmcid:PMC4260165 fatcat:eojtn3ezlncplopwlifxfph42q