Stochastic Simulation of Test Collections

Julián Urbano, Thomas Nagler
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
To overcome these limitations, we propose a method based on vine copulas for stochastic simulation of evaluation results where the true system distributions are known upfront.  ...  simulations of the scores by the same systems but on random new topics.  ...  ACKNOWLEDGMENTS This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. JU dedicates this work to the brave people of Namek.  ... 
doi:10.1145/3209978.3210043 dblp:conf/sigir/UrbanoN18 fatcat:5rfrgu3ss5dspazsoumdxt7g4y

Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation

Julián Urbano
2015 Information retrieval (Boston)  
Third, through large-scale simulation from TREC data, we analyze the bias of a range of estimators of test collection accuracy.  ...  Second, we detail a method for stochastic simulation of evaluation results under different statistical assumptions, which can be used for a variety of evaluation research where we need to know the true  ...  In Sect. 5 we propose the algorithm for stochastic simulation of evaluation results.  ... 
doi:10.1007/s10791-015-9274-y fatcat:t2ukbg7ekbhrdhskvkpikfsyxu

Evaluating the Equal-Interval Hypothesis with Test Score Scales

Ben Domingue
2013 Psychometrika  
Domingue, Benjamin Webre (Ph.D., Research and Evaluation Methodology) Evaluating the Equal-Interval Hypothesis with Test Score Scales Thesis directed by Dr.  ...  In this dissertation, an improved version of Karabatsos's methodology is applied to simulated and empirical data to test whether such data are consistent with the axioms.  ...  While it would seem like the minority students have grown more than White students over that time period, any kind of quantification of this growth requires that the NAEP scale scores be on an interval  ... 
doi:10.1007/s11336-013-9342-4 pmid:24532164 fatcat:lsgv3ine3nfs3ipo5k2ajzmx2i

Proper scoring rules for evaluating asymmetry in density forecasting [article]

Matteo Iacopini and Francesco Ravazzolo and Luca Rossini
2020 arXiv   pre-print
Then, the proposed score and test are applied to assess and compare density forecasts of macroeconomic relevant datasets (US employment growth) and of commodity prices (oil and electricity prices) with  ...  This paper proposes a novel asymmetric continuous probabilistic score (ACPS) for evaluating and comparing density forecasts.  ...  scoring rules for evaluating density forecasts.  ... 
arXiv:2006.11265v2 fatcat:wkjzr6ymkjdttkorhdg3xqswjq

Using score distributions to compare statistical significance tests for information retrieval evaluation

Javier Parapar, David E. Losada, Manuel A. Presedo‐Quindimil, Alvaro Barreiro
2019 Journal of the Association for Information Science and Technology  
Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests.  ...  This new method for studying the power of significance tests in Information Retrieval evaluation is formal and innovative.  ...  Acknowledgements This work has received financial support from the i) "Ministerio de Economía y Competitividad" of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R, ii  ... 
doi:10.1002/asi.24203 fatcat:jgohta6wmvfhbm3owh4zdoq42u

An Empirical Evaluation of Sketched SVD and its Application to Leverage Score Ordering [article]

Hui Han Chin, Paul Pu Liang
2018 arXiv   pre-print
We provide a comprehensive empirical evaluation of these algorithms and provide guidelines on how to ensure accurate deployment to real-world data.  ...  Our technique is based on the distributed computation of leverage scores using random projections.  ...  The optimization of deep neural networks is largely based on stochastic gradient methods.  ... 
arXiv:1812.07903v1 fatcat:uai3p2q3zfgaznagbvznyjaxua

Studying Summarization Evaluation Metrics in the Appropriate Scoring Range

Maxime Peyrard
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
This is a call for collecting human judgments for high-scoring summaries as this would resolve the debate over which metrics to trust.  ...  We show that, surprisingly, evaluation metrics which behave similarly on these datasets (average-scoring range) strongly disagree in the higher-scoring range in which current systems now operate.  ...  Acknowledgements This work was partly supported by the German Research Foundation (DFG) as part of the Research Training Group "Adaptive Preparation of Information from Heterogeneous Sources" (AIPHES)  ... 
doi:10.18653/v1/p19-1502 dblp:conf/acl/Peyrard19a fatcat:t35x2mb3ejbp5bx5cqps3njla4

Evaluating High Availability-aware Deployments Using Stochastic Petri Net Model and Cloud Scoring Selection Tool

Manar Jammal, Ali Kanso, Parisa Heidari, Abdallah Shami
2017 IEEE Transactions on Services Computing  
While the Petri Net model evaluates the availability of cloud applications deployments, the scoring system selects the optimal HA-aware deployment in terms of energy, operational expenditure (OPEX), and  ...  Therefore, this paper proposes a cloud scoring system and integrates it with a Stochastic Petri Net model.  ...  ACKNOWLEDGMENT This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC-STPGP 447230) and Ericsson Research. We would like to thank Prof.  ... 
doi:10.1109/tsc.2017.2781730 fatcat:a552n7rjpncq3lobgr5tb35llm

Database, protocols and tools for evaluating score-level fusion algorithms in biometric authentication

Norman Poh, Samy Bengio
2006 Pattern Recognition  
It then proposes several fusion protocols and provides some state-of-the-art tools to evaluate the fusion performance.  ...  Fusing the scores of several biometric systems is a very promising approach to improve the overall system's accuracy.  ...  Finally, the test set (LP Test) was used to estimate the performance. The 295 subjects were divided into a set of 200 clients, 25 evaluation impostors and 70 test impostors.  ... 
doi:10.1016/j.patcog.2005.06.011 fatcat:kjskktn53jhxhc47ayia3gdnnm

A Validity Argument Approach to Evaluating Teacher Value-Added Scores

Heather C. Hill, Laura Kapitula, Kristin Umland
2011 The American Educational Research Journal  
This analysis found teachers' value-added scores correlated not only with their mathematical knowledge and quality of instruction but also with the population of students they teach.  ...  This article focuses on the extent to which value-added scores correspond to other indicators of teacher and teaching quality.  ...  Although covariate adjustment models are known to be biased due to measurement error, Sanders (2006) argues, based on simulation and empirical data results, that when at least three previous test scores  ... 
doi:10.3102/0002831210387916 fatcat:kiz7y7wefncf7od6nhqweceoq4

Collective Learning [article]

Francesco Farina
2021 arXiv   pre-print
In this paper, we introduce the concept of collective learning (CL) which exploits the notion of collective intelligence in the field of distributed semi-supervised learning.  ...  The proposed framework draws inspiration from the learning behavior of human beings, who alternate phases involving collaboration, confrontation and exchange of views with others consisting of studying  ...  We run a simulation in this setup for 3 epochs over the shared set D_s and the results at the end of the simulation are reported in Figure 9 in terms of the accuracy on the test set D_test.  ... 
arXiv:1912.02580v2 fatcat:d3ejresqgfczpdwz5kh5uzn7xe

DIF Assessment for Polytomously Scored Items: A Framework for Classification and Evaluation

Maria T. Potenza, Neil J. Dorans
1995 Applied Psychological Measurement  
Partial funding for this paper was provided by the College Board Division of the Educational Testing Service.  ...  A previous version of this paper was presented at the 1993 annual meeting of the National Council on Measurement in Education, Atlanta GA.  ...  Both of these latent variable approaches are closely linked to a test theory that decomposes an observed score into a systematic true score (or a monotone transformation thereof), and a stochastic error  ... 
doi:10.1177/014662169501900104 fatcat:holjyoiz5jgyjliakaim65fevq

A Bayesian Spatial Propensity Score Matching Evaluation of the Regional Impact of Micro-finance

Rolando Gonzales, Patricia Aranda, Joel Mendizabal
2017 Review of Economic Analysis  
The impact of microfinance in Bolivia was tested with this estimator, using census and household survey data.  ...  A Bayesian Spatial-Propensity Score Matching estimator is proposed to measure the regional impact of microfinance on poverty reduction and women's empowerment.  ...  Appendix I at the end of the paper shows a falsification test performed to evaluate the quality of the BS-PSM algorithm by testing the effects of microfinance on deafness.  ... 
doi:10.15353/rea.v9i2.1438 fatcat:hz6s64lkgzdxjppgu7tt7x7f4u

Propensity Score Estimation With Boosted Regression for Evaluating Causal Effects in Observational Studies

Daniel F. McCaffrey, Greg Ridgeway, Andrew R. Morral
2004 Psychological methods  
Propensity score methods can theoretically eliminate these confounds for all observed covariates, but accurate estimation of propensity scores is impeded by large numbers of covariates, uncertain functional  ...  Propensity score weights estimated using boosting eliminate most pretreatment group differences, and substantially alter the apparent relative effects of adolescent substance abuse treatment.  ...  GBM is an algorithm for iteratively forming a collection of simple regression tree models to add together to estimate the propensity score.  ... 
doi:10.1037/1082-989x.9.4.403 pmid:15598095 fatcat:npkwlxp775hhlkpmgzjf7hln24

A comprehensive evaluation of polygenic score methods across cohorts in psychiatric disorders [article]

Guiyan Ni, Jian Zeng, Joana R Revez, Ying Wang, Tian Ge, Restaudi Restaudi, Jacqueline Kiewa, Dale R Nyholt, Jonathan R I Coleman, Jordan W Smoller, Jian Yang, Peter M Visscher (+3 others)
2020 biorxiv/medrxiv   pre-print
Evaluations of new PGS methods are made using simulated data or a single target cohort; however, in real data sets there can be heterogeneity between target sample cohorts, which could reflect a number of  ...  PGS methods differ in terms of which DNA variants are included in the score and the weights assigned to them.  ...  simulations.  ... 
doi:10.1101/2020.09.10.20192310 fatcat:5lkm45odgvh7dlhxbwbjre34gm