Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance

Julián Urbano, Mónica Marrero
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
However, in this paper we focus on the expected rank correlation between the mean scores observed with a test collection and the true, unobservable means under the same conditions.  ...  In general, such estimates of expected correlation with the true ranking may accompany the results reported from an evaluation experiment, as an easy to understand figure of reliability.  ...  In this paper we tackle the problem of estimating the correlation between the ranking of systems obtained with a test collection and the true ranking under the same conditions.  ... 
doi:10.1145/2911451.2914752 dblp:conf/sigir/UrbanoM16 fatcat:6a7oy466eff2van3h2mqcxjhii

Towards minimal test collections for evaluation of audio music similarity and retrieval

Julián Urbano, Markus Schedl
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
A low-cost alternative is the application of Minimal Test Collection algorithms, which offer quite reliable results while significantly reducing the annotation effort.  ...  The idea is to incrementally select what documents to judge so that we can compute estimates of the effectiveness differences between systems with a certain degree of confidence.  ...  A traditional way of comparing the estimated ranking and the true ranking is to compute the Kendall's τ correlation coefficient between the two.  ... 
doi:10.1145/2187980.2188223 dblp:conf/www/UrbanoS12 fatcat:egqfzrvt3ncljbsj7ym2mwc26i

Supervised Off-Policy Ranking [article]

Yue Jin, Yue Zhang, Tao Qin, Xudong Zhang, Jian Yuan, Houqiang Li, Tie-Yan Liu
2021 arXiv   pre-print
Experiments on different games, datasets, training policy sets, and test policy sets show that our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between  ...  Previous OPE methods mainly focus on precisely estimating the true performance of a policy.  ...  (a), (b) and (c): rank correlation comparison between SOPR-T and SOPR-MLP in Test Set II.  ... 
arXiv:2107.01360v1 fatcat:h5vp3fuk5rhzjdwm3ysa2butty

Evaluating Classifiers Without Expert Labels [article]

Hyun Joon Jung, Matthew Lease
2012 arXiv   pre-print
Finally, we measure both score and rank correlations between estimated classifier performance vs. actual performance according to expert judgments.  ...  Instead, we seek methodology for estimating performance of the classifiers which is more scalable than expert labeling yet preserves high correlation with evaluation based on expert labels.  ...  Score and rank correlation is then measured between estimated vs. actual scores and ranks (c1).  ... 
arXiv:1212.0960v1 fatcat:rwwlywasunaufa67gxrxdeo3wa

Predicting the effectiveness of queries and retrieval systems

Claudia Hauff
2010 SIGIR Forum  
The weakest approach, ACSimScore, estimates the ranking of systems with a correlation between τ = 0.42 and τ = 0.65.  ...  The results are shown in Table 5. The results are very regular across all data sets and system ranking estimation methods: the spread in correlation between the best and worst case is extremely wide  ...  Given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other.  ... 
doi:10.1145/1842890.1842906 fatcat:jkesk5hrvfe77bg5xr7yg7bbie

Test collection reliability: a study of bias and robustness to statistical assumptions via stochastic simulation

Julián Urbano
2015 Information retrieval (Boston)  
The number of topics that a test collection contains has a direct impact on how well the evaluation results reflect the true performance of systems.  ...  In this paper we first compare measures and estimators of test collection accuracy and propose unbiased statistical estimators of the Kendall tau and tau AP correlation coefficients.  ...  I am very thankful to Mónica Marrero, the anonymous reviewers and the editors for their help in making this paper.  ... 
doi:10.1007/s10791-015-9274-y fatcat:t2ukbg7ekbhrdhskvkpikfsyxu

If I Had a Million Queries [chapter]

Ben Carterette, Virgil Pavlu, Evangelos Kanoulas, Javed A. Aslam, James Allan
2009 Lecture Notes in Computer Science  
As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results  ...  In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems.  ...  Minimal Test Collections The Minimal Test Collections (MTC) method works by identifying documents that will be most informative for understanding performance differences between systems by some evaluation  ... 
doi:10.1007/978-3-642-00958-7_27 fatcat:ikc7siy45jcj5a5y32fnnoiy5u

The Simpson's Paradox in the Offline Evaluation of Recommendation Systems [article]

Amir H. Jadidinejad, Craig Macdonald, Iadh Ounis
2021 arXiv   pre-print
and Coat), respectively, in reflecting the true ranking of systems with an open loop (randomised) evaluation in comparison to the standard evaluation.  ...  Using the relative comparison of many recommendation models as in the typical offline evaluation of recommender systems, and based on the Kendall rank correlation coefficient, we show that our proposed  ...  We use Kendall's rank correlation coefficient to measure the correlation between the relative order of the examined models in each evaluation method. In addition, we use Steiger's method [39] to test  ... 
arXiv:2104.08912v1 fatcat:fto33uml6bfsnmbypdzne77beq

SEM Model Analysis on the Effect of Antecedents of the University of Nairobi and Jiangsu University's Academic Quality within the Higher Education Institutions

Joseph Muiruri Thige, Hongbo Li, Ssali Max William, Falvian Athiambo
2021 Creative Education  
robustness diagnostics for the impacts and directions identified towards academic quality in previous study as well as affirming the consistency of the result; 2) innovatively implicates anew objective  ...  The study harmonizes the structural estimations concocted in previous studies concerning the impact, relations and associations of and/or amid the five main parameters to and/or amid the academic quality  ...  Acknowledgements Conflicts of Interest The authors declare no conflicts of interest regarding the publication of this paper.  ... 
doi:10.4236/ce.2021.128145 fatcat:ndfcuutajnbe3lya66iagipko4

Evaluating epistemic uncertainty under incomplete assessments

Mark Baillie, Leif Azzopardi, Ian Ruthven
2008 Information Processing & Management  
evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections.  ...  The adoption of this methodology is advantageous, because the detection of epistemic uncertainty -the amount of knowledge (or ignorance) we have about the estimate of a system's performance -during the  ...  ranks between systems when estimating MAP 14 .  ... 
doi:10.1016/j.ipm.2007.04.002 fatcat:gabq7nsuyrhwjewo2eph2zi4rm

Estimating Video Streaming QoE in the 5G Architecture Using Machine Learning

Susanna Schwarzmann, Clarissa Cassales Marquezan, Marcin Bosk, Huiran Liu, Riccardo Trivisonno, Thomas Zinner
2019 Proceedings of the 4th Internet-QoE Workshop on QoE-based Analysis and Management of Data Communication Networks - Internet-QoE'19  
It is, however, unclear how 5G networks can collect monitoring data and application metrics, how they correlate to each other, and which techniques can be used in 5G systems for QoE estimation.  ...  This paper studies the feasibility of Machine Learning (ML) techniques for QoE estimation and evaluates their performance for a mobile video streaming use-case.  ...  Acknowledgments This work has been funded by the German BMBF Software Campus Grant "BigQoE" (01IS17052).  ... 
doi:10.1145/3349611.3355547 dblp:conf/mobicom/SchwarzmannMBLT19 fatcat:xboz2hvofbejvf6fey2mbprwwq

A Case for Automatic System Evaluation [chapter]

Claudia Hauff, Djoerd Hiemstra, Leif Azzopardi, Franciska de Jong
2010 Lecture Notes in Computer Science  
We also observe that the commonly experienced problem of underestimating the performance of the best systems is data set dependent and not inherent to system ranking estimation.  ...  Our analysis reveals that the performance of system ranking estimation approaches varies across topics.  ...  Across the data sets and system ranking estimation methods, the spread in correlation between the best and worst case is very wide; in the worst case, there is no significant correlation between the ground  ... 
doi:10.1007/978-3-642-12275-0_16 fatcat:gozggcd5qbgjhnyhgugk4nzrha

An Evaluation of Empirical Bayes's Estimation of Value-Added Teacher Performance Measures

Cassandra M. Guarino, Michelle Maxfield, Mark D. Reckase, Paul N. Thompson, Jeffrey M. Wooldridge
2015 Journal of educational and behavioral statistics  
In this paper we review the theory of EB estimation and use simulated data to study its ability to properly rank teachers.  ...  Under nonrandom assignment, estimators that explicitly (if imperfectly) control for the teacher assignment mechanism perform the best out of all the estimators we examine.  ...  As the theory suggests, EB LAG performs well in the four cohort case, with rank correlations between the estimated and the true teacher effects near 0.86, which is nearly the same as the 0.85 rank correlation  ... 
doi:10.3102/1076998615574771 fatcat:h5ccynklpngd7ehxg4yjmvhshe

Improved Estimation and Interpretation of Correlations in Neural Circuits

Dimitri Yatsenko, Krešimir Josić, Alexander S. Ecker, Emmanouil Froudarakis, R. James Cotton, Andreas S. Tolias, Jonathan W. Pillow
2015 PLoS Computational Biology  
Importantly, the identity and structure of the most efficient estimator informs about the types of dominant dependencies governing the system.  ...  In our data obtained from fast 3D two-photon imaging of calcium signals of large and dense groups of neurons in mouse visual cortex, the best estimation performance was attained by decomposing the correlation  ...  Acknowledgments We thank Genevera Allen for a helpful discussion, and Eftychios Pnevmatikakis for helpful suggestions and feedback on the manuscript. Author Contributions  ... 
doi:10.1371/journal.pcbi.1004083 pmid:25826696 pmcid:PMC4380429 fatcat:fbxgnwsednfcflrfeladq4on4e

Automatic Ground Truth Expansion for Timeline Evaluation

Richard McCreadie, Craig Macdonald, Iadh Ounis
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being miss-ranked is low).  ...  Our results show that our proposed expansion techniques can be effective for increasing the robustness of the TREC-TS test collections, markedly reducing the number of miss-rankings by up to 50% on average  ...  They reported high correlations between system rankings pre and post pooling, indicating that the test collections are reusable.  ... 
doi:10.1145/3209978.3210034 dblp:conf/sigir/McCreadieMO18 fatcat:e72afifhmzcffphepperyqhuny