
A document rating system for preference judgements

Maryam Bashir, Jesse Anderton, Jie Wu, Peter B. Golbus, Virgil Pavlu, Javed A. Aslam
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
We show how to combine a linear number of pairwise preference judgments from multiple assessors to compute relevance scores for every document.  ...  In this work, we consider the problem of inferring document relevance scores from pairwise preference judgments by analogy to tournaments using the Elo rating system.  ...  Preference based Relevance Judgements.  ... 
doi:10.1145/2484028.2484170 dblp:conf/sigir/BashirAWGPA13 fatcat:mugbn5csvzaehkqw6ibejo6wfm
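
The Elo analogy described in this abstract maps each preference judgment to a match outcome: the preferred document "wins" and both documents' scores move toward the observed result. A minimal sketch of that idea, assuming a standard Elo update with an arbitrary K-factor and initial score (not the authors' implementation):

```python
# Minimal sketch of Elo-style relevance scoring from pairwise preference
# judgments (illustrative; K and the initial score of 1500 are assumptions).

def expected_win(score_a: float, score_b: float) -> float:
    """Probability that document A is preferred over document B."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))

def elo_update(scores: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both documents' scores after one preference judgment."""
    sa, sb = scores.get(winner, 1500.0), scores.get(loser, 1500.0)
    ea = expected_win(sa, sb)
    scores[winner] = sa + k * (1.0 - ea)   # winner gains in proportion to how unexpected the win was
    scores[loser] = sb - k * (1.0 - ea)    # loser loses the same amount

# Example: three preference judgments over three documents
scores = {}
for winner, loser in [("d1", "d2"), ("d1", "d3"), ("d3", "d2")]:
    elo_update(scores, winner, loser)
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # documents ranked by inferred relevance
```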

Northeastern University Runs at the TREC12 Crowdsourcing Track

Maryam Bashir, Jesse Anderton, Jie Wu, Matthew Ekstrand-Abueg, Peter B. Golbus, Virgil Pavlu, Javed A. Aslam
2012 Text Retrieval Conference  
These preferences are then extended to relevance judgments through the use of expectation maximization and the Elo rating system. Our third approach is based on our Nugget-based evaluation paradigm.  ...  Our first two approaches are based on collecting a limited number of preference judgments from Amazon Mechanical Turk workers.  ...  The Elo rating system updates the score of a document when a certain number of preference comparisons for that document have been made.  ... 
dblp:conf/trec/BashirAWEGPA12 fatcat:v2ja3fo7wfgwxamffgyjc7nrsm

University of Amsterdam at the TREC 2013 Contextual Suggestion Track: Learning User Preferences from Wikitravel Categories

Marijn Koolen, Hugo C. Huurdeman, Jaap Kamps
2013 Text Retrieval Conference  
The goal of the track is to evaluate systems that provide suggestions for activities to users in a specific location, taking into account their personal preferences.  ...  As a source for travel suggestions we use Wikitravel, which is a community-based travel guide for destinations all over the world.  ...  Is a category-based document prior effective for ranking? Or are category preferences already captured by using only the descriptions of positively rated examples?  ... 
dblp:conf/trec/KoolenHK13 fatcat:m45lneov25fszkb3p3ladv24ey

Evaluation by comparing result sets in context

Paul Thomas, David Hawking
2006 Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06  
Familiar evaluation methodologies for information retrieval (IR) are not well suited to the task of comparing systems in many real settings.  ...  A tool which provides a unified search interface to all of them is desirable, but a challenge to evaluate.  ...  ACKNOWLEDGEMENTS We would like to thank the Statistical Consulting Unit at the Australian National University for their advice, and Tom Rowlands of the CSIRO ICT Centre for the analysis of search engine  ... 
doi:10.1145/1183614.1183632 dblp:conf/cikm/ThomasH06 fatcat:pwjwhrptaverlookziz2k4hzda

Overview of the TREC 2013 Contextual Suggestion Track

Adriel Dean-Hall, Charles L. A. Clarke, Nicole Simone, Jaap Kamps, Paul Thomas, Ellen M. Voorhees
2013 Text Retrieval Conference  
The second file contains a list of ratings for each suggestion in examples2013.csv given by each user; below are a few example lines from profiles2013.csv: . . .  ...  Note that, for this metric, the user always gives a rating of 0 to the document if the document has a geographical rating of 0. The four parameters for this metric are taken from Dean-Hall et al.  ...  -If the description judgement is 2 or above then the user reads the document, which takes time T_doc. • A(k) is 1 if the user gives a judgement of 2 or above to the description and 3 or above to the document  ... 
dblp:conf/trec/Dean-HallCSKTV13 fatcat:gbwvueatkvdarisw2u6e37pixm
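
The user model in this snippet (the user always reads the description, reads the document only when the description is judged 2 or above, counts a suggestion only when both thresholds are met, and treats a geographical rating of 0 as a document rating of 0) lends itself to a time-biased-gain-style computation. The sketch below is one reading of that model, not the official track script; the reading times and decay half-life are placeholder values, whereas the track takes its four parameters from Dean-Hall et al.

```python
# Hedged sketch of a time-biased-gain-style metric matching the user model in
# the snippet above (t_desc, t_doc and halflife are placeholders, not the
# track's actual parameters).
from math import log, exp

def time_biased_gain(results, t_desc=5.0, t_doc=30.0, halflife=120.0):
    """results: list of (desc_judgement, doc_judgement, geo_judgement) tuples, one per rank."""
    gain, elapsed = 0.0, 0.0
    decay_rate = log(2) / halflife
    for desc, doc, geo in results:
        if geo == 0:
            doc = 0                  # geographical rating 0 forces a document rating of 0
        elapsed += t_desc            # the user always reads the description
        if desc >= 2:
            elapsed += t_doc         # ...and opens the document if the description appeals
        a_k = 1 if (desc >= 2 and doc >= 3) else 0   # A(k) as described in the snippet
        gain += a_k * exp(-decay_rate * elapsed)     # discount by elapsed time
    return gain

# Example: three ranked suggestions for one user/context pair
print(time_biased_gain([(3, 4, 1), (1, 0, 1), (2, 2, 0)]))
```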

On Obtaining Effort Based Judgements for Information Retrieval

Manisha Verma, Emine Yilmaz, Nick Craswell
2016 Proceedings of the Ninth ACM International Conference on Web Search and Data Mining - WSDM '16  
as user preference between these documents.  ...  Traditional test collections are constructed by asking judges the relevance grade for a document with respect to an input query.  ...  Hence, preference based judgements are useful in getting unbiased decisions about what users prefer to see in a document without making the judges think about particular aspects associated with a document  ... 
doi:10.1145/2835776.2835840 dblp:conf/wsdm/VermaYC16 fatcat:fgobal42gfhexcmzmnyoxq4t6u

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation [article]

Samuel Läubli, Rico Sennrich, Martin Volk
2018 arXiv   pre-print
In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences.  ...  We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents.  ...  Acknowledgements We thank Xin Sennrich for her help with the analysis of translation errors. We also thank Antonio Toral and the anonymous reviewers for their helpful comments.  ... 
arXiv:1808.07048v1 fatcat:hiz4r5htbjfqnj6f2evqiqr6qy

Better Rewards Yield Better Summaries: Learning to Summarise Without References

Florian Böhm, Yang Gao, Christian M. Meyer, Ori Shapira, Ido Dagan, Iryna Gurevych
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries.  ...  However, summaries with high ROUGE scores often receive low human judgement.  ...  Introduction Document summarisation aims at generating a summary for a long document or multiple documents on the same topic.  ... 
doi:10.18653/v1/d19-1307 dblp:conf/emnlp/BohmGMSDG19 fatcat:lbbpshx7izdaporfhdb2nkpeyq
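
The core idea here, fitting a reward function to human ratings and then using it in place of ROUGE as the RL reward, can be illustrated with a deliberately simple linear model. The features, learning rate, and least-squares objective below are assumptions made for the sketch, not the paper's neural reward model.

```python
# Minimal sketch of fitting a reward function to human summary ratings by
# stochastic least squares (illustrative; the paper learns a neural reward model).
import random

def predict(weights: list, features: list) -> float:
    """Linear reward: dot product of a summary's feature vector and the learned weights."""
    return sum(w * x for w, x in zip(weights, features))

def fit_reward(data: list, dim: int, lr: float = 0.01, epochs: int = 200) -> list:
    """data: list of (feature_vector, human_rating) pairs."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for features, rating in data:
            error = predict(weights, features) - rating
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Placeholder data: 3-dimensional summary features with ratings in [0, 1].
data = [([random.random() for _ in range(3)], random.random()) for _ in range(50)]
weights = fit_reward(data, dim=3)
print(predict(weights, [0.5, 0.2, 0.9]))  # learned reward, used in place of ROUGE for RL
```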

Better Rewards Yield Better Summaries: Learning to Summarise Without References [article]

Florian Böhm and Yang Gao and Christian M. Meyer and Ori Shapira and Ido Dagan and Iryna Gurevych
2019 arXiv   pre-print
To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries.  ...  However, summaries with high ROUGE scores often receive low human judgement.  ...  Introduction Document summarisation aims at generating a summary for a long document or multiple documents on the same topic.  ... 
arXiv:1909.01214v1 fatcat:s3567mb5yjcfdmshzo5afccuwu

Study of Relevance and Effort across Devices

Manisha Verma, Emine Yilmaz, Nick Craswell
2018 Proceedings of the 2018 Conference on Human Information Interaction&Retrieval - CHIIR '18  
Relevance judgements are essential for designing information retrieval systems. Traditionally, judgements have been gathered via desktop interfaces.  ...  Analysis of these judgements indicates a high agreement rate between desktop and mobile judges for relevance, followed by usefulness and findability.  ...  For generalizability, we obtain judgements for documents of TREC Web track, a publicly available dataset. Our work aims to further answer two research questions.  ... 
doi:10.1145/3176349.3176888 dblp:conf/chiir/VermaYC18 fatcat:b6x3yolz7ngzziyn56p5ktrd6a

Northeastern University Runs at the TREC13 Crowdsourcing Track

Maryam Bashir, Jesse Anderton, Virgil Pavlu, Javed A. Aslam
2013 Text Retrieval Conference  
Our approach is based on collecting a linear number of preference judgements, and combining these into nominal grades using a modified version of the QuickSort algorithm.  ...  Participants of this track were required to assess documents on a six-point scale.  ...  Grades from Preference: After comparing all documents with pivot documents, we sorted the documents using preference judgements.  ... 
dblp:conf/trec/BashirAPA13 fatcat:ejfkuk4gqbaehisqaa7cpmr4ki
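
The procedure sketched in this abstract (compare every document against pivot documents, order the collection by those preferences, then map the order onto the six-point scale) can be illustrated roughly as follows. The comparator, the random pivot choice, and the equal-width binning are assumptions for the sketch, not details of the actual run.

```python
# Illustrative sketch of turning pairwise preference judgments into nominal
# grades via quicksort-style pivoting (assumed details, not the TREC run itself).
import random

def prefer(a: str, b: str) -> bool:
    """Stand-in for a crowdsourced judgment: True if document a is preferred to b."""
    return a < b  # placeholder comparator for the sketch

def quicksort_by_preference(docs: list) -> list:
    if len(docs) <= 1:
        return docs
    pivot = random.choice(docs)
    better = [d for d in docs if d != pivot and prefer(d, pivot)]
    worse = [d for d in docs if d != pivot and not prefer(d, pivot)]
    return quicksort_by_preference(better) + [pivot] + quicksort_by_preference(worse)

def grades_from_order(ordered: list, n_grades: int = 6) -> dict:
    """Cut the preference order into equal-sized bins on a six-point scale."""
    bin_size = max(1, -(-len(ordered) // n_grades))  # ceiling division
    return {d: n_grades - 1 - (i // bin_size) for i, d in enumerate(ordered)}

ordered = quicksort_by_preference(["d3", "d1", "d4", "d2", "d5"])
print(grades_from_order(ordered))  # best-ranked documents get the highest grade
```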

Influence of lay assessors and giving reasons for the judgement in German mixed courts

Christoph Rennig
2001 Revue internationale de droit pénal  
International Review of Penal Law (Vol. 72) b) Relevance of the written judgement: As in every criminal justice system where the courts have to give reasons for their judgements, the considerations given  ...  Giving written reasons for the judgement may again be a difficult task for a professional judge who was in the majority.  ... 
doi:10.3917/ridp.721.0481 fatcat:gbv2i2yreveaxca4wthidiq5di

Quality through flow and immersion

Carsten Eickhoff, Christopher G. Harris, Arjen P. de Vries, Padmini Srinivasan
2012 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12  
lower pay rates, facing fewer malicious submissions.  ...  Based on previous experience as well as psychological insights, we propose the use of a game in order to attract and retain a larger share of reliable workers to frequently-requested crowdsourcing tasks  ...  Acknowledgements We would like to thank Jiyin He and the Fish4Knowledge project for providing us with the fish images and expert judgements.  ... 
doi:10.1145/2348283.2348400 dblp:conf/sigir/EickhoffHVS12 fatcat:t26baigetzboplrulu6cj76zci

Boosting the ranking function learning process using clustering

Giorgos Giannopoulos, Theodore Dalamagas, Magdalini Eirinaki, Timos Sellis
2008 Proceeding of the 10th ACM workshop on Web information and data management - WIDM '08  
Given a small initial set of user feedback for some search results, we first perform clustering on all results returned by the search.  ...  The experiments show that our method sufficiently approximates the results of an "ideal" system where all results of each query should be rated in order to be used as training data, something that is not  ...  An absolute preference suggests that a search result is either relevant or irrelevant to a query. A relative preference suggests that a search result is more relevant to a query than another result.  ... 
doi:10.1145/1458502.1458523 dblp:conf/widm/GiannopoulosDES08 fatcat:f3ojnwummbgxpbmplhndfd7nqu
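
The boosting idea in this abstract, clustering all returned results and spreading the few available ratings across each cluster to enlarge the training set for the ranking function, is sketched below. The feature vectors, the value of k, and the average-rating propagation rule are illustrative assumptions, not the paper's method in detail.

```python
# Hedged sketch: propagate a small set of user ratings to unrated results by
# clustering their feature vectors (k and the propagation rule are assumed).
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assign

def propagate_feedback(points, ratings, k=3):
    """ratings: sparse dict {result_index: rating}; returns an expanded label set."""
    assign = kmeans(points, k)
    expanded = dict(ratings)
    for c in range(k):
        rated = [ratings[i] for i in ratings if assign[i] == c]
        if rated:
            cluster_rating = sum(rated) / len(rated)   # average rating in the cluster
            for i, a in enumerate(assign):
                if a == c and i not in expanded:
                    expanded[i] = cluster_rating       # unrated results inherit it
    return expanded

points = [[random.random(), random.random()] for _ in range(20)]  # placeholder result features
print(propagate_feedback(points, {0: 1.0, 5: 0.0}))
```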

Spatial diversity, do users appreciate it?

Jiayu Tang, Mark Sanderson
2010 Proceedings of the 6th Workshop on Geographic Information Retrieval - GIR '10  
Spatial diversity is a relatively new branch of research in the context of spatial information retrieval.  ...  In this paper, we will show our follow-up work on the novel approach to investigating user preference on spatial diversity by using Amazon Mechanical Turk.  ...  We have used a combination of several methods to control the quality of judgements. Firstly, we required the users to have a minimum approval rate of 90%.  ... 
doi:10.1145/1722080.1722108 dblp:conf/gir/TangS10 fatcat:gjsqzc2425cwzi7ew255ltetjm
Showing results 1 — 15 out of 76,665 results