A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Cross-replication Reliability – An Empirical Approach to Interpreting Inter-rater Reliability
[article]
2021
arXiv
pre-print
We argue this framework can be used to measure the quality of crowdsourced datasets. ...
It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. ...
Acknowledgments We would like to thank Gautam Prasad and Alan Cowen for their work on collecting and sharing the IRep dataset and open-sourcing it. ...
arXiv:2106.07393v1
fatcat:5prn3rqktzhudjd2y7to2glule
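The xRR entry builds its baselines on Cohen's kappa, which corrects observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). As a reference point only, here is a minimal sketch of plain two-rater Cohen's kappa for nominal labels. It does not implement the paper's cross-replication (xRR) measure, and the function name and example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over labels of the product of each rater's marginal rate.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a = ["yes", "yes", "no", "no", "yes", "no"]
    b = ["yes", "no", "no", "no", "yes", "yes"]
    print(round(cohens_kappa(a, b), 3))  # p_o = 4/6, p_e = 0.5 -> 0.333
```

Raw agreement here is 67%, but because each rater says "yes" half the time, half of that agreement is expected by chance, so kappa is only about 0.33.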
When zero may not be zero: A cautionary note on the use of inter‐rater reliability in evaluating grant peer review
2021
Journal of the Royal Statistical Society: Series A (Statistics in Society)
Considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR) as a way to assess the quality of the peer review process. ...
Inspired by a recent study that reported an IRR of zero in the mock peer review of top-quality grant proposals, we use real data from a complete range of submissions to the National Institutes of Health ...
for assistance with implementing the restricted-range reliability functionalities into the R package and interactive Shiny application ShinyItemAnalysis. ...
doi:10.1111/rssa.12681
fatcat:ekwjcbzuuzgldir5jtm75stgda
RecSys'17 Joint Workshop on Interfaces and Human Decision Making for Recommender Systems
2017
Proceedings of the Eleventh ACM Conference on Recommender Systems - RecSys '17
We experiment on real-world and artificially generated data, finding that treating label ratings as ordinal, rather than interval data results in an increased inter-rater reliability. ...
We hypothesize that the issues arising from rater bias may be mitigated by treating the data received as an ordered set of preferences rather than a collection of absolute values. ...
This proposition relies on the intuition that corpora of relatively-valued labels will produce higher levels of inter-rater reliability (IRR) than those consisting of absolute values. ...
doi:10.1145/3109859.3109961
dblp:conf/recsys/BrusilovskyGFLO17
fatcat:vishcvo5jrdbnnj24ncydrbcfm
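The RecSys entry hypothesizes that rater bias (for example, one rater scoring systematically higher than another) lowers agreement on absolute values but not on relative order. The toy illustration below makes that intuition concrete; it is not the workshop paper's method. Two hypothetical raters whose scores differ by a constant offset never give identical scores, yet they order every pair of items the same way. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def exact_agreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of items on which two raters give the identical absolute score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_order_agreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of item pairs both raters order the same way (a tie counts as an order)."""
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    pairs = list(combinations(range(len(a)), 2))
    agree = sum(sign(a[i] - a[j]) == sign(b[i] - b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    lenient = [2, 3, 4, 5, 5]
    strict = [1, 2, 3, 4, 4]  # same judgements, shifted down by one point
    print(exact_agreement(lenient, strict))           # 0.0
    print(pairwise_order_agreement(lenient, strict))  # 1.0
```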
Further Data on the Reliability of the Mentalization Imbalances Scale and of the Modes of Mentalization Scale
2020
Research in Psychotherapy Psychopathology Process and Outcome
The aim of this study was to provide data on the Inter-Rater Reliability (IRR) and the test-retest reliability of the Mentalization Imbalances Scale (MIS) and the Modes of Mentalization Scale (MMS) in ...
Our results provide support to the inter-rater reliability of the MIS and the MMS. ...
Acknowledgments: The authors would like to thank the clinicians and raters who participated in this study by providing their evaluations. ...
doi:10.4081/ripppo.2020.450
pmid:32913829
pmcid:PMC7451392
fatcat:hewcgb4r3bgbxii66vclg34yny
Page 1369 of Psychological Abstracts Vol. 85, Issue 4
[page]
1998
Psychological Abstracts
—The Neurological Evaluation Scale (NES), the most widely used structured neurological examination in schizophrenia research, has had limited study of its inter-rater reliability (IRR). An augmented version ...
(Wright State U, School of Medicine, Dept of Psychiatry, Dayton, OH) inter-rater reliability of the neurological examination in schizophrenia. Schizophrenia Research, 1998(Feb), Vol 29(3), 287-292. ...
Development of the Patient Education Materials Assessment Tool (PEMAT): A new measure of understandability and actionability for print and audiovisual patient information
2014
Patient Education and Counseling
Four rounds of reliability testing and refinement were conducted using raters untrained on the PEMAT. Agreement improved across rounds. ...
We completed four rounds of reliability testing, and produced evidence of construct validity with consumers and readability assessments. Results: The experts deemed the PEMAT items face/content valid. ...
Ross Davies for her guidance on instrument development, and Ken Carlson and Mark Spranca from Abt Associates for their valuable engagement with the reliability and validity testing of the PEMAT. ...
doi:10.1016/j.pec.2014.05.027
pmid:24973195
pmcid:PMC5085258
fatcat:2mcsz6spwvbotafweqpetphpjy
Assessment tool for hospital admissions related to medications: development and validation in older patients
2018
International Journal of Clinical Pharmacy
The tool's inter-rater reliability (IRR) and criterion-related validity (CRV) were assessed: four pairs of either final-year undergraduate or postgraduate pharmacy students applied the tool to one of two ...
Method We reviewed existing literature on methods to identify MRAs. The tool AT-HARM10 was developed using an iterative process including content validity and feasibility testing. ...
Acknowledgements We are sincerely grateful to Dr. Christina Grzechnik Mörk for her contribution as one of the gold standard experts. ...
doi:10.1007/s11096-018-0768-8
fatcat:yp3oe2j7tbfohkhb2zdgtdgaye
Determining Intervention Fidelity From Chronological Field Notes
2015
Journal of Nursing Measurement
We computed inter-rater reliability (IRR) between the two raters and internal consistency on adherence for each of the theoretically-derived subscales: interpersonal psychotherapy (IPT) and cognitive behavior ...
Inter-rater reliability (IRR) helps to establish the extent of consensus between the two raters using CSPRS instrument in rating field notes. ...
Do not put your name on the form. Your response is anonymous. We encourage you to be frank and honest in your evaluation. Please indicate your answers on the computerized answer sheet. ...
doi:10.1891/1061-3749.23.2.e67
pmid:26284832
fatcat:cbwrbduzcrci7fklrgfoypuslm
Level of personality functioning as a predictor of psychosocial functioning—Concurrent validity of criterion A
2019
Personality Disorders: Theory, Research, and Treatment
The association between the Level of Personality Functioning Scale and psychosocial impairment based on other previously established psychosocial functioning instruments has not been reported. ...
These four domains constitute the Level of Personality Functioning Scale, a trans-diagnostic measure of PD severity. ...
The research on the LPFS is still in its adolescence, but it is to be expected that the increasing amount of research on the AMPD will pave the way for an empirically supported diagnostic model for PDs ...
doi:10.1037/per0000352
pmid:31580097
fatcat:e6lzxskhk5f25niidwzfwo26p4
Bridging the "last mile" gap between AI implementation and operation: "data awareness" that matters
2020
Annals of Translational Medicine
in those medical disciplines that extensively rely on digital imaging. ...
The latter hiatus, on the other hand, relates to the production and availability of a sufficient amount of reliable and accurate clinical data that is suitable to be the "experience" with which a machine ...
Acknowledgments The authors are grateful to the Scientific Department of the IRCCS Galeazzi, and to its head, prof. Giuseppe Banfi, for their continuous and unconditional support. Funding: None. ...
doi:10.21037/atm.2020.03.63
pmid:32395545
pmcid:PMC7210125
fatcat:q5ynl23k5jfztegm77lakxwvwi
Interrater Reliability at the Top End: Measures of Pilots' Nontechnical Performance
2015
The International journal of aviation psychology
For cognitive aspects of performance, inter-rater reliability was higher than for social aspects of performance. Agreement was lower on the pass/fail level than for the distinguished performance ...
The aim of this study is to analyze influences on inter-rater reliability and within-group agreement within a highly experienced rater group when assessing pilots' non-technical skills. Background ...
It can be concluded that the agreement of raters depended on the level of performance that was ... ICC(3) for inter-rater reliability was found to be poor for the dimensions communication (.12), ...
doi:10.1080/10508414.2015.1162636
fatcat:uxxo5sjykfg3lfyy6k3zdkhnza
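The pilots entry reports ICC(3) values. Below is a minimal sketch of the Shrout and Fleiss single-rater consistency form, ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) EMS), computed from two-way ANOVA mean squares. It assumes a complete targets × raters matrix with no missing ratings. The snippet does not say whether the study used the single-rater or the average-rater variant, and the function name and example matrix are hypothetical.

```python
from typing import Sequence


def icc3_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): two-way mixed model, consistency, single rater."""
    if len(ratings) < 2:
        raise ValueError("need at least 2 targets")
    n = len(ratings)      # number of targets (rows)
    k = len(ratings[0])   # number of raters (columns)
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols      # rater effects removed: consistency, not absolute agreement
    bms = ss_rows / (n - 1)                    # between-targets mean square
    ems = ss_err / ((n - 1) * (k - 1))         # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


if __name__ == "__main__":
    scores = [  # 6 targets (rows) rated by 4 raters (columns)
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc3_1(scores), 3))  # 0.715
```

Because rater main effects are subtracted out, ICC(3,1) rewards raters who rank targets consistently even if one is systematically harsher than another.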
Development and validation of the Cerebral Performance Categories-Extended (CPC-E)
2015
Resuscitation
We tested the CPC-E's intra-rater reliability (IR) percent agreement (n = 30; range = 73.3% to 100%) and inter-rater reliability (IRR) (n = 50; range = 60% to 100%) using retrospective chart reviews of the ...
The specific aims were to establish the CPC-E's content validity, and to test its reliability, and feasibility in the hospital setting. ...
Complex Activities of Daily Living (CADLs): Responsible for own medication (medication management), food preparation, shopping and transportation (drives or uses public transportation) ...
doi:10.1016/j.resuscitation.2015.05.013
pmid:26025569
fatcat:qrsmqnekjbebbfhldcsaxnj5w4
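The CPC-E entry reports percent agreement as a range. The snippet does not say what the range spans; the sketch below assumes it is the lowest and highest per-item agreement across the scale's items. The same function serves intra-rater checks if the two sequences are one rater's two passes over the same charts. Unlike kappa, percent agreement makes no correction for chance. Function names, item names, and scores are hypothetical.

```python
from typing import Hashable, Mapping, Sequence, Tuple


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of cases (in %) on which two ratings are identical; no chance correction."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty rating sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def agreement_range(
    items: Mapping[str, Tuple[Sequence[Hashable], Sequence[Hashable]]]
) -> Tuple[float, float]:
    """Lowest and highest per-item percent agreement across a set of scale items."""
    scores = [percent_agreement(a, b) for a, b in items.values()]
    return min(scores), max(scores)


if __name__ == "__main__":
    ratings = {  # hypothetical items: (first rating pass, second rating pass)
        "item_1": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 1]),
        "item_2": ([1, 1, 2], [1, 1, 2]),
    }
    print(agreement_range(ratings))  # (80.0, 100.0)
```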
Metrology for AI: From Benchmarks to Instruments
[article]
2019
arXiv
pre-print
We begin with the intuitive observation that evaluating the performance of an AI system is a form of measurement. ...
One does not report mass, speed, or length, for example, of a studied object without disclosing the precision (measurement variance) and resolution (smallest detectable change) of the instrument used. ...
Thus, a slew of inter-annotator agreement (also called inter-rater reliability, or IRR) metrics such as Fleiss' Kappa, or Scott's pi, which was then generalized to all different scales by Krippendorff's ...
arXiv:1911.01875v1
fatcat:clcnimrspbhwvbsvfc4h5rv3hq
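The Metrology for AI entry names Fleiss' kappa among the standard agreement metrics. Here is a minimal sketch for the usual setting of a fixed number of raters per item. It takes a subjects × categories matrix of counts, compares mean per-subject pairwise agreement with the agreement expected from pooled category rates, and assumes every subject was rated by the same number of raters. The function name is hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters who put subject i in category j."""
    if not counts:
        raise ValueError("need at least one subject")
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # p_j: share of all assignments that fell into category j (pooled over raters).
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # P_i: fraction of rater pairs that agree on subject i.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects
    p_e = sum(pj * pj for pj in p)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_bar - p_e) / (1 - p_e)
```

Because chance agreement is computed from category rates pooled across all raters, Fleiss' kappa with two raters reduces to Scott's pi rather than to Cohen's kappa, which uses each rater's own marginals.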
Reliability of infarct volumetry: Its relevance and the improvement by a software-assisted approach
2016
Journal of Cerebral Blood Flow and Metabolism
to an unrecognized low inter-rater and test-retest reliability with strong implications for statistical power and bias. ...
In addition, we show the probable consequences of increased reliability for precision, p-values, effect inflation, and power calculation, exemplified by a systematic analysis of experimental stroke studies ...
In order to analyze the effect of reliability on the precision of the observed effect, we took advantage of the assumptions that a. the t-test can be seen as a linear model, stroke size = β × treatment group ...
doi:10.1177/0271678x16681311
pmid:27909266
pmcid:PMC5536806
fatcat:zt2l2opqnncijgdsgoqbrkpoci
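The infarct-volumetry entry links low reliability to reduced statistical power. One textbook way to see this assumes classical test theory: reliability ρ is true-score variance divided by observed variance, and independent measurement error affects only the outcome. Under those assumptions the observed standardized effect shrinks from d to d·√ρ, so the sample size needed for a fixed power grows roughly as 1/ρ. The sketch below uses a normal approximation for a two-sided two-sample comparison. It is not the paper's own analysis, and the function names and example numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def observed_effect(d_true: float, reliability: float) -> float:
    """Standardized effect after measurement error inflates the outcome SD by 1/sqrt(reliability)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return d_true * sqrt(reliability)


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    if d <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


if __name__ == "__main__":
    print(n_per_group(observed_effect(0.8, 1.0)))  # perfectly reliable outcome: 25
    print(n_per_group(observed_effect(0.8, 0.5)))  # reliability 0.5: 50
```

Halving reliability halves d², which roughly doubles the animals (or patients) needed per group for the same power.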
Assessing Momentary Well-Being in People Living With Dementia: A Systematic Review of Observational Instruments
2021
Frontiers in Psychology
, measurement invariance, cross-cultural validity, measurement error and inter-rater/intra-rater/test–retest reliability and responsiveness. ...
Twenty-two instruments assessing well-being were included for evaluation of measurement properties based on the systematic approach of the COnsensus-based Standards for the selection of health Measurement ...
ACKNOWLEDGMENTS We wish to acknowledge librarian Kjersti Aksnes-Hopland at the University of Bergen Library for her important advice about search strategies, databases and tools for deduplication and management ...
doi:10.3389/fpsyg.2021.742510
pmid:34887803
pmcid:PMC8649635
fatcat:rfbqlnvsunbgdelq654dmg4vpq
Showing results 1 to 15 out of 444 results