"On the same page"? The effect of GP examiner feedback on differences in rating severity in clinical assessments: a pre/post intervention study

Nancy Sturman, Remo Ostini, Wai Yee Wong, Jianzhen Zhang, Michael David
2017 BMC Medical Education  
Background: Robust and defensible clinical assessments attempt to minimise differences in student grades which are due to differences in examiner severity (stringency and leniency). Unfortunately there is little evidence to date that examiner training and feedback interventions are effective; "physician raters" have indeed been deemed "impervious to feedback". Our aim was to investigate the effectiveness of a general practitioner examiner feedback intervention, and explore examiner attitudes to this.
Methods: Sixteen examiners were provided with a written summary of all examiner ratings in medical student clinical case examinations over the preceding 18 months, enabling them to identify their own rating data and compare it with that of other examiners. Examiner ratings and examiner severity self-estimates were analysed pre- and post-intervention, using non-parametric bootstrapping, multivariable linear regression, intra-class correlation and Spearman's correlation analyses. Examiners completed a survey exploring their perceptions of the usefulness and acceptability of the intervention, including what (if anything) examiners planned to do differently as a result of the feedback.

Results: Examiner severity self-estimates were relatively poorly correlated with measured severity on the two clinical case examination types pre-intervention (0.29 and 0.67) and were less accurate post-intervention. No significant effect of the intervention was identified when differences in case difficulty were controlled for, although there were fewer outlier examiners post-intervention. Drift in examiner severity over time prior to the intervention was observed. Participants rated the intervention as interesting and useful, and survey comments indicated that fairness, reassurance, and understanding examiner colleagues are important to examiners.

Conclusions: Despite our participants being receptive to our feedback and wanting to be "on the same page", we did not demonstrate effective use of the feedback to change their rating behaviours. Calibration of severity appears to be difficult for examiners, and further research into better ways of providing more effective feedback is indicated.
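The abstract names its analyses but does not say how severity was scored. The Python sketch below is a rough illustration only. It shows one way examiner severity could be measured and compared with severity self-estimates using Spearman's rank correlation, with a non-parametric (percentile) bootstrap confidence interval. Everything in it is an assumption:

- The severity measure is each examiner's mean deviation from the case mean. This is a crude stand-in for the study's regression adjustment for case difficulty.
- The self-estimate scale is invented.
- The helper names (`case_adjusted_severity`, `spearman`, `bootstrap_ci`) are hypothetical.
- The toy data are invented.

None of this is the authors' method or data.

```python
import random
from collections import defaultdict
from statistics import mean


def case_adjusted_severity(ratings):
    """Hypothetical severity measure: for each examiner, the mean of
    (score - mean score on that case). Negative = more stringent,
    positive = more lenient. A crude stand-in for regression adjustment."""
    by_case = defaultdict(list)
    for _, case, score in ratings:
        by_case[case].append(score)
    case_mean = {c: mean(s) for c, s in by_case.items()}
    residuals = defaultdict(list)
    for examiner, case, score in ratings:
        residuals[examiner].append(score - case_mean[case])
    return {e: mean(r) for e, r in residuals.items()}


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for stat(x, y), resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(stat([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (no variation); draw again
    stats.sort()
    k = round(alpha / 2 * n_boot)
    return stats[k], stats[n_boot - 1 - k]


# Invented toy data (NOT the study's data): 4 examiners x 2 cases.
ratings = [
    ("A", "case1", 8), ("B", "case1", 4), ("C", "case1", 6), ("D", "case1", 6),
    ("A", "case2", 6), ("B", "case2", 2), ("C", "case2", 5), ("D", "case2", 3),
]
# Hypothetical self-estimates on a -2 (very stringent) .. +2 (very lenient) scale.
self_estimate = {"A": 1, "B": -1, "C": -1, "D": 0}

severity = case_adjusted_severity(ratings)
examiners = sorted(severity)
x = [self_estimate[e] for e in examiners]
y = [severity[e] for e in examiners]
rho = spearman(x, y)
ci = bootstrap_ci(x, y, spearman)
print(severity, round(rho, 3), ci)
```

With only a handful of examiners the bootstrap interval is very wide. The sketch also omits the intra-class correlation and multivariable regression models that the study reports.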
doi:10.1186/s12909-017-0929-9 pmid:28587597 pmcid:PMC5461633 fatcat:ufjoztg45jf55jp67fd4mgecwa