Inter-observer agreement on the interpretation of capsule endoscopy findings based on capsule endoscopy structured terminology: A multicenter study by the Korean Gut Image Study Group

Byung Ik Jang, Si Hyung Lee, Jeong-Seop Moon, Dae Young Cheung, In Seok Lee, Jin Oh Kim, Jae Hee Cheon, Cheol Hee Park, Jeong-Sik Byeon, Youn Sun Park, Ki-Nam Shim, Yong-Sik Kim (+6 others)
2010 Scandinavian Journal of Gastroenterology  
Objective. Capsule endoscopy (CE) is a novel investigation for the diagnosis of small-bowel disease, but its interpretation is highly subjective. We studied the inter-observer agreement and accuracy of the interpretation of CE findings based on capsule endoscopy structured terminology (CEST). Material and methods. Fifty-six CE video clips were collected from eight university hospitals in South Korea and were independently reviewed by 13 gastroenterology experts and 10 trainees. All investigators recorded their findings based on CEST. To determine the accuracy of individual viewers, we defined the 'gold standard' as a joint review by four experts. Results. The 56 CE video clips included five normal cases, 19 cases of protruding lesions, 21 cases of depressed lesions, three cases of flat lesions, one case of abnormal mucosa, six cases with blood in the lumen, and one case of stenotic lumen. The overall mean accuracies for the experts and trainees were 74.3% ± 22.6% and 61.7% ± 25.4%, respectively. The overall accuracy of the trainee group was significantly lower than that of the expert group (P < 0.001), especially in normal, tumor, venous structure, and ulcer cases. The accuracies of both groups varied with the CE findings: they were higher in cases with more prominent intraluminal changes (e.g. active small-bowel bleeding, ulcer, tumor, stenotic lumen), whereas subtle mucosal lesions, such as erosion, angioectasia, and diverticulum, had lower accuracies. The mean kappa values for the experts and trainees were 0.61 (range 0.39-0.97) and 0.46 (range 0.17-0.66), respectively. Conclusions. Our results showed substantial agreement among experts and moderate agreement among trainees. To achieve higher accuracy and better inter-observer agreement, we need not only more experience with CE but also consensus on the CEST terminology.
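The verbal labels in the conclusion ("substantial" for a mean kappa of 0.61, "moderate" for 0.46) match the widely used Landis and Koch (1977) benchmark scale. The abstract does not say which kappa statistic was computed or how it was averaged, so the sketch below is only an illustration under stated assumptions: it assumes a mean of pairwise Cohen's kappa values within each group, and the helper names (`cohen_kappa`, `mean_pairwise_kappa`, `landis_koch`) and the toy CEST-style labels are hypothetical, not taken from the study.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who labeled the same items (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category for every item; kappa is
        # undefined here and is treated as perfect agreement by assumption.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all rater pairs (an assumed group summary)."""
    return mean(cohen_kappa(a, b) for a, b in combinations(ratings, 2))


def landis_koch(kappa):
    """Verbal label from the Landis & Koch (1977) benchmark scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy labels for four hypothetical clips (not study data).
rater1 = ["ulcer", "erosion", "normal", "ulcer"]
rater2 = ["ulcer", "angioectasia", "normal", "ulcer"]
print(round(cohen_kappa(rater1, rater2), 3))
print(landis_koch(0.61), landis_koch(0.46))
```

With the reported group means, the scale reproduces the abstract's wording: 0.61 falls in the 0.61-0.80 "substantial" band and 0.46 in the 0.41-0.60 "moderate" band. A Fleiss-type multi-rater kappa is a common alternative to averaging pairwise values and can give somewhat different numbers.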
doi:10.3109/00365520903521574 pmid:20148733 fatcat:dos4e5lb2zemhdplyc2wh4657u