The effect of item format on mathematics achievement test scores

Leslie Hubert Dukowski
1982
The purpose of the study was to determine whether item format significantly affected scores on a mathematics achievement test. A forty-two item test was constructed and cast in both multiple-choice and constructed-response formats. The items were chosen so that in each of three content domains (Computation, Application, and Algebra) there were seven items at each of two difficulty levels. The two tests were then administered on separate occasions to a sample of 213 Grade 7 students from a suburban/rural community in British Columbia, Canada. The data were analysed using a repeated measures analysis of variance, with item format and item difficulty as trial factors and student ability and gender as grouping factors. Item format had a significant (p < 0.05) effect on test score. In all domains, multiple-choice scores were higher than constructed-response scores. The multiple-choice scores were also transformed using the traditional correction for guessing procedure and analysed again. Multiple-choice scores were still significantly higher in two of the three domains, Application and Algebra. Significant omnibus F-statistics were obtained for a number of interactions for both corrected and uncorrected data, but there were significant Tetrad differences (p < 0.10) only for interactions involving format and difficulty. The results indicate that students score higher on a multiple-choice form of a mathematics achievement test than on a constructed-response form, and therefore the two scores cannot be considered equal or interchangeable. However, because of the lack of interactions involving format, the two scores may be considered equivalent in the sense that they rank students in the same manner and that the intervals between scores may be interpreted in the same manner under both formats. Therefore, although the traditional correction for chance formula is not sufficient to remove differences between multiple-choice and constructed-response scores, it [...]
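For readers unfamiliar with the traditional correction for guessing mentioned above, the following is a minimal sketch of the standard formula, corrected score = R - W / (k - 1), where R is the number of right answers, W the number of wrong answers (omitted items are not counted), and k the number of options per item. The abstract does not state how many options each item had or how omitted items were handled, so the value k = 4 and the example student below are assumptions for illustration only.

```python
def corrected_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Standard correction for guessing: R - W / (k - 1).

    num_right   -- items answered correctly (R)
    num_wrong   -- items answered incorrectly (W); omitted items are excluded
    num_options -- answer options per multiple-choice item (k)
    """
    if num_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


# Hypothetical student on one 14-item domain (7 items at each of two
# difficulty levels): 9 right, 4 wrong, 1 omitted.
# The 4-option item format is an assumption, not taken from the study.
example = corrected_score(num_right=9, num_wrong=4, num_options=4)
print(round(example, 2))
```

Under these assumed values the raw score of 9 is reduced by 4/3, which illustrates why a corrected multiple-choice score can still exceed a constructed-response score when students answer some items with partial knowledge rather than blind guessing.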
doi:10.14288/1.0055084 fatcat:zo3xsibzznggpjvxfyfdbz7qgi