The Development and Evaluation of Interactional Competence Elicitor for Oral Language Assessments

Evgeny Chukharev‐Hudilainen, Gary J. Ockey
2021 ETS Research Report Series  
The TOEFL® test is the world's most widely respected English language assessment, used for admissions purposes in more than 130 countries including Australia, Canada, New Zealand, the United Kingdom, and the United States. Since its initial launch in 1964, the TOEFL test has undergone several major revisions motivated by advances in theories of language ability and changes in English teaching practices. The most recent revision, the TOEFL iBT® test, contains a number of innovative design features, including integrated tasks that engage multiple skills to simulate language use in academic settings and test materials that reflect the reading, listening, speaking, and writing demands of real-world academic environments.

In addition to the TOEFL iBT, the TOEFL Family of Assessments has expanded to provide high-quality English proficiency assessments for a variety of academic uses and contexts. The TOEFL Young Students Series (YSS) features the TOEFL® Primary™ and TOEFL Junior® tests, designed to help teachers and learners of English in school settings. The TOEFL ITP® Assessment Series offers colleges, universities, and others an affordable test for placement and progress monitoring within English programs.

Since the 1970s, the TOEFL tests have had a rigorous, productive, and far-ranging research program. ETS has made the establishment of a strong research base a consistent feature of the development and evolution of the TOEFL tests, because only through a rigorous program of research can a testing company demonstrate its forward-looking vision and substantiate claims about what test takers know or can do based on their test scores. In addition to the 20–30 TOEFL-related research projects conducted by ETS Research & Development staff each year, the TOEFL Committee of Examiners (COE), composed of distinguished language-learning and testing experts from the academic community, funds an annual program of research supporting the TOEFL Family of Assessments, including projects carried out by external researchers from all over the world. To date, hundreds of studies on the TOEFL tests have been published in refereed academic journals and books. In addition, more than 300 peer-reviewed reports about TOEFL research have been published by ETS.
These publications have historically appeared in several different series: TOEFL Monographs, TOEFL Technical Reports, TOEFL iBT Research Reports, and TOEFL Junior Research Reports. The current TOEFL Research Report Series is intended to serve as the primary venue for all ETS publications on research conducted in relation to any member of the TOEFL Family of Assessments.

This paper describes the development and evaluation of Interactional Competence Elicitor (ICE), a spoken dialog system (SDS) for delivering a paired oral discussion task in language assessment. ICE is designed to sustain a topic-specific conversation with a test taker in order to elicit discourse that can later be judged to assess the test taker's oral language ability, including interactional competence. The development of ICE is reported in detail to provide guidance for future developers of similar systems. The performance of ICE is evaluated in two ways: (a) by analyzing system errors that occur at different stages of the natural language processing (NLP) pipeline, in terms of both their preventability and their impact on downstream stages of the pipeline, and (b) by analyzing questionnaire and semistructured interview data to characterize the test takers' experience with the system.
Findings suggest that ICE was robust in 90% of the dialog turns it produced, and test takers noted both positive and negative aspects of communicating with the system as opposed to a human interlocutor. We conclude that this prototype system lays important groundwork for the development and use of specialized SDSs in the assessment of oral communication, which includes interactional competence.
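The stage-level error analysis summarized above can be made concrete with a minimal sketch. It assumes a hypothetical annotation scheme in which each dialog turn lists its errors, each tagged with the pipeline stage where it arose, whether it was preventable, and whether it affected downstream stages; a turn is then counted as robust if none of its errors propagated downstream. The stage names, the `StageError` and `Turn` classes, the `summarize` function, and this definition of robustness are illustrative assumptions, not ICE's actual implementation or the report's coding scheme, and the toy turns are not study data.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stage names: the report's actual NLP pipeline stages are not given here.
STAGES = ("speech_recognition", "language_understanding", "response_generation")


@dataclass
class StageError:
    stage: str                 # pipeline stage where the error arose
    preventable: bool          # could a better component have avoided it?
    affected_downstream: bool  # did it change the output of later stages?

    def __post_init__(self) -> None:
        # Reject stage labels outside the (hypothetical) pipeline definition.
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage}")


@dataclass
class Turn:
    errors: list[StageError] = field(default_factory=list)


def summarize(turns: list[Turn]) -> dict:
    """Tally errors by stage and compute the share of robust turns.

    Assumption: a turn counts as robust when none of its errors
    propagated to downstream stages.
    """
    all_errors = [e for t in turns for e in t.errors]
    robust = sum(1 for t in turns if not any(e.affected_downstream for e in t.errors))
    return {
        "errors_per_stage": dict(Counter(e.stage for e in all_errors)),
        "preventable_per_stage": dict(Counter(e.stage for e in all_errors if e.preventable)),
        "robust_share": robust / len(turns) if turns else 0.0,
    }


# Toy annotations for illustration only (not data from the study).
turns = [
    Turn(),
    Turn([StageError("speech_recognition", preventable=True, affected_downstream=False)]),
    Turn([
        StageError("speech_recognition", preventable=False, affected_downstream=True),
        StageError("language_understanding", preventable=True, affected_downstream=True),
    ]),
    Turn(),
]
summary = summarize(turns)
print(summary)  # robust_share is 0.75: three of the four toy turns have no propagating error
```

Keeping preventability and downstream impact as separate flags mirrors the two dimensions named in the abstract, so the same annotations can show both which stages fail most often and which failures actually propagate through the pipeline.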
doi:10.1002/ets2.12319