Exploring Online Activities to Predict the Final Grade of Student
Student success rate is a significant indicator of the quality of the educational services offered at higher education institutions (HEIs). It allows students to plan how to achieve their goals and helps teachers identify at-risk students and intervene in time. University decision-makers need reliable data on student success rates to make specific, coherent decisions that improve students' academic performance. In recent years, educational data mining (EDM) has become an effective tool for exploring data on student activities to predict their final grades. This study presents a case study that predicts students' final grades from their activities in the Moodle Learning Management System (LMS) and their attendance at online lectures held via Zoom, using statistical and machine learning techniques. The data set consists of the final grades of 105 students who studied Object-Oriented Programming at the University of Plovdiv during the 2021–2022 academic year, records of their activities in the online course (7057 records), and lecture attendance records (738). The predictions are based on 46 attributes. The chi-square test is used to assess the association between students' final grades and event context (lectures, source code, exercises, and assignments), as well as the relationship between lecture attendance and final results. A logistic regression model is used to assess the actual impact of event context on "Fail" students in a multivariate setup. Four machine learning algorithms (Random Forest, XGBoost, KNN, and SVM) are applied to predict the students' final grades, with 70% of the data used for training and 30% for testing. Five-fold cross-validation is also used. The results show associations between students' final grades and their activity in the online course, and between final grades and lecture attendance. All four algorithms performed moderately well in predicting the students' final results, with Random Forest obtaining the highest prediction accuracy (78%). The findings show that the Random Forest algorithm may be used to predict, after eight weeks, which students will fail. Such data-driven predictions are significant for teachers and decision-makers: they allow them to take measures to reduce the number of failed students and to identify which types of learning resources or student activities are better predictors of students' academic performance.
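The chi-square test of independence used in the study compares the observed counts in a contingency table (for example, final-grade group by event context, or by lecture attendance) with the counts expected if the two variables were unrelated. The following is a minimal, dependency-free Python sketch of that test. The function names, the choice of the uncorrected Pearson statistic, and the contingency table are illustrative assumptions, not the authors' code or data. The p-value comes from the regularized incomplete gamma function, evaluated with the standard series and continued-fraction expansions.

```python
import math


def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series (used when x < a + 1)."""
    term = total = 1.0 / a
    denom = a
    for _ in range(1000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_contfrac(a, x):
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (Lentz's method, x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, half_x)
    return _upper_gamma_contfrac(a, half_x)


def chi_square_independence(table):
    """Pearson chi-square test of independence (no continuity correction) on an r x c table of counts.

    Assumes every row and column total is positive, so no expected count is zero.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, chi2_sf(statistic, dof)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study:
    # rows = final-grade group (Fail, Pass), columns = lectures attended (few, many).
    table = [[12, 8], [20, 65]]
    stat, dof, p = chi_square_independence(table)
    print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}")
```

In practice, `scipy.stats.chi2_contingency` provides the same test. By default it applies Yates' continuity correction when there is one degree of freedom (a 2 x 2 table), so its statistic differs from this uncorrected version unless `correction=False` is passed.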
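The evaluation protocol described in the abstract (a 70%/30% train/test split plus five-fold cross-validation) can be sketched without external libraries. The sketch below uses KNN, one of the four algorithms named, because it fits in a few lines of standard-library Python. Random Forest, XGBoost and SVM would normally come from libraries such as scikit-learn or xgboost. The two features, the synthetic data generator, and all function names are hypothetical. Running cross-validation on the training split is also an assumption, because the abstract does not say which portion of the data it was applied to.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def _rows(values, ids):
    return [values[i] for i in ids]


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle once, then hold out `test_fraction` of the rows (30% by default) for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return _rows(X, train), _rows(y, train), _rows(X, test), _rows(y, test)


def cross_val_accuracy(X, y, n_folds=5, k=5, seed=0):
    """n-fold cross-validation: each fold is held out once while the remaining folds train KNN."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        scores.append(accuracy(_rows(X, train), _rows(y, train),
                               _rows(X, held_out), _rows(y, held_out), k))
    return scores


def make_synthetic_data(n=100, seed=1):
    """Hypothetical students: (online-course events, lectures attended) -> "Pass"/"Fail"."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        passed = rng.random() < 0.6
        events = rng.gauss(80, 10) if passed else rng.gauss(30, 10)
        lectures = rng.gauss(10, 2) if passed else rng.gauss(4, 2)
        X.append((events, lectures))
        y.append("Pass" if passed else "Fail")
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    train_X, train_y, test_X, test_y = train_test_split(X, y)
    fold_scores = cross_val_accuracy(train_X, train_y)
    print("5-fold CV accuracy on the training split:", sum(fold_scores) / len(fold_scores))
    print("Hold-out accuracy on the 30% test split:", accuracy(train_X, train_y, test_X, test_y))
```

KNN relies on raw Euclidean distances, so features on very different scales (here, event counts and lecture counts) would normally be standardized first. The sketch skips this step for brevity.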