Application and Comparison of Clustering Methods to Educational Process Data
Authors
Luo, Yu
Abstract
Cluster analysis has great potential for analyzing the vast amounts of process data that record students' online learning behaviors. It can be used to develop profiles of student groups that help instructors understand students' online learning patterns. However, one of the major challenges in employing cluster analysis is selecting a suitable algorithm from the many available. This methodological paper introduces and compares three clustering algorithms: two popular non-hierarchical methods, k-means and k-medoids, and one hierarchical method, agglomerative hierarchical clustering analysis (HCA). The dataset used for demonstration is publicly available: Open University Learning Analytics (OULA), which contains information on online modules, student demographics, and students' clicks in the virtual learning environment (VLE). To examine the utility of the process features and the performance of the selected clustering algorithms in predicting students' module outcome (i.e., pass or fail), one module was selected (N = 1299) and 18 process features were developed. After the clustering results were obtained from each algorithm, logistic regression was used to compare and validate the cluster memberships against students' module outcomes. Multiple logistic regression was then employed to explore the demographic and process-feature composition of the most predictive clustering results. The results showed that k-means and k-medoids generated comparable results, while agglomerative HCA produced results that differed most from those of the other two algorithms yet were the most predictive. The multiple logistic regression results showed that students who engaged in certain VLE activities, such as taking quizzes or joining discussion forums, had a higher chance of being in the high-performance group (i.e., the group with a higher probability of passing the module). Limitations and future research directions are discussed.
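The workflow summarized in the abstract (cluster students on process features, then check cluster membership against module outcome with logistic regression) can be illustrated with a minimal Python sketch. This is not the thesis's code. It assumes scikit-learn and NumPy, uses synthetic data in place of OULA, fixes the number of clusters at 2, and uses in-sample accuracy as the comparison metric, all of which are choices made here for illustration. The `k_medoids` helper is a simplified alternating update written for this example, not the full PAM algorithm, and the names `engaged`, `passed`, and `validate` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def k_medoids(X, k, n_iter=50, seed=0):
    """Simplified alternating k-medoids on Euclidean distances (not full PAM)."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return dist[:, medoids].argmin(axis=1)


# Synthetic stand-in for 18 process features and a pass/fail outcome.
rng = np.random.default_rng(42)
n, p = 300, 18
engaged = rng.integers(0, 2, size=n)  # hypothetical latent engagement group
X = rng.normal(size=(n, p)) + 2.0 * engaged[:, None]
passed = (rng.random(n) < np.where(engaged == 1, 0.8, 0.3)).astype(int)
X_std = StandardScaler().fit_transform(X)

k = 2
memberships = {
    "k-means": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std),
    "k-medoids": k_medoids(X_std, k),
    "agglomerative HCA": AgglomerativeClustering(
        n_clusters=k, linkage="ward"
    ).fit_predict(X_std),
}


def validate(labels, outcome):
    """Regress the outcome on cluster dummies; return in-sample accuracy."""
    dummies = np.eye(labels.max() + 1)[labels][:, 1:]  # drop reference cluster
    model = LogisticRegression().fit(dummies, outcome)
    return model.score(dummies, outcome)


results = {name: validate(labels, passed) for name, labels in memberships.items()}
for name, acc in results.items():
    print(f"{name}: accuracy of cluster membership predicting pass/fail = {acc:.3f}")

# Agreement between two clusterings, independent of label numbering.
ari = adjusted_rand_score(memberships["k-means"], memberships["k-medoids"])
print(f"k-means vs. k-medoids adjusted Rand index = {ari:.3f}")
```

The adjusted Rand index is included as one possible way to quantify how similar two sets of cluster memberships are; the abstract does not state which agreement measure the thesis used.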
Description
Thesis (Master's)--University of Washington, 2022
