Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III, and Lise Getoor
The large number of students participating in massive open online courses (MOOCs) and the availability of interaction data provide an opportunity to study and uncover student engagement patterns, to develop technology that can help improve student engagement, and to facilitate instructor interventions. Our work looks at several problems in MOOCs: 1) modeling student engagement to predict course completion, 2) analyzing changes in engagement patterns, and 3) understanding discussion forum content and its relationship to course completion.
When users interact on a MOOC, they leave behind cues suggestive of their engagement with the course and their intention to complete it. We have developed a data-driven model of student engagement in MOOCs using features from users' interaction with the MOOC, and we use that model to predict course completion (course survival) (Ramesh et al., 2014). Our model uses behavioral cues (course-related activities such as viewing lectures, submitting assignments, and participating in the discussion forum), forum content (polarity and subjectivity of forum posts), and forum interaction structure to distinguish between forms of student engagement (active and passive). The engagement types are represented as latent variables in our model and are learned from observed data. We then use the latent variables to predict student survival. We use probabilistic soft logic (PSL) (Kimmig et al., 2012) to represent observed features and (latent and target) variables as logical predicates, and we construct rules over these predicates to capture domain knowledge. We evaluated our models on predicting learner survival across three MOOCs: Surviving Disruptive Technologies, Women and the Civil Rights Movement, and Gene and Human Condition. We demonstrated that incorporating latent engagement variables helps in predicting student survival.
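To make the rule-based representation above concrete, the following is a minimal sketch of how a single PSL-style rule is scored under the Lukasiewicz relaxation that PSL uses for soft truth values in [0, 1]. The rule, the predicate names, and the truth values are hypothetical illustrations and are not taken from our model.

```python
def luk_and(*values):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """Penalty for the rule body -> head; 0.0 means the rule is satisfied."""
    return max(0.0, body - head)


# Hypothetical rule (illustration only):
#   postsInForum(U) & positivePolarity(U) -> activeEngagement(U)
posts_in_forum = 0.9      # assumed observed soft truth value
positive_polarity = 0.8   # assumed observed soft truth value
active_engagement = 0.4   # assumed current value of the latent variable

body = luk_and(posts_in_forum, positive_polarity)
penalty = distance_to_satisfaction(body, active_engagement)
print(f"body={body:.2f} penalty={penalty:.2f}")
```

Inference in PSL seeks values for the unobserved variables that minimize the weighted sum of such penalties over all grounded rules, so raising the value of the latent variable in this example would reduce the penalty.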
In order to design effective interventions, we need to identify students at risk of dropping out early in the course. We conducted experiments to predict student survival early in the course by training our models on data from the initial part of the course. Our experiments show that our models, especially the latent model, are able to predict student survival reliably at an early stage when compared to the model without latent variables. In addition to improved prediction accuracy, our latent engagement model also unveils interesting patterns in student engagement. Analyzing the latent engagement estimates predicted by our model, we find that passive engagement dominates in the beginning and that active forms of engagement increase toward the end. Examining the transitions between engagement types, we observed that most passive users show an increase in active engagement levels prior to dropping out. This is suggestive of students seeking help, complaining, or expressing dissatisfaction or difficulty in following course materials in discussion forums before dropping out. By probing these forum posts, we can uncover reasons leading to student disengagement and dropout and identify students who can be helped via intervention. This leads us to perform a closer analysis of forum content.
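The early-prediction setup described above, in which models are trained only on data from the initial part of the course, can be sketched as a simple truncation of the interaction log. The event format `(student, day, activity)`, the function name, and the cutoff fraction are assumptions made for illustration, not a description of our actual pipeline.

```python
from collections import Counter


def early_features(events, course_length_days, fraction):
    """Count activities per (student, activity) using only the first
    `fraction` of the course. All names here are hypothetical."""
    cutoff = course_length_days * fraction
    counts = Counter()
    for student, day, activity in events:
        if day < cutoff:  # ignore everything after the early window
            counts[(student, activity)] += 1
    return counts


# Hypothetical interaction log: (student id, day of course, activity type)
events = [
    ("s1", 2, "lecture"),
    ("s1", 30, "post"),
    ("s2", 5, "post"),
    ("s2", 9, "lecture"),
]
features = early_features(events, course_length_days=40, fraction=0.25)
print(dict(features))
```

Features computed this way contain no information from later in the course, which is the property an early-warning model needs.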
Our second contribution is a closer look at discussion forum content for indicators of student survival. MOOC discussion forums are the principal means of interaction among MOOC participants. Negative sentiment can indicate dissatisfaction with the class; however, it can also be used to express an opinion that shows a high level of engagement. Negative sentiment in course-related discussions does not imply a negative attitude toward the course and does not mean disengagement, but in logistics or feedback posts it signifies disengagement. Discerning between the two types of sentiment is vital, as we are trying to identify students at risk of dropping out. We make use of a recent improvement in topic modeling, seeded topic modeling (SeededLDA) (Jagarlamudi et al., 2012), to extract posts corresponding to course logistics, feedback, and course-related material [under review]. We leverage knowledge of the course syllabus and of the general nature of logistics and feedback posts to seed our model. We enhance the survival models described above with the topic distribution from SeededLDA. Our rules capture the sentiment and topic of posts to assess signs of engagement and disengagement. We demonstrate that including features from the topic distribution in our survival models helps in predicting student survival.
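As an illustration of how sentiment and topic can be combined, the sketch below treats negative sentiment as evidence of disengagement only when a post is about logistics or feedback. The dictionary keys, the scores, and the use of a Lukasiewicz conjunction are assumptions for this example; the topic proportions are taken as given rather than computed by SeededLDA.

```python
def luk_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def disengagement_evidence(post):
    """Soft evidence that a post signals disengagement: negative
    sentiment AND (logistics OR feedback topic). Hypothetical rule."""
    meta_topic = max(post["logistics"], post["feedback"])
    return luk_and(post["negative"], meta_topic)


# Hypothetical posts with assumed sentiment and topic scores.
logistics_complaint = {"negative": 0.9, "logistics": 0.8,
                       "feedback": 0.1, "course": 0.1}
content_debate = {"negative": 0.9, "logistics": 0.05,
                  "feedback": 0.05, "course": 0.9}

print(disengagement_evidence(logistics_complaint))
print(disengagement_evidence(content_debate))
```

Both posts are equally negative, but only the logistics complaint yields nonzero evidence, which mirrors the distinction drawn above between the two kinds of negative sentiment.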
Our current research focuses on applying our models to courses as they progress and on identifying possible instructor interventions.
Jagadeesh Jagarlamudi, Hal Daumé III, and Raghavendra Udupa. Incorporating lexical priors into topic models. In Proceedings of EACL, 2012.
Angelika Kimmig, Stephen H. Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. A short introduction to probabilistic soft logic. In NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012.
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III, and Lise Getoor. Learning latent engagement patterns of students in online courses. To appear in AAAI, 2014.