Organized by the International Educational Data Mining Society (IEDMS).


To find out more about sponsorship, contact David Lindrum.

Workshops & Tutorials

The workshops and tutorials will take place on Wednesday, June 29th, 2016. The schedule is as follows:

09:00 - 10:30 Session 1
10:30 - 11:00 Coffee break
11:00 - 12:30 Session 2
12:30 - 14:00 Lunch
14:00 - 15:30 Session 3
15:30 - 16:00 Coffee break
16:00 - 17:30 Session 4


Workshop Proceedings

Citation Information:

Jonathan Rowe and Erica Snow (eds.). Proceedings of the EDM 2016 Workshops and Tutorials, co-located with the 9th International Conference on Educational Data Mining.


WS-1: Computer-Supported Peer Review in Education (CSPRED-2016) - Full day

Edward Gehringer, North Carolina State University
Ferry Pramudianto, North Carolina State University
Yang Song, North Carolina State University

Computer-supported peer review is drawing increasing attention from educators and researchers. It produces more copious feedback than the instructor or course staff could provide, and delivers it more quickly. It provides authors with multiple perspectives on their work, rather than the singular voice of a teacher. For the instructor, it generates multiple performance measures that can be used to judge the class’s progress.

As an inherently interdisciplinary topic, peer review stands to benefit from the perspectives of learning scientists, technologists, and instructors, as well as psychologists, anthropologists, statisticians, designers, and other interested parties. The workshop invites presentations of both early-stage and mature research. Technology demonstrations are also welcome.

WS-2: Writing Analytics, Data Mining, and Writing Studies - Full day

Val Ross, University of Pennsylvania
Alex Rudniy, Fairleigh Dickinson University
Joe Moxley, University of South Florida
David Eubanks, Furman University

The primary goal of this workshop is to build a research community around large-scale data analysis, with a particular focus on writing studies, data mining, and analytics. The workshop aims to foster cross-disciplinary research among writing program directors and faculty, computational linguists, computer scientists, and educational measurement specialists.

Presenters will address

Digital tools such as My Reviewers, which enable instructors to grade and comment on student papers and peer reviews online, are transforming how instructors and students critique documents, and they have the potential to transform how writing and writing programs are assessed. Beyond profoundly altering how faculty and students respond to writing, these tools aggregate e-portfolios, facilitate distributive evaluation, and archive data that allow researchers to mine texts and map student outcomes in order to produce analytics that inform users, researchers, and administrators. Rather than limiting assessment to cognitive measures, these toolsets make it possible to gather authentic assessment information about students’ intrapersonal and interpersonal competencies.

WS-3: Educational Data Analysis using LearnSphere - Full day

John Stamper, Carnegie Mellon University
Kenneth Koedinger, Carnegie Mellon University
Philip Pavlik, University of Memphis
Carolyn Rose, Carnegie Mellon University
Ran Liu, Carnegie Mellon University
Michael Eagle, Carnegie Mellon University
Michael Yudelson, Carnegie Mellon University
Kalyan Veeramachaneni, Massachusetts Institute of Technology

LearnSphere’s goal is to support any custom analysis workflow that can be applied to educational datasets (such as those in DataShop, DiscourseDB, and MOOCdb) and to produce standardized workflow outputs that facilitate quantitative and qualitative model comparisons. We invite researchers to submit 2-4 page analysis descriptions. Strong submissions will include a high-level description of the analysis workflow and detailed information on the format of the input data and the resulting outputs. Accepted participants will be eligible for a travel scholarship and will have the opportunity to publish their outcomes in the EDM workshop proceedings.

This workshop will explore the application and refinement of novel educational data mining workflows using LearnSphere, a new $5 million, NSF-funded data-sharing and analysis portal that extends the existing DataShop infrastructure and includes teams from Carnegie Mellon, Stanford, Memphis, and MIT. Greater flexibility in accommodating custom educational analysis workflows is one of the core ways in which LearnSphere expands on DataShop.


Tutorial 1: SAS Tools for Educational Data Mining - Full day

Jennifer Sabourin, SAS Institute
Scott McQuiggan, SAS Institute
Andre De Waal, SAS Institute

Researchers in the EDM community have always relied on sophisticated tools to analyze data and build models. As the amount of data that can be collected and stored grows, the need for tools capable of handling "big data" becomes ever more pressing. SAS® Analytics U is a new initiative that makes SAS data analysis and mining tools available free of charge to educational researchers and instructors. These tools are designed to handle very large data sets and can be run in the cloud, saving researchers valuable time and resources. Furthermore, SAS Analytics U provides a community in which SAS educators and learners can share resources and information about SAS tools and techniques. This tutorial aims to introduce researchers to the tools available through SAS Analytics U and to show how they can be applied to the field of Educational Data Mining. We will provide an overview of the SAS architecture and instruction on the key features of each tool in the suite.

If you intend to participate in the hands-on activities, please bring a laptop with SAS University Edition already installed. Installation can take up to an hour, so there will not be time for it on the day of the tutorial. The free download is available at

Tutorial 2: Massively Scalable EDM with Spark - Full day

Tristan Nixon, Institute for Intelligent Systems, University of Memphis

The creation and availability of ever-larger datasets are motivating the development of new distributed technologies to store and process data across clusters of servers. Apache Spark has emerged as the new standard platform for developing highly scalable cluster computing applications. It offers a wide range of connectors to numerous databases and enterprise data management systems, an ever-growing library of machine-learning algorithms, and the ability to process streaming data in near-real time. Developers can write their applications in Java, Scala, Python, and R. Applications can be run locally (for easy development and testing) and deployed to dedicated clusters or to clusters leased from cloud-computing providers.

This will be a full-day tutorial at EDM 2016 on developing massively scalable machine learning and data mining applications with Spark. Participants will be expected to follow along with all examples on their own laptops throughout the tutorial. All code used in the tutorial will either be taken from publicly available examples or be available for download from the IEDMS GitHub repository under a permissive open-source license. All examples will be designed to process a modestly sized sample of the KDD Cup dataset available from DataShop.