Reproducible Data Science with R

by Dr Mine Çetinkaya-Rundel

About the course

For data analysis to be reproducible, the data and code should be assembled in such a way that results (e.g. tables and figures) can be re-created. While the scientific community is by and large in agreement that reproducibility is a minimal standard by which data analyses should be evaluated, and a myriad of software tools for reproducible computing exist, it is still not trivial to reproduce someone’s (sometimes your own!) results without running into unavailable analysis data, external dependencies, missing packages, out-of-date software, etc. In this workshop, we will demonstrate a workflow for reproducible data science with R, R Markdown, Git, and GitHub. Experience with R is expected, but familiarity with the other tools is not required. The workshop will consist of demonstrations and hands-on exercises.

About the speaker

Dr Mine Çetinkaya-Rundel is a Senior Lecturer in the School of Mathematics at the University of Edinburgh as well as Professional Educator and Data Scientist at RStudio. Her work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centred learning, and open-source education, as well as pedagogical approaches for enhancing retention of women and underrepresented minorities in STEM. Dr Çetinkaya-Rundel works on integrating computation into the undergraduate statistics curriculum, using reproducible research methodologies and analysis of real and complex datasets. She also organises ASA DataFest, an annual two-day competition in which teams of undergraduate students work to reveal insights into a rich and complex dataset. She works on the OpenIntro project, whose mission is to make educational products that are free and transparent and that lower barriers to education. As part of this project, she co-authored three open-source introductory statistics textbooks. Dr Çetinkaya-Rundel is also the creator and maintainer of datasciencebox.org and teaches the popular Statistics with R MOOC on Coursera. In 2018 she received the David Pickard Teaching Award and in 2016 the ASA Waller Education Award. She is also the recipient of the 2015 JSM Best Paper Award in the Section on Teaching Statistics in the Health Sciences and the 2014 Duke University David and Janet Vaughan Brooks Award for Teaching Excellence.

The slides and other workshop materials are available here.