Data Manipulation in R


Event: College of Forestry Workshop

An introduction to data manipulation in R via dplyr and tidyr.

This two-hour workshop is aimed at graduate students who have been introduced to R in statistics classes but haven’t had any training on how to work with data in R.

The workshop covers how to:

  • Make data summaries by group
  • Filter out rows
  • Select specific columns
  • Add new variables
  • Change the format of datasets (i.e., reshape datasets)
  • Join datasets together
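As a rough sketch of what each of these steps can look like, the code below applies them to two small made-up data frames. The objects `trees` and `plots` and their columns are hypothetical and exist only for illustration; they are not part of the workshop materials. The reshaping step assumes tidyr 1.0.0 or later, where `pivot_wider()` is available.

```r
library(dplyr)
library(tidyr)

# Made-up tree measurements, for illustration only
trees <- tibble(
  plot    = c("A", "A", "B", "B", "C"),
  species = c("fir", "pine", "fir", "pine", "fir"),
  dbh_cm  = c(30, 22, 41, 18, 35)
)

# Made-up plot-level information to join on
plots <- tibble(
  plot      = c("A", "B", "C"),
  elevation = c(500, 750, 900)
)

# Make data summaries by group: mean diameter per species
mean_by_species <- trees %>%
  group_by(species) %>%
  summarise(mean_dbh = mean(dbh_cm))

# Filter out rows: keep only trees of at least 25 cm
large_trees <- filter(trees, dbh_cm >= 25)

# Select specific columns
plot_dbh <- select(trees, plot, dbh_cm)

# Add new variables: diameter converted to inches
trees_in <- mutate(trees, dbh_in = dbh_cm / 2.54)

# Reshape from long to wide: one column per species
wide <- pivot_wider(trees, names_from = species, values_from = dbh_cm)

# Join datasets together by the shared "plot" column
joined <- left_join(trees, plots, by = "plot")
```

Plot C has no pine measurement, so the wide version holds a missing value (`NA`) in that cell.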

Along the way, students learn how to use the pipe operator to chain several data manipulation steps together. Students have time to practice data manipulation and reshaping using the babynames dataset from the babynames package.
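To give a sense of what chaining with the pipe looks like, here is a minimal sketch. It uses a toy data frame, `toy_names`, that borrows four column names from the babynames dataset (year, sex, name, n); the counts are made up so the example runs without the babynames package installed.

```r
library(dplyr)

# Toy data with babynames-style columns; the counts are made up
toy_names <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  sex  = c("F", "F", "M", "F", "F", "M"),
  name = c("Emma", "Olivia", "Liam", "Emma", "Olivia", "Liam"),
  n    = c(100, 80, 90, 120, 70, 95)
)

# Each step passes its result to the next via the pipe (%>%)
top_f <- toy_names %>%
  filter(sex == "F") %>%          # keep rows for female names
  group_by(name) %>%              # one group per name
  summarise(total = sum(n)) %>%   # total count across years
  arrange(desc(total))            # largest total first
```

Without the pipe, the same steps would need either nested function calls or a series of intermediate objects.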

I provide an R script that we’ll run code from during the workshop, as well as a PDF document. The PDF is a written version of the workshop, including code and output, to be used as a reference. Make sure you save the R script as a file ending in .R and not as a text document. You may need to set the file type in the save dialog to “All Files” and manually add .R to the file name when saving.

Interactive tutorials

In addition to the materials above, I have converted the workshop material into three interactive tutorials that can be run on your own computer. To use them, you will need to have the learnr package installed (install.packages("learnr")).

Click on a tutorial listed below and save the file as a .Rmd file when the download window opens. Open the saved .Rmd file in RStudio and click the Run Document button at the top to start the tutorial.

Tutorial 1: Data manipulation basics
Tutorial 2: Reshaping
Tutorial 3: Joins

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.