Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class time if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted PDF file on Gradescope by 9pm on Friday this week.

This week’s agenda: practicing how to use the map family of functions from purrr, and how to perform basic computations on data frames using dplyr.

Installing and loading packages

Below we install tidyverse, which includes the packages (purrr and dplyr) needed to complete this lab. We also install the repurrrsive package, which contains the Game of Thrones data set that we’ll use for the first couple of questions. Since this may be the first time installing packages for some of you, we’ll show you how. If you already have these packages installed, you can skip this part. Note: do not remove eval=FALSE from the code chunk below; just run its lines in your console. Alternatively, you can select “Tools” > “Install Packages” from the RStudio menu.

install.packages("tidyverse")
install.packages("repurrrsive")

Now we’ll load these packages. Note: the code chunk below will cause errors if you try to knit this file without installing the packages first.

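Post edit: loading the tidyverse package in its entirety attaches many packages at once, which makes namespace conflicts more likely. In particular, if plyr is also attached, its functions with the same names (e.g., summarise(), mutate()) can mask dplyr's. It is better to load only what you need.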

library(purrr)
library(dplyr)
library(repurrrsive)
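Because several packages share function names, a bare call like summarise() resolves to whichever attached package comes first on the search path, usually the one loaded most recently. A minimal sketch of one safeguard: qualify calls with dplyr:: so load order no longer matters. The data frame df and its columns g and x are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: a grouping column g and a numeric column x.
df = data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# The dplyr:: prefix pins these calls to dplyr, even if a package attached
# later (plyr, for example) masks summarise().
res = df %>%
  dplyr::group_by(g) %>%
  dplyr::summarise(total = sum(x))
res
```

Running conflicts(detail = TRUE) in the console lists which attached objects mask one another.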

Game of Thrones data set

Below we inspect a data set on 30 characters from Game of Thrones, from the repurrrsive package. It’s stored in a list called got_chars, which is automatically available in your R session once you load the repurrrsive package.

class(got_chars)
## [1] "list"
length(got_chars)
## [1] 30
names(got_chars[[1]])
##  [1] "url"         "id"          "name"        "gender"      "culture"    
##  [6] "born"        "died"        "alive"       "titles"      "aliases"    
## [11] "father"      "mother"      "spouse"      "allegiances" "books"      
## [16] "povBooks"    "tvSeries"    "playedBy"
got_chars[[1]]$name
## [1] "Theon Greyjoy"
got_chars[[1]]$aliases
## [1] "Prince of Fools" "Theon Turncloak" "Reek"            "Theon Kinslayer"
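As a warm-up for the map family, the sketch below pulls fields out of got_chars. Passing a string as the function extracts that named element from each list entry, a formula (~) defines a short anonymous function, and the typed variants map_chr() and map_int() return an atomic vector instead of a list. The object names char_names and n_aliases are just for illustration.

```r
library(purrr)
library(repurrrsive)

# A string as .f extracts the element with that name from each character.
char_names = map_chr(got_chars, "name")

# A formula defines an anonymous function; .x is the current character.
n_aliases = map_int(got_chars, ~ length(.x$aliases))

# Label each alias count with the character's name.
names(n_aliases) = char_names
head(n_aliases, 3)
```

Plain map() would return the same values wrapped in a list; the typed variants also fail loudly if any result has the wrong type or length.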

Q1. Warming up with map

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q2. Cultural studies

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Rio Olympics data set

This is the Rio Olympics data set that we saw in Lab 3. In the next question, we’re going to repeat some calculations from Lab 3, but using dplyr.

rio = read.csv("https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/rio.csv")
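A typical grouped computation in dplyr chains mutate(), group_by(), summarise() and arrange(). A minimal sketch on a few made-up rows; the column names nationality, sport, gold, silver and bronze are assumed to match rio.csv, so check names(rio) before reusing this on the real data.

```r
library(dplyr)

# Hypothetical toy rows; the column names are assumed to mirror rio.csv,
# and the values are invented.
toy_rio = data.frame(
  nationality = c("USA", "USA", "GBR", "GBR", "CHN"),
  sport = c("swimming", "athletics", "cycling", "athletics", "diving"),
  gold = c(2, 1, 1, 0, 1),
  silver = c(0, 1, 0, 1, 0),
  bronze = c(1, 0, 0, 1, 0)
)

# Per-athlete medal total, then athlete count and medal total per country,
# sorted from most to fewest medals.
medals_by_country = toy_rio %>%
  mutate(total = gold + silver + bronze) %>%
  group_by(nationality) %>%
  summarise(n_athletes = n(), total_medals = sum(total)) %>%
  arrange(desc(total_medals))
medals_by_country
```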

Q3. Practice with grouping and summarizing

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Fastest 100m sprint times

Below, we read two data sets of the 1000 fastest times ever recorded for the 100m sprint, in men’s and women’s track. We scraped this data from http://www.alltime-athletics.com/m_100ok.htm and http://www.alltime-athletics.com/w_100ok.htm, in early September 2021. (Interestingly, the 2nd, 3rd, 4th, 7th, and 8th fastest women’s times were all set at the most recent Tokyo Olympics, or after! Meanwhile, the top 10 men’s times are all from about a decade ago.)

sprint.m.df = read.table(
  file="https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/sprint.m.txt", 
  sep="\t", quote="", header=TRUE)
sprint.w.df = read.table(
  file="https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/sprint.w.txt", 
  sep="\t", quote="", header=TRUE)
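Grouping works the same way on the sprint data. In the sketch below, the toy rows, runner names, and the columns Time, Name and Country are hypothetical stand-ins for sprint.m.df, so confirm the real column names with names(sprint.m.df) first.

```r
library(dplyr)

# Hypothetical toy rows standing in for sprint.m.df; the column names
# Time, Name and Country are assumed, not checked against the real file.
toy_sprint = data.frame(
  Time = c(9.58, 9.63, 9.69, 9.69, 9.72),
  Name = c("Runner A", "Runner B", "Runner C", "Runner A", "Runner D"),
  Country = c("JAM", "JAM", "USA", "JAM", "USA")
)

# Keep times under 9.70 seconds, then, per country, count the fast times
# and the distinct runners who set them.
fast_by_country = toy_sprint %>%
  filter(Time < 9.70) %>%
  group_by(Country) %>%
  summarise(n_times = n(), n_runners = n_distinct(Name)) %>%
  arrange(desc(n_times))
fast_by_country
```

Here n() counts rows per group, while n_distinct(Name) counts distinct runners, so a country whose one athlete ran several fast times scores high on the first and low on the second.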

Q4. More practice with data frame computations

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q5. Practice with grouping

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE