Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class time if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted PDF file on Gradescope, by Friday 9pm, this week.

This week’s agenda: mastering the pipe operator %>%, practicing dplyr verbs, and pivoting using tidyr.

Loading the tidyverse

Now we’ll load the tidyverse suite of packages. (You should already have tidyverse installed from the last lab; if for some reason you need to install it again, look back at the last lab’s instructions.) This gives us access to the pipe operator %>% as well as the dplyr and tidyr packages needed to complete this lab.

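Post-edit note: attaching the tidyverse in its entirety loads more packages than this lab needs, and function-name clashes between attached packages (for instance, plyr and dplyr share several function names when both are loaded) can cause namespace issues with dplyr. It is better to load only what you need.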

library(dplyr)
library(tidyr)
library(purrr)
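
To illustrate the kind of namespace issue mentioned above, here is a small sketch using a clash that exists as soon as dplyr is attached: dplyr's filter() masks stats::filter(). Writing the package name explicitly with :: removes the ambiguity. The data frame df is made up for this illustration and is not part of the lab.

```r
library(dplyr)

df = data.frame(x = 1:5)   # made-up data, for illustration only

# Explicit namespace: this is always dplyr's row filter
kept = dplyr::filter(df, x > 3)

# Explicit namespace: this is always the linear (time series) filter from stats
smoothed = stats::filter(1:5, rep(1/3, 3))
```

If a dplyr verb ever behaves unexpectedly, prefixing it with dplyr:: is a quick way to check whether another package has masked it.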

Q1. Pipes to base R

For each of the following code blocks, which are written with pipes, write equivalent code in base R (to do the same thing).
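
As a reminder of how the translation works, here is a minimal sketch on made-up inputs (the vector x and the string "a-b-c" are hypothetical and not part of the questions). In short, x %>% f is f(x), x %>% f(y) is f(x, y), and a dot marks where the left-hand side goes when it is not the first argument.

```r
library(dplyr)   # dplyr re-exports the pipe operator %>% from magrittr

x = c(1, 4, 9)   # made-up input

# Piped version: read left to right
piped = x %>% sqrt %>% sum

# Base R version: read from the inside out
nested = sum(sqrt(x))

# The dot places the left-hand side in a non-first argument
piped.dot = "a-b-c" %>% gsub("-", "+", ., fixed=TRUE)
nested.dot = gsub("-", "+", "a-b-c", fixed=TRUE)
```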

letters %>%
  toupper %>%
  paste(collapse="+") 
## [1] "A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X+Y+Z"
# YOUR CODE GOES HERE
"     Ceci n'est pas une pipe     " %>% 
  gsub("une", "un", .) %>%
  trimws
## [1] "Ceci n'est pas un pipe"
# YOUR CODE GOES HERE
rnorm(1000) %>% 
  hist(breaks=30, main="N(0,1) draws", col="pink", prob=TRUE) 

# YOUR CODE GOES HERE
rnorm(1000) %>% 
  hist(breaks=30, plot=FALSE) %>%
  `[[`("density") %>%
  max
## [1] 0.465
# YOUR CODE GOES HERE

Q2. Base R to pipes

For each of the following code blocks, which are written in base R, write equivalent code with pipes (to do the same thing).

paste("Your grade is", sample(c("A","B","C","D","R"), size=1))
## [1] "Your grade is B"
# YOUR CODE GOES HERE
state.name[which.max(state.x77[,"Illiteracy"])] 
## [1] "Louisiana"
# YOUR CODE GOES HERE
str.url = "https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/king.txt"

lines = readLines(str.url)
text = paste(lines, collapse=" ")
words = strsplit(text, split="[[:space:]]|[[:punct:]]")[[1]]
wordtab = table(words)
wordtab = sort(wordtab, decreasing=TRUE)
head(wordtab, 10)
## words
##        of  the   to  and    a   be will that   is 
##  203   98   98   58   40   37   32   25   24   23
# YOUR CODE GOES HERE
lines = readLines(str.url)
text = paste(lines, collapse=" ")
words = strsplit(text, split="[[:space:]]|[[:punct:]]")[[1]]
words = words[words != ""]
wordtab = table(words)
wordtab = sort(wordtab, decreasing=TRUE)
head(wordtab, 10)
## words
##   of  the   to  and    a   be will that   is   we 
##   98   98   58   40   37   32   25   24   23   21
# YOUR CODE GOES HERE

Prostate cancer data set

Below we read in the prostate cancer data set, as visited in previous labs.

pros.df = 
  read.table("https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/pros.dat")

Q3. Practice with dplyr verbs

In the following, use pipes and dplyr verbs to answer questions on pros.df.
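
For a quick refresher on the main verbs before working with pros.df, here is a sketch on the built-in mtcars data set. It is used only for illustration; the new column names hp.per.wt, mean.mpg, and count are made up.

```r
library(dplyr)

# Row and column operations chained with pipes
result = mtcars %>%
  filter(cyl == 4) %>%              # keep rows with 4 cylinders
  select(mpg, hp, wt) %>%           # keep three columns
  mutate(hp.per.wt = hp / wt) %>%   # add a derived column
  arrange(desc(mpg)) %>%            # sort by mpg, largest first
  head(3)                           # keep the first three rows

# Grouped summary: one row per distinct value of cyl
by.cyl = mtcars %>%
  group_by(cyl) %>%
  summarise(mean.mpg = mean(mpg), count = n())
```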

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
t.test.by.ind = function(x, ind) {
  stopifnot(all(ind %in% c(0, 1)))
  return(t.test(x[ind == 0], x[ind == 1]))
} 
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Fastest 100m sprint times

Below, we read in two data sets of the 1000 fastest times ever recorded for the 100m sprint, in men’s and women’s track, as seen in the last lab.

sprint.m.df = read.table(
  file="https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/sprint.m.txt", 
  sep="\t", quote="", header=TRUE)
sprint.w.df = read.table(
  file="https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/sprint.w.txt", 
  sep="\t", quote="", header=TRUE)

Q4. More practice with dplyr verbs

In the following, use pipes and dplyr verbs to answer questions on sprint.w.df.

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q5. Practice pivoting wider and longer

In the following, use pipes and dplyr and tidyr verbs to answer questions on sprint.m.df. In some parts, it might make more sense to use direct indexing, and that’s perfectly fine.
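
As a reminder of what pivoting does, here is a minimal sketch on a small made-up data frame (wide.df and its columns are hypothetical and unrelated to sprint.m.df). pivot_longer() stacks several columns into name and value pairs, and pivot_wider() spreads them back out.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per runner, one column per year
wide.df = data.frame(
  runner = c("A", "B"),
  y2019 = c(10.1, 10.3),
  y2020 = c(10.0, 10.4),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (runner, year) pair
long.df = wide.df %>%
  pivot_longer(cols = -runner, names_to = "year", values_to = "time")

# Long back to wide: one column per distinct value of year
back.df = long.df %>%
  pivot_wider(names_from = year, values_from = time)
```

Note that both functions return tibbles, so back.df has the same columns and values as wide.df but a different class.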

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE