Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class time if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted PDF file on Gradescope by Friday at 9pm this week.

This week’s agenda: getting familiar with data frames; practicing how to use the apply family of functions.

States data set

Below we construct a data frame with 50 rows (states) and 10 columns (variables). The first 8 variables are numeric and the last 2 are factors. The numeric variables come from the built-in state.x77 matrix, which records various demographic statistics for the 50 US states, measured in the 1970s. You can learn more about this state data set by typing ?state.x77 into your R console.

state.df = data.frame(state.x77, Region=state.region, Division=state.division)
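As a quick sanity check on the description above, the following sketch inspects the dimensions and column classes of the data frame. It only uses built-in R data and base functions; the choice of which summaries to print is ours, not part of the lab questions.

```r
# Rebuild the data frame so that this sketch runs on its own
state.df = data.frame(state.x77, Region=state.region, Division=state.division)

dim(state.df)            # number of rows and number of columns
sapply(state.df, class)  # class of each column, one entry per variable
head(state.df, 3)        # first 3 rows, to see the variable names
```

Note that data.frame() rewrites column names containing spaces, so the state.x77 column "Life Exp" appears as Life.Exp in state.df.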

Q1. Basic data frame manipulations

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Prostate cancer data set

Below we read in the prostate cancer data set that we looked at in the last lab. You can remind yourself about what’s been measured by looking back at that lab.

pros.dat = 
  read.table("http://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/pros.dat")

Q2. Practice with the apply family

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
t.test.by.ind = function(x, ind) {
  stopifnot(all(ind %in% c(0, 1)))
  return(t.test(x[ind == 0], x[ind == 1]))
}
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
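To illustrate how a helper like t.test.by.ind() can be combined with the apply family, here is a minimal sketch on simulated data. The data frame toy, its columns x1, x2, x3, and the indicator ind are hypothetical and stand in for whatever columns and 0/1 indicator you actually use; this is not a solution to the question above.

```r
# Same helper as defined in this lab, repeated so the sketch is self-contained
t.test.by.ind = function(x, ind) {
  stopifnot(all(ind %in% c(0, 1)))
  return(t.test(x[ind == 0], x[ind == 1]))
}

set.seed(1)
# Hypothetical toy data: 3 numeric columns and a 0/1 group indicator
toy = data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
ind = rep(c(0, 1), each = 20)
toy$x2[ind == 1] = toy$x2[ind == 1] + 2  # shift x2 upward in group 1

# lapply() treats a data frame as a list of columns; ind is passed through
tests = lapply(toy, t.test.by.ind, ind = ind)

# Pull the p-value out of each t-test result
p.vals = sapply(tests, function(res) res$p.value)
p.vals
```

Because lapply() returns a list of full t.test() results, a second pass with sapply() is a convenient way to reduce each result to a single number.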

Rio Olympics data set

Now we’re going to examine data from the 2016 Summer Olympics in Rio de Janeiro, taken from https://github.com/flother/rio2016 (complete data on the 2020 Summer Olympics in Tokyo doesn’t appear to be available yet). Below we read in the data and store it as rio.

rio = read.csv("http://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/data/rio.csv")

Q3. More practice with data frames and apply

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q4. Young and old folks

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
                  Youngest Oldest
aquatics                14     41
archery                 17     44
athletics               16     47

You’ll notice that we set the row names according to the sports, and we also set appropriate column names. Hint: unlist() will unravel all the values in a list; and matrix(), as you’ve seen before, can be used to create a matrix from a vector of values. After you’ve converted the results to a matrix, print it to the console (and make sure its first 3 rows match those displayed above).
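The hint about unlist() and matrix() can be sketched on a small made-up example. The vectors ages and group below are hypothetical toy data, not the rio data set, and the column names Min and Max are our own choice for illustration.

```r
# Hypothetical toy data: six ages in three groups
ages = c(21, 35, 19, 28, 40, 23)
group = c("x", "x", "y", "y", "z", "z")

# A named list with one c(min, max) vector per group
range.list = lapply(split(ages, group), range)

# unlist() flattens the list into one vector: min, max, min, max, ...
# byrow=TRUE fills the matrix one row per group rather than column by column
range.mat = matrix(unlist(range.list), ncol = 2, byrow = TRUE)
rownames(range.mat) = names(range.list)
colnames(range.mat) = c("Min", "Max")
range.mat
```

The byrow=TRUE argument matters here: by default matrix() fills column by column, which would scatter each group's minimum and maximum across different rows.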

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q5. Sport by sport

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE