Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class time if need be). You may collaborate with your classmates, but you must list their names above, and you must submit your own lab as a knitted PDF file on Gradescope by Friday 9pm this week.

This week’s agenda: practice writing functions and running simulations.

Q1. Basic random number generation

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
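# plot.ecdf: plots ECDFs along with the true CDF, for varying sample sizes
# (Template: to complete it, uncomment each line starting with ## and replace
# the ?? placeholder.)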
# Inputs:
# - rfun: function which generates n random draws, when called as rfun(n)
# - pfun: function which calculates the true CDF at x, when called as pfun(x)
# - sizes: a vector of sample sizes
# Output: none (the function is called for its side effect of drawing a plot)

plot.ecdf = function(rfun, pfun, sizes) {
  # Draw the random numbers
  ## samples = lapply(sizes, ??)

  # Calculate the grid for the CDF
  grid.min = min(sapply(samples, min))
  grid.max = max(sapply(samples, max))
  grid = seq(grid.min, grid.max, length=1000)

  # Calculate the ECDFs
  ## ecdfs = lapply(samples, ??)
  evals = lapply(ecdfs, function(f) f(grid))

  # Plot the true CDF
  ## plot(grid, ??, type="l", col="black", xlab="x", ylab = "P(X <= x)")

  # Plot the ECDFs on top
  n.sizes = length(sizes)
  cols = rainbow(n.sizes)
  for (i in 1:n.sizes) {
    lines(grid, evals[[i]], col=cols[i])
  }
  legend("bottomright", legend=sizes, col=cols, lwd=1)
}
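
For reference, one way the three `??` placeholders in the template could be filled in is sketched below. This is only a sketch under our own assumptions, not the official solution: the name `plot.ecdfs.sketch` is hypothetical (chosen so it does not overwrite `plot.ecdf`), and it assumes `rfun` and `pfun` are vectorized in the way the template's comments describe.

```r
# Hypothetical completed variant of the plot.ecdf template above
plot.ecdfs.sketch = function(rfun, pfun, sizes) {
  # Draw the random numbers: one sample per requested size
  samples = lapply(sizes, rfun)

  # Calculate the grid for the CDF, spanning the range of all samples
  grid.min = min(sapply(samples, min))
  grid.max = max(sapply(samples, max))
  grid = seq(grid.min, grid.max, length.out=1000)

  # Calculate the ECDFs: ecdf() returns a function, which we evaluate on the grid
  ecdfs = lapply(samples, ecdf)
  evals = lapply(ecdfs, function(f) f(grid))

  # Plot the true CDF
  plot(grid, pfun(grid), type="l", col="black", xlab="x", ylab="P(X <= x)")

  # Plot the ECDFs on top
  n.sizes = length(sizes)
  cols = rainbow(n.sizes)
  for (i in 1:n.sizes) {
    lines(grid, evals[[i]], col=cols[i])
  }
  legend("bottomright", legend=sizes, col=cols, lwd=1)
  invisible(NULL)
}
```

A call such as `plot.ecdfs.sketch(rfun=rnorm, pfun=pnorm, sizes=c(50, 500))` should then draw the standard normal CDF in black with one colored ECDF per sample size.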

Q2. Drug effect simulation

We’re going to continue studying the drug effect model that was discussed in the “Simulation” lecture. Recall that we suppose there is a new drug that can optionally be given before chemotherapy. We believe that those who aren’t given the drug experience a percentage reduction in tumor size of: \[ X_{\mathrm{no\,drug}} \sim 100 \cdot \mathrm{Exp}(\mathrm{mean}=R), \;\;\; R \sim \mathrm{Unif}(0,1), \] whereas those who are given the drug experience a percentage reduction in tumor size of: \[ X_{\mathrm{drug}} \sim 100 \cdot \mathrm{Exp}(\mathrm{mean}=2). \]
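
As a rough illustration of the model above, the following sketch draws tumor reductions for both groups. The function name `simulate.reduction` and its arguments are hypothetical, and the sketch assumes that each patient in the no drug group gets their own independent draw of \(R\); if the lecture intended a single shared \(R\), the `runif()` call would need to change.

```r
# Hypothetical helper: simulate n patients per group under the model above
simulate.reduction = function(n, mu.drug = 2) {
  # Assumption: one R ~ Unif(0, 1) per patient in the no drug group
  mu.no.drug = runif(n, min = 0, max = 1)
  # rexp() is parametrized by rate, which is 1 / mean
  x.no.drug = 100 * rexp(n, rate = 1 / mu.no.drug)
  x.drug = 100 * rexp(n, rate = 1 / mu.drug)
  list(no.drug = x.no.drug, drug = x.drug)
}

set.seed(0)
sim = simulate.reduction(60)
sapply(sim, mean)  # sample mean reduction in each group
```

Under this assumption the population means are \(100 \cdot \mathbb{E}[R] = 50\) for the no drug group and \(100 \cdot 2 = 200\) for the drug group, so the sample means should approach those values as `n` grows.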

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q3. Running simulations, saving money

For the next few questions, we will work with this hypothetical: suppose we work for a drug company that wants to put this new drug on the market. In order to get FDA approval, our company must demonstrate that the patients who received the drug had, on average, a reduction in tumor size at least 100 percentage points greater than those who didn’t receive the drug, or in math: \[ \overline{X}_{\mathrm{drug}} - \overline{X}_{\mathrm{no\,drug}} \geq 100. \] Our company wants to spend as little money as possible, so it wants the smallest number \(n\) such that, if it were to run a clinical trial with \(n\) patients in each of the drug / no drug groups, it would likely succeed in demonstrating that the effect size (as above) is at least 100. Of course, the result of a clinical trial is random; the company is willing to take “likely” to mean successful with probability 0.95, i.e., successful in 190 of 200 hypothetical clinical trials (though only 1 will be run in reality).
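
The search for \(n\) described above could be sketched as follows. This is a minimal sketch, not the required solution: the names `run.trial` and `success.prob`, the grid of candidate sample sizes, and the per-patient draw of \(R\) are all our own assumptions.

```r
# Hypothetical helper: run one simulated trial with n patients per group and
# report whether the observed effect size meets the threshold
run.trial = function(n, mu.drug = 2, threshold = 100) {
  mu.no.drug = runif(n, min = 0, max = 1)  # assumed: one R per patient
  x.no.drug = 100 * rexp(n, rate = 1 / mu.no.drug)
  x.drug = 100 * rexp(n, rate = 1 / mu.drug)
  mean(x.drug) - mean(x.no.drug) >= threshold
}

# Hypothetical helper: fraction of n.rep simulated trials that succeed
success.prob = function(n, n.rep = 200) {
  mean(replicate(n.rep, run.trial(n)))
}

set.seed(0)
n.grid = seq(10, 200, by = 10)  # candidate sample sizes (arbitrary grid)
probs = sapply(n.grid, success.prob)

# Smallest candidate n whose estimated success probability reaches 0.95
n.ok = n.grid[probs >= 0.95]
if (length(n.ok) > 0) min(n.ok) else NA
```

Because each probability is estimated from only 200 simulated trials, the estimates are noisy and need not increase monotonically in `n`; a finer grid or a larger `n.rep` would give a more stable answer at the cost of more computation.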

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q4. AB testing

A common task in modern data science is to analyze the results of an AB test. AB tests are essentially controlled experiments: we obtain data from two different conditions, such as two versions of a website shown to users, and try to determine which condition gives better results.
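
To make the idea concrete, here is a minimal sketch of analyzing one AB test on simulated data. Everything in it is hypothetical: the two conversion rates, the sample size, and the choice of a two-sample t-test are assumptions made for illustration only, not part of the lab's setup.

```r
set.seed(0)
n = 500  # hypothetical number of users shown each version

# Hypothetical outcomes: 1 if the user converted, 0 otherwise
x.a = rbinom(n, size = 1, prob = 0.10)  # version A, assumed rate 10%
x.b = rbinom(n, size = 1, prob = 0.12)  # version B, assumed rate 12%

# Estimated difference in conversion rates between the two conditions
diff.hat = mean(x.b) - mean(x.a)

# Two-sample t-test of whether the two means differ
test = t.test(x.b, x.a)
c(diff = diff.hat, p.value = test$p.value)
```

With a true difference this small, a single test of this size may well fail to detect it, which is one reason simulation is useful for planning such experiments.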

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE