lm() allows you to fit a linear model by specifying a formula, in terms of column names of a given data frame
coef(), fitted(), residuals(), summary(), plot(), predict() are very handy and should be used over manual access tricks
glm() with family="binomial" and all the same utility functions
gam() and utility functions
Debugging basics
Bug: the original name for glitches and unexpected defects. The term dates back to at least Edison in 1876, but there is a better story from Grace Hopper in 1947:
(From Wikipedia)
Debugging is the process of locating, understanding, and removing bugs from your code
Why should we care to learn about this?
Debugging is (largely) a process of differential diagnosis. Stages of debugging:
Step 0: make it happen again
Step 1: figure out if it’s a pervasive/big problem
Step 2: find out exactly where things are going wrong, using tools such as traceback(), and also cat(), print(), and browser()
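For Step 0, randomness in the inputs often makes it hard to get a bug to happen again (this document itself generates test data with rnorm()). The sketch below assumes the bug depends on randomly generated data. Fixing the seed with set.seed() before generating that data makes every rerun use exactly the same input values. The seed value here is arbitrary.

```r
# Generate "random" input twice, starting from the same (arbitrary) seed
set.seed(350)
x1 = rnorm(5)
set.seed(350)
x2 = rnorm(5)

# Same seed, same numbers: a bug triggered by x1 will be triggered by x2 too
identical(x1, x2)
```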
Sometimes error messages are easier to decode, sometimes they’re harder; this can make locating the bug easier or harder
my.plotter = function(x, y, my.list=NULL) {
  if (!is.null(my.list))
    plot(my.list, main="A plot from my.list!")
  else
    plot(x, y, main="A plot from x, y!")
}
my.plotter(my.list=list(x=-10:10, y=(-10:10)^3))
my.plotter() # Easy to understand error message
## Error in plot(x, y, main = "A plot from x, y!"): argument "x" is missing, with no default
my.plotter(my.list=list(x=-10:10, Y=(-10:10)^3)) # Not as clear
## Error in xy.coords(x, y, xlabel, ylabel, log): 'x' is a list, but does not have components 'x' and 'y'
Who called xy.coords()? (Not us, at least not explicitly!) And why is it saying ‘x’ is a list? (We never set it to be so!)
traceback()
Calling traceback() after an error traces back through all the function calls leading to the error
If you run my.plotter(my.list=list(x=-10:10, Y=(-10:10)^3)) in the console, then call traceback(), you’ll see:
> traceback()
5: stop("'x' is a list, but does not have components 'x' and 'y'")
4: xy.coords(x, y, xlabel, ylabel, log)
3: plot.default(my.list, main = "A plot from my.list!")
2: plot(my.list, main = "A plot from my.list!") at #2
1: my.plotter(my.list = list(x = -10:10, Y = (-10:10)^3))
We can see that my.plotter() is calling plot(), which is calling plot.default(), which is calling xy.coords(), and this last function is throwing the error
Why? Its first argument x is being set to my.list, which is OK, but then it’s expecting this list to have components named x and y (ours are named x and Y)
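To confirm this diagnosis directly, we can call xy.coords() ourselves on the offending list. xy.coords() comes from the grDevices package, which R attaches by default. We can then fix the component name and try again. This is a minimal sketch, and the names bad.list and good.list are just for illustration:

```r
# The list we passed to my.plotter(): note the capital "Y"
bad.list = list(x = -10:10, Y = (-10:10)^3)

# Reproduce the failing call in isolation, capturing the error message
msg = tryCatch(xy.coords(bad.list), error = function(e) conditionMessage(e))
print(msg)

# Rename the component "Y" to "y", which is what xy.coords() expects
good.list = bad.list
names(good.list)[names(good.list) == "Y"] = "y"
coords = xy.coords(good.list)
head(coords$y)
```

With the lowercase name, my.plotter(my.list=good.list) would also work.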
Debugging tools
cat(), print()
Most primitive strategy: manually call cat() or print() at various points, to print out the state of variables, to help you localize the error
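As a small illustration, here is a hypothetical function standardize(), which is not from the original notes, instrumented with cat() calls. When we give it an input containing an NA, the printed output shows that the problem already appears at the very first step, in the mean. That localizes where things start going wrong.

```r
# Hypothetical example: standardize a vector, printing intermediate state
standardize = function(x) {
  m = mean(x)
  cat("mean of x:", m, "\n")                   # state after step 1
  s = sd(x)
  cat("sd of x:", s, "\n")                     # state after step 2
  z = (x - m) / s
  cat("first values of z:", head(z, 3), "\n")  # state after step 3
  z
}

z.good = standardize(c(1, 2, 3))     # prints a mean of 2 and an sd of 1
z.bad = standardize(c(1, 2, 3, NA))  # the NA already shows up in the mean
```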
This is the “stone knives and bear skins” approach to debugging; it is still very popular among some people (actual quote from stackoverflow):
I’ve been a software developer for over twenty years … I’ve never had a problem I could not debug using some careful thought, and well-placed debugging print statements. Many people say that my techniques are primitive, and using a real debugger in an IDE is much better. Yet from my observation, IDE users don’t appear to debug faster or more successfully than I can, using my stone knives and bear skins.
R provides you with many debugging tools. Why should we use them, and move past our handy cat() or print() statements?
Let’s see what our primitive hunter found on stackoverflow, after receiving a bunch of suggestions in response to his quote:
Sweet! … Very illuminating. Debuggers can help me do ad hoc inspection or alteration of variables, code, or any other aspect of the runtime environment, whereas manual debugging requires me to stop, edit, and re-execute.
browser()
One of the simplest but most powerful built-in debugging tools: browser(). Place a call to browser() at any point in your function that you want to debug. As in:
my.fun = function(arg1, arg2, arg3) {
  # Some initial code
  browser()
  # Some final code
}
Then redefine the function in the console, and run it. Once execution gets to the line with browser(), you’ll enter an interactive debug mode
While in the interactive debug mode granted to you by browser(), you can type any normal R code into the console, to be executed within the function environment, so you can, e.g., investigate the values of variables defined in the function
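You can also type:
“n” to execute the next command
“s” to step into the next function call
“f” to finish the current loop or function
“c” to continue execution normally
“Q” to stop debugging and exit browser mode
(To print any variables named n, s, f, c, or Q, defined in the function environment, use print(n), print(s), etc.)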
In RStudio, you have buttons to click that do the same thing as “n”, “s”, “f”, “c”, “Q” in the “Console” panel; you can see the locally defined variables in the “Environment” panel, and the traceback in the “Traceback” panel
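One variation that is often handy is a conditional breakpoint. This is a sketch and does not come from the original notes. Wrap browser() in an if statement so that execution only pauses when a suspicious condition holds. Below, the hypothetical function sum.of.logs() only enters browser mode when it is about to take the log of a non-positive number. The extra interactive() check makes sure it never tries to pause when R is not running interactively.

```r
# Hypothetical function with a conditional breakpoint
sum.of.logs = function(x) {
  # Pause only when something looks off, and only in an interactive session
  if (interactive() && any(x <= 0)) browser()
  sum(log(x))
}

sum.of.logs(c(1, exp(1), exp(2)))  # no pause: all inputs are positive
```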
As with cat(), print(), and traceback() used for debugging, you should only run browser() in the console, never in an Rmd code chunk that is supposed to be evaluated when knitting
But to keep track of your debugging code (which you’ll run in the console), you can still use code chunks in Rmd; you just have to specify eval=FALSE
# As an example, here's a code chunk that we can keep around in this Rmd doc,
# but that will never be evaluated (because eval=FALSE) in the Rmd file, take
# a look at it!
big.mat = matrix(rnorm(1000)^3, 1000, 1000)
big.mat
# Note that the output of big.mat is not printed to the console, and also
# that big.mat was never actually created! (This code was not evaluated)
Testing
Testing is the systematic writing of additional code to ensure your functions behave properly. We’ll focus on two aspects: assertions and unit tests
Benefits of testing:
Of course, this requires you to spend more time upfront, but it is often worth it (saves time spent debugging later)
Assertions are checks to ensure that the inputs to your function are properly formatted
# Function to create n x n matrix of 0s
create.matrix.simple = function(n){
  matrix(0, n, n)
}
# Not meaningful errors!
create.matrix.simple(4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
create.matrix.simple(4.1)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
create.matrix.simple("asdf")
## Error in matrix(0, n, n): non-numeric matrix extent
assert_that()
We’ll be using the assert_that() function in the assertthat package to make assertions; it allows us to write custom, meaningful error messages
library(assertthat)
create.matrix = function(n){
  assert_that(length(n) == 1 && is.numeric(n) &&
                n > 0 && n %% 1 == 0,
              msg="n is not a positive integer")
  matrix(0, n, n)
}
# Errors are now meaningful
create.matrix(4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
create.matrix(4.1)
## Error: n is not a positive integer
create.matrix("asdf")
## Error: n is not a positive integer
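If you’d rather not depend on an extra package, base R can also produce custom messages. Since R 4.0.0, the names of the arguments to stopifnot() are used as the error messages. The sketch below implements the same checks. create.matrix.base() is a hypothetical name, and the check is split in two so that each kind of failure gets its own message.

```r
# Same idea as create.matrix(), using only base R (requires R >= 4.0.0)
create.matrix.base = function(n) {
  # stopifnot() checks these in order and stops at the first one that fails;
  # each argument's name becomes the error message
  stopifnot("n must be a single number" = length(n) == 1 && is.numeric(n),
            "n must be a positive integer" = n > 0 && n %% 1 == 0)
  matrix(0, n, n)
}

create.matrix.base(2)
try(create.matrix.base(4.1))     # fails with message: n must be a positive integer
try(create.matrix.base("asdf"))  # fails with message: n must be a single number
```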
# Function that performs linear regression
run.lm.simple = function(dat){
  res = lm(X1 ~ ., data = dat)
  coef(res)
}
mat = matrix(rnorm(20), 10, 2)
colnames(mat) = paste0("X", 1:2)
dat = as.data.frame(mat)
# Not meaningful errors
run.lm.simple(dat)
## (Intercept) X2
## 0.09957273 0.21538540
run.lm.simple(mat)
## Error in model.frame.default(formula = X1 ~ ., data = dat, drop.unused.levels = TRUE): 'data' must be a data.frame, not a matrix or an array
# Meaningful errors
run.lm = function(dat){
  assert_that(is.data.frame(dat),
              msg="dat must be a data frame")
  res = lm(X1 ~ ., data = dat)
  coef(res)
}
run.lm(dat)
## (Intercept) X2
## 0.09957273 0.21538540
run.lm(mat)
## Error: dat must be a data frame
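Assertions can check more than the type of the input. run.lm() also assumes that the data frame has a column named X1, since that is the response in its formula. Below is a sketch of a hypothetical run.lm.checked() that tests for this as well. It uses named stopifnot() conditions (R >= 4.0.0) rather than assertthat, so the example only needs base R.

```r
run.lm.checked = function(dat) {
  stopifnot("dat must be a data frame" = is.data.frame(dat),
            "dat must have a column named X1" = "X1" %in% colnames(dat))
  res = lm(X1 ~ ., data = dat)
  coef(res)
}

set.seed(1)  # arbitrary seed, so the data below can be regenerated
dat.ok = as.data.frame(matrix(rnorm(20), 10, 2,
                              dimnames = list(NULL, c("X1", "X2"))))
dat.bad = setNames(dat.ok, c("A", "B"))  # same data, wrong column names

run.lm.checked(dat.ok)
try(run.lm.checked(dat.bad))  # fails with message: dat must have a column named X1
```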
Unit tests are used to check that your code passes basic sanity checks at various stages of development. We’ll be using the test_that() function in the testthat package to do unit tests; you’ll learn the details in lab
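The details are left for lab, but here is a rough preview. A unit test written with test_that() groups a few expectations under a short description. This minimal sketch assumes that the assertthat and testthat packages are installed. It repeats the create.matrix() definition from above so that the block runs on its own. expect_equal(), expect_true(), and expect_error() are testthat expectation functions.

```r
library(assertthat)
library(testthat)

# Function under test, as defined earlier in these notes
create.matrix = function(n){
  assert_that(length(n) == 1 && is.numeric(n) &&
                n > 0 && n %% 1 == 0,
              msg="n is not a positive integer")
  matrix(0, n, n)
}

test_that("create.matrix returns an n x n matrix of zeros", {
  m = create.matrix(3)
  expect_equal(dim(m), c(3L, 3L))
  expect_true(all(m == 0))
})

test_that("create.matrix rejects inputs that are not positive integers", {
  # the second argument is a regular expression matched against the message
  expect_error(create.matrix(4.1), "not a positive integer")
  expect_error(create.matrix("asdf"), "not a positive integer")
})
```

When run outside of a test file like this, a failing test_that() block raises an error, so a silent or “Test passed” result means the expectations held.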
Some high-level tips:
traceback(), cat(), print(): manual debugging tools
browser(): interactive debugging tool
assert_that(), test_that(): tools for assertions and unit tests