Statistical Computing, 36-350
Tuesday October 11, 2022
nchar(), substr(): functions for substring extractions and replacements
strsplit(), paste(): functions for splitting and combining strings
table(): function to get word counts, useful way of summarizing text data
Plot basics
Base R has a set of powerful plotting tools. An overview:
plot(): generic plotting function
points(): add points to an existing plot
lines(), abline(): add lines to an existing plot
text(), legend(): add text to an existing plot
rect(), polygon(): add shapes to an existing plot
hist(), image(): histogram and heatmap
heat.colors(), topo.colors(), etc.: create a color vector
density(): estimate density, which can be plotted
contour(): draw contours, or add to existing plot
curve(): draw a curve, or add to existing plot
The ggplot2 package also provides very nice (and very different) plotting tools; we won’t cover it in this course (it tends to be the focus in Statistical Graphics, 36-315)
To make a scatter plot of one variable versus another, use plot()
The type argument controls the plot type. Default is "p" for points; set it to "l" for lines. Try also "b" or "o", for both points and lines
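As a minimal sketch of these four plot types side by side: the data below are simulated (a noisy cubic, which is an assumption made here for illustration), and x is sorted so that the line types connect points from left to right.

```r
# Simulated data (an assumption for illustration): a noisy cubic
set.seed(1)
n = 50
x = sort(runif(n, min=-2, max=2))
y = x^3 + rnorm(n)

# One panel per plot type
types = c("p", "l", "b", "o")
old.par = par(mfrow=c(2,2))
for (tp in types) {
  plot(x, y, type=tp, main=paste0("type = \"", tp, "\""))
}
par(old.par)   # Restore the previous layout
```

With "b" the lines stop short of each point, while with "o" the lines are drawn through the points.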
The main argument controls the title; xlab and ylab are the x and y labels
Use the pch argument to control the point type. Try also 20 for small filled circles, or "." for single pixels
Use the lty argument to control the line type, and lwd to control the line width
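The labelling, point, and line arguments can be combined in one call. A small sketch, using a sine curve as illustrative data (the data and the particular argument values are assumptions, not taken from the lecture):

```r
# Illustrative data (an assumption): one period of a sine curve
x = seq(0, 2*pi, length=40)
y = sin(x)

# Filled circles (pch=19) joined by a thick (lwd=2) dashed (lty=2) line
plot(x, y, type="o", pch=19, lty=2, lwd=2,
     main="pch=19, lty=2, lwd=2", xlab="x", ylab="sin(x)")

# Small filled circles (pch=20) for a second series, with no connecting line
points(x, cos(x), pch=20)
```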
Use the col argument to control the color. It can be an integer (an index into the current palette()) or a string giving a color name. The function colors() returns a string vector of the available color names
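A short sketch of both ways of specifying col; the data are arbitrary and only serve to put some points on the screen.

```r
# colors() returns the names of the built-in colors as a character vector
all.cols = colors()
head(all.cols)
"purple" %in% all.cols

# col accepts one color for everything, or a vector with one color per point
x = 1:8
plot(x, x^2, pch=19, col="purple", main="Single named color")
plot(x, x^2, pch=19, col=1:8, main="Integer codes 1 through 8")
```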
To set up a plotting grid of arbitrary dimension, use the par() function, with the argument mfrow. Note: in general this will affect all following plots! (Except in separate R Markdown code chunks …)
Default margins in R are large (and ugly); to change them, use the par() function, with the argument mar. Note: in general this will affect all following plots! (Except in separate R Markdown code chunks …)
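Because par() settings persist, one common pattern is to save the old values and restore them afterwards. A minimal sketch (the variable name old.par and the two curves are choices made here for illustration):

```r
# par() returns the previous values of whatever it changes, so save them
old.par = par(mfrow=c(1,2), mar=c(4,4,2,0.5))  # mar order: bottom, left, top, right

x = seq(-2, 2, length=50)
plot(x, x^2, type="l", main="Quadratic")
plot(x, x^3, type="l", main="Cubic")

par(old.par)  # Later plots go back to the earlier layout and margins
```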
Use the pdf() function to save a PDF file of your plot, in your R working directory. Use getwd() to get the working directory, and setwd() to set it
getwd()
## [1] "/Users/ale/Dropbox/Teaching/36-350/36-350_F22/lectures/plotting"
pdf(file="noisy_cubics.pdf", height=7, width=7) # Height, width are in inches
par(mfrow=c(2,2), mar=c(4,4,2,0.5))
plot(x, y, main="Red cubic", pch=20, col="red")
plot(x, y, main="Blue cubic", pch=20, col="blue")
plot(rev(x), y, main="Flipped green", pch=20, col="green")
plot(rev(x), y, main="Flipped purple", pch=20, col="purple")
graphics.off()
Also, use the jpeg() and png() functions to save jpg and png files
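Every file device follows the same open, draw, close pattern. The sketch below writes to a temporary file (an assumption made so the example does not touch your working directory) and closes only the device it opened with dev.off().

```r
# Write to a temporary file so the sketch does not touch the working directory
pdf.file = tempfile(fileext=".pdf")

pdf(file=pdf.file, height=5, width=5)   # Open the device (sizes in inches)
plot(1:10, (1:10)^2, pch=20, main="Saved to a file")
dev.off()                               # Close the device to finish writing

file.exists(pdf.file)                   # Check that the file was written
```

jpeg() and png() are used the same way, except that their height and width are measured in pixels by default.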
The main tools for this are:
points(): add points to an existing plot
lines(), abline(): add lines to an existing plot
text(), legend(): add text to an existing plot
rect(), polygon(): add shapes to an existing plot
You’ll get practice with this in lab. Pay attention to layers: they work just like they would if you were painting a picture by hand
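To make the layering idea concrete, here is a sketch that builds one plot from the bottom up. The data are simulated (an assumption), and the coordinates of the box, text, and legend are arbitrary choices; each call paints over whatever was drawn before it.

```r
set.seed(1)
x = sort(runif(30, min=-2, max=2))   # Simulated data (an assumption)
y = x^3 + rnorm(30)

plot(x, y, type="n", main="Built up in layers")   # Empty plot: axes only
rect(-1, -2, 1, 2, col="lightgray", border=NA)    # Bottom layer: shaded box
abline(h=0, lty=2)                                # Dashed horizontal line
lines(x, x^3, col="blue", lwd=2)                  # The underlying cubic
points(x, y, pch=20, col="red")                   # Top layer: the data
text(-1.5, 4, "noisy cubic")                      # Text at a given location
legend("bottomright", legend=c("data", "x^3"),
       pch=c(20, NA), lty=c(NA, 1), col=c("red", "blue"))
```

Moving the rect() call to the end would paint the gray box over the points and lines inside it.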
Histograms and heatmaps
To plot a histogram of a numeric vector, use hist()
Several options are available as arguments to hist(), such as col, freq, breaks, xlab, ylab, main
To add a histogram to an existing plot (say, another histogram), use hist() with add=TRUE
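A sketch of two overlaid histograms, using simulated normal samples (an assumption). Sharing one breaks vector keeps the bars aligned, and semi-transparent colors from rgb() keep both histograms visible.

```r
set.seed(1)
a = rnorm(500)            # Simulated samples (an assumption)
b = rnorm(500, mean=2)

# Shared breaks so the two sets of bars line up
brk = seq(floor(min(a, b)), ceiling(max(a, b)), by=0.5)

# freq=FALSE puts the y axis on the density scale
hist(a, breaks=brk, freq=FALSE, col=rgb(1,0,0,0.4), xlim=range(brk),
     xlab="Value", main="Two overlaid histograms")
hist(b, breaks=brk, freq=FALSE, col=rgb(0,0,1,0.4), add=TRUE)
```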
To estimate a density from a numeric vector, use density(). This returns a list; it has components x and y, so we can actually call lines() directly on the returned object
## [1] "density"
## [1] "x" "y" "bw" "n" "call" "data.name" "has.na"
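A minimal sketch of this workflow on a simulated sample (the sample and the variable names are assumptions made for illustration):

```r
set.seed(1)
v = rnorm(500)                  # Simulated sample (an assumption)
dens = density(v)

class(dens)                     # "density"
names(dens)                     # Includes "x" and "y"

# Histogram on the density scale, with the estimate drawn on top
hist(v, freq=FALSE, col="lightgray", main="Histogram with density estimate")
lines(dens, lwd=2, col="blue")  # Uses the x and y components of dens
```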
To plot a heatmap of a numeric matrix, use image()
## [,1] [,2] [,3] [,4] [,5]
## [1,] 6 7 8 9 10
## [2,] 12 14 16 18 20
## [3,] 18 21 24 27 30
## [4,] 24 28 32 36 40
## [5,] 30 35 40 45 50
image()
The orientation of image() is to plot the heatmap according to the following order, in terms of the matrix elements:
\[\begin{array}{cccc} (1,\text{ncol}) & (2, \text{ncol}) & \ldots & (\text{nrow},\text{ncol}) \\ \vdots & & & \\ (1,2) & (2,2) & \ldots & (\text{nrow},2) \\ (1,1) & (2,1) & \ldots & (\text{nrow},1) \end{array}\]
This is a 90-degree counterclockwise rotation of the “usual” printed order for a matrix:
\[\begin{array}{cccc} (1,1) & (1,2) & \ldots & (1,\text{ncol}) \\ (2,1) & (2,2) & \ldots & (2,\text{ncol}) \\ \vdots & & & \\ (\text{nrow},1) & (\text{nrow},2) & \ldots & (\text{nrow},\text{ncol}) \end{array}\]
Therefore, if you want the displayed heatmap to follow the usual order, you must rotate the matrix 90 degrees clockwise before passing it in to image(). (Equivalently: reverse the row order, then take the transpose.) For a matrix mat, this is t(mat[nrow(mat):1,])
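The rotation can be wrapped in a small helper. This is a sketch only: the name clockwise90 is a hypothetical choice made here, and the matrix is built with an outer product that reproduces the values of the 5 x 5 matrix printed above.

```r
# Hypothetical helper: rotate a matrix 90 degrees clockwise
clockwise90 = function(a) {
  t(a[nrow(a):1, , drop=FALSE])   # Reverse the row order, then transpose
}

mat = 1:5 %o% 6:10                # Entry (i,j) equals i * (j+5)
rot = clockwise90(mat)

# The first row of mat ends up as the last column of rot, which image()
# draws as the top row of the heatmap
rot[, ncol(rot)]

image(rot, main="Heatmap in the printed orientation")
```

The drop=FALSE keeps a one-row input as a matrix, so the helper still returns a matrix in that case.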
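The default color scale in image() is a heat scale: heat.colors(12), running from red to near-white, in older versions of R, and hcl.colors(12, "YlOrRd", rev=TRUE), running from pale yellow to dark red, in recent ones. But the col argument can take any vector of colors. Built-in functions gray.colors(), rainbow(), heat.colors(), topo.colors(), terrain.colors(), cm.colors() all return contiguous color vectors of given length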
To draw contour lines from a numeric matrix, use contour(); to add contours to an existing plot (such as a heatmap), use contour() with add=TRUE
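A sketch that combines a custom color vector with contours drawn over a heatmap. The surface (a Gaussian bump) and the grid are assumptions chosen for illustration.

```r
# Illustrative surface (an assumption): a Gaussian bump on a 60 x 60 grid
x = seq(-3, 3, length=60)
y = seq(-3, 3, length=60)
z = outer(x, y, function(u, v) exp(-(u^2 + v^2) / 2))

# Passing x and y gives the axes their real scale; col sets the color vector
image(x, y, z, col=terrain.colors(25), main="Heatmap with contours")
contour(x, y, z, add=TRUE)   # Contour lines drawn on top of the heatmap
```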
Curves, surfaces, and colors
To draw a curve of a function, use curve()
To add a curve to an existing plot, use curve() with add=TRUE
To add a rug to an existing plot (just tick marks, for where the x points occur), use rug()
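A short sketch of curve(), curve() with add=TRUE, and rug() together. The normal densities and the simulated sample are assumptions made for illustration.

```r
set.seed(1)
v = rnorm(100)   # Simulated sample (an assumption)

# curve() evaluates an expression in x over the range [from, to]
curve(dnorm(x), from=-4, to=4, lwd=2, ylab="Density",
      main="Normal densities")
curve(dnorm(x, sd=1.5), add=TRUE, lty=2)   # Second curve on the same axes
rug(v)                                     # Tick marks at the sample values
```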
To draw a surface, use surface(), available at https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/lectures/surface.R. (This is a function written by Professor Ryan Tibshirani, relying on the built-in persp() function)
To add points to a surface, save the output of surface(). Then use trans3d() to transform (x,y,z) coordinates to (x,y) coordinates that you can pass to points()
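The same idea can be sketched with the built-in persp() alone, so that nothing needs to be downloaded. This assumes that the saved output of surface() plays the same role as the viewing transformation matrix that persp() returns; the grid and the three points are arbitrary choices.

```r
# Evaluate z = x^3 + y^3 on a grid
x = seq(-3, 3, length=30)
y = seq(-3, 3, length=30)
z = outer(x, y, function(u, v) u^3 + v^3)

# persp() draws the surface and returns the viewing transformation matrix
persp.mat = persp(x, y, z, theta=25, phi=15, ticktype="detailed")

# Project three (x,y,z) points on the surface down to plot coordinates
px = c(-2, 0, 2)
py = c(-2, 0, 2)
pz = px^3 + py^3
xy.list = trans3d(px, py, pz, persp.mat)
points(xy.list, pch=20, col="red")
```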
Color palettes are functions for creating vectors of contiguous colors, just like gray.colors(), rainbow(), heat.colors(), terrain.colors(), topo.colors(), cm.colors(). Given a number n, each of these functions just returns a vector of colors (color codes, stored as strings) of length n
n = 50
plot(0, 0, type="n", xlim=c(1,n), ylim=c(1,6))
points(1:n, rep(6,n), col=gray.colors(n), pch=19)
points(1:n, rep(5,n), col=rainbow(n), pch=19)
points(1:n, rep(4,n), col=heat.colors(n), pch=19)
points(1:n, rep(3,n), col=terrain.colors(n), pch=19)
points(1:n, rep(2,n), col=topo.colors(n), pch=19)
points(1:n, rep(1,n), col=cm.colors(n), pch=19)
To create a custom palette that interpolates between a set of base colors, use colorRampPalette(). It returns a function which, given a number n, returns a vector of n colors
## [1] "function"
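A minimal sketch of a custom palette; the three base colors and the name cust.pal are arbitrary choices made for illustration.

```r
# colorRampPalette() returns a function; call that function with n
# to get n colors interpolated between the base colors
cust.pal = colorRampPalette(c("blue", "white", "red"))
class(cust.pal)                # "function"

n = 50
cust.cols = cust.pal(n)
plot(1:n, rep(1, n), col=cust.cols, pch=19, yaxt="n", xlab="Index", ylab="",
     main="Custom blue-white-red palette")
```

The resulting vector can be passed to col anywhere a built-in palette such as heat.colors(n) would be used.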
Coloring points according to the value of some variable can just be done with a bit of indexing, and the tools you already know about colors
# Function to retrieve a color according to a value
# - val: the value in question
# - lim: a vector of length 2, lower and upper limits for possible values
# - col.vec: the color vector to choose from
get.col.from.val = function(val, lim, col.vec) {
  col.vec[(val-lim[1])/(lim[2]-lim[1]) * (length(col.vec)-1) + 1]
}
# Let's color points according to y value
col.vec = heat.colors(30)
lim = c(-1, 1)
theta = seq(0, 6*pi, length=200)
plot(theta, sin(theta), type="o", pch=19,
     col=get.col.from.val(sin(theta), lim, col.vec))
# Another example, now in 3d
persp.mat = surface(x^3 + y^3, from.x=-3, to.x=3, from.y=-3, to.y=3,
                    theta=25, phi=15, col=rgb(1,1,1,alpha=0.2),
                    ticktype="detailed", mar=c(2,2,2,2))
# Let's color points according to z value
col.vec = terrain.colors(30)
lim = c(min(z), max(z))
xy.list = trans3d(x, y, z, persp.mat)
points(xy.list, pch=20, col=get.col.from.val(z, lim, col.vec))
plot(): generic plotting function
points(): add points to an existing plot
lines(), abline(): add lines to an existing plot
text(), legend(): add text to an existing plot
rect(), polygon(): add shapes to an existing plot
hist(), image(): histogram and heatmap
heat.colors(), topo.colors(), etc.: create a color vector
density(): estimate density, which can be plotted
contour(): draw contours, or add to existing plot
curve(): draw a curve, or add to existing plot