Statistical Computing, 36-350
Tuesday August 30, 2022
Shiny is an R package that lets you write interactive web applications. Check out the Shiny Gallery.
This is a very nice problem suggested as an example by Prof. Ryan Tibshirani.
Assume an \(n \times n\) grid of empty squares, with an intruder hiding behind a square, unknown to us. At each time point:
Our goal is to locate the intruder. Notice that, as time goes by, using brute force we are able to dynamically maintain a list of potential locations for the intruder, with the size of the list fluctuating over time. (Coding exercise: how would you create and update such a list?)
There are several questions we could ask:
We can deploy simulations to attempt to tackle the above questions. Absent any theoretical answers (which, to the best of my knowledge, have not been worked out yet), that is the best we can do. Take a look at the R code used for simulations, written by Prof. Ryan Tibshirani. By the end of this course (in fact, hopefully sooner) you should be able to read and understand it.
source("https://www.stat.cmu.edu/~arinaldo/Teaching/36350/F22/lectures/intruder.R")
intruder.sim(n=30, p=0.1)
## [1] 7
Data types, operators, variables
Two basic types of things/objects: data and functions
Examples of functions: log, + (takes two arguments), < (two), %% (two), and mean (one)
A function is a machine which turns input objects, or arguments, into an output object, or a return value (possibly with side effects), according to a definite rule
The trick to good programming is to take a big transformation and break it down into smaller ones, and then break those down, until you come to tasks which are easy (using built-in functions)
At base level, all data can be represented in binary format, by bits (i.e., TRUE/FALSE, YES/NO, 1/0). Basic data types:
- Booleans: TRUE or FALSE in R
- Missing or ill-defined values: NA, NaN, etc.
Operators:
- Unary: - for arithmetic negation, ! for Boolean negation
- Binary: +, -, *, and / (though this is only a partial operator). Also, %% (for mod), and ^ (again partial)
## [1] -7
## [1] 12
## [1] 2
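The code that produced the three results above is not shown in these notes. As a minimal sketch, the following expressions (an assumption, chosen only because they reproduce the printed values) exercise the unary and binary arithmetic operators just listed:

```r
# Assumed expressions; they reproduce the printed values -7, 12 and 2
-7        # unary arithmetic negation
7 + 5     # binary addition
7 - 5     # binary subtraction

# The remaining binary operators mentioned above
7 * 5     # multiplication
7 / 5     # division
7 %% 5    # remainder (mod)
7 ^ 5     # exponentiation
```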
Comparison operators are also binary operators; they take two objects, and give back a Boolean
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
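The six Booleans above are consistent with comparisons like the ones below. The specific numbers are an assumption, since the original code chunk is not shown:

```r
# Assumed comparisons; each returns a single Boolean
7 > 5     # TRUE
7 < 5     # FALSE
7 >= 7    # TRUE
7 <= 5    # FALSE
7 == 5    # FALSE (equality test, not assignment)
7 != 5    # TRUE
```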
Warning: == is a comparison operator, = is not!
The basic logical operators are & (and) and | (or)
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] TRUE
Note: The double forms && and || are different! We’ll see them later
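A small sketch of & and | combining two comparisons. The expressions are an assumption, picked so that they give FALSE, TRUE, FALSE, TRUE like the results printed above:

```r
# Assumed expressions combining comparisons with & (and) and | (or)
(5 > 7) & (6 * 7 == 42)   # FALSE: the left side is FALSE
(5 > 7) | (6 * 7 == 42)   # TRUE: the right side is TRUE
(5 > 7) | (6 * 7 == 41)   # FALSE: both sides are FALSE
(5 < 7) | (6 * 7 == 41)   # TRUE: the left side is TRUE
```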
- The typeof() function returns the data type
- is.foo() functions return Booleans for whether the argument is of type foo
- as.foo() (tries to) “cast” its argument to type foo, to translate it sensibly into such a value
## [1] "double"
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] "0.833333333333333"
## [1] 0.8333333
## [1] 5
## [1] FALSE
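The thirteen results above match what the following calls print. This is a sketch under the assumption that the hidden code looked roughly like this; the helper name x.chr is hypothetical:

```r
typeof(7)                    # "double"
is.numeric(7)                # TRUE
is.na(7)                     # FALSE
is.na(7 / 0)                 # FALSE: 7/0 is Inf, which is not NA
is.na(0 / 0)                 # TRUE: 0/0 is NaN, which is.na() counts as missing
is.character(7)              # FALSE
is.character("7")            # TRUE
is.character("seven")        # TRUE
is.na("seven")               # FALSE
x.chr <- as.character(5 / 6) # hypothetical name; a string with about 15 digits
x.chr
as.numeric(x.chr)            # cast back to a number
6 * as.numeric(x.chr)        # prints as 5
5 / 6 == as.numeric(x.chr)   # FALSE: the round trip through text lost precision
```

The last line is the reason exact equality tests on floating point numbers deserve caution.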
We can give names to data objects; these give us variables. Some variables are built-in:
## [1] 3.141593
Variables can be arguments to functions or operators, just like constants:
## [1] 31.41593
## [1] -1
We create variables with the assignment operator, <- or =
## [1] 3.142857
## [1] 31.42857
The assignment operator also changes values:
## [1] 31.42857
## [1] 30
What variables have you defined?
## [1] "approx.pi" "circumference" "diameter"
Getting rid of variables:
## [1] "approx.pi" "diameter"
## character(0)
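The variable names in the printed output (approx.pi, diameter, circumference) suggest a sequence like the one below. The values are an assumption consistent with the results shown:

```r
pi                                     # built-in variable
pi * 10
cos(pi)

approx.pi <- 22 / 7                    # create a variable
diameter <- 10
circumference <- approx.pi * diameter
circumference                          # 31.42857

circumference <- 30                    # assigning to an existing name changes its value
circumference                          # 30

ls()                                   # names of the variables defined so far
rm("circumference")                    # remove one variable
# rm(list = ls()) would remove every variable in the workspace
```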
Data structures
## [1] 7 8 10 45
## [1] TRUE
- The c() function returns a vector containing all its arguments in specified order
- 1:5 is shorthand for c(1,2,3,4,5), and so on
- x[1] would be the first element, x[4] the fourth element, and x[-4] is a vector containing all but the fourth element
- vector(length=n) returns an empty vector of length n; helpful for filling things up later
## [1] FALSE FALSE FALSE FALSE FALSE
## [1] 0 0 0 0 8
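A short sketch of the vector operations described above. The name weekly.hours is hypothetical; the values of x are taken from the printed output:

```r
x <- c(7, 8, 10, 45)
x
is.vector(x)                        # TRUE

x[1]                                # first element
x[4]                                # fourth element
x[-4]                               # everything except the fourth element
1:5                                 # the integers 1 through 5

weekly.hours <- vector(length = 5)  # "empty" means logical FALSE by default
weekly.hours
weekly.hours[5] <- 8                # storing a number converts the vector to numeric
weekly.hours                        # 0 0 0 0 8
```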
Arithmetic operators apply to vectors in a “componentwise” fashion
## [1] 0 0 0 0
## [1] -49 -64 -100 -2025
Recycling repeats the elements of a shorter vector when it is combined with a longer one
## [1] 0 0 3 37
## [1] 7.000000 1.000000 0.100000 6.708204
Single numbers are vectors of length 1 for purposes of recycling:
## [1] 14 16 20 90
Can do componentwise comparisons with vectors:
## [1] FALSE FALSE TRUE TRUE
Logical operators also work elementwise:
## [1] FALSE FALSE TRUE FALSE
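The results above on componentwise arithmetic, recycling and comparisons can be reproduced with a sketch like this one. The vector y and the shorter vectors are assumptions chosen to match the printed values:

```r
x <- c(7, 8, 10, 45)
y <- c(-7, -8, -10, -45)

x + y                   # componentwise sum: 0 0 0 0
x * y                   # componentwise product: -49 -64 -100 -2025

x + c(-7, -8)           # c(-7, -8) is recycled to length 4: 0 0 3 37
x ^ c(1, 0, -1, 0.5)    # 7 1 0.1 6.708204
2 * x                   # a single number is recycled too: 14 16 20 90

x > 9                   # FALSE FALSE TRUE TRUE
(x > 9) & (x < 20)      # FALSE FALSE TRUE FALSE
```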
To compare whole vectors, best to use identical() or all.equal():
## [1] TRUE TRUE TRUE TRUE
## [1] TRUE
## [1] FALSE
## [1] TRUE
Note: these functions are slightly different; we’ll see more later
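One way to see the difference is with numbers that are equal on paper but not in floating point. The vectors a and b below are hypothetical names for an assumed example that matches the TRUE, FALSE, TRUE pattern printed above:

```r
x <- c(7, 8, 10, 45)
y <- -x

x == -y                         # componentwise: TRUE TRUE TRUE TRUE
identical(x, -y)                # one Boolean for the whole vector: TRUE

a <- c(0.5 - 0.3, 0.3 - 0.1)    # both entries are 0.2 on paper
b <- c(0.3 - 0.1, 0.5 - 0.3)
identical(a, b)                 # FALSE: exact comparison sees rounding error
all.equal(a, b)                 # TRUE: comparison up to a small tolerance
```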
Many functions can take vectors as arguments:
- mean(), median(), sd(), var(), max(), min(), length(), and sum() return single numbers
- sort() returns a new vector
- hist() takes a vector of numbers and produces a histogram, a highly structured object, with the side effect of making a plot
- ecdf() similarly produces an empirical cumulative-distribution-function object
- summary() gives a five-number summary of numerical vectors (plus the mean)
- any() and all() are useful on Boolean vectors
Vector of indices:
## [1] 8 45
Vector of negative indices:
## [1] 8 45
Boolean vector:
## [1] 10 45
## [1] -10 -45
which() gives the indices of the elements of a Boolean vector that are TRUE:
## [1] 3 4
## [1] -10 -45
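The indexing results above can be reproduced with the following sketch. The name places is hypothetical, and y is assumed to be the negation of x:

```r
x <- c(7, 8, 10, 45)
y <- -x

x[c(2, 4)]            # vector of indices: 8 45
x[c(-1, -3)]          # negative indices drop elements: 8 45
x[x > 9]              # Boolean vector keeps the TRUE positions: 10 45
y[x > 9]              # a Boolean vector built from x can index y: -10 -45

places <- which(x > 9)
places                # 3 4
y[places]             # -10 -45
```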
We can give names to elements/components of vectors, and index vectors accordingly
## [1] "v1" "v2" "v3" "fred"
## fred v1
## 45 7
Note: here R is printing the labels; these are not additional components of x
names() returns another vector (of characters):
## [1] "fred" "v1" "v2" "v3"
## [1] 4
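A sketch of naming and name-based indexing, using the names that appear in the printed output. The exact calls are an assumption:

```r
x <- c(7, 8, 10, 45)
names(x) <- c("v1", "v2", "v3", "fred")

names(x)
x[c("fred", "v1")]           # index by name; the names are printed as labels

sort(names(x))               # names() is an ordinary character vector
which(names(x) == "fred")    # position of the element named "fred": 4
```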
An array is a multi-dimensional generalization of vectors
## [,1] [,2]
## [1,] 7 10
## [2,] 8 45
- dim says how many rows and columns; filled by columns
- Arrays can have more than two dimensions: dim is a vector of arbitrary length
Some properties of our array:
## [1] 2 2
## [1] FALSE
## [1] TRUE
## [1] "double"
Can access a 2d array either by pairs of indices or by the underlying vector (column-major order):
## [1] 10
## [1] 10
Omitting an index means “all of it”:
## [1] 10 45
## [1] 10 45
## [,1]
## [1,] 10
## [2,] 45
Note: the optional third argument drop=FALSE ensures that the result is still an array, not a vector
Many functions applied to an array will just boil things down to the underlying vector:
## [1] 3 4
This happens unless the function is set up to handle arrays specifically
And there are several functions/operators that do preserve array structure:
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
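The array results above can be collected into one sketch. The names x.arr and y.arr are assumptions; the values follow the printed 2 x 2 array:

```r
x <- c(7, 8, 10, 45)
x.arr <- array(x, dim = c(2, 2))   # filled column by column
x.arr

dim(x.arr)                 # 2 2
is.vector(x.arr)           # FALSE
is.array(x.arr)            # TRUE
typeof(x.arr)              # "double"

x.arr[1, 2]                # row 1, column 2: 10
x.arr[3]                   # third element of the underlying vector: 10
x.arr[, 2]                 # whole second column, as a vector: 10 45
x.arr[, 2, drop = FALSE]   # same column, kept as a 2 x 1 array

which(x.arr > 9)           # treats the array as its underlying vector: 3 4

y.arr <- array(-x, dim = c(2, 2))
y.arr + x.arr              # + preserves the array structure: all zeros
```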
A matrix is a specialization of a 2d array
## [,1] [,2]
## [1,] 40 60
## [2,] 1 3
## [1] TRUE
## [1] TRUE
- ncol for the number of columns
- byrow=TRUE to fill by rows
- Elementwise operations work with the usual arithmetic operators (e.g., z.mat/3)
Matrices have their own special multiplication operator, written %*%:
## [,1] [,2] [,3]
## [1,] 7 7 7
## [2,] 7 7 7
## [,1] [,2] [,3]
## [1,] 700 700 700
## [2,] 28 28 28
Can also multiply a matrix and a vector
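A sketch of matrix creation and multiplication that reproduces the products printed above. The name six.sevens and the vector c(1, 1) are assumptions:

```r
z.mat <- matrix(c(40, 1, 60, 3), nrow = 2)        # filled by columns
z.mat
is.array(z.mat)                                   # TRUE
is.matrix(z.mat)                                  # TRUE

matrix(c(40, 60, 1, 3), nrow = 2, byrow = TRUE)   # the same matrix, filled by rows
z.mat / 3                                         # elementwise division

six.sevens <- matrix(rep(7, 6), ncol = 3)         # 2 x 3 matrix of sevens
z.mat %*% six.sevens                              # (2 x 2) times (2 x 3) gives 2 x 3

z.mat %*% c(1, 1)                                 # the vector is treated as a column
```

Multiplying by a vector of ones is just a compact way to get the row sums, here 100 and 4.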
Row/column sums, or row/column means:
## [1] 100 4
## [1] 41 63
## [1] 50 2
## [1] 20.5 31.5
The diag() function can be used to extract the diagonal entries of a matrix:
## [1] 40 3
It can also be used to change the diagonal:
## [,1] [,2]
## [1,] 35 60
## [2,] 1 4
Finally, diag() can be used to create a diagonal matrix:
## [,1] [,2]
## [1,] 3 0
## [2,] 0 4
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
Transpose:
## [,1] [,2]
## [1,] 35 1
## [2,] 60 4
Determinant:
## [1] 80
Inverse:
## [,1] [,2]
## [1,] 0.0500 -0.7500
## [2,] -0.0125 0.4375
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
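The matrix summaries and linear algebra results above are consistent with the calls below. This is a sketch; the starting values of z.mat are inferred from the printed output:

```r
z.mat <- matrix(c(40, 1, 60, 3), nrow = 2)

rowSums(z.mat)             # 100 4
colSums(z.mat)             # 41 63
rowMeans(z.mat)            # 50 2
colMeans(z.mat)            # 20.5 31.5

diag(z.mat)                # extract the diagonal: 40 3
diag(z.mat) <- c(35, 4)    # change the diagonal in place
z.mat

diag(c(3, 4))              # diagonal matrix with 3 and 4 on the diagonal
diag(2)                    # 2 x 2 identity matrix

t(z.mat)                   # transpose
det(z.mat)                 # determinant: 35 * 4 - 60 * 1 = 80
solve(z.mat)               # inverse
z.mat %*% solve(z.mat)     # identity, up to rounding
```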
- Rows and columns can be named with rownames() and colnames()
- These work like names() for vectors
A list is a sequence of values, but not necessarily all of the same type
## [[1]]
## [1] "exponential"
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] FALSE
Most of what you can do with vectors you can also do with lists
- Access elements with [ ] as with vectors
- Or with [[ ]], but only with a single index
- [[ ]] drops names and structures, [ ] does not
## [[1]]
## [1] 7
## [1] 7
## [1] 49
Add to lists with c() (also works with vectors):
## [[1]]
## [1] "exponential"
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] FALSE
##
## [[4]]
## [1] 9
Chop off the end of a list by setting the length to something smaller (also works with vectors):
## [1] 4
## [[1]]
## [1] "exponential"
##
## [[2]]
## [1] 7
##
## [[3]]
## [1] FALSE
Pluck out all but one piece of a list (also works with vectors):
## [[1]]
## [1] "exponential"
##
## [[2]]
## [1] FALSE
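The list manipulations above, gathered into one sketch. The list name my.dist is taken from the printed output further on; the exact calls are an assumption:

```r
my.dist <- list("exponential", 7, FALSE)

my.dist[2]               # single brackets: a list of length one
my.dist[[2]]             # double brackets: the number 7 itself
my.dist[[2]]^2           # so arithmetic works: 49

my.dist <- c(my.dist, 9) # append an element
length(my.dist)          # 4
length(my.dist) <- 3     # chop off the last element
my.dist

my.dist[-2]              # all but the second element
```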
We can name some or all of the elements of a list:
## $family
## [1] "exponential"
##
## $mean
## [1] 7
##
## $is.symmetric
## [1] FALSE
## [1] "exponential"
## $family
## [1] "exponential"
Lists have a special shortcut way of using names, with $:
## [1] "exponential"
## [1] "exponential"
Creating a list with names:
Adding named elements:
Removing a named list element, by assigning it the value NULL:
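A sketch of creating, extending and trimming a named list. The component names family, mean, is.symmetric and last.updated appear in the printed output of these notes; was.estimated is a hypothetical name used only for illustration:

```r
# Creating a list with names
my.dist <- list(family = "exponential", mean = 7, is.symmetric = FALSE)

my.dist[["family"]]                      # the value itself
my.dist["family"]                        # a list of length one
my.dist$family                           # shortcut for my.dist[["family"]]

# Adding named elements
my.dist$was.estimated <- FALSE           # hypothetical component
my.dist[["last.updated"]] <- "2021-01-01"

# Removing a named element by assigning NULL
my.dist$was.estimated <- NULL
names(my.dist)
```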
If every such list has a component named family, we can look that up by name, without caring where it is (in what position it lies) in the list
Many matrix functions also work for data frames (e.g., rowSums(), summary(), apply())
## v1 v2
## [1,] 35 10
## [2,] 8 4
## [1] 35 8
## v1 v2 logicals
## 1 35 10 TRUE
## 2 8 4 FALSE
## [1] 35 8
## [1] 35 8
## v1 v2 logicals
## 1 35 10 TRUE
## v1 v2 logicals
## 21.5 7.0 0.5
We can add rows or columns to an array or data frame with rbind() and cbind(), but be careful about forced type conversions
## v1 v2 logicals
## 1 35 10 TRUE
## 2 8 4 FALSE
## 3 -3 -5 TRUE
## v1 v2 logicals
## 1 35 10 1
## 2 8 4 0
## 3 3 4 6
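The data frame results above can be reproduced with a sketch like the following. The names a.mat and a.df are assumptions; the values come from the printed tables:

```r
a.mat <- matrix(c(35, 8, 10, 4), ncol = 2)
colnames(a.mat) <- c("v1", "v2")
a.mat
a.mat[, "v1"]                        # index a matrix column by name

a.df <- data.frame(a.mat, logicals = c(TRUE, FALSE))
a.df
a.df$v1                              # columns can be accessed like list elements
a.df[, "v1"]                         # or like matrix columns
a.df[1, ]                            # first row, still a data frame
colMeans(a.df)                       # the logical column is averaged as 0/1

# Adding a row as a list keeps each column's type
rbind(a.df, list(v1 = -3, v2 = -5, logicals = TRUE))

# Adding a row as a numeric vector forces the logical column to numeric
rbind(a.df, c(3, 4, 6))
```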
Much more on data frames a bit later in the course …
So far, every list element has been a single data value. List elements can be other data structures, e.g., vectors and matrices, even other lists:
## $z.mat
## [,1] [,2]
## [1,] 35 60
## [2,] 1 4
##
## $my.lucky.num
## [1] 13
##
## $my.dist
## $my.dist$family
## [1] "exponential"
##
## $my.dist$mean
## [1] 7
##
## $my.dist$is.symmetric
## [1] FALSE
##
## $my.dist$last.updated
## [1] "2021-01-01"
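A nested list like the one printed above could be built as follows. The outer list name rough.draft is hypothetical; the component names and values are taken from the output:

```r
z.mat <- matrix(c(35, 1, 60, 4), nrow = 2)
my.dist <- list(family = "exponential", mean = 7,
                is.symmetric = FALSE, last.updated = "2021-01-01")

# rough.draft is a hypothetical name for the outer list
rough.draft <- list(z.mat = z.mat, my.lucky.num = 13, my.dist = my.dist)
rough.draft

rough.draft$my.dist$mean          # $ can be chained through nested lists: 7
rough.draft[["z.mat"]][1, 2]      # extract the matrix, then index it: 60
```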