Statistical Computing, 36-350
Tuesday September 6, 2022
Indexing
There are 3 ways to index a vector, matrix, data frame, or list in R:
1. Using explicit integer indices (or negative integers)
2. Using a Boolean vector
3. Using names
Note: in general, we have to set the names ourselves. Use names() for vectors and lists, and rownames(), colnames() for matrices and data frames
Indexing with integers is the most transparent way. We can index with an integer or an integer vector (or a negative integer, or a negative integer vector). Examples for vectors:
set.seed(33) # For reproducibility
x.vec = rnorm(6) # Generate a vector of 6 random standard normals
x.vec
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
## [1] 1.010539
## [1] 1.0105390 -0.1582624 -2.1566375
## [1] 1.0105390 -0.1582624 -2.1566375
## [1] 1.0105390 -2.1566375 -0.1582624
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683
## [1] -0.13592452 -0.04079697 0.49864683
## [1] -0.13592452 -0.04079697 0.49864683
## [1] -0.13592452 -0.04079697 0.49864683
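The printed results above are consistent with indexing calls like the following. This is a minimal sketch: the exact commands are an assumption, chosen to match the description of positive and negative integer indices.

```r
set.seed(33)          # Same seed as above
x.vec = rnorm(6)

x.vec[3]              # Third element
x.vec[c(3,4,5)]       # Third through fifth elements
x.vec[3:5]            # Same, using the colon shorthand
x.vec[c(3,5,4)]       # The order of the indices sets the order of the result
x.vec[-3]             # Everything except the third element
x.vec[c(-3,-4,-5)]    # Everything except the third through fifth elements
x.vec[-c(3,4,5)]      # Same
x.vec[-(3:5)]         # Same
```

Positive and negative indices cannot be mixed in one call, so the parentheses in -(3:5) matter: -3:5 would mean the sequence from -3 up to 5.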
Examples for matrices:
x.mat = matrix(x.vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
# column major order
x.mat
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,] 1.01053901 0.4986468
## [1] -2.156638
## [1] -2.156638
## [1] -0.04079697 -2.15663750
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [1] -0.13592452 -0.04079697 1.01053901
## [1] -0.1582624 -2.1566375 0.4986468
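The matrix results above can be obtained with calls along these lines. Again a sketch under assumptions: the specific commands are inferred from the printed output, not taken from the original.

```r
set.seed(33)
x.mat = matrix(rnorm(6), 3, 2)   # Same 3 x 2 matrix as above

x.mat[2,2]     # Row 2, column 2
x.mat[5]       # Same entry: the fifth element in column major order
x.mat[2,]      # Row 2, returned as a plain vector
x.mat[1:2,]    # Rows 1 and 2, still a matrix
x.mat[,1]      # Column 1
x.mat[,-1]     # Every column except the first (here, column 2)
```

Leaving a position empty, as in x.mat[2,], means "take everything along that dimension".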
Examples for lists:
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1] TRUE TRUE FALSE FALSE
## [1] TRUE TRUE FALSE FALSE
## [[1]]
## [1] TRUE TRUE FALSE FALSE
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
## [[1]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[2]]
## [1] TRUE TRUE FALSE FALSE
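Note: double brackets [[ ]] extract exactly one element, so they cannot stand in for single brackets in either of the above commands: a negative index that leaves more than one element gives an error, and an index vector is applied recursively rather than selecting several elements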
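A minimal sketch of list indexing that matches the results above. The list construction is an assumption (in particular, the third element is typed in literally here), as are the exact commands.

```r
set.seed(33)
x.vec = rnorm(6)
x.list = list(x.vec, letters, c(TRUE,TRUE,FALSE,FALSE))

x.list[[3]]    # Double brackets: the third element itself (a logical vector)
x.list[3]      # Single brackets: a list of length 1 holding that element
x.list[1:2]    # A list holding the first two elements
x.list[-1]     # A list holding everything except the first element

# Double brackets do not select several elements
x.list[[1:2]]                          # Recursive: same as x.list[[1]][[2]]
res = try(x.list[[-1]], silent=TRUE)   # Error: two elements would remain
class(res)
```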
Indexing with Booleans might appear a bit more tricky at first, but it is very useful, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:
## [1] 1.010539
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683
## [1] FALSE FALSE TRUE FALSE FALSE TRUE
## [1] 1.0105390 0.4986468
## [1] 1.0105390 0.4986468
Works the same way for lists; in lab, we’ll explore logical indexing for matrices
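The Boolean indexing results above can be reproduced with a sketch like this one. The variable name pos.vec is an assumption introduced for illustration.

```r
set.seed(33)
x.vec = rnorm(6)

x.vec[c(FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)]   # Keep only the third element
x.vec[c(TRUE,TRUE,FALSE,TRUE,TRUE,TRUE)]       # Keep all but the third
pos.vec = x.vec > 0    # Boolean vector, TRUE where x.vec is positive
pos.vec
x.vec[pos.vec]         # Keep only the positive elements
x.vec[x.vec > 0]       # Same, with the Boolean vector defined on-the-fly
```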
Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use names() to set the names
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
## $normals
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## $bools
## [1] TRUE TRUE FALSE FALSE
## [1] "list"
## [1] "numeric"
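A sketch of named indexing on a list, consistent with the output above. The names match those printed; the list construction and the exact commands are assumptions.

```r
set.seed(33)
x.list = list(rnorm(6), letters, c(TRUE,TRUE,FALSE,FALSE))
names(x.list) = c("normals", "letters", "bools")   # Set the names first

x.list[["letters"]]             # The element named "letters"
x.list$letters                  # Same, using the $ shorthand
x.list[c("normals","bools")]    # Sub-list with two named elements
class(x.list["normals"])        # Single brackets return a list
class(x.list[["normals"]])      # Double brackets return the element itself
```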
For matrices, rownames() and colnames() set the names used for named indexing
Control flow (if, else, etc.)
Summary of the control flow tools in R:
if(), else if(), else: standard conditionals
ifelse(): conditional function that vectorizes nicely
switch(): handy for deciding between several options
if() and else
Use if() and else to decide whether to evaluate one block of code or another, depending on a condition
## [1] 0.5
The condition in if() needs to give one TRUE or FALSE value
The else statement is optional
Single-line actions do not need braces: if (x >= 0) x else -x
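The points above can be sketched as follows, computing an absolute value. The starting value x = 0.5 and the names abs.x and abs.x.short are assumptions for illustration.

```r
x = 0.5

# Block form: exactly one of the two blocks is evaluated
if (x >= 0) {
  abs.x = x
} else {
  abs.x = -x
}
abs.x

# One-line form, using the value that if() returns
abs.x.short = if (x >= 0) x else -x

# Without else, nothing happens when the condition is FALSE
if (x < 0) cat("x is negative\n")
```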
else if()
We can use else if() arbitrarily many times following an if() statement
## [1] 5
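A minimal sketch of a three-way conditional. The piecewise function and the input x = -3 are assumptions chosen for illustration; this is one input for which the result is 5.

```r
x = -3   # Assumed input

if (x^2 < 1) {
  y = x^2          # Used when |x| < 1
} else if (x >= 1) {
  y = 2*x - 1      # Used when x >= 1
} else {
  y = -2*x - 1     # Used otherwise, that is, when x <= -1
}
y
```

The conditions are checked from top to bottom, and only the first block whose condition is TRUE gets evaluated.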
Each else if() only gets considered if the conditions above it were not TRUE
The else statement gets evaluated if none of the above conditions were TRUE
Again, the else statement is optional
In the ifelse() function we specify a condition, then a value if the condition holds, and a value if the condition fails
## [1] 2
One advantage of ifelse() is that it vectorizes nicely; we’ll see this in the lab
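A short sketch of ifelse() on a single value and on a vector. The inputs and the names x.vals and abs.vals are assumptions for illustration.

```r
x = -2
ifelse(x > 0, x, -x)    # Condition fails, so we get -x

# Vectorized: the condition is checked element by element
x.vals = c(-2, 0.5, 3, -1)
abs.vals = ifelse(x.vals > 0, x.vals, -x.vals)
abs.vals
```

A plain if() would not work on x.vals, since its condition must be a single TRUE or FALSE.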
Instead of an if() statement followed by else if() statements (and perhaps a final else), we can use switch(). We pass a variable to select on, then a value for each option
type.of.summary = "mode"
switch(type.of.summary,
       mean=mean(x.vec),
       median=median(x.vec),
       histogram=hist(x.vec),
       "I don't understand")
## [1] "I don't understand"
Here type.of.summary is expected to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
The final unnamed argument acts as the else clause
Try changing type.of.summary above and see what happens
Remember our standard Boolean operators, & and |. These combine terms elementwise
## [1] 0.54949775 -0.22561403 -0.72846986 0.80071515 0.13290531 -0.91453168 -0.02336149 -0.29755356 0.93932343 0.57915778
## [1] 0.5494977 999.0000000 -0.7284699 0.8007152 999.0000000 -0.9145317 999.0000000 999.0000000 0.9393234 0.5791578
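The pattern in the output above, where entries between -0.5 and 0.5 are replaced by 999, can be sketched like this. The seed and the names u.orig, in.middle and in.tails are assumptions, so the random values will differ from those printed above.

```r
set.seed(1)                  # Assumed seed, for reproducibility
u.vec = runif(10, -1, 1)     # 10 uniform draws between -1 and 1
u.orig = u.vec               # Keep a copy for comparison

in.middle = -0.5 <= u.vec & u.vec <= 0.5   # TRUE where both conditions hold
u.vec[in.middle] = 999                     # Overwrite those entries
u.vec

in.tails = u.orig < -0.5 | u.orig > 0.5    # TRUE where at least one holds
```

Both in.middle and in.tails have length 10, one TRUE or FALSE per element of the vector.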
In contrast to the standard Boolean operators, && and || give just a single Boolean, “lazily”: meaning we terminate evaluating the expression ASAP
## [1] FALSE
## [1] FALSE
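A sketch of lazy evaluation. The variable names are assumptions, and this.variable.is.not.defined is deliberately never created.

```r
# && stops as soon as the left side is FALSE, so the right side,
# which refers to an undefined variable, is never evaluated
lazy.and = (0 > 0) && (this.variable.is.not.defined == 0)
lazy.and

# || stops as soon as the left side is TRUE
lazy.or = (1 > 0) || (this.variable.is.not.defined == 0)
lazy.or

# Typical use in a conditional: check the type before comparing.
# Here x > 0 alone would give a zero-length result, which if() cannot use
x = NULL
if (is.numeric(x) && x > 0) {
  msg = "positive number"
} else {
  msg = "not a positive number"
}
msg
```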
Use & and | for indexing or subsetting, and && and || for conditionals
Iteration
Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming
Summary of the iteration methods in R:
for(), while() loops: standard loop constructs
apply() family: alternative to a for() loop, these are base R functions
map() family: another alternative, from the purrr package
for()
A for() loop increments a counter variable along a vector. It repeatedly runs a code block, called the body of the loop, with the counter set at its current value, until it runs through the vector
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415 2.1972246 2.3025851
Here i is the counter and the vector we are iterating over is 1:n. The body is the code in between the braces
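A loop matching this description and the output above (the logs of 1 through 10) could look as follows. This is a sketch; the original chunk is assumed to be similar.

```r
n = 10
log.vec = vector(length=n, mode="numeric")   # Vector of n zeros to fill in
for (i in 1:n) {
  log.vec[i] = log(i)    # Body: runs once for each value of the counter i
}
log.vec
```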
We can break out of a for() loop early (before the counter has been iterated over the whole vector), using break
n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log.vec[i] = log(i)
}
log.vec
## I'm outta here. I don't like numbers bigger than 2
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 0.0000000 0.0000000 0.0000000
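Variations on standard for() loops
Many different variations on the standard for() loop are possible. Two common ones:
Non-numeric counters: the vector we iterate over does not have to be numeric
Nested loops: the body of a for() loop can contain another for() loop (or several others)
## Prof declined to comment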
## Ale declined to comment
## Rinaldo declined to comment
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
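The two kinds of output above are consistent with loops like these. A sketch: the counter names str, i and j are assumptions, and the strings are taken from the printed output.

```r
# Non-numeric counter: iterate over a character vector
for (str in c("Prof", "Ale", "Rinaldo")) {
  cat(paste(str, "declined to comment\n"))
}

# Nested loops: the inner loop runs in full for each value of i
for (i in 1:4) {
  for (j in 1:i^2) {      # ^ binds tighter than :, so this is 1:(i^2)
    cat(paste(j, ""))
  }
  cat("\n")               # End the line once the inner loop finishes
}
```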
while()
A while() loop repeatedly runs a code block, again called the body, until some condition is no longer true
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
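A while() loop producing the output above (logs of 1 through 7, stopping once the log exceeds 2) can be sketched as follows. The stopping condition is inferred from the printed values and is an assumption.

```r
i = 1
log.vec = c()
while (log(i) <= 2) {              # Condition is checked before each pass
  log.vec = c(log.vec, log(i))     # Append the current value
  i = i + 1
}
log.vec
```

The loop ends with i equal to 8, the first integer whose log is above 2.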
for() versus while()
for() is better when the number of times to repeat (values to iterate over) is clear in advance
while() is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with
while() is more general, in that every for() could be replaced with a while() (but not vice versa)
while(TRUE) or repeat
while(TRUE) and repeat: both do the same thing, just repeat the body indefinitely, until something causes the flow to break. Example (try running in your console):
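A minimal non-interactive sketch of this pattern. The doubling task and the names val, val2 and n.steps are assumptions for illustration.

```r
# Keep doubling until the value exceeds 100, then break out
val = 1
n.steps = 0
repeat {
  val = 2 * val
  n.steps = n.steps + 1
  if (val > 100) break     # Without a break, this loop would never end
}
val
n.steps

# The same loop written with while(TRUE)
val2 = 1
while (TRUE) {
  val2 = 2 * val2
  if (val2 > 100) break
}
val2
```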
for() and while() loops in R
Indexing ([ ] and [[ ]])
if(), else if(), else: standard conditionals
ifelse(): shortcut for using if() and else in combination
switch(): shortcut for using if(), else if(), and else in combination
for(), while(), repeat: standard loop constructs
Don't overuse for() loops, vectorization is your friend!
apply() and **ply(): can also be very useful (we’ll see them later)