Back when I introduced scripts I said that R starts at the top of the file and runs straight through to the end of the file. That was a tiny bit of a lie. It is true that unless you insert commands to explicitly alter how the script runs, that is what will happen. However, you actually have quite a lot of flexibility in this respect. Depending on how you write the script, you can have R repeat several commands, or skip over different commands, and so on. This topic is referred to as flow control.
The first kind of flow control that I want to talk about is a loop. The idea is simple: a loop is a block of code (i.e., a sequence of commands) that R will execute over and over again until some termination criterion is met. To illustrate the idea, here’s a schematic picture showing the difference between what R does with a script that contains a loop and one that doesn’t:
Looping is a very powerful idea, because it allows you to automate repetitive tasks. Much like my children, R will execute an continuous cycle of “are we there yet? are we there yet? are we there yet?” checks against the termination criterion, and it will keep going forever until it is finally there and can break out of the loop. Rather unlike my children, however, I find that this behaviour is actually helpful.
There are several different ways to construct a loop in R. There are two methods I’ll talk about here, one using the while
command and another using the for
command.
while
loopA while
loop is a simple thing. The basic format of the loop looks like this:
while ( CONDITION ) {
STATEMENT1
STATEMENT2
ETC
}
The code corresponding to condition
needs to produce a logical value, either TRUE
or FALSE
. Whenever R encounters a while statement, it checks to see if the condition is TRUE
. If it is, then R goes on to execute all of the commands inside the curly brackets, proceeding from top to bottom as usual. However, when it gets to the bottom of those statements, it moves back up to the while statement. Then, like the mindless automaton it is, it checks to see if the condition is TRUE
. If it is, then R goes on to execute all the commands inside … well, you get the idea. This continues endlessly until at some point the condition
turns out to be FALSE
. Once that happens, R jumps to the bottom of the loop (i.e., to the }
character), and then continues on with whatever commands appear next in the script.
To start with, let’s keep things simple, and use a while
loop to calculate the smallest multiple of 179 that is greater than or equal to 1000. This is of course a very silly example, since you can calculate it using simple arithmetic, but the point here isn’t to do something novel. The point is to show how to write a while
loop. Here’s the code in action:
x <- 0
while(x < 1000) {
x <- x + 179
}
print(x)
## [1] 1074
When we run this code, R starts at the top and creates a new variable called x
and assigns it a value of 0. It then moves down to the loop, and “notices” that the condition here is x < 1000
. Since the current value of x
is zero, the condition is TRUE
, so it enters the body of the loop (inside the curly braces). There’s only one command here, which instructs R to increase the value of x
by 179. R then returns to the top of the loop, and rechecks the condition. The value of x
is now 179, but that’s still less than 1000, so the loop continues. Here’s a visual representation of that:
To see this in action, we can move the print
statement inside the body of the loop. By doing that, R will print out the value of x
every time it gets updated. Let’s watch:
x <- 0
while(x < 1000) {
x <- x + 179
print(x)
}
## [1] 179
## [1] 358
## [1] 537
## [1] 716
## [1] 895
## [1] 1074
Truly fascinating stuff. 🤔
To give you a sense of how you can use a while
loop in a more complex situation, let’s write a simple script to simulate the progression of a mortgage. Suppose we have a nice young couple who borrow $300000 from the bank, at an annual interest rate of 5%.1 The mortgage is a 30 year loan, so they need to pay it off within 360 months total. Our happy couple decide to set their monthly mortgage payment at $1600 per month. Will they pay off the loan in time or not? Only time will tell.
Or, alternatively, we could simulate the whole process and get R to tell us. The code to run this is a little more complicated, so we’ll keep it all in a script called mortgage.R.
# setup
month <- 0 # count the number of months
balance <- 300000 # initial mortgage balance
payments <- 1600 # monthly payments
interest <- 0.05 # 5% interest rate per year
total_paid <- 0 # track what you’ve paid the bank
# convert annual interest to a monthly multiplier
monthly_multiplier <- (1 + interest) ^ (1/12)
# keep looping until the loan is paid off...
while(balance > 0){
# do the calculations for this month
month <- month + 1# one more month
balance <- balance * monthly_multiplier # add the interest
balance <- balance - payments # make the payments
total_paid <- total_paid + payments # track the total paid
# print the results on screen
cat("month", month, ": balance", round(balance), "\n")
}
# print the total payments at the end
cat("total payments made", total_paid, "\n")
To explain what’s going on, let’s go through this script line by line. In the first part of the code, all we’re doing is specifying the variables that define the problem:
month <- 0 # count the number of months
balance <- 300000 # initial mortgage balance
payments <- 1600 # monthly payments
interest <- 0.05 # 5% interest rate per year
total_paid <- 0 # track what you’ve paid the bank
The loan starts with a balance
of $300,000 owed to the bank on month
zero, and at that point in time the total_paid
money is nothing. The couple is making monthly payments
of $1600, at an annual interest
rate of 5%.
The next step is to convert the annual percentage interest into a monthly multiplier. That’s this line:
monthly_multiplier <- (1 + interest) ^ (1/12)
If you’re like me and somehow have a panic attack every time someone asks you to do calculations with money, it might help to unpack this line a bit. What we’re trying to calculate here is a number (monthly_multiplier
) that we would have to multiply the current balance by each month in order to produce an annual interest rate of 5%. An annual interest rate of 5% implies that, if no payments were made over 12 months the balance would end up being 1.05 times what it was originally, so the annual multiplier is 1.05. To calculate the monthly multiplier, we need to calculate the 12th root of 1.05 (i.e., raise 1.05 to the power of 1/12). We store this value in as the monthly_multiplier
variable, which as it happens corresponds to a value of about 1.004. All of which is a rather long winded way of saying that the annual interest rate of 5% corresponds to a monthly interest rate of about 0.4%.
Have I mentioned how much I hate financial calculations?
Anyway… all of that is really just setting the stage. The interesting part of the script is the loop. The while
statement tells R that it needs to keep looping until the balance
falls below zero. If we strip out the frills, the loop looks like this:
while(balance > 0){
month <- month + 1 # one more month
balance <- balance * monthly_multiplier # add the interest
balance <- balance - payments # make the payments
total_paid <- total_paid + payments # track the total paid
}
Firstly we increase the value month
by 1. Next, the bank charges the interest, so the balance
goes up. Then, the couple makes their monthly payment and the balance
goes down. Finally, we keep track of the total amount of money that the couple has paid so far. After having done this number crunching, we tell R to issue the couple with a very terse monthly statement (that’s one of the frills I removed), indicating indicates how many months they’ve been paying the loan and how much money they still owe the bank. Which is rather rude of us really. I’ve grown attached to this couple and I really feel they deserve better. I guess that’s banks for you.
In any case, the key thing here is the tension between the increase in balance
due to interest and the decrease due to repayments. As long as the decrease is bigger, then the balance will eventually drop to zero and the loop will eventually terminate. If not, the loop will continue forever! This is very bad programming on my part: I should have included something to force R to stop if this goes on too long. For now, we’ll just have to hope that the author has rigged the example so that the loop actually terminates. Hm. I wonder what the odds of that are?
Anyway, assuming that the loop does eventually terminate, there’s one last line of code that prints the total amount of money that the couple handed over to the bank over the lifetime of the loan.
Now that I’ve explained everything in the script in tedious detail, let’s run it and see what happens:
source("./scripts/mortgage.R")
## month 1 : balance 299622
## month 2 : balance 299243
## month 3 : balance 298862
## month 4 : balance 298480
## month 5 : balance 298096
## month 6 : balance 297710
## month 7 : balance 297323
## month 8 : balance 296934
## month 9 : balance 296544
## month 10 : balance 296152
BLAH BLAH BLAH
BLAH BLAH BLAH
## month 352 : balance 4806
## month 353 : balance 3226
## month 354 : balance 1639
## month 355 : balance 46
## month 356 : balance -1554
## total payments made 569600
So our nice young couple have paid off the $300,000 loan just 4 months shy of the 30 year term of their mortgage, all at a bargain basement price of $568,046.
A happy ending! 🎉 🎈 🍰
for
loopThe for
loop is also pretty simple, though not quite as simple as the while
loop. The basic format of this loop goes like this:
for ( VAR in VECTOR ) {
STATEMENT1
STATEMENT2
ETC
}
In a for
loop, R runs a fixed number of iterations. We have a vector which has several elements, each one corresponding to a possible value of the variable var
. In the first iteration of the loop, var
is given a value corresponding to the first element of vector; in the second iteration of the loop var
gets a value corresponding to the second value in vector; and so on. Once we’ve exhausted all of the values in the vector, the loop terminates and the flow of the program continues down the script.
When I was a kid we used to have multiplication tables hanging on the walls at home, so I’d end up memorising the all the multiples of small numbers. I was okay at this as long as all the numbers were smaller than 10. Anything above that and I got lazy. So as a first example we’ll get R to print out the multiples of 137. Let’s say I want to it to calculate \(137 \times 1\), then \(137 \times 2\), and so on until we reach \(137 \times 10\). In other words what we want to do is calculate 137 * value
for every value
within the range spanned by 1:10
, and then print the answer to the console. Because we have a fixed range of values that we want to loop over, this situation is well-suited to a for
loop. Here’s the code:
for(value in 1:10) {
answer <- 137 * value
print(answer)
}
## [1] 137
## [1] 274
## [1] 411
## [1] 548
## [1] 685
## [1] 822
## [1] 959
## [1] 1096
## [1] 1233
## [1] 1370
The intuition here is that R starts by setting value
to 1. It then computes and prints 137 * value
, then moves back to the top of the loop. When it gets there, it increases value
by 1, and then repeats the calculation. It keeps doing this until the value
reaches 10 and then it stops. That intuition is essentially correct, but it’s worth unpacking it a bit further using a different example where R loops over something other than a sequence of numbers…
In the example above, the for
loop was defined over the numbers from 1 to 10, specified using the R code 1:10
. However, it’s worth keeping in mind that as far as R is concerned, 1:10
is actually a vector:
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
So in the previous example, the intuition about the for
loop is slightly misleading. When R gets to the top of the loop the action it takes is “assigning value
equal to the next element of the vector”. In this case it turns out that this action causes R to “increase value
by 1”, but that’s not true in general. To illustrate that, here’s an example in which a for
loop iterates over a character vector. First, I’ll create a vector of words
:
words <- c("it", "was", "the", "dirty", "end", "of", "winter")
Now what I’ll do is create a for
loop using this vector. For every word in the vector of words
R will do three things:
Here it is:
for(this_word in words) {
n_letters <- nchar(this_word)
block_word <- toupper(this_word)
cat(block_word, "has", n_letters, "letters\n")
}
## IT has 2 letters
## WAS has 3 letters
## THE has 3 letters
## DIRTY has 5 letters
## END has 3 letters
## OF has 2 letters
## WINTER has 6 letters
From the perspective of the R interpreter this is what the code four the for
loop is doing. It’s pretty similar to the while
loop, but not quite the same:
Of course, there are ways of doing this that don’t require you to write the loop manually. Because many functions in R operate naturally on vectors, you can take advantage of this. Code that bypasses the need to write loops is called vectorised code, and there are some good reasons to do this (sometimes) once you’re comfortable working in R. Here’s an example:
chars <- nchar(words)
names(chars) <- toupper(words)
print(chars)
## IT WAS THE DIRTY END OF WINTER
## 2 3 3 5 3 2 6
Sometimes vectorised code is easy to write and easy to read. I think the example above is pretty simple, for instance. It’s not always so easy though!
When you go out into the wider world of R programming you’ll probably encounter a lot of examples of people talking about how to vectorise your code to produce better performance. My advice for novices is not to worry about that right now. Loops are perfectly fine, and it’s often more intuitive to write code using loops than using vectors. Eventually you’ll probably want to think about these topics but it’s something that you can leave for a later date!
One of the things I used to find annoying about writing notes for programming classes is that programs are fundamentally dynamic things. They do stuff, and if you want to have a good feel for how they work you really do need to see the action happening. In contrast, lecture notes are - traditionally - static things. It’s just a fixed set of words. Except… this is the internet, so I can include animations!
If you recall, back when I introduced packages I installed the TurtleGraphics package from my GitHub page. Turtle graphics is a classic teaching tool in computer science, originally invented in the 1960s and reimplemnented over and over again in different programming languages. Let’s load that package.
library(TurtleGraphics)
Here’s the idea. You have a turtle, and she lives in a nice warm terrarium:
turtle_init()
Your job is to give her instructions, to program her to undertake certain actions. So for example, you can use the turtle_forward
command to get her to walk forwards, the turtle_left
command to get her to rotate to the left, and so on. For example, if I used this command my turtle would walk forward 10 steps and leave a trail behind her showing the path she took.
turtle_forward(distance = 10)
That seems simple enough, but what if I want my turtle to draw a more complicated shape? Let’s say I want her to draw a hexagon. There are six sides to the hexagon, so the most natural way to write code for this is to write a for
loop that loops over the sides! At each iteration within the loop, I’ll have the turtle walk fowards, and then turn 60 degrees to the left. Here’s what happens:
turtle_init()
for(side in 1:6) {
turtle_forward(distance = 10)
turtle_left(angle = 60)
}
Yay for turtles! Everybody loves turtles! 🐢🐢🐢🐢🐢🐢
To start with, here are some exercises with for
loops and turtles:
angle
rather than you having to manually work it out?As an exercise in using a while
loop, consider this vector:
telegram <- c("All","is","well","here","STOP","This","is","fine")
while
loop that prints words from telegram
until it reaches STOP
. When it encounters the word STOP
, end the loop. So what you want is output that looks like this.## [1] "All"
## [1] "is"
## [1] "well"
## [1] "here"
## [1] "STOP"
The solutions for these exercises are here.
I love that I wrote this when living in a city where $300000 felt like a semi-plausible number, rather than way, way too small. Sydney real estate is absurd.↩︎