When I introduced variables earlier, I showed you how we can use them to store a single number. In this section, we’ll extend this idea and look at how to store multiple values within a single variable. In R, the name for a variable that can store multiple values is a vector. So let’s create one.

6.1 Character vectors

Let’s return to the example we were working with in the previous section on variables. We’re designing a survey, and we want to keep track of the responses that a participant has given. This time, let’s imagine that we’ve finished running the survey and we’re examining the data. Suppose we’ve administered the Depression, Anxiety and Stress Scale (DASS) and as a consequence every participant has scores on the depression, anxiety and stress scales provided by the DASS. One thing we might want to do is create a single variable called scale_name that identifies the three scales. The simplest way to do this in R is to use the combine function, c.1 To do so, all we have to do is type the values we want to store in a comma separated list, like this:

scale_name <- c("depression", "anxpiety", "stress")
scale_name
## [1] "depression" "anxpiety"   "stress"

To use the correct terminology, we have a single variable called scale_name: this variable is a vector that has three elements. Because the vector contains text, it is a character vector. You can use the length function to check how many elements it has, and the class function to check what kind of vector it is:

length(scale_name)
## [1] 3
class(scale_name)
## [1] "character"

6.2 Numeric vectors

As you might expect, we can define numeric or logical vectors in the same way. For instance, we could define the raw scores on the three DASS scales like so:

raw_score <- c(12, 3, 8)
raw_score
## [1] 12  3  8
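Just as with the character vector, length and class will tell you what you’ve made. Here’s a minimal sketch, with raw_score defined again so the snippet runs on its own:

```r
raw_score <- c(12, 3, 8)

length(raw_score)   # number of elements: 3
class(raw_score)    # kind of vector: "numeric"
```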

We’ll talk about logical vectors in a moment.

6.3 Extracting an element

If I want to extract the first element from the vector, all I have to do is refer to the relevant numerical index, using square brackets to do so. For example, to get the first element of scale_name I would type this

scale_name[1]
## [1] "depression"

The second element of the vector is

scale_name[2]
## [1] "anxpiety"

You get the idea.2

6.4 Extracting multiple elements

There are a few ways to extract multiple elements of a vector. The first way is to specify a vector that contains the indices of the elements that you want to keep. To extract the first two scale names:

scale_name[c(1, 2)]
## [1] "depression" "anxpiety"

Alternatively, R provides a convenient shorthand notation in which 1:2 is a vector containing the numbers from 1 to 2, and similarly 1:10 is a vector containing the numbers from 1 to 10. So this does the same thing:

scale_name[1:2]
## [1] "depression" "anxpiety"

Notice that order matters here. So if I do this

scale_name[2:1]
## [1] "anxpiety"   "depression"

I get the same names, but in the reverse order.

6.5 Removing elements

Finally, when working with vectors, R allows us to use negative numbers to indicate which elements to remove. So this is yet another way of doing the same thing:

scale_name[-3]
## [1] "depression" "anxpiety"

Notice that none of this has changed the original variable. The scale_name vector itself has remained completely untouched.

scale_name
## [1] "depression" "anxpiety"   "stress"
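Negative indices can be combined too, so you can drop more than one element in a single step. Here’s a small sketch using a made-up vector called days (it isn’t part of the survey example) so that it runs on its own:

```r
days <- c("mon", "tue", "wed", "thu")   # hypothetical example vector

days[-c(1, 3)]   # drop the first and third elements, leaving "tue" "thu"
days             # the original vector still has all four elements
```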

6.6 Editing vectors

Sometimes you’ll want to change the values stored in a vector. Imagine my surprise when a student points out that "anxpiety" is not in fact a real thing. I should probably fix that 😳. One possibility would be to assign the whole vector again from the beginning, using c. But that’s a lot of typing. Also, it’s a little wasteful: why should R have to redefine the names for all three scales, when only the second one is wrong? Fortunately, we can tell R to change only the second element, using this trick:

scale_name[2] <- "anxiety"
scale_name
## [1] "depression" "anxiety"    "stress"

That’s better.

Another way to edit variables is to use the edit function. I won’t go into that here, but if you’re curious, try typing a command like this:

scale_name <- edit(scale_name)
6.7 Naming elements

One very handy thing in R is that it lets you assign meaningful names to the different elements in a vector. For example, the raw_score vector that we introduced earlier contains the actual data from a study but when you print it out on its own

raw_score
## [1] 12  3  8

it’s not obvious what each of the scores corresponds to. There are several different ways of making this a little more meaningful (and we’ll talk about them later) but for now I want to show one simple trick. Ideally, what we’d like to do is have R remember that the first element of raw_score is the “depression” score, the second is “anxiety” and the third is “stress”. We can do that like this:

names(raw_score) <- scale_name

This is a bit of an unusual looking assignment statement. Usually, whenever we use <- the thing on the left hand side is the variable itself (i.e., raw_score) but this time around the left hand side refers to the names. To see what this command has done, let’s get R to print out the raw_score variable now:

raw_score
## depression    anxiety     stress
##         12          3          8

That’s a little nicer. Element names don’t just look nice, they’re functional too. You can refer to the elements of a vector using their names, like so:

raw_score["anxiety"]
## anxiety 
##       3
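Names work with the other indexing tricks as well. The sketch below uses a separate vector, named_score (a hypothetical name), and supplies the names directly inside c, which is an alternative to the names assignment used above:

```r
# Names can be given at creation time as name = value pairs
named_score <- c(depression = 12, anxiety = 3, stress = 8)

names(named_score)                        # the three element names
named_score[c("depression", "stress")]    # extract two elements by name
```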

6.8 Vector operations

One really nice thing about vectors is that a lot of R functions and operators will work on the whole vector at once. For instance, suppose I want to normalise the raw scores from the DASS. Each scale of the DASS is constructed from 14 questions that are rated on a 0-3 scale, so the minimum possible score is 0 and the maximum is 42. Suppose I wanted to rescale the raw scores to lie on a scale from 0 to 1. I can create the scaled_score variable like this:

scaled_score <- raw_score / 42
scaled_score
## depression    anxiety     stress 
## 0.28571429 0.07142857 0.19047619

In other words, when you divide a vector by a single number, all elements in the vector get divided. The same is true for addition, subtraction, multiplication and taking powers. So that’s neat.
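To see those other operations in action, here’s a quick sketch. It uses a throwaway vector x (a hypothetical name) with the same three numbers, so it doesn’t disturb raw_score:

```r
x <- c(12, 3, 8)   # throwaway copy of the three raw scores

x + 1   # adds 1 to every element:       13   4   9
x - 2   # subtracts 2 from every element: 10   1   6
x * 2   # doubles every element:          24   6  16
x ^ 2   # squares every element:         144   9  64
```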

Suppose it later turned out that I’d made a mistake. I hadn’t in fact administered the complete DASS, only the first page. As noted on the DASS website, it’s possible to fix this mistake (sort of). First, I have to recognise that my scores are actually out of 21 not 42, so the calculation I should have done is this:

scaled_score <- raw_score / 21
scaled_score
## depression    anxiety     stress 
##  0.5714286  0.1428571  0.3809524

Then, it turns out that page 1 of the full DASS is almost the same as the short form of the DASS, but there’s a correction factor you have to apply. The depression score needs to be multiplied by 1.04645, the anxiety score by 1.02284, and stress by 0.98617:

correction_factor <- c(1.04645, 1.02284, 0.98617)
corrected_score <- scaled_score * correction_factor
corrected_score
## depression    anxiety     stress 
##  0.5979714  0.1461200  0.3756838

What this has done is multiply the first element of scaled_score by the first element of correction_factor, multiply the second element of scaled_score by the second element of correction_factor, and so on.
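If you want to convince yourself that this is really what happens, you can build the same result by hand. This is only a sketch with two made-up vectors, a and b, chosen so the arithmetic is easy to check in your head:

```r
a <- c(2, 3, 4)          # hypothetical values
b <- c(10, 100, 1000)    # hypothetical multipliers

a * b                                        # element by element: 20 300 4000
c(a[1] * b[1], a[2] * b[2], a[3] * b[3])     # the same thing, spelled out
```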

I’ll talk more about calculations involving vectors later, because they come up a lot. In particular R has a thing called the recycling rule that is worth knowing about.3 But that’s enough detail for now.

6.9 Logical vectors

I mentioned earlier that we can define vectors of logical values in the same way that we can store vectors of numbers and vectors of text, again using the c function to combine multiple values. Logical vectors can be useful as data in their own right, but the thing that they’re especially useful for is extracting elements of another vector, which is referred to as logical indexing.

Here’s a simple example. Suppose I decide that the stress scale is not very useful for my study, and I only want to keep the first two elements, depression and anxiety. One way to do this is to define a logical vector that indicates which values to keep:

keep <- c(TRUE, TRUE, FALSE) 

In this instance the keep vector indicates that it is TRUE that I want to retain the first two elements, and FALSE that I want to keep the third. So if I type this

corrected_score[keep]
## depression    anxiety 
##  0.5979714  0.1461200

R prints out the corrected scores for the two variables only. As usual, note that this hasn’t changed the original variable. If I print out the original vector…

corrected_score
## depression    anxiety     stress 
##  0.5979714  0.1461200  0.3756838

… all three values are still there. If I do want to create a new variable, I need to explicitly assign the results of my previous command to a variable.

Let’s suppose that I want to call the new variable short_score, indicating that I’ve only retained some of the scales. Here’s how I do that:

short_score <- corrected_score[keep]
short_score
## depression    anxiety 
##  0.5979714  0.1461200

6.10 Comment

At this point, I hope you can see why logical indexing is such a useful thing. It’s a very basic, yet very powerful way to manipulate data. For instance, I might want to extract the scores of the adult participants in a study, which would probably involve a command like scores[age > 18]. The operation age > 18 would return a vector of TRUE and FALSE values, and so the full command scores[age > 18] would return only the scores for participants with age > 18. It does take practice to become completely comfortable using logical indexing, so it’s a good idea to play around with these sorts of commands. Practice makes perfect, and it’s only by practicing logical indexing that you’ll perfect the art of yelling frustrated insults at your computer.
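Here’s a minimal sketch of that idea. The ages and scores are made up purely for illustration, and is_adult is just a name I’ve picked for the intermediate logical vector:

```r
age    <- c(25, 16, 41, 18, 12)   # made-up ages for five participants
scores <- c(10, 14, 7, 9, 21)     # made-up scores, in the same order

is_adult <- age > 18   # TRUE FALSE TRUE FALSE FALSE
is_adult

scores[is_adult]       # scores for the TRUE positions only: 10 7
scores[age > 18]       # same result, without the intermediate variable
```

Notice that the 18 year old is left out, because > is a strict comparison. If you want to include them, use >= instead.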

6.11 Exercises

  • Use the combine function c to create a numeric vector called age that lists the ages of four people (e.g., 19, 34, 7 and 67)
  • Use the square brackets [] to print out the age of the second person.
  • Use the square brackets [] to print out the ages of the second and third persons
  • Use the combine function c to create a character vector called gender that lists the gender of those four people
  • Create a logical vector adult that indicates whether each participant was 18 or older. Instead of using c, try using a logical operator like > or >= to automatically create adult from age
  • Test your logical indexing skills. Print out the gender of all the adult participants.

The solutions for these exercises are here

  1. Notice that I didn’t specify any argument names here. The c function is one of those cases where we don’t use names. We just type all the values, and R just dumps them all in a single variable.↩︎

  2. Note that the square brackets here are used to index the elements of the vector, and that this is the same notation that we see in the R output. That’s not accidental: when R prints [1] "depression" to the screen what it’s saying is that "depression" is the first element of the output. When the output is long enough, you’ll often see other numbers at the start of each line of the output.↩︎

  3. The recycling rule: if two vectors are of unequal length, the values of the shorter one will be “recycled”. To get a feel for how this works, try setting x <- c(1,1,1,1,1) and y <- c(2,7) and then getting R to evaluate x + y↩︎
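For reference, here’s a sketch of what that little experiment looks like when you run it:

```r
x <- c(1, 1, 1, 1, 1)
y <- c(2, 7)

# y is shorter, so R reuses it as 2, 7, 2, 7, 2 to match the length of x.
# R also prints a warning here, because 5 is not a multiple of 2.
x + y   # 3 8 3 8 3
```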