- Vectors

When I introduced variables earlier, I showed you how we can use them to store a single number. In this section, we’ll extend this idea and look at how to store multiple numbers within the one variable. In R the name for a variable that can store multiple values is a **vector**. So let’s create one.

Let’s return to the example we were working with in the previous section on variables. We’re designing a survey, and we want to keep track of the responses that a participant has given. This time, let’s imagine that we’ve finished running the survey and we’re examining the data. Suppose we’ve administered the Depression, Anxiety and Stress Scale (DASS) and as a consequence every participant has scores on the *depression*, *anxiety* and *stress* scales provided by the DASS. One thing we might want to do is create a single variable called `scale_name` that identifies the three scales. The simplest way to do this in R is to use the *combine* function, `c`.^{1} To do so, all we have to do is type the values we want to store in a comma separated list, like this:

```
scale_name <- c("depression","anxpiety","stress")
scale_name
```

`## [1] "depression" "anxpiety" "stress"`

To use the correct terminology, we have a single variable here called `scale_name`: this variable is a **vector** that has three **elements**. Because the vector contains text, it is a character vector. You can use the `length` function to check the length, and the `class` function to check what kind of vector it is:

`length(scale_name)`

`## [1] 3`

`class(scale_name)`

`## [1] "character"`

As you might expect, we can define numeric or logical variables in the same way. For instance, we could define the raw scores on the three DASS scales like so:

```
raw_score <- c(12, 3, 8)
raw_score
```

`## [1] 12 3 8`

We’ll talk about logical vectors in a moment.
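The same two checks work on a numeric vector. A minimal sketch, recreating `raw_score` so that the snippet runs on its own (the names `n_scores` and `score_type` are just illustrative):

```r
# Recreate the numeric vector from the text
raw_score <- c(12, 3, 8)

n_scores <- length(raw_score)    # how many elements the vector has: 3
score_type <- class(raw_score)   # what kind of vector it is: "numeric"

n_scores
score_type
```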

If I want to extract the first element from the vector, all I have to do is refer to the relevant numerical index, using square brackets to do so. For example, to get the first element of `scale_name` I would type this

`scale_name[1]`

`## [1] "depression"`

The second element of the vector is

`scale_name[2]`

`## [1] "anxpiety"`

You get the idea.^{2}

There are a few ways to extract multiple elements of a vector. The first way is to specify a vector that contains the indices of the variables that you want to keep. To extract the first two scale names:

`scale_name[c(1,2)]`

`## [1] "depression" "anxpiety"`

Alternatively, R provides a convenient shorthand notation in which `1:2` is a vector containing the numbers from 1 to 2, and similarly `1:10` is a vector containing the numbers from 1 to 10. So this does the same thing:

`scale_name[1:2]`

`## [1] "depression" "anxpiety"`
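The colon shorthand is not limited to ranges that start at 1. As a small sketch (the names `one_to_ten` and `last_two` are made up for illustration, and `scale_name` is recreated with its typo so the snippet runs on its own):

```r
# Same vector as in the text at this point (typo included)
scale_name <- c("depression", "anxpiety", "stress")

one_to_ten <- 1:10            # the whole numbers from 1 to 10
length(one_to_ten)            # 10 elements

last_two <- scale_name[2:3]   # elements 2 and 3
last_two
```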

Notice that order matters here. So if I do this

`scale_name[c(2,1)]`

`## [1] "anxpiety" "depression"`

I get the same elements, but in the reverse order.

Finally, when working with vectors, R allows us to use negative numbers to indicate which elements to remove. So this is yet another way of doing the same thing:

`scale_name[-3]`

`## [1] "depression" "anxpiety"`

Notice that none of this has changed the original variable. The `scale_name` vector itself has remained completely untouched.

`scale_name`

`## [1] "depression" "anxpiety" "stress"`
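Negative indices can also be wrapped in `c` to drop several elements at once. A minimal sketch using the same vector (the names `without_first` and `middle_only` are just illustrative):

```r
# Same vector as in the text at this point (typo included)
scale_name <- c("depression", "anxpiety", "stress")

without_first <- scale_name[-1]         # drop element 1, keep elements 2 and 3
middle_only <- scale_name[-c(1, 3)]     # drop elements 1 and 3, keep element 2

without_first
middle_only
length(scale_name)                      # still 3: the original is unchanged
```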

Sometimes you’ll want to change the values stored in a vector. Imagine my surprise when a student points out that `"anxpiety"` is not in fact a real thing. I should probably fix that 😳. One possibility would be to assign the whole vector again from the beginning, using `c`. But that’s a lot of typing. Also, it’s a little wasteful: why should R have to redefine the names for all three scales, when only the second one is wrong? Fortunately, we can tell R to change only the second element, using this trick:

```
scale_name[2] <- "anxiety"
scale_name
```

`## [1] "depression" "anxiety" "stress"`

That’s better.
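The same trick extends to several elements at once: put a vector of indices inside the square brackets and supply one replacement value per index. Here is a sketch using a hypothetical vector, `demo_name`, that has two typos in it:

```r
# Hypothetical vector with typos in the first and third elements
demo_name <- c("depresion", "anxiety", "stres")

# Replace elements 1 and 3 in a single assignment
demo_name[c(1, 3)] <- c("depression", "stress")
demo_name
```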

Another way to edit variables is to use the `edit` function. I won’t go into that here, but if you’re curious, try typing a command like this:

`edit(scale_name)`

One very handy thing in R is that it lets you assign meaningful *names* to the different elements in a vector. For example, the `raw_score` vector that we introduced earlier contains the actual data from a study, but when you print it out on its own

`raw_score`

`## [1] 12 3 8`

it’s not obvious what each of the scores corresponds to. There are several different ways of making this a little more meaningful (and we’ll talk about them later) but for now I want to show one simple trick. Ideally, what we’d like to do is have R remember that the first element of the `raw_score` vector is the “depression” score, the second is “anxiety” and the third is “stress”. We can do that like this:

`names(raw_score) <- scale_name`

This is a bit of an unusual looking assignment statement. Usually, whenever we use `<-` the thing on the left hand side is the variable itself (i.e., `raw_score`) but this time around the left hand side refers to the names. To see what this command has done, let’s get R to print out the `raw_score` variable now:

`raw_score`

```
## depression anxiety stress
## 12 3 8
```

That’s a little nicer. Element names don’t just look nice, they’re functional too. You can refer to the elements of a vector using their names, like so:

`raw_score["anxiety"]`

```
## anxiety
## 3
```
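Two related things you can do with names: calling `names` without an assignment returns the names themselves, and a character vector inside the square brackets picks out several named elements at once. A small sketch (the names `score_names` and `two_scores` are illustrative, and the vector is recreated so the snippet runs on its own):

```r
# Recreate the named vector from the text
raw_score <- c(12, 3, 8)
names(raw_score) <- c("depression", "anxiety", "stress")

score_names <- names(raw_score)                     # the names, as a character vector
two_scores <- raw_score[c("depression", "stress")]  # two elements selected by name

score_names
two_scores
```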

One really nice thing about vectors is that a lot of R functions and operators will work on the whole vector at once. For instance, suppose I want to normalise the raw scores from the DASS. Each scale of the DASS is constructed from 14 questions that are rated on a 0-3 scale, so the minimum possible score is 0 and the maximum is 42. Suppose I wanted to rescale the raw scores to lie on a scale from 0 to 1. I can create the `scaled_score` variable like this:

```
scaled_score <- raw_score / 42
scaled_score
```

```
## depression anxiety stress
## 0.28571429 0.07142857 0.19047619
```

In other words, when you divide a vector by a single number, all elements in the vector get divided. The same is true for addition, subtraction, multiplication and taking powers. So that’s neat.
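To see the other operators in action, here is a quick sketch that applies each one to the raw scores (the result names are made up for illustration):

```r
# Recreate the raw scores from the text
raw_score <- c(12, 3, 8)

plus_one <- raw_score + 1    # 13  4  9
minus_three <- raw_score - 3 #  9  0  5
doubled <- raw_score * 2     # 24  6 16
squared <- raw_score ^ 2     # 144 9 64

squared
```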

Suppose it later turned out that I’d made a mistake. I hadn’t in fact administered the complete DASS, only the first page. As noted on the DASS website, it’s possible to fix this mistake (sort of). First, I have to recognise that my scores are actually out of 21 not 42, so the calculation I should have done is this:

```
scaled_score <- raw_score / 21
scaled_score
```

```
## depression anxiety stress
## 0.5714286 0.1428571 0.3809524
```

Then, it turns out that page 1 of the full DASS is *almost* the same as the short form of the DASS, but there’s a correction factor you have to apply. The depression score needs to be multiplied by 1.04645, the anxiety score by 1.02284, and stress by 0.98617:

```
correction_factor <- c(1.04645, 1.02284, 0.98617)
corrected_score <- scaled_score * correction_factor
corrected_score
```

```
## depression anxiety stress
## 0.5979714 0.1461200 0.3756838
```

What this has done is multiply the first element of `scaled_score` by the first element of `correction_factor`, multiply the second element of `scaled_score` by the second element of `correction_factor`, and so on.
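If you’d like to convince yourself of this, one way is to do the multiplication one element at a time and compare. This is only a sketch: `by_hand` and `all_at_once` are illustrative names, and the vectors are recreated without element names to keep things short.

```r
# Recreate the two vectors from the text (without element names)
scaled_score <- c(12, 3, 8) / 21
correction_factor <- c(1.04645, 1.02284, 0.98617)

# Multiply each pair of elements separately...
by_hand <- c(
  scaled_score[1] * correction_factor[1],
  scaled_score[2] * correction_factor[2],
  scaled_score[3] * correction_factor[3]
)

# ...and compare with the one-line vector version
all_at_once <- scaled_score * correction_factor
all.equal(by_hand, all_at_once)
```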

I’ll talk more about calculations involving vectors later, because they come up a lot. In particular R has a thing called the *recycling rule* that is worth knowing about.^{3} But that’s enough detail for now.

I mentioned earlier that we can define vectors of logical values in the same way that we can store vectors of numbers and vectors of text, again using the `c` function to combine multiple values. Logical vectors can be useful as data in their own right, but the thing that they’re especially useful for is extracting elements of another vector, which is referred to as *logical indexing*.

Here’s a simple example. Suppose I decide that the stress scale is not very useful for my study, and I only want to keep the first two elements, depression and anxiety. One way to do this is to define a logical vector that indicates which values to `keep`:

```
keep <- c(TRUE, TRUE, FALSE)
keep
```

`## [1] TRUE TRUE FALSE`

In this instance the `keep` vector indicates that it is `TRUE` that I want to retain the first two elements, and `FALSE` that I want to keep the third. So if I type this

`corrected_score[keep]`

```
## depression anxiety
## 0.5979714 0.1461200
```

R prints out the corrected scores for the two variables only. As usual, note that this hasn’t changed the original variable. If I print out the original vector…

`corrected_score`

```
## depression anxiety stress
## 0.5979714 0.1461200 0.3756838
```

… all three values are still there. If I *do* want to create a new variable, I need to explicitly assign the results of my previous command to a variable.

Let’s suppose that I want to call the new variable `short_score`, indicating that I’ve only retained some of the scales. Here’s how I do that:

```
short_score <- corrected_score[keep]
short_score
```

```
## depression anxiety
## 0.5979714 0.1461200
```

At this point, I hope you can see why logical indexing is such a useful thing. It’s a very basic, yet very powerful way to manipulate data. For instance, I might want to extract the scores of the adult participants in a study, which would probably involve a command like `scores[age > 18]`. The operation `age > 18` would return a vector of `TRUE` and `FALSE` values, and so the full command `scores[age > 18]` would return only the `scores` for participants with `age > 18`. It does take practice to become completely comfortable using logical indexing, so it’s a good idea to play around with these sorts of commands. Practice makes perfect, and it’s only by practicing logical indexing that you’ll perfect the art of yelling frustrated insults at your computer.
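To make the `scores[age > 18]` idea concrete, here is a small sketch. The `age` and `scores` values are invented purely for illustration, and `over_18` and `selected` are just convenient names:

```r
# Made-up data for four hypothetical participants
age <- c(25, 16, 42, 18)
scores <- c(10, 4, 15, 8)

over_18 <- age > 18            # one TRUE or FALSE per participant
over_18                        # TRUE FALSE TRUE FALSE

selected <- scores[over_18]    # keeps only the scores where over_18 is TRUE
selected                       # 10 15
```

Notice that the participant who is exactly 18 gets dropped, because `>` is a strict comparison. If that person should count, `age >= 18` is the comparison to use.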

- Use the combine function `c` to create a numeric vector called `age` that lists the ages of four people (e.g., 19, 34, 7 and 67)
- Use the square brackets `[]` to print out the `age` of the second person
- Use the square brackets `[]` to print out the `age` of the second and third persons
- Use the combine function `c` to create a character vector called `gender` that lists the gender of those four people
- Create a logical vector `adult` that indicates whether each participant was 18 or older. Instead of using `c`, try using a logical operator like `>` or `>=` to automatically create `adult` from `age`
- Test your logical indexing skills. Print out the `gender` of all the `adult` participants

The solutions for these exercises are here

1. Notice that I didn’t specify any argument names here. The `c` function is one of those cases where we don’t use names. We just type all the numbers, and R just dumps them all in a single variable.
2. Note that the square brackets here are used to index the elements of the vector, and that this is the same notation that we see in the R output. That’s not accidental: when R prints `[1] "depression"` to the screen what it’s saying is that `"depression"` is the first element of the output. When the output is long enough, you’ll often see other numbers at the start of each line of the output.
3. The recycling rule: if two vectors are of unequal length, the values of the shorter one will be “recycled”. To get a feel for how this works, try setting `x <- c(1,1,1,1,1)` and `y <- c(2,7)` and then getting R to evaluate `x + y`.