Week 2

Revision from last week

In week 1 we talked about getting started in R, the role it can play for psychology, and made our first attempt to learn how to use the language. We went through these slides and sections, and the “homework” exercise was to try to make sure we all have a basic understanding of these sections:

So the place we’ll start this week is with “revision”. In my experience that’s a terrible name for something important: it isn’t just about revisiting things you already know, it’s also a chance to talk about the stuff that didn’t make sense the first time. Here are a few exercises that I’d like you to try:

Exercise 1

Write a script that does the following

Calculates the number of seconds in a year, stores it in a variable, and prints it
Calculates the approximate number of hours since 0AD, stores it in a variable, and prints it
Which number is bigger? Use logical operations to test this and print the answer on screen
Add some comments to your script to make it easier to read

Save your script to a file like week2_ex1.R
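
If you get stuck, here is one possible sketch of this exercise. It rests on a few assumptions of mine: a year is 365 days (no leap years), “since 0AD” just means the current year multiplied out, and the current year is taken from your computer’s clock. The variable names are only suggestions.

```r
# Seconds in a (non-leap) year: days * hours * minutes * seconds
seconds_per_year <- 365 * 24 * 60 * 60
print(seconds_per_year)

# Approximate hours since 0AD, assuming every year has 365 days
# and using the computer's clock to find the current year
current_year <- as.numeric(format(Sys.Date(), "%Y"))
hours_since_0ad <- current_year * 365 * 24
print(hours_since_0ad)

# Logical test: is the first number bigger than the second?
seconds_is_bigger <- seconds_per_year > hours_since_0ad
print(seconds_is_bigger)
```

The last line prints a single TRUE or FALSE value, which is all “use logical operations” really asks for.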

Exercise 2

Write a new script called week2_ex2.R (or whatever) that does the following

Stores the names of your family members as a character vector called names
Stores the ages of the family members as a numeric vector called ages
Uses logical indexing to print your age, e.g. ages[names == "dani"]
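
To see what logical indexing is doing, it can help to look at the intermediate step. Here is a small sketch using made-up names and ages (the same ones that appear later in these notes); swap in your own family. The variables is_magrat and magrat_age are just illustrative names.

```r
names <- c("Granny", "Nanny", "Magrat")
ages <- c(70, 70, 30)

# The comparison produces one TRUE/FALSE value per element of names
is_magrat <- names == "Magrat"
print(is_magrat)

# Using that logical vector as an index keeps only the TRUE positions
magrat_age <- ages[is_magrat]
print(magrat_age)
```

Writing ages[names == "Magrat"] does both steps in one go.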

Exercise 3

Make sure that the TurtleGraphics package is installed on your machine, by typing library(TurtleGraphics) at the console. If it works, great! Move on to Exercise 4. If it does not, here are the commands we need

install.packages("devtools")
library(devtools)
install_github("djnavarro/TurtleGraphics")
library(TurtleGraphics)

Exercise 4

Create a new script called week2_turtle.R. It should do this:

line 1: load the TurtleGraphics package
line 2: initialise the turtle using turtle_init()
line 3: move the turtle forward a distance of 5 units

A question you should consider: why did I ask you to include line 1, given that you’ve already done exercise 3???

Programming concepts

The main goal this week is to cover some key R concepts and programming ideas. We’ll go through these sections in turn, each of which ends with some exercises.

At the end of this, we reach the point that the turtle draws a pretty picture. Your main exercise here is to try to modify what picture it draws!

Getting started with data!

I’m not sure how this will all work out time-wise, but if we do have time we’ll follow this up by getting started on the “working with data” section of the notes. We’ll start at the prelude, and talk briefly about some of the data types.

Here are some additional exercises:

library(tidyverse)

Reading a data frame from an online CSV file. We’re taking a tidyverse approach so strictly speaking we have a tibble rather than a pure data frame!

books <- read_csv(file = "http://psyr.djnavarro.net/data/booksales.csv")

## 
## ── Column specification ────────────────────────────────────────
## cols(
##   Month = col_character(),
##   Days = col_double(),
##   Sales = col_double(),
##   Stock.Levels = col_character()
## )

class(books)

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

head(books)

## # A tibble: 6 x 4
##   Month     Days Sales Stock.Levels
##   <chr>    <dbl> <dbl> <chr>       
## 1 January     31     0 high        
## 2 February    28   100 high        
## 3 March       31   200 low         
## 4 April       30    50 out         
## 5 May         31     0 out         
## 6 June        30     0 high

“Inside” a data frame are just regular vectors:

print(books$Sales)

##  [1]   0 100 200  50   0   0   0   0   0   0   0   0

To see how data frames are just regular vectors bound together, create one:

names <- c("Granny","Nanny","Magrat")
ages <- c(70, 70, 30)

family <- tibble(names, ages)
print(family)

## # A tibble: 3 x 2
##   names   ages
##   <chr>  <dbl>
## 1 Granny    70
## 2 Nanny     70
## 3 Magrat    30
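
Because each column is an ordinary vector, the indexing tricks from the revision exercises carry over. A minimal sketch, using a base R data frame rather than a tibble so that it runs without loading any packages (the $ extraction works the same way on the tibble above); over_50 is just an illustrative name:

```r
names <- c("Granny", "Nanny", "Magrat")
ages <- c(70, 70, 30)

# Bind the two vectors together as columns of a data frame
family <- data.frame(names, ages, stringsAsFactors = FALSE)

# Pull a column back out with $; it is a plain numeric vector again
print(family$ages)
print(mean(family$ages))

# Logical indexing works on the extracted columns too
over_50 <- family$names[family$ages > 50]
print(over_50)
```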

Extension: Using these skills to “check” data sets

A data manipulation exercise: we have data from multiple people, but some of the files might be missing cases! First read one data set to take a look:

subj1 <- read_csv(file = "http://psyr.djnavarro.net/data/subj1.csv")

## 
## ── Column specification ────────────────────────────────────────
## cols(
##   response = col_double(),
##   word = col_character()
## )

print(subj1)

## # A tibble: 10 x 2
##    response word 
##       <dbl> <chr>
##  1        1 blah 
##  2        2 blah 
##  3        3 blah 
##  4        4 blah 
##  5        5 blah 
##  6        6 blah 
##  7        7 blah 
##  8        8 blah 
##  9        9 blah 
## 10       10 blah

Next, define a function to “check” if a data frame has the correct number of cases

check_file <- function(dataset) {
  n_cases <- dim(dataset)[1] # number of cases in the data frame
  is_okay <- n_cases == 10 # file is okay if it has 10 observations
  return(is_okay)
}
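
Before pointing the function at real files, it can be worth trying it on small made-up data frames where you already know the answer. A sketch, with good_data and bad_data invented purely for illustration:

```r
check_file <- function(dataset) {
  n_cases <- dim(dataset)[1] # number of cases in the data frame
  is_okay <- n_cases == 10   # file is okay if it has 10 observations
  return(is_okay)
}

# Made-up data: one complete data set and one that is missing a case
good_data <- data.frame(response = 1:10, word = rep("blah", 10))
bad_data <- good_data[-10, ] # drop the last row, leaving 9 cases

print(check_file(good_data))
print(check_file(bad_data))
```

If the function gives the wrong answer here, it is much easier to debug than when it is buried inside a loop.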

Create a vector listing the files we want to check

file_list <- c(
  "http://psyr.djnavarro.net/data/subj1.csv",
  "http://psyr.djnavarro.net/data/subj2.csv",
  "http://psyr.djnavarro.net/data/subj3.csv",
  "http://psyr.djnavarro.net/data/subj4.csv"
)

Write a loop that checks the files one at a time

for(file in file_list) {
  dat <- read_csv(file)
  is_okay <- check_file(dat)
  if( !is_okay ) {
    print(file)
  }
}

## [1] "http://psyr.djnavarro.net/data/subj3.csv"

Make sure all three code fragments are in a single file and run it!

Further extension

Eek, what if there are 400 files! (There actually are!) I refuse to type all of that into a long list, so instead we can use text manipulation to build the file names. The code below builds the first 20:

file_list <- paste0("http://psyr.djnavarro.net/data/subj", 1:20, ".csv")

for(file in file_list) {
  dat <- read_csv(file)
  is_okay <- check_file(dat)
  if( !is_okay ) {
    print(file)
  } else {
    print("ok")
  }
}
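
If you want to experiment with this pattern without downloading anything, here is a sketch that fakes the situation locally. Everything about the setup is an assumption made for illustration: it writes five small CSV files into a temporary folder (deliberately leaving file 3 one case short), uses base R read.csv() rather than read_csv() so no packages are needed, and collects the problem files in a vector called bad_files instead of printing them as it goes.

```r
check_file <- function(dataset) {
  n_cases <- dim(dataset)[1] # number of cases in the data frame
  is_okay <- n_cases == 10   # file is okay if it has 10 observations
  return(is_okay)
}

# Set up: write five fake subject files to a temporary folder
data_dir <- tempdir()
for (i in 1:5) {
  n_rows <- if (i == 3) 9 else 10 # subject 3 is missing a case
  fake <- data.frame(response = 1:n_rows, word = rep("blah", n_rows))
  write.csv(fake, file.path(data_dir, paste0("subj", i, ".csv")), row.names = FALSE)
}

# paste0() is vectorised, so one call builds all five file names
file_list <- file.path(data_dir, paste0("subj", 1:5, ".csv"))

# Loop over the files, remembering any that fail the check
bad_files <- character(0)
for (file in file_list) {
  dat <- read.csv(file)
  if (!check_file(dat)) {
    bad_files <- c(bad_files, file)
  }
}

print(basename(bad_files))
```

Collecting the failures in a vector means you end up with something you can reuse later (for example, to re-download just those files), which printing alone does not give you.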