I confess I’ve never found a nice home for this chapter. Understanding how to interact with the file system on your computer is something that no-one finds interesting. It’s not complicated, or profound, but it is fiddly and annoying. Since we’ve finished talking about the introduction to programming, and we’re about to start the next section on working with data, I might as well put it here.
Prepare to be bored.
Once upon a time everyone who used computers could safely be assumed to understand how the file system worked, because it was impossible to successfully use a computer if you didn’t! However, modern operating systems are much more user friendly, and as a consequence of this they go to great lengths to hide the file system from users. So these days it’s not at all uncommon for people to have used computers most of their life and not be familiar with the way that computers organise files. If you already know this stuff, skip straight to the next section. Otherwise, read on. I’ll try to give a brief introduction that will be useful for those of you who have never been forced to learn how to navigate around a computer using a DOS or Unix shell.
In this section I describe the basic idea behind file locations and file paths. Regardless of whether you’re using Windows, Mac OS or Linux, every file on the computer is assigned a (fairly) human readable address, and every address has the same basic structure: it describes a path that starts from a root location, passes through a series of folders (or if you’re an old-school computer user, directories), and finally ends up at the file.
On a Windows computer the root is the physical drive (well, partition technically) on which the file is stored, and for most home computers the name of the hard drive that stores all your files is C: and therefore most file names on Windows begin with C:. After that come the folders, and on Windows the folder names are separated by a \ symbol. So, the complete path to the Learning Statistics with R book on my Windows computer might be something like this:
C:\Users\dan\Rbook\LSR.pdf
and what that means is that the book is called LSR.pdf, and it’s in a folder called Rbook which itself is in a folder called dan which itself is … well, you get the idea. On Linux, Unix and Mac OS systems, the addresses look a little different, but they’re more or less identical in spirit. Instead of using the backslash, folders are separated using a forward slash, and unlike Windows, they don’t treat the physical drive as being the root of the file system. So, the path to the LSR book on my Mac might be something like this:
/Users/dan/Rbook/LSR.pdf
So that’s what we mean by the “path” to a file.
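To make the structure of a path concrete, here is a minimal sketch in R that pulls the example Mac path apart. The variable names `book_path` and `steps` are just illustrative; `basename()`, `dirname()` and `strsplit()` are base R functions.

```r
# A Unix-style path, written with forward slashes
book_path <- "/Users/dan/Rbook/LSR.pdf"

basename(book_path)   # the file at the end of the path: "LSR.pdf"
dirname(book_path)    # the folder that contains it: "/Users/dan/Rbook"

# Splitting on the separator shows each step along the way from the root.
# The first element is an empty string because the path begins with the root "/".
steps <- strsplit(book_path, "/", fixed = TRUE)[[1]]
steps
```

Reading `steps` from left to right retraces the route described above: start at the root, walk through each folder in turn, and end at the file.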
The next concept to grasp is the idea of a working directory and how to change it. For those of you who have used command line interfaces previously, this should be obvious already. But if not, here’s what I mean. The working directory is just “whatever folder I’m currently looking at”. Suppose that I’m currently looking for files in Explorer (if you’re using Windows) or using Finder (on a Mac). The folder I currently have open is my user directory (i.e., C:\Users\dan or /Users/dan). That’s my current working directory.
The fact that we can imagine that the program is “in” a particular directory means that we can talk about moving from our current location to a new one. What that means is that we might want to specify a new location in relation to our current location. To do so, we need to introduce two new conventions. Regardless of what operating system you’re using, we use . to refer to the current working directory, and .. to refer to the directory above it (the parent directory). This allows us to specify a path to a new location in relation to our current location, as the following examples illustrate. Let’s assume that I’m using my Windows computer, and my working directory is C:\Users\dan\Rbook. The table below shows several addresses in relation to my current one:
| absolute path (i.e., from root) | relative path (i.e., from C:\Users\dan\Rbook) |
|---|---|
| C:\Users\dan | .. |
| C:\Users | ..\.. |
| C:\Users\dan\Rbook\source | .\source |
| C:\Users\dan\nerdstuff | ..\nerdstuff |
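The same relationships can be checked from inside R. The sketch below is only an illustration: it builds a throwaway copy of the folder layout inside R's temporary directory (the folder names mirror the table, but nothing here touches a real user directory), pretends that Rbook is the working directory, and asks `normalizePath()` to turn each relative path into an absolute one.

```r
# Build a small throwaway folder tree: dan/Rbook/source and dan/nerdstuff
root <- file.path(tempdir(), "dan")
dir.create(file.path(root, "Rbook", "source"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "nerdstuff"), showWarnings = FALSE)

# Make Rbook the working directory, remembering where we started
old_wd <- setwd(file.path(root, "Rbook"))

here_abs    <- normalizePath(".")             # the working directory itself
parent_abs  <- normalizePath("..")            # one level up: the dan folder
source_abs  <- normalizePath("./source")      # a subfolder of the working directory
sibling_abs <- normalizePath("../nerdstuff")  # up one level, then down into nerdstuff

# Go back to the original working directory
setwd(old_wd)

# Show the last folder name of each resolved path
basename(c(here_abs, parent_abs, source_abs, sibling_abs))
```

The exact absolute paths depend on where your system keeps temporary files, so the sketch prints only the final folder name of each one.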
There’s one last thing I want to call attention to: the ~ directory. I normally wouldn’t bother, but R makes reference to this concept sometimes. It’s quite common on computers that have multiple users to define ~ to be the user’s home directory. On my Mac, for instance, the home directory ~ for the “dan” user is /Users/dan/.¹ And so, not surprisingly, it is possible to define other directories in terms of their relationship to the home directory. For example, an alternative way to describe the location of the LSR.pdf file on my Mac would be
~/Rbook/LSR.pdf
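If you are curious what ~ stands for on your own machine, base R's `path.expand()` will replace a leading tilde with the home directory. This is a small sketch; the variable names are illustrative, and the result differs from one computer (and one user) to the next, so no output is shown.

```r
# Expand the tilde on its own to see the home directory R is using
home_dir <- path.expand("~")
home_dir

# Expand a path written relative to the home directory.
# The file does not need to exist: only the text of the path is rewritten.
book_path <- path.expand("~/Rbook/LSR.pdf")
book_path
```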
That’s about all you really need to know about file paths. And since this section already feels too long, it’s time to look at how to navigate the file system in R.
Let’s suppose I’m on Windows. As before, I can find out what my current working directory is like this:
getwd()
[1] "C:/Users/dan"
This seems about right, but you might be wondering why R is displaying a Windows path using the wrong type of slash. The answer is slightly complicated, and has to do with the fact that R treats the \ character as “special” (I’ll talk about this later when introducing text manipulation). If you’re deeply wedded to the idea of specifying a path using the Windows style slashes, then you need to type \\ whenever you mean \. In other words, if you want to specify the working directory on a Windows computer, you need to use one of the following commands:
setwd( "C:/Users/dan" )
setwd( "C:\\Users\\dan" )
Annoying.
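To see that the doubled backslash really is a single character as far as R is concerned, here is a short sketch. The variable names are illustrative, and nothing in it changes your working directory.

```r
# Typed with doubled backslashes, but stored with single ones
win_path <- "C:\\Users\\dan"

nchar(win_path)       # 12 characters: each \\ counts as one backslash
cat(win_path, "\n")   # cat() shows the text as stored: C:\Users\dan

# Swap every backslash for a forward slash
# (fixed = TRUE treats "\\" as a literal backslash, not a regular expression)
fwd_path <- gsub("\\", "/", win_path, fixed = TRUE)
fwd_path              # "C:/Users/dan"
```

Either form can be handed to `setwd()` on a Windows machine; the forward slash version just saves you some typing.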
Okay, you might be asking, what if I’m writing code and I don’t know what machine it will be running on? How do I specify a path that doesn’t require me to know ahead of time what the operating system on that machine uses? I’m so glad you asked. There’s a function called file.path that lets you do exactly that:
file.path("Users","dan","Rbook","LSR.pdf")
## [1] "Users/dan/Rbook/LSR.pdf"
The file.path function works out how to construct the path it needs by inspecting the .Platform variable (try typing that at the console if you want to see what information it stores) that the local R system uses to keep track of information about the operating system. If you use file.path to specify locations, then you don’t have to worry about particulars of the operating system because R will do that for you.
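Here is a slightly longer sketch of `file.path()` in action. The folder and file names are made up for illustration (notes.txt in particular is hypothetical); the functions are all base R.

```r
# The separator that file.path() uses by default
.Platform$file.sep

# Glue the pieces of a path together
book <- file.path("Users", "dan", "Rbook", "LSR.pdf")
book

# file.path() is vectorised, so one call can build several paths at once
files <- file.path("Users", "dan", "Rbook", c("LSR.pdf", "notes.txt"))
files

# And the pieces can be pulled back out again
dirname(book)    # "Users/dan/Rbook"
basename(book)   # "LSR.pdf"
```

Building paths from their pieces like this means the separator never has to be typed by hand, which is where most of the cross-platform mistakes tend to creep in.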
Now, what you might be thinking is that this only half solves the problem. What if a user downloads all your files to some place on their machine and you don’t know where they have ended up? It’s a bit beyond the scope of this resource to talk about solutions to that problem, but I’ll quickly mention the here package for R, which I find really useful. I wrote a blog post about it, and at some point I’ll probably fold some of that content into these notes.
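As a rough sketch of the idea, and not a substitute for the package documentation: the `here()` function from the here package builds a path starting from what it takes to be the top folder of your project, rather than from whatever the working directory happens to be. The folder name data and the file name mydata.csv below are hypothetical, and the fallback branch is my own assumption about what you might do if the package isn't installed.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # Path built from the project's top-level folder, as detected by here
  data_path <- here::here("data", "mydata.csv")
} else {
  # Fallback when here is not installed: build from the working directory
  data_path <- file.path(getwd(), "data", "mydata.csv")
}
data_path
```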
¹ You might notice that my computer is the only person still allowed to deadname me 😀 – the user home directory seems to be tangled with so many things on a computer that I’m afraid to rename this. Not that it bothers me - I think “Dan” is a perfectly sensible nickname for “Danielle” and I’m pretty sure my computer isn’t trying to be mean to me! Well, not about this anyway.