1 Introduction

This post is about how to set up an adequate project environment. By this I mean the folders you should create and how you should save your files. The structure introduced here will help you keep your project organized and keep an overview of your work, and it will also make it easier to share your project with others.

Whenever you start a new programming project, you should set up the infrastructure described below. Such a project could be a term paper, a research endeavor, or just the code to create some visualizations. Later you might find that some aspects of this infrastructure feel like overkill, especially for very small undertakings. But in the beginning it's better to be safe than sorry and to set up the whole project as described below.

In all, setting up a good working environment includes the following steps:

  1. Find a good place for the project on your computer.
  2. Create a directory with an R project.
  3. Create the relevant sub-directories.

After that, you should familiarize yourself with how to use the here package in your project.
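As a rough preview of step 3, sub-directories can also be created from the R console with base R's dir.create(). The project location and the folder names used here (data, R, output) are placeholders chosen for illustration, not the structure prescribed by this post, and the sketch writes into a temporary directory so that it does not touch your real files:

```r
# Hypothetical project location and folder names; adapt them to your setup.
# tempdir() is used only so that this sketch does not touch real files.
project_dir <- file.path(tempdir(), "my-project")
sub_dirs <- c("data", "R", "output")

# Create each sub-directory; recursive = TRUE also creates project_dir itself.
for (d in sub_dirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# List what has been created.
print(list.files(project_dir))
```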

There are some additional steps one might want to take, such as initiating a Git repository or setting up a renv environment. Moreover, for larger projects you might also want to add a README.md. But for now the steps mentioned above are sufficient. Before going through them one by one, we need to clarify two important technical concepts:

  1. the concept of a working directory and
  2. the distinction between absolute and relative paths

2 Preface: Working directory and paths

The working directory is a folder on the computer which R uses as a default anchor for all file paths used to access input, such as data sets, or to store output. The default working directory is the user directory, but it can be changed. We can display the current working directory using the getwd() function. In my case the working directory looks like this:

/Users/graebnerc/Teaching/DataScience22/
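As a minimal sketch of the above, the working directory can be queried with getwd() and changed with setwd(), both from base R. The temporary directory is used here only as a harmless target for the demonstration, and the paths printed will differ on your machine:

```r
# Show the current working directory.
wd <- getwd()
print(wd)

# Switch to another directory. setwd() invisibly returns the previous
# working directory, which makes it easy to switch back later.
old_wd <- setwd(tempdir())
print(getwd())

# Restore the original working directory.
setwd(old_wd)
print(getwd())
```

In everyday work you will rarely need setwd(); it appears here only to show that the anchor is not fixed.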

Now assume we produced a plot in our current session and want to save it using the function ggplot2::ggsave(). As we already learned, this function takes, among others, an argument filename that specifies the name of the file that is meant to contain the plot. Now if I were to tell R to save the plot under the name test.pdf like this:

ggplot2::ggsave(filename = "test.pdf")

R would save it in the following location:

/Users/graebnerc/Teaching/DataScience22/test.pdf

As you can see, R uses the current working directory as an ‘anchor’, and all paths provided are relative to this anchor. This means that, assuming a folder called output exists in our working directory, we could save our file test.pdf in this folder with the following function call:

ggplot2::ggsave(filename = "output/test.pdf")

Viewed from a global perspective, the file is saved here:

/Users/graebnerc/Teaching/DataScience22/output/test.pdf

Since the path provided is relative to the working directory, we refer to paths such as those passed to ggsave() above as relative paths.
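To make the role of the anchor explicit, the following sketch joins a relative path with the current working directory using base R's file.path(). The variable names are chosen for illustration, and the printed result depends on your own working directory:

```r
# A relative path, as it would be passed to ggsave().
rel_path <- "output/test.pdf"

# Joining it with the working directory gives the full location
# that the relative path points to.
full_path <- file.path(getwd(), rel_path)
print(full_path)
```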

Alternatively, we could provide R directly with the absolute path. In this case, we need to type the complete path, starting from the root directory of the computer. Rather than relying on R to complete the path implicitly as above, we would make the following call:

ggplot2::ggsave(
  filename = "/Users/graebnerc/Teaching/DataScience22/output/test.pdf"
  )

When we use absolute paths, we can save a file anywhere on the computer. For instance, we can call

ggplot2::ggsave(
  filename = "/Users/graebnerc/GreatPlots/test.pdf"
  )

to save the file here:

/Users/graebnerc/GreatPlots/test.pdf

While absolute paths may seem attractive because of their expressive power, i.e. the possibility to save files anywhere we want, I can only advise against using them. Absolute paths are something you might use in your console when you want to save a file quickly during a private programming session. But you should never use absolute paths in scripts.

A central argument in favor of relative paths is that code using relative paths can function when executed on different computers. Absolute paths look different on every computer, so code that relies on them will typically produce errors when moved to another computer. Have a look at the following path from above:

/Users/graebnerc/Teaching/DataScience22/output/test.pdf

I hope you agree that it is highly unlikely that a path involving my account name exists on your computer. If I sent you a script that contains a reference to this path, it would produce an error once you execute it. For this reason, we will always use relative paths below.
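You can check this yourself with base R's dir.exists(). This is only a small illustration: the directory is the one from the example above, and on virtually any computer other than mine the result should be FALSE:

```r
# The directory from the example above; it exists on the author's computer only.
abs_dir <- "/Users/graebnerc/Teaching/DataScience22/output"

# dir.exists() returns TRUE or FALSE without raising an error,
# so it is a safe way to test whether a location is available.
exists_here <- dir.exists(abs_dir)
print(exists_here)
```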

Of course, one problem is that the ‘anchor’ from which the relative path will be evaluated on my computer and on yours must somehow be harmonized. As we will learn below, this can be achieved through the use of R project files and the package here.
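As a first impression of what this looks like in practice, here is a minimal sketch. It assumes that the here package is installed and that the project contains a folder called output; the function here::here() builds a path that starts at the detected project root (for example, the folder containing the R project file) rather than at whatever the working directory happens to be:

```r
# Assumption: the here package is installed, e.g. via install.packages("here").
if (requireNamespace("here", quietly = TRUE)) {
  # Build the path to output/test.pdf, starting from the project root.
  plot_path <- here::here("output", "test.pdf")
  print(plot_path)

  # The result could then be passed on, for example:
  # ggplot2::ggsave(filename = plot_path)
}
```

The path components are passed as separate arguments, so the same call should work regardless of the operating system's path separator.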

3 Step 1: Find a good place for the project on your computer

First of all, you have to decide on a place on your computer in which all files related to your project, i.e. data, scripts, images, etc., should be saved. It is usually a good idea to avoid places such as the Desktop or your Downloads folder.

4 Step 2: Create an R-project

After having identified the right place for our project on our computer, we will now create an R-project at exactly this place. To this end, open RStudio, and either click on File/New Project, or on the blue button in the upper left part of the pane, directly to the right of the New File button. You should now see the ‘New Project Wizard’:

We click on New Directory and then on New Project. Then we should see the following: