We already learned that everything that exists in R is an
*object*. You have most likely already noticed that there are different
types of objects: `2`, for instance, was a number, but `assign`
was a function.^{1} As you might have guessed, there are many
more types of objects. Understanding the fundamental object types in R
is an essential prerequisite for mastering more complicated programming
challenges than those we have encountered so far. This post is therefore
one of several that introduce you to the most important object types
that you will encounter in R.

These data types are summarized in the following figure:

This post will be about functions. Different types of vectors are covered in the upcoming posts.

Functions are algorithms that apply a certain routine to an
*input*, thereby producing (almost always) an *output*.
The function `log()`, for instance, takes a number as input
and returns another number as output, namely the natural logarithm of the
input:

`log(2)`

`## [1] 0.6931472`

**Calling a function**

There are, in principle, four different ways to call a function in R. Only two of them, however, are practically relevant for our purposes.

By far the most important variant is the so-called *prefix
form*. Here you first write down the name of the function. Then you
open a parenthesis, write down all the *arguments* of the function,
separated by commas, and close the parenthesis. In the
following example, the name of the function is `assign`, and
its arguments are `"test"` and `2`:

`assign("test", 2)`

The second relevant way to call a function is the so-called *infix
form*. Here, the name of the function is written *between*
the arguments of the function. This form is less common than the
*prefix form*, but it is frequently used for mathematical operations
such as `+`, `-` or `/`.

Strictly speaking, it is only a shortcut, since every function call using the infix form can also be written as a function call in the prefix form:

`2 + 3`

`## [1] 5`

`` `+`(2, 3) ``

`## [1] 5`

Both function calls are, in the end, equivalent, but in this context the infix form is clearly easier to read.
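To make the equivalence a bit more concrete, here is a small sketch that calls two other operators in both forms. The numbers are chosen only for illustration; the operator name has to be wrapped in backticks when it is used in prefix form.

```r
# Division: infix form and prefix form of the same call
10 / 4
`/`(10, 4)

# Subtraction: infix form and prefix form of the same call
7 - 2
`-`(7, 2)

# identical() checks that both forms return exactly the same value
identical(10 / 4, `/`(10, 4))
```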

**The arguments of a function**

The arguments of a function usually provide the *input* of the
function, and might also specify how the underlying routine should be
executed exactly.

The function `sum()`, for instance, takes as arguments an
arbitrary number of numbers (its ‘input’) and returns the sum of these
numbers:

`sum(1,2,3,4)`

`## [1] 10`

Moreover, `sum()` also accepts an *optional
argument* called `na.rm`. This optional
argument can take the value `TRUE` or `FALSE`. The
letters `na` stand for “not available” and refer to missing
values. If we do not set the optional argument explicitly, it takes its
default value. In this case, the default value is `FALSE`.
You can get this information by calling the help function:
`help(sum)`.

Optional arguments are not input in the classical sense, but they
allow you to control the routine that the function applies. In the
present case this means that if `na.rm` takes the value
`TRUE`, all missing values in the input to `sum()`
will be ignored:

`sum(1,2,3,4,NA) `

`## [1] NA`

`sum(1,2,3,4,NA, na.rm = TRUE) `

`## [1] 10`
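The argument `na.rm` is not specific to `sum()`. As a small sketch, the base R functions `max()` and `prod()` accept it in the same way; the input values are again only an illustration.

```r
# Without na.rm, a single missing value makes the result missing as well
max(1, 2, 3, 4, NA)

# With na.rm = TRUE, the missing value is ignored
max(1, 2, 3, 4, NA, na.rm = TRUE)   # largest of 1, 2, 3, 4
prod(1, 2, 3, 4, NA, na.rm = TRUE)  # product of 1, 2, 3, 4
```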

If you want to know what arguments you can give to a function, you
should call the function `help()` to have a look at the
function documentation. In the case of `sum()` we learn that, in
addition to its input, `sum()` accepts one additional
argument, `na.rm`, which by default takes the value
`FALSE`.

To change optional arguments, we always need to (or better: should)
specify the name of the optional argument. For the standard input this
is not necessary, but sometimes still useful. Information about the
names of the input, as well as the optional arguments, can always be
obtained via the function `help()`.
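The following sketch illustrates the difference between passing arguments by position and by name, using `log()`, whose documentation lists the arguments `x` (the input) and `base` (optional). It also uses `args()`, a base R function that prints the argument names and default values of a function without opening the help page.

```r
# By position: the first argument is the input, the second the base
log(8, 2)

# Naming only the optional argument, as recommended above
log(8, base = 2)

# If all arguments are named, their order no longer matters
log(base = 2, x = 8)

# A quick overview of argument names and default values
args(sum)
```

All three calls to `log()` compute the logarithm of 8 to the base 2.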

**Define your own functions**

Defining our own functions is incredibly useful. We can do this by
using the keyword `function`. To illustrate how, we will now define
a function that we will call `pythagoras` and that takes as
arguments the lengths of the two catheti of a right triangle and
calculates the length of the hypotenuse using the Pythagorean
theorem.

```
pythagoras <- function(cathetus_1, cathetus_2){
  hypo_squared <- cathetus_1**2 + cathetus_2**2
  hypotenuse <- sqrt(hypo_squared) # sqrt() takes square root
  return(hypotenuse)
}
```

We always define a new function by using `function()`. We start
our definition by associating the new
function with a name (here: `pythagoras`) so that we can use it
later.

The arguments to `function()` are the arguments that our
new function should accept. In the present case there are two such
arguments: `cathetus_1` and `cathetus_2`. After
that comes the so-called ‘function body’. It contains all the routines
that the function should execute when called. The function body is
enclosed in curly brackets. In the example above we first compute
the sum of the squares of the two catheti and save this intermediate
result as `hypo_squared`. This is the part of the Pythagorean
theorem that you might know as \(c^2 = a^2 + b^2\). Since we are
interested in the ‘normal’ length of the
hypotenuse, we then use the function `sqrt()` to get the
square root of `hypo_squared`. This is also the value that we
want our function to return. To make this explicit, we use the
keyword `return` to specify the return value of the
function.^{2}

If we now call our function with the correct arguments, the routine above will be executed:

`pythagoras(2, 4)`

`## [1] 4.472136`

Note: we could also call the arguments of our new function explicitly, which can be useful for transparency reasons when you call more complex functions:

`pythagoras(cathetus_1=2, cathetus_2=4) # Also works the other way around`

`## [1] 4.472136`
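Our own functions can also have optional arguments with default values, just like `na.rm` in `sum()`. The following is only a sketch: the function name `pythagoras_rounded` and the argument `digits` are made up for this illustration, and `round()` is the base R function for rounding.

```r
# Hypothetical variant of pythagoras() with an optional argument `digits`
pythagoras_rounded <- function(cathetus_1, cathetus_2, digits = 2){
  hypo_squared <- cathetus_1**2 + cathetus_2**2
  hypotenuse <- sqrt(hypo_squared)
  # round() rounds the result to `digits` decimal places
  return(round(hypotenuse, digits))
}

pythagoras_rounded(2, 4)              # uses the default value, digits = 2
pythagoras_rounded(2, 4, digits = 4)  # sets the optional argument explicitly
```

The default value is written directly behind the argument name in the call to `function()`.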

Note: you can (mostly) see the source code of a function by typing the name of the function without the parentheses:

`pythagoras`

```
## function(cathetus_1, cathetus_2){
## hypo_squared <- cathetus_1**2 + cathetus_2**2
## hypotenuse <- sqrt(hypo_squared) # sqrt() takes square root
## return(hypotenuse)
## }
## <bytecode: 0x7ff756463c78>
```

Note that all object names defined within the function body are lost
after the function has been called. The deeper reason is that functions
have their own environment.
Because of this behavior, R produces an error in the following example,
in which `hypo_squared` exists *only* within the
function body.

```
pythagoras <- function(cathetus_1, cathetus_2){
  hypo_squared <- cathetus_1**2 + cathetus_2**2
  hypotenuse <- sqrt(hypo_squared) # sqrt() takes square root
  return(hypotenuse)
}
x <- pythagoras(2, 4)
hypo_squared
```

`## Error in eval(expr, envir, enclos): object 'hypo_squared' not found`

This behavior is intentional: otherwise, there would
quickly be a lot of associations that are hard to keep track of. It is,
however, of utmost importance to always remember this behavior;
otherwise, very confusing errors might emerge, as in the following
example, in which you might have expected `hypo_squared` to
take the value of the squared length of the hypotenuse of your
right triangle:

```
hypo_squared <- 120
x <- pythagoras(2, 4)
hypo_squared
```

`## [1] 120`
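If you actually need an intermediate result outside of a function, one option is to let a function return it and to assign that return value to a name yourself. A minimal sketch, in which the helper `hypotenuse_squared()` is a made-up name for this illustration:

```r
# Hypothetical helper that returns the intermediate result itself
hypotenuse_squared <- function(cathetus_1, cathetus_2){
  hypo_squared <- cathetus_1**2 + cathetus_2**2
  return(hypo_squared)
}

# The association is now created explicitly in our working environment
hypo_squared <- hypotenuse_squared(2, 4)
hypo_squared
```

`## [1] 20`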

It is always a good idea to **document** your own
functions. This is especially, but not only, the case if you want to
share them with others: if you want to use your function after some
time has passed, you will be extremely grateful to your previous self for
explaining how the function works and what arguments it takes.
Or, in other words, nothing is more frustrating than getting back to
your code after a few weeks and being forced to invest many hours to
decipher the code you have written previously.^{3}

While you might document functions using simple comments at the end of each line, I strongly recommend getting used to the conventions shown below (the `#'` comment style used by the roxygen2 package) right from the start. Documenting our little function from above would look like this:

```
#' Computes the length of the hypotenuse in a right triangle
#'
#' This function takes the lengths of the two catheti of a right triangle as
#' arguments and computes the length of the hypotenuse.
#' @param cathetus_1 The length of the first cathetus
#' @param cathetus_2 The length of the second cathetus
#' @return The length of the hypotenuse of the right triangle as defined by
#'  `cathetus_1` and `cathetus_2`.
pythagoras <- function(cathetus_1, cathetus_2){
  hypo_squared <- cathetus_1**2 + cathetus_2**2
  hypotenuse <- sqrt(hypo_squared) # sqrt() takes square root
  return(hypotenuse)
}
}
```

The documentation of a function must come immediately before the
function definition, and each line of the documentation starts with
`#'`. In the first line you provide a title, which must not
be longer than 80 characters.

Then, after inserting a blank line, you describe what the function
does in a bit more detail. Next, you describe each argument by using the
decorator `@param` at the beginning of the line. Finally,
after another blank line, you describe the output of the function after
starting the line with the decorator `@return`.

Thus, any documentation of a function should at least include the arguments and the kind of output.
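As a further illustration, the same pattern applied to a second function could look like the sketch below. The function `right_triangle_area()` is made up for this example and is not part of the post's running example.

```r
#' Computes the area of a right triangle
#'
#' This function takes the lengths of the two catheti of a right triangle
#' as arguments and computes the area of the triangle.
#' @param cathetus_1 The length of the first cathetus
#' @param cathetus_2 The length of the second cathetus
#' @return The area of the right triangle defined by
#'  `cathetus_1` and `cathetus_2`.
right_triangle_area <- function(cathetus_1, cathetus_2){
  area <- cathetus_1 * cathetus_2 / 2
  return(area)
}

right_triangle_area(3, 4)
```

`## [1] 6`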

**Why you should use functions in the first place**

Defining your own functions is extremely helpful in practice. It is recommended to enclose routines that you use several times into a function. There are several reasons for doing so:

- **Code becomes more concise and transparent.** It is easier to document code that uses functions because of the documentation conventions introduced above. Moreover, the code becomes shorter and easier to read. As a rule of thumb, after pasting and slightly adjusting some of your code twice, consider turning it into a function.
- **Functions help to structure your code.** Functions summarize, on a higher level of abstraction, your idea of how to solve a certain problem. Because you do not want to think about how to do this every time you encounter the same problem, it is better to summarize your thoughts in one place: the function.
- **Functions facilitate debugging.** Imagine you encounter a mistake in your implementation of a routine in your code. If this routine is used ten times in your code and you did not use functions, you must correct your mistake ten times. If you used a function, you have to correct the mistake only once. Functions also reduce the likelihood of mistakes in the first place: they give you an incentive to document the code, and they avoid the incidental mistakes that will always happen if you write the same code many times in different places.

While these three reasons should already suffice to convince you to use functions whenever possible, there are even more reasons to use them. Many of them are related to the fundamental programming principle DRY (“Don’t Repeat Yourself”).
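To see what this means in practice, compare the two variants in the following sketch. The triangle side lengths are arbitrary, and `pythagoras()` is defined here in a slightly more compact form so that the example runs on its own.

```r
# Without a function: the same routine is written out three times,
# so a mistake in the formula would have to be fixed in three places
sqrt(3**2 + 4**2)
sqrt(6**2 + 8**2)
sqrt(5**2 + 12**2)

# With a function: the routine is defined once and then reused,
# so a mistake would have to be fixed in one place only
pythagoras <- function(cathetus_1, cathetus_2){
  hypotenuse <- sqrt(cathetus_1**2 + cathetus_2**2)
  return(hypotenuse)
}

pythagoras(3, 4)
pythagoras(6, 8)
pythagoras(5, 12)
```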

^{1} In fact, we will learn below that `2` is not really a number, but a vector of length 1. Only in a next step does `2` count as a ‘number’ or, more precisely, as a ‘double’.

^{2} Using `return` is, strictly speaking, not necessary, but I always use it for the sake of readability and transparency. An interesting debate about whether you should use `return` or not can be found here.

^{3} Or, as the well-known R developer Hadley Wickham puts it: “You are always cooperating with at least one other person: future-you.”